Transcoding Verification Improvements: Fast & Full Verification

Motivation

One of the most important research areas for Livepeer is the verification of work on the network. The verification solutions used in the network have evolved in the past few years:

  • The original whitepaper proposed the use of Truebit, an off-chain computation protocol, to report whether a CPU based deterministic video transcoding task was executed correctly and the result would be used to determine whether to slash stake
  • Initially, the network used trusted “solver nodes” to report whether a CPU deterministic video transcoding task was executed correctly as a temporary trusted solution before migrating to a trust minimized solution
  • The Streamflow network upgrade introduced GPU transcoding which dramatically accelerated the capacity and performance potential of the network by allowing cryptocurrency miners to leverage the chips on their existing GPUs for transcoding. However, while there are ways to force CPU video transcoding to be deterministic (with some performance tradeoffs), it became clear that trying to force determinism with GPU video transcoding would be impractical given the lack of available configuration knobs and the diversity of hardware implementations
  • As of the Streamflow network upgrade, slashing was disabled in favor of real-time statistical verification that can be performed by broadcasters using a verifier powered by a machine learning model. The model was trained to detect tampered videos and a research paper on the model was published. A broadcaster would sample and verify transcoded results returned by orchestrators in real time and stop working with orchestrators that trigger verification failures

At the moment, broadcasters can use the machine learning verifier mentioned above, but it has a number of shortcomings:

  • Increased computation burden for broadcasters because transcoded results need to be decoded (but not re-encoded) to run verification
  • Increased latency for streams because there is a delay before results are usable by video players while they are being verified
  • While individual broadcasters are able to use the verifier and filter out potentially malicious orchestrators, there is no way for misbehavior to be made transparent to the rest of the network which would protect other broadcasters and would increase the cost incurred by a misbehaving orchestrator

This post outlines a roadmap for making improvements to transcoding verification on the network through a combination of a node software upgrade called “fast verification” and a protocol upgrade called “full verification”.

Fast Verification

Fast verification improves on the current machine learning verifier approach by pushing any additional computation away from broadcasters that are typically compute resource constrained to orchestrators that typically have substantially more compute resources available via the hardware that they’re already using for transcoding.

The following algorithms are used:

  • A perceptual hash algorithm f that returns a fingerprint for a video segment
  • A collision detection algorithm d that returns a boolean indicating whether two fingerprints are sufficiently similar to be considered fingerprints for the same video segment

A broadcaster will maintain an internal list of trusted and untrusted orchestrators. A broadcaster may consider an orchestrator as trusted based on reputation measured through a combination of historical performance on the network and community interaction. A broadcaster would consider an orchestrator as untrusted if it has never interacted with the orchestrator before and has no reputation providing information for the orchestrator. Over the course of using the network, a broadcaster can update the reputation scores for orchestrators in order to update its list of trusted and untrusted orchestrators.

During a stream, a broadcaster will run fast verification on certain sampled segments using the following process:

  • A segment will be sent to multiple orchestrators, one trusted orchestrator and N untrusted orchestrators, to be transcoded
  • Orchestrators are required to return the perceptual hash of each of the transcoded renditions for a segment alongside the renditions themselves
  • The perceptual hashes from each of the untrusted orchestrators are compared with the perceptual hashes returned from the trusted orchestrator using the collision detection algorithm
  • If an untrusted orchestrator either does not return a response within a timeout or returns perceptual hashes that do not collide with those of the trusted orchestrator, it is discarded

If at least one untrusted orchestrator passes fast verification, the broadcaster will use results from one of the untrusted orchestrators that passed. Otherwise, the broadcaster will fall back to the trusted orchestrator. This mechanism allows the broadcaster to progressively interact with more and more untrusted orchestrators while using trusted orchestrators it is already aware of to run fast verification.

Any untrusted orchestrators that fail fast verification will be subject to local ban policies that a broadcaster can use to filter out the orchestrators during selection.
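The process above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `fast_verify`, the use of `None` to represent a timeout, and the random choice among passing orchestrators are all hypothetical, since the post does not say how one passing orchestrator is picked. The `collide` argument stands in for the collision detection algorithm d.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple


def fast_verify(
    trusted_id: str,
    trusted_hashes: List[bytes],
    untrusted: Dict[str, Optional[List[bytes]]],  # None means no response in time
    collide: Callable[[bytes, bytes], bool],
    rng: random.Random,
) -> Tuple[str, List[str]]:
    """Return (orchestrator whose results are used, orchestrators that failed the check)."""
    passed: List[str] = []
    failed: List[str] = []
    for orch_id, hashes in untrusted.items():
        if hashes is None:
            continue  # timed out: discarded for this segment, not counted as a failure
        same_count = len(hashes) == len(trusted_hashes)
        # Every rendition's hash must collide with the trusted orchestrator's hash.
        if same_count and all(collide(h, t) for h, t in zip(hashes, trusted_hashes)):
            passed.append(orch_id)
        else:
            failed.append(orch_id)
    chosen = rng.choice(passed) if passed else trusted_id  # fall back to trusted
    return chosen, failed
```

The `failed` list is what a local ban policy would consume; timeouts are kept out of it on purpose.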

Full Verification

Full verification complements fast verification by re-introducing in-protocol slashing/economic penalties for orchestrator misbehavior.

The following algorithms are used:

  • A verification algorithm v that accepts a source segment and a transcoded rendition and returns true if the transcoded rendition is valid relative to the source and false if the transcoded rendition is tampered relative to the source
    • v requires the full decoding of both the source segment and transcoded rendition
    • A natural initial candidate for v is to use the perceptual hash algorithm for fast verification, compute the hashes for both the source segment and transcoded rendition and then pass the hashes to the collision detection algorithm

A broadcaster is configured with the following parameters:

  • fullVerifierList. A list of verifiers that support executing v. Initially, these verifiers can be the same entities on the trusted orchestrator list used for fast verification
  • fullVerificationFrequency. The percentage of segments that the broadcaster will run full verification for. For example, 1% would result in 1 out of every 100 segments being fully verified, which means running full verification about once every 200 seconds, or roughly every 3 minutes (given 2 second segments)

Whenever full verification is required based on fullVerificationFrequency, the broadcaster will send a full verification request with the source segment and a transcoded rendition from an orchestrator response to a verifier from fullVerifierList. The full verification request is asynchronous, meaning that the broadcaster immediately inserts transcoded renditions from the response into the playlist without waiting for full verification to complete.
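To make the frequency arithmetic concrete, here is a small sketch. It assumes segments are sampled independently at random with probability fullVerificationFrequency, which is one possible reading of "based on fullVerificationFrequency" rather than something the post specifies. The function names are hypothetical.

```python
import random


def expected_interval_seconds(frequency: float, segment_seconds: float) -> float:
    """Average time between fully verified segments."""
    if not 0 < frequency <= 1:
        raise ValueError("frequency must be in (0, 1]")
    return segment_seconds / frequency


def should_fully_verify(frequency: float, rng: random.Random) -> bool:
    """Sample a segment for full verification with probability `frequency`."""
    return rng.random() < frequency
```

With a frequency of 1% (0.01) and 2 second segments, `expected_interval_seconds` gives 200 seconds, a little over 3 minutes between full verification requests on average.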

If full verification fails, the broadcaster will save the following pieces of data:

  • The source segment
  • The transcoded rendition
  • The orchestrator’s digital signature bound to the source segment and transcoded rendition

The purpose of full verification is not to detect tampered video in real-time, but rather to asynchronously detect tampered video and collect cryptographically binding evidence that can be used for dispute resolution.
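A rough sketch of the evidence a broadcaster might keep when full verification fails is shown below. Everything here is hypothetical: the `Evidence` type, the `digest` helper (a content hash that could serve as a compact reference to the video pair) and `record_if_failed` are illustrative names, and the orchestrator's signature scheme is not specified in the post, so the signature is stored as opaque bytes and not checked.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Evidence:
    source_segment: bytes
    rendition: bytes
    orchestrator_signature: bytes  # opaque here; the signature scheme is not specified

    def digest(self) -> str:
        """Content hash over the (source, rendition) pair."""
        h = hashlib.sha256()
        h.update(hashlib.sha256(self.source_segment).digest())
        h.update(hashlib.sha256(self.rendition).digest())
        return h.hexdigest()


def record_if_failed(
    source: bytes,
    rendition: bytes,
    signature: bytes,
    verify: Callable[[bytes, bytes], bool],  # stands in for the verifier running v
    store: List[Evidence],
) -> bool:
    """Run verification; on failure, save the evidence. Returns the verification result."""
    ok = verify(source, rendition)
    if not ok:
        store.append(Evidence(source, rendition, signature))
    return ok
```

Because the request is asynchronous, a function like this would run off the playback path, after the rendition has already been inserted into the playlist.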

Dispute Resolution

An on-chain dispute resolution mechanism is defined to allow a broadcaster to present cryptographically binding evidence of tampering in order to economically penalize the offending orchestrator.

A broadcaster should only raise a dispute if they are confident that they will win the dispute. This can be achieved by establishing an operational process that involves manually reviewing transcoded renditions that have been flagged by full verification during a certain time period (similar to how automated content moderation systems for certain UGC platforms work). Each transcoded rendition can be eligible for disputes for a fixed time period, e.g. 2 weeks, giving broadcasters ample time to manually review flagged data. If there is manual confirmation that a transcoded rendition is tampered, the broadcaster or anyone else acting on the broadcaster’s behalf can then raise a dispute on-chain by staking disputeStake LPT and submitting the source segment, transcoded rendition and the orchestrator’s signature bound to these two pieces of data (in practice, of course the video data would not be posted on-chain - details on the actual approach are left out for now). The expectation would be that during the dispute window, the arbitrator(s) would inspect evidence presented for a dispute and rule in either the broadcaster’s or the orchestrator’s favor.

For additional flexibility, broadcasters and orchestrators can agree on the arbitrator that should be used in the event that dispute resolution is required. Orchestrators can configure the arbitrators that they support either on-chain or off-chain (probably on-chain to provide visibility to their delegators) and broadcasters can select orchestrators that support their preferred arbitrators. In practice, at the very start, there will only be a single arbitrator implementation, but this architecture would provide flexibility for additional parallel arbitrator implementations to co-exist, allowing broadcasters and orchestrators to choose the one best suited to their own preferences. Additionally, this would allow different arbitrator implementations to be used for different video tasks beyond video transcoding such as AI inference tasks.

If the arbitrator rules in the broadcaster’s favor, the orchestrator would be economically penalized. Potential implementations of this penalty include:

  • Slashing the orchestrator’s self-delegated stake
  • Slashing all of the orchestrator’s delegated stake
  • Jailing the orchestrator by freezing all of its stake for a period of time and excluding it from rewards while it is jailed

If the arbitrator rules in the orchestrator’s favor, the broadcaster’s stake would be seized and could be burned or sent to a protocol treasury.
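The dispute flow described so far can be sketched as a small off-chain model. This is not a contract design: the names (`Dispute`, `raise_dispute`, `resolve`, `required_stake`), the use of timestamps in seconds, and the string outcomes are all assumptions made for illustration, and the 2 week window is just the example period mentioned above. The penalty itself is left abstract because the post lists several candidate implementations.

```python
from dataclasses import dataclass
from enum import Enum

DISPUTE_WINDOW_SECONDS = 14 * 24 * 3600  # example eligibility period of 2 weeks


class Ruling(Enum):
    BROADCASTER = "broadcaster"
    ORCHESTRATOR = "orchestrator"


@dataclass
class Dispute:
    challenger: str
    orchestrator: str
    stake: int  # staked amount, in hypothetical LPT base units
    resolved: bool = False


def raise_dispute(
    challenger: str,
    orchestrator: str,
    stake: int,
    rendition_time: int,
    now: int,
    required_stake: int,  # stands in for disputeStake
) -> Dispute:
    """Open a dispute if the rendition is still eligible and enough stake is posted."""
    if now - rendition_time > DISPUTE_WINDOW_SECONDS:
        raise ValueError("rendition is no longer eligible for disputes")
    if stake < required_stake:
        raise ValueError("insufficient dispute stake")
    return Dispute(challenger, orchestrator, stake)


def resolve(dispute: Dispute, ruling: Ruling) -> str:
    """Apply the arbitrator's ruling and report which side is penalized."""
    if dispute.resolved:
        raise ValueError("dispute already resolved")
    dispute.resolved = True
    if ruling is Ruling.BROADCASTER:
        return "penalize_orchestrator"
    return "seize_challenger_stake"
```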

Candidate arbitrator implementations include:

  • A multisig operated by a governance elected council
  • A DAO governed by network stakeholders
  • A decentralized court system such as Aragon Court or Kleros that is based on “Schelling coin games” where jurors are incentivized to resolve a dispute based on what they believe other jurors think. Tamper detection may actually be a good use case for Schelling coin games since the tampers should be immediately obvious to the human eye (if they aren’t then it is either questionable whether a tamper actually occurred or the definition of tampering used for dispute needs to be clarified). If the tampers are immediately obvious to the human eye then that is a clear Schelling point, i.e. common knowledge that jurors can use to make assumptions on how other jurors will rule
  • An escalation game system such as reality.eth that allows resolutions to be challenged by paying higher and higher fees until the final arbitration is invoked which could actually be a system like Aragon Court or Kleros - for example, Gnosis’ Omen prediction market uses Kleros as the final arbitrator. Alternatively, the final arbitration implementation could be based on a system exclusive to LPT holders. The expectation is that in most cases, final arbitration is not needed, but its mere presence in the mechanism is an incentive for people to tell the truth so that it is never invoked
  • A cryptographic system for proving that a video was transformed in certain permissible ways - see PhotoProof for a precedent with authenticating image transformations

Conclusion

  • Fast verification can provide a real-time verification mechanism with security guarantees that can be good enough for certain use cases
  • Full verification with dispute resolution complements fast verification by serving as the last line of defense - if an attacker bypasses fast verification, the threat of being caught by full verification and being economically penalized during dispute resolution raises the cost of attack

Next Steps

  • Deploy fast verification
    • Publish a new version of go-livepeer with orchestrator support for computing MPEG-7 video signatures, the perceptual hash algorithm that will be used for the first version of fast verification
    • Begin to use fast verification for livepeer.com API broadcasters and collect data on performance
  • Complete research for full verification
    • Define an initial arbitrator implementation
    • Define the initial penalty implementation
    • Create a pre-proposal with the initial specs for a full verification protocol upgrade
    • Get community feedback on the pre-proposal

Great work @yondon!

Some questions as I was reading:

Fast verification

  1. What if the orchestrator only periodically sends back bad segments? Would we have to wait until the Full Verification process in order to “catch” the misbehaving Orchestrator?

  2. Would the fast verification process fall victim to the chance of an Orchestrator sending back semi-correct segments, e.g. where there is tearing or blank pixels in the transcoded results?

Full verification

  1. Again coming back to the tearing or blank pixels, to what degree can the amount of pixels be different before it is considered a problem? And is this situation considered malicious?

  2. Which Orchestrator initially gets chosen for the full verification process? And could this benefit the Fully Verified Orchestrator in the future based on this selection?

Forgive me if I misunderstood any parts of the process.

And thank you for all the hard work put into this. Very impressive stuff!

  1. What if the orchestrator only periodically sends back bad segments? Would we have to wait until the Full Verification process in order to “catch” the misbehaving Orchestrator?

I think this can be addressed with a few things:

  • The broadcaster will randomly sample segments for fast verification based on a desired frequency so an orchestrator will be unable to predict which segments will be used in fast verification
  • The broadcaster has the option of increasing the frequency at which fast verification is run
  • If fast verification happens to miss a bad segment and that segment is actually played back somewhere for a viewer such that it meaningfully hurts the viewer experience, the broadcaster is able to determine which orchestrator produced the bad segment and this can be proven to anybody else via the orchestrator’s signature. A broadcaster could record a certain portion of the segments, along with orchestrator signatures, of a live stream to retain the ability to inspect, identify and extract evidence of a bad segment. More on what this evidence can be used for below.
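To get a rough feel for how random sampling limits an orchestrator that only occasionally misbehaves, here is a back-of-the-envelope sketch. It assumes tampering and sampling are independent per segment and that a sampled tampered segment is always caught, which ignores detection errors; the function name and parameters are hypothetical.

```python
def detection_probability(tamper_rate: float, sample_rate: float, segments: int) -> float:
    """Probability that at least one tampered segment is sampled over `segments` segments."""
    # Each segment is both tampered and sampled with probability tamper_rate * sample_rate.
    return 1.0 - (1.0 - tamper_rate * sample_rate) ** segments
```

Under these assumptions, an orchestrator tampering with 5% of segments against a broadcaster sampling 10% of segments would be caught with probability above 99.9% over 1800 segments (one hour of 2 second segments).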

Would we have to wait until the Full Verification process in order to “catch” the misbehaving Orchestrator?

While the dispute resolution portion of full verification would establish in-protocol penalties for misbehaving orchestrators, there is an opportunity to use extra-protocol penalties in the interim before dispute resolution is deployed. Deploying the non-dispute resolution portion of full verification in the node software would help with this as well.

As mentioned earlier, a broadcaster still has access to evidence of bad segments that is directly tied to orchestrators via signatures. So, a broadcaster could post the video data to a storage network like IPFS/Filecoin, add the evidence to a publicly accessible audit log and notify the community in a communication channel such as the forum and/or Discord. Anyone would be able to inspect the audit log to determine whether evidence is valid (i.e. was there really a bad segment). The log can be regularly inspected to ensure that it can be trusted and the orchestrators flagged by the log can be penalized by:

  • Automatically adding the orchestrators to a block list on broadcasters
  • Removing the orchestrators from the explorer and widely adopted UIs

Ultimately, on-chain dispute resolution should be a more effective deterrent than the above options on their own, but the above options could complement dispute resolution post-deployment and establish initial penalties pre-deployment.

  1. Again coming back to the tearing or blank pixels, to what degree can the amount of pixels be different before it is considered a problem? And is this situation considered malicious?

The most important considerations are:

  • Were the requested transformations (i.e. frame scaling, frame rate adjustment) applied properly?
  • Does the transcoded video faithfully represent the source video content?

The first is easily checkable. The second is trickier.

While in certain cases, a certain number of differing pixels between two videos may indicate different content, this is not always the case. The goal is to determine whether the human eye would consider two videos as having the same visual content. For now, quality is not considered and the focus is instead on content. In the future, additional checks can be put into place to require a certain quality level as well in addition to requiring the transcoded segment to faithfully represent the source segment content.

  1. Which Orchestrator initially gets chosen for the full verification process?

This is a decision made by the broadcaster operator. They could identify a set of orchestrators to seed their list with based on a combination of factors including: personal interaction, historical performance data, community reputation, etc. Or in some cases, the broadcaster operator may run orchestrators themselves - they may not have access to sufficient hardware/bandwidth resources to do the transcoding themselves at scale, but they may have enough to initially assist with full verification.

And could this benefit the Fully Verified Orchestrator in the future based on this selection?

This orchestrator could benefit in that full verification is a service that it is still being paid for so if the orchestrator is being selected for full verification and transcoding then there are more opportunities for the orchestrator to earn. With that being said, full verification is less time sensitive than transcoding so an orchestrator that would consistently be selected for full verification may not be consistently selected for transcoding.


Would this be something we would want a broadcaster to be responsible for? I assume a broadcaster wants reliable and fast transcoded data and not the responsibility of verifying any of the data. Similar to an end user of current web2 products. Or are broadcasters supposed to be middle men similar to how Livepeer Inc works where the end customer does not actually have to take part in the workings of the Livepeer protocol but rather deal with a broker?

Has this type of behaviour been reported in Livepeer before? Or is this adding tools for future speculation of these challenges?

Again, thank you @yondon for this amazing work!
These are exceptionally hard challenges to solve especially in the emerging context of Web3.


Would this be something we would want a broadcaster to be responsible for? I assume a broadcaster wants reliable and fast transcoded data and not the responsibility of verifying any of the data.

Agreed that the responsibility of recording and verifying data is not ideal. In the near term, services such as livepeer.com that operate clusters of broadcasters could take on this responsibility in exchange for greater security guarantees with the expectation that a combination of fast and full verification will automate the detection and flagging of any problematic data with just a small amount of potential manual review. I expect that the algorithms used for fast and full verification can/will also be improved over time (i.e. increased accuracy and decreased computational requirement) which would further reduce the amount of potential manual review (perhaps eventually even removing the need altogether).

An alternative could be for the broadcaster to outsource the responsibility of reporting evidence to a public audit log and/or an on-chain dispute resolution mechanism to a third party. Additionally, the broadcaster could make any recorded data available to the third party, which could earn a reward for reporting evidence for dispute resolution, removing manual review responsibilities from the broadcaster operator.

Or are broadcasters supposed to be middle men similar to how Livepeer Inc works where the end customer does not actually have to take part in the workings of the Livepeer protocol but rather deal with a broker?

A few ways for operating broadcasters that I see:

  • As a part of a hosted service/ingest gateway like livepeer.com where the operator and the streamer are different entities. Note that geographically distributed ingest gateways are actually a requirement for building a video application with a global streamer base which is how livepeer.com is built. In the future, I’d love to see more independent ingest gateways being run by operators around the world resulting in a more decentralized set of ingest gateways that streamers could have access to
  • As a standalone node where the operator and the streamer are the same entity

There may be more ways. I would expect the way the broadcasters are run to differ depending on the use case.

Has this type of behaviour been reported in Livepeer before? Or is this adding tools for future speculation of these challenges?

This type of behavior has not been reported before as far as I know. The intent of these updates is to improve how verification works today as described in the OP in order to provide developers on the demand side of the network with better security guarantees while minimizing performance tradeoffs when they integrate the network for their applications.


Looking forward to this update!

What happens if an untrusted (or the trusted) orchestrator is capped? Ideally this should not result in a ban. Maybe a certain amount of sessions should always be reserved for verification purposes?

Just an idea, not sure if feasible: It would be nice if we could leverage the (decentralized) orchestrator network for this. So that the Os check each other for malicious behaviour.
Orchestrators that perform work already have access to source segments and the transcoded result - so what if they send the same source segment to other Os and check the returned result?
Of course there is the problem of payment, maybe the sent ticket can be marked as “verification” somehow to indicate that it’s not eligible for payment?
And also the results of those test would need to be aggregated somehow/somewhere - or maybe they can be queried by the Broadcaster during the selection process.

Does the verification happen on the GPU and how resource intense is the process? How much (V)RAM should an O leave available for verification?


Does the verification happen on the GPU and how resource intense is the process? How much (V)RAM should an O leave available for verification?

Just to clarify - the machine learning verifier mentioned in the quote can currently be used by a broadcaster and it runs on a CPU, but it is not a part of the fast and full verification design.

The fast verification process described in the OP relies on an orchestrator computing perceptual hashes (the upcoming node software release will use MPEG-7 video signatures) for the transcoded results that it returns to a broadcaster. If the orchestrator is transcoding on GPU(s), this additional computation will also run on the GPU(s). The actual computation itself should be relatively lightweight (at least compared to video decoding and encoding) so while we expect GPU utilization to increase we don’t think the increase will be substantial. Additionally, the computation will be run directly on video frames that are already decoded by the NVDEC chip on a GPU which should result in little to no additional VRAM usage. Based on the latest benchmarks, we do not expect the session capacity of orchestrators to be meaningfully impacted by the introduction of this additional computation. With that being said, we’ll definitely be looking to do more benchmarks and testing after the node software release is rolled out.


What happens if an untrusted (or the trusted) orchestrator is capped? Ideally this should not result in a ban. Maybe a certain amount of sessions should always be reserved for verification purposes?

If the orchestrator is capped, does not return a response within a timeout or generally returns an error, the broadcaster should fail over to another orchestrator without applying a local ban to that orchestrator. The broadcaster would only consider an orchestrator for a local ban if it fails the fast verification check, i.e. if the orchestrator’s perceptual hash does not collide with the perceptual hash returned by the trusted orchestrator. The local ban policy should also consider the frequency of the failures to account for the small possibility of occasional false negatives (i.e. no collision even though the perceptual hashes correspond to videos with the same content) to avoid banning an orchestrator for a false negative.
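One way a frequency-aware local ban policy could look is sketched below. This is an illustration rather than the go-livepeer implementation: the class name, the sliding window, and the default thresholds are all assumptions. Only fast verification outcomes are recorded; timeouts, errors and capacity limits are deliberately not fed into it.

```python
from collections import deque
from typing import Deque, Dict


class LocalBanPolicy:
    """Ban only when the recent failure rate is too high to be explained by false negatives."""

    def __init__(self, window: int = 50, min_samples: int = 10, max_failure_rate: float = 0.2):
        self.window = window
        self.min_samples = min_samples
        self.max_failure_rate = max_failure_rate
        self.results: Dict[str, Deque[bool]] = {}  # orchestrator -> recent outcomes

    def record(self, orchestrator: str, passed: bool) -> None:
        """Record one fast verification outcome (True means the hashes collided)."""
        history = self.results.setdefault(orchestrator, deque(maxlen=self.window))
        history.append(passed)

    def is_banned(self, orchestrator: str) -> bool:
        history = self.results.get(orchestrator, ())
        if len(history) < self.min_samples:
            return False  # not enough data to distinguish misbehavior from noise
        failures = sum(1 for ok in history if not ok)
        return failures / len(history) > self.max_failure_rate
```

With a policy like this, a single failed check never triggers a ban on its own; the thresholds would need to be chosen against the observed false negative rate.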

Just an idea, not sure if feasible: It would be nice if we could leverage the (decentralized) orchestrator network for this.

Interesting idea. I think this is similar to the current full verification design except the working orchestrator is responsible for sending a video pair (i.e. source segment + transcoded results) to additional orchestrators instead of the broadcaster. One concern that I have with this approach is that if the verifying orchestrators do not return verification results to the broadcaster, the broadcaster would be unable to distinguish between the scenario where the working orchestrator withheld the video pair from the verifying orchestrators vs. the scenario where the verifying orchestrators actually did not return verification results. The benefit of having the broadcaster send the video pair to the verifying orchestrators directly is that if the verifying orchestrator does not respond the broadcaster can uniquely attribute the issue to the verifying orchestrator and just select another verifying orchestrator to take its place.

Also, FWIW, while the current full verification design describes using the same trusted orchestrator list that would be used for fast verification, the intent is to allow the trusted orchestrator list to expand over time as a broadcaster interacts with additional orchestrators allowing the broadcaster to utilize more of the network of orchestrators for this function over time.


How sensitive is the collision detection algorithm comparing the result of the perceptual hash? Could the output from different generation NVENCs be enough to cause a failed verification? I imagine that scenes that are harder to encode would be more prone to this if it’s even an issue. Is this something you guys have tested internally?

Perhaps create a new role, Verifier, which is similar to a transcoder but has GPUs doing only verification?

How sensitive is the collision detection algorithm comparing the result of the perceptual hash? Could the output from different generation NVENCs be enough to cause a failed verification?

The latest tests we’ve run using MPEG-7 video signatures as the perceptual hash (phash) include comparing:

  • phashes generated for different renditions (i.e. 720p vs. 360p) transcoded by the same GPU
  • phashes generated for the same rendition transcoded by a GPU vs. a CPU

In these tests, I believe the accuracy has been around 98%. However, these tests are definitely not exhaustive with the scenarios that they account for (for example, we have not performed thorough testing comparing various different GPU models). For this reason, the intention is to perform additional testing in the wild on the network where there is likely a greater variety of GPU models and use the results to make improvements in the perceptual hash algorithm, the fast verification implementation or both. As mentioned in an earlier comment, the expectation is that there is the possibility of false negatives (i.e. no collision even though the phashes correspond to videos with the same content), so the fast verification implementation (i.e. in the local ban policy) will consider the frequency of failures instead of treating a single failure as an indication of orchestrator misbehavior.


I think the orchestrators basically play this role in the full verification design; they just have the ability to process both verification and transcoding tasks. While in theory someone could reserve their GPUs for just verification tasks, my guess is that they’d want to utilize any available GPU capacity to increase revenue, so if they are not saturated with verification tasks, they’ll take on any transcoding tasks that are around.


If I understood your explanation on the recorded call, these hashes will always have some false negatives. What % of the total are expected to be false negatives?

Here’s an idea that came to mind:
What are your thoughts on allowing orchestrators to define one orchestrator instance as the “-verifier” since it’s CPU intensive instead of running these on the same CPU as the GPU?
This would also move the bandwidth load off the transcoders’ network and onto one specific node that an orchestrator can locate accordingly.

It appears likely that orchestrators would veer towards hardware that allows for the maximum number of GPUs per motherboard and do not have CPU sockets/chipsets for the CPUs which are currently best in class.
Typically I’d spin up an orchestrator in that region which may or may not be in the exact physical location as the transcode nodes but most likely a VM in a datacenter.

It is a lot easier, and cheaper, for an Orchestrator to provide the (growing) CPU requirements (and more importantly, scale) for Fast & Full Verification as their transcoding traffic grows in the DC instead of at each transcoder pool’s physical location.