Some thoughts regarding verification. If I understand correctly, at a conceptual level the Capability is a pure function (it has no internal state and no external side effects) that is repeatedly applied to the latest subsequence of the data stream. In the case of video, the subsequence is a chunk a few seconds long. If the Capability is, say, a face detector, it produces face rectangles for each frame of the video. Thus, if the unit of work we need to verify consists of a fairly large number of such subsequences (a few-second chunk of video is hundreds of frames, while the subsequence required for face detection is a single frame), verification could be done statistically: sample a small number of subsequences, apply the same Capability function to them, and compare the output with the Transcoder's results. This does not give a 100% guarantee that a specific unit of work was performed correctly, but it requires no capability-specific verification logic, because the Capability itself is the verification function (Capability = verificationFunction), and it adds only a fraction of the original task's cost as overhead.
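To make the idea a bit more concrete, here is a rough Python sketch of what I have in mind. All names here (`verify_by_sampling`, `detection_probability`, the `capability` callable) are hypothetical and not taken from any existing code. It also assumes the Capability is deterministic, so recomputed and claimed results can be compared with plain equality; a real detector might need a tolerance-based comparison instead.

```python
import math
import random
from typing import Callable, Optional, Sequence, TypeVar

In = TypeVar("In")    # one subsequence, e.g. a single frame
Out = TypeVar("Out")  # the Capability's result for that subsequence


def verify_by_sampling(
    capability: Callable[[In], Out],
    inputs: Sequence[In],
    claimed: Sequence[Out],
    sample_size: int,
    rng: Optional[random.Random] = None,
) -> bool:
    """Recompute the Capability on a random sample and compare with claimed results."""
    if len(inputs) != len(claimed):
        return False  # a missing or extra result is already a failure
    rng = rng or random.Random()
    k = min(sample_size, len(inputs))
    # Sample indices without replacement so no subsequence is checked twice.
    for i in rng.sample(range(len(inputs)), k):
        if capability(inputs[i]) != claimed[i]:
            return False
    return True


def detection_probability(total: int, bad: int, sample_size: int) -> float:
    """Chance that a sample drawn without replacement hits at least one bad result."""
    k = min(sample_size, total)
    # Probability of missing every bad result is C(total - bad, k) / C(total, k).
    return 1.0 - math.comb(total - bad, k) / math.comb(total, k)
```

The second function is only there to reason about how many samples are worth taking: the verifier's cost grows with `sample_size`, while the chance of catching a worker that faked a given number of results follows the hypergeometric formula in the comment. The verifier should pick the sample after the results are submitted, otherwise the worker could compute only the frames it knows will be checked.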