Livepeer Workflow Kit

What did you build?

One sentence. What is the output?

Livepeer Workflow Kit — the first framework that lets AI agents compose and run Livepeer media workflows — plus two new open-source Livepeer runners (diarized/streaming audio transcription and Florence-2 vision) that are deployed as live capabilities on the Livepeer Modules Gateway.

Why does it matter?

One to two sentences. What problem does it solve, and for whom on the network?

It is the first outside application to integrate Mike Zoop’s DAO-funded Livepeer Modules Gateway, and it moves Livepeer beyond serving primarily media creators toward being a media-intelligence backend for anyone driving an agent through everyday or professional tasks. The framework lets users and agents use Livepeer for tasks like:

  • “Extract the slides from this video,”
  • “Capture and transcribe this Google Meet,”
  • “Cut these home videos into scenes.”

Because Roboflow is the workflow core, every new Livepeer module becomes a reusable block, so the framework, and the network’s reach into agent ecosystems like OpenClaw and Hermes, scales without rewrites. The two new runners also give orchestrators reusable diarized-transcription and vision capabilities, which are essential primitives for agentic tasks.

Link to the work

Direct link to the merged PR, deployed tool, published docs, or equivalent. No link = no review.

Evidence of impact

How is this being used? Who benefits? Quantify where possible (e.g. number of users, PRs merged, integrations adopted).

  • First outside application integrated with the Livepeer Modules Gateway: first Roboflow → Livepeer gateway calls succeeded, with paid usage and billing observed through the gateway.
  • 2 new network runners deployed and published live by orchestrators (openai:audio-transcriptions diarized/streaming model; florence-2 vision), reusable by any caller via OpenAI- and Roboflow-compatible APIs.
  • Agent-native: given the framework with no example, several different agents independently figured out how to capture and transcribe a meeting from a one-line “take this stream and get this data” prompt.
  • Beneficiaries: agents and end users gain Livepeer access for everyday task-based workflows; the gateway operator gains a first live outside integration with paid usage; orchestrators gain two new deployable runner capabilities.

Community proof points

Link to the Discord thread or other signal. Reminder: you should also get 2-3 Orchestrators commenting their support for this work.

Amount requested

USD-equivalent (max $5,000). Include a brief breakdown if the amount is above $2,000.

$5,000 USD-equivalent in LPT. Development work breakdown:

  • Livepeer Workflow Kit (85%)
    • Core framework
    • Livepeer-aware Roboflow blocks
    • CloudSPE integration
    • Ingest paths
    • Session runner
    • Agent skill
  • Audio diarized transcription runner (10%): openai:audio-transcriptions capability
    • NeMo diarized ASR
    • OpenAI-compatible bounded transcription
    • True streaming (WebSocket) transcription
    • Speaker diarization with segment/word timestamps
  • Florence-2 vision runner (5%): florence-2 vision capability
    • Florence-2 screen, slide, image, and visual-text understanding
    • OpenAI-compatible vision chat route
    • Direct vision analysis route
    • Roboflow LMM-compatible routes

This is a great concept. While the building blocks themselves are valuable, I think the real breakthrough here is the move toward a truly dynamic, user-defined workflow system.

By providing a framework that allows AI agents to compose and execute these tasks on the fly, you’re shifting the network’s value proposition from static media processing to an agile, agentic intelligence backend. This capability to dynamically chain modules—turning “media ingestion” into “task-based intelligence”—is exactly what will unlock more complex, autonomous use cases that go well beyond simple media transformation. Looking forward to seeing how these workflows evolve as the ecosystem begins to leverage this flexibility.

Thanks Mike. I will say, your Livepeer Modules Gateway stack makes it easy for someone like me to simply focus on building use cases and abstract away the network complexities. This platform can later roll into self-hosting more of the stack or adopting other gateway tech, but what you built gives users an easy path and made getting to a PoC fast. Huge kudos to you and your team!

Ok, but how are you going to make orchestrators manage software that needs to be run and loaded on the GPU? How do you see this workflow existing next to other workflows that already use some GPUs, such as Daydream? The model where orchestrators hot-load whichever models are in demand did not work; babysitting orchestrators that way is not possible. We tried it with many different AI models that several parties tried to run on the network. Nothing besides Daydream is left, and it was a nightmare to coordinate programmatically at the network, gateway, and orchestrator level. Having 10 different Docker images loaded for 10 different projects/models eventually cannibalizes itself, and none of the projects can run at a proper quality level. And orchestrators won't keep a model loaded if it is not making money. Currently orchestrators keep Daydream loaded because of incentives; nobody will unload Daydream and load something different without incentives. The software stack is only getting more fragmented and less manageable to maintain network-wise. Ehh, Livepeer is doomed, and the longer I think about it, the more I lean towards Livepeer as a rent-a-GPU-server, pay-for-time market, so everybody can use our GPUs as they like without any coordination by orchestrators, because the user who rents the server takes responsibility for the software. That is the only way to abstract the software stack from the network stack, which should only act as a payment layer.