The initial focus for R&D on supporting AI video compute on the network has been prototyping to identify a practical way for orchestrators to run AI inference jobs (i.e. generation, upscaling, frame interpolation) on a mainnet sub-network.
The intent is for ongoing work on this track to be a part of the recently proposed AI Video SPE (at the pre-proposal stage).
Updates
Containerized Job Execution Demo w/ SDXL-Turbo and SVD-XT
Start and run a container to generate an image from text using the SDXL-Turbo model.
Start and run a container to generate a video from an image using the SVD-XT (Stable Video Diffusion XT) model.
I recommend watching at 2.5x speed. Also, note that the video generation section was trimmed since SVD-XT generation took ~1.5 minutes on an RTX 3090.
The idea is that a Go package could be integrated into go-livepeer so that an orchestrator could do the above. In that scenario, unlike with these code examples, the orchestrator would likely want to start and “warm” the containers ahead of time to make sure they’re ready with model weights preloaded on the GPU for incoming requests.
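To make the "warm container" idea a bit more concrete, here is a minimal sketch (not the actual go-livepeer integration) of pre-starting a GPU container with the Docker Go SDK. The image reference and the `PRELOAD_MODEL` env var are hypothetical, and the exact SDK signatures vary a bit across docker/docker client versions:

```go
package main

import (
	"context"
	"log"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/api/types/container"
	"github.com/docker/docker/client"
)

// warmContainer pre-starts an inference container so model weights can be
// loaded onto the GPU before the first request arrives.
func warmContainer(ctx context.Context, imageRef string) (string, error) {
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		return "", err
	}

	// Equivalent of `--gpus all` so the container can see the GPU.
	hostConfig := &container.HostConfig{
		Resources: container.Resources{
			DeviceRequests: []container.DeviceRequest{{
				Driver:       "nvidia",
				Count:        -1, // all GPUs
				Capabilities: [][]string{{"gpu"}},
			}},
		},
	}

	resp, err := cli.ContainerCreate(ctx, &container.Config{
		Image: imageRef,
		// Hypothetical flag telling the server inside the container to load
		// model weights onto the GPU at startup rather than on first request.
		Env: []string{"PRELOAD_MODEL=1"},
	}, hostConfig, nil, nil, "")
	if err != nil {
		return "", err
	}

	if err := cli.ContainerStart(ctx, resp.ID, types.ContainerStartOptions{}); err != nil {
		return "", err
	}
	return resp.ID, nil
}

func main() {
	// Placeholder image reference for an SDXL-Turbo inference container.
	id, err := warmContainer(context.Background(), "example.com/ai/sdxl-turbo:latest")
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("warmed container %s", id)
}
```

In the real integration, the orchestrator would keep a handle to the warmed container and route incoming requests to it rather than starting a new container per job.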
There are a bunch of things that will need to be optimized and thought through, but packaging pre-processing, model inference and post-processing code to be executed within a container seems like a promising approach.
Architecture Design
There are a lot of moving parts in the architecture and much of this will likely change over time, but here are a few snapshots of some of the early thinking.
Looking good! Regarding the container warm up strategy, will there be a set of predefined models that will be available to users and kept “hot” or will models be loaded dynamically based on the request? Is there an estimate of the system RAM requirements at this point?
will there be a set of predefined models that will be available to users and kept “hot” or will models be loaded dynamically based on the request?
My current thinking is to start simple with a very limited set of predefined models for a basic video generation workflow (i.e. generation, upscaling, frame interpolation) and explore dynamic loading later on since it would be much more complex.
Currently looking at sdxl/sdxl-turbo for image generation, svd/svd-xt for video generation, Real-ESRGAN for upscaling and FILM for frame interpolation.
Is there an estimate of the system RAM requirements at this point?
Haven’t looked into system RAM yet. For VRAM: if all models are loaded into memory upfront, then you need enough VRAM to fit all of the models at once; if models are loaded/offloaded as they are used, the VRAM requirements would be reduced (roughly to the largest single model plus working memory), but inference speed will be slower.
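As a back-of-the-envelope sketch of that tradeoff, the "load everything upfront" strategy needs roughly the sum of the per-model footprints, while load/offload needs roughly the largest single one. The footprint figures below are hypothetical placeholders, not measurements:

```go
package main

import "fmt"

func main() {
	// Hypothetical per-model VRAM footprints in GiB (placeholders only, not measured values).
	footprints := map[string]float64{
		"image-gen":   8.0,
		"video-gen":   12.0,
		"upscale":     4.0,
		"interpolate": 4.0,
	}

	var sum, max float64
	for _, gb := range footprints {
		sum += gb
		if gb > max {
			max = gb
		}
	}

	// All models resident at once: need the sum (plus activation/working memory).
	fmt.Printf("all loaded upfront: ~%.0f GiB VRAM\n", sum)
	// Load/offload per request: need roughly the largest single model (plus headroom),
	// at the cost of extra load latency on every switch.
	fmt.Printf("load/offload:       ~%.0f GiB VRAM\n", max)
}
```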
@yondon Thank you so much for the technical update. I’m thrilled to see this implemented as a containerized workflow! I have a few questions regarding the details:
For the first milestone, you’re considering providing popular models like Stable Diffusion. It might be a bit early, but do you envision AI researchers uploading their custom models to a live-hosted Docker repository? Could these containers be pulled locally upon the first network request and then, perhaps, automatically purged if unused for an extended period?
Regarding security and permissions, have you investigated the minimal permissions required for these Docker containers? In my experience with AI containers, I’ve occasionally needed to use the --privileged flag to avoid CUDA permission errors, even when using the Nvidia runtime. However, this raises significant security concerns, especially when running containers from external sources. How do you plan to address this?
What are your thoughts on including a method for ensuring the integrity of the orchestrator’s AI inference process? Would it be enough to include the hash of the image that the orchestrator’s container used to generate the output in the output itself? Would this provide a reliable way to validate that the orchestrator indeed performs the requested inference tasks correctly?
Good questions that touch on a bunch of open areas for research!
I do think supporting custom models/containers could be interesting, especially if there end up being lots of models fine-tuned for specific domains (for example, a video model that is fine-tuned for motion in specific martial arts or dance styles). The images could be pulled on first request, or perhaps the developer could have an option to preemptively request for images to be pulled + containers to be started to reduce initial latency. There are a bunch of security questions to figure out (related to your second question), so I imagine the progression will look something like: support a predefined set of models/containers → establish a centralized review process for adding to this set of models/containers → figure out how to support a more open process for bringing models/containers.
I haven’t investigated this topic deeply yet and welcome any suggestions for ways to minimize the set of permissions required! During prototyping, I’ve been trying out Cog to containerize inference code and I haven’t run into permission issues yet, but I’m guessing Cog is handling the permission settings under the hood. As it relates to custom models/containers, I’ve wondered if curated image registries could help, where orchestrators would only opt in to using registries whose curators they trust.
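As a rough sketch of that opt-in idea (the registry names and paths below are hypothetical), the orchestrator could refuse to pull or start any image whose reference doesn’t come from a registry it has explicitly trusted. A real check would also pin image digests rather than mutable tags:

```go
package main

import (
	"fmt"
	"strings"
)

// trustedRegistries is a hypothetical, orchestrator-configured allowlist of
// curated registries; anything outside it is rejected before pull/start.
var trustedRegistries = []string{
	"registry.example.com/livepeer-curated/",
	"registry.example.com/community-reviewed/",
}

// isTrusted reports whether an image reference comes from an allowlisted registry prefix.
func isTrusted(imageRef string) bool {
	for _, prefix := range trustedRegistries {
		if strings.HasPrefix(imageRef, prefix) {
			return true
		}
	}
	return false
}

func main() {
	for _, ref := range []string{
		"registry.example.com/livepeer-curated/sdxl-turbo:latest",
		"some-random-registry.io/unknown/model:latest",
	} {
		fmt.Printf("%-60s trusted=%v\n", ref, isTrusted(ref))
	}
}
```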
At the minimum, I think the orchestrator needs to provide cryptographically signed receipts with an output. The receipt could contain the hash of the image used as well as other metadata such as the # of inference steps, # of pixels in the output, etc. These receipts could then be used by a broadcaster to prove that the orchestrator attested that a specific output resulted from executing a specific container image. The arbitration process for a dispute is tricky because it’s not clear to me how feasible it is to force deterministic behavior for various models [1] without taking on a big performance hit - for example, while I know that using the same seed + other settings for SD should result in the same image from a visual perception POV, I’m not sure that you can guarantee that the image is bit-for-bit the same regardless of the hardware used. There are folks working on ZKP-based proving of model inference and there are optimistic variants, but the question will be how much overhead each one introduces. Additional mechanisms that require more trust include relying on multi-step arbitration where step 1 runs a trusted classifier to see if an output is a near-exact match to the expected output for a specific container image and step 2 is human review if there is disagreement with step 1.
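As a minimal sketch of what such a receipt might look like (the struct fields and values are assumptions, not a spec), the orchestrator could sign a small JSON blob binding the output hash to the container image digest and the job parameters:

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Receipt is a hypothetical structure binding an output to the container image
// and inference parameters the orchestrator claims to have used.
type Receipt struct {
	ImageDigest    string `json:"image_digest"`    // hash/digest of the container image
	ModelID        string `json:"model_id"`        // e.g. "svd-xt"
	InferenceSteps int    `json:"inference_steps"` // # of inference steps
	OutputPixels   int64  `json:"output_pixels"`   // # of pixels in the output
	OutputHash     string `json:"output_hash"`     // hash of the produced output bytes
}

func main() {
	pub, priv, _ := ed25519.GenerateKey(rand.Reader)

	output := []byte("...generated video bytes...") // stand-in for the real output
	sum := sha256.Sum256(output)

	// All values below are illustrative placeholders.
	r := Receipt{
		ImageDigest:    "sha256:deadbeef...",
		ModelID:        "svd-xt",
		InferenceSteps: 25,
		OutputPixels:   1024 * 576 * 25,
		OutputHash:     hex.EncodeToString(sum[:]),
	}

	payload, _ := json.Marshal(r)
	sig := ed25519.Sign(priv, payload)

	// The broadcaster can later verify that this orchestrator attested to
	// producing this exact output from this exact container image.
	fmt.Println("signature valid:", ed25519.Verify(pub, payload, sig))
}
```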
Looking good!
Is there any verification of work at all for AI jobs?
An RTX 4090 (same amount of VRAM) is approximately 40% faster for this type of workload. What could this mean in terms of minimum GPU specs for AI Orchestrators, in your opinion?
@stronk has some related questions in the AI Alphanet GPU poll waiting on someone from Livepeer to chime in.
Hey @yondon, it’s great to see our development visions aligning! I’m quite impressed with the development pipeline you’ve outlined.
Regarding your use of Cog during prototyping - that’s an intriguing find! I appreciate you sharing it. Cog’s permission structure will be a valuable reference when we start fine-tuning our system. In my experience, especially when creating security-critical containers, I often turn to Singularity. Singularity’s approach to security, particularly in scenarios involving untrusted users and code, might be more robust and appropriate for our context. Although it presents certain advantages, it is essential to note that this technology is less mature and has seen relatively less development compared to Docker. Consequently, one must exercise considerable caution before integrating it into a critical system like Livepeer.
I see significant merit in your idea of curated image registries for custom models/containers. In a previous post, I suggested a Livepeer registry merely as a centralized collection for custom containers within the Livepeer ecosystem. While curating such a registry and allowing orchestrators to opt in could enhance trust and security, we must also be mindful of potential scaling challenges as demand for custom models grows. Perhaps a more balanced approach would be to combine automated security checks during container uploads with stringent isolation measures in go-livepeer to mitigate security risks. This hybrid strategy could offer both security and scalability. Orchestrators can then opt to use trusted or untrusted sources.
On the orchestrator side, providing cryptographically signed receipts, with container code and training metadata included in those receipts, is a great starting point. It might even suffice for pre-selected containers. However, enforcing deterministic behaviour across various models is a complex challenge. In my experience, even with consistent seeds, dependencies, and operating systems, hardware differences can still lead to output variations (as illustrated in this GitHub issue from one of my repositories).
While this issue may not become critical for the initial SPE for some time, it certainly poses an interesting research challenge and future consideration. The potential of ZKP, especially when combined with optimistic variants, is promising, provided they don’t excessively tax our resources. We could also explore AI watermarking techniques like SynthID to see if they can be applied or enforced in our Docker container uploads. Other alternatives include assessing output probability based on the expected model generation or periodically distributing jobs to multiple orchestrators for comparative analysis based on a mean value. Let’s return to this intriguing topic when it becomes more relevant in our timeline.
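For what it’s worth, here is a very rough sketch of the redundant-execution idea (the distance metric, threshold, and sample values are arbitrary placeholders): a broadcaster occasionally sends the same job to several orchestrators and flags any output that deviates too far from the group average. A real system would use a perceptual metric over decoded frames rather than raw numbers:

```go
package main

import (
	"fmt"
	"math"
)

// meanAbsDiff is a placeholder similarity metric between two outputs.
func meanAbsDiff(a, b []float64) float64 {
	var sum float64
	for i := range a {
		sum += math.Abs(a[i] - b[i])
	}
	return sum / float64(len(a))
}

func main() {
	// Hypothetical per-pixel outputs from three orchestrators for the same job.
	outputs := map[string][]float64{
		"orch-A": {0.10, 0.52, 0.33},
		"orch-B": {0.11, 0.50, 0.35},
		"orch-C": {0.90, 0.05, 0.70}, // deviates noticeably
	}

	// Element-wise mean across all orchestrators' outputs.
	n := len(outputs)
	mean := make([]float64, 3)
	for _, out := range outputs {
		for i, v := range out {
			mean[i] += v / float64(n)
		}
	}

	const threshold = 0.2 // arbitrary placeholder
	for name, out := range outputs {
		d := meanAbsDiff(out, mean)
		fmt.Printf("%s: distance from mean %.3f flagged=%v\n", name, d, d > threshold)
	}
}
```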