Summary
The focus since the last update has been on implementing the workflows for text-to-image, image-to-image and image-to-video capabilities in the ai-video
branch of go-livepeer and in the ai-worker repo.
Highlights include:
- Implemented the workflow for these capabilities for a single B + multiple Os, along with support for advertising supported models during capability discovery, which is demoed here.
- Released an initial benchmarking script for AI video subnet jobs.
Updates
Advertising Supported Models w/ Capability Constraints
The current architecture allows an O to advertise text-to-image, image-to-image and image-to-video as new capabilities. Given a request from a B that requires a capability, O will execute a job using that capability - for example, if a request requires the image-to-video capability, then O will execute an image-to-video job.
These generative AI capabilities can be used with a variety of diffusion models. Initially, these models will be identified using their model ID on HuggingFace. The O needs to have the weights for the model available in storage so that the weights can be loaded into GPU VRAM before inference with the model can be executed. An O can decide which models it wants to support for which capabilities using a JSON file that looks like this:
[
  {
    "pipeline": "text-to-image",
    "model_id": "stabilityai/sdxl-turbo",
    "warm": true
  },
  {
    "pipeline": "image-to-video",
    "model_id": "stabilityai/stable-video-diffusion-img2vid-xt"
  },
  {
    "pipeline": "image-to-video",
    "model_id": "stabilityai/stable-video-diffusion-img2vid-xt-1-1"
  }
]
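For illustration, here is a minimal Go sketch of how a node could parse a file like this. The struct and function names are assumptions for this example and not the exact types used in go-livepeer.

package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// AIModelConfig mirrors one entry of the JSON file above.
// Field names are illustrative; the actual go-livepeer types may differ.
type AIModelConfig struct {
	Pipeline string `json:"pipeline"` // e.g. "text-to-image", "image-to-video"
	ModelID  string `json:"model_id"` // HuggingFace model ID
	Warm     bool   `json:"warm"`     // keep weights loaded in GPU VRAM
}

func loadAIModelConfigs(path string) ([]AIModelConfig, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var configs []AIModelConfig
	if err := json.Unmarshal(data, &configs); err != nil {
		return nil, err
	}
	return configs, nil
}

func main() {
	configs, err := loadAIModelConfigs("aiModels.json")
	if err != nil {
		panic(err)
	}
	for _, c := range configs {
		fmt.Printf("%s -> %s (warm=%v)\n", c.Pipeline, c.ModelID, c.Warm)
	}
}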
An O will advertise a capability along with a list of supported model IDs as a capability constraint. These constraints describe the supported configurations for a particular capability. In addition to advertising supported model IDs in capability constraints, an O can also advertise whether it has a model "warm" in GPU VRAM, which leads to faster execution of the first request for that capability.
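As a rough sketch of the shape of that advertisement, the per-capability constraint data could look like the following in Go. The type names are assumptions for illustration, not the actual structures go-livepeer puts on the wire.

package main

import "fmt"

// Illustrative types only: a capability constraint listing supported models
// and whether each model is kept "warm" in GPU VRAM.
type ModelConstraint struct {
	Warm bool
}

type CapabilityConstraints struct {
	// Keyed by HuggingFace model ID.
	Models map[string]ModelConstraint
}

func main() {
	// Constraints an O might advertise, mirroring the JSON config above.
	constraints := map[string]CapabilityConstraints{
		"text-to-image": {Models: map[string]ModelConstraint{
			"stabilityai/sdxl-turbo": {Warm: true},
		}},
		"image-to-video": {Models: map[string]ModelConstraint{
			"stabilityai/stable-video-diffusion-img2vid-xt":     {Warm: false},
			"stabilityai/stable-video-diffusion-img2vid-xt-1-1": {Warm: false},
		}},
	}
	for pipeline, cc := range constraints {
		for model, mc := range cc.Models {
			fmt.Printf("%s: %s (warm=%v)\n", pipeline, model, mc.Warm)
		}
	}
}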
External and Managed Containers
In the current architecture, O uses containers to execute inference code for a text-to-image, image-to-image or image-to-video capability.
In the latest code, an O can be configured to use external or managed containers.
- O will start/stop managed containers using a Docker-based system.
- O can use external containers by configuring a URL for a capability + model if the container is hosted/managed outside of the node. The container might be managed by a service like Modal, or the operator might write their own custom logic to manage the lifecycle of containers based on request activity using their own devops stack (e.g. Kubernetes). The latter is a theoretical possibility with minimal support right now - if this is interesting to you, please follow up! A rough sketch of how an O might route between managed and external containers follows below.
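The sketch below shows one way an O could resolve the endpoint for a capability + model, preferring an operator-configured external URL and otherwise falling back to a locally managed container. None of these names are the actual go-livepeer/ai-worker APIs; this is just an illustration of the routing idea under those assumptions.

package main

import "fmt"

type containerManager struct {
	// externalURLs maps "pipeline:model_id" to an operator-supplied URL.
	externalURLs map[string]string
}

func (m *containerManager) endpointFor(pipeline, modelID string) (string, error) {
	key := pipeline + ":" + modelID
	if url, ok := m.externalURLs[key]; ok {
		// External container: lifecycle is handled outside the node
		// (e.g. by Modal or the operator's own devops stack).
		return url, nil
	}
	// Managed container: the node would start a Docker container for this
	// pipeline + model and return its local address (placeholder here).
	return startManagedContainer(pipeline, modelID)
}

func startManagedContainer(pipeline, modelID string) (string, error) {
	// Placeholder for Docker-based container management.
	return "http://127.0.0.1:8000", nil
}

func main() {
	m := &containerManager{externalURLs: map[string]string{
		// Hypothetical external endpoint for one capability + model pair.
		"image-to-video:stabilityai/stable-video-diffusion-img2vid-xt": "https://example-external-endpoint.example.com",
	}}
	url, _ := m.endpointFor("image-to-video", "stabilityai/stable-video-diffusion-img2vid-xt")
	fmt.Println(url)
}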
Timeline
The main goal of the next 2 weeks is to implement a basic pricing framework and payment workflow for the new capabilities. After that, the intent is to start preparing for testing and iteration with the community!