AI Video Compute Technical Update 2/22/24

yondon · February 22, 2024, 4:00pm

Summary

The focus since the last update has been on implementing the workflows for text-to-image, image-to-image and image-to-video capabilities in the ai-video branch of go-livepeer and in the ai-worker repo.

Highlights include:

Implemented the workflow for these capabilities for a single broadcaster (B) and multiple orchestrators (Os), along with support for advertising supported models during capability discovery, which is demoed here.
Released an initial benchmarking script for AI video subnet jobs.

Updates

Advertising Supported Models w/ Capability Constraints

The current architecture allows an O to advertise text-to-image, image-to-image and image-to-video as new capabilities. When a request from a B requires one of these capabilities, O will execute a job with that capability. For example, if a request requires the image-to-video capability, then O will execute an image-to-video job.

These generative AI capabilities can be used with a variety of diffusion models. Initially, these models will be identified using their model ID on HuggingFace. The O needs to have the weights for the model available in storage so that the weights can be loaded into GPU VRAM before inference with the model can be executed. An O can decide which models it wants to support for which capabilities using a JSON file that looks like this:

[
  {
    "pipeline": "text-to-image",
    "model_id": "stabilityai/sdxl-turbo",
    "warm": true
  },
  {
    "pipeline": "image-to-video",
    "model_id": "stabilityai/stable-video-diffusion-img2vid-xt"
  },
  {
    "pipeline": "image-to-video",
    "model_id": "stabilityai/stable-video-diffusion-img2vid-xt-1-1"
  }
]

An O will advertise a capability along with a list of supported model IDs as a capability constraint. These constraints describe the supported configurations for a particular capability. In addition to advertising supported model IDs in capability constraints, O can also advertise whether it has a model “warm” in GPU VRAM, which would lead to faster execution of the first request for that capability.
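To make the idea of capability constraints concrete, here is a minimal Python sketch that groups the entries of a config like the one above by pipeline and checks whether a given capability + model pair is supported. This is only an illustration: the function names `build_constraints` and `can_serve`, the shape of the resulting dictionary, and treating a missing "warm" field as false are all assumptions, not the actual data structures used in go-livepeer.

```python
import json
from collections import defaultdict

# Same entries as the example config in this post, written as a JSON array.
CONFIG = """
[
  {"pipeline": "text-to-image", "model_id": "stabilityai/sdxl-turbo", "warm": true},
  {"pipeline": "image-to-video", "model_id": "stabilityai/stable-video-diffusion-img2vid-xt"},
  {"pipeline": "image-to-video", "model_id": "stabilityai/stable-video-diffusion-img2vid-xt-1-1"}
]
"""


def build_constraints(config_text):
    """Group entries by pipeline: {pipeline: {model_id: {"warm": bool}}}."""
    constraints = defaultdict(dict)
    for entry in json.loads(config_text):
        # Assumption: an entry without "warm" means the model is not kept warm.
        warm = entry.get("warm", False)
        constraints[entry["pipeline"]][entry["model_id"]] = {"warm": warm}
    return dict(constraints)


def can_serve(constraints, pipeline, model_id):
    """Hypothetical check: is this capability advertised with this model ID?"""
    return model_id in constraints.get(pipeline, {})


constraints = build_constraints(CONFIG)
print(can_serve(constraints, "image-to-video",
                "stabilityai/stable-video-diffusion-img2vid-xt"))
print(constraints["text-to-image"]["stabilityai/sdxl-turbo"]["warm"])
```

In this sketch a B looking for a text-to-image job with sdxl-turbo could prefer an O whose constraint marks that model as warm, since the first request would not have to wait for the weights to load.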

External and Managed Containers

In the current architecture, O uses containers to execute inference code for a text-to-image, image-to-image or image-to-video capability.

In the latest code, an O can be configured to use external or managed containers.

O will start/stop managed containers using a Docker based system.
O can use external containers by configuring a URL for a capability + model if the container is hosted/managed outside of the node. The container might be managed by a service like Modal, or the operator might write their own custom logic to manage the lifecycle of containers based on request activity and their own devops stack (e.g. k8s). The latter is a theoretical possibility, but there is minimal support for it right now. If this is interesting to you, please follow up!
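The choice between the two container types can be sketched roughly as follows: if a URL is configured for a capability + model pair, treat the container as external, otherwise fall back to a managed (Docker-based) container. The names `ContainerTarget` and `resolve_container`, the lookup keyed by (pipeline, model ID), and the example URL are hypothetical and only illustrate the decision, not the real node configuration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ContainerTarget:
    kind: str            # "external" or "managed"
    url: Optional[str]   # set only for external containers


def resolve_container(
    pipeline: str,
    model_id: str,
    external_urls: Dict[Tuple[str, str], str],
) -> ContainerTarget:
    """Pick an external container if a URL is configured, else a managed one."""
    url = external_urls.get((pipeline, model_id))
    if url is not None:
        return ContainerTarget(kind="external", url=url)
    # No URL configured: the node itself would start/stop the container.
    return ContainerTarget(kind="managed", url=None)


# Hypothetical configuration: one model served by an externally hosted container.
external_urls = {
    ("image-to-video", "stabilityai/stable-video-diffusion-img2vid-xt"):
        "https://example.com/image-to-video",
}

print(resolve_container("image-to-video",
                        "stabilityai/stable-video-diffusion-img2vid-xt",
                        external_urls))
print(resolve_container("text-to-image", "stabilityai/sdxl-turbo", external_urls))
```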

Timeline

The main goal of the next 2 weeks is to implement a basic pricing framework and payment workflow for the new capabilities. After that, the intent is to start preparing for testing and iteration with the community!

Karolak · February 22, 2024, 4:08pm

Would having a card with more VRAM make it possible to keep more warmed containers? Let's say an rtx 6000 ada with 48 gb and 3 warmed containers could be quicker at executing jobs than, let's say, an rtx 4080 with fewer warmed containers, even though performance of the rtx 4090 is much better.
Can each GPU be used as a separate Transcoder with different models assigned (and warmed) to it?

yondon · February 23, 2024, 9:58pm

Yeah I think that should be possible though haven’t tested much here. The current implementation is naive and just maps 1 container to 1 GPU regardless of how much VRAM is available on the GPU, but logic could be implemented to more intelligently map containers to a GPU taking into account VRAM.
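As a rough illustration of what VRAM-aware mapping could look like (this is not what the current implementation does), the sketch below uses a simple first-fit-decreasing strategy: containers are sorted by VRAM requirement and each one is placed on the first GPU that still has enough free VRAM. The function name `assign_containers`, the GPU labels, and the per-model VRAM figures are made-up assumptions for the example.

```python
def assign_containers(gpu_vram_gb, model_vram_gb):
    """First-fit decreasing: place each model on the first GPU with enough free VRAM.

    Returns (placement, unplaced), where placement maps GPU name to a list of
    model names and unplaced lists the models that did not fit anywhere.
    """
    free = dict(gpu_vram_gb)
    placement = {gpu: [] for gpu in gpu_vram_gb}
    unplaced = []
    # Largest models first, so big containers are not crowded out by small ones.
    by_size = sorted(model_vram_gb.items(), key=lambda kv: kv[1], reverse=True)
    for model, need in by_size:
        for gpu in free:
            if free[gpu] >= need:
                free[gpu] -= need
                placement[gpu].append(model)
                break
        else:
            unplaced.append(model)
    return placement, unplaced


# Hypothetical numbers: a 48 GB card and a 16 GB card, with invented model sizes.
gpus = {"gpu0": 48, "gpu1": 16}
models = {"model-a": 20, "model-b": 16, "model-c": 10, "model-d": 30}

placement, unplaced = assign_containers(gpus, models)
print(placement)
print(unplaced)
```

A real scheduler would also need to account for VRAM used during inference (not just for holding weights) and for contention between containers sharing a GPU, which this sketch ignores.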
