AI Video Compute Technical Update 2/22/24

Summary

The focus since the last update has been on implementing the workflows for text-to-image, image-to-image and image-to-video capabilities in the ai-video branch of go-livepeer and in the ai-worker repo.

Highlights include:

  • Implemented the workflow for these capabilities for a single broadcaster (B) and multiple orchestrators (Os), along with support for advertising supported models during capability discovery, which is demoed here.
  • Released an initial benchmarking script for AI video subnet jobs.

Updates

Advertising Supported Models w/ Capability Constraints

The current architecture allows an O to advertise text-to-image, image-to-image and image-to-video as new capabilities. Given a request from a B that requires a capability, O executes a job with that capability: if a request requires the image-to-video capability, O executes an image-to-video job with its image-to-video capability.
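The request-to-capability dispatch described above can be pictured as a lookup from capability name to the code that runs the job. This is a minimal Python sketch, not the go-livepeer implementation: the `Request` shape, the `run_*` handlers and `execute` are hypothetical names, and the handlers only return a string where a real O would call into an inference container.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Request:
    """Hypothetical request from a B: the capability it needs plus its inputs."""
    capability: str
    model_id: str
    params: dict = field(default_factory=dict)


# Hypothetical handlers; a real O would forward the job to an inference container.
def run_text_to_image(req: Request) -> str:
    return f"text-to-image job with {req.model_id}"


def run_image_to_image(req: Request) -> str:
    return f"image-to-image job with {req.model_id}"


def run_image_to_video(req: Request) -> str:
    return f"image-to-video job with {req.model_id}"


# Capabilities this O advertises, each bound to the code that executes it.
HANDLERS: Dict[str, Callable[[Request], str]] = {
    "text-to-image": run_text_to_image,
    "image-to-image": run_image_to_image,
    "image-to-video": run_image_to_video,
}


def execute(req: Request) -> str:
    """Run the job with exactly the capability the request requires."""
    handler = HANDLERS.get(req.capability)
    if handler is None:
        raise ValueError(f"capability not supported: {req.capability}")
    return handler(req)
```

For example, `execute(Request("image-to-video", "stabilityai/stable-video-diffusion-img2vid-xt"))` routes to the image-to-video handler, while a request for a capability the O never advertised raises `ValueError` instead of running a different job.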

These generative AI capabilities can be used with a variety of diffusion models. Initially, these models will be identified by their model ID on HuggingFace. The O needs the model's weights available in storage so that they can be loaded into GPU VRAM before inference with the model can run. An O can decide which models it wants to support for which capabilities using a JSON file that looks like this:

[
  {
    "pipeline": "text-to-image",
    "model_id": "stabilityai/sdxl-turbo",
    "warm": true
  },
  {
    "pipeline": "image-to-video",
    "model_id": "stabilityai/stable-video-diffusion-img2vid-xt"
  },
  {
    "pipeline": "image-to-video",
    "model_id": "stabilityai/stable-video-diffusion-img2vid-xt-1-1"
  }
]
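A config like the one above can be read with the standard `json` module; note that the top level has to be a JSON array of objects for it to parse. The sketch below is illustrative only: `ModelConfig`, `load_models` and `models_by_pipeline` are hypothetical names, and treating an omitted `"warm"` key as "not warm" is an assumption based on the example entries.

```python
import json
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    pipeline: str
    model_id: str
    warm: bool = False  # assumption: an omitted "warm" key means "not warm"


CONFIG_JSON = """
[
  {"pipeline": "text-to-image", "model_id": "stabilityai/sdxl-turbo", "warm": true},
  {"pipeline": "image-to-video", "model_id": "stabilityai/stable-video-diffusion-img2vid-xt"},
  {"pipeline": "image-to-video", "model_id": "stabilityai/stable-video-diffusion-img2vid-xt-1-1"}
]
"""


def load_models(text: str) -> list:
    """Parse the models file into ModelConfig entries."""
    entries = json.loads(text)
    if not isinstance(entries, list):
        raise ValueError("models config must be a JSON array of objects")
    return [
        ModelConfig(e["pipeline"], e["model_id"], bool(e.get("warm", False)))
        for e in entries
    ]


def models_by_pipeline(models) -> dict:
    """Group model IDs under the capability (pipeline) they are used for."""
    grouped = defaultdict(list)
    for m in models:
        grouped[m.pipeline].append(m.model_id)
    return dict(grouped)


models = load_models(CONFIG_JSON)
```

Grouping by pipeline yields one list of model IDs per capability, which is the shape needed when the O advertises its supported models.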

An O will advertise a capability along with a list of supported model IDs as a capability constraint. These constraints describe the supported configurations for a particular capability. In addition to advertising supported model IDs in capability constraints, O can also advertise whether it has a model “warm” in GPU VRAM, which leads to faster execution of the first request for that capability.
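To make the constraint idea concrete, here is a rough Python sketch of what an O might advertise and how a B could use it. The dictionary shape `capability -> {model_id: {"warm": bool}}` and the rule "prefer a warm O, then break ties by name" are assumptions for illustration; they are not the actual go-livepeer wire format or selection logic, and `build_constraints` / `select_orchestrator` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    pipeline: str
    model_id: str
    warm: bool = False


def build_constraints(models) -> dict:
    """capability -> {model_id: {"warm": bool}} (hypothetical advertised shape)."""
    constraints = {}
    for m in models:
        per_model = constraints.setdefault(m.pipeline, {})
        prev = per_model.get(m.model_id, {"warm": False})
        # A model counts as warm if any config entry for it is warm.
        per_model[m.model_id] = {"warm": prev["warm"] or m.warm}
    return constraints


def select_orchestrator(advertised: dict, capability: str, model_id: str):
    """Pick an O supporting (capability, model_id), preferring one with the model warm.

    advertised maps an orchestrator name to its constraints.
    """
    candidates = []
    for name, constraints in advertised.items():
        model = constraints.get(capability, {}).get(model_id)
        if model is not None:
            # False sorts before True, so warm Os come first; name breaks ties.
            candidates.append((not model["warm"], name))
    if not candidates:
        return None
    return min(candidates)[1]
```

A B that only needs a supported model can take any candidate; ranking warm Os first is one simple way to make use of the extra "warm" signal.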

External and Managed Containers

In the current architecture, O uses containers to execute inference code for a text-to-image, image-to-image or image-to-video capability.

In the latest code, an O can be configured to use external or managed containers.

  • O will start/stop managed containers using a Docker-based system.
  • O can use external containers by configuring a URL for a capability + model if the container is hosted/managed outside of the node. The container might be managed by a service like Modal, or the operator might write their own custom logic to manage the lifecycle of containers based on request activity and their own devops stack (e.g. k8s). The latter is a theoretical possibility, but there is minimal support for it right now; however, if this is interesting to you, please follow up!
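The difference between the two modes comes down to where O gets the URL of the container that runs a job. The Python sketch below assumes a hypothetical `ContainerResolver` keyed by `(capability, model_id)`: a configured external URL is used as-is, otherwise O starts a managed container once and reuses it. The `start_managed` hook stands in for the Docker-based start logic and is injected so the sketch runs without Docker; none of these names come from go-livepeer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

Key = Tuple[str, str]  # (capability, model_id)


@dataclass
class ContainerResolver:
    # Hypothetical hook that starts a managed container and returns its URL.
    start_managed: Callable[[str, str], str]
    # URLs of containers hosted/managed outside the node (e.g. by Modal or k8s).
    external_urls: Dict[Key, str] = field(default_factory=dict)
    # Managed containers this O has started, so each is started only once.
    _managed: Dict[Key, str] = field(default_factory=dict, init=False)

    def endpoint(self, capability: str, model_id: str) -> str:
        key = (capability, model_id)
        if key in self.external_urls:
            # External: O only forwards requests; it never starts or stops this container.
            return self.external_urls[key]
        if key not in self._managed:
            # Managed: O owns the lifecycle and starts the container on first use.
            self._managed[key] = self.start_managed(capability, model_id)
        return self._managed[key]

    def stop_all(self, stop: Callable[[str], None]) -> None:
        """Stop every managed container; external containers are left alone."""
        for url in self._managed.values():
            stop(url)
        self._managed.clear()
```

Keeping the external case as a plain URL lookup means O needs no knowledge of how that container is scheduled, which is what leaves room for Modal or a custom k8s setup.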

Timeline

The main goal for the next 2 weeks is to implement a basic pricing framework and payment workflow for the new capabilities. After that, the intent is to start preparing for testing and iteration with the community!


Would having a card with more VRAM make it possible to keep more containers warm? Let's say an RTX 6000 Ada with 48 GB and 3 warm containers could be quicker at executing jobs than, let's say, an RTX 4080 with fewer warm containers, even though the performance of the RTX 4090 is much better.
Can each GPU be used as a separate Transcoder with different models assigned (and warmed) to it?

Yeah, I think that should be possible, though I haven't tested much here. The current implementation is naive and just maps 1 container to 1 GPU regardless of how much VRAM is available on the GPU, but logic could be implemented to map containers to a GPU more intelligently, taking VRAM into account.
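One way to picture the difference is to contrast the naive 1 container : 1 GPU mapping with a VRAM-aware placement. The Python sketch below uses first-fit decreasing bin packing, which is a swapped-in illustrative heuristic rather than anything in the current code; the function names and the per-container VRAM figures in the usage example are hypothetical, and real VRAM use also depends on resolution, batch size and other inference settings.

```python
from typing import Dict, List


def naive_mapping(gpus: List[str], containers: List[str]) -> Dict[str, List[str]]:
    """One container per GPU, ignoring how much VRAM each GPU has."""
    return {gpu: [c] for gpu, c in zip(gpus, containers)}


def vram_aware_mapping(
    gpu_vram_gb: Dict[str, float], container_vram_gb: Dict[str, float]
) -> Dict[str, List[str]]:
    """First-fit decreasing: place the largest containers first, each on the
    first GPU that still has enough free VRAM."""
    free = dict(gpu_vram_gb)
    placement: Dict[str, List[str]] = {gpu: [] for gpu in gpu_vram_gb}
    for name, need in sorted(container_vram_gb.items(), key=lambda kv: kv[1], reverse=True):
        for gpu in gpu_vram_gb:
            if free[gpu] >= need:
                placement[gpu].append(name)
                free[gpu] -= need
                break
        else:
            raise ValueError(f"no GPU has {need} GB free for {name}")
    return placement
```

With hypothetical containers needing 20, 14 and 10 GB, a single 48 GB GPU could keep all three warm, whereas the naive mapping keeps only one warm per GPU; two 24 GB GPUs would end up with one container on the first and two on the second.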