Summary
The focus since the last update has been on completing the engineering milestones required for an e2e paid AI inference job execution workflow: a broadcaster sends a request along with a payment ticket to an orchestrator, and the orchestrator processes the payment (redeeming any winning tickets it receives) and executes the job.
Highlights include:
- Implemented an updated selection workflow that takes into account the models advertised by orchestrators per capability and whether the orchestrator has the model weights “warm” (i.e. loaded into GPU VRAM).
- Implemented a payment workflow for the text-to-image, image-to-image and image-to-video capabilities which is demoed here.
Updates
Capability and model aware selection
In the last update, we noted that orchestrators are able to advertise supported models with capability constraints. However, at the time, a broadcaster only filtered orchestrators based on whether they supported a model and whether the model was warm - the broadcaster did not use this information to prioritize orchestrators during selection. Furthermore, the broadcaster previously used a naive round-robin selection strategy to decide which orchestrators to send a request to.
As of this update, the following improvements have been made to the broadcaster implementation:
- The broadcaster will first select from the pool of orchestrators that have the model warm and only then select from the pool of orchestrators that do not have the model warm (see the sketch after this list)
- The broadcaster will use the same selection strategy used for transcoding, which considers stake, price and latency, when selecting orchestrators [1]
[1] The selection strategy has been a topic of debate within the community so for now the intent is to not make any changes to it as it pertains to AI capabilities and leave improvements to the strategy (whether it be tweaks to weights, algorithms or even the introduction of a more modular system) to be addressed separately.
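To make the warm-first ordering concrete, here is a minimal sketch of the selection flow. The types and helpers below (orchestratorInfo, selectBest, etc.) are hypothetical and only illustrate the two-pool ordering described above; the actual go-livepeer implementation reuses its existing selection machinery.

```go
// Hypothetical sketch of capability/model aware selection (not the actual go-livepeer code).
package selection

type orchestratorInfo struct {
	URL          string
	Warm         bool // model weights already loaded into GPU VRAM
	Stake        float64
	Price        float64
	LatencyScore float64
}

// selectOrchestrator picks from the warm pool first and only falls back to the
// cold pool when no warm orchestrator is available.
func selectOrchestrator(orchs []orchestratorInfo) *orchestratorInfo {
	var warm, cold []orchestratorInfo
	for _, o := range orchs {
		if o.Warm {
			warm = append(warm, o)
		} else {
			cold = append(cold, o)
		}
	}
	if best := selectBest(warm); best != nil {
		return best
	}
	return selectBest(cold)
}

// selectBest stands in for the existing transcoding selection strategy, which
// weighs stake, price and latency; only stake is used here to keep the sketch short.
func selectBest(pool []orchestratorInfo) *orchestratorInfo {
	if len(pool) == 0 {
		return nil
	}
	best := &pool[0]
	for i := 1; i < len(pool); i++ {
		if pool[i].Stake > best.Stake {
			best = &pool[i]
		}
	}
	return best
}
```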
AI capability pricing
As of this update, the ai-video branch of go-livepeer allows an orchestrator to advertise a price per pixel for each capability + model ID pair - for example, an orchestrator might charge X for text-to-image with stabilityai/sdxl-turbo and Y for image-to-video with stabilityai/stable-video-diffusion-img2vid-xt. The prices are set in a config file (also used to specify the supported models) passed via the -aiModels flag that looks like this:
```json
[
  {
    "pipeline": "image-to-video",
    "model_id": "stabilityai/stable-video-diffusion-img2vid-xt-1-1",
    "price_per_unit": 3390842
  },
  {
    "pipeline": "text-to-image",
    "model_id": "stabilityai/sdxl-turbo",
    "price_per_unit": 4768371
  },
  {
    "pipeline": "image-to-image",
    "model_id": "stabilityai/sdxl-turbo",
    "price_per_unit": 4768371
  }
]
```
The price varies per capability because the compute cost of generating an image can differ from that of generating a video. The price also varies per model ID because the compute cost of using one model can differ from the cost of using another for the same capability (see SD1.5 vs. SDXL).
The compute cost of a capability + model ID can also be influenced by request parameters such as the output resolution. The current implementation accounts for the output resolution by calculating the payment required for a request based on the following formula:
```
output_pixels = output_height * output_width * output_frames
payment = output_pixels * price_per_pixel
```
Generally, if a request generates more pixels, the fee for the request increases. For example, a text-to-image request for a 1024x1024 image will cost more than a text-to-image request for a 512x512 image. And an image-to-video request for a 576x1024 video with 25 frames will cost more than an image-to-video request for a 576x1024 video with 14 frames.
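As a rough illustration, the sketch below applies the formula to the image-to-video price from the example config above. The assumption that price_per_unit is denominated in wei per pixel (as with transcoding pricing) is mine; the exact denomination is whatever the node implementation uses.

```go
// Rough worked example of the fee formula (assumes price_per_unit is wei per pixel).
package main

import "fmt"

func main() {
	// image-to-video request: 576x1024 output, 25 frames
	outputPixels := int64(576) * 1024 * 25 // 14,745,600 pixels

	// price from the example config for stable-video-diffusion-img2vid-xt-1-1
	pricePerPixel := int64(3390842) // wei per pixel (assumed denomination)

	payment := outputPixels * pricePerPixel
	fmt.Printf("pixels=%d payment=%d wei (~%.5f ETH)\n",
		outputPixels, payment, float64(payment)/1e18)
	// A 14-frame request at the same resolution costs proportionally less.
}
```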
There may be other request parameters that could influence the compute cost of a request. At the moment, these parameters are not factored into the pricing of requests, and many of them are not yet adjustable by users. The current implementation is just a starting point and the intent is for it to evolve over time to more accurately capture the costs incurred for using a model.
The demo also references a pretty rough pricing worksheet that was used to derive a price per pixel to charge per capability + model ID. The methodology used in the worksheet was:
- Get the price per request for text-to-image and image-to-video charged by SaaS APIs (specifically Together.AI and Stability.AI)
- Assume a specific output resolution (and # of frames for video) and # of inference/denoising steps (note: the # of inference steps is not factored into pricing right now and defaults to whatever the diffusers library sets)
- Divide the price per request by the number of output pixels in order to get a reference price per pixel
- Use that price per pixel for a capability + model ID pair (a worked example follows this list)
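For illustration only, here is the worksheet's arithmetic with placeholder numbers. The SaaS price and ETH/USD rate below are not figures from the actual worksheet; they were chosen so the result lines up with the image-to-video price in the example config.

```go
// Hypothetical worksheet calculation: derive a reference price per pixel
// from a SaaS API's per-request price. All numbers are placeholders.
package main

import "fmt"

func main() {
	pricePerRequestUSD := 0.10        // placeholder SaaS price for one image-to-video request
	ethUSD := 2000.0                  // placeholder ETH/USD rate
	outputPixels := 576.0 * 1024 * 25 // assumed output: 576x1024, 25 frames

	pricePerRequestETH := pricePerRequestUSD / ethUSD
	pricePerPixelWei := pricePerRequestETH / outputPixels * 1e18

	fmt.Printf("reference price per pixel: %.0f wei\n", pricePerPixelWei) // ~3390842 wei
}
```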
As mentioned earlier, the pricing implementation will need to be iterated on over time, and the methodology orchestrators use to price a capability + model ID could use more thought too! The community is welcome not only to play around with the worksheet and improve on it, but also to use it as a jumping-off point for improving how pricing could work overall.
AI capability payments
As of this update, a broadcaster will use the pricing implementation (described in the previous section) to create a payment with N tickets such that the cumulative ticket EV will cover the fee for a request. The payment will then be processed by the orchestrator in the same way that payments are processed by orchestrators for transcoding today. The implementation re-uses the existing probabilistic micropayment system used for transcoding.
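A minimal sketch of the ticket count calculation, assuming the broadcaster simply sends enough tickets for the cumulative expected value (EV) to cover the fee; the helper below is illustrative and not the actual go-livepeer payment code, and the EV value is a placeholder.

```go
// Illustrative only: number of probabilistic micropayment tickets needed so
// that the cumulative ticket EV covers the request fee.
package main

import (
	"fmt"
	"math/big"
)

// numTickets returns ceil(fee / ticketEV).
func numTickets(fee, ticketEV *big.Int) *big.Int {
	n := new(big.Int).Div(fee, ticketEV)
	if new(big.Int).Mod(fee, ticketEV).Sign() != 0 {
		n.Add(n, big.NewInt(1))
	}
	return n
}

func main() {
	fee := big.NewInt(49999999795200)     // ~0.00005 ETH in wei, from the earlier worked example
	ticketEV := big.NewInt(1000000000000) // placeholder EV of 0.000001 ETH per ticket
	fmt.Println("tickets:", numTickets(fee, ticketEV)) // 50
}
```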
Next Up
- Design a solution for mainnet orchestrators to advertise a separate service URI that can be used on the subnet
- Design a solution for collecting metrics on the subnet
- Testing with the most recent version of the ai-video branch of go-livepeer now that there are basic implementations of all the required components to complete an e2e workflow