ComfyStream SD1.5 Workflow Performance Tuning

I spent a little time playing around with the configs and looking at the performance metrics on Runpod. A few findings:

  • Runpod is CPU bottlenecked. The main ComfyStream process is running at about 1300% CPU (roughly 13 cores' worth), which significantly limits the achievable FPS. I can only sustain 6 FPS. (On my local machine with a Ryzen 9700, I can get up to 24 FPS.) The GPU is only 10-20% utilized at 6 FPS.
  • Increasing the KSampler denoise steps to 3 significantly improved the output visual quality. It also increased GPU utilization to about 35%.
  • Increasing the CFG also improved output visual quality, although it needs to be paired with more denoise steps. It also increased GPU utilization: at 1.5 CFG / 3 denoise steps, GPU utilization is around 50% at 6 FPS.
  • This also points to the potential of StreamDiffusion. Moving to a multi-step batching workflow will likely significantly improve visual quality (separate from the temporal consistency improvement).
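To put the utilization numbers above in perspective, here is a rough back-of-the-envelope sketch of how much GPU headroom each setting leaves. It assumes GPU utilization scales linearly with FPS, which is a simplification and not something I measured; the function name `gpu_bound_fps_ceiling` is just made up for this illustration.

```python
def gpu_bound_fps_ceiling(fps: float, gpu_util_percent: float) -> float:
    """Rough FPS ceiling if the GPU were the only limit.

    Assumes utilization scales linearly with FPS (a simplification).
    """
    if not 0 < gpu_util_percent <= 100:
        raise ValueError("gpu_util_percent must be in (0, 100]")
    return fps * 100.0 / gpu_util_percent


# GPU utilization observed at 6 FPS on Runpod, taken from the findings above.
measurements = {
    "baseline (upper end of 10-20%)": 20,
    "3 denoise steps": 35,
    "1.5 CFG / 3 denoise steps": 50,
}

for label, util in measurements.items():
    ceiling = gpu_bound_fps_ceiling(6, util)
    print(f"{label}: GPU-bound ceiling of roughly {ceiling:.0f} FPS")
```

Under that assumption, even the heaviest setting leaves the GPU about half idle at 6 FPS, which is consistent with the CPU being the limiting factor.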

@ericxtang Thanks for sharing your findings! Have you already run a profiler on the workflow to pinpoint the cause of the high CPU usage, or would you like the research team to take a look?
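In case it is useful, a first profiling pass could look something like the sketch below, using Python's built-in `cProfile`. Note that `process_frame` and `profile_frames` are hypothetical stand-ins for whatever per-frame CPU work the pipeline does; they are not ComfyStream functions, and the real entry point would need to be substituted.

```python
import cProfile
import io
import pstats


def process_frame(frame: bytes) -> int:
    # Hypothetical stand-in for per-frame CPU work (not a ComfyStream function).
    return sum(b * b for b in frame)


def profile_frames(n_frames: int = 50, top: int = 10) -> str:
    """Profile n_frames calls and return the top entries by cumulative time."""
    frame = bytes(range(256)) * 64  # dummy 16 KiB "frame"
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(n_frames):
        process_frame(frame)
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return out.getvalue()


report = profile_frames()
print(report)
```

Sorting by cumulative time tends to surface which call paths dominate per-frame CPU cost. For a long-running server, a sampling profiler attached to the live process may be more practical than instrumenting the code this way.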

To summarize your video, we currently have:

  • CPU is the bottleneck for FPS. It is already at its max on your machine.
  • Increasing denoising steps enhances quality but also increases GPU usage.
  • Increasing CFG improves output quality and further utilizes GPU resources.