ByteDance dropped a video DepthAnything model last week. The paper mentions temporal consistency issues with DepthAnything 2 and improves on it with “an efficient spatial-temporal head”. This raises the question: is the temporal inconsistency we currently see in the diffusion output partially introduced by inconsistency in the depth map?
A potential way to examine this is to compare the results of the DepthAnythingV2-Tensorrt model with the video DepthAnything model, and see whether the video model’s more temporally consistent depth maps produce more temporally consistent diffusion output.
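Before doing full diffusion comparisons, a quick sanity check is to measure frame-to-frame stability of the depth maps themselves. The sketch below is only illustrative: `estimate_depth_v2` and `estimate_depth_video` are hypothetical placeholders for the two inference pipelines, and the naive metric doesn’t separate flicker from real scene motion (warping frame t to t+1 with optical flow before differencing would be more faithful).

```python
# Minimal sketch: compare temporal stability of depth sequences from two models.
# estimate_depth_v2 / estimate_depth_video are hypothetical placeholders for
# per-frame DepthAnythingV2-TensorRT inference and Video DepthAnything inference.
import numpy as np

def normalize(depth: np.ndarray) -> np.ndarray:
    """Scale a depth map to [0, 1] so scores are comparable across models."""
    d_min, d_max = depth.min(), depth.max()
    return (depth - d_min) / (d_max - d_min + 1e-8)

def temporal_inconsistency(depth_frames: list[np.ndarray]) -> float:
    """Mean absolute difference between consecutive depth frames (lower = more stable)."""
    diffs = [
        np.abs(normalize(curr) - normalize(prev)).mean()
        for prev, curr in zip(depth_frames, depth_frames[1:])
    ]
    return float(np.mean(diffs))

# Usage sketch: run both models over the same clip, compare scores, then feed
# each depth sequence to the diffusion pipeline and compare output flicker.
# frames = load_video_frames("clip.mp4")            # hypothetical loader
# v2_depths = [estimate_depth_v2(f) for f in frames]
# video_depths = estimate_depth_video(frames)       # processes the clip jointly
# print(temporal_inconsistency(v2_depths), temporal_inconsistency(video_depths))
```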
ps - the paper does mention that it currently struggles with streaming workflows, which is left for future work. However, if inference is fast enough on short segments of video, it can still be useful for one-to-many live video use cases, at the cost of 1-2 seconds of additional delay.
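For the one-to-many case, that could look roughly like buffering incoming frames into short segments and running the video model once per segment, trading ~1-2 seconds of latency for segment-level consistency. A minimal sketch, assuming a hypothetical `estimate_depth_video(frames)` that consumes a whole segment and a `publish(frame, depth)` hand-off into the downstream diffusion pipeline:

```python
# Rough sketch of segment-based processing for a one-to-many live stream.
# frame_source, estimate_depth_video, and publish are assumed callables,
# not part of any real API.
SEGMENT_FRAMES = 30  # ~1-2 s at 15-30 fps; this is the added end-to-end delay

def run_segmented(frame_source, estimate_depth_video, publish):
    buffer = []
    for frame in frame_source:                       # iterator over incoming frames
        buffer.append(frame)
        if len(buffer) == SEGMENT_FRAMES:
            depths = estimate_depth_video(buffer)    # joint depth for the whole segment
            for f, d in zip(buffer, depths):
                publish(f, d)                        # hand off to the diffusion pipeline
            buffer.clear()
    if buffer:                                       # flush a trailing partial segment
        for f, d in zip(buffer, estimate_depth_video(buffer)):
            publish(f, d)
```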