Thanks to all participants of our orchestrator survey, and thanks for your patience while we prepared the results. With a couple of follow-ups (and plenty of distractions), it unfortunately took us longer than expected to get the evaluation into a presentable format, but here we go:
A note on the participants: We got feedback representing an estimated 10-20% of staked LPT (some respondents were anonymous). 85% of those used a combined O/T (Orchestrator/Transcoder) setup vs. 15% with a standalone Orchestrator (excluding transcoder pools here). The regions supported by survey participants are:
- Europe 100%
- North America 90%
- Asia 60%
- South America 20%
Note: Multiple answers were generally possible, which is why percentages can add up to more than 100%.
On the motivation to run Orchestrator(s)/Transcoder(s):
- Support network/decentralization: 85%
- Income: 70% (mostly side-hustle, 20% of those full-time though)
- Own video streaming operations: 15%
- Learning: 10%
Currently 40% use nodes to support other web3 networks as well. The decision on which region to serve is mainly driven by demand (75%), yet 30% base it on the locations of LivepeerInc ingest nodes and 10% simply serve the regions where their own resources/datacenters are located.
On the setups
A cloud setup was only reported for some of the standalone Orchestrators, i.e. all combined O/T setups in the survey are on-prem, with 50% of those using hosted services. Other than transcoder pools, no one has transcoders connected that they don’t run themselves.
Roughly half of the respondents have more than 1 transcoder connected to their orchestrator. Those transcoders have different setups (outside of GPUs) only in 25% of cases.
The Bandwidth setup is typically the same across Orchestrators as well: 80% of respondents mentioned 1-2Gbps (symmetric) for their nodes, some even 10Gbps.
On RPCs: 1/3 run their own node (the others use Infura/Alchemy/public Arbitrum/Community Node). Cloudflare workers are used by 40% of Orchestrators.
Storage is less relevant for operations: typically less than 250GB (SSD/NVMe) is provided. 70% of the setups have 6-8 vCPUs. Unsurprisingly, smaller ones (i.e. 4 vCPUs) go along with lower capacity (in terms of sessions and the number and ‘size’ of GPUs) and, conversely, larger ones (i.e. 16 vCPUs) with larger capacity.
Over 50% of respondents mentioned that they can easily scale their current capacity 5-10x (or more) if demand requires it (using K8s/docker).
There is an average of ~4 GPUs per Orchestrator (outliers not shown in the boxplot below, transcoder pools not included) and 1.4 GPUs per Transcoder. Note that for this section we also added data from publicly available dashboards and forum posts if the data is <1y old:
We didn’t ask for the corresponding offering on resolutions, but just for the maximum session capacity in general. Keeping that caveat in mind [1], the average session capacity per GPU is close to 18:
We also asked for the GPU specifications. We classified those into two major groups:
- Smaller: up to 8GB RAM, <5k CUDA cores (typically pre-2020 releases, e.g. 1660, 1660ti, 2060, 1070, 1070ti, 1080, Tesla p4, 2080, 4000)
- Larger: 8+GB RAM, >5k CUDA cores (typically released in 2020 or later, e.g. 3060, 3070 (only one with 8GB RAM in the set), 3080, 2080ti, 1080ti, 4080, 3080ti, A4000, 3090)
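The two criteria do not always coincide (the 3070 has 8GB RAM but more than 5k CUDA cores, while the 1080ti has more than 8GB RAM but fewer than 5k cores), so one way to read the grouping is "larger if either threshold is exceeded". The Python sketch below encodes that reading. The rule, the `Gpu` type and the `classify` function are our interpretation for illustration, and the specs in the examples are quoted from public spec sheets from memory, so please verify them before relying on them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    ram_gb: int
    cuda_cores: int


def classify(gpu: Gpu) -> str:
    """Return 'larger' if the card exceeds 8 GB RAM or 5k CUDA cores, else 'smaller'.

    Assumption: either threshold is sufficient; the survey text does not
    spell out how cards that meet only one criterion were treated.
    """
    if gpu.ram_gb > 8 or gpu.cuda_cores > 5000:
        return "larger"
    return "smaller"


# Illustrative specs (assumed, not part of the survey data).
examples = [
    Gpu("1070", 8, 1920),
    Gpu("3070", 8, 5888),
    Gpu("1080ti", 11, 3584),
    Gpu("3090", 24, 10496),
]

for gpu in examples:
    print(f"{gpu.name}: {classify(gpu)}")
```

With this reading, all example models listed in the two groups above end up in the group they were assigned to.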
The overall data set had ~80% smaller GPUs (~60% of which are 1070 or 1080) and 20% larger ones (⅔ of which are 2080ti, 1080ti and 3090), reflecting the common setups of either only smaller ones (often of a similar type, e.g. only 1070) or 1-3 small and 1-2 medium/large ones per Orchestrator.
We asked for cost per instance (which includes related costs like bandwidth and power), total costs (e.g. incl. labor cost) and the monthly hours spent on operating the Orchestrator Infrastructure.
The latter was the most dispersed, even when accounting for the different setups, e.g. by normalizing with the session capacity offered. Typically one would expect some economies of scale here, i.e. when Orchestrators offer more capacity, the work hours per session decline. However, in our survey we observed the opposite: respondents with >100 max. session capacity reported an average of ~0.33h/month per session of capacity vs. ~0.2h/month per session for those Orchestrators with <100 max. session capacity.
Besides the general noise in the answers (standard deviation 0.15), there seem to be differences in how accurately these numbers were tracked: lower hour numbers are often reported when other web3 networks are supported as well or when Livepeer operations are described as a side hustle.
Also video transcoding is obviously a very specialized task, which requires both extensive onboarding time and continuous maintenance and adjustments also related to changing demand. The former was also noted in comments around the learning curve, i.e. respondents mentioning that initially they spent 5-10x the time to ‘onboard onto Livepeer’.
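For readers who want to apply the same normalization to their own numbers, here is a minimal Python sketch of how work hours can be expressed per session of capacity and averaged per group. The respondent rows are made up for illustration (chosen so the group averages land near the ~0.2 and ~0.33 reported above); they are not survey data, and treating exactly 100 sessions as the larger group is an assumption.

```python
from statistics import mean

# Hypothetical respondents: (monthly work hours, maximum session capacity).
respondents = [
    (10, 50),
    (16, 80),
    (40, 120),
    (66, 200),
]


def hours_per_session(monthly_hours, max_sessions):
    """Monthly work hours normalized by the maximum session capacity."""
    return monthly_hours / max_sessions


def group_averages(rows, threshold=100):
    """Average hours per session for setups below and at/above the threshold."""
    small = [hours_per_session(h, s) for h, s in rows if s < threshold]
    large = [hours_per_session(h, s) for h, s in rows if s >= threshold]
    return mean(small), mean(large)


small_avg, large_avg = group_averages(respondents)
print(f"<100 sessions:  {small_avg:.2f} h/month/session")
print(f">=100 sessions: {large_avg:.2f} h/month/session")
```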
Normalizing the answers for session capacity, some economies of scale for cost of instances and total costs are apparent:
This observation has some caveats though:
- Noisy data: the underlying data is quite noisy, such that R^2 is just 0.4 for the total cost and 0.64 for the instances (similar to the ‘accounting’ of work hours, we can’t expect respondents to share a common ‘accounting standard’ for their hardware investments). Consequently, the red area around the regression lines, which indicates the standard error, is quite wide.
- Utilization and cost of bandwidth: the reported costs relate to current utilization (which also varies between node operators as the amount of ‘work done’ varies, see also the discussion in the appendix), i.e. this is not the cost at full utilization of total capacity. Whilst higher utilization will clearly shift the lines higher (due to higher costs for e.g. bandwidth and power), it is not clear to what extent higher bandwidth costs in particular might also change the shape of the above curves.
- Cost of labor: the fact that total costs per session seem to decline more slowly than the cost per instance is in line with the absence of economies of scale for labor mentioned above. At least, if we assume an hourly wage of $20-30 and add that to the reported instance cost, we roughly get the total cost curve shown in the above chart.
The last point also means that the share of labor in total costs might be even higher if one assumes higher wages, e.g. at $40-50 the share of labor cost in total costs ranges from 50-70% vs. 30-50% in the above chart.
Another potential driver of the gap between instance and total costs might be the attribution of RPC-related costs, since most Orchestrators with higher session capacity run their own node and consequently report lower costs for their instances.
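The sensitivity of the labor share to the assumed wage can be sketched in a few lines of Python. The 0.33h/month/session figure is taken from the work hours discussed above; the instance cost of $10 per session per month is a placeholder we picked for illustration and is not a survey result, and `labor_share` is a hypothetical helper name.

```python
def labor_share(hours_per_session, hourly_wage, instance_cost_per_session):
    """Share of labor in total monthly cost per session of capacity."""
    labor_cost = hours_per_session * hourly_wage
    return labor_cost / (labor_cost + instance_cost_per_session)


HOURS_PER_SESSION = 0.33  # h/month/session, as reported for larger setups
INSTANCE_COST = 10.0      # USD/month/session, hypothetical placeholder

for wage in (20, 30, 40, 50):
    share = labor_share(HOURS_PER_SESSION, wage, INSTANCE_COST)
    print(f"${wage}/h -> labor share {share:.0%}")
```

With these placeholder inputs the labor share moves from roughly 40-50% at $20-30 per hour to roughly 57-62% at $40-50 per hour, i.e. the same direction and order of magnitude as the ranges quoted above.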
Last but not least, a note on costs of capital: most orchestrators don’t have a significant amount of self-bonded LPT [2], as staked LPT is provided by delegators, and hence there are few direct costs related to capital. However, there might be implicit costs in terms of ‘lost revenue’, given that stake impacts how the go-livepeer Broadcast node distributes encoding tasks (and subsequent income). Hence, Orchestrators need to attract delegators with favorable commissions on LPT rewards and fee cuts, i.e. some share of their income, beyond the amount necessary to get into the top 100. A recent adjustment put more weight on the price per pixel, yet it is not clear to what extent this changes the dependency between stake and received work (see another forum post and a good discussion around it in the water cooler chat).
These considerations weren’t part of the survey, yet the ‘distribution of work’ and the differences between Orchestrators need to be kept in mind when looking at the above setups (e.g. some respondents commented that they have newer GPU models that just aren’t economic to add right now) and costs, and when judging how far they generalize, e.g. to get a view on the total operating cost of the Livepeer network. This is why we added some initial data points in the appendix.
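The cost-of-capital estimate in footnote 2 (3.5k LPT self-bonded, 5% interest rate, LPT price of $7) can be reproduced with a short Python sketch. The function name is ours, and the calculation assumes simple interest spread evenly over twelve months.

```python
def monthly_capital_cost(self_bonded_lpt, lpt_price_usd, annual_rate):
    """Opportunity cost per month of holding self-bonded LPT, in USD."""
    return self_bonded_lpt * lpt_price_usd * annual_rate / 12


cost = monthly_capital_cost(3500, 7.0, 0.05)
print(f"~${cost:.0f}/month")  # 3500 * 7 * 0.05 / 12 is roughly 102
```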
Wrapping it up
Whilst it is hardly a revelation that the largest cost driver (if allocated per session capacity) is the hardware (i.e. the instances), the survey showed that a significant share of costs relates to labor/operations of the infrastructure (potentially even understated). It wouldn’t be surprising if this share were in fact even higher for ‘smaller’ setups (with respect to session capacity), assuming invested work hours might not be tracked meticulously, esp. when considered a side hustle. Beyond costs, we got some insights into the setups, esp. the GPU types provided, which helps to understand both the cost base and the transcoding capabilities (e.g. av1 encoding).
We haven’t contrasted the setups with data on performance, utilization and workload, e.g. to get some understanding of general levels of profitability, which could be an area of further analysis [3].
Thanks again to all participants for trusting us to conduct this survey. We look forward to hearing your thoughts on the results and discussing how we might even leverage these insights to improve the Livepeer ecosystem!
1. See here for resolutions vs. transcoding capacities (even though the transcoding profiles might not exactly match those of Livepeer’s transcoders): nVidia Hardware Transcoding Calculator for Plex Estimates
2. I.e. currently ~10 with 3.5k+ LPT, which would correspond to ~$100/month in cost of capital if we assume a 5% interest rate and an LPT price of $7. The orchestrators with very high amounts of self-staked LPT typically have very little (if any) transcoding fee income, high shares of self-bond and also high reward and fee cuts, which raises the question of to what extent they are comparable to the orchestrators in the survey.
3. Tools exist though for some back-of-the-envelope considerations, e.g. as provided by titan node
Livepeer’s work token model requires Orchestrators to provide LPT stake to be eligible for the assignment of transcoding jobs. Moreover, the broadcaster algorithm used to distribute encoding tasks to Orchestrators is 70% stake-based (only recently adjusted), incentivizing Orchestrators to seek stake beyond the LPT required to get into the top 100 by stake.
To attract the required LPT from delegators, Orchestrators set a reward cut (the share of received LPT rewards they keep) and a fee cut (the share of earned transcoder fees they keep). Those cuts vary widely amongst orchestrators (and change over time):
From an orchestrator’s perspective, it seems fair to assume that delegators would like to maximize their yield. The inflationary rewards are pro rata to staked LPT and paid continuously, so unless Orchestrators fail to claim, this income stream to delegators just depends on the reward cut.
A lower fee cut might seem attractive as well, yet fees need to be earned in the first place and also need to be substantial relative to the LPT rewards in order to matter for the delegators’ yield considerations. It seems that lower cuts correspond to higher stake and consequently a higher share of total rewards. However, the same does not hold for the share of fees earned:
Looking at the cuts for current top 5 Orchestrators by stake (green), the 5th-20th (orange) and the remaining ones (red) over time, it does seem that larger Orchestrators consistently had lower reward cuts (even though converging lately), shown by the solid lines in the left chart below.
Consequently, e.g. the top 5 orchestrators by current stake earned 40-50% of all rewards (dashed green line left chart):
Conversely though, they only earned 20% of all fees generated (dashed green line right chart, until recently the major part of those just by Vires-in-Numeris) - they seem to choose not to provide transcoding services pro-rata to their stake. The potential reasons for this are worth a separate analysis, yet it’s clear that there are differences in the cuts Orchestrators charge depending on their total stake and differences in the amount of transcoding services they are able/willing to provide.
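To make the interplay of the two cuts concrete, here is a minimal Python sketch of a delegator’s income under a simplified model: rewards and fees are split pro rata to stake, and the Orchestrator keeps `reward_cut` of the rewards and `fee_cut` of the fees, as defined above. The function name and all numbers are hypothetical; they are neither survey data nor a description of the protocol implementation.

```python
def delegator_income(delegated_stake, total_stake, rewards, fees,
                     reward_cut, fee_cut):
    """Return (reward income, fee income) for one delegator.

    Simplifying assumption: both income streams are shared pro rata to
    stake after the Orchestrator has taken its cut.
    """
    stake_share = delegated_stake / total_stake
    reward_income = stake_share * rewards * (1 - reward_cut)
    fee_income = stake_share * fees * (1 - fee_cut)
    return reward_income, fee_income


# Hypothetical example: 1% of an Orchestrator's stake, 10% reward cut, 50% fee cut.
reward_income, fee_income = delegator_income(
    delegated_stake=1_000,
    total_stake=100_000,
    rewards=500.0,  # LPT rewards received by the Orchestrator in some period
    fees=2.0,       # fees earned by the Orchestrator in the same period
    reward_cut=0.10,
    fee_cut=0.50,
)
print(f"reward income: {reward_income:.2f}, fee income: {fee_income:.4f}")
```

In such a model the fee cut only matters for a delegator’s yield once the fee income becomes substantial relative to the reward income, which is the point made above.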