While dual ethash mining and transcoding is possible on an Nvidia GPU, previous testing demonstrated that it results in both a reduction in hashrate and degraded transcoding performance. However, those tests used a very simple setup: the mining process and the transcoding process simply ran on the same machine using the same GPU. Each host process creates its own CUDA context, and activity associated with different CUDA contexts is serialized on the GPU. As a result, mining and transcoding were not actually executed concurrently.
Given that transcoding primarily occurs on the GPU's NVENC/NVDEC chips rather than on its CUDA cores, I was curious whether there was a way to avoid serializing mining and transcoding. It turns out that serialization across two processes can be avoided by using CUDA Multi-Process Service (MPS). An MPS server collects activity from multiple processes and passes it to the GPU using a single CUDA context, allowing the activity from those processes to be executed concurrently (if it can run concurrently in the first place).
So, I decided to run some benchmarks for ethash mining and transcoding with MPS enabled using an Nvidia GeForce RTX 3080 10G [1]. See below for the results.
[1] MPS does not support usage of NVENC/NVDEC on pre-Volta architecture GPUs. The RTX 3080 uses the Ampere architecture, which fulfills this requirement.
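For reference, enabling MPS on Linux before launching the miner and the transcoder looks roughly like the sketch below. The device index and the exclusive compute mode setting are assumptions for a single-GPU machine, not necessarily the exact procedure used for these benchmarks.

```bash
# Minimal MPS setup sketch (single GPU assumed; device index 0 is an assumption).
export CUDA_VISIBLE_DEVICES=0

# Optional: restrict the GPU to a single CUDA context owner so client
# processes are forced to go through the MPS server.
sudo nvidia-smi -i 0 -c EXCLUSIVE_PROCESS

# Start the MPS control daemon; client processes (ethminer, ffmpeg/lpms-bench)
# launched afterwards funnel their CUDA work through a shared context.
nvidia-cuda-mps-control -d

# ... run the mining + transcoding benchmarks ...

# Shut down the MPS control daemon when finished.
echo quit | nvidia-cuda-mps-control
```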
Benchmarks
Hardware
- GPU: 1x GeForce RTX 3080 10G
- CPU: AMD Ryzen Threadripper 1950X 16-Core Processor 2.17 GHz
- RAM: 65 GB
- Nvidia Driver: 455.23.04
- CUDA: 10.2
Testing Tools
- ethminer
- ffmpeg script
- lpms-bench script
  - Note: Would like to standardize this soon!
- 2-minute video clip segmented into roughly 2-second segments
Both transcoding scripts accept an m3u8 playlist as input, which is used to fetch the segments to transcode.
First, I used ffmpeg to quickly test a few setups. Then, I used lpms-bench to get a better sense of the expected transcoding performance of a go-livepeer transcoder node, since it uses the same transcoding code under the hood.
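The actual ffmpeg script isn't reproduced here, but the core of an NVDEC/NVENC transcode of an HLS playlist looks roughly like the sketch below. The playlist URL, bitrate, and output path are placeholders.

```bash
# Sketch of a GPU transcode of an HLS playlist: decode on NVDEC, keep frames on
# the GPU, encode on NVENC. URL and rendition settings are placeholders.
ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
       -i https://example.com/stream/index.m3u8 \
       -c:v h264_nvenc -b:v 4M -c:a copy \
       out.mp4
```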
ethminer configuration
These are two configurations of ethminer that I tried out to control the amount of work sent to the GPU.
Config # | --cuda-streams | --cuda-block-size | --cuda-grid-size |
---|---|---|---|
1 | 2 (default) | 128 (default) | 8192 (default) |
2 | 1 | 64 | 4096 |
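As a concrete sketch, config #2 corresponds to an invocation roughly like the one below. The pool URL and wallet are placeholders, and the exact flag names can vary between ethminer versions, so treat this as an illustration rather than the exact command used.

```bash
# Sketch of ethminer config #2 (placeholder pool/wallet; -U selects CUDA mining).
ethminer -U \
  -P stratum1+tcp://<WALLET>.<WORKER>@<POOL_HOST>:<POOL_PORT> \
  --cuda-streams 1 \
  --cuda-block-size 64 \
  --cuda-grid-size 4096
```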
Baseline
These are the baseline metrics for standalone ethminer mining, standalone ffmpeg transcoding and standalone lpms-bench transcoding.
- ethminer config #1 hashrate = 86.29Mh
- ethminer config #2 hashrate = 80.21Mh (-7.05% relative to ethminer config #1)
- ffmpeg 1 stream transcode time = 7.66s
- lpms-bench 1 stream transcode time = 8.088s
Hashrate diff is calculated relative to the hashrate of ethminer config #1.
Transcode time diff is calculated relative to the transcode time of ffmpeg 1 stream.
ethminer + ffmpeg
All tests were run with 1 stream.
ethminer config | mps? | hashrate (Mh) | hashrate diff (%) | transcode time (s) | transcode time diff (%) |
---|---|---|---|---|---|
1 | no | 73.28 | -15.07 | 34.827 | +354.89 |
2 | no | 69.72 | -19.2 | 17.892 | +133.69 |
1 | yes | 86.29 | 0 | DNF | N/A |
2 | yes | 78.32 | -9.23 | 10.671 | +39.38 |
Using MPS with ethminer config #1 (default) actually resulted in terrible transcoding performance. Transcoding did not even finish! The output of nvidia-smi dmon during this test showed that encoder and decoder utilization was consistently at 1%. I suspect this was because ethminer maxed out streaming multiprocessor and memory utilization, preventing much additional activity from happening concurrently.
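For anyone reproducing this, the relevant utilization counters can be watched with nvidia-smi's device monitor; this is a minimal sketch, not necessarily the exact invocation used:

```bash
# Watch per-engine utilization while the benchmarks run; -s u selects the
# utilization metrics, which include the sm, mem, enc and dec columns.
nvidia-smi dmon -i 0 -s u
```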
Meanwhile, using MPS with ethminer config #2 resulted in a dramatic improvement in transcoding performance and a nice improvement in hashrate as well. Transcoding in this test was over 3x faster than in the simple dual mining + transcoding test (no MPS, ethminer config #1), and the hashrate was 1.06x higher. I suspect that switching to ethminer config #2 slightly reduced streaming multiprocessor utilization, which allowed more activity to be executed concurrently.
ethminer + lpms-bench
All tests were run with ethminer config #2 and MPS enabled.
# streams | hashrate (Mh) | hashrate diff (%) | transcode time (s) |
---|---|---|---|
1 | 78.64 | -8.86 | 11.874 |
2 | 78.32 | -9.24 | 20.85 |
3 | 77.75 | -9.9 | 30.146 |
This setup transcodes 3 concurrent streams faster than the simple dual mining + transcoding setup transcodes 1 stream and also achieves a higher hashrate.
ethminer + lpms-bench (simulate live stream)
All tests were run with lpms-bench simulating a live stream by waiting the duration of each segment before submitting it to the GPU. I did not record the transcode times because I expect them to be the same as those recorded in the previous section, since the only difference in these tests is the delay added between segment submissions to the GPU.
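Conceptually, simulating a live stream just means pacing segment submission at real time. A rough bash analog of that pacing (not the actual lpms-bench code; segment names and rendition settings are placeholders) might look like this:

```bash
# Rough analog of the "simulate live stream" mode: wait out each segment's
# duration before handing it to the GPU for transcoding.
for seg in seg_*.ts; do
  dur=$(ffprobe -v error -show_entries format=duration -of csv=p=0 "$seg")
  sleep "$dur"   # pace submission at the segment's wall-clock duration
  ffmpeg -y -hwaccel cuda -hwaccel_output_format cuda -i "$seg" \
         -c:v h264_nvenc -b:v 4M -c:a copy "out_$seg"
done
```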
# streams | hashrate (Mh) | hashrate diff (%) |
---|---|---|
1 | 79.22–80.34 | -8.19 to -6.89 |
2 | 79.05–80.16 | -8.39 to -7.1 |
3 | 79–79.91 | -8.44 to -7.39 |
Hashrate fluctuates a bit more when simulating a live stream, likely due to the variable rate at which segments are submitted to the GPU, which depends on the duration of each segment.
Simulating a live stream seems to improve hashrate. I think this makes sense because the delay between segments should result in more periods of time where the GPU is only mining.
Observations
While the benchmarks above only cover a small range of configurations, they demonstrate that you can substantially improve dual ethash mining and transcoding performance by using MPS and tweaking ethminer parameters, at least on a Volta-or-newer architecture GPU such as the GeForce RTX 3080 10G. I suspect there is a lot more room for optimization here. Better dual ethash mining and transcoding performance means that transcoders can potentially transcode more streams (and earn more) while continuing to mine!