AI Alpha subnet GPU / hardware poll

The objective of this thread is an open discussion on choosing the optimal Nvidia GPUs for Orchestrators participating in the AI Alpha Subnet. A top 5 or top 10 list, if you will.

This chart summarizes the GPUs and their power consumption:

Cards benched (512x512, 50 steps, SD v1.5 webui):

  • 1080 Ti 11 GB (Not buying this GPU at release will haunt me in my dreams)
  • 3060 Ti 12 GB (Cheapish low-end)
  • 3070 Ti 8 GB (way worse than I expected)
  • 2080 Ti 11 GB (Cheapest low-end, especially used)
  • 3080 12 GB (Prices have dropped since 4xxx release)
  • 3090 24 GB (Best)

I found image-generation performance similar to Tom’s Hardware’s results.
The absolute best GPU appears to be the ~US $2,200 24 GB RTX 4090 (top GPU in 2023/2024, at least until a 4090 Ti / Titan release).

  • It should smoke even the A100, assuming inference workloads that fit within 24 GB.
  • It will max out just fine in a PCIe Gen 3.0 x16 (~16 GB/s) slot, with only a ~5% performance drop compared to PCIe Gen 4.0 (~32 GB/s) and a negligible difference versus PCIe Gen 5.0. Useful when selecting a motherboard: even older HEDT chipsets like X99 (40 PCIe lanes) will work.
  • Driving more than two 4090s in one system draws significant power, and three might be the limit before liquid cooling is required.
  • 4–6x 4090s can be driven in custom desktop and 2U rackmount chassis (dual-socket EPYC, 1+ TB RDIMM), which ship with 4x 2400 W redundant power supplies.
  • Workloads are VRAM-sensitive, and I assume they run in FP16 only; please correct me if I am wrong.
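As a rough sanity check on the FP16 assumption, here is a minimal back-of-the-envelope sketch: 2 bytes per parameter, plus an activation overhead I am guessing at (the 20% figure is my own placeholder, not a measured number):

```python
# Rough VRAM estimate for FP16 inference: 2 bytes per parameter,
# plus an assumed overhead for activations (hypothetical 20%).
def fp16_vram_gb(params_billions: float, overhead: float = 0.2) -> float:
    weights_gb = params_billions * 1e9 * 2 / (1024 ** 3)
    return weights_gb * (1 + overhead)

# A 7B-parameter model needs ~13 GB for weights alone in FP16,
# so it fits within a 24 GB 3090/4090 with headroom for activations.
print(f"{fp16_vram_gb(7):.1f} GB")
```

By the same math, anything much past ~10B parameters starts crowding a 24 GB card, which is why the VRAM ceiling matters more than raw speed for some workloads.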

This post doesn’t cover benchmarks in detail; see Tom’s Hardware for the full numbers.

Also see GPU Benchmarks for Deep Learning | Lambda

It seems dual-GPU work is out, and fractional workloads aren’t supported on consumer GPUs.
We need a benchmark tool that internally categorises these GPUs by capability (generation?) first, then by performance.
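A minimal sketch of what that generation-then-performance ranking could look like. The scores below are placeholders I made up for illustration, not real benchmark numbers:

```python
# Sketch of tiering GPUs by architecture generation first, then by a
# benchmark score within each generation. Scores are hypothetical.
from dataclasses import dataclass

@dataclass
class Gpu:
    name: str
    generation: int   # simplified: Turing=7, Ampere=8, Ada=9
    vram_gb: int
    score: float      # e.g. SD v1.5 iterations/sec (made-up values)

gpus = [
    Gpu("RTX 2080 Ti", 7, 11, 7.5),
    Gpu("RTX 3090",    8, 24, 17.8),
    Gpu("RTX 4090",    9, 24, 33.2),
    Gpu("RTX 3080",    8, 12, 15.1),
]

# Newest generation first, then fastest card within a generation.
ranked = sorted(gpus, key=lambda g: (-g.generation, -g.score))
for g in ranked:
    print(f"{g.name}: gen {g.generation}, {g.score} it/s")
```

The sort key is the whole idea: capability class dominates, and raw throughput only breaks ties within a class.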

Any Turing GPU can be a starting point (with patience), but dual air-cooled 3090 / 4090 rigs should be the sweet spot?

Practical low end / pool entry level: 2080 Ti 11 GB < 3060 Ti 12 GB
Orchestrator grade: 3080 12 GB < 4080 16 GB < 3090 24 GB < 4090 24 GB

Which GPUs would you deploy?

  • RTX 2080 Ti 11 GB
  • RTX 3060 Ti 12 GB
  • RTX 3080 12 GB
  • RTX 3080 Ti 12 GB
  • RTX 3090 24 GB
  • RTX 3090 Ti 24 GB
  • RTX 4070 Ti 12 GB
  • RTX 4080 16 GB
  • RTX 4090 24 GB
  • Other

For me it would depend: the capex on 3090s, 3090 Tis, or 4090s is a tad high, so it really depends on how quickly that investment pays off. For now, 4080s seem like the best-value option to me, even though they have a bit less VRAM.


What about a card like the A6000 or the A6000 Ada? They have much more VRAM available, but are also slower and a lot more expensive. Would there be any value in such a card, or would multiple 4090s simply be a more efficient choice?

I’ve asked about this during the last treasury chat, I’ll dump the convo:

Marco | captain-stronk.eth — 2024/01/03, 18:26:08
@Lazydayz137 is there a GPU you would consider ‘optimal’?

Lazydayz137 — 2024/01/03, 18:26:38
24 GB VRAM… the 3090 is the best buy if you’re not sending it to a data center and can run consumer cards.

Marco | captain-stronk.eth — 2024/01/03, 18:28:07
is there a merit to looking into cards like A6000? or better to get multiple 4090 or 3090’s ?

Lazydayz137 — 2024/01/03, 18:53:25
If in a data center, you want an A5000 or 6000.

Lazydayz137 — 2024/01/03, 18:53:59
3090 is the best bang for buck, and that 24 GB holds 7B models to train at like 16 GB or something that makes it magic.

Lazydayz137 — 2024/01/03, 18:54:21
But I’m trying to spread the load and see what workloads can be distributed amongst GPUs already on the network.


Maybe the 4070 Ti Super with 16 GB of VRAM could be a nice option to start with?


I’m also wondering about the importance of the CPU and RAM in this type of configuration.