I mean, at some point someone has to buy them to be able to offer services on them to others.
Renting comes with certain limitations owners don't have.
And some people have too much money to not invest in fun.
This build is 3 kVA max. That's about a third of what a current-gen EV draws while charging, only 15% of what an original Tesla Model S with dual chargers could pull, and about equal to a standard American oven. This is much more polite to the grid than, say, a couple of tea kettles, or especially a reasonably sized electric tankless water heater.
This article was written or rewritten via your model, right?
The last paragraphs feel totally like AI.
Anyway, I'd like a follow-up on the curating, cleaning, and training part, which is far more interesting than how to select hardware, something we've been doing for over 25 years.
The bottleneck for training most model sizes is VRAM, and since each 4090 has 24 GB VRAM, that's 96 GB VRAM total. The article mentions that it can train LLMs from scratch up to 1 billion hyperparameters, which tracks.
Nowadays that's not a lot: a single H100 that you can now rent has 80 GB VRAM, and doesn't have the technical overhead of handling work across GPUs.
You should be able to train/full-fine-tune (i.e. full weight updates, not LoRA) a much larger model with 96GB of VRAM. I generally have been able to do a full fine-tune (which is equivalent to training a model from scratch) of 34B parameter models at full bf16 using 8XA100 servers (640GB of VRAM) if I enable gradient checkpointing, meaning a 96GB VRAM box should be able to handle models of up to 5B parameters. Of course if you use LoRA, you should be able to go much larger than this, depending on your rank.
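Back-of-the-envelope, the sizing works out roughly like this (my assumptions: bf16 weights and gradients, fp32 AdamW moments, and gradient checkpointing keeping activation overhead small):

    # Rough VRAM estimate for a full fine-tune in bf16 with AdamW.
    # Assumptions (not exact): bf16 weights (2 B/param), bf16 grads (2 B/param),
    # fp32 AdamW moments (8 B/param), gradient checkpointing keeping activation
    # memory to a modest fraction of the total.
    def training_vram_gb(params_billion, activation_overhead=0.15):
        bytes_per_param = 2 + 2 + 8      # weights + grads + optimizer states
        base = params_billion * 1e9 * bytes_per_param / 1024**3
        return base * (1 + activation_overhead)

    for p in (1, 5, 13, 34):
        print(f"{p}B params -> ~{training_vram_gb(p):.0f} GB")

With these assumptions ~5B lands around ~65 GB and 34B around ~440 GB, which is roughly consistent with both the 640 GB A100 servers and a ~5B model on 96 GB total (sharded across the four cards with something like FSDP/ZeRO).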
Is there a reason you used hyperparameters rather than parameters? I was going to politely correct the terminology, but you seem to have been in AI for some time, so either it was a mistype or I am misunderstanding what you are referencing.
People who are making quick social media posts while taking a casual walk outside, on websites that don't make it easy to edit posts, and who are not expecting to be nitpicked about it.
Overall, it's something I've seen very often on social media and less technical articles about LLMs. OpenAI would fall into the "almost" category.
It's okay to say that you mistyped or whatever while taking a casual walk outside, on websites that don't make it easy to edit posts, where you don't expect to be nitpicked about it. Throwing in that everyone uses them interchangeably, however, is just profoundly wrong on every level.
I wasn't nitpicking. It is a HUGE distinction, and I pointed it out specifically because people pick up on terminology, so people who might not know better will go forward and just drop in the more super duper hyperparameter, not realizing that it makes them look like they don't know what they're talking about. As I said in the other post, no one who knows anything uses them interchangeably. It is just completely wrong.
Again, I've heard and used the terminology "model hyperparameter" in place of "model parameter", and I've also heard "model parameter" in place of "model hyperparameter", because not every human interaction is a paper on arXiv and the terms are obviously very similar. The context of the term is what matters in the end (as demonstrated by other comments picking up on my intended meaning), and society will not crumble if either term is used incorrectly in casual conversation. No one intentionally uses the wrong term, but as jokingly said in another comment, "when you get really deep into model training, it can seem like there are a billion hyperparameters you have to worry about."
I appreciate being corrected, but you are the one who asked for my opinion based on my extensive time in AI, you can choose to believe it or not.
I doubt the VRAM just adds up into one pool. I think that's a feature reserved for their NVLinked HPC-series cards. In fact, without NVLink, I don't see how you'd connect them together to compute a single task in a performant and efficient way.
How long does training a 1B or 500M model take approximately on the 4-GPU setup? Or does that dramatically depend on the training data? I didn’t see that info on your pages.
This is a decent bird's-eye view, thanks. Could you expand on it to show how long it took, and what model you produced? What did you train it for? The post seems to suggest it's for diffusion purposes?
On a tangent, if I wished to fine-tune one of those medium sized models like Gemma2 9B or Llama 3.2 Vision 11B, what kind of hardware would I need and how would I go about it?
I see a lot of guides, but most focus on getting the toolchain up and running, and there's not much talk about what kind of dataset I need to do a good fine-tune.
> As for dataset, depending on your task, you need image / text pairs.
I guess the main question is, do you just prepare training data as if you were training from scratch, or are there particularities to fine-tuning that should be considered?
In several cases I've been wanting better prompt adherence.
Llama 3.2 Vision, for example, is very strictly trained to output a summary at the end, which I find difficult to get it to stop doing.
Another one is that when given a math problem and asked to generate some code that computes the result, most models output code fine but insist on doing calculations themselves even if the prompt explicitly says they shouldn't. As expected, sometimes these intermediate calculations are incorrect, and hence I don't want the LLM to do that when the produced code would handle it perfectly. If the input prompt contains "four times five" I want the model to generate "4 * 5" rather than "20", consistently.
I've been curious to see if I could tune them to adhere better to the kind of prompts I would be giving.
For Llama 3.2 Vision I've also been curious whether I can get it to focus on different details when asked to describe certain images. In many cases it is great but sometimes misses some key aspects.
As for the input training material, that's exactly what I'm trying to figure out. I feel a lot of the guides are like that "how to draw an owl" meme[1], leaving out some crucial aspects of the whole process. Obviously I need input prompts and expected answers, but how many, how much variation on each example, and do I need to include data it was already trained on to avoid overfitting or something like that? None of the guides I've found so far touch on these aspects.
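From what I can piece together, the data usually ends up as simple prompt/response pairs in a JSONL file, something like this (the field names and file name are just my guesses; every toolchain seems to want its own schema):

    import json

    # Hypothetical examples targeting the behaviors described above:
    # symbolic math instead of pre-computed constants, no trailing summary.
    examples = [
        {
            "prompt": "Write Python that computes four times five plus two.",
            "response": "result = 4 * 5 + 2\nprint(result)",
        },
        {
            "prompt": "Describe this image, focusing on the text on the sign.",
            "response": "A wooden sign reading 'Trail closed past this point'.",
        },
    ]

    with open("finetune_data.jsonl", "w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")

My impression is that a few hundred to a few thousand focused pairs, mixed with some general-purpose data so the model doesn't forget everything else, is a common starting point, but that's exactly the kind of detail I wish the guides spelled out.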
Nice writeup, but I feel that for most people, the software side of training models should be more interesting and accessible.
For one, "full" GPU utilization, with one GPU or many, remains an open topic in training workflows. Spending effort there, while renting from the cloud, is more accessible and fruitful to me than fine-tuning for marginal improvements.
this course was a nice source of inspiration - https://efficientml.ai/ - and i highly recommend looking into this to see what to do next with whatever hardware you have to work with.
Let's talk riser cables. I keep encountering issues with riser connectors that claim to support PCIe 4.0 but seem to have sub-par performance. They work fine with the GPUs and NICs I tested them with, but attaching an NVMe drive causes all kinds of issues and prevents the machine from booting. I guess NVMe isn't as tolerant of elevated bit-error rates.
That just doesn't inspire a lot of confidence in those risers, so now I'm contemplating MCIO risers.
NVMe sits over PCIe. I'd be more inclined to believe they're playing games with their voltage levels to lower power consumption on mobile/embedded (not based on anything, but I wouldn't be surprised). Or, if you're then going through an M.2 adapter, something to do with that.
Why are people downvoting this? Yes, you really do need a dedicated circuit to run this type of machine. You will trip your circuit breaker if you don't have sufficient wattage on the line to run something rated for this power draw.
Commercial setups are not appropriate for typical 15 amp circuit loads.
Further, if you can afford to build this, you can afford to purchase at least the Romex, an AFCI circuit breaker, and raceway, and run it into whatever room in the house you plan on operating this in.
His power supplies are 2x 1500 W. That puts it at 3 kW max, which is more than a 20A circuit can provide (2400W).
The standard outlet is typically rated at 15 amps or 1800W. And the 15A breaker is on one circuit. You can get 20A circuits but they need to be wired for it, and replacing the breaker won't cut it.
Assuming his GPU is ~450W (his number) and power supplies are 80% efficient, well that means he's pulling close to ~2400 watts which is super close to the limit of a 20A circuit.
4 * 450 / 0.80 efficiency = 2250W.
That doesn't include the power consumed by the CPU or motherboard or other things on that circuit. But a 170W CPU would easily push this over the 2400W provided by a 20A circuit.
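Putting the arithmetic in one place (my rough assumptions: ~100W for fans/drives/everything else, 80% PSU efficiency, and the 80% continuous-load derating as I understand the NEC rule):

    gpus, gpu_w = 4, 450          # the author's numbers
    cpu_w, other_w = 170, 100     # CPU plus fans/drives/etc. (rough guesses)
    psu_efficiency = 0.80         # conservative; a Platinum unit does better

    wall_draw = (gpus * gpu_w + cpu_w + other_w) / psu_efficiency
    print(f"wall draw ~{wall_draw:.0f} W")           # ~2590 W

    for amps in (15, 20):
        capacity = amps * 120
        continuous = capacity * 0.8                  # continuous-load derating
        print(f"{amps}A @ 120V: {capacity} W peak, {continuous:.0f} W continuous")

Either way you slice it, it doesn't fit comfortably on a single standard 120V circuit.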
Thanks, good to know. Perhaps it is different for diffusion; with LLMs, layers are generally split across GPUs, meaning inference has to finish on one GPU before the activations can be passed across the layer split.
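If you want to see that split concretely, with Hugging Face transformers it looks something like this (a minimal sketch; the model ID is just an example and you'd still need enough total VRAM):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-3.1-8B"   # example; any causal LM works
    tok = AutoTokenizer.from_pretrained(model_id)

    # device_map="auto" places consecutive layers on different GPUs, so a
    # forward pass moves activations GPU0 -> GPU1 -> ... in sequence
    # (pipeline-style), rather than pooling the cards into one big GPU.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, device_map="auto", torch_dtype=torch.bfloat16
    )
    print(model.hf_device_map)   # shows which layers landed on which GPU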
Why not 3090s? Same VRAM and cheaper. With both setups you'd be limited to 1B. By contrast, you can run 4-bit quants of Llama 70B on two {3,4}090s, and it's still pretty lobotomized by modern standards.
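Quick napkin math on why the 4-bit 70B fits (assuming roughly 4.5 effective bits per weight once quantization scales are counted, plus a few GB for KV cache):

    params = 70e9
    bits_per_weight = 4.5            # 4-bit quant plus scales/zero-points
    weights_gb = params * bits_per_weight / 8 / 1024**3
    kv_cache_gb = 4                  # rough allowance for a few thousand tokens
    print(f"~{weights_gb + kv_cache_gb:.0f} GB needed vs 48 GB across two 24 GB cards")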
You can also train your own model even without GPUs. Just depends on parameter size.
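For example, a toy character-level model trains fine on a CPU; a minimal sketch just to illustrate the point (a tiny bigram model, a few thousand parameters):

    import torch
    import torch.nn as nn

    text = "hello world, hello hacker news. " * 200
    chars = sorted(set(text))
    stoi = {c: i for i, c in enumerate(chars)}
    data = torch.tensor([stoi[c] for c in text])

    # A tiny bigram language model: next-char logits come straight from an
    # embedding table, so CPU is plenty.
    model = nn.Embedding(len(chars), len(chars))
    opt = torch.optim.AdamW(model.parameters(), lr=1e-2)

    for step in range(200):
        i = torch.randint(0, len(data) - 1, (64,))
        logits = model(data[i])                     # (64, vocab)
        loss = nn.functional.cross_entropy(logits, data[i + 1])
        opt.zero_grad()
        loss.backward()
        opt.step()

    print(f"final loss: {loss.item():.2f}")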
Thanks for sharing. Have you prodded the model with various inputs and written an article that shows various output examples? I'd love to get an idea of what sort of "end product" 4x 4090s is capable of producing.
You might find more information here helpful https://sabareesh.com/posts/llm-intro/
But I am still in the process of evaluating the post-training process with RL. RLHF is almost a mirage that shows what is possible, but not the full capability of what the model can do.
If you are willing and able to put together the type of system described in the OP (a workstation-class PC, with multiple discrete GPUs and often multiple power supplies), a Mac never makes sense. There are hardware options available at essentially every price point that beat (in some cases drastically) the performance and memory capacity of a Mac.
And I say this at the risk of being called pedantic, but a cluster of Mac minis would have zero VRAM.
You can get 4060 Ti 16GB cards for ~$450, or 4070 Ti 16GB for ~$850, instead of the $2.5k for a 4090. I wonder how well 4 of those cards would perform. The 4060 Ti TDP is 165W instead of 450W for the 4090. The 4070 looks like the best tradeoff for cost/power/etc, though. You could probably set up an 8-card 4070 Ti 16GB system for less than the 4-card 4090 system.
The 4060 Ti is hampered by having a narrow memory bus; there are various benchmarks out there, here[1][2] are some examples, and here's[3] one which tests dual 4060 Tis.
The 4090's compute per watt is the best (on paper) among the 4060 Ti, 4070 Ti, and 4090. Best bang for the buck, though, looks like the 4070 Ti 16GB. I've been eyeing that one for a new dual-card training rig.
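For what it's worth, here are the ballpark numbers I've been working from (published FP32 specs and rough street prices; the 16 GB 4070 Ti is the "Super" variant, I believe), and they back up both claims:

    # Approximate published specs and street prices; treat as ballpark only.
    cards = {
        # name: (VRAM GB, FP32 TFLOPS, TDP W, approx price USD)
        "4060 Ti 16GB":       (16, 22.1, 165,  450),
        "4070 Ti Super 16GB": (16, 44.1, 285,  850),
        "4090":               (24, 82.6, 450, 2500),
    }

    for name, (vram, tflops, tdp, price) in cards.items():
        print(f"{name}: {tflops/tdp:.3f} TFLOPS/W, {tflops/price*1000:.1f} TFLOPS per $1k")

The 4090 wins on TFLOPS per watt, while the 4070 Ti 16GB edges out the 4060 Ti on TFLOPS per dollar.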
Can someone say definitively that I can just use two independent PSUs? One for some of the GPUs, and one for the remaining GPUs plus the motherboard and SATA? No additional hardware?
Is anyone else concerned with the power usage of recent AI? Computational efficiency doesn't seem to be a strong point... And for what benefit? IMO the usefulness payoff is too low
I clarified this a bit more in the article. But basically: "Well, this may not directly provide benefit, but because this is a consumer-grade card, these features enable support for more advanced capabilities such as bfloat16 and even float8 training, plus the sheer number of CUDA cores."
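Concretely, bf16 mixed precision on the 4090 looks something like this in PyTorch (a minimal sketch; float8 would additionally need something like NVIDIA's Transformer Engine, which isn't shown here):

    import torch

    model = torch.nn.Linear(4096, 4096).cuda()
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    x = torch.randn(8, 4096, device="cuda")

    # Ada (RTX 40xx) cards support bfloat16 natively, so mixed-precision
    # training can use bf16 without the loss scaling that fp16 usually needs.
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = model(x).pow(2).mean()
    loss.backward()
    opt.step()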
The GPU rental market is fairly reasonable. There's lots of companies doing it. (I work at one of them). 4x 4090 can be fetched for around $0.40/hour on some platforms ... about $1.20 on others depending on how available you want it. Regardless, all in, you can do an average 10-or-so-day train for < $500.
If you want on-prem, wait a few months. The supply of 5000-series cards (probably announced at CES in a few days) should push more 4000-series cards onto the market and, maybe, for a bit, over-supply and push the price down.
Nvidia stopped manufacturing the 4000 series a few months ago because they don't have endless factories. Those resources were reallocated to the 5000 series, which pushed the price for the 4000 series up to the ridiculous place it is now (about $2,000 on eBay).
I think the current appetite for crypto and ai is big enough to consume all 4000 and 5000 series cards to a point of scarcity (even 3090s are still fetching about $1000) but there should be a window where things aren't crazy expensive coming up.
There's no evidence supply will continually outstrip demand unless something unusual happens.
Some suppliers have support for it, some don't. They either use Docker or KVM, and it depends on how clever their hosting software is. We can do it, but that's a recent thing. It's really hit or miss.
By the way, for other people reading this, the main player in the "rentable gamer GPU" space is salad.com, who 6 months ago cut a deal with civitai (https://blog.salad.com/civitai-salad/). They're trying to capture enterprise customers to use the extra cycles on teenagers' gaming rigs.
The industry is full of effectively "imitation companies" right now. For instance, runpod, quickpod, simplepod and clore are the ones cloning us at vast right now.
We see them in our Discord, they try to snipe away customers, they get into our comment threads on Reddit and Twitter with self-promotion, they clone our features ... these are the ferocious wild-west days of this industry. I've even gotten personal emails from a few who I guess scanned their database looking for registration addresses from other companies in the space.
There are even companies like primeintellect which are trying to become the market of markets - but they have their own program - it's clearly a play to snipe other companies' customers by funneling them through some interface where they'll eventually push out the other companies and promote their own instances.
Then there are interesting insider-hype players with their own infra, like sfcompute, who are trying to pretend they invented interruptible instances and somehow get a bunch of people treating them like innovators. The resellable contracts they talk about are a pretty common feature, especially via the host's programmatic command-line controller; it's just usually tucked deep in the documentation. They're doing effectively a re-prioritization play.
I guess my angle is "highest integrity possible". It's certainly a gamble - scammy companies sometimes capture a market then become unscammy - I'll hold my tongue but there's plenty of examples.
Wow, I question the ethical side of this comment. It starts praising a company as if it were an unrelated entity, then quietly switches to "us", then makes implications about competing entrepreneurial efforts being scams without any evidence. And "clones" (as if everyone knew about them - I didn't until about a year into my own effort, for instance).
There's also the hypocrisy of complaining about competitors jumping in on "their threads" in a comment on a competitor thread.
Is your electricity free? Some of these cards probably cost about $0.10/hr to run ... depending on your card, electricity rate, etc.
It's probably somewhere between 12 months and never, depending on how the market shakes out. Maybe 2 years is a good estimate ... really, if power is cheap/free and the machine is on and idle anyway, then it's free money - that's the way to look at it.
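Back-of-the-envelope, with made-up but plausible numbers (a ~$2,000 card, ~$0.40/hr rental income, ~$0.10/hr electricity, 50% utilization):

    card_cost = 2000          # rough 4090 street price
    rent_per_hr = 0.40        # what a single card might fetch (varies a lot)
    power_per_hr = 0.10       # electricity while under load
    utilization = 0.5         # fraction of hours actually rented

    net_per_month = (rent_per_hr - power_per_hr) * utilization * 24 * 30
    print(f"~${net_per_month:.0f}/month -> payback in ~{card_cost/net_per_month:.0f} months")

That lands around a year and a half with those assumptions; halve the utilization or the rate and you're well past two years.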
There's a lot of competition in the "Airbnb for GPUs" space, so if you don't like us, the number of players is around 12 or so globally. We're probably either #2 or #3. Companies don't really disclose these things, so it's hard to know.
Some people probably list on more than one platform. There may be some host management software somewhere that helps with that. I haven't actually checked.
I'd be happy to talk more about these privately. Some are better than others, and I've got no interest in posting less than charitable things about our competitors publicly, regardless of how accurate I think they are. My email is in my profile.
I was torn between building a rig and using the cloud, but for some reason I wanted to get hands-on. So yes, you can always rent them for a fraction of the cost.
Also factor in (at least in Southern California) electricity prices and how long the rig is on. Not as bad as the initial build cost, but running costs will add up over time.
The last time I checked, a modern Threadripper build is a bit over $10,000. So if you have the budget for that but need something GPU-oriented instead, then I could see that being a reasonable option.
Depends on the application. In Bitcoin mining it famously was not an issue at all; manufacturers came up with the weirdest motherboards featuring many x1 PCIe slots. Look up the Biostar TB360-BTC PRO 2.0 if you want to see a curiosity.
In Deep Learning it depends on your sharding strategy.
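For example, with plain data parallelism each GPU holds a full copy of the model, and the inter-GPU traffic is mostly one gradient all-reduce per step, so narrow PCIe links tend to hurt less than they would with tensor or pipeline parallelism. A minimal PyTorch DDP sketch (launched with torchrun; the Linear layer is just a stand-in for a real model):

    # launch: torchrun --nproc_per_node=4 ddp_sketch.py
    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    dist.init_process_group("nccl")
    rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(rank)

    model = torch.nn.Linear(1024, 1024).cuda()   # stand-in for a real model
    model = DDP(model, device_ids=[rank])        # full replica on each GPU
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    x = torch.randn(32, 1024, device="cuda")
    loss = model(x).pow(2).mean()
    loss.backward()                              # gradients all-reduced here
    opt.step()
    dist.destroy_process_group()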
The best build I have seen so far had 6x4090's. Video: https://www.youtube.com/watch?v=C548PLVwjHA
An interesting choice to go with 256GB of DDR5 ECC; if spending so much on the 6x 4090s, might as well try to hit 1 TB of RAM as well. The cost of this... not even sure. Astronomical.