
Cost was what brought supersonic down. Comparatively speaking, it may be the cost/benefit curve that will decide the limit of this generation of technology. It seems to me the stuff we are looking at now is massively subsidised by exuberant private investment. The way these things go, there will come a point where investors want to see a return, and that will be a decider on whether the wheels keep spinning in the data centre.

That said, supersonic flight is still very much a thing in military circles …





Yes, cost is important. Very important.

AI is a bit like railways in the 19th century: once you train the model (= once you put down the track), actually running the inference (= running your trains) is comparatively cheap.

Even if the companies later go bankrupt and investors lose interest, the trained models are still there (= the rails stay in place).

That was reasonably common in the US: some promising company would get British (and German etc) investors to put up money to lay down tracks. Later the American company would go bust, but the rails stayed in America.


I think there is a fundamental difference though. In the 19th century when you had a rail line between two places it pretty much established the only means of transport between those places. Unless there was a river or a canal in place, the alternative was pretty much walking (or maybe a horse and a carriage).

The large language models are not that much better than a single artist / programmer / technical writer (in fact they are significantly worse) working for a couple of hours. Modern tools do indeed increase the productivity of workers to the extent where AI generated content is not worth it in most (all?) industries (unless you are very cheap; but then maybe your workers will organize against you).

If we want to keep the railway analogy, training an AI model in 2025 is like building a railway line in 2025 where there is already a highway, and the highway is already sufficient for the traffic it gets, and won’t require expansion in the foreseeable future.


> The large language models are not that much better than a single artist / programmer / technical writer (in fact they are significantly worse) working for a couple of hours.

That's like saying sitting on the train for an hour isn't better than walking for a day?

> [...] (unless you are very cheap; but then maybe your workers will organize against you).

I don't understand that. Did workers organise against vacuum cleaners? And what do new companies, for example, care about organised workers, if they don't hire them in the first place?

Dock workers organised against container shipping. They mostly succeeded in old established ports being sidelined in favour of newer, less annoying ports.


> That's like saying sitting on the train for an hour isn't better than walking for a day?

No, that’s not it at all. Hiring a qualified worker for a few hours (or having one on staff) is not like walking for a day vs. riding a train. First of all, the train is capable of carrying a ton of cargo which you could never carry on foot, unless you have some horses or mules with you. So having a train line offers you capabilities that simply didn’t exist before (unless you had a canal or a navigable river that goes to your destination). LLMs offer no new capabilities. The content they generate is essentially the same (except worse) as the content a qualified worker can give you in a couple of hours.

Another difference is that most content can wait the couple of hours it takes the skilled worker to create it, whereas the products you can deliver via train may spoil if carried on foot (even if carried by a horse). A farmer can go back to tending the crops after having dropped the cargo at the station, but will be absent for a couple of days if they need to carry it on foot, etc., etc. None of this applies to generated content.

> Did workers organize against vacuum cleaners?

Workers have already organized (and won) against generative AI. https://en.wikipedia.org/wiki/2023_Writers_Guild_of_America_...

> Dock workers organised against container shipping. They mostly succeeded in old established ports being sidelined in favour of newer, less annoying ports.

I think you are talking about the 1971 ILWU strike. https://www.ilwu.org/history/the-ilwu-story/

But this is not true. Dock workers didn’t organize against mechanization and automation of ports; they organized against mass layoffs and dangerous working conditions as ports got more automated. Port companies would use the automation as an excuse to engage in mass layoffs, leaving far too few workers tending far too much cargo over far too many hours. This left fatigued workers making mistakes that often led to serious injuries and even deaths. The 2022 US railroad strike was for precisely the same reason.


> Another difference is that most content can wait the couple of hours it takes the skilled worker to create it, [...]

I wouldn't just willy-nilly turn my daughter's drawings into cartoons if I had to bother a trained professional about it.

A few hours of a qualified worker's time costs a couple hundred bucks at minimum. And it takes at least a couple of hours to turn around the task.

Your argument seems a bit like web search being useless, because we have highly trained librarians.

Similar for electronic computers vs human computers.

> I think you are talking about the 1971 ILWU strike. https://www.ilwu.org/history/the-ilwu-story/

No, not really. I have a more global view in mind, eg Felixstowe vs London.

And, yes, you do mechanisation so that you can save on labour. Mass layoffs are just one expression of this (when you don't have enough natural attrition from people quitting).

You seem very keen on the American labour movements? There's another interesting thing to learn from history here: industry will move elsewhere, when labour movements get too annoying. Both to other parts of the country, and to other parts of the world.


My understanding is that inference costs are also very high, especially with the new "reasoning" models.

Most models can run inference on borderline-consumer hardware.

Even for the fancy models, where you need to buy compute (the rails) costing about the price of a new car, the hardware draws ~700W[0] while running inference at ~50 tokens/second[1].

But!

The constraint with current hardware isn't compute; the models are mostly constrained by RAM bandwidth (a rough sketch of the arithmetic is below). A back-of-the-envelope estimate says that if, e.g., Apple took the compute already in their iPhones and re-engineered the chips to have 256 GB of RAM and enough bandwidth not to be constrained by it, models of that size could run locally for a few minutes before hitting thermal limits (because it's a phone), but we're still only talking one- or two-digit watts.

[0] https://resources.nvidia.com/en-us-gpu-resources/hpc-datashe...

[1] Testing of Mistral Large, a 123-billion parameter model, on a cluster of 8xH200 getting just over 400 tokens/second, so per 700W device one gets 400/8=50 tokens/second: https://www.baseten.co/blog/evaluating-nvidia-h200-gpus-for-...
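
To make the bandwidth-bound point concrete, here is a rough back-of-the-envelope sketch (Python; the parameter counts, byte widths and bandwidth figures are illustrative assumptions, not measurements from the links above):

  # Rule of thumb for memory-bandwidth-bound decoding: each new token requires
  # streaming (roughly) all the model weights from RAM once, so
  #   max tokens/sec ≈ memory bandwidth / size of weights in bytes

  def bandwidth_bound_tokens_per_sec(params_billion, bytes_per_param, bandwidth_gb_per_s):
      weight_gb = params_billion * bytes_per_param
      return bandwidth_gb_per_s / weight_gb

  # A 123B-parameter model in fp16 on a device with ~4.8 TB/s of HBM:
  print(bandwidth_bound_tokens_per_sec(123, 2.0, 4800))  # ~20 tokens/s per device

  # The same model quantised to ~4 bits/weight on a desktop with ~800 GB/s unified memory:
  print(bandwidth_bound_tokens_per_sec(123, 0.5, 800))   # ~13 tokens/s

This is a per-sequence ceiling; batching and sharding the weights across several devices are what push measured cluster throughput (as in [1]) higher.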


> e.g. if Apple took the compute already in their iPhones and reengineered the chips to have 256 GB of RAM and sufficient bandwidth to not be constrained by it, models that size could run locally for a few minutes before hitting thermal limits (because it's a phone), but we're still only talking one-or-two-digit watts.

That hardware cost Apple tens of billions to develop, and what you're talking about in terms of "just the hardware needed" is so far beyond consumer hardware it's funny. Fairly sure most Windows laptops are still sold with 8 GB of RAM and basically 512 MB of VRAM (probably less); practically the same thing for Android phones.

I was thinking of building a local LLM powered search engine but basically nobody outside of a handful of techies would be able to run it + their regular software.


> That hardware cost Apple tens of billions to develop

Despite which, they sell them as consumer devices.

> and what you're talking about in term of "just the hardware needed" is so far beyond consumer hardware it's funny.

Not as big a gap as you might expect. The M4 chip (as used in iPads) has "28 billion transistors built using a second-generation 3-nanometer technology" - https://www.apple.com/newsroom/2024/05/apple-introduces-m4-c...

Apple don't sell M4 chips separately, but the general best guess I've seen is that they're in the $120 range as a cost to Apple. Certainly it can't exceed the list price of the cheapest Mac mini with one (US$599).

As bleeding-edge tech, those are expensive transistors, but even so, 10 such chips would have enough transistors for 256 GB of RAM plus all the compute each chip already has. Actual RAM is much cheaper than that.

10x the price of the cheapest Mac mini is about $6k… but you could then save $400 by getting a Mac Studio with 256 GB of RAM. The max power consumption (of that desktop computer, but with double that, 512 GB of RAM) is 270 W, which represents an absolute upper bound: if you're doing inference you're probably using a fraction of the compute, because inference is RAM-limited, not compute-limited.

This is also very close to the same price as this phone, which I think is a silly phone, but it's a phone and it exists and it's this price and that's all that matters: https://www.amazon.com/VERTU-IRONFLIP-Unlocked-Smartphone-Fo...

But regardless, I'd like to emphasise that these chips aren't even trying to be good at LLMs. Not even Apple's Neural Engine is really trying to do that; NPUs (like the Neural Engine) are all focused on what AI looked like it was going to be several years back, not what current models are actually like today. (And given how fast this moves, it's not even clear to me that they were wrong, or that they should be optimised for what current models look like today.)

> Fairly sure most Windows laptops are still sold with 8GB RAM and basically 512MB of VRAM (probably less), practically the same thing for Android phones.

That sounds exceptionally low even for budget laptops. The only examples I can find are the sub-€300 budget range and refurbished devices.

For phones, there is currently very little market for this; the limit is not that it's an inconceivable challenge. Same deal as thermal imaging cameras in this regard.

> I was thinking of building a local LLM powered search engine but basically nobody outside of a handful of techies would be able to run it + their regular software.

This has been a standard database tool for a while already. Vector databases, RAG, etc.
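
If it helps, the moving parts for a local semantic search are small. A minimal sketch (assuming the sentence-transformers package and a small embedding model, neither of which is mentioned above) looks roughly like:

  # Minimal local semantic search: embed documents once, rank queries by cosine similarity.
  import numpy as np
  from sentence_transformers import SentenceTransformer  # assumed dependency

  model = SentenceTransformer("all-MiniLM-L6-v2")  # small, CPU-friendly embedding model

  docs = [
      "How to descale an espresso machine",
      "Setting up a local LLM with llama.cpp",
      "Train timetables for the Felixstowe branch line",
  ]
  doc_vecs = model.encode(docs, normalize_embeddings=True)  # shape (n_docs, dim), unit length

  def search(query, k=2):
      q = model.encode([query], normalize_embeddings=True)[0]
      scores = doc_vecs @ q  # cosine similarity, since vectors are normalised
      top = np.argsort(-scores)[:k]
      return [(docs[i], float(scores[i])) for i in top]

  print(search("run a language model on my own machine"))

Swap the in-memory array for FAISS, sqlite-vec, or any vector database once the corpus grows, and feed the top hits to a local model for the RAG part.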


> This has been a standard database tool for a while already. Vector databases, RAG, etc.

Oh, please show me the consumer version of this. I'll wait. I want to point and click.

Similar story for the consumer devices with cheap unified 256GB of RAM.


Look at computer systems that cost $2000 or less: they are useless at running, for example, LLM coding assistants locally. A minimal subscription to a cloud service unfortunately beats them, and even more expensive systems that can run larger models run them too slowly to be productive. Yes, you can chat with them and perform tasks slowly on low-cost hardware, but that is all. If you put local LLMs in your IDE, they slow you down or just don't work.

My understanding of train lines in America is that lots of them went to ruin and the extant network is only “just good enough” for freight. Nobody talks about Amtrak or the Southern Belle or anything any more.

Air travel taking over is of course the main reason for all of this, but the costs sunk into the rails are lost, or the ROI curtailed, by market forces and obsolescence.


Amtrak was founded in 1971. That's about a century removed from the times I'm talking about. Not particularly relevant.

Completely relevant. It’s all that remains of the train tracks today. Grinding out the last drops from those sunk costs, attracting minimal investment to keep it minimally viable.

Grinding out returns from a sunk cost of a century-old investment is pretty impressive all by itself.

Very few people want to invest more: the private sector doesn't want to because they'll never see the return, the governments don't want to because the returns are spread over their great-great-grandchildren's lives and that doesn't get them re-elected in the next n<=5 (because this isn't just a USA problem) years.

Even the German government dragged its feet over rail investment, but they're finally embarrassed enough by the network problems to invest in all the things.


Thanks, yes, the train tracks analogy does wither somewhat when you consider the significant maintenance costs.

That's simply because capitalists really don't like investments with a 50-year horizon and no guarantees. So the infrastructure that needs to be maintained is not.

A valid analogy only if the future training method is the same as today's.

The current training method is the same as 30 years ago; it's the GPUs that changed and made it produce practical results. So we're not really that innovative with all this...

Wait, why are these companies losing money on every query if inference is cheap?

Because they are charging even less?

Sounds like a money-making strategy. Also, given how expensive all this shit already is, what if inference costs _more_? That’s not cheap to me.

But again, the original argument was that they can run forever because inference is cheap; it’s not cheap enough if you’re losing money on it.


Even if the current subsidy is 50%, GPT would be cheap for many applications at twice the price. It will determine adoption, but it wouldn’t prevent me from having a personal assistant (and I’m not a 1%er, so that’s a big change).


