Hacker News | libraryofbabel's comments

> Having been born in Königsberg in 1724, he never left the small German city, dying there in 1804 aged 79 never having once gone further than the city’s limits.

Totally false! Slander! He once went as far as the village of Jarnołtowo, a whole 60 miles from Königsberg![0]

But yeah. Maybe not one of history’s greatest travelers.

[0] https://en.wikipedia.org/wiki/Jarno%C5%82towo



He made plans for a trip to Paris to read a specific book, but the librarians he had been corresponding with mailed it to him as a sign of respect for his time.

All of these things can be simultaneously true (and I would say, are true):

1) We are in a huge investment bubble right now and it's going to burst.

2) LLMs are extremely useful right now for certain niche tasks, especially software engineering.

3) LLMs have the potential to transform our world long-term (~10 yr horizon), on the order of the transformations wrought by the internet and mobile.

4) LLMs don't lead directly to AGI (no continuous learning), and we're not getting AGI any time soon.

This is an extremely obvious point, but it bears repeating. I feel the assumption of an implicit link (whether in truth or in falsehood) between these fairly independent assertions can cause people to talk past each other about the really important questions in play here.

Regarding The Great Bubble, I am very, very bearish about OpenAI in particular. They've had a good run for three years with consumer mindshare due to their first-mover advantage, but they have no moat, trouble monetizing most of their users, and not much luck building out products other than chatbots that stick with consumers, and their models are no better than Anthropic's, Google's, or even the best Chinese open-weight models 6 months later.

My bet would be on Google and Apple together (with Gemini powering Siri, for now) destroying OpenAI in the consumer AI market over the next 2-3 years. Google has first-rate models... but more than that, both Google and Apple have the enormous advantage of owning underlying platforms that they can use to put their own AI chat in front of consumers. Google has a mobile OS, the leading browser, and search. Apple has the premium hardware and the other, premium, mobile OS. They also have the advantage of the current regulatory climate being less focused on antitrust than it was. And they don't have to monetize their AI offerings (no ads in Gemini; ChatGPT is adding them) and can run them at a loss for as long as it takes to eat up OpenAI's market share. If they partner up, as they seem to be doing, OpenAI should be very, very afraid.


>2) LLMs are extremely useful right now for certain niche tasks, especially software engineering.

Don't get lost in the tech-scene sauce: programming is a small sliver of what people are using LLMs for. OpenAI's report in September pegged it at ~4% of tokens being for software generation. Sure, Anthropic is probably 80% or something, but only a small sliver of LLM users are using Anthropic. The share is probably even smaller if you count Google's AI Overviews. We hate it, but I have never seen a regular person skip over it.

The question is whether regular people will pay cell-phone-level subscription costs ($70-$100/mo) for LLMs. If so, then we are probably not in a bubble, and the ROI will have a 5-10 yr horizon, which is totally tenable.

500,000,000 people paying $75/mo is $450B/yr. Inference is cheap too; it's training that is ludicrously expensive. Don't be fooled by the introductory pricing we have today either; that's just to get you dependent.

And yeah, Chinese models, but look at what they did to TikTok. No way they are going to let the Chinese government be people's confidants, and no way is more than 0.01% of people gonna home-lab.


> And yeah, Chinese models, but look at what they did to TikTok. No way they are going to let the Chinese government be people's confidants

You're playing whack-a-mole with a paper mallet against a tank brigade. It's not only China; the EU is going to compete too in that "5-10 yr horizon".

The Chinese manage to compete while deprived of top semiconductor gear, which happens to be made only in the EU.

The Chinese models don't have to stay in China; they can be fine-tuned and used by many other countries, named differently, etc. Even if you try to block them all, other countries won't, which would put the US at a severe competitive disadvantage: inflated and isolated, with the dollar in the sewer, and then who cares how much OpenAI would make.


You’re right to call out that consumer use is what’s eating all the tokens. I suppose what was behind my way of putting it is: I haven’t seen much in the way of truly transformative products with LLMs in the consumer space. Sure, there’s a few power users doing some cool things, and lots of promises along the lines of “AI will plan and book you a whole vacation!”, but basically for the median consumer we have an “improved search” and “fun image generation”, with people using it a couple times a week. So what is the product that changes the world and makes 500M people pay $100 a month? I don’t think it’s really here yet. This feels a bit like we’re in 1996, when people were still trying to figure out what the internet is for, and we don’t know if OpenAI will be 1996’s Amazon, or its AOL.

None of this is to deny how remarkable the underlying LLM tech is. I never expected to see something like this in my lifetime; it feels far, far more strange and new than when the iPhone came along. I use a coding agent daily and it’s dramatically changed how I work. But I still think we’re in a bubble here.


Ironically the MIT study that was touted for weeks, the one that found corporate AI pilots were almost all failures, also found that virtually every single person was using LLMs almost daily for their work.

The finding of that study was that people are using their personal AI accounts rather than corporate integrated accounts. Hence the pilots failing.

But LLMs are definitely being used to get work done. Hell, Jerome Powell just said live on air that he uses it for its productivity boost.


Do you use Cursor personally? The product is so good, I don't know why "companies will complain but pony up $20k a seat" isn't seen as a possibility here.

> 500,000,000 people paying $75/mo is $450B/yr.

The majority of people use ChatGPT for free; that's why they are introducing ads. Normal people will not pay $70-100 per month for an LLM subscription. Your numbers are way off.


I'm sure if cell phones were free, the majority would be free users too.

But besides that, OpenAI is pricing ads at $60/1000 views. That is 3x what Meta charges, which is around $20/1000. Meta pulls about $20/user/month in ad revenue. Triple that and we land at... $60/mo.


You forgot to mention the case where the US economy craters because of political missteps, leaving Google and Apple distracted and leaving an opening for Chinese or other models to surge ahead.

In fact it peaked in size in 1920. But you’re broadly correct about the sense of decline post-WWI and the sense that America would be the dominant 20th-century power.

The UK itself lost one third of its land area in 1921, following yet another insurrection in Ireland (always a coerced part of the UK subject to genocidal rule and expropriation).

The last straw for the Irish was the summary executions of the insurrectionists and the ravages of the Black and Tans -- like ICE, but with arsonists and criminals and without masks. The US will be fortunate if the parallels remain only financial.


It’s funny, from his books I always imagined he spent years up there in the Yukon. But it turns out he trekked in (by far the hardest experience of the Yukon Gold Rush was just getting there with the mandated 1 year of food and all your mining equipment), staked an unprofitable claim, talked to a lot of people in bars in Dawson City, spent an uncomfortable winter in a small cabin with some other gold-rushers eating just bread and beans and bacon, got scurvy from eating just bread and beans and bacon, and then got the hell out and went back to San Francisco.

I say this not to minimize the depth or the hardship of his experience (it sounds like a nightmare) but more in amazement at all the compressed experiences he had and the fodder for stories he amassed during that one year. Certain years in life flash by (or they seem that way to me) and others are formative and seem to last forever. Clearly this was the latter for him.


Makes me wonder if I should write about Afghanistan.

You probably should. Try it and see if you like doing it.

Maybe publish a chapter online and ask for feedback and encouragement (since there are fewer magazines now)?

I would be interested to hear it


Exactly. As distributed systems legend Leslie Lamport puts it: “Writing is nature’s way of letting you know how sloppy your thinking is.” (He added: “Mathematics is nature’s way of letting you know how sloppy your writing is.”)

I still have a lot of my best ideas in the shower, no paper and pen, no LLM to talk to. But writing them down is the only way to iron out all the ambiguity and sort out what’s really going to work and what isn’t. LLMs are a step up from that because they give you a ready-made critical audience for your writing that can challenge your assumptions and call out gaps and fuzziness (although as I said in my other comment, make sure you tell them to be critical!)

Thinking is great. I love it. And there are advantages to not involving LLMs too early in your process. But it’s just a first step and you need to write your ideas down and submit them to external scrutiny. Best of all for that is another person who you trust to give you a careful and honest reading, but those people are busy and hard to find. LLMs are a reasonable substitute.


I agree with this. It is an extremely powerful tool when used judiciously. I have always learned and sharpened my ideas best through critical dialog with others. (After two and a half thousand years it may be that we still don't have a better way of teaching than the one Socrates advocated.) But human attention is a scarce resource; even in my job, where I can reasonably ping people for a quick chat or a whiteboard session or fire off some slack messages, I don't want to do that too often. People are busy and you need to pick the right moment and make sure you're getting the most value from their precious time.

No such restriction on LLMs: Opus is available to talk to me day or night and I don't feel bad about sending it half-baked ideas (or about ghosting it half way through the discussion). And LLMs read with an attention to detail that almost no human has the time for; I can't think of anyone who has engaged with my writing quite this closely, with the one exception of my PhD advisor.

LLM conversations are particularly good for topics outside work where I don’t have an easily-available conversational partner at all. Areas of math I want to brush up on. Tricky topics in machine learning outside the scope of what I do in my job. Obscure topics in history or philosophy or aviation. And so on. I’ve learned so much in the last year this way.

But! It is an art and it is quite easy to do it badly. You need to prompt the LLM to take a critical stance towards your ideas (in the current world of Opus 4.5 and Gemini 3, sycophancy isn't as much of a problem as it was, but LLMs can still be overly oriented to please). And you need to take a critical stance yourself. Interrogate its answers, and push it to clarify points that aren't obvious. Sometimes you learn something new, sometimes you expose fuzziness in the LLM's description (in which case it will usually give you the concept at a deeper level). Sometimes in the back-and-forth you realize you forgot to give it some critical piece of context, and when you do that it reframes the whole discussion.

I see plenty of examples of people just taking LLMs' answers at face value like they're some kind of oracle (and I'm sure the comments here will contain many negative anecdotes like that). You can't do that; you need to engage and try to chip away at its position and come to some synthesis. The nice thing is the LLM won't mind having its ideas rigorously interrogated, which is something humans can be touchy about (though not always, and the most productive human collaborations are usually ones where both people can criticize each other's ideas freely).

For better or for worse, the people who will do best in this world are those with a rigorously critical mindset and an ability to communicate well, especially in writing. (If you're in college, consider throwing in a minor in philosophy or history alongside that CompSci major!) Those were already valuable skills, and they have even more leverage now.


Devin? Now that's a name I've not heard in a long time...a lonnng time.

Seriously, in this age of Claude Code and Codex, does anyone use Devin, or even know someone who does? Do they have any users at all?

Ironically, their product has probably got massively better in the last couple of years, because the underlying LLMs got massively better at coding and long-context tasks. But that doth not a successful business model make, and unless you’re Cursor (and even then I’m not so sure) this is a very very hard space to succeed in without owning your own frontier model (i.e being Anthropic, OpenAI, or Google).


yeah there is apparently not a lot of overlap between hn/twitter users and devin users, and we don’t really do marketing campaigns either

logos on website if you want to see some of our customers lol


I use their deepwiki often.

This rings true and reminds me of the classic blog post “Reality Has A Surprising Amount Of Detail”[0] that occasionally gets reposted here.

Going back and forth on the detail in requirements and mapping it to the details of technical implementation (and then dealing with the endless emergent details of actually running the thing in production on real hardware on the real internet with real messy users actually using it) is 90% of what’s hard about professional software engineering.

It’s also what separates professional engineering from things like the toy leetcode problems on a whiteboard that many of us love to hate. Those are hard in a different way, but LLMs can do them on their own better than humans now. Not so for the other stuff.

[0] http://johnsalvatier.org/blog/2017/reality-has-a-surprising-...


  > Reality Has A Surprising Amount Of Detail
Every time we make progress, complexity increases and it becomes more difficult to make progress. I'm not sure why this is surprising to many. We always do things to "good enough", not to perfection. Not that perfection even exists... "Good enough" means we tabled some things and triaged, addressing the most important things. But now, to improve, those little things need to be addressed.

This repeats over and over. There are no big problems, there are only a bunch of little problems that accumulate. As engineers, scientists, researchers, etc our literal job is to break down problems into many smaller problems and then solve them one at a time. And again, we only solve them to the good enough level, as perfection doesn't exist. The problems we solve never were a single problem, but many many smaller ones.

I think the problem is we want to avoid depth. It's difficult! It's frustrating. It would be great if depth were never needed. But everything is simple until you actually have to deal with it.


> As engineers, scientists, researchers, etc our literal job is to break down problems into many smaller problems and then solve them one at a time.

Our literal job is also to look for and find patterns in these problems, so we can solve them as a more common problem, if possible, instead of solving them one at a time all the time.


Very true. But I didn't want to discuss elegance and abstraction, as people seem to misunderstand abstraction in programming. I mean, all programming is abstraction... abstraction isn't to be avoided, but things can become too abstract.

I think we're all coping a bit here. This time, it really is different.

The fact is, one developer with Claude code can now do the work of at least two developers. If that developer doesn't have ADHD, maybe that number is even higher.

I don't think the amount of work to do increases. I think the number of developers or the salary of developers decreases.

In any case, we'll see this in salaries over the next year or two.

The very best move here might be to start working for yourself and delete the dependency on your employer. These models might enable more startups.


Alternate take: what agents can spit out becomes table stakes for all software. Making it cohesive, focused on business needs, and stemming complexity are now requirements for all devs.

By the same token (couldn’t resist), I also would argue we should be seeing the quality of average software products notch up by now, given how long LLMs have been available. I’m not seeing it. I’m not sure it’s a function of model quality, either. I suspect devs who didn’t care as much about quality haven’t really changed their tune.


How much new software do we really use? And how much can old software become qualitatively better without just becoming new software, in different times, with a much bigger and younger customer base?

I misunderstood two things for a very long time:

a) Standards are not lower or higher; people are happy that they can do stuff at all, or a little to a lot faster, using software. Standards then grow with the people, as does the software.

b) Of course software is always opinionated, there are always constraints, and devs can't get stuck in a recursive loop of optimization. But what's way more important: they don't have to, because of a).

Quality is, often enough, a matter of how much time you spend on nitpicking even though you could absolutely get the job done. Software is part of a pipeline, a supply chain, and someone somewhere is aware of why it should be "this" and not something better, or not that other version the devs have prepared knowing well enough it won't see the light of day.


Honestly, in many ways it feels like quality is decreasing.

I'm also not convinced it's a function of model quality. The model isn't going to do something if the prompter doesn't even know to ask for it. It does what the programmer asked.

I'll give a basic example. Most people suck at writing bash scripts. It's also a common claim as to LLMs' utility. Yet they never write functions unless I explicitly ask. Here, try this command:

  curl -fsSL https://claude.ai/install.sh | less
(You don't need to pipe into less, but it helps for reading.) Can you spot the fatal error in the code, where running it via curl-pipe-bash might cause major issues? Funny enough, I asked Claude and it asked me this:

  Is this script currently in production? If so, I’d strongly recommend adding the function wrapper before anyone uses it via curl-pipe-bash.                
The errors made here are quite common in curl-pipe-bash scripts. I'm pretty certain Claude would write a program with the same mistakes despite being able to tell you about the problems and their trivial corrections.
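
For anyone who hasn't hit this before: the classic curl-pipe-bash failure is that bash starts executing the script while it is still streaming in, so a dropped connection can leave you running half a command. The standard mitigation (the "function wrapper" Claude suggests above) is to define everything inside a function and only call it on the very last line, so a truncated download dies on a parse error instead of half-running. A minimal sketch of the pattern, not the actual contents of that install script:

  #!/usr/bin/env bash
  set -euo pipefail

  # Everything lives inside a function; nothing runs while the file is still streaming in.
  main() {
    local install_dir="${INSTALL_DIR:-$HOME/.claude}"   # illustrative path, not the real script's
    mkdir -p "$install_dir"
    # ... download the binary, update PATH, etc. ...
  }

  # Only this final line triggers execution. If the connection drops before it arrives,
  # bash hits an incomplete function body and exits with a syntax error instead of
  # running a partial script.
  main "$@"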

The problem with vibe coding is you get code that is close. But close only matters in horseshoes and hand grenades. You get a bunch of unknown unknowns. The classic problem of programming still exists: the computer does what you tell it to do, not what you want it to do. LLMs just might also do things you don't tell them to...


You sound bored. If we tripled head count overnight, we'd only slow our backlog growth, temporarily. Every problem we solve only opens up a larger group of harder problems to solve.

Why wouldn't we find new things to do with all that new productivity?

Anecdotally, this is what I see happening in the small in my own work - we say yes to more ideas, more projects, because we know we can unblock things more quickly now - and I don't see why that wouldn't extend.

I do expect to see smaller teams - maybe a lot more one-person "teams" - and perhaps smaller companies. But I expect to see more work being done, not less, or the same.


What new things would we do? I do contracting, so maybe I'm lowest-bidder-pilled, but I feel like drops in price from lean organizations are going to eat the lunch of shops trying to make higher-quality software in most software disciplines.

How much software is really required to be extensible?


There is tons of stuff to do. Lots of technologies out there that need to be invented and commercialized. Tons of inefficient processes in business, government, and academia to improve.

None of this means that it will be the kinds of professional specialized software development teams that we're used to doing any of this work, but I have some amount of optimism that this is actually going to be a golden age for "doing useful things with computers" work.


I still think it's more likely to be more of the same thing but with fewer people.

One man shops being the ideal, and I don't think there will be proportionately more of them


This doesn't mesh with anything that has happened in the development of computing, or technology in general.

I dispute technology in general. There are plenty of examples where industrialisation led to a drop in quality, a massive drop in price, and a displacement of workers.

It hasn't happened in software yet. I suppose this has to do with where software sits on the demand curve currently.

I'm imagining a few more shifts in productivity will make the demand vs price derivative shift in a meaningfully different way, but we can only speculate.


I think you are misunderstanding the point I'm making. I agree that "writing code" is likely to be commoditized by AI tools, much like past industrialization disruptions. But I think there is going to be more things to do in the space of "doing useful things with computers", analogous to how industrialization creates new work further up the value chain.

Of course it often isn't the same people whose jobs are disrupted who end up doing that new work.


If LLMs were good at writing software, there would be lots of good software around written by LLMs. Where is that software? I don't see it. Logical conclusion: LLMs aren't good at writing software.

Are you trying to make a distinction between writing software vs writing code? LLMs are pretty great at writing good code (a relative term of course) if you lay things out for them. I use Claude Code on both greenfield new projects and a giant corporate mono repo and it works pretty well in both. In the giant mono repo, I have the benefit of many of my coworkers developing really nice Claude.md files and skills, so that helps a lot.

It’s very similar to working with a college hire SWE: you need to break things down to give them a manageable chunk and do a bit of babysitting, but I’m much more productive than I was before. Particularly in the broad range of things where I know enough to know what needs to be done but I’m not super familiar with the framework to do it.


Presumably they are writing the same quality software faster, the market having decided what quality it will accept.

Once that trend maxes out it’s entirely plausible that the level of quality demanded will rise quickly. That’s basically what happened in the first dot com era.


I'm not convinced. Honestly, it seems like we're in a market for lemons, and I don't know how we escape the kind of environment that is ripe for lemons. Getting out requires customers to be well informed at the time of purchase. This is always difficult with software, as we usually need to try it first, and frankly, the average person is woefully tech illiterate.

But these days? We are selling products based on promises, not actual capabilities. I can't think of a more fertile environment for a lemon market than that. No one can be informed and bigger and bigger promises need to be made every year.


"(...) maybe growing vegetables or using a Haskell package for the first time, and being frustrated by how many annoying snags there were." Haha this is funny. Interesting reading.

While this is absolutely true and I've read this before, I don't think you can make this an open and shut case. Here's my perspective as an old guy.

The first thing that comes to mind when I see this as a counterargument is that I've quite successfully built enormous amounts of completely functional digital products without ever mastering any of the details that I figured I would have to master when I started creating my first programs in the late 80s or early 90s.

When I first started, it was a lot about procedural thinking, like BASIC goto X, looping, if-then statements, and that kind of thing. That seemed like an abstraction compared to just assembly code, which, if you were into video games, was what real video game people were doing. At the time, we weren't that many layers away from the ones and zeros.

It's been a long march since then. What I do now is still sort of shockingly "easy" to me sometimes when I think about that context. I remember being in a band and spending a few weeks trying to build a website that sold CDs via credit card, and trying to unravel how cgi-bin worked using a 300 page book I had bought and all that. Today a problem like that is so trivial as to be a joke.

Reality hasn't gotten any less detailed. I just don't have to deal with it any more.

Of course, the standards have gone up. And that's likely what's gonna happen here. The standards are going to go way up. You used to be able to make a living just launching a website to sell something on the internet that people weren't selling on the internet yet. Around 1999 or so I remember a friend of mine built a website to sell stereo stuff. He would just go down to the store in New York, buy it, and mail it to whoever bought it. Made a killing for a while. It was ridiculously easy if you knew how to do it. But most people didn't know how to do it.

Now you can make a living pretty "easily" selling a SaaS service that connects one business process to another, or integrates some workflow. What's going to happen to those companies now is left as an exercise for the reader.

I don't think there's any question that there will still be people building software, making judgment calls, and grappling with all the complexity and detail. But the standards are going to be unrecognizable.


Is the surprising amount of detail an indicator that we do not live in a simulation, or is it instead that we must be living inside a simulation, because Reality doesn't need all this detail, indicating an algorithmic function run amok?

Reality is infinitely analog, and therefore digital will only ever be an approximation.

Can you give an example of the "other stuff"?

I once wrote software that had to manage the traffic coming into a major shipping terminal: OCR, gate arms, signage, cameras for inspecting chassis and containers, SIP audio comms, RFID readers, all of which needed to be reasoned about in a state machine, none of which were reliable. It required a lot of on-the-ground testing and observation and tweaking, along with human interventions when things went wrong. I’d guess LLMs would have been good at subsets of that project, but the entire thing would still require a team of humans to build again today.

Sir your experience is unique and thanks for answering this.

That being said, someone took your point that LLMs might be good at subsets of projects to suggest we should actually use LLMs for that subset as well.

But I digress (I provided more in-depth reasoning in another comment as well), because if there is even a minute bug that slips past the LLM and code review in that subset, then with millions of cars travelling through these points, let's assume that one single bug somewhere increases the traffic fatality rate by 1 person per year. Firstly, it shouldn't be used because of the inherent value of human life itself, but it doesn't make sense even in monetary terms, so there's really not much reason I can see to use it.

That alone over a span of 10 years would cost $75-130 million (the value of a life in the US for a normal person ranges from $7.5 million to $13 million).

Sir, I just feel like if the point of LLMs is to have fewer humans, or to pay them less, this is so short-sighted, because I (if I were the state, and I think everyone will agree after the cost analysis) would much rather pay a few hundred thousand dollars, or even a few million, right now to save $75-130 million (on the smallest scale, mind you; it can get exponentially more expensive).

I am not exactly sure how we can detect the rate of deaths due to LLM use itself (the 1 number) but I took the most conservative number.

And there is also the fact that we won't know if LLMs might save a life, but I am 99.9% sure that might not be the case, and once again it wouldn't be verifiable itself, so we are shooting in the dark.

And a human can bring much more sensitivity and context to the job (you know what you are working on and you know how valuable it is, that it can save lives and everything), whereas no amount of words can convey that danger to LLMs.

To put it simply, the LLM at times might not know the difference between code for this life-or-death machine and a sloppy website it created.

I just don't think it's worth it, especially in this context; even a single % of LLM code might not be worth it here.


> we won't know if LLMs might save a life

I had a friend who was in crisis while the rest of us were asleep. Talking with ChatGPT kept her alive. So we know the number is at least one. If you go to the Dr ChatGPT thread, you'll find multiple reports of people who figured out debilitating medical conditions via ChatGPT in conjunction with a licensed human doctor, so we can be sure the number's greater than zero. It doesn't make headlines the same way Adam's suicide does, and not just because OpenAI can't be the ones to say it.


Great for her, I hope she's doing okay now. (I do think we humans can take each other for granted)

If talking to ChatGPT helps anyone mentally, then sure, great. I can see why, but I am a bit concerned that if we remove a human from the loop, then we can probably get way too easily disillusioned as well, which is what is happening.

These are still black boxes, but in the context of traffic-light code (even partially LLM-written), it feels to me that the probability of it not saving a life significantly overwhelms the opposite.


ChatGPT psychosis also exists so it goes both ways, I just don't want the negative voices to drown out the positive ones (or vice versa).

As far as traffic lights go, this predates ChatGPT, but IBM's Watson, which is also rather a black box where you stuff data in and instructions come out, has been doing traffic light optimization for years. IBM's got some patents on it, even. Of course that's machine learning, but as they say, ML is just AI that works.


I've had good luck when giving the AI its own feedback loop. On software projects, it's letting the AI take screenshots and read log files, so it can iterate on errors without human input. On hardware projects, it's a combination of solenoids, relays, a Pi and a Pi Zero W, and a webcam. I'm not claiming that an AI could do the above-mentioned project, just that (some) hardware projects can also get humans out of the loop.

Don’t you understand? That’s why all these AI companies are praying for humanoid robots to /just work/ - so we can replace humans mentally and physically ASAP!

I'm sure those will help. But that doesn't solve the problem the parent stated. Those robots can't solve those real-world problems until they can reason, till they can hypothesize, till they can experiment, till they can abstract all on their own. The problem is you can't replace the humans (unilaterally) until you can create AGI. But that has problems of its own, as you now have to contend with having created a slave class of artificial life forms.

  > until they can reason, till they can hypothesize, till they can experiment, till they can abstract all on their own
at that point we will have to let them vote...

I completely agree - my comment was sarcastic and in jest.

My bad. Getting hard to tell these days lol

No worries - you’ve added useful context for those who may be misguided by these greedy corporations looking to replace us all. Maybe it helps them reconsider their point of view!

But you admit that fewer humans would be needed as “LLMs would have been good at subsets of that project”, so some impact already and these AI tools only get better.

If that is the only thing you took out of that conversation, then I don't really believe that job would've been suitable for you in the first place.

Now, I don't know which language they used for the project (could be Python, or C/C++, or Rust), but it's like saying "Python would have been good at subsets of that project", so some impact already, and these Python tools only get better.

Did Python remove the jobs? No. Each project has its own use case, and in some LLMs might be useful, in others not.

In their project, LLMs might be useful for some parts, but the majority of the work was doing completely new things with a human in the feedback loop.

You are also forgetting the trust factor. Yes, let's have your traffic-light system be written by an LLM, surely. Oops, the traffic lights glitched and all the Waymos (another AI) went berserk, and oops, accidents and crashes happened, which might cost millions.

Personally, I wouldn't trust even a subset of LLM code, and I would much rather have my country/state/city pay real developers who can be held accountable, with good quality-control checks for such critical points; no LLMs in this context should be a must.

For context, suppose LLM use impacts even 1 life every year. The value of 1 person in the US is $7.5-13 million.

Over a period of 10 years, this really, really small LLM glitch ends up costing $75 million.

Yup, go ahead, save a few thousand dollars right now by not paying people in the first place and using an LLM instead, only to then lose $75 million (in the most conservative scenario).


I doubt you have a clue regarding my suitability for any project, so I’ll ignore the passive-aggressive ad hominem.

Anyway, it seems you are walking back your statement regarding LLM being useful for parts of your project, or ignoring the impact on personnel count. Not sure what you were trying to say then.


I went back over it because, of course, I could've just pointed out one part of the picture, but I still wanted to give the whole picture.

My conclusion is rather that this is a very high-stakes project (emotionally, mentally, and economically), that AI models are still black boxes with a chance of being much more error-prone (at least in this context), that the chance of one missing something and causing the $75 million loss and the deaths of many is more likely, and that in such a high-stakes project LLMs shouldn't be used; having more engineers on the team might be worth it.

> I doubt you have a clue regarding my suitability for any project, so I’ll ignore the passive-aggressive ad hominem.

Aside from the snark directed at me: I agree. And this is why you don't see me on such a high-stakes project, and neither should you see an LLM, at any cost, in this context. These should be reserved for people of the right caliber, who both have experience in the industry and are made of flesh.


Human beings are basically black boxes as far as the human brain is concerned. We don't blindly trust the code coming out of those black boxes; it seems illogical to treat code from LLMs any differently.

Yes, but at the end of the day I can't understand this take, because what are we worried about? (At least in this context,) a few hundred thousand dollars for a human to do the job rather than an LLM?

I don't understand how it's logical to deploy an LLM in any case; the problem is that the chances of LLM code letting something slip are much higher than for the code of people who can talk to each other, decide in meetings exactly how they wish to write it, and have decades of experience to back it up.

If I were a state, there are so, so many ways of getting money rather easily (hundreds of thousands of dollars might seem like a lot, but they aren't for a state), and besides, you are forgetting that they went in manually and talked to real people.


I was also curious about this quote, and it sounded to me too like Donne (or Pascal or Robert Boyle, a bit).

But Gemini 3.0 knew what it was, and it is from Omar Khayyám like the sibling commenter said, but from the little-known E. H. Whinfield translation (1883) rather than the more famous FitzGerald one:

—-

221. (395.)

Such as I am, Thy power created me,

Thy care hath kept me for a century!

Through all these years I make experiment,

If my sins or Thy mercy greater be.

——-

Link to the actual page in Google books:

https://books.google.co.uk/books?id=NN_TAAAAMAAJ&q=Experimen...


Question for the well-informed people reading this thread: do SoTA models like Opus, Gemini, and friends actually still need output schema enforcement, or has all the RLVR training they do on generating code and JSON etc. made schema errors vanishingly unlikely? Because as a user of those models, they almost never make syntax mistakes in generating JSON and code; perhaps they still do output schema enforcement for "internal" things like tool call schemas, though? I would just be surprised if it was actually catching that many errors. Maybe once in a while; LLMs are probabilistic, after all.

(I get why you need structured generation for smaller LLMs, that makes sense.)


Schemas can get pretty complex (and LLMs might not be the best at counting). Also schemas are sometimes the first way to guard against the stochasticity of LLMs.

With that said, the model is pretty good at it.


Yes. The most common failure mode for SoTA models is to put ```json\n first, but they do still fail just often enough that it's worth calling the API with a JSON response schema.

1000%. I was just doing some spot-checking of GPT-5.2 to evaluate a model migration, and the tool I used didn't have the setup to use schema-constrained inference.

The model is like: "Here is what I came up with... ```{json}``` and this is why I am proud of it!"
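
In case it's useful to anyone hitting the same thing: even without schema-constrained decoding, a tolerant parse plus a validate-and-retry loop catches most of these failures. A rough sketch using the jsonschema package; call_model here is a stand-in for whatever client call you actually make, not a real API:

  import json
  import re

  from jsonschema import ValidationError, validate  # pip install jsonschema

  def parse_llm_json(text: str, schema: dict) -> dict:
      """Strip markdown fences, parse, and validate against a JSON schema."""
      # Models often wrap output in ```json ... ``` even when told not to.
      cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", text.strip())
      data = json.loads(cleaned)                 # syntax errors raise JSONDecodeError
      validate(instance=data, schema=schema)     # shape errors raise ValidationError
      return data

  def get_structured(prompt: str, schema: dict, call_model, retries: int = 2) -> dict:
      # call_model(prompt) -> str is a placeholder for your own client wrapper.
      for _ in range(retries + 1):
          raw = call_model(prompt)
          try:
              return parse_llm_json(raw, schema)
          except (json.JSONDecodeError, ValidationError) as err:
              prompt += f"\n\nYour last reply was invalid ({err}). Reply with only valid JSON matching the schema."
      raise RuntimeError("no valid structured output after retries")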


This is going to be task-dependent, as well as limited by your (the implementer's) ability and comfort with structuring the task in solid multi-shot prompts that cover a large distribution of expected inputs, which will also help increase the model's ability to successfully handle less common or edge-case inputs -- the ones that would most typically require human-level reasoning. It can be useful to supplement this with a "tool" use for RAG lookup against a more extensive store of examples, or any time the full reference material isn't practical to dump into context. This requires thoughtful chunking.

It also requires testing. Don't think of it as a magic machine that should be able to do anything; think of it like a new employee smart enough, and with enough background knowledge, to do the task if given proper job documentation. Test whether few-shot or many-shot prompting works better: there's growing information about use cases where one or the other confers an advantage, but so much of this is task-dependent.

Consider your tolerance for errors and plan some escalation method: hallucinations occur in part because models "have to" give an answer. Make sure that any critical cases where an error would be problematic have some way for the model to bail out with "I don't know" for human review. The first layer of escalation doesn't even have to be a human; it could be a separate model, e.g. Opus instead of Sonnet, or the same model but with a different setup prompt explicitly designed for handling certain cases without cluttering up the context of the first one. Splitting things in this way, if there's a logical break point, is also a great way to save on token cost: if you can send only 10k tokens in a system prompt instead of 50k and just choose which of 5 10k-token prompts to use for different cases, then you save 80% of upstream token $$.
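
To make that concrete, here is a toy sketch of the routing-plus-escalation idea; the prompts, model names, and call_model signature are made up for illustration, not anyone's actual setup:

  # Hypothetical sketch: route to a small case-specific prompt, escalate on "I don't know".
  SMALL_PROMPTS = {
      "billing": "You answer billing questions. Say 'I don't know' if unsure.",
      "refunds": "You handle refund requests. Say 'I don't know' if unsure.",
  }
  ESCALATION_PROMPT = "You review hard cases. Say 'I don't know' if unsure."

  def answer(case: str, task: str, call_model) -> str:
      # call_model(model, system, user) -> str is a placeholder for your client.
      reply = call_model("small-model", SMALL_PROMPTS[case], task)
      if "i don't know" in reply.lower():
          # First escalation: a bigger model (or the same one with a prompt written for hard cases).
          reply = call_model("large-model", ESCALATION_PROMPT, task)
      if "i don't know" in reply.lower():
          return f"ESCALATE_TO_HUMAN: {task}"   # final fallback: a person reviews it
      return reply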

Consider running the model deterministically: 0 temp, same seed. It makes any errors you encounter easier to trace and debug.

Something to consider with respect to cost, though: many tasks that a SoTA model can do with very little or no scaffolding can be done with these cheaper models and may not take much more scaffolding. If a SoTA model is giving reliable responses with zero-shot prompting, there's a decent chance you can save a ton of money with a flash model if you provide it one- or few-shot prompts. Open-weight models even more so.

My anecdotal experience is that open models like Google's Gemma and OpenAI's gpt-oss have behaviors more similar to their paid counterparts than other open models do, making them reasonable candidates to try if you're getting good results from the paid models but they're perhaps overkill for the task.

