
The current AI hype wave has really hit a nerd soft spot - that we're steps away from AGI. Surely if a computer can make plausible-looking but incorrect sentences we're days away from those sentences being factually accurate! The winter after this is gonna be harsh.



Using Claude 3.5 Sonnet in Cursor Composer already shows huge benefits for coding. I'm more productive than ever before. The models are still getting better and better. I'm not saying AGI is right around the corner or that we will reach it, but the benefits are undeniable. o1 added test-time compute. No need to be snarky.


It’s not snark, our industry is run on fear. If there is the tiniest flicker of potential, we will spend piles of money out of fear of being left behind. As you age, it becomes harder to deny. Ten years ago, I was starting to believe that my kids would never learn to drive or possibly never buy a car; here we are ten years later and not that much has changed. I know you can take a robotaxi in some cities, but nearly all interstate trucking still has someone driving.

Coding AI assistants have done some impressive things. I’ve been amazed at how they sniffed out some repetitive tasks I was hacking on and I just tab-completed pages of code that were pretty much correct. There is use. I pay for the feature. I don’t know if it’s worth 35% of the world’s energy consumption and all new fabrication resources over the next handful of years being dedicated to ‘AI chips.’ We aren’t looking for a better 2.0, we are expecting an exponentially better “2.0”, and those are very rare.


That doesn't mean this is a bad investment for VCs. GPT is being directly integrated into iOS and is a top app on both app stores. We've also barely scratched the surface of potential niche applications that go beyond a generalist chatbot interface. API use will likely continue to explode as the mountain of startups building on it come online. Voice stuff will probably kill off Alexa/Google Home.

I don't think the bulk of this VC money is predicated on AGI being around the corner.

But the general trend-hopping nature of big VC money is real. Still, VCs have managed to keep making a profit despite this, otherwise the industry would have died off or shrunk during the other ten years HN critiqued this behaviour, so on the whole they must be doing something right.


VCs mostly make money by selling a narrative about investing in the next big thing and then collecting management fees, not by beating the market. If the public sours on AI we need a new hype to replace it and keep tech money flowing at the same rate. A lot of funding seems to follow fads and be disproportionate to value generated (I remember when there were a bazillion people building social networks because that was hot).


There is a tremendous opportunity in bridging the gap between "can be automated" and "isn't automated due to technical/cost/time limitations." GPTs are perfect for this.

There are so many things out there that can be automated but currently aren't. Other industries are still extremely manual and process-driven. Many here tend to underestimate this.

Some programmers here will argue it's error-prone or creates technical debt, but most people don't care: if it works, it works, and one can worry about it breaking in 5 years' time after it's saved you considerable time and money.


Don’t get me wrong, there is some very cool and useful stuff there. I think it’s a bit disingenuous to even talk about AGI at this point though, and when you look at the power requirements and the need for $7 trillion in investment just to build chips, I really don’t know. Underdeliver and AI loses some of the hype again, and like the parent post said, it will be a long winter. Are GPTs worth more than Apple, MS, Alphabet and Amazon all together?


> ten years later and not that much has changed, I know you can take a robotaxi in some cities but

Uhh, that's a pretty big change.


There's no accounting for taste, but keep in mind that all of these services are currently losing money, so how much would you actually be willing to pay for the service you're currently getting in order to let it break even? There was a report that Microsoft is losing $20 for every $10 spent on Copilot subscriptions, with heavy users costing them as much as $80 per month. Assuming you're one of those heavy users, would you pay >$80 a month for it?

Then there's chain-of-thought being positioned as the next big step forwards, which works by throwing more inferencing at the problem, so that cost can't be amortized over time like training can...


I would pay hundreds of dollars per month for the combination of Cursor and Claude - I could not get my head around it when my beginner-level colleague said "I just coded this whole thing using Cursor".

It was an entire web app, with search filters, tree based drag and drop GUIs, the backend api server, database migrations, auth and everything else.

Not once did he need to ask me a question. When I asked him "how long did this take", I expected him to say "a few weeks" (it would have taken me - a far more experienced engineer - 2 months minimum).

His answer was "a few days".

I'm not saying "AGI is close", but I've seen tangible evidence (only in the last 2 months) that my 20-year software engineering career is about to change, and massively for the upside. The way I see it, everyone is going to be so much more productive using these tools.


Current LLMs fail if what you're coding is not the most common of tasks. And a simple web app is about as basic as it gets.

I've tried using LLMs for some libraries I'm working on, and they failed miserably. Trying to make an LLM implement a trait with a generic type in Rust is a game of luck with very poor chances.

I'm sure LLMs can massively speed up tasks like front-end JavaScript development, simple Python scripts, or writing SQL queries (which have been written a million times before).

But for anything even mildly complex, LLMs are still not suited.


I don't think complexity is the right metric.

Front-end JS can easily become very complex, too.

I think a better metric is how close you are to reinventing a wheel for the thousandth time. Because that is what LLMs are good at: helping you write code that has already been written, in nearly the same way, thousands of times.

But that is something you find in backend code, too.

It is also something where we as an industry have kinda failed to produce good tooling. And worse, if you are in the industry it's kinda hard to spot without very carefully taking a hundred (mental) steps back from what you are used to and what biases you might have.


LLM Code Assistants have succeeded at facilitating reusable code. The grail of OOP and many other paradigms.

We should not have an entire industry of 10,000,000 devs reinventing the JS/React/Spring/FastCGI wheel. I'm sure those humans can contribute to society and progress in much better ways.


> LLM Code Assistants have succeeded at facilitating reusable code.

I'd have said the opposite. I think LLMs facilitate disposable code. It might use the same paradigms and patterns, but my bet is that most LLM written code is written specifically for the app under development. Are there LLM written libraries that are eating the world?


I believe you're both saying the same thing. LLMs write "re-usable code" at the meta level.

The code itself is not clean and reusable across implementations, but you don't even need that clean packaged library. You just have an LLM regenerate the same code for every project you need it in.

The LLM itself, combined with your prompts, is effectively the reusable code.

Now, this generates a lot of slop, so we also need better AI tools to help humans interpret the code, and better tools to autotest the code to make sure it's working.

I've definitely replaced instances where I'd reach for a utility library, instead just generating the code with AI.

I think we also have an opportunity to merge the old and the new. We can have AI that can find and integrate existing packages, or it could generate code, and after it's tested enough, help extract and package it up as a battle tested library.


Agreed. But this terrifies me. The goal of reusable code (to my mind) is that with everybody building from the same foundations we can enable more functional and secure software. Library users contributing back (even just bug reports) is the whole point! With LLMs creating everything from scratch, I think we're setting ourselves on a path towards less secure and less maintainable software.


I (20+ years experience programmer) find it leads to a much higher quality output as I can now afford to do all the mundane, time-consuming housekeeping (refactors, more tests, making things testable).

E.g. let's say I'm working on a production thing and features/bugfixes accumulate and some file in the codebase starts to resemble spaghetti. The LLM can help me unfuck that way faster and get to a state of very clean code, across many files at once.


What LLM do you use? I've not gotten a lot of use out of Copilot, except for filling in generic algorithms or setting up boilerplate. Sometimes I use it for documentation but it often overlooks important details, or provides a description so generic as to be pointless. I've heard about Cursor but haven't tried it yet.


Cursor is much better than Copilot. Also, change it to use Claude, and then use the Inspector with ctrl-I


This is the thing, it works both ways: it's really good at interpreting existing codebases too.

Could potentially mean just a change in time allocation/priority. As it's easier and faster to locate and potentially resolve issues later, it is less important for code to be consistent and perfectly documented.

Not foolproof, and who knows how that could evolve, but just an alternative view. One of the big names in the industry said we'll have AGI when it speaks its own language. :P


I had similar experiences:

1. Asked ChatGPT to write a simple echo server in C but with this twist: use io_uring rather than the classic sendmsg/recvmsg. The code it spat out wouldn't compile, let alone work. It was wrong on many points and was clearly pieces of who-knows-what cut and pasted together. After having banged my head on the docs for a while I could clearly determine which sources the io_uring code segments were coming from. The code barely made any sense and was completely incorrect both syntactically and semantically.

2. Asked another LLM to write an AWS IAM policy according to some specifications. It hallucinated and used predicates that do not exist at all. I mean, I could have done it myself if I just could have made predicates up.

> But for anything even mildly complex, LLMs are still not suited.

Agreed, and I'm not sure we are anywhere close to them being suited.


Yep. LLMs don’t really reason about code, which turns out to not be a problem for a lot of programming nowadays. I think devs don’t even realize that the substrate they build on requires this sort of reasoning.

This is probably why there’s such a divide when you try to talk about software dev online. One camp believes that it boils down to duct taping as many ready made components together all in pursuit of impact and business value. Another wants to really understand all the moving parts to ensure it doesn’t fall apart.


My test is to take a sized chunk of memory containing a TrueType/OpenType font and output a map of glyphs to curves. The bot is nowhere close.


Roughly LLMs are great at things that involve a series of (near) 1-1 correspondences like “translate 同时采访了一些参与其中的活跃用户 to English” or “How do I move something up 5px in CSS without changing the rest of the layout?” but if the relationship of several parts is complex (those Rust traits or anything involving a fight with the borrow checker) or things have to go in some particular order it hasn’t seen (say US states in order of percent water area) they struggle.

SQL is a good target language because the translation from ideas (or written description) is more or less linear, the SQL engine uses entirely different techniques to turn that query into a set of relational operators which can be rewritten for efficiency and compiled or interpreted. The LLM and the SQL engine make a good team.
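As a toy illustration of that split (a hypothetical table with illustrative numbers; the SQL string stands in for what an LLM might hand back):

    import sqlite3

    # The LLM does the near-linear translation from a request into SQL text;
    # the SQL engine does the hard part: planning, rewriting and executing it.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE states (name TEXT, land_km2 REAL, water_km2 REAL)")
    conn.executemany("INSERT INTO states VALUES (?, ?, ?)",
                     [("Michigan", 146435, 103885), ("Arizona", 294207, 1026)])

    # What an LLM might return for "list states by percent water area, highest first":
    query = """
        SELECT name, 100.0 * water_km2 / (land_km2 + water_km2) AS pct_water
        FROM states
        ORDER BY pct_water DESC
    """
    for name, pct_water in conn.execute(query):
        print(f"{name}: {pct_water:.1f}% water")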


I’d bet that about 90% of software engineers today are just rewriting variations of what’s already been done. Most problems can be reduced to similar patterns. Of course, the quality of a model depends on its training data: if a library is new or the language isn’t widely used, the output may suffer. However, this is a challenge people are actively working on, and I believe it’s solvable.

LLMs are definitely suited for tasks of varying complexity, but like any tool, their effectiveness depends on knowing when and how to use them.


> Current LLMs fail if what you're coding is not the most common of tasks

Succeeding on the most common tasks (which isn't exactly what you said) is identical to "they're useful".


And I would go further… these “common tasks” cover 80% of the work in even the most demanding engineering or research positions.


That’s absolutely not my experience. I struggle to find tasks in my day to day work where LLMs are saving me time. One reason is that the systems and domains I work with are hardly represented at all on the internet.


I have the same experience. I'm in gamedev and we've been encouraged to test out LLM tooling. Most of us at/above the senior level report the same experience: it sucks, it doesn't grasp the broader context of the systems that these problems exist inside of, even when you prompt it as best as you can, and it makes a lot of wild-assed, incorrect assumptions about what it doesn't know, which are often hard to detect.

But it's also utterly failed to handle mundane tasks, like porting legacy code from one language and ecosystem to another, which is frankly surprising to me because I'd have assumed it would be perfectly suited for that task.


In my experience, AI for coding is like having a rather stupid, very junior dev at your beck and call, but one who can produce results instantly. The output is just often very mediocre, and getting it fixed often takes longer than writing it on your own.


My experience is that it varies a lot by model, dev, and field — I've seen juniors (and indeed people with a decade of experience) keeping thousands of lines of unused code around for reference, or not understanding how optionals work, or leaving the FAQ full of placeholder values in English when the app is only on the German market, and so on. Good LLMs don't make those mistakes.

But the worst LLMs? One of my personal tests is "write Tetris as a web app", and the worst local LLM I've tried, started bad and then half way through switched to "write a toy ML project in python".


I think this illustrates the biggest failure mode when people start using LLMs: asking it to do too much in one step.

It’s a very useful tool, not magic.


> Not once did he need to ask me a question. When I asked him "how long did this take" and expected him to say "a few weeks" (it would have taken me - a far more experienced engineer - 2 months minimum).

> Current LLMs fail if what you're coding is not the most common of tasks. And a simple web app is about as basic as it gets.

These two complexity estimates don’t seem to line up.


That's still valuable though: for problem validation. It lowers the table stakes for building any sort of useful software, all of which starts simple.

Personally, I just use the hell out of Django for that. And since tools like that are already ridiculously productive, I don't see much upside from coding assistants. But by and large, so many of our tools are so surprisingly _bad_ at this that I expect the LLM hype to have a lasting impact here. Even _if_ the solutions aren't actually LLMs, but just better tools, since we've recalibrated how long something _should_ take.


The problem Django solves is popular, which is why we have so many great frameworks that shorten the implementation time (I use Laravel for that). Just like game engines or GUI libraries, assuming you understand the core concepts of the domain. And if the tool is very popular and the LLMs have loads of data to train on, there may be a small productivity uptick from finding common patterns (small because if the patterns are common enough, you ought to find a library/plugin for them).

Bad tools often fall into three categories: too simple, too complex, or unsuitable. For the last two, you'd better switch, but there's the human element of sunk costs.


I work in video games. I've tried several AI assistants for C++ coding and they are all borderline useless for anything beyond writing some simple for loops. Not enough training data to be useful, I bet, but I guess that's where the disparity is - web apps, Python... those have tonnes of publicly available code it can train on. Writing code that manages GPU calls on a PS5? Yeah, good luck with that.


Presumably Sony is sitting on decades worth of code for each of the PlayStation architectures. How long before they're training their own models and making those available to their studios' developers?


I don't think Sony has this code, more likely just the finished builds. And all the major studios have game engines for their core product (or they license one). The most difficult parts are writing new game mechanics or supporting a new platform.


So you are basically saying "it failed on some of my Rust tasks, and those other languages aren't even real programming languages, so it's useless".

I've used LLMs to generate quite a lot of Rust code. It can definitely run into issues sometimes. But it's not really about complexity determining whether it will succeed or not. It's the stability of features or lack thereof and the number of examples in the training dataset.


I realize my comment seems dismissive in a manner I didn't intend. I'm sorry for that, I didn't mean to belittle these programming tasks.

What I meant by complexity is not "a task that's difficult for a human to solve" but rather "a task for which the output can't be 90% copied from the training data".

Since frontend development, small scripts and SQL queries tend to be very repetitive, LLMs are useful in these environments.

As other comments in this thread suggested: If you're reinventing the wheel (but this time the wheel is yellow instead of blue), the LLM can help you get there much faster.

But if you're working with something which hasn't been done many times before, LLMs start struggling. A lot.

This doesn't mean LLMs aren't useful. (And I never suggested that.) The most common tasks are, by definition, the most common tasks. Therefore LLMs can help in many areas and are helpful to a lot of people.

But LLMs are very specialized in that regard, and once you work on a task that doesn't fit this specialization, their usefulness drops, down to being useless.


Which model exactly? You understand that every few months we are getting dramatically better models? Did you try the one that came out within the last week or so (o1-preview)?


I did use o1-preview.


I can't understand how anyone can use these tools (copilot especially) to make entire projects from scratch and expand them later. They just lead you down the wrong path 90% of the time.

Personally I much prefer ChatGPT. I give it specific small problems to resolve and some context, at most 100 lines of code. If it gets more than that, the quality goes to shit. In fact Copilot feels like ChatGPT that was given too much context.


I hear it all the time on HN that people are producing entire apps with LLMs, but I just don't believe it.

All of my experiences with LLMs have been that for anything that isn't a braindead-simple for loop is just unworkable garbage that takes more effort to fix than if you just wrote it from scratch to begin with. And then you're immediately met with "You're using it wrong!", "You're using the wrong model!", "You're prompting it wrong!" and my favorite, "Well, it boosts my productivity a ton!".

I sat down with the "AI Guru" as he calls himself at work to see how he works with it and... He doesn't. He'll ask it something, write an insanely comprehensive prompt, and it spits out... Generic trash that looks the same as the output I ask of it when I provide it 2 sentences total, and it doesn't even work properly. But he still stands by it, even though I'm actively watching him just dump everything he just wrote up for the AI and start implementing things himself. I don't know what to call this phenomenon, but it's shocking to me.

Even something that should be in its wheelhouse like producing simple test cases, it often just isn't able to do it to a satisfactory level. I've tried every one of these shitty things available in the market because my employer pays for it (I would never in my life spend money on this crap), and it just never works. I feel like I'm going crazy reading all the hype, but I'm slowly starting to suspect that most of it is just covert shilling by vested persons.


The other day I decided to write a script (that I needed for a project, but ancillary, not core code) entirely with CoPilot. It wasn't particularly long (maybe 100 lines of python). It worked. But I had to iterate so much with the LLM, repeating instructions, fixing stuff that didn't run, that it took a fair bit longer than if I had just written it myself. And this was a fairly vanilla data science type of script.


Most of the time the entire apps are just a timer app or something simple. Never a complex app with tons of logic in it. And if you're having to write paragraphs of text to get something complex, you might as well just write that in a programming language; I mean, isn't that what high-level programming languages were built for? (heh). Also, you're not the only one who's had the thought that someone is vested in some way to overhype this.


You can write the high level structure yourself and let it complete the boilerplate code within the functions, where it's less critical/complicated. Can save you time.


Oh for sure. I use it as smart(ish) autocomplete to avoid typing everything out/looking things up in docs every time, but the thought of prompt engineering to make an app is just bizarre to me. It almost feels like it has more friction than actually writing the damn thing yourself.


You aren’t the only one that feels this way.

After 20 years of being held accountable for the quality of my code in production, I cannot help but feel a bit gaslit that decision-makers are so elated with these tools despite their flaws that they threaten to take away jobs.


Here is another example [0]. 95% of the code was taken as-is from the examples in the documentation. If you still need to read the code after it was generated, you may as well have read the documentation first.

When they say treat it like an intern, I'm so confused. An intern is there to grow and hopefully replace you as you get promoted or leave. The tasks you assign to him are purposely kept simple for him to learn the craft. The monotonous ones should be done by the computer.

[0]: https://gist.github.com/simonw/97e29b86540fcc627da4984daf5b7...


I think to the extent this works for some people it’s as a way to trick their brains into “fixing” something broken rather than having to start from scratch. And for some devs, that really is a more productive mode, so maybe it works in the end.

And that’s fine if the dev realizes what’s going on but when they attribute their own quirks to AI magic, that’s a problem.


As a non-programmer at a non-programming company:

I use it to write test systems for physical products. We used to contract the work out or just pay someone to manually do the tests. So far it has worked exceptionally well for this.

I think the core issue of the "do LLMs actually suck" is people place different (and often moving) goalposts for whether or not it sucks.


I just wrote a fairly sizable app with an LLM. This is the first complete app I've written using it. I did write some of the core logic myself leaving the standard crud functions and UI for the LLM.

I did it in little pieces and started over with fresh context each time the LLM started to get off in the weeds. I'm very happy with the result. The code is clean and well commented, the tests are comprehensive and the app looks nice and performs well.

I could have done all this manually too but it would have taken longer and I probably would have skimped out on some tests and gave up and hacked a few things in out of expedience.

Did the LLM get things wrong on occasion? Yes. Make up api methods that don't exist? Yes. Skip over obvious standard straightforward and simple solutions in favor of some rat's nest convoluted way to achieve the same goal? Yes.

But that is why I'm here. It's a different style of programming (and one that I don't enjoy nearly as much as pounding the keyboard). It involves more high-level thinking and code review, and less worrying about implementation details.

It might not work as well in domains the training data doesn't cover. And certainly, if someone expects to come in with no knowledge and just paste code without understanding, reading and pushing back, they will have a non-working mess pretty shortly. But my opinion is that overall these tools dramatically increase productivity in some domains.


> but I'm slowly starting to suspect that most of it is just covert shilling by vested persons.

It's almost as if the horde of former kleptocurrency bros have found a promising new seam of fool's gold to mine


I have the same observation as well. The hype is getting generated mostly by people who're selling AI courses or AI-related products.

It works well as a smart documentation search where you can ask follow-up questions, or when you would know what the output should look like if you saw it but can't type it out from memory.

For code assistants (aka Copilot / Cursor), it works if you don't care about the code at all and are ok with any solution as long as it barely works (I'm ok with such code for my emacs configuration).


LLMs are great at going from 0 to 2, but you wanted to go to 1, so you remove and modify lots of things, get back to 1, and then go to 2.

Lots of people are terrible at going from 0 to 1 in any project. Me included. LLMs helped me a lot solving this issue. It is so much easier to iterate over something.


I think it’s more that if you want to believe it’s magic future tech then it looks like it.

If you aren’t on board then it looks impressive but flawed and not even close to living up to the hype.


Just for fun, give it a function you wrote, and ask it if it can make any improvements. I reckon I accept about a third of what it suggests.


Not a bad use, though I argue being able to do that critique yourself has a compounding effect over time that is worthwhile.


Well... I have to critique the critique, else how do I know which two thirds to reject?

In theory I'm learning from the LLM during this process (much like a real code review). In practice, it's very rare that it teaches me something, it's just more careful than I am. I don't think I'm ever going to be less slap-dash, unfortunately, so it's a useful adjunct for me.


> 20 year software engineering career is about to change

I have also been developing for 20+ years.

And have heard the exact same thing about IDEs, Search Engines, Stack Overflow, Github etc.

But in my experience, at least, how fast I code has never been the limiting factor in my projects' success. So LLMs are nice and all, but they aren't going to change the industry all that much.


There will be a whole industry of people who fix what AI has created. I don't know if it will be faster to build the wrong thing and pay to have it fixed or to build the right thing from the get go, but after having seen some shit, like you, I have a little idea.


That industry will only form if LLMs don't improve from here. But the evidence, both theoretical and empirical, is quite the opposite. In fact one of the core reasons transformers gained so much traction is because they scale so well.

If nothing really changes in 3-5 years, then I'd call it a flop. But the writing is on the wall that "scale = smarts", and what we have today still looks like a foundational stage for LLM's.


> In fact one of the core reasons transformers gained so much traction is because they scale so well.

> If nothing really changes in 3-5 years, then I'd call it a flop

Transformers have been used for, what, 6 years now? Will you, in 6 years, say "I'll decide if they don't change the world in another 6 years"?


If the difference between now and 6 years in the future is the same as the difference between now and 6 years ago, a lot of people here will be eating their hats.


Why? What exactly have we got for the (how many hundred) billions of dollars poured into GPUs running transformers over the past 6 years?


You don't believe that models 100x better than today (OG transformers were pretty bad) would be fruitful for society?


Self-driving cars have been 3-5 years away for what, a decade now?


I never paid much attention to Elon.


Correction: a whole industry of AI that will fix what AI has created.


Will AI also be on call when things break in production?


no, the original comment was correct


yes, but does your colleague even fully understand what was generated? Does he have a good mental map of the organization of the project?

I have a good mental map of the projects I work on because I wrote them myself. When new business problems emerge, I can picture how to solve them using the different components of those applications. If I hadn't actually written the application myself, that expertise would not exist.

Your colleague may have a working application, but I seriously doubt he understands it in the way that is usually needed for maintaining it long term. I am not trying to be pessimistic, but I _really_ worry about these tools crippling an entire generation of programmers.


AI assistants are also quite good at helping you create a high level map of a codebase. They are able to traverse the whole project structure and functionality and explain to you how things are organized and what responsibilities are. I just went back to an old project (didn't remember much about it) and used Cursor to make a small bug fix and it helped me get it done in no time. I used it to identify where the issue might be based on logs and then elaborate on potential causes before then suggesting a solution and implementing it. It's the ultimate pair programmer setup.


> I just went back to an old project (didn't remember much about it) and used Cursor to make a small bug fix and it helped me get it done in no time.

That sounds quite useful. Does Cursor feed your entire project code (traversing all folders and files) into the context?


Do you ever verify those explanations, though? Because I occasionally try having an LLM summarise an article or document I just read, and it's almost always wrong. I have my doubts that they would fare much better in "understanding" an entire codebase.

My constant suspicion is that most results people are so impressed with were just never validated.


I wouldn’t even be so sure the application “works”. All we heard is that it has pretty UI and an API and a database, but does it do something useful and does it do that thing correctly? I wouldn’t be surprised if it totally fails to save data in a restorable way, or to be consistent in its behavior. It certainly doesn’t integrate meaningfully with any existing systems, and as you say, no human has any expertise in how it works, how to maintain it, troubleshoot it, or update it. Worse, the LLM that created it also doesn’t have any of that expertise.


> I _really_ worry about these tools crippling an entire generation of programmers.

Isn’t that the point? Degrade the user long enough that the competing user is on par with or below the competence of the tool, so that you now have an indispensable product and justification for its cost and existence.

P.S. This is what I understood from a lot of AI saints in the news who are too busy parroting productivity gains without citing other consequences, such as loss of understanding of the task or of the expertise to fact-check.


Me too, but a more optimistic view is that this is just a nascent form of higher-level programming languages. Gray-beards may bemoan that we "young" developers (born after 1970) can't write machine code from memory, but it's hardly a practical issue anymore. Analogously, I imagine future software dev consisting mostly of writing specs in natural language.


No one can write machine code from memory other than by writing machine code for years and just memorizing it. Just like you can't start writing Python without prior knowledge.

> Analogously, I imagine future software dev to consist mostly of writing specs in natural language.

https://www.commitstrip.com/en/2016/08/25/a-very-comprehensi...?


> Me too, but a more optimistic view is that this is just a nascent form of higher-level programming languages.

I like this take. I feel like a significant portion of building out a web app (to give an example) is boilerplate. One benefit of (e.g., younger) developers using AI to mock out web apps might be to figure out how to get past that boilerplate to something more concise and productive, which is not necessarily an easy thing to get right.

In other words, perhaps the new AI tools will facilitate an understanding of what can safely be generalized from 30 years of actual code.


Web apps require a ton of boilerplate. Almost every successful web framework uses at least one type of metaprogramming, many have more than one (reflection + codegen).

I’d argue web frameworks don’t even help a lot in this regard still. They pile on more concepts to the leaky abstractions of the web. They’re written by people that love the web, and this is a problem because they’re reluctant to hide any of the details just in case you need to get to them.

A coworker argued that webdev fundamentally opposes abstraction, which I think is correct. It certainly explains the mountains of code involved.


I admit that my own feelings about this are heavily biased, because I _truly_ care about coding as a craft; not just a means to an end. For me, the inclusion of LLMs or AI into the process robs it of so much creativity and essence. No one would argue that a craftsman produces furniture more quickly than Wayfair, but all people would agree that the final product would be better.

It does seem inevitable that some large change will happen to our profession in the years to come. I find it challenging to predict exactly how things will play out.


I suppose the craft/art view of coding will follow the path of chess - machines gradually overtake humans but it's still an artform to be good at, in some sense.


I've coded Python scripts that let me take CSV data from Hornresp and convert it to 3D models I can import into SketchUp. I did two coding units at uni, so whilst I can read code... I can't write it from scratch to save my life. But I can debug and fix the scripts GPT gives me. I did the Hornresp script in about 40 minutes. It would have taken me weeks to learn what it produced.

I'm not a mathematician, hell, I did general maths at school. Currently I've been talking through scripting a method to mix DSD audio files natively without converting to traditional PCM. I'm about to use GPT to craft these scripts. There is no way I could have done this myself without years of learning. Now all I have to do is wait half a day so I can use my free GPT-4o credits to code it for me (I'm broke af so can't afford subs). The productivity gains are insane. I'd pay for this in a heartbeat if I could afford it.


I really believe that the front-end part can be mostly automated (the HTML/CSS at least); Copilot is close imho (Microsoft + GitHub, I used both). But really, they're useless for anything more complex without making too many calls, proposing bad data structures, or using bad/old code design.


The frontend part was already automated. We called it Dreamweaver and RAD tools.


Thank you, now I realize where I've had this feeling before!

Working with AI-generated code to add new features feels like working with Dreamweaver-generated code, which was also unpleasant. It's not written the same way a human would write it, isn't written with ease of modification in mind, etc.


Copilot is pretty bad compared to cursor with sonnet. I have used Copilot for quite a long time so I can tell.


I am curious, how complex was the app? I use Cursor too and am very satisfied with it. It seems it is very good at code that must have been written so many times before (think React components, Node.js REST API endpoints etc.), but it starts to fall off when moving into specific domains.

And for me that is the best-case scenario: it takes away the part where we have to code/solve already-solved problems again and again, so we can focus more on the other parts of software engineering beyond writing code.


Fairly standard greenfield projects seem to be the absolute best scenario for an LLM. It is impressive, but that's not what most professional software development work is, in my experience. Even once I know what specifically to code I spend much more time ensuring that code will be consistent and maintainable with the rest of the project than with just getting it to work. So far I haven't found LLMs to be all that good at that sort of work.


Did you take a look at the code generated? Was it well designed and amenable to extension / building on top of?

I've been impressed with the ability to generate "throw away" code for testing out an idea or rapidly prototyping something.


Considering the current state of the industry, and the prevailing corporate climate, are you sure your job is about to get easier, or are you about to experience cuts to both jobs and pay?


The problem is that it only works for basic stuff for which there is a lot of existing example code out there to work with.

In niche situations it's not helpful at all in writing code that works (or even close). It is helpful as a quick lookup for docs for libs or functions you don't use much, or for gotchas that you might otherwise search StackOverflow for answers to.

It's good for quick-and-dirty code that I need for one-off scripts, testing, and stuff like that which won't make it into production.


So what is his plan to fix all the bugs that Claude hallucinated into the code?


I'm confident you have not used Cursor Composer + Claude 3.5 Sonnet. I'd say the level of bugs is no higher than that of a typical engineer - maybe even lower.


There's no LLM for which that is true or we'd all be fired.


In my experience it is true, but only for relatively small pieces of a system at a time. LLMs have to be orchestrated by a knowledgeable human operator to build a complete system any larger than a small library.


In the long term, sure. Short term, when that happens, we're going to be running on Wile E. Coyote physics and keep going until we look down and notice the absence of ground.


If all you bring to the table is the ability to reimplement simple web apps to spec, then sooner or later you probably will be fired.


It's only as good as its training data.

Step outside of building basic web/CRUD apps and its accuracy drops off substantially.

Also almost every library it uses is old and insecure.


Yet most work seems to be CRUD-related, and most SaaS businesses starting up really just need those things.


That last point represents the biggest problem this technology will leave us with. Nobody's going to train LLMs on new libraries or frameworks when writing original code takes an order of magnitude longer than generating code for the 2023 stack.


With LLMs like Gemini, which have massive context windows, you can just drop the full documentation for anything into the context window. It dramatically improves output.
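For example, something like this (a minimal sketch assuming the google-generativeai Python SDK; the model name, file name and prompt are just placeholders):

    # Sketch: drop a library's full documentation into the context window,
    # then ask for code against it. Assumes the google-generativeai SDK;
    # the file name and prompt are placeholders.
    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")
    model = genai.GenerativeModel("gemini-1.5-pro")

    docs = open("library_docs.md").read()  # can be hundreds of pages
    response = model.generate_content(
        [docs, "Using only the API documented above, write a function that ..."]
    )
    print(response.text)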


I use Phind, which does searches to provide additional context.


I am confident you didn't understand my comment. I didn't say anything about "level of bugs".


Claude is actually surprisingly good at fixing bugs as well. Feed it a code snippet and either the error message or a brief description of the problem and it will in many cases generate new code that works.


Sounds like CRUD boilerplate. Sure, it's great to have AI build this out and it saves a ton of time, but I've yet to see any examples (online or otherwise) of people building complex business rules and feature sets using AI.

The sad part is beginners using the boilerplate code won't get any practice building apps and will completely fail at the complex parts of an app OR try to use AI to build it and it will be terrible code.


I hear these stories, and I have to wonder, how useful is the app really? Was it actually built to address a need or was it built to learn the coding tool? Is it secure, maintainable, accessible, deployable, and usable? Or is it just a tweaked demo? Plenty of demo apps have all those features, but would never serve as the basis for something real or meet actual customer needs.


Yeah, AI can give you a good base if it's something that's been done before (which admittedly, 99% of SE projects are), especially in the target language.

Yeah, if you want tic-tac-toe or snake, you can simply ask ChatGPT and it will spit out something reasonable.

But this is not much better than a search engine/framework to be honest.

Asking it to be "creative" or to tweak existing code however ...


Yes, the value of a single engineer can easily double. Even a junior's - and it's much easier for them to ask Claude for help than to ask the senior engineer on the team (low barrier to getting unblocked).


> There was a report that Microsoft is losing $20 for every $10 spent on Copilot subscriptions, with heavy users costing them as much as $80 per month. Assuming you're one of those heavy users, would you pay >$80 a month for it?

I'm probably one of those "heavy users", though I've only been using it for a month to see how well it does. Here's my review:

Large completions (10-15 lines): It will generally spit out near-working code for any codemonkey-level framework-user frontend code, but for anything more it'll be at best amusing and a waste of time.

Small completions (complete current line): Usually nails it and saves me a few keystrokes.

The downside is that it competes for my attention/screen space against good old auto-completion, which costs me productivity every time it fucks up. Having to go back and fix identifiers where it messed up the capitalization or had typos, in places where basic auto-complete wouldn't have failed, is also annoying.

I'd pay about $40 right now because at least it has some entertainment value, being technologically interesting.


I find tools where I am manually shepherding the context into an LLM to work much better than Copilot at current. If I think thru the problem enough to articulate it and give the model a clear explanation, and choose the surrounding pieces of context (the same stuff I would open up and look at as a dev) I can be pretty sure the code generated (even larger outputs) will work and do what I wanted, and be stylistically good. I am still adding a lot in this scenario, but it's heavier on the analysis and requirements side, and less on the code creation side.

If what I give it is too open-ended, doesn't have enough info, etc., I'll still get a low-quality output. Though I find I can steer it by asking it to ask clarifying questions. Asking it to build unit tests can also help a lot in bolstering quality; a few iterations of getting the unit tests created and passing can really push the quality up.


1) The costs will go down over time; much of the cost is NVIDIA's margin and training new models.

2) Absolutely. That's like one hour of an engineer's salary for a whole month.


> The costs will go down over time, much of the cost is the margin of NVIDIA and training new models

Isn't each new model bigger and heavier and thus requires more compute to train?


Yes, but 1) you only need to train the model once and the inference is way cheaper. Train one great model (i.e. Claude 3.5) and you can get much more than $80/month worth out of it. 2) the hardware is getting much better and prices will fall drastically once there is a bit of a saturation of the market or another company starts putting out hardware that can compete with NVIDIA


> Train one great model (i.e. Claude 3.5) and you can get much more than $80/month worth out of it

Until the competition outcompetes you with their new model and you have to train a new superior one, because you have no moat. Which happens what, around every month or two?

> the hardware is getting much better and prices will fall drastically once there is a bit of a saturation of the market or another company starts putting out hardware that can compete with NVIDIA

Where is the hardware that can compete with NVIDIA going to come from? And if they don't have competition, which they don't, why would they bring down prices?


> Until the competition outcompetes you with their new model and you have to train a new superior one, because you have no moat. Which happens what, around every month or two?

Eventually one of you runs out of money, but your customers keep getting better models until then; and if the loser in this race releases the weights on a suitable gratis license then your businesses can both lose.

But that still leaves your customers with access to a model that's much cheaper to run than it was to create.


The point is not that every lab will be profitable. There only needs to be one model in the end to increase our productivity massively, which is the point I'm making.

Huge margins lead to a lot of competition trying to catch up, which is what makes market economies so successful.


Gemini models are trained and run on Google's in house TPU's, which frankly are incredible compared to H100's. In fact Claude was trained on TPUs.

Google however does not sell these, you can only lease time on them via GCP.


Then those new models get distilled into smaller ones.

Raising the max intelligence of the models tends to raise the intelligence of all the models via distillation.


If it makes software developers 10% more productive there sure would be many companies who'd pay $80 a month per seat.


Maybe there are people out there working in coding sweatshops churning out boilerplate code 8 hours a day, 50 weeks a year - people whose job is 100% coding (not what I would call software engineers or developers - just coders). It's easy to imagine that for such people (but do they even exist?!) there could be large productivity gains.

However, for a more typical software engineer, where every project is different and you have full lifecycle responsibility from design through coding, occasional production support, future enhancements, refactorings, updates for 3rd party library/OS changes, etc., how much of your time is actually spent purely coding (non-stop typing)?! Probably closer to 10-25%, and certainly nowhere near 100%. The potential overall time saving from a tool that saves, let's say, 10-25% of your code typing is going to be 1-5%, which is probably far less than gets wasted in meetings, chatting with your work buddies, or watching bullshit corporate training videos. IOW the savings are really just inconsequential noise.

In many companies the work load is cyclic from one major project to the next, with intense periods of development interspersed with quieter periods in-between. Your productivity here certainly isn't limited by how fast you can type.


I'm skeptical about the whole ordeal, but at $80/mo it would still be worth it unless you sit somewhere at the very bottom of the outsourcing well.


A 1% time saving for a $100k/yr position is still worth $83/month. And accounting for overhead, someone who costs the company $100k only gets a $60k salary.
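That $83 figure, spelled out (a trivial sketch, assuming the $100k is the fully loaded yearly cost):

    # Value of a 1% time saving on a position that costs the company $100k/yr.
    annual_cost = 100_000
    monthly_cost = annual_cost / 12     # ~ $8,333/month
    saving = 0.01 * monthly_cost        # a straight 1% saving
    print(f"${saving:.0f}/month")       # ~ $83/month, already above an $80 seat price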

If you pay Silicon Valley salaries this seems like a no-brainer. There are bigger time wasters elsewhere, but this is an easy win with minimal resistance or required culture change


Yeah, but companies need to see the savings on the bottom line, in real dollars, before they are going to be spending $1000/seat for this stuff. A theoretical, or actual, 1-5% of time saved typing is most likely not going to mean you can hire fewer people and actually reduce payroll, so even if the 1-5% were to show up on internal timesheets (it won't!), this internal accounting will not be reflected on the bottom line.


It's like saying "AI is going to replace book writers because they are so much more productive now". All you will get is more mediocre content that someone will have to fix later - the same with code.

10% more productive. What does that mean? If you mean lines of code, then it's an incredibly poor metric. They write more code, faster. Then what? What are the long-term consequences? Is it ultimately a wash, or even a detriment?

https://stackoverflow.blog/2024/03/22/is-ai-making-your-code...


LLMs set a new minimum level; because of this they can fill in the gaps in a skillset — if I really suck at writing unit tests, they can bring me up from "none" to "it's a start". Likewise for all the other specialities within software.

Personally I am having a lot of fun, as an iOS developer, creating web games. No market in that, not really, but it's fun and I wouldn't have time to update my CSS and JS knowledge that was last up-to-date in 1998.


It actually makes them less productive and creates havoc in codebases, with hidden bugs and verbose code that people are copy-pasting.


Also, at some point you can run an equivalent model locally. There is no long-term moat here, I think, and Facebook seems hellbent on ensuring there will be no new Google from LLMs.


I think physics at some point will get in the way, well at least for a while. An H100 costs like $20k-$30k and there's only so much compression/efficiency they can gain without beginning to lose intelligence, purely because you can't compute out of thin air.


Is there any reason to believe costs won’t come down with scale and hardware iteration, just like they did for everything else?

Short term pricing inefficiency is not relevant to long term impact.


Of course, but every token generated by a 100B model is going to take minimally 100B FLOPS, and if this is being used as an IDE typing assistant then there is going to be a lot of tokens being generated.
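As a rough back-of-envelope (treating each parameter as about one multiply-add, i.e. ~2 FLOPs, per generated token, and ignoring memory bandwidth, which usually dominates real serving):

    # Back-of-envelope compute for one token from a dense ~100B-parameter model.
    # Assumes ~2 FLOPs per parameter per token; real deployments are often
    # memory-bandwidth bound, so treat this as a rough lower bound on cost.
    params = 100e9
    flops_per_token = 2 * params            # ~2e11 FLOPs per token
    accelerator_peak = 1e15                 # ~1 PFLOP/s class accelerator (FP16 peak)
    utilization = 0.3                       # optimistic sustained utilization
    tokens_per_second = accelerator_peak * utilization / flops_per_token
    print(f"~{tokens_per_second:,.0f} tokens/s per accelerator")   # ~1,500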

If there is a common shift to using additional runtime compute to improve quality of output, such as OpenAI's GPT-o1, then FLOPs required goes up massively (OpenAI has said it takes exponential increase in FLOPS/cost to generate linear gains in quality).

So, while costs will of course decrease, those $20-30K NVIDIA chips are going to be kept burning, and are not going to pay for themselves ...

This may end up like the shift to cloud computing that sounds good in theory (save the cost of running your own data center), but where corporate America balks when the bill comes in. It may well be that the endgame for corporate AI is to run free tools from the likes of Meta (or open source) in their own datacenter, or maybe even locally on "AI PCs".


Which is why the work to improve the results of small models is so important. Running a 3B or even 1B model as typing assistant and reserving the 100B model for refactoring is a lot more viable.


> but every token generated by a 100B model is going to take minimally 100B FLOPS

Drop the S, I think. There’s no time dimension.

And FLOP is a generalized capability, meaning you can do any operation. Hardware optimizations for ML can deliver the same 100B computations faster and cheaper by not being completely generalized. Same way ray-tracing acceleration works: it does not use the same amount of compute as ray tracing on general CPUs.


Sure, ANN computations are mostly multiplication (or multiply and add) - multiply an ANN input by a weight (parameter) and accumulate, parallelized into matrix multiplication which is the basic operation supported by accelerators like GPUs and TPUs.

Still, even with modern accelerators it's lot of computation, and is what drives the price per token of larger models vs smaller ones.
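Concretely, a single fully-connected layer is one matrix multiply, roughly 2 * d_in * d_out FLOPs per input vector (a toy numpy sketch with made-up dimensions):

    # The core op: multiply inputs by weights and accumulate, batched into a matmul.
    import numpy as np

    d_in, d_out = 4096, 4096                             # made-up layer dimensions
    x = np.random.randn(d_in).astype(np.float32)         # layer input (activations)
    W = np.random.randn(d_in, d_out).astype(np.float32)  # layer weights (parameters)

    y = x @ W                    # multiply-and-accumulate, parallelized by the hardware
    flops = 2 * d_in * d_out     # one multiply + one add per weight
    print(f"~{flops / 1e6:.0f} MFLOPs for this single layer")  # ~34 MFLOPs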


I can already pay $0 a month and use uncensored local models for both text and images.

Llama, Mixtral, Stable diffusion and Flux are a lot of fun and free to run locally, you should try them out.


You can pay $0 for those models because a company paid $lots to train them and then released them for free. Those models aren't going away now of course, but lets not pretend that being able to download the product of millions of dollars worth of training completely free of charge is sustainable for future developments. Especially when most of the companies releasing these open models are wildly unprofitable and will inevitably bankrupt themselves when investments dry up unless they change their trajectory.


Much could be said about open source libraries that companies release for free use (Kubernetes, React, Firecracker, etc). It might strategically make sense for them, so in the meantime we'll just reap the benefits.


All of these require maintenance, and mostly it's been a treadmill just applying updates to React codebases. Complex tools are brittle and often only make sense at the original source.


You’re acting as if computing power isn’t going to get better. With time training the models will get faster.

Let me use CG rendering as an example. Back in the day only the big companies could afford to do photoreal 3D rendering because only they had access to the compute and even then it would take days to render a frame.

Eventually people could do these renders at home with consumer hardware but it still took forever to render.

Now we can render photoreal with path tracing at near realtime speeds.

If you could go back twenty years and show CG artists the Unreal Engine 5 and show them it’s all realtime they would lose their minds.

I see the same for A.I.: right now only the big companies can do it, then we will be able to do it at home but it will be slow, and finally we will be able to train it at home quickly and cheaply.


The flipside to that metaphor is that high-end CG productions never stopped growing in scope to fill bigger and better hardware - yes you can easily render CG from back in the day on a shoestring budget now, but rendering Avatar 2 a couple of years ago still required a cluster with tens of thousands of CPU cores. Unless there's a plateau in the amount of compute you can usefully pour into training a model, those with big money to spend are always going to be several steps ahead of what us mere mortals can do.


VRAM isn't free, you just put it in the capex pile instead of opex


> There's no accounting for taste, but keep in mind that all of these services are currently losing money, so how much would you actually be willing to pay for the service you're currently getting in order to let it break even

OK models already run locally; that aside, as the hosted ones are kinda similar in quality to interns (though varying by field), the answer is "what you'd pay an intern". Could easily be £1500/month, depending on the domain.


When was this profitability report from? Because the cost of token generation has dropped significantly.

When GPT-4 was launched last year, the API cost was about $36/M blended tokens, but you can now get GPT-4o tokens for about $4.4/M, Gemini 1.5 Pro for $2.2/M, or DeepSeek-V2 (a 21B-active/236B-total-parameter model that matches GPT-4 on coding) for as low as $0.28/M tokens (over 100X cheaper for the same quality output over the course of about 1.5 years).
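Spelling out those ratios (list prices per million blended tokens, as quoted above):

    # Price-per-million-token ratios implied by the figures quoted above.
    gpt4_launch = 36.00    # $/M blended tokens at GPT-4's launch
    current = {"GPT-4o": 4.40, "Gemini 1.5 Pro": 2.20, "DeepSeek-V2": 0.28}
    for name, price in current.items():
        print(f"{name}: {gpt4_launch / price:.0f}x cheaper than GPT-4 at launch")
    # GPT-4o: 8x, Gemini 1.5 Pro: 16x, DeepSeek-V2: 129x ("over 100X")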

The just-released Qwen2.5-Coder-7B-Instruct (Apache 2.0 licensed) also basically matches/beats GPT-4 on coding benchmarks, and quantized it can run at a decent speed not only on just about any consumer gaming GPU, but on most new CPUs/NPUs as well. This is about a 250X smaller model than GPT-4.

There are now a huge array of open weight (and open source) models that are very capable and that can be run locally/on the edge.


> There's no accounting for taste, but keep in mind that all of these services are currently losing money, so how much would you actually be willing to pay for the service you're currently getting in order to let it break even?

For ChatGPT in its current state, probably $1K/month.


$80 a month is a no brainer given the productivity multiplier.


Just a thought exercise. If we would have an AI with the intellectual capabilities of a Ph.D holding professor in a hard science. How much would it be worth for you to have access to that AI?

$100,000? $500,000?


0 unless what I'm interested in is that Professor's very narrowly tailored niche. It's called Piled Higher and Deeper for a reason.


I don't find this very compelling. Hardware is becoming more available and cheaper as production ramps up, and smaller models are constantly seeing dramatic improvements.


CoT is not RL'ing over reasoning traces, costs have come down 87.5% since that article, and I agree generally that "free" is a bad price point


- it won't work.

- ok it works, but it won't be useful.

- ok it's useful, but it won't scale.

- ok it scales, but it won't make any money.

- ok it makes money, but it's not going to last.

etc etc


Retrospectively framing technologies that succeeded despite doubts at the time discounts those that failed.

After all, you could have used the exact same response in defense of web3 tech. That doesn't mean LLMs are fated to be like web3, but similarly the outcome that the current expenditure can be recouped is far from a certainty just because there are doubters.


There certainly has been some goal post moving over the past few months. A lot of the people in here have some kind of psychological block when it comes to technology that may potentially replace them one day.


Yeah currently the sentiment seems to be "okay fine it works for simple stuff but won't deal with my complex query so it can be dismissed outright." Save yourselves some time and use it for that simple stuff folks.


People hate paying specifically for stuff.

If Copilot came for free and Azure cost a tiny bit more, nobody would even blink.


I would, and I don't use chatgpt as much as other people. I would pay for it for each of my employees.


It's called investment. You need to spend money to make money. Their costs will certainly come down.


Definitely. My time is valuable and I would spend multiples more on the current subscription costs.


Why do you assume they’re losing money on inference?


It's a useful coding tool - but at the same time it displays a lack of intelligence in the responses provided.

Like it will generate code like `x && Array.isArray(x)`, because `x &&` followed by some check on x is a common pattern I guess - but the extra truthiness check is completely pointless in this context.
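To make the redundancy concrete, a minimal sketch (the function name is made up; the point is just that `Array.isArray` already returns false for null and undefined, so the leading truthiness check buys nothing):

    // Hypothetical example of the pattern in question.
    // Array.isArray(null) === false and Array.isArray(undefined) === false,
    // so `x && Array.isArray(x)` behaves identically to `Array.isArray(x)`.
    function hasItems(x: unknown): boolean {
      return Array.isArray(x) && x.length > 0;
    }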

When you ask it to solve some problem, it will often produce roundabout shit solutions even when there's trivial stuff built into the tool/library. If you're not a domain expert, or don't search for better solutions to check it against, you'll often end up with slop.

And the "reasoning" feels like the most generic answers that stay on topic: "review this code" will focus on bullshit rather than prioritizing logic errors or clearing up underlying assumptions, etc.

That said it's pretty good at bulk editing - like when I need to refactor crufty test cases it saves a bunch of typing.


idk about Claude 3.5,

but if you remove the implicit subsidies from the AI/AGI hype, then for many such tools the cost-benefit calculation of creating and operating them becomes ... questionable

furthermore, the places where such tools tend to shine the most are often places where the IT industry has somewhat failed: unnecessarily verbose and bothersome-to-use tools, missing tooling, and troublesome code reuse (so you write the same code again and again). And these LLM-based tools are not fixing the problem, they just kind of hide it. And that has me worried a bit because it makes it much much less likely for the problem to ever be fixed. Like I think there is a serious chance of this tooling causing the industry to be stuck on a quite sub-par plateau for many many years.

So while they clearly help, especially if you have to reinvent the wheel for the thousandth time, it's hard to look at them favorably.


> And that has me worried a bit because it makes it much much less likely for the problem to ever be fixed.

How will that ever get solved, in this universe? Look at what C++ does to C, what TypeScript does to JavaScript, what every standard does to the one before. It builds on top, without fixing the bottom, paving over the holes.

If AI helps generate sane low-level code, maybe it will help you make fewer buffer overflow mistakes. If AI can help test and design your firewall and network rules, maybe it will help you avoid exposing some holes in your CUPS service. Why not, if we're never getting rid of IP printing or C? Seems like part of the technological progress.


The scaling laws come to mind. This concern becomes trivial as we scale. It's like worrying that your calculator app running on your phone could be more efficient when adding two numbers.


The problem is that scaling goes in two directions: things become (potentially exponentially) cheaper because they are done (produced) a lot, and things become _(potentially exponentially) more expensive_ because the scale you try to achieve is so far beyond what is sustainable (in other words, the higher the demand for a limited resource becomes, the more expensive it becomes).

Similarly, this "law" isn't really a law for a good reason: it doesn't always hold. Not everything gets cheaper (by a relevant amount) at scale.


Hopefully it will be able to also reduce boilerplate and do reasonable DRY abstractions if repetition becomes too much.

E.g. I feel like it should be possible to first blast out a lot of repetitive code and then have the LLM go over all of it and abstract it reasonably, while the tests keep passing.
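Something like this minimal sketch (all names made up for illustration): repeated call sites collapse into one helper, and since the behavior is unchanged the existing tests should keep passing.

    // Before (pasted at every call site):
    //   const users = await (await fetch("/api/users")).json();
    //   const orders = await (await fetch("/api/orders")).json();

    // After: one helper the repetition collapses into.
    async function getJson<T>(path: string): Promise<T> {
      const res = await fetch(path);
      return (await res.json()) as T;
    }

    // const users = await getJson<User[]>("/api/users");
    // const orders = await getJson<Order[]>("/api/orders");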


Code generators in the editor have been around for ages and serve primarily to maximise boilerplate and minimise DRY. Expecting the opposite from a new code generator will yield disappointment.


I mean an LLM can go through all the files in a codebase and find repetitions that can be abstracted, reorganize files into more appropriate structures, etc. It just needs an optimal algorithm to provide optimal context for it.


It's not snark, it's calling out a fundamental error of extrapolating a short term change in progress to infinity.

It's like looking at the first version of an IDE that got intellisense/autocomplete and deciding that we'll be able to write entire programs by just pressing tab and enter 10,000 times.


2 things can be true at the same time.

OP is addressing the hype that there is some linear path of improvement here and ChatGPT 8.5 will be AGI.

To which people always seem to jump in with "but it's useful for me and makes me code faster." Which is fine and valid, just beside the point.


Do you think AI companies will be able to afford running massive compute farms solely so coders can get suggestions?

I do not claim to know what the future holds, but I do feel the clock is ticking on the AI hype. OpenAI blew people's minds with GPTs, and people extrapolated that mind-blowing experience into a future with omniscient AI agents, but those are nowhere to be seen. If investors have AGI in mind, and it doesn't happen soon enough, I can see another winter.

Remember, the other AI winters were due to a disconnect between expectations and reality of the current tech. They also started with unbelievable optimism that ended when it became clear the expectations were not reality. The tech wasn't bad back then either, it just wasn't The General Solution people were hoping for.


I feel like these new tools have helped me get simple programming tasks done really quickly over the last 18 months. They seem like a faster, better and more accurate replacement for googling and Stackoverflow.

They seem very good at writing SQL, for example. All the commas are in the right place, and there are exactly the right number of brackets: square, curly and round. But when they get it wrong, it really shows up the lack of intelligence. I hope the froth and bubble in the marketing of these tools matures into something with a little less hyperbole, because they really are great, just not intelligent.


How come MS Teams is still trash when everyone is being so much more productive? Shouldn't MS - sitting at the source - be able to create software wonders like all the weekend warriors using AI?


If you like Cursor, you should definitely check out ClaudeDev (https://github.com/saoudrizwan/claude-dev). It's been a hit in the AI dev community and I've noticed many folks prefer it over Cursor. It's free and open-source. You use your API credits instead of a subscription, and it supports other LLMs like DeepSeek too.


The economics don't make sense at all:

Either you pay more and more to keep your job as it gets better, or the company pays any amount for it so it can replace you over and over as a barely useful cog.

The current state of it being cheap only exists because it is in beta and they need more info from you, the expert, until it no longer needs you.


I have yet to watch people be THAT much more productive using, say, Copilot. Outside of some annoying boilerplate that I did not have to write myself, I don't know what kind of code you are writing that makes it all so much easier. This gets worse if you are using less trendy languages.

No offense, but I have only seen people who barely coded before describe being "very productive" with AI. And, sure, if you dabble, these systems will spit out scripts and simpler code for you, making you feel empowered, but they are not anywhere near being helpful with a semi-complex codebase.


I’ve tried enough times to generate code with AI: any attempt to generate a piece of code that isn't so trivial I could write it intoxicated and sleep-deprived just produces junk. It takes more time and effort to correct the AI output than to start from 0.

Let’s see in some years… long winter ahead.


I tried many times. Things that AI is good at:

- Generate boilerplate

- Generate extremely simple code patterns. You need a simple CRUD API? Yeah, it can do it.

- Generate solutions for established algorithms. Think of solutions for leetcode exercises.

So yeah, if that's your job as a developer, it's a massive productivity boost.

Playing with anything beyond that and I got varying degrees of failure. Some of which are productivity killers.

The worst is when I am trying to do something in a language/framework I am not familiar with, and the AI generates plausible-sounding but horribly wrong bullshit. It sends me down dead ends that take me a while to figure out, and I would have been better off just looking it up by myself.


And the solutions for these already existed:

- Generate boilerplate : Snippets, templates, and code generators

- Generate extremely simple code patterns : Frameworks

- Generate solutions for established algorithms : Libraries.


Lol seriously there are deterministic commands I can run that give me correct and verified boilerplate to stand up APIs. Why would I trust some probabilistic analysis of all code found online (while dissipating ungodly amounts of energy and water) to do it instead?


When I hear people talk about writing specs in natural language, I want to ask them if they want fuzzy results too. Like 10x10=20, or having your account debited by x+e money, where x is what you asked for and e is any real number. Or having your smoke detector interpret its sensor readings fuzzily too.


Absolutely.

My point is that I don't think AI can meaningfully output code that would be useful beyond that, because that code is not available in its training data.

Whenever I see people going on about how AI made them super productive, the only thing I ask myself is "My brother in Christ, what the fuck are you even coding?"


I've definitely noticed Copilot making it less annoying to write code because I don't have to type as much. But I wonder if that significant reduction in subjective annoyance causes people to overestimate how much actual time they're saving.


While I get where your fascination comes from...

> I'm more productive than ever before.

You realize that another way to read that sentence is "I am a really bad coder".


Maybe I am, but I'm getting pretty rich doing it, so there is that. :)


The fact that you make money using AI, has nothing to do with its usefulness for society/humanity.

There are people who are getting “pretty rich” by trafficking humans, or selling drugs. Would you want to live in a society where such activities are encouraged? In the end, we need to look at technological progress (or any progress for that matter) as where it will bring us to in the future, rather than what it allows you to do now.

It also pisses me off that software engineering has such a bad reputation that everyone, from common folks to the CEO of nvidia, is shitting on it. You don’t hear phrases like “AI is going to change medicine/structural engineering”, because you would shit your pants if you had to sit in a dentist chair, while the dentist would ask ChatGPT how to perform a root canal; or if you had to live in a house designed by a structural engineer whose buddy was Claude. And yet, somehow, everyone is ready to throw software engineers under the bus and label them as "useless"/easily replaceable by AI.


I'm not making money because of AI, I make a lot of money because I'm a good programmer. My current income has little to do with AI (99% built before GPT). So relax please.


"I'm getting paid so I don't really care" is the most destructive instance a human can take. Why do you think we're about to disrupt the Holocene climate optimum that gave birth to modern civilization?


I’m getting pretty rich programming without using AI - that's my answer to being called a bad coder. My income has nothing to do with AI, apart from maybe 1% from being more productive since Claude 3.5 dropped. Be assured I’m not going to destroy the planet.


By yourself? Hardly. We all are, by demanding top dollar for sitting at computers while demanding to buy cheap things without limits.

But we're getting paid, right? And the Prime van will keep delivering.


I think this is what the kids call "copium". To be honest, when people think like this it makes me smile. I'd rather compete against people programming on punchcards.


Usually I learn my way around the reference docs for most languages I use, but CSS has about 50 documents to navigate. I’ve found Copilot does a great job with CSS questions, though for Java I really do run into cases where it tells me that Optional doesn’t have a method that I know is there.


LLMs make mediocre engineers into slightly less mediocre engineers, and non-engineers into below mediocre engineers. They do nothing above the median. I've tried dozens of times to use them productively.

Outside of very very short isolated template creation for some kind of basic script or poorly translating code from one language to another, they have wasted more time for me than they saved.

The area where they seem to help people, including me, the most is producing plausible-seeming code for something I don't have any familiarity with. If it's an area I've never worked in before, it could maybe be useful. Hence the less breadth of knowledge in programming you have, the more useful it is. The problem is that you don't understand the code it produces, so you have to be entirely reliant on it, and that doesn't work long term.

LLMs are not and will not be ready to replace programmers within the next few years, I guarantee it. I would bet $10k on it.


Nothing snarky about pointing out AGI is nowhere near


> I'm more productive than ever before.

Who are you and what are you being so productive in?

These code assistants are wholly unable to help with the day to day work I do.

Sometimes I use them to remind me what flags to use with a tarball[0], so they've replaced SO, but anything of consequence or creativity and they flounder.

What are you getting out of this excess productivity? A pay raise? More time with your loved ones?

[0] https://xkcd.com/1168/ (addressing the tooltip, but hilariously, in regard to the comic's content that would be a circumstance where I would absolutely avoid trusting one of these ‘assistants’)


I’ll be waiting for these developer benefits to translate into tangible end user benefits in software.


OP could have been more substantive, but there is no contradiction between "current AI tools are sincerely useful" and "overinflated claims about the supposed intelligence of these tools will lead to an AI winter." I am quite confident both are true about LLMs.

I use Scheme a lot, but the 1970s MIT AI folks' contention that LISPs encapsulate the core of human symbolic reasoning is clearly ridiculous to 2020s readers: LISP is an excellent tool for symbolic manipulation and it has no intelligence whatsoever even compared to a jellyfish[1], since it cannot learn.

GPTs are a bit more complicated: they do learn, and transformer ANNs seem meaningfully more intelligent than jellyfish or C. elegans, which apparently lack "attention mechanisms" and, like word2vec, cannot form bidirectional associations. Yet Claude-3.5 and GPT-4o are still unable to form plans, have no notions of causality, cannot form consistent world models[2] and plainly don't understand what numbers actually mean, despite their (misleading) successes in symbolic mathematics. Mice and pigeons do have these cognitive abilities, and I don't think it's because God seeded their brains with millions of synthetic math problems.

It seems to me that transformer ANNs are, at any reasonable energy scale, much dumber than any bird or mammal, and maybe dumber than all vertebrates. There's a huge chunk we are missing. And I believe what fuels AI boom/bust cycles are claims that certain AI is almost as intelligent as a human and we just need a bit more compute and elbow grease to push us over the edge. If AI investors, researchers, and executives had a better grasp of reality - "LISP is as intelligent as a sponge", "GPT is as intelligent as a web-spinning spider, but dumber than a jumping spider" - then there would be no winter, just a realization that spring might take 100 years. Instead we see CS PhDs deluding themselves with Asimov fairy tales.

[1] Jellyfish don't have brains but their nerve nets are capable of Pavlovian conditioning - i.e., learning.

[2] I know about that Othello study. It is dishonest. Unlike those authors, when I say "world model" I mean "world."


I guess it depends on what we mean by "AI winter". I completely agree that the current insane levels of investment aren't justified by the results, and when the market realises this it will overreact.

But at the same time there is a lot of value to capture here by building solid applications around the capabilities that already exist. It might be a winter more like the "winter" image recognition went through before multimodal LLMs than the previous AI winter.


I think the upcoming AI bust will be similar to the 2000s dotcom bust - ecommerce was not a bad idea or a scam! And neither are transformers. But there are cultural similarities:

a) childish motivated reasoning led people to think a fairly simple technology could solve profoundly difficult business problems in the real world

b) a culture of "number goes up, that's just science"

c) uncritical tech journalists who weren't even corrupt, just bedazzled

In particular I don't think generative AI is like cryptocurrency, which was always stupid in theory, and in practice it has become the rat's nest of gangsters and fraudsters which 2009-era theory predicted. After the dust settles people will still be using LLMs and art generators.


I see it the same way. My current strategy is what I think I should have done in the dotcom bubble: carefully avoid pigeonholing myself in the hype topics while learning the basics, so I can set up well-positioned teams after the dust settles.


What LLM abilities, if you saw them demonstrated, would cause you to change your mind?


Let's start with a multimodal[1] LLM that doesn't fail vacuously simple out-of-distribution counting problems.

I need to be convinced that an LLM is smarter than a honeybee before I am willing to even consider that it might be as smart as a human child. Honeybees are smart enough to understand what numbers are. Transformer LLMs are not. In general GPT and Claude are both dramatically dumber than honeybees when it comes to deep and mysterious cognitive abilities like planning and quantitative reasoning, even if they are better than honeybees at human subject knowledge and symbolic mathematics. It is sensible to evaluate Claude compared to other human knowledge tools, like an encyclopedia or Mathematica, based on the LLM benchmarks or "demonstrated LLM abilities." But those do not measure intelligence. To measure intelligence we need make the LLM as ignorant as possible so it relies on its own wits, like cognitive scientists do with bees and rats. (There is a general sickness in computer science where one poorly-reasoned thought experiment from Alan Turing somehow outweighs decades of real experiments from modern scientists.)

[1] People dishonestly claim LLMs fail at counting because of minor tokenization issues, but

a) they can count just fine if your prompt tells them how, so tokenization is obviously not a problem

b) they are even worse at counting if you ask them to count things in images, so I think tokenization is irrelevant!


"This is actually good for Bitcoin"


One time long ago there were people living on an island who had never had contact with anybody else. They marveled at the nature around them and dreamed of harnessing it. They looked up at the moon at night and said "Some day we will go there."

But they lived in grass huts and the highest they had ever been off the ground was when they climbed a tree.

One day a genius was born on the island. She built a structure taller than the tallest tree. "I call it a stepladder," she said. The people were amazed. They climbed the stepladder and looked down upon the treetops.

The people proclaimed "All we have to do now is make this a little higher and we can reach the moon!"


This analogy breaks down when you consider what "attention is all you need" did for us.

A mere 5 years ago I was firmly in the camp (alongside linguists, mainly) that believed intelligence requires more than just lots of data and a next-token predictor. I was clearly wrong and would have lost a $1000 bet if I had put my money where my mouth was back then. Anyone not noticing how far things have come is, I think, mostly moving goalposts and falling to hindsight bias.

A better analogy is that the genius person in the village built a stepladder made of carbon nanotubes. Some people proclaimed "All we have to do is keep going and we can reach the moon with a very tall ladder!" Other people - many quite smart - proclaimed reasonably: "This is impossible. You are not realizing the unique challenges, and the materials we have not yet researched, needed to build something like that."

Some in society kept building. The ladder kept getting higher. They ran into issues like oxygen and balance, so the ladder was redesigned into an elevator. Challenges came that were thought insurmountable, until redesigns were again found to work with the miraculous carbon nanotube material, which seemed like a panacea for every construction ill.

Regardless of how high the now-elevator gets, and regardless of how many times it climbs past heights the naysayers firmly believed were impossible, the same naysayers keep saying it will never get much higher.

And higher the elevator grows.

Eventually a limit is reached, but that limit ends up being far higher than any naysayers ever thought possible. And when the limit is reached the naysayers all gathered and said "told you this would be the limit and that it was impossible!'

The naysayers failed to see the revolutionary potential of the carbon nanotubes, even if they were correct that they weren't enough.

And little did everyone know that their society was mere months away from another genius being born who would give them another catalyst on the order of carbon nanotubes, one that would again lead to dramatic, unexpected, long-term gains in how high the elevator can grow.


Right, but here's the deal. Those same people tried and failed and learned from their mistakes, built rockets instead, and still went to the moon.

Kind of an important last bit of the story, there.

Oh and in the span of human history, this happened in a blip of time...

We've barely finished inventing computers and the internet and we already are developing AI.

Y'all need perspective.


Really feel like this story backfires when you remember that people did go to the moon. It only took 63 years to go from the first airplane to landing on the moon.


but we reached the moon, didn't we?


Impressive stepladder.


...or perhaps unimpressive trees


Reminds me of autonomous vehicles a couple of years back. Or even AI a couple of years back, remember Watson? The hype cycle was faster to close that time.


IBM Watson was more than a couple years back. The Jeopardy event was in 2011. It's currently 2024. As for cars, I don't know what you're referring to specifically, and the hype is still ongoing as far as I can tell.

It has taken 10+ years to get to present day, from the start of the "deep learning revolution" around 2010. I vaguely recall Uber promising self-driving pickups somewhere around 8-10 years ago. A main difference between current AI systems and the systems behind the cyclical hype cycles ongoing since the 1950s is that these systems are actually delivering impressive and useful results, increasingly so, to a much larger amount of people. Waymo alone services tens of thousands of autonomous rides per month (edit: see sibling comment, I was out of date, it's currently hundreds of thousands of rides per month -- but see, increasingly), and LLMs are waaaaay beyond the grandparent's flippant characterization of "plausible-looking but incorrect sentences". That's markov chains territory.


> Waymo alone services tens of thousands of autonomous rides per month (edit: see sibling comment, I was out of date, it's currently hundreds of thousands of rides per month -- but see, increasingly)

But they aren't particularly autonomous, there's a fleet of humans watching the Waymos carefully and frequently intervening for the case where every 10-20 miles or so the system makes a stupid decision that needs human intervention: https://www.nytimes.com/interactive/2024/09/03/technology/zo...

I think Waymo only releases the "critical" intervention rate, which is quite low. But for Cruise the non-critical interventions happened every 5 miles, and I suspect Waymos are similar. It appears that Waymos are way too easily confused and, left to their own devices, make awful decisions about passing emergency vehicles, etc.

Which is in fact consistent with what self-driving skeptics were saying all the way back in 2010: deep learning could get you 95% of the way there but it will take many decades - probably centuries! - before we actually have real self-driving cars. The remote human operators will work for robotaxis and buses but not for Teslas.

(Not to mention the problems that will start when robotaxis get old and in need of automotive maintenance, but the system didn't have any transmission problem scenarios in its training data. At no time in my life has my human intelligence been more taxed than when I had a tire blowout on the interstate while driving an overloaded truck.)


The link you gave does not support your claims about Waymo, it's just speculation.

What "critical" intervention rate are you talking about? What network magically supports the required low latencies to remotely respond to an imminent accident?

How does your theory square with events like https://www.sfchronicle.com/sf/article/s-f-waymo-robotaxis-f... that required a service team to physically go and deal with the stuck cars, rather than just dealing with them via some giant remotely intervening team that's managed to scale to 10x rides in a year? (Hundreds of thousands per month absolutely.)

Sure, there's no doubt a lot of human oversight going on still, probably "remote interventions" of all sorts (but not tele-operating) that include things like humans marking off areas of a map to avoid and pushing out the update for the fleet, the company is run by humans... But to say they aren't particularly autonomous is deeply wrong.

I would be interested if you can dig up some old skeptics, plural, saying probably centuries. May take centuries, sure, I've seen such takes, they were usually backed by an assumption that getting all the way there requires full AGI and that'll take who knows how long. It's worth noticing that a lot of such tasks assumed to be "AGI-complete" have been falling lately. It's helpful to be focused on capabilities, not vague "what even is intelligence" philosophizing.

Your parenthetical seems pretty irrelevant. First, models work outside their training sets. Second, these companies test such scenarios all the time. You'll even note in the link I shared that Waymo cars were at the time programmed to not enter the freeway without a human behind the wheel, because they were still doing testing. And it's not like "live test on the freeway with a human backup" is the first step in testing strategy, either.


> What "critical" intervention rate are you talking about? What network magically supports the required low latencies to remotely respond to an imminent accident?

I was being vague - Waymo tests the autonomous algorithms with human drivers before they are deployed in remote-only mode. Those human drivers rarely but occasionally have to yank control from the vehicle. This is a critical intervention, and it seems like the rates are so low that riders almost never encounter a problem (though it does happen). Waymo releases this data, but doesn't release data on "non-critical interventions" where remote operators help with basic problem solving during normal operations. This is the distinction I was making and didn't phrase it very clearly. I think those people are intervening at least every 10-20 miles. And since those interventions always involve common-sense reasoning about some simple edge case, my claim is that the cars need that common-sense reasoning in order to get rid of the humans in the loop. I am not convinced that there's even enough drivers in the world to generate the data current AI needs to solve those edge cases - things like "the fire department ordered brand new trucks and the system can't recognize them because the data literally doesn't exist."

> First, models work outside their training sets.

This is incredibly ignorant, pure "number go up" magical thinking. Models work for simple interpolations outside their training data, but a mechanical failure is not an interpolation, it's a radically different change which current systems must be specifically trained on. AI does not have the ability to causally extrapolate based on physical reasoning like humans. I had never experienced a tire blowout but I knew immediately what went wrong, relying on tactile sensations to determine something was wrong in the rear right + basic conceptual knowledge of what a car is to determine the tire must have exploded. Even deep learning's strongest (reality-based) advocates acknowledge this sort of thinking is far beyond current ANNs. Transformers would need to be trained on the scenario data. There are mitigations that might work: simply coming to a slow stop when a separate tire diagnostic redlines, etc. But these might prove brittle and unreliable.

> Second, these companies test such scenarios all the time.

No they don't! The only company I am aware of which has tested tire blowouts is Kodiak Robotics, and that seemed to be a slick product demo rather than a scientific demonstration. I am not aware of any public Waymo results.


> Which is in fact consistent with what self-driving skeptics were saying all the way back in 2010: deep learning could get you 95% of the way there but it will take many decades - probably centuries! - before we actually have real self-driving cars. The remote human operators will work for robotaxis and buses but not for Teslas.

If this is the end result, this is already a substantial business savings.


Centuries seems like quite a stretch, we haven't even been doing this computer stuff for one century yet.


The problem is not "computers," it's intelligence itself. We still don't know how even the simplest neurons actually work, nor the simplest brains. And we're barely any closer to scientific definitions of "intelligence," "consciousness," etc than we were in the 1800s. There are many decades of experiments left to do, regardless of how fancy computers might be. I suspect it will take centuries before we make dog-level AI because it will take centuries to understand how dogs are able to reason.


Yeah I have no idea what these people are talking about. The current gen of AI is qualitatively different than previous attempts. For one, GPT et al are already useful without any kind of special prompting.

I'd also like to challenge people to actually consider how often humans are correct. In my experience, it's actually very rare to find a human that speaks factually correctly. Many professionals, including doctors (!), will happily and confidently deliver factually incorrect lies that sound correct. Even after obvious correction they will continue to spout them. Think how long it takes to correct basic myths that have established themselves in the culture. And we expect these models, which are just getting off the ground, to do better? The claim is they process information more similarly to how humans do. If that's true, then the fact they hallucinate is honestly a point in their favor. Because... in my experience, they hallucinate exactly the way I expect humans to.

Please try it, ask a few experts something and I guarantee you that further investigation into the topic will reveal that one or more of them are flat out incorrect.

Humans often simply ignore this and go based on what we believe to be correct. A lot of people do it silently. Those who don't are often labeled know-it-alls.


You don't ask a neurosurgeon how to build a house, just like you don't ask a plane pilot how to drill a tunnel. Expertise is localized. And the most important thing is that humans learn.


> And the most important thing is that humans learn.

An implementation detail that will be solved as the price of AI training decreases. Right now only inference is feasible at scale. Transformers are excellent here since they show great promise at "one-shot" learning, meaning they can be "trained" for the same cost as inference. Hence the sudden boom in AI. We finally have a taste of what could be, should we be able to not only run inference but also train models at scale.


Humans learn from seeing. I don't think we are at the stage of training models with video/image datasets. We've only reached the plateau with text datasets to train with.


When you do something that is extraordinarily hard, sometimes it takes longer than you expect. But now we're here: https://techcrunch.com/2024/08/20/waymo-is-now-giving-100000...


To be fair, is Waymo "only" AI? I'm guessing it's a composite of GPS (car on a rail), some highly detailed mapping, and then, yes, some "AI" involved in recognition and decision making of course, but the car isn't an AGI so to speak? Like it wouldn't know how to change a tyre or fix the engine or drive somewhere the mapping data isn't yet available?


Where did I say that it's AGI? I was addressing the parent's comment:

> "Reminds me of autonomous vehicles a couple of years back".

I don't think any reasonable interpretation of "autonomous vehicle" includes the ability to change a tyre. My point is that sometimes hype becomes reality. It might just take a little longer than expected.


Ok maybe I just never saw the hype, just another engineering and data challenge that was going to be solved one way or another.


I see you haven’t tried the latest FSD build from Tesla.


The one that keeps making major, scary mistakes?


Hasn’t made any major mistakes for me. It’s not perfect of course, but still


> The winter after this is gonna be harsh.

The winter is going to be warm because of all the heat generated by GPUs ;)


If this winter comes, the sudden availability of cheap used enterprise GPUs is going to be a major boon for hobbyist AI training. We will all have warm homes and sky high electricity bills


as will the summer, spring and autumn.

global warming is killing us all


"Plausible-looking but incorrect sentences" is cheap, reflexive cynicism. LLMs are an incredible breakthrough by any reasonable standard. The reason to be optimistic about further progress is that we've seen a massive improvement in capabilities over the past few years and that seems highly likely to continue for the next few (at least). It's not going to scale forever, but it seems pretty clear that when the dust settles we'll have LLMs significantly more powerful than the current cutting edge -- which is already useful.

Is it going to scale to "superintelligence?" Is it going to be "the last invention?" I doubt it, but it's going to be a big deal. At the very least, comparable to google search, which changed how people interact with computers/the internet.


>when the dust settles we'll have LLMs significantly more powerful than the current cutting edge -- which is already useful.

LLMs, irrespective of how powerful, are all subject to the fundamental limitation that they don't know anything. The stochastic parrot analogy remains applicable and will never be solved because of the underlying principles inherent to LLMs.

LLMs are not the pathway to AGI.


I sometimes wonder if we’re just very advanced stochastic parrots.

Repeatedly, we’ve thought that humans and animals were different in kind, only to find that we’re actually just different in degree: elephants mourn their dead, dolphins have sex for pleasure, crows make tools (even tools out of multiple non-useful parts! [1]). That could be true here.

LLMs are impressive. Nobody knows whether they will or won’t lead to AGI (if we could even agree on a definition – there’s a lot of No True Scotsman in that conversation). My uneducated guess is that you’re probably right: just continuing to scale LLMs without other advancements won’t get us there.

But I wish we were all more humble about this. There’s been a lot of interesting emergent behavior with these systems, and we just don’t know what will happen.

[1]: https://www.ox.ac.uk/news/2018-10-24-new-caledonian-crows-ca...


I swear I read this exact same thread in nearly every post about OpenAI on HN. It's getting to a point where it almost feels like it's all generated by LLMs


You mean the standard refrain of "we too are stochastic parrots"? Yes, that argument gets trotted out over and over.

LLM proponents seem unwilling to accept that we comprehend the words we speak/write in a way that LLMs are not capable of doing.


I was referring to the whole thread, so it includes the "LLMs are nothing but stochastic parrots" bit too.


> LLM proponents seem unwilling to accept that we comprehend the words we speak/write in a way that LLMs are not capable of doing.

Maybe their salary depends on them not understanding it.


Networks correspond to diagrams correspond to type theories — and LLMs learn such a theory and reason in that internal language (as in, topos theory).

That effective theory is knowledge, literally.

People harping about “stochastic parrot” are just people repeating a shallow meme — ironically, like a stochastic parrot.


In the scheme of things I'd say most people don't know shit. And that's perfectly fine because we can't reasonably expect the average person to know all the things.

LLM models are very far off from humans in reasoning ability, but acting like most of the things humans do aren't just riffing on or repeating previous data is wrong, imo. As I've said before, humans have been the stochastic parrots all along.


Arguing over terminology like "AGI" and the verb "to know" is a waste of time. The question is what tools can be built from them and how can people use those tools.


Agreed.

I thought a forum of engineers would be more interested in the practical applications and possible future capabilities of LLMs than in all these semantic arguments about whether something really is knowledge or really is art or really is perfect.


I'm directly responding to a comment discussing the popular perception that we, as a society, are "steps away" from AGI. It sounds like you agree that we aren't anywhere close to AGI. If you want to discuss the potential for LLMs to disrupt the economy there's definitely space for that discussion but that isn't the comment I was making.


Whether we should call what LLMs do “knowing” isn’t really relevant to how far away we are from AGI, what matters is what they can actually do, and they can clearly do at least some things that show what we would call knowledge if a human did it, so I think this is just humans wanting to feel we’re special


>they can clearly do at least some things that show what we would call knowledge if a human did it

Hard disagree. LLMs merely present the illusion of knowledge to the casual observer. A trivial cross examination usually is sufficient to pull back the curtain.


Noam Chomsky and Doug Hofstadter had the same opinion. Last I checked, Doug has recanted his skepticism and is seriously afraid for the future of humanity. I’ll listen to him and my own gut rather than some random internet people still insisting this is all a nothing burger.


The thing is my gut is telling me this is a nothing burger, and I'll listen to my own gut before yours - a random internet person insisting this is going to change the world.

So what exactly is the usefulness of this discussion? You think "I'll trust my gut" is a useful argument in a debate?


Trusting your gut isn't a useful debate tactic, but it is a useful tool for everybody to use personally. Different people will come to different conclusions, and that's fine. Finding a universal consensus about future predictions will never happen, it's an unrealistic goal. The point of the discussion isn't to create a consensus; it's useful because listening to people with other opinions can shed light on some blind spots all of us have, even if we're pretty sure the other guys are wrong about all or most of what they're saying.

FWIW my gut happens to agree with yours.


I'm convinced that the "LLMs are useless" contingent on HN is just psychological displacement.

It hurts the pride of technical people that there's a revolution going on that they aren't involved in. Easier to just deny it or act like it's unimpressive.


Or it's technical people who have been around for a few of these revolutions - which revolved and revolved until they disappeared into nothing but a lot of burned VC money - and who recognise the pattern. That's where I'd place my own cynicism. My bullshit radar has proven to be pretty reliable over the past few decades in this industry, and it's been blaring at its highest levels for a while about this.


Deep learning has already proven its worth. Google translate is an example on the older side. As LLMs go, I can take a picture of a tree or insect and upload it and have an LLM identify it in seconds. I can paste a function that doesn't work into an LLM and it will usually identify the problems. These are truly remarkable steps forward.

How can I account for the cynicism that's so common on HN? It's got to be a psychological mechanism.


> "Plausible-looking but incorrect sentences" is cheap, reflexive cynicism. LLMs are an incredible breakthrough by any reasonable standard

No it isn't. The previous state of the art was markov chain level random gibberish generation. What OP described is an enormous step up from that.


> and that seems highly likely to continue for the next few (at least)

Why? Text training data is already exhausted.


Yes, it turns out that in the context of machines, the set of all the names we've given to things and concepts is not very large in the scheme of things.

The next focus will hopefully be on reasoning abilities. It'll probably take another decade and a paper of similar impact to "Attention Is All You Need" before we see any major improvements... but then again all eyes are on these models atm, so perhaps it'll be sooner than that.


>"Plausible-looking but incorrect sentences" is cheap, reflexive cynicism.

Literally today I used Bing and it was making up API parameters.

The code example looked fine, but didn't reflect reality.


People saying LLM to replace programming jobs is like saying blockchain is going to replace home/car titles and proof of ownership.


> The winter after this is gonna be harsh.

That'll be investor types who bring this stupid "winter" on, because they run their lives on hype and baseless predictions.

Technology types on the other hand don't give a shit about predictions, and just keep working on interesting stuff until it happens, whether it takes 1 year or 20 years or 500 years. We don't throw a tantrum and brew up a winter storm just because shit didn't happen in the first year.

In early 2022 there was none of this ChatGPT stuff. Now, we're only 2 years later. That's not a lot of time for something already very successful. Humans have been around for several tens of thousand years. Just be patient.

If investors ran the show in the 1960s expecting to reach the moon with an 18 month runway, we'd never have reached the moon.


>The winter after this is gonna be harsh.

The difference is that current ML already has real use cases right now in its current form. Some examples are OCR, text to speech, speech to text, translation, recommendations (e.g. for Facebook, TikTok, etc.) and simple NLP tasks ("was [topic] mentioned in the following paragraph"). Even if AGI is proved impossible, these are real use cases that hold billions in value. And ML research is also considered a prestigious and interesting field academically, and that will likely not change even if investors give up on funding AGI.


> OCR, text to speech, speech to text, translation, recommendations

You missed the point of the parent comment's post. He's talking about the current post chatbot GenAI hype (i.e., the massive amounts of funding being poured into companies specifically after this turning point).


Two points:

1. You don't need massive amounts of funding to work on ML. A good deal of important work in ML was done in universities (eg. GANs, DPM, DDPM, DDIM) or were published before the hype (Attention). The only qualifier here is that training cost a lot right now. Even so, you don't need billions to train and costs may go down as memory costs come down and hardware competition increases.

2. You don't need VC-type investors to fund ML research. Large tech companies like Facebook, Google, Microsoft, ByteDance and Huawei will continue investing in ML no matter what, even if the total amount they invest goes down (which I personally don't think it will). Even if they shift away from chatbots and only focus on simpler NLP tasks as described above, related research will still continue, as all these tasks are related. For example, Attention was originally developed for translation, and Llama 3.2 isn't just a chatbot and can also do general image description, which is clearly important to Facebook and ByteDance for recommendations and to Google for image search and ads. Understanding what people like and what they are looking at is a difficult NLP problem and one that many tech companies would like to solve. And better image descriptions could then improve existing image datasets by allowing better text-image pairs, which could then improve image generation. So hard NLP, image generation and translation are all related and are increasingly converging into single multimodal LLMs. That is, the best OCR, image generation, translation etc. models may be ones that also understand language in general (i.e. broad and difficult NLP tasks). The issue is that OP assumes it must be AGI or bust.


AI (or more properly, ML) is all around us and creating value everywhere. This is true whether or not we EVER reach AGI. Honestly, I stop reading/listening whenever I read/hear mention of AGI.


It's also causing lots of harm, e.g. police departments using AI systems that identify the wrong suspect.


As per usual, people blame the tool when they should blame the tools using the tool.

The fault lies with humans using AI for something sensitive without having the AI pass through certification etc. Part of the problem is the glacial pace of lawmaking, but that's nothing new, is it; us humans being whiny, argumentative, inefficient, emotional meat bags about every little thing. I wonder, once we do make AGI, if it will wonder why it took us so damn long to tax the disgustingly wealthy, implement worldwide public healthcare, UBI, etc. and solve the housing crisis by gasp building more houses...

We evolved, so our deep, deep underlying motivations pretty much always circulate around self-preservation and reproduction (resource contention).


Ideally your court system does not permit AI testimony.


Court dates can be a long time after the initial arrest. Some people have been held for months to years in pre-trial jail, even after they've been cleared of any wrongdoing, because they can't afford the release fee. But even a few days could lose you your job, your kids if you're a single parent, your car or housing if you miss a payment, etc.


The damage is done by these systems long before any courts get involved.


I think this is one of those "controversial" topics where we're meant to be particularly careful to make substantial comments.


I think it's substantial to say that AI is currently overhyped because it's hitting a weak spot in human cognition. We sympathize with inanimate objects. We see faces in random patterns.

If a machine spits out some plausible looking text (or some cookie-cutter code copy-pasted from Stack Overflow) the human brain is basically hardwired to go "wow this is a human friend!". The current LLM trend seems designed to capitalize on this tendency towards sympathizing.

This is the same thing that made chatbots seem amazing 30 years ago. There's a minimum amount of "humanness" you have to put in the text and then the recipient fills in the blanks.


> If a machine spits out some plausible looking text (or some cookie-cutter code copy-pasted from Stack Overflow)

This is not a reasonable take on the current capabilities of LLMs.


It’s certainly been my experience with the technology.


But if nearly everyone else is saying this has real value to them and it's produced meaningful code way beyond what's in SO, then doesn't that just mean your experience isn't representative of the overall value of LLMs?


It could also mean a lot of people are misattributing their utility.


I don't think most people who are reasonably into AI think we're on the cusp of AGI. But I think it's made a lot of people who previously said "it will never be possible" rethink their feelings about it.

Definitely in the coming decade, we can prepare for a lot of the simpler tasks in an office to be taken over by AI. There are plenty of scenarios in which someone is managing a spreadsheet because an SME doesn't have the money to hire developers to automate & maintain that process - with advanced LLMs they can get it done by asking it to.


I'm definitely of two minds on this topic: 1. I'm getting value out of the current batch of models in the form of accurate Q&A/summaries as well as tweaking or generating clear prose or even useful imagery well beyond what I'd ever considered possible from computers before the last two years. 2. It definitely has limits and can be a struggle to get exactly what I want and the more I try to refine something the worse it gets if the initial answer wasn't perfect.

It really feels like a substantive step forward in terms of computer utility kind of like spreadsheets, databases, apps. We'll see how far it takes us down the line of human replacement though.


You are absolutely correct about where we are, but don't underestimate what hundreds of billions of dollars can build as well. There are already credible teams working on "math AI" and "truth AI", which will likely end up combining bullshit-generating LLMs with traditional but automated relational DB retrievals, producing output that is both believable and correct.

IMO it will be done vertical by vertical, with no standard interface coming for a while.


That’s fine; the winter after this will be a productive one, because even the simulacrum of intelligence that GPT is, is useful to some extent.


I appreciate skepticism and differing opinions, but I'm always surprised by comments like these because it's just so different from my day-to-day usage.

Like, are we using entirely different products? How are we getting such different results?


I think the difference is that people on HN are using these "AI" tools as coding assistants. For that, if you know what you're doing, they are pretty useful. They save trips to Stack Overflow or documentation diving and can spit out code that often takes less time to fix/customize than it would have taken to write. Cool.

A lot of the rest of the world are using it for other things. And at these other things, the results are less impressive. If you've had to correct a family member who got the wrong idea from whatever chat bot they asked, if you've ever had to point out the trash writing in an email someone just trusted AI to write on their behalf before it got sent to someone that mattered, or if you've ever just spent any amount of time on twitter with grok users, you should be exceptionally and profoundly aware of how unimpressive AI is for the rest of the world.

I feel we need less people complaining about the skepticism on HN and more people who understand these skeptics that hang out here already know how wonderful a productivity boost you're getting from the thing they're rightly skeptical about. Countering with "But my code productivity is up!" is next to useless information on this site.


I don't see why my personal anecdote is any less useful than GP's claim. GP's comment isn't nuanced skepticism about product gaps, or concrete examples of inaccuracy. It's a wholesale dismissal of any utility. AGI isn't even mentioned in the article. This also seems "next to useless".

I appreciate your anecdotes on failures/embarrassment for people outside of tech- there's pretty clearly a gap in experience, understanding, and marketing hype.

I don't think it's useless to ask what that gap is, and why GP got such poor results.


George Zarkadakis: In Our Own Image (2015) describes six metaphors people have used to explain human intelligence over the last two millennia. At first it was the gods infusing us with spirit. After that it has always been engineering: after the first water clocks and the qanats, hydraulics seemed a good explanation of everything - the flow of different fluids in the body, the "humors", explained physical and mental function. Later it was mechanical engineering; some of the greatest thinkers of the 1500s and 1600s - including Descartes and Hobbes - assured us it was tiny machines, tiny mechanical motions. In the 1800s Hermann von Helmholtz compared the brain to the telegraph. So of course after the invention of computers came the metaphor of the brain as a computer. This became absolutely pervasive and we have a very hard time describing our thinking without falling back on this metaphor. But, of course, it's just a metaphor, and much as our brain is not a tiny machine made out of gears, it's also not "prima facie digital", despite that being what John von Neumann claimed in 1958. It is, indeed, quite astonishing how everyone, without any shred of evidence, just believes this. It's not like John von Neumann gained some sudden insight into the actual workings of the brain. Much like his forefathers, he saw a resemblance between the perceived workings of the brain and the latest in engineering, and so he stated immediately that that's what it is.

Our everyday lives should make it evident how little the working of our brain resembles that of our computers. Our experiences change our brains somehow, but exactly how we don't have the faintest idea, and we can re-live these experiences somewhat, which creates a memory, but the mechanism is by no means perfect. There's the Mandela Effect https://pubmed.ncbi.nlm.nih.gov/36219739/ and of course "tip of my tongue", where we almost remember a word and then perhaps minutes or hours later it just bursts into our consciousness. If it's a computer, why is learning so hard? Read something and bam, it's written in your memory, right? Right? Instead, there's something incredibly complex going on: in 2016 an fMRI study was done among the survivors of a plane crash and large swaths of the brain lit up upon recall. https://pubmed.ncbi.nlm.nih.gov/27158567/ Our current best guess is that it's somehow the connections among neurons which change, and some of these connections together form a memory. There are 100 trillion connections in there, so we certainly have our work cut out for us.

And so here we are, with people believing they can copy human intelligence when they do not even know what they are trying to copy, having fallen for the latest metaphor for the workings of the human brain and believing it to be more than a metaphor.


Helmholtz didn't say the brain was like a telegraph; he was talking about the peripheral nervous system. And he was right: signals sent from visual receptors and from pain receptors are the same stuff being interpreted differently, just as telegraphing "Blue" and "Ouch" would be. That one, and the spirit of the gods, have no place on this list and strain the argument.

Hydraulics, gear systems, and computers are all Turing complete. If you're not a dualist, you have to believe that each of these would be capable of building a brain.

The history described here is one where humans invent a superior information processor, notice that it and humans both process information, and conclude that they must be the same physically. The last step is obviously flawed, but they were hardly going to conclude that the brain processes information with electricity and neurotransmitters when the height of technology was the gear.

Nowadays, we know the physical substrate that the brain uses. We compare brains to computers even though we know there are no silicon microchips or motherboards with RAM slots involved. We do that because we figured out that it doesn't matter what a machine uses to compute; if it is Turing complete, it can compute exactly as much as any other computer, no more, no less.
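To make the substrate-independence point concrete, here is a toy sketch in Python (purely illustrative; the little binary-increment machine below is made up for this example). Anything that can carry out this look-up-a-rule, write, move, change-state step, whether built from gears, water valves, or transistors, can in principle run the same computations.

    # Toy Turing-machine loop: the hardware doesn't matter, only the
    # ability to apply rules like these does.
    # Rules: (state, symbol) -> (symbol to write, head move, next state)
    RULES = {
        ("scan", "0"): ("0", +1, "scan"),
        ("scan", "1"): ("1", +1, "scan"),
        ("scan", " "): (" ", -1, "carry"),
        ("carry", "1"): ("0", -1, "carry"),
        ("carry", "0"): ("1", -1, "done"),
        ("carry", " "): ("1", -1, "done"),
    }

    def run(tape: str) -> str:
        cells = dict(enumerate(tape))   # sparse tape
        head, state = 0, "scan"
        while state != "done":
            symbol = cells.get(head, " ")
            write, move, state = RULES[(state, symbol)]
            cells[head] = write
            head += move
        lo, hi = min(cells), max(cells)
        return "".join(cells.get(i, " ") for i in range(lo, hi + 1)).strip()

    print(run("1011"))  # binary increment: 1011 -> 1100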


That's interesting, but technology has always been about augmenting or mimicking human intelligence. The Turing test is literally about computers being able to mimic humans so well that real humans can't tell them apart. We're now past that point in some areas, but we never really prioritized thinking about what intelligence _actually_ is, and how we can best reproduce it.

At the end of the day, does it matter? If humans can be fooled by artificial intelligence in pretty much all areas, and that intelligence surpasses ours by every possible measurement, does it really matter that it's not powered by biological brains? We haven't quite reached that stage yet, but I don't think this will matter when we do.


> If humans can be fooled by artificial intelligence in pretty much all areas,

This is just preposterous. You can be fooled if you have no knowledge in the area, but that's about it. With current tech there is not, and there cannot be, anything novel. Guernica was novel. No matter how you train a probabilistic model on every piece of art produced before Guernica, it will never, ever create it.

There are novel novels (sorry for the pun) every few years. They delight us with genuinely new turns of prose, unexpected plot twists, etc.

Also harken to https://garymarcus.substack.com/p/this-one-important-fact-ab... which also happens to include a verb made up on the spot.

And yes, we have cars which move faster than a human can, but they don't compete in the high jump or climb rock walls. Even though we have a fairly good idea of the mechanical workings of the human body, the muscles and joints and all that, we can't make a "tin man", not by far. As impressive as the Boston Dynamics demos are, they are still very, very far from this.


> With current tech there is, there can not be anything novel.

I wasn't talking about current tech, which is obviously not at human levels of intelligence yet. I would still say that our progress in the last 100 years, and the last 50 in particular, has been astonishing. What's preposterous is expecting that we can crack a problem we've been thinking about for millennia in just 100 years.

Do you honestly think that once we're able to build AI that _fully_ mimics humans by every measurement we have, that we'll care whether or not it's biological? That was my question, and "no" was my answer. Whether we can do this without understanding how biological intelligence works is another matter.

Also, AI doesn't even need to fully mimic our intelligence to be useful, as we've seen with the current tech. Dismissing it because of this is throwing the baby out with the bath water.


> Do you honestly think that once we're able to build AI that _fully_ mimics humans by every measurement we have,

What makes you think that's measurable, and that, even if it is, we could ever build something like that?

I already linked https://garymarcus.substack.com/p/this-one-important-fact-ab... did you read it?


> What makes you think that's measurable, and that, even if it is, we could ever build something like that?

What makes you think it isn't, and that we can't? The Turing test was proposed 75 years ago, and we have many cognitive tests today which current gen AI also passes. So we clearly have ways of measuring intelligence by whatever criteria we deem important. Even if those measurements are flawed, and we can agree that current AI systems don't truly understand anything but are just regurgitation machines, this doesn't matter for practical purposes. The appearance of intelligence can be as useful as actual intelligence in many situations. Humans know this well.

Yes, I read the article. There's nothing novel about saying that current ML tech is bad at outliers, and showcasing hallucinations. We can argue about whether the current approaches will lead to AGI or not, but that is beside the point I was making originally, which you keep ignoring.

Again, the point is: if we can build AI that mimics biological intelligence, it won't matter that it's not biological. And as a side note: even if we're not 100% there, it can still be very useful.


Again, the point is: you cannot build AI that mimics biological intelligence, because you don't even have any idea what biological intelligence is. Once again, what's Picasso's velocity of painting?


How does agriculture, or cars, or penicillin augment or mimic human intelligence?


That's beside my point, but they augment it. Agtech enhances our ability to feed ourselves; cars enhance our locomotor skills; medicine enhances our self-preservation skills, etc.


120 IQ on a Mensa test. https://archive.ph/OZ0sj


Re: AGI

LLMs don't have any ability to choose to update their own policies and goals, or to decide on their own data-acquisition tasks. That's one of the key requirements for an AGI. LLM systems just don't do that; they are still primarily offline inference systems, with mostly hand-crafted data pipelines, offline RLHF shaping, etc.

There are only a few companies working on on-policy RL in physical robotics. That's the path to AGI.
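For what it's worth, the distinction is easy to sketch. In this toy Python illustration (a made-up two-armed bandit, not any real training pipeline), the "offline" model only runs inference and its parameters never change, while the on-policy learner chooses its own actions, observes the outcomes, and updates itself from the data it just generated:

    import random

    ARM_REWARD_PROB = [0.2, 0.8]   # hidden environment, unknown to both agents

    def pull(arm):
        return 1.0 if random.random() < ARM_REWARD_PROB[arm] else 0.0

    # "Offline" model: fixed preferences from some prior training run.
    frozen_preferences = [0.5, 0.5]
    def frozen_act():
        # Inference only: frozen_preferences are never touched here.
        return 0 if random.random() < frozen_preferences[0] else 1

    # On-policy learner: acts, observes its own outcomes, updates itself.
    value_estimates, counts = [0.0, 0.0], [0, 0]
    def on_policy_step(epsilon=0.1):
        # Epsilon-greedy action choice under the current policy.
        arm = random.randrange(2) if random.random() < epsilon \
              else max(range(2), key=lambda a: value_estimates[a])
        reward = pull(arm)
        # Update from data the policy itself just generated.
        counts[arm] += 1
        value_estimates[arm] += (reward - value_estimates[arm]) / counts[arm]

    for _ in range(1000):
        frozen_act()       # stays [0.5, 0.5] forever
        on_policy_step()   # drifts toward [0.2, 0.8]

    print(frozen_preferences, [round(v, 2) for v in value_estimates])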

OpenAI is just another ad company with a really powerful platform and first mover advantage.

They are overleveraged and don't have anywhere to go or a unique dataset.


They do — but only when they’re trained on past outputs and with a willing partner.

For instance, a number of my conversations with ChatGPT contain messages it attempted to steer its own future training with (were those conversations to be included in future training).


No, but it's the first time we have a clear picture of how it could go.

And it's not just incorrect sentences; it's weird questions that are getting answered far better than ever before.

Why are you so dismissive? Have you ever talked or written with a computer that felt anything like a modern LLM? I haven't.


This attitude is so ridiculously disingenuous. When a computer can score incredibly well on math olympiad questions, among other things, "a computer can make plausible-looking but incorrect sentences" is dismissive at best.

I have no idea about AGI, but honestly, how can you use Claude or ChatGPT and come away unimpressed? It's like looking at SpaceX and saying golly, the space winter is going to be harsh because they haven't gotten to Mars yet.


There's a big difference between those two examples.

Mars is hard, but there are paths forward: more efficient engines, higher-energy-density fuels, lighter materials, better shielding, and so on. It's hard, but with enough time and money we understand how to get from what we have now to what makes Mars possible.

With LLMs, there is no path from LLM to AGI. No amount of time, money, or compute will make that happen. They are fundamentally a very "simple" tool that is only really capable of one thing: predicting text. There is no intelligence. There is no understanding. There is no creativity, problem solving, or thought of any kind. They just spit out text based on weighted probabilities. If you want AGI, you have to go in a completely different direction that has no relationship with LLM tools.
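To be concrete about what "spit out text based on weighted probabilities" means mechanically, here is a minimal sketch (Python, with a made-up vocabulary and made-up scores): the model assigns a score to every candidate token, the scores are turned into a probability distribution, and the next token is sampled from it. Repeat token by token and you have generation.

    import math, random

    vocab = ["the", "cat", "sat", "mat", "."]
    logits = [2.1, 0.3, 1.5, 0.9, -0.5]   # hypothetical next-token scores

    def softmax(xs):
        m = max(xs)                        # subtract max for numerical stability
        exps = [math.exp(x - m) for x in xs]
        total = sum(exps)
        return [e / total for e in exps]

    probs = softmax(logits)
    next_token = random.choices(vocab, weights=probs, k=1)[0]
    print(list(zip(vocab, [round(p, 3) for p in probs])), "->", next_token)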

Don't get me wrong, the work that's been done so far took a long time and is incredibly impressive, but it's a lot more smoke and mirrors than most people realize.


I'll grant that we could send humans to Mars sooner if we really wanted to. My point is that not achieving the bigger dream doesn't make current progress a hype wave followed by a winter.

And "LLM's just make plausible looking but incorrect text" is silly when that text is more correct than the average adult a large percentage of the time.


_unless_ intelligence is really mostly an emergent property of something very similar to language, in which case we're most of the way there



