
The current AI hype wave has really hit a nerd soft spot - that we're steps away from AGI. Surely if a computer can make plausible-looking but incorrect sentences we're days away from those sentences being factually accurate! The winter after this is gonna be harsh.



Using Claude 3.5 Sonnet in Cursor Composer already shows huge benefits for coding. I'm more productive than ever before. The models are still getting better and better. I'm not saying AGI is right around the corner or that we will reach it, but the benefits are undeniable. o1 added test-time compute. No need to be snarky.


It’s not snark, our industry is run on fear. If there is the tiniest flicker of potential, we will spend piles of money out of fear of being left behind. As you age, it becomes harder to deny. Ten years ago, I was starting to believe that my kids would never learn to drive or possibly never buy a car; here we are ten years later and not that much has changed. I know you can take a robotaxi in some cities, but nearly all interstate trucking still has someone driving.

Coding AI assistants have done some impressive things. I’ve been amazed at how they sniffed out some repetitive tasks I was hacking on and I just tab-completed pages of code that were pretty much correct. There is use. I pay for the feature. I don’t know if it’s worth 35% of the world’s energy consumption and all new fabrication resources over the next handful of years being dedicated to ‘AI chips.’ We aren’t looking for a better 2.0, we are expecting an exponentially better “2.0”, and those are very rare.


That doesn't mean this is a bad investment for VCs. GPT is being directly integrated into iOS and is a top app on both app stores. We've also barely scratched the surface of potential niche applications that go beyond a generalist chatbot interface. API use will likely continue to explode as the mountain of startups building on it come online. Voice stuff will probably kill off Alexa/Google Home.

I don't think the bulk of this VC money is predicated on AGI being around the corner.

But the general trend-hopping nature of big VC money is real. Still, VCs have managed to keep making a profit despite this, otherwise the industry would have died off or shrunk during the other ten years HN critiqued this behaviour, so on the whole they must be doing something right.


VCs mostly make money by selling a narrative about investing in the next big thing and then collecting management fees, not by beating the market. If the public sours on AI we need a new hype to replace it and keep tech money flowing at the same rate. A lot of funding seems to follow fads and be disproportionate to value generated (I remember when there were a bazillion people building social networks because that was hot).


There is a tremendous opportunity in bridging the gap between "can be automated" and "isn't automated due to technical/cost/time limitations." GPTs are perfect for this.

There are so many things out there that can be automated but currently aren't. Other industries are still extremely manual and process-driven. Many here tend to underestimate this.

Some programmers here will argue it's error-prone or creates technical debt, but most people don't care: if it works, it works, and one can worry about it breaking in 5 years' time after it's saved you considerable time and money.


Don’t get me wrong, there is some very cool and useful stuff there. I think it’s a bit disingenuous to even talk about AGI at this point though, and when you look at the power requirements and the need for $7 trillion in investment just to build chips, I really don’t know. Underdeliver and AI loses some of the hype again, and like the parent post said, it will be a long winter. Are GPTs worth more than Apple, MS, Alphabet and Amazon all together?


> ten years later and not that much has changed, I know you can take a robotaxi in some cities but

Uhh, that's a pretty big change.


There's no accounting for taste, but keep in mind that all of these services are currently losing money, so how much would you actually be willing to pay for the service you're currently getting in order to let it break even? There was a report that Microsoft is losing $20 for every $10 spent on Copilot subscriptions, with heavy users costing them as much as $80 per month. Assuming you're one of those heavy users, would you pay >$80 a month for it?

Then there's chain-of-thought being positioned as the next big step forwards, which works by throwing more inferencing at the problem, so that cost can't be amortized over time like training can...


I would pay hundreds of dollars per month for the combination of Cursor and Claude - I could not get my head around it when my beginner-level colleague said "I just coded this whole thing using Cursor".

It was an entire web app, with search filters, tree based drag and drop GUIs, the backend api server, database migrations, auth and everything else.

Not once did he need to ask me a question. When I asked him "how long did this take", I expected him to say "a few weeks" (it would have taken me - a far more experienced engineer - 2 months minimum).

His answer was "a few days".

I'm not saying "AGI is close", but I've seen tangible evidence (only in the last 2 months) that my 20-year software engineering career is about to change, and massively for the upside. The way I see it, everyone is going to be so much more productive using these tools.


Current LLMs fail if what you're coding is not the most common of tasks. And a simple web app is about as basic as it gets.

I've tried using LLMs for some libraries I'm working on, and they failed miserably. Trying to make an LLM implement a trait with a generic type in Rust is a game of luck with very poor chances.

I'm sure LLMs can massively speed up tasks like front-end JavaScript development, simple Python scripts, or writing SQL queries (which have been written a million times before).

But for anything even mildly complex, LLMs are still not suited.


I don't think complexity is the right metric.

Front-end JS can easily become very complex, too.

I think a better metric is how close you are to reinventing a wheel for the thousandth time. Because that is what LLMs are good at: helping you write code that has already been written, in nearly the same way, thousands of times.

But that is something you find in backend code, too.

It is also something where we as an industry have kinda failed to produce good tooling. And worse, if you are in the industry it's kinda hard to spot without very carefully taking a hundred (mental) steps back from what you are used to and what biases you might have.


LLM Code Assistants have succeeded at facilitating reusable code. The grail of OOP and many other paradigms.

We should not have an entire industry of 10,000,000 devs reinventing the JS/React/Spring/FastCGI wheel. I'm sure those humans can contribute to society and progress in much better ways.


> LLM Code Assistants have succeeded at facilitating reusable code.

I'd have said the opposite. I think LLMs facilitate disposable code. It might use the same paradigms and patterns, but my bet is that most LLM written code is written specifically for the app under development. Are there LLM written libraries that are eating the world?


I believe you're both saying the same thing. LLMs write "re-usable code" at the meta level.

The code itself is not clean and reusable across implementations, but you don't even need that clean packaged library. You just have an LLM regenerate the same code for every project you need it in.

The LLM itself, combined with your prompts, is effectively the reusable code.

Now, this generates a lot of slop, so we also need better AI tools to help humans interpret the code, and better tools to autotest the code to make sure it's working.

I've definitely replaced instances where I'd reach for a utility library, instead just generating the code with AI.

I think we also have an opportunity to merge the old and the new. We can have AI that can find and integrate existing packages, or it could generate code, and after it's tested enough, help extract and package it up as a battle tested library.


Agreed. But this terrifies me. The goal of reusable code (to my mind) is that with everybody building from the same foundations we can enable more functional and secure software. Library users contributing back (even just bug reports) is the whole point! With LLMs creating everything from scratch, I think we're setting ourselves on a path towards less secure and less maintainable software.


I (20+ years experience programmer) find it leads to a much higher quality output as I can now afford to do all the mundane, time-consuming housekeeping (refactors, more tests, making things testable).

E.g. let's say I'm working on a production thing and features/bugfixes accumulate and some file in the codebase starts to resemble spaghetti. The LLM can help me unfuck that way faster and get to a state of very clean code, across many files at once.


What LLM do you use? I've not gotten a lot of use out of Copilot, except for filling in generic algorithms or setting up boilerplate. Sometimes I use it for documentation but it often overlooks important details, or provides a description so generic as to be pointless. I've heard about Cursor but haven't tried it yet.


Cursor is much better than Copilot. Also, change it to use Claude, and then use the Inspector with ctrl-I


This is the thing, it works both ways: it's really good at interpreting existing codebases too.

Could potentially mean just a change in time allocation/priority. As it's easier and faster to locate and potentially resolve issues later, it is less important for code to be consistent and perfectly documented.

Not foolproof, and who knows how that could evolve, but just an alternative view. One of the big names in the industry said we'll have AGI when it speaks its own language. :P


I had similar experiences:

1. Asked ChatGPT to write a simple echo server in C but with this twist: use io_uring rather than the classic sendmsg/recvmsg. The code it spat out wouldn't compile, let alone work. It was wrong on many points and was clearly pieces of who-knows-what cut and pasted together. After having banged my head on the docs for a while I could clearly determine which sources the io_uring code segments were coming from. The code barely made any sense and was completely incorrect both syntactically and semantically.

2. Asked another LLM to write an AWS IAM policy according to some specifications. It hallucinated and used predicates that do not exist at all. I mean, I could have done it myself if I just could have made predicates up.

> But for anything even mildly complex, LLMs are still not suited.

Agreed, and I'm not sure we are anywhere close to them being suited.


Yep. LLMs don’t really reason about code, which turns out to not be a problem for a lot of programming nowadays. I think devs don’t even realize that the substrate they build on requires this sort of reasoning.

This is probably why there’s such a divide when you try to talk about software dev online. One camp believes that it boils down to duct taping as many ready made components together all in pursuit of impact and business value. Another wants to really understand all the moving parts to ensure it doesn’t fall apart.


My test is to take a sized chunk of memory containing a TrueType/OpenType font and output a map of glyphs to curves. The bot is nowhere close.


Roughly LLMs are great at things that involve a series of (near) 1-1 correspondences like “translate 同时采访了一些参与其中的活跃用户 to English” or “How do I move something up 5px in CSS without changing the rest of the layout?” but if the relationship of several parts is complex (those Rust traits or anything involving a fight with the borrow checker) or things have to go in some particular order it hasn’t seen (say US states in order of percent water area) they struggle.

SQL is a good target language because the translation from ideas (or written description) is more or less linear, the SQL engine uses entirely different techniques to turn that query into a set of relational operators which can be rewritten for efficiency and compiled or interpreted. The LLM and the SQL engine make a good team.
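As a toy illustration of that split (a hypothetical table with illustrative numbers; the SQL string stands in for what an LLM might hand back):

    import sqlite3

    # The LLM does the near-linear translation from a request into SQL text;
    # the SQL engine does the hard part: planning, rewriting and executing it.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE states (name TEXT, land_km2 REAL, water_km2 REAL)")
    conn.executemany("INSERT INTO states VALUES (?, ?, ?)",
                     [("Michigan", 146435, 103885), ("Arizona", 294207, 1026)])

    # What an LLM might return for "list states by percent water area, highest first":
    query = """
        SELECT name, 100.0 * water_km2 / (land_km2 + water_km2) AS pct_water
        FROM states
        ORDER BY pct_water DESC
    """
    for name, pct_water in conn.execute(query):
        print(f"{name}: {pct_water:.1f}% water")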


I’d bet that about 90% of software engineers today are just rewriting variations of what’s already been done. Most problems can be reduced to similar patterns. Of course, the quality of a model depends on its training data: if a library is new or the language isn’t widely used, the output may suffer. However, this is a challenge people are actively working on, and I believe it’s solvable.

LLMs are definitely suited for tasks of varying complexity, but like any tool, their effectiveness depends on knowing when and how to use them.


> Current LLMs fail if what you're coding is not the most common of tasks

Succeeding on the most common tasks (which isn't exactly what you said) is identical to "they're useful".


And I would go further… these “common tasks” cover 80% of the work in even the most demanding engineering or research positions.


That’s absolutely not my experience. I struggle to find tasks in my day to day work where LLMs are saving me time. One reason is that the systems and domains I work with are hardly represented at all on the internet.


I have the same experience. I'm in gamedev and we've been encouraged to test out LLM tooling. Most of us at/above the senior level report the same experience: it sucks, it doesn't grasp the broader context of the systems that these problems exist inside of, even when you prompt it as best as you can, and it makes a lot of wild-assed, incorrect assumptions about what it doesn't know, which are often hard to detect.

But it's also utterly failed to handle mundane tasks, like porting legacy code from one language and ecosystem to another, which is frankly surprising to me because I'd have assumed it would be perfectly suited for that task.


In my experience, AI for coding is like having a rather stupid, very junior dev at your beck and call, but one who can produce results instantly. The output is just often very mediocre, and getting it fixed often takes longer than writing it on your own.


My experience is that it varies a lot by model, dev, and field — I've seen juniors (and indeed people with a decade of experience) keeping thousands of lines of unused code around for reference, or not understanding how optionals work, or leaving the FAQ full of placeholder values in English when the app is only on the German market, and so on. Good LLMs don't make those mistakes.

But the worst LLMs? One of my personal tests is "write Tetris as a web app", and the worst local LLM I've tried, started bad and then half way through switched to "write a toy ML project in python".


I think this illustrates the biggest failure mode when people start using LLMs: asking it to do too much in one step.

It’s a very useful tool, not magic.


> Not once did he need to ask me a question. When I asked him "how long did this take" and expected him to say "a few weeks" (it would have taken me - a far more experienced engineer - 2 months minimum).

> Current LLMs fail if what you're coding is not the most common of tasks. And a simple web app is about as basic as it gets.

These two complexity estimates don’t seem to line up.


That's still valuable though: for problem validation. It lowers the table stakes for building any sort of useful software, all of which starts simple.

Personally, I just use the hell out of Django for that. And since tools like that are already ridiculously productive, I don't see much upside from coding assistants. But by and large, so many of our tools are so surprisingly _bad_ at this that I expect the LLM hype to have a lasting impact here. Even _if_ the solutions aren't actually LLMs, but just better tools, since we've recalibrated how long something _should_ take.


The problem Django solves is popular, which is why we have so many great frameworks that shorten the implementation time (I use Laravel for that). Just like game engines or GUI libraries, assuming you understand the core concepts of the domain. And if the tool is very popular and the LLMs have loads of data to train on, there may be a small productivity uptick from finding common patterns (small because if the patterns are common enough, you ought to find a library/plugin for them).

Bad tools often fall into three categories: too simple, too complex, or unsuitable. For the last two, you'd better switch, but there's the human element of sunk costs.


I work in video games. I've tried several AI assistants for C++ coding and they are all borderline useless for anything beyond writing some simple for loops. Not enough training data to be useful, I bet, but I guess that's where the disparity is - web apps, Python... those have tonnes of publicly available code it can train on. Writing code that manages GPU calls on a PS5? Yeah, good luck with that.


Presumably Sony is sitting on decades worth of code for each of the PlayStation architectures. How long before they're training their own models and making those available to their studios' developers?


I don't think Sony has this code, more likely just the finished builds. And all the major studios have game engines for their core product (or they license one). The most difficult parts are writing new game mechanics or supporting a new platform.


So you are basically saying "it failed on some of my Rust tasks, and those other languages aren't even real programming languages, so it's useless".

I've used LLMs to generate quite a lot of Rust code. It can definitely run into issues sometimes. But it's not really about complexity determining whether it will succeed or not. It's the stability of features or lack thereof and the number of examples in the training dataset.


I realize my comment seems dismissive in a manner I didn't intend. I'm sorry for that, I didn't mean to belittle these programming tasks.

What I meant by complexity is not "a task that's difficult for a human to solve" but rather "a task for which the output can't be 90% copied from the training data".

Since frontend development, small scripts and SQL queries tend to be very repetitive, LLMs are useful in these environments.

As other comments in this thread suggested: If you're reinventing the wheel (but this time the wheel is yellow instead of blue), the LLM can help you get there much faster.

But if you're working with something which hasn't been done many times before, LLMs start struggling. A lot.

This doesn't mean LLMs aren't useful. (And I never suggested that.) The most common tasks are, by definition, the most common tasks. Therefore LLMs can help in many areas and are helpful to a lot of people.

But LLMs are very specialized in that regard, and once you work on a task that doesn't fit this specialization, their usefulness drops, down to being useless.


Which model exactly? You understand that every few months we are getting dramatically better models? Did you try the one that came out within the last week or so (o1-preview)?


I did use o1-preview.


I can't understand how anyone can use these tools (copilot especially) to make entire projects from scratch and expand them later. They just lead you down the wrong path 90% of the time.

Personally I much prefer ChatGPT. I give it specific small problems to resolve and some context, at most 100 lines of code. If it gets more than that, the quality goes to shit. In fact Copilot feels like ChatGPT that was given too much context.


I hear it all the time on HN that people are producing entire apps with LLMs, but I just don't believe it.

All of my experiences with LLMs have been that for anything that isn't a braindead-simple for loop is just unworkable garbage that takes more effort to fix than if you just wrote it from scratch to begin with. And then you're immediately met with "You're using it wrong!", "You're using the wrong model!", "You're prompting it wrong!" and my favorite, "Well, it boosts my productivity a ton!".

I sat down with the "AI Guru" as he calls himself at work to see how he works with it and... He doesn't. He'll ask it something, write an insanely comprehensive prompt, and it spits out... Generic trash that looks the same as the output I ask of it when I provide it 2 sentences total, and it doesn't even work properly. But he still stands by it, even though I'm actively watching him just dump everything he just wrote up for the AI and start implementing things himself. I don't know what to call this phenomenon, but it's shocking to me.

Even something that should be in its wheelhouse like producing simple test cases, it often just isn't able to do it to a satisfactory level. I've tried every one of these shitty things available in the market because my employer pays for it (I would never in my life spend money on this crap), and it just never works. I feel like I'm going crazy reading all the hype, but I'm slowly starting to suspect that most of it is just covert shilling by vested persons.


The other day I decided to write a script (that I needed for a project, but ancillary, not core code) entirely with CoPilot. It wasn't particularly long (maybe 100 lines of python). It worked. But I had to iterate so much with the LLM, repeating instructions, fixing stuff that didn't run, that it took a fair bit longer than if I had just written it myself. And this was a fairly vanilla data science type of script.


Most of the time the entire apps are just a timer app or something simple. Never a complex app with tons of logic in it. And if you're having to write paragraphs of text to get something complex, you might as well just write that in a programming language; I mean, isn't that what high-level programming languages were built for? (heh). Also, you're not the only one who's had the thought that someone is vested in some way to overhype this.


You can write the high level structure yourself and let it complete the boilerplate code within the functions, where it's less critical/complicated. Can save you time.


Oh for sure. I use it as smart(ish) autocomplete to avoid typing everything out/looking things up in docs every time, but the thought of prompt engineering to make an app is just bizarre to me. It almost feels like it has more friction than actually writing the damn thing yourself.


You aren’t the only one that feels this way.

After 20 years of being held accountable for the quality of my code in production, I cannot help but feel a bit gaslit that decision-makers are so elated with these tools despite their flaws that they threaten to take away jobs.


Here is another example [0]. 95% of the code was taken as-is from the examples in the documentation. If you still need to read the code after it was generated, you may as well have read the documentation first.

When they say treat it like an intern, I'm so confused. An intern is there to grow and hopefully replace you as you get promoted or leave. The tasks you assign to him are purposely kept simple for him to learn the craft. The monotonous ones should be done by the computer.

[0]: https://gist.github.com/simonw/97e29b86540fcc627da4984daf5b7...


I think to the extent this works for some people it’s as a way to trick their brains into “fixing” something broken rather than having to start from scratch. And for some devs, that really is a more productive mode, so maybe it works in the end.

And that’s fine if the dev realizes what’s going on but when they attribute their own quirks to AI magic, that’s a problem.


As a non-programmer at a non-programming company:

I use it to write test systems for physical products. We used to contract the work out or just pay someone to manually do the tests. So far it has worked exceptionally well for this.

I think the core issue of the "do LLMs actually suck" is people place different (and often moving) goalposts for whether or not it sucks.


I just wrote a fairly sizable app with an LLM. This is the first complete app I've written using it. I did write some of the core logic myself leaving the standard crud functions and UI for the LLM.

I did it in little pieces and started over with fresh context each time the LLM started to get off in the weeds. I'm very happy with the result. The code is clean and well commented, the tests are comprehensive and the app looks nice and performs well.

I could have done all this manually too but it would have taken longer and I probably would have skimped out on some tests and gave up and hacked a few things in out of expedience.

Did the LLM get things wrong on occasion? Yes. Make up api methods that don't exist? Yes. Skip over obvious standard straightforward and simple solutions in favor of some rat's nest convoluted way to achieve the same goal? Yes.

But that is why I'm here. It's a different style of programming (and one that I don't enjoy nearly as much as pounding the keyboard). It involves more high-level thinking and code review, and less worrying about implementation details.

It might not work as well in domains the training data doesn't cover. And certainly, if someone expects to come in with no knowledge and just paste code without understanding, reading and pushing back, they will have a non-working mess pretty shortly. But my opinion is that overall these tools dramatically increase productivity in some domains.


> but I'm slowly starting to suspect that most of it is just covert shilling by vested persons.

It's almost as if the horde of former kleptocurrency bros have found a promising new seam of fool's gold to mine


I have the same observation as well. The hype is getting generated mostly by people who're selling AI courses or AI-related products.

It works well as a smart documentation search where you can ask follow-up questions, or when you would know what the output should look like if you saw it but can't type it out from memory.

For code assistants (aka Copilot / Cursor), it works if you don't care about the code at all and are ok with any solution as long as it barely works (I'm ok with such code for my emacs configuration).


LLMs are great at going from 0 to 2, but you wanted to go to 1, so you remove and modify lots of things, get back to 1, and then go to 2.

Lots of people are terrible at going from 0 to 1 in any project. Me included. LLMs helped me a lot solving this issue. It is so much easier to iterate over something.


I think it’s more that if you want to believe it’s magic future tech then it looks like it.

If you aren’t on board then it looks impressive but flawed and not even close to living up to the hype.


Just for fun, give it a function you wrote, and ask it if it can make any improvements. I reckon I accept about a third of what it suggests.


Not a bad use, though I argue being able to do that critique yourself has a compounding effect over time that is worthwhile.


Well... I have to critique the critique, else how do I know which two thirds to reject?

In theory I'm learning from the LLM during this process (much like a real code review). In practice, it's very rare that it teaches me something, it's just more careful than I am. I don't think I'm ever going to be less slap-dash, unfortunately, so it's a useful adjunct for me.


> 20 year software engineering career is about to change

I have also been developing for 20+ years.

And have heard the exact same thing about IDEs, Search Engines, Stack Overflow, Github etc.

But in my experience, at least, how fast I code has never been the limiting factor in my projects' success. So LLMs are nice and all, but they aren't going to change the industry all that much.


There will be a whole industry of people who fix what AI has created. I don't know if it will be faster to build the wrong thing and pay to have it fixed or to build the right thing from the get go, but after having seen some shit, like you, I have a little idea.


That industry will only form if LLMs don't improve from here. But the evidence, both theoretical and empirical, is quite the opposite. In fact one of the core reasons transformers gained so much traction is because they scale so well.

If nothing really changes in 3-5 years, then I'd call it a flop. But the writing is on the wall that "scale = smarts", and what we have today still looks like a foundational stage for LLM's.


> In fact one of the core reasons transformers gained so much traction is because they scale so well.

> If nothing really changes in 3-5 years, then I'd call it a flop

Transformers have been used for, what, 6 years now? Will you, in 6 years, say "I'll decide if they don't change the world in another 6 years"?


If the difference between now and 6 years in the future is the same as the difference between now and 6 years ago, a lot of people here will be eating their hats.


Why? What exactly have we got for the (how many hundred) billions of dollars poured into GPUs running transformers over the past 6 years?


You don't believe that models 100x better than today (OG transformers were pretty bad) would be fruitful for society?


Self-driving cars have been 3-5 years away for what, a decade now?


I never paid much attention to Elon.


Correction: a whole industry of AI that will fix what AI has created.


Will AI also be on call when things break in production?


no, the original comment was correct


yes, but does your colleague even fully understand what was generated? Does he have a good mental map of the organization of the project?

I have a good mental map of the projects I work on because I wrote them myself. When new business problems emerge, I can picture how to solve them using the different components of those applications. If I hadn't actually written the application myself, that expertise would not exist.

Your colleague may have a working application, but I seriously doubt he understands it in the way that is usually needed for maintaining it long term. I am not trying to be pessimistic, but I _really_ worry about these tools crippling an entire generation of programmers.


AI assistants are also quite good at helping you create a high level map of a codebase. They are able to traverse the whole project structure and functionality and explain to you how things are organized and what responsibilities are. I just went back to an old project (didn't remember much about it) and used Cursor to make a small bug fix and it helped me get it done in no time. I used it to identify where the issue might be based on logs and then elaborate on potential causes before then suggesting a solution and implementing it. It's the ultimate pair programmer setup.


> I just went back to an old project (didn't remember much about it) and used Cursor to make a small bug fix and it helped me get it done in no time.

That sounds quite useful. Does Cursor feed your entire project code (traversing all folders and files) into the context?


Do you ever verify those explanations, though? Because I occasionally try having an LLM summarise an article or document I just read, and it's almost always wrong. I have my doubts that they would fare much better in "understanding" an entire codebase.

My constant suspicion is that most results people are so impressed with were just never validated.


I wouldn’t even be so sure the application “works”. All we heard is that it has pretty UI and an API and a database, but does it do something useful and does it do that thing correctly? I wouldn’t be surprised if it totally fails to save data in a restorable way, or to be consistent in its behavior. It certainly doesn’t integrate meaningfully with any existing systems, and as you say, no human has any expertise in how it works, how to maintain it, troubleshoot it, or update it. Worse, the LLM that created it also doesn’t have any of that expertise.


> I _really_ worry about these tools crippling an entire generation of programmers.

Isn’t that the point? Degrade the user long enough that the competing user is on par with or below the competence of the tool, so that you now have an indispensable product and justification for its cost and existence.

P.S. This is what I understood from a lot of AI saints in the news who are too busy parroting productivity gains without citing other consequences, such as loss of understanding of the task or of the expertise to fact-check.


Me too, but a more optimistic view is that this is just a nascent form of higher-level programming languages. Gray-beards may bemoan that we "young" developers (born after 1970) can't write machine code from memory, but it's hardly a practical issue anymore. Analogously, I imagine future software dev consisting mostly of writing specs in natural language.


No one can write machine code from memory other than by writing machine code for years and just memorizing it. Just like you can't start writing Python without prior knowledge.

> Analogously, I imagine future software dev to consist mostly of writing specs in natural language.

https://www.commitstrip.com/en/2016/08/25/a-very-comprehensi...?


> Me too, but a more optimistic view is that this is just a nascent form of higher-level programming languages.

I like this take. I feel like a significant portion of building out a web app (to give an example) is boilerplate. One benefit of (e.g., younger) developers using AI to mock out web apps might be to figure out how to get past that boilerplate to something more concise and productive, which is not necessarily an easy thing to get right.

In other words, perhaps the new AI tools will facilitate an understanding of what can safely be generalized from 30 years of actual code.


Web apps require a ton of boilerplate. Almost every successful web framework uses at least one type of metaprogramming, many have more than one (reflection + codegen).

I’d argue web frameworks don’t even help a lot in this regard still. They pile on more concepts to the leaky abstractions of the web. They’re written by people that love the web, and this is a problem because they’re reluctant to hide any of the details just in case you need to get to them.

A coworker argued that webdev fundamentally opposes abstraction, which I think is correct. It certainly explains the mountains of code involved.


I admit that my own feelings about this are heavily biased, because I _truly_ care about coding as a craft; not just a means to an end. For me, the inclusion of LLMs or AI into the process robs it of so much creativity and essence. No one would argue that a craftsman produces furniture more quickly than Wayfair, but all people would agree that the final product would be better.

It does seem inevitable that some large change will happen to our profession in the years to come. I find it challenging to predict exactly how things will play out.


I suppose the craft/art view of coding will follow the path of chess - machines gradually overtake humans but it's still an artform to be good at, in some sense.


I've coded Python scripts that let me take CSV data from Hornresp and convert it to 3D models I can import into SketchUp. I did two coding units at uni, so whilst I can read code... I can't write it from scratch to save my life. But I can debug and fix the scripts GPT gives me. I did the Hornresp script in about 40 minutes. It would have taken me weeks to learn what it produced.

I'm not a mathematician, hell, I did general maths at school. Currently I've been talking through scripting a method to mix DSD audio files natively without converting to traditional PCM. I'm about to use GPT to craft these scripts. There is no way I could have done this myself without years of learning. Now all I have to do is wait half a day so I can use my free GPT-4o credits to code it for me (I'm broke af so can't afford subs). The productivity gains are insane. I'd pay for this in a heartbeat if I could afford it.


I really believe that the front-end part can be mostly automated (the HTML/CSS at least); Copilot is close imho (Microsoft + GitHub, I used both). But really, they're useless for anything more complex without making too many calls, proposing bad data structures, or using bad/old code design.


The frontend part was already automated. We called it Dreamweaver and RAD tools.


Thank you, now I realize where I've had this feeling before!

Working with AI-generated code to add new features feels like working with Dreamweaver-generated code, which was also unpleasant. It's not written the same way a human would write it, isn't written with ease of modification in mind, etc.


Copilot is pretty bad compared to cursor with sonnet. I have used Copilot for quite a long time so I can tell.


I am curious, how complex was the app? I use Cursor too and am very satisfied with it. It seems it is very good at code that must have been written so many times before (think React components, Node.js REST API endpoints etc.), but it starts to fall off when moving into specific domains.

And for me that is the best-case scenario: it takes away the part where we have to code/solve already-solved problems again and again, so we can focus more on the other parts of software engineering beyond writing code.


Fairly standard greenfield projects seem to be the absolute best scenario for an LLM. It is impressive, but that's not what most professional software development work is, in my experience. Even once I know what specifically to code I spend much more time ensuring that code will be consistent and maintainable with the rest of the project than with just getting it to work. So far I haven't found LLMs to be all that good at that sort of work.


Did you take a look at the code generated? Was it well designed and amenable to extension / building on top of?

I've been impressed with the ability to generate "throw away" code for testing out an idea or rapidly prototyping something.


Considering the current state of the industry, and the prevailing corporate climate, are you sure your job is about to get easier, or are you about to experience cuts to both jobs and pay?


The problem is that it only works for basic stuff for which there is a lot of existing example code out there to work with.

In niche situations it's not helpful at all in writing code that works (or even close). It is helpful as a quick lookup for docs for libs or functions you don't use much, or for gotchas that you might otherwise search StackOverflow for answers to.

It's good for quick-and-dirty code that I need for one-off scripts, testing, and stuff like that which won't make it into production.


So what is his plan to fix all the bugs that Claude hallucinated into the code?


I'm confident you have not used Cursor Composer + Claude 3.5 Sonnet. I'd say the level of bugs is no higher than that of a typical engineer - maybe even lower.


There's no LLM for which that is true or we'd all be fired.


In my experience it is true, but only for relatively small pieces of a system at a time. LLMs have to be orchestrated by a knowledgeable human operator to build a complete system any larger than a small library.


In the long term, sure. Short term, when that happens, we're going to be running on Wile E. Coyote physics and keep going until we look down and notice the absence of ground.


If all you bring to the table is the ability to reimplement simple web apps to spec, then sooner or later you probably will be fired.


It's only as good as its training data.

Step outside of building basic web/CRUD apps and its accuracy drops off substantially.

Also almost every library it uses is old and insecure.


Yet most work seems to be CRUD-related, and most SaaS businesses starting up really just need those things.


That last point represents the biggest problem this technology will leave us with. Nobody's going to train LLMs on new libraries or frameworks when writing original code takes an order of magnitude longer than generating code for the 2023 stack.


With LLMs like Gemini, which have massive context windows, you can just drop the full documentation for anything into the context window. It dramatically improves output.
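For example, something like this (a minimal sketch assuming the google-generativeai Python SDK; the model name, file name and prompt are just placeholders):

    # Sketch: drop a library's full documentation into the context window,
    # then ask for code against it. Assumes the google-generativeai SDK;
    # the file name and prompt are placeholders.
    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")
    model = genai.GenerativeModel("gemini-1.5-pro")

    docs = open("library_docs.md").read()  # can be hundreds of pages
    response = model.generate_content(
        [docs, "Using only the API documented above, write a function that ..."]
    )
    print(response.text)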


I use Phind, which does searches to provide additional context.


I am confident you didn't understand my comment. I didn't say anything about "level of bugs".


Claude is actually surprisingly good at fixing bugs as well. Feed it a code snippet and either the error message or a brief description of the problem and it will in many cases generate new code that works.


Sounds like CRUD boilerplate. Sure, it's great to have AI build this out and it saves a ton of time, but I've yet to see any examples (online or otherwise) of people building complex business rules and feature sets using AI.

The sad part is beginners using the boilerplate code won't get any practice building apps and will completely fail at the complex parts of an app OR try to use AI to build it and it will be terrible code.


I hear these stories, and I have to wonder, how useful is the app really? Was it actually built to address a need or was it built to learn the coding tool? Is it secure, maintainable, accessible, deployable, and usable? Or is it just a tweaked demo? Plenty of demo apps have all those features, but would never serve as the basis for something real or meet actual customer needs.


Yeah, AI can give you a good base if it's something that's been done before (which admittedly, 99% of SE projects are), especially in the target language.

Yeah, if you want tic-tac-toe or snake, you can simply ask ChatGPT and it will spit out something reasonable.

But this is not much better than a search engine/framework to be honest.

Asking it to be "creative" or to tweak existing code however ...


Yes, the value of a single engineer can easily double. Even a junior's - and it's much easier for them to ask Claude for help than to ask the senior engineer on the team (low barrier to getting unblocked).


> There was a report that Microsoft is losing $20 for every $10 spent on Copilot subscriptions, with heavy users costing them as much as $80 per month. Assuming you're one of those heavy users, would you pay >$80 a month for it?

I'm probably one of those "heavy users", though I've only been using it for a month to see how well it does. Here's my review:

Large completions (10-15 lines): It will generally spit out near-working code for any codemonkey-level framework-user frontend code, but for anything more it'll be at best amusing and a waste of time.

Small completions (complete current line): Usually nails it and saves me a few keystrokes.

The downside is that it competes for my attention/screen space against good old auto-completion, which costs me productivity every time it fucks up. Having to go back and fix identifiers where it messed up the capitalization or had typos, in places where basic auto-complete wouldn't have failed, is also annoying.

I'd pay about $40 right now because at least it has some entertainment value, being technologically interesting.


I find tools where I am manually shepherding the context into an LLM to work much better than Copilot at current. If I think thru the problem enough to articulate it and give the model a clear explanation, and choose the surrounding pieces of context (the same stuff I would open up and look at as a dev) I can be pretty sure the code generated (even larger outputs) will work and do what I wanted, and be stylistically good. I am still adding a lot in this scenario, but it's heavier on the analysis and requirements side, and less on the code creation side.

If what I give it is too open-ended, doesn't have enough info, etc., I'll still get a low-quality output. Though I find I can steer it by asking it to ask clarifying questions. Asking it to build unit tests can also help a lot in bolstering quality; a few iterations of getting the unit tests created and passing can really push the quality up.


1) The costs will go down over time; much of the cost is NVIDIA's margin and training new models.

2) Absolutely. That's like one hour of an engineer's salary for a whole month.


> The costs will go down over time, much of the cost is the margin of NVIDIA and training new models

Isn't each new model bigger and heavier and thus requires more compute to train?


Yes, but 1) you only need to train the model once and the inference is way cheaper. Train one great model (i.e. Claude 3.5) and you can get much more than $80/month worth out of it. 2) the hardware is getting much better and prices will fall drastically once there is a bit of a saturation of the market or another company starts putting out hardware that can compete with NVIDIA


> Train one great model (i.e. Claude 3.5) and you can get much more than $80/month worth out of it

Until the competition outcompetes you with their new model and you have to train a new superior one, because you have no moat. Which happens what, around every month or two?

> the hardware is getting much better and prices will fall drastically once there is a bit of a saturation of the market or another company starts putting out hardware that can compete with NVIDIA

Where is the hardware that can compete with NVIDIA going to come from? And if they don't have competition, which they don't, why would they bring down prices?


> Until the competition outcompetes you with their new model and you have to train a new superior one, because you have no moat. Which happens what, around every month or two?

Eventually one of you runs out of money, but your customers keep getting better models until then; and if the loser in this race releases the weights on a suitable gratis license then your businesses can both lose.

But that still leaves your customers with access to a model that's much cheaper to run than it was to create.


The point is not that every lab will be profitable. There only needs to be one model in the end to increase our productivity massively, which is the point I'm making.

Huge margins lead to a lot of competition trying to catch up, which is what makes market economies so successful.


Gemini models are trained and run on Google's in house TPU's, which frankly are incredible compared to H100's. In fact Claude was trained on TPUs.

Google however does not sell these, you can only lease time on them via GCP.


Then those new models get distilled into smaller ones.

Raising the max intelligence of the models tends to raise the intelligence of all the models via distillation.


If it makes software developers 10% more productive there sure would be many companies who'd pay $80 a month per seat.


Maybe there are people out there working in coding sweatshops churning out boilerplate code 8 hours a day, 50 weeks a year - people whose job is 100% coding (not what I would call software engineers or developers - just coders). It's easy to imagine that for such people (but do they even exist?!) there could be large productivity gains.

However, for a more typical software engineer, where every project is different and you have full lifecycle responsibility from design through coding, occasional production support, future enhancements, refactorings, updates for 3rd party library/OS changes, etc., how much of your time is actually spent purely coding (non-stop typing)?! Probably closer to 10-25%, and certainly nowhere near 100%. The potential overall time saving from a tool that saves, let's say, 10-25% of your code typing is going to be 1-5%, which is probably far less than gets wasted in meetings, chatting with your work buddies, or watching bullshit corporate training videos. IOW the savings are really just inconsequential noise.

In many companies the work load is cyclic from one major project to the next, with intense periods of development interspersed with quieter periods in-between. Your productivity here certainly isn't limited by how fast you can type.


I'm skeptical about the whole ordeal, but at $80/mo it would still be worth it unless you sit somewhere at the very bottom of the outsourcing well.


A 1% time saving for a $100k/yr position is still worth $83/month. And accounting for overhead, someone who costs the company $100k only gets a $60k salary.
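That $83 figure, spelled out (a trivial sketch, assuming the $100k is the fully loaded yearly cost):

    # Value of a 1% time saving on a position that costs the company $100k/yr.
    annual_cost = 100_000
    monthly_cost = annual_cost / 12     # ~ $8,333/month
    saving = 0.01 * monthly_cost        # a straight 1% saving
    print(f"${saving:.0f}/month")       # ~ $83/month, already above an $80 seat price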

If you pay Silicon Valley salaries this seems like a no-brainer. There are bigger time wasters elsewhere, but this is an easy win with minimal resistance or required culture change


Yeah, but companies need to see the savings on the bottom line, in real dollars, before they are going to be spending $1000/seat for this stuff. A theoretical, or actual, 1-5% of time saved typing is most likely not going to mean you can hire fewer people and actually reduce payroll, so even if the 1-5% were to show up on internal timesheets (it won't!), this internal accounting will not be reflected on the bottom line.


It's like saying "AI is going to replace book writers because they are so much more productive now". All you will get is more mediocre content that someone will have to fix later - the same with code.

10% more productive. What does that mean? If you mean lines of code, then it's an incredibly poor metric. They write more code, faster. Then what? What are the long-term consequences? Is it ultimately a wash, or even a detriment?

https://stackoverflow.blog/2024/03/22/is-ai-making-your-code...


LLMs set a new minimum level; because of this they can fill in the gaps in a skillset — if I really suck at writing unit tests, they can bring me up from "none" to "it's a start". Likewise for all the other specialities within software.

Personally I am having a lot of fun, as an iOS developer, creating web games. No market in that, not really, but it's fun and I wouldn't have time to update my CSS and JS knowledge that was last up-to-date in 1998.


It actually makes them less productive and creates havoc in codebases, with hidden bugs and verbose code that people are copy-pasting.


Also, at some point you can run an equivalent model locally. There is no long-term moat here, I think, and Facebook seems hellbent on ensuring there will be no new Google from LLMs.


I think physics at some point will get in the way, well at least for a while. An H100 costs like $20k-$30k and there's only so much compression/efficiency they can gain without beginning to lose intelligence, purely because you can't compute out of thin air.


Is there any reason to believe costs won’t come down with scale and hardware iteration, just like they did for everything else?

Short term pricing inefficiency is not relevant to long term impact.


Of course, but every token generated by a 100B model is going to take minimally 100B FLOPS, and if this is being used as an IDE typing assistant then there is going to be a lot of tokens being generated.
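As a rough back-of-envelope (treating each parameter as about one multiply-add, i.e. ~2 FLOPs, per generated token, and ignoring memory bandwidth, which usually dominates real serving):

    # Back-of-envelope compute for one token from a dense ~100B-parameter model.
    # Assumes ~2 FLOPs per parameter per token; real deployments are often
    # memory-bandwidth bound, so treat this as a rough lower bound on cost.
    params = 100e9
    flops_per_token = 2 * params            # ~2e11 FLOPs per token
    accelerator_peak = 1e15                 # ~1 PFLOP/s class accelerator (FP16 peak)
    utilization = 0.3                       # optimistic sustained utilization
    tokens_per_second = accelerator_peak * utilization / flops_per_token
    print(f"~{tokens_per_second:,.0f} tokens/s per accelerator")   # ~1,500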

If there is a common shift to using additional runtime compute to improve quality of output, such as OpenAI's GPT-o1, then FLOPs required goes up massively (OpenAI has said it takes exponential increase in FLOPS/cost to generate linear gains in quality).

So, while costs will of course decrease, those $20-30K NVIDIA chips are going to be kept burning, and are not going to pay for themselves ...

This may end up like the shift to cloud computing that sounds good in theory (save the cost of running your own data center), but where corporate America balks when the bill comes in. It may well be that the endgame for corporate AI is to run free tools from the likes of Meta (or open source) in their own datacenter, or maybe even locally on "AI PCs".


Which is why the work to improve the results of small models is so important. Running a 3B or even 1B model as typing assistant and reserving the 100B model for refactoring is a lot more viable.


> but every token generated by a 100B model is going to take minimally 100B FLOPS

Drop the S, I think. There’s no time dimension.

And FLOP is a generalized capability, meaning you can do any operation. Hardware optimizations for ML can deliver the same 100B computations faster and cheaper by not being completely generalized. Same way ray-tracing acceleration works: it does not use the same amount of compute as ray tracing on general CPUs.


Sure, ANN computations are mostly multiplication (or multiply and add) - multiply an ANN input by a weight (parameter) and accumulate, parallelized into matrix multiplication which is the basic operation supported by accelerators like GPUs and TPUs.

Still, even with modern accelerators it's lot of computation, and is what drives the price per token of larger models vs smaller ones.
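Concretely, a single fully-connected layer is one matrix multiply, roughly 2 * d_in * d_out FLOPs per input vector (a toy numpy sketch with made-up dimensions):

    # The core op: multiply inputs by weights and accumulate, batched into a matmul.
    import numpy as np

    d_in, d_out = 4096, 4096                             # made-up layer dimensions
    x = np.random.randn(d_in).astype(np.float32)         # layer input (activations)
    W = np.random.randn(d_in, d_out).astype(np.float32)  # layer weights (parameters)

    y = x @ W                    # multiply-and-accumulate, parallelized by the hardware
    flops = 2 * d_in * d_out     # one multiply + one add per weight
    print(f"~{flops / 1e6:.0f} MFLOPs for this single layer")  # ~34 MFLOPs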


I can already pay $0 a month and use uncensored local models for both text and images.

Llama, Mixtral, Stable diffusion and Flux are a lot of fun and free to run locally, you should try them out.


You can pay $0 for those models because a company paid $lots to train them and then released them for free. Those models aren't going away now of course, but lets not pretend that being able to download the product of millions of dollars worth of training completely free of charge is sustainable for future developments. Especially when most of the companies releasing these open models are wildly unprofitable and will inevitably bankrupt themselves when investments dry up unless they change their trajectory.


Much could be said about open source libraries that companies release for free use (Kubernetes, React, Firecracker, etc). It might strategically make sense for them, so in the meantime we'll just reap the benefits.


All of these require maintenance, and mostly it's been a treadmill just applying updates to React codebases. Complex tools are brittle and often only make sense at the original source.


You’re acting as if computing power isn’t going to get better. With time training the models will get faster.

Let me use CG rendering as an example. Back in the day only the big companies could afford to do photoreal 3D rendering because only they had access to the compute and even then it would take days to render a frame.

Eventually people could do these renders at home with consumer hardware but it still took forever to render.

Now we can render photoreal with path tracing at near realtime speeds.

If you could go back twenty years and show CG artists the Unreal Engine 5 and show them it’s all realtime they would lose their minds.

I see the same for A.I.: right now only the big companies can do it, then we will be able to do it at home but it will be slow, and finally we will be able to train it at home quickly and cheaply.


The flipside to that metaphor is that high-end CG productions never stopped growing in scope to fill bigger and better hardware - yes you can easily render CG from back in the day on a shoestring budget now, but rendering Avatar 2 a couple of years ago still required a cluster with tens of thousands of CPU cores. Unless there's a plateau in the amount of compute you can usefully pour into training a model, those with big money to spend are always going to be several steps ahead of what us mere mortals can do.


VRAM isn't free, you just put it in the capex pile instead of opex


> There's no accounting for taste, but keep in mind that all of these services are currently losing money, so how much would you actually be willing to pay for the service you're currently getting in order to let it break even

OK models already run locally; that aside, as the hosted ones are kinda similar in quality to interns (though varying by field), the answer is "what you'd pay an intern". Could easily be £1500/month, depending on the domain.


When was this profitability report from? Because the cost of token generation has dropped significantly.

When GPT-4 was launched last year, the API cost was about $36/M blended tokens, but you can now get GPT-4o tokens for about $4.4/M, Gemini 1.5 Pro for $2.2/M, or DeepSeek-V2 (a 21B-active/236B-total-parameter model that matches GPT-4 on coding) for as low as $0.28/M tokens (over 100X cheaper for the same quality output over the course of about 1.5 years).
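Spelling out those ratios (list prices per million blended tokens, as quoted above):

    # Price-per-million-token ratios implied by the figures quoted above.
    gpt4_launch = 36.00    # $/M blended tokens at GPT-4's launch
    current = {"GPT-4o": 4.40, "Gemini 1.5 Pro": 2.20, "DeepSeek-V2": 0.28}
    for name, price in current.items():
        print(f"{name}: {gpt4_launch / price:.0f}x cheaper than GPT-4 at launch")
    # GPT-4o: 8x, Gemini 1.5 Pro: 16x, DeepSeek-V2: 129x ("over 100X")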

The just-released Qwen2.5-Coder-7B-Instruct (Apache 2.0 licensed) also basically matches/beats GPT-4 on coding benchmarks, and quantized it can run at a decent speed not only on just about any consumer gaming GPU, but on most new CPUs/NPUs as well. This is about a 250X smaller model than GPT-4.

There are now a huge array of open weight (and open source) models that are very capable and that can be run locally/on the edge.


> There's no accounting for taste, but keep in mind that all of these services are currently losing money, so how much would you actually be willing to pay for the service you're currently getting in order to let it break even?

For ChatGPT in its current state, probably $1K/month.


$80 a month is a no brainer given the productivity multiplier.


Just a thought exercise. If we would have an AI with the intellectual capabilities of a Ph.D holding professor in a hard science. How much would it be worth for you to have access to that AI?

$100,000? $500,000?


0 unless what I'm interested in is that Professor's very narrowly tailored niche. It's called Piled Higher and Deeper for a reason.


I don't find this very compelling. Hardware is becoming more available and cheaper as production ramps up, and smaller models are constantly seeing dramatic improvements.


CoT is not RL'ing over reasoning traces, costs have come down 87.5% since that article, and I agree generally that "free" is a bad price point


- it won't work.

- ok it works, but it won't be useful.

- ok it's useful, but it won't scale.

- ok it scales, but it won't make any money.

- ok it makes money, but it's not going to last.

etc etc


Retrospectively framing technologies that succeeded despite doubts at the time discounts those that failed.

After all, you could have used the exact same response in defense of web3 tech. That doesn't mean LLMs are fated to be like web3, but similarly the outcome that the current expenditure can be recouped is far from a certainty just because there are doubters.


There certainly has been some goal post moving over the past few months. A lot of the people in here have some kind of psychological block when it comes to technology that may potentially replace them one day.


Yeah currently the sentiment seems to be "okay fine it works for simple stuff but won't deal with my complex query so it can be dismissed outright." Save yourselves some time and use it for that simple stuff folks.


People hate paying specifically for stuff.

If Copilot came for free and Azure cost a tiny bit more, nobody would even blink.


I would, and I don't use chatgpt as much as other people. I would pay for it for each of my employees.


It's called investment. You need to spend money to make money. Their costs will certainly come down.


Definitely. My time is valuable and I would spend multiples more on the current subscription costs.


Why do you assume they’re losing money on inference?


It's a useful coding tool - but at the same time it displays a lack of intelligence in the responses provided.

Like it will generate code like `x && Array.isArray(x)`, because `x &&` followed by some check on x is a common pattern I guess - but the extra truthiness check is completely pointless in this context.
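To make the redundancy concrete, a minimal sketch (the function name is made up; the point is just that `Array.isArray` already returns false for null and undefined, so the leading truthiness check buys nothing):

    // Hypothetical example of the pattern in question.
    // Array.isArray(null) === false and Array.isArray(undefined) === false,
    // so `x && Array.isArray(x)` behaves identically to `Array.isArray(x)`.
    function hasItems(x: unknown): boolean {
      return Array.isArray(x) && x.length > 0;
    }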

When you ask it to solve some problem, it will often produce roundabout shit solutions even when there's trivial stuff built into the tool/library. If you're not a domain expert, or don't search for better solutions to check it against, you'll often end up with slop.

And the "reasoning" feels like the most generic answers that stay on topic: "review this code" will focus on bullshit rather than prioritizing logic errors or clearing up underlying assumptions, etc.

That said it's pretty good at bulk editing - like when I need to refactor crufty test cases it saves a bunch of typing.


idk about Claude 3.5,

but if you remove the implicit subsidies from the AI/AGI hype, then for many such tools the cost-benefit calculation of creating and operating them becomes ... questionable

furthermore, the places where such tools tend to shine the most are often places where the IT industry has somewhat failed: unnecessarily verbose and bothersome-to-use tools, missing tooling, and troublesome code reuse (so you write the same code again and again). And these LLM-based tools are not fixing the problem, they just kind of hide it. And that has me worried a bit because it makes it much much less likely for the problem to ever be fixed. Like I think there is a serious chance of this tooling causing the industry to be stuck on a quite sub-par plateau for many many years.

So while they clearly help, especially if you have to reinvent the wheel for the thousandth time, it's hard to look at them favorably.


> And that has me worried a bit because it makes it much much less likely for the problem to ever be fixed.

How will that ever get solved, in this universe? Look at what C++ does to C, what TypeScript does to JavaScript, what every standard does to the one before. It builds on top, without fixing the bottom, paving over the holes.

If AI helps generate sane low-level code, maybe it will help you make fewer buffer overflow mistakes. If AI can help test and design your firewall and network rules, maybe it will help you avoid exposing some holes in your CUPS service. Why not, if we're never getting rid of IP printing or C? Seems like part of the technological progress.


The scaling laws come to mind. This concern becomes trivial as we scale. It's like worrying that your calculator app running on your phone could be more efficient when adding two numbers.


The problem is that scaling goes in two directions: things become (potentially exponentially) cheaper because they are done (produced) a lot, and things become _(potentially exponentially) more expensive_ because the scale you try to achieve is so far beyond what is sustainable (in other words, the higher the demand for a limited resource becomes, the more expensive it becomes).

Similarly, this "law" isn't really a law for a good reason: it doesn't always hold. Not everything gets cheaper (by a relevant amount) at scale.


Hopefully it will be able to also reduce boilerplate and do reasonable DRY abstractions if repetition becomes too much.

E.g. I feel like it should be possible to first blast out a lot of repetitive code and then have the LLM go over all of it and abstract it reasonably, while the tests keep passing.
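Something like this minimal sketch (all names made up for illustration): repeated call sites collapse into one helper, and since the behavior is unchanged the existing tests should keep passing.

    // Before (pasted at every call site):
    //   const users = await (await fetch("/api/users")).json();
    //   const orders = await (await fetch("/api/orders")).json();

    // After: one helper the repetition collapses into.
    async function getJson<T>(path: string): Promise<T> {
      const res = await fetch(path);
      return (await res.json()) as T;
    }

    // const users = await getJson<User[]>("/api/users");
    // const orders = await getJson<Order[]>("/api/orders");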


Code generators in the editor have been around for ages and serve primarily to maximise boilerplate and minimise DRY. Expecting the opposite from a new code generator will yield disappointment.


I mean an LLM can go through all the files in a codebase and find repetitions that can be abstracted, reorganize files into more appropriate structures, etc. It just needs an optimal algorithm to provide optimal context for it.


It's not snark, it's calling out a fundamental error of extrapolating a short term change in progress to infinity.

It's like looking at the first version of an IDE that got intellisense/autocomplete and deciding that we'll be able to write entire programs by just pressing tab and enter 10,000 times.


2 things can be true at the same time.

OP is addressing the hype that there is some linear path of improvement here and ChatGPT 8.5 will be AGI.

To which people always seem to jump in with "but it's useful for me and makes me code faster." Which is fine and valid, just beside the point.


Do you think AI companies will be able to afford running massive compute farms solely so coders can get suggestions?

I do not claim to know what the future holds, but I do feel the clock is ticking on the AI hype. OpenAI blew people's minds with GPTs, and people extrapolated that mind-blowing experience into a future with omniscient AI agents, but those are nowhere to be seen. If investors have AGI in mind, and it doesn't happen soon enough, I can see another winter.

Remember, the other AI winters were due to a disconnect between expectations and reality of the current tech. They also started with unbelievable optimism that ended when it became clear the expectations were not reality. The tech wasn't bad back then either, it just wasn't The General Solution people were hoping for.


I feel like these new tools have helped me get simple programming tasks done really quickly over the last 18 months. They seem like a faster, better and more accurate replacement for googling and Stackoverflow.

They seem very good at writing SQL, for example. All the commas are in the right place, and there are exactly the right number of brackets: square, curly and round. But when they get it wrong, it really shows up the lack of intelligence. I hope the froth and bubble in the marketing of these tools matures into something with a little less hyperbole, because they really are great, just not intelligent.


How come MS Teams is still trash when everyone is being so much more productive? Shouldn't MS - sitting at the source - be able to create software wonders like all the weekend warriors using AI?


If you like Cursor, you should definitely check out ClaudeDev (https://github.com/saoudrizwan/claude-dev). It's been a hit in the AI dev community and I've noticed many folks prefer it over Cursor. It's free and open-source. You use your API credits instead of a subscription, and it supports other LLMs like DeepSeek too.


The economics don't make sense at all:

Either you pay more and more to keep your job as it gets better, or the company pays any amount for it so it can replace you over and over as a barely useful cog.

The current state of it being cheap only exists because it is in beta and they need more info from you, the expert, until it no longer needs you.


I have yet to watch people be THAT much more productive using, say, Copilot. Outside of some annoying boilerplate that I did not have to write myself, I don't know what kind of code you are writing that makes it all so much easier. This gets worse if you are using less trendy languages.

No offense, but I have only seen people who barely coded before describe being "very productive" with AI. And, sure, if you dabble, these systems will spit out scripts and simpler code for you, making you feel empowered, but they are not anywhere near being helpful with a semi-complex codebase.


I’ve tried enough times to generate code with AI: any attempt to generate a piece of code that isn't so trivial I could write it intoxicated and sleep-deprived just produces junk. It takes more time and effort to correct the AI output than to start from 0.

Let’s see in some years… long winter ahead.


I tried many times. Things that AI is good at:

- Generate boilerplate

- Generate extremely simple code patterns. You need a simple CRUD API? Yeah, it can do it.

- Generate solutions for established algorithms. Think of solutions for leetcode exercises.

So yeah, if that's your job as a developer, it's a massive productivity boost.

Playing with anything beyond that and I got varying degrees of failure. Some of which are productivity killers.

The worst is when I am trying to do something in a language/framework I am not familiar with, and the AI generates plausible-sounding but horribly wrong bullshit. It sends me down dead ends that take me a while to figure out, and I would have been better off just looking it up by myself.


And the solutions for these already existed:

- Generate boilerplate : Snippets, templates, and code generators

- Generate extremely simple code patterns : Frameworks

- Generate solutions for established algorithms : Libraries.


Lol seriously there are deterministic commands I can run that give me correct and verified boilerplate to stand up APIs. Why would I trust some probabilistic analysis of all code found online (while dissipating ungodly amounts of energy and water) to do it instead?


When I hear people talk about writing specs in natural language, I want to ask them if they want fuzzy results too. Like 10x10=20, or having your account debited by x+e money, where x is what you asked for and e is any real number. Or having your smoke detector interpret its sensor readings fuzzily too.


Absolutely.

My point is that I don't think AI can meaningfully output code that would be useful beyond that, because that code is not available in its training data.

Whenever I see people going on about how AI made them super productive, the only thing I ask myself is "My brother in Christ, what the fuck are you even coding?"


I've definitely noticed Copilot making it less annoying to write code because I don't have to type as much. But I wonder if that significant reduction in subjective annoyance causes people to overestimate how much actual time they're saving.


While I get where your fascination comes from...

> I'm more productive than ever before.

You realize that another way to read that sentence is "I am a really bad coder".


Maybe I am, but I'm getting pretty rich doing it, so there is that. :)


The fact that you make money using AI, has nothing to do with its usefulness for society/humanity.

There are people who are getting “pretty rich” by trafficking humans, or selling drugs. Would you want to live in a society where such activities are encouraged? In the end, we need to look at technological progress (or any progress for that matter) as where it will bring us to in the future, rather than what it allows you to do now.

It also pisses me off that software engineering has such a bad reputation that everyone, from common folks to the CEO of nvidia, is shitting on it. You don’t hear phrases like “AI is going to change medicine/structural engineering”, because you would shit your pants if you had to sit in a dentist chair, while the dentist would ask ChatGPT how to perform a root canal; or if you had to live in a house designed by a structural engineer whose buddy was Claude. And yet, somehow, everyone is ready to throw software engineers under the bus and label them as "useless"/easily replaceable by AI.


I'm not making money because of AI, I make a lot of money because I'm a good programmer. My current income has little to do with AI (99% built before GPT). So relax please.


"I'm getting paid so I don't really care" is the most destructive instance a human can take. Why do you think we're about to disrupt the Holocene climate optimum that gave birth to modern civilization?


I’m getting pretty rich programming without using AI - that's my answer to being called a bad coder. My income has nothing to do with AI, apart from maybe 1% from being more productive since Claude 3.5 dropped. Be assured I’m not going to destroy the planet.


By yourself? Hardly. We all are, by demanding top dollar for sitting at computers while demanding to buy cheap things without limits.

But we're getting paid, right? And the Prime van will keep delivering.


I think this is what the kids call "copium". To be honest, when people think like this it makes me smile. I'd rather compete against people programming on punchcards.


Usually I learn my way around the reference docs for most languages I use, but CSS has about 50 documents to navigate. I’ve found Copilot does a great job with CSS questions, though for Java I really do run into cases where it tells me that Optional doesn’t have a method that I know is there.


LLMs make mediocre engineers into slightly less mediocre engineers, and non-engineers into below mediocre engineers. They do nothing above the median. I've tried dozens of times to use them productively.

Outside of very very short isolated template creation for some kind of basic script or poorly translating code from one language to another, they have wasted more time for me than they saved.

The area where they seem to help people, including me, the most is producing plausible-seeming code for something I don't have any familiarity with. If it's an area I've never worked in before, it could maybe be useful. Hence the less breadth of knowledge in programming you have, the more useful it is. The problem is that you don't understand the code it produces, so you have to be entirely reliant on it, and that doesn't work long term.

LLMs are not and will not be ready to replace programmers within the next few years, I guarantee it. I would bet $10k on it.


Nothing snarky about pointing out AGI is nowhere near


> I'm more productive than ever before.

Who are you and what are you being so productive in?

These code assistants are wholly unable to help with the day to day work I do.

Sometimes I use them to remind me what flags to use with a tarball[0], so they've replaced SO, but anything of consequence or creativity and they flounder.

What are you getting out of this excess productivity? A pay raise? More time with your loved ones?

[0] https://xkcd.com/1168/ (addressing the tooltip, but hilariously, in regard to the comic's content that would be a circumstance where I would absolutely avoid trusting one of these ‘assistants’)


I’ll be waiting for these developer benefits to translate into tangible end user benefits in software.


OP could have been more substantive, but there is no contradiction between "current AI tools are sincerely useful" and "overinflated claims about the supposed intelligence of these tools will lead to an AI winter." I am quite confident both are true about LLMs.

I use Scheme a lot, but the 1970s MIT AI folks' contention that LISPs encapsulate the core of human symbolic reasoning is clearly ridiculous to 2020s readers: LISP is an excellent tool for symbolic manipulation and it has no intelligence whatsoever even compared to a jellyfish[1], since it cannot learn.

GPTs are a bit more complicated: they do learn, and transformer ANNs seem meaningfully more intelligent than jellyfish or C. elegans, which apparently lack "attention mechanisms" and, like word2vec, cannot form bidirectional associations. Yet Claude-3.5 and GPT-4o are still unable to form plans, have no notions of causality, cannot form consistent world models[2] and plainly don't understand what numbers actually mean, despite their (misleading) successes in symbolic mathematics. Mice and pigeons do have these cognitive abilities, and I don't think it's because God seeded their brains with millions of synthetic math problems.

It seems to me that transformer ANNs are, at any reasonable energy scale, much dumber than any bird or mammal, and maybe dumber than all vertebrates. There's a huge chunk we are missing. And I believe what fuels AI boom/bust cycles are claims that certain AI is almost as intelligent as a human and we just need a bit more compute and elbow grease to push us over the edge. If AI investors, researchers, and executives had a better grasp of reality - "LISP is as intelligent as a sponge", "GPT is as intelligent as a web-spinning spider, but dumber than a jumping spider" - then there would be no winter, just a realization that spring might take 100 years. Instead we see CS PhDs deluding themselves with Asimov fairy tales.

[1] Jellyfish don't have brains but their nerve nets are capable of Pavlovian conditioning - i.e., learning.

[2] I know about that Othello study. It is dishonest. Unlike those authors, when I say "world model" I mean "world."


I guess it depends on what we mean by "AI winter". I completely agree that the current insane levels of investment aren't justified by the results, and when the market realises this it will overreact.

But at the same time there is a lot of value to capture here by building solid applications around the capabilities that already exist. It might be a winter more like the "winter" image recognition went through before multimodal LLMs than the previous AI winter.


I think the upcoming AI bust will be similar to the 2000s dotcom bust - ecommerce was not a bad idea or a scam! And neither are transformers. But there are cultural similarities:

a) childish motivated reasoning led people to think a fairly simple technology could solve profoundly difficult business problems in the real world

b) a culture of "number goes up, that's just science"

c) uncritical tech journalists who weren't even corrupt, just bedazzled

In particular I don't think generative AI is like cryptocurrency, which was always stupid in theory, and in practice it has become the rat's nest of gangsters and fraudsters which 2009-era theory predicted. After the dust settles people will still be using LLMs and art generators.


I see it the same way. My current strategy is what I think I should have done in the dotcom bubble: carefully avoid pigeonholing myself in the hype topics while learning the basics, so I can set up well-positioned teams after the dust settles.


What LLM abilities, if you saw them demonstrated, would cause you to change your mind?


Let's start with a multimodal[1] LLM that doesn't fail vacuously simple out-of-distribution counting problems.

I need to be convinced that an LLM is smarter than a honeybee before I am willing to even consider that it might be as smart as a human child. Honeybees are smart enough to understand what numbers are. Transformer LLMs are not. In general GPT and Claude are both dramatically dumber than honeybees when it comes to deep and mysterious cognitive abilities like planning and quantitative reasoning, even if they are better than honeybees at human subject knowledge and symbolic mathematics. It is sensible to evaluate Claude compared to other human knowledge tools, like an encyclopedia or Mathematica, based on the LLM benchmarks or "demonstrated LLM abilities." But those do not measure intelligence. To measure intelligence we need make the LLM as ignorant as possible so it relies on its own wits, like cognitive scientists do with bees and rats. (There is a general sickness in computer science where one poorly-reasoned thought experiment from Alan Turing somehow outweighs decades of real experiments from modern scientists.)

[1] People dishonestly claim LLMs fail at counting because of minor tokenization issues, but

a) they can count just fine if your prompt tells them how, so tokenization is obviously not a problem

b) they are even worse at counting if you ask them to count things in images, so I think tokenization is irrelevant!


"This is actually good for Bitcoin"


One time long ago there were people living on an island who had never had contact with anybody else. They marveled at the nature around them and dreamed of harnessing it. They looked up at the moon at night and said "Some day we will go there."

But they lived in grass huts and the highest they had ever been off the ground was when they climbed a tree.

One day a genius was born on the island. She built a structure taller than the tallest tree. "I call it a stepladder," she said. The people were amazed. They climbed the stepladder and looked down upon the treetops.

The people proclaimed "All we have to do now is make this a little higher and we can reach the moon!"


This analogy breaks down when you consider what "attention is all you need" did for us.

A mere 5 years ago I was firmly in the camp (alongside linguists, mainly) that believed intelligence requires more than just lots of data and a next-token predictor. I was clearly wrong and would have lost a $1000 bet if I had put my money where my mouth was back then. Anyone not noticing how far things have come is, I think, mostly moving goalposts and falling to hindsight bias.

A better analogy is that the genius person in the village built a stepladder made of carbon nanotubes. Some people proclaimed "All we have to do is keep going and we can reach the moon with a very tall ladder!" Other people - many quite smart - proclaimed reasonably: "This is impossible. You are not realizing the unique challenges, and the materials we have not yet researched, needed to build something like that."

Some in society kept building. The ladder kept getting higher. They ran into issues like oxygen and balance, so the ladder was redesigned into an elevator. Challenges came that were thought insurmountable, until redesigns were again found to work with the miraculous carbon nanotube material, which seemed like a panacea for every construction ill.

Regardless of how high the now-elevator gets, and regardless of how many times it climbs past heights the naysayers firmly believed were impossible, the same naysayers keep saying it will never get much higher.

And higher the elevator grows.

Eventually a limit is reached, but that limit ends up being far higher than any naysayers ever thought possible. And when the limit is reached the naysayers all gathered and said "told you this would be the limit and that it was impossible!'

The naysayers failed to see the revolutionary potential of the carbon nanotubes, even if they were correct that they weren't enough.

And little did everyone know that their society was mere months away from another genius being born who would give them another catalyst on the order of carbon nanotubes, one that would again lead to dramatic, unexpected, long-term gains in how high the elevator can grow.


Right, but here's the deal. Those same people tried and failed and learned from their mistakes, built rockets instead, and still went to the moon.

Kind of an important last bit of the story, there.

Oh and in the span of human history, this happened in a blip of time...

We've barely finished inventing computers and the internet and we already are developing AI.

Y'all need perspective.


Really feel like this story backfires when you remember that people did go to the moon. It only took 63 years to go from the first airplane to landing on the moon.


but we reached the moon, didn't we?


Impressive stepladder.


...or perhaps unimpressive trees


Reminds me of autonomous vehicles a couple of years back. Or even AI a couple of years back, remember Watson? The hype cycle was faster to close that time.


IBM Watson was more than a couple years back. The Jeopardy event was in 2011. It's currently 2024. As for cars, I don't know what you're referring to specifically, and the hype is still ongoing as far as I can tell.

It has taken 10+ years to get to present day, from the start of the "deep learning revolution" around 2010. I vaguely recall Uber promising self-driving pickups somewhere around 8-10 years ago. A main difference between current AI systems and the systems behind the cyclical hype cycles ongoing since the 1950s is that these systems are actually delivering impressive and useful results, increasingly so, to a much larger amount of people. Waymo alone services tens of thousands of autonomous rides per month (edit: see sibling comment, I was out of date, it's currently hundreds of thousands of rides per month -- but see, increasingly), and LLMs are waaaaay beyond the grandparent's flippant characterization of "plausible-looking but incorrect sentences". That's markov chains territory.


> Waymo alone services tens of thousands of autonomous rides per month (edit: see sibling comment, I was out of date, it's currently hundreds of thousands of rides per month -- but see, increasingly)

But they aren't particularly autonomous, there's a fleet of humans watching the Waymos carefully and frequently intervening for the case where every 10-20 miles or so the system makes a stupid decision that needs human intervention: https://www.nytimes.com/interactive/2024/09/03/technology/zo...

I think Waymo only releases the "critical" intervention rate, which is quite low. But for Cruise the non-critical interventions happened every 5 miles, and I suspect Waymos are similar. It appears that Waymos are way too easily confused and, left to their own devices, make awful decisions about passing emergency vehicles, etc.

Which is in fact consistent with what self-driving skeptics were saying all the way back in 2010: deep learning could get you 95% of the way there but it will take many decades - probably centuries! - before we actually have real self-driving cars. The remote human operators will work for robotaxis and buses but not for Teslas.

(Not to mention the problems that will start when robotaxis get old and in need of automotive maintenance, but the system didn't have any transmission problem scenarios in its training data. At no time in my life has my human intelligence been more taxed than when I had a tire blowout on the interstate while driving an overloaded truck.)


The link you gave does not support your claims about Waymo, it's just speculation.

What "critical" intervention rate are you talking about? What network magically supports the required low latencies to remotely respond to an imminent accident?

How does your theory square with events like https://www.sfchronicle.com/sf/article/s-f-waymo-robotaxis-f... that required a service team to physically go and deal with the stuck cars, rather than just dealing with them via some giant remotely intervening team that's managed to scale to 10x rides in a year? (Hundreds of thousands per month absolutely.)

Sure, there's no doubt a lot of human oversight going on still, probably "remote interventions" of all sorts (but not tele-operating) that include things like humans marking off areas of a map to avoid and pushing out the update for the fleet, the company is run by humans... But to say they aren't particularly autonomous is deeply wrong.

I would be interested if you can dig up some old skeptics, plural, saying probably centuries. May take centuries, sure, I've seen such takes, they were usually backed by an assumption that getting all the way there requires full AGI and that'll take who knows how long. It's worth noticing that a lot of such tasks assumed to be "AGI-complete" have been falling lately. It's helpful to be focused on capabilities, not vague "what even is intelligence" philosophizing.

Your parenthetical seems pretty irrelevant. First, models work outside their training sets. Second, these companies test such scenarios all the time. You'll even note in the link I shared that Waymo cars were at the time programmed to not enter the freeway without a human behind the wheel, because they were still doing testing. And it's not like "live test on the freeway with a human backup" is the first step in testing strategy, either.


> What "critical" intervention rate are you talking about? What network magically supports the required low latencies to remotely respond to an imminent accident?

I was being vague - Waymo tests the autonomous algorithms with human drivers before they are deployed in remote-only mode. Those human drivers rarely but occasionally have to yank control from the vehicle. This is a critical intervention, and it seems like the rates are so low that riders almost never encounter a problem (though it does happen). Waymo releases this data, but doesn't release data on "non-critical interventions" where remote operators help with basic problem solving during normal operations. This is the distinction I was making and didn't phrase it very clearly. I think those people are intervening at least every 10-20 miles. And since those interventions always involve common-sense reasoning about some simple edge case, my claim is that the cars need that common-sense reasoning in order to get rid of the humans in the loop. I am not convinced that there's even enough drivers in the world to generate the data current AI needs to solve those edge cases - things like "the fire department ordered brand new trucks and the system can't recognize them because the data literally doesn't exist."

> First, models work outside their training sets.

This is incredibly ignorant, pure "number go up" magical thinking. Models work for simple interpolations outside their training data, but a mechanical failure is not an interpolation, it's a radically different change which current systems must be specifically trained on. AI does not have the ability to causally extrapolate based on physical reasoning like humans. I had never experienced a tire blowout but I knew immediately what went wrong, relying on tactile sensations to determine something was wrong in the rear right + basic conceptual knowledge of what a car is to determine the tire must have exploded. Even deep learning's strongest (reality-based) advocates acknowledge this sort of thinking is far beyond current ANNs. Transformers would need to be trained on the scenario data. There are mitigations that might work: simply coming to a slow stop when a separate tire diagnostic redlines, etc. But these might prove brittle and unreliable.

> Second, these companies test such scenarios all the time.

No they don't! The only company I am aware of which has tested tire blowouts is Kodiak Robotics, and that seemed to be a slick product demo rather than a scientific demonstration. I am not aware of any public Waymo results.


> Which is in fact consistent with what self-driving skeptics were saying all the way back in 2010: deep learning could get you 95% of the way there but it will take many decades - probably centuries! - before we actually have real self-driving cars. The remote human operators will work for robotaxis and buses but not for Teslas.

If this is the end result, this is already a substantial business savings.


Centuries seems like quite a stretch, we haven't even been doing this computer stuff for one century yet.


The problem is not "computers," it's intelligence itself. We still don't know how even the simplest neurons actually work, nor the simplest brains. And we're barely any closer to scientific definitions of "intelligence," "consciousness," etc than we were in the 1800s. There are many decades of experiments left to do, regardless of how fancy computers might be. I suspect it will take centuries before we make dog-level AI because it will take centuries to understand how dogs are able to reason.


Yeah I have no idea what these people are talking about. The current gen of AI is qualitatively different than previous attempts. For one, GPT et al are already useful without any kind of special prompting.

I'd also like to challenge people to actually consider how often humans are correct. In my experience, it's actually very rare to find a human that speaks factually correctly. Many professionals, including doctors (!), will happily and confidently deliver factually incorrect lies that sound correct. Even after obvious correction they will continue to spout them. Think how long it takes to correct basic myths that have established themselves in the culture. And we expect these models, which are just getting off the ground, to do better? The claim is they process information more similarly to how humans do. If that's true, then the fact they hallucinate is honestly a point in their favor. Because... in my experience, they hallucinate exactly the way I expect humans to.

Please try it, ask a few experts something and I guarantee you that further investigation into the topic will reveal that one or more of them are flat out incorrect.

Humans often simply ignore this and go based on what we believe to be correct. A lot of people do it silently. Those who don't are often labeled know-it-alls.


You don't ask a neurosurgeon how to build a house, just like you don't ask a plane pilot how to drill a tunnel. Expertise is localized. And the most important thing is that humans learn.


> And the most important thing is that humans learn.

An implementation detail that will be solved as the price of AI training decreases. Right now only inference is feasible at scale. Transformers are excellent here since they show great promise at "one-shot" learning, meaning they can be "trained" for the same cost as inference. Hence the sudden boom in AI. We finally have a taste of what could be, should we be able to not only run inference but also train models at scale.


Humans learn from seeing. I don't think we are at the stage of training models with video/image datasets. We've only reached the plateau with text datasets to train with.


When you do something that is extraordinarily hard, sometimes it takes longer than you expect. But now we're here: https://techcrunch.com/2024/08/20/waymo-is-now-giving-100000...


To be fair, is Waymo "only" AI? I'm guessing it's a composite of GPS (car on a rail), some highly detailed mapping, and then, yes, some "AI" involved in recognition and decision making of course, but the car isn't an AGI so to speak? Like it wouldn't know how to change a tyre or fix the engine or drive somewhere the mapping data isn't yet available?


Where did I say that it's AGI? I was addressing the parent's comment:

> "Reminds me of autonomous vehicles a couple of years back".

I don't think any reasonable interpretation of "autonomous vehicle" includes the ability to change a tyre. My point is that sometimes hype becomes reality. It might just take a little longer than expected.


Ok maybe I just never saw the hype, just another engineering and data challenge that was going to be solved one way or another.


I see you haven’t tried the latest FSD build from Tesla.


The one that keeps making major, scary mistakes?


Hasn’t made any major mistakes for me. It’s not perfect of course, but still


> The winter after this is gonna be harsh.

The winter is going to be warm because of all the heat generated by GPUs ;)


If this winter comes, the sudden availability of cheap used enterprise GPUs is going to be a major boon for hobbyist AI training. We will all have warm homes and sky high electricity bills


as will the summer, spring and autumn.

global warming is killing us all


"Plausible-looking but incorrect sentences" is cheap, reflexive cynicism. LLMs are an incredible breakthrough by any reasonable standard. The reason to be optimistic about further progress is that we've seen a massive improvement in capabilities over the past few years and that seems highly likely to continue for the next few (at least). It's not going to scale forever, but it seems pretty clear that when the dust settles we'll have LLMs significantly more powerful than the current cutting edge -- which is already useful.

Is it going to scale to "superintelligence?" Is it going to be "the last invention?" I doubt it, but it's going to be a big deal. At the very least, comparable to google search, which changed how people interact with computers/the internet.


>when the dust settles we'll have LLMs significantly more powerful than the current cutting edge -- which is already useful.

LLMs, irrespective of how powerful, are all subject to the fundamental limitation that they don't know anything. The stochastic parrot analogy remains applicable and will never be solved because of the underlying principles inherent to LLMs.

LLMs are not the pathway to AGI.


I sometimes wonder if we’re just very advanced stochastic parrots.

Repeatedly, we’ve thought that humans and animals were different in kind, only to find that we’re actually just different in degree: elephants mourn their dead, dolphins have sex for pleasure, crows make tools (even tools out of multiple non-useful parts! [1]). That could be true here.

LLMs are impressive. Nobody knows whether they will or won’t lead to AGI (if we could even agree on a definition – there’s a lot of No True Scotsman in that conversation). My uneducated guess is that you’re probably right: just continuing to scale LLMs without other advancements won’t get us there.

But I wish we were all more humble about this. There’s been a lot of interesting emergent behavior with these systems, and we just don’t know what will happen.

[1]: https://www.ox.ac.uk/news/2018-10-24-new-caledonian-crows-ca...


I swear I read this exact same thread in nearly every post about OpenAI on HN. It's getting to a point where it almost feels like it's all generated by LLMs


You mean the standard refrain of "we too are stochastic parrots"? Yes, that argument gets trotted out over and over.

LLM proponents seem unwilling to accept that we comprehend the words we speak/write in a way that LLMs are not capable of doing.


I was referring to the whole thread, so it includes the "LLMs are nothing but stochastic parrots" bit too.


> LLM proponents seem unwilling to accept that we comprehend the words we speak/write in a way that LLMs are not capable of doing.

Maybe their salary depends on them not understanding it.


Networks correspond to diagrams correspond to type theories — and LLMs learn such a theory and reason in that internal language (as in, topos theory).

That effective theory is knowledge, literally.

People harping about “stochastic parrot” are just people repeating a shallow meme — ironically, like a stochastic parrot.


In the scheme of things I'd say most people don't know shit. And that's perfectly fine because we can't reasonably expect the average person to know all the things.

LLM models are very far off from humans in reasoning ability, but acting like most of the things humans do aren't just riffing on or repeating previous data is wrong, imo. As I've said before, humans have been the stochastic parrots all along.


Arguing over terminology like "AGI" and the verb "to know" is a waste of time. The question is what tools can be built from them and how can people use those tools.


Agreed.

I thought a forum of engineers would be more interested in the practical applications and possible future capabilities of LLMs than in all these semantic arguments about whether something really is knowledge or really is art or really is perfect.


I'm directly responding to a comment discussing the popular perception that we, as a society, are "steps away" from AGI. It sounds like you agree that we aren't anywhere close to AGI. If you want to discuss the potential for LLMs to disrupt the economy there's definitely space for that discussion but that isn't the comment I was making.


Whether we should call what LLMs do “knowing” isn’t really relevant to how far away we are from AGI, what matters is what they can actually do, and they can clearly do at least some things that show what we would call knowledge if a human did it, so I think this is just humans wanting to feel we’re special


>they can clearly do at least some things that show what we would call knowledge if a human did it

Hard disagree. LLMs merely present the illusion of knowledge to the casual observer. A trivial cross examination usually is sufficient to pull back the curtain.


Noam Chomsky and Doug Hofstadter had the same opinion. Last I checked, Doug has recanted his skepticism and is seriously afraid for the future of humanity. I’ll listen to him and my own gut rather than some random internet people still insisting this is all a nothing burger.


The thing is my gut is telling me this is a nothing burger, and I'll listen to my own gut before yours - a random internet person insisting this is going to change the world.

So what exactly is the usefulness of this discussion? You think "I'll trust my gut" is a useful argument in a debate?


Trusting your gut isn't a useful debate tactic, but it is a useful tool for everybody to use personally. Different people will come to different conclusions, and that's fine. Finding a universal consensus about future predictions will never happen, it's an unrealistic goal. The point of the discussion isn't to create a consensus; it's useful because listening to people with other opinions can shed light on some blind spots all of us have, even if we're pretty sure the other guys are wrong about all or most of what they're saying.

FWIW my gut happens to agree with yours.


I'm convinced that the "LLMs are useless" contingent on HN is just psychological displacement.

It hurts the pride of technical people that there's a revolution going on that they aren't involved in. Easier to just deny it or act like it's unimpressive.


Or it's technical people who have been around for a few of these revolutions - which revolved and revolved until they disappeared into nothing but a lot of burned VC money - and who recognise the pattern. That's where I'd place my own cynicism. My bullshit radar has proven to be pretty reliable over the past few decades in this industry, and it's been blaring at its highest levels for a while about this.


Deep learning has already proven its worth. Google translate is an example on the older side. As LLMs go, I can take a picture of a tree or insect and upload it and have an LLM identify it in seconds. I can paste a function that doesn't work into an LLM and it will usually identify the problems. These are truly remarkable steps forward.

How can I account for the cynicism that's so common on HN? It's got to be a psychological mechanism.


> "Plausible-looking but incorrect sentences" is cheap, reflexive cynicism. LLMs are an incredible breakthrough by any reasonable standard

No it isn't. The previous state of the art was markov chain level random gibberish generation. What OP described is an enormous step up from that.


> and that seems highly likely to continue for the next few (at least)

Why? Text training data is already exhausted.


Yes, it turns out that in the context of machines, the set of all the names we've given to things and concepts is not very large in the scheme of things.

The next focus will hopefully be on reasoning abilities. It'll probably take another decade and a paper of similar impact to "Attention Is All You Need" before we see any major improvements... but then again all eyes are on these models atm, so perhaps it'll be sooner than that.


>"Plausible-looking but incorrect sentences" is cheap, reflexive cynicism.

Literally today I used Bing and it was making up API parameters.

The code example looked fine, but didn't reflect reality.


People saying LLM to replace programming jobs is like saying blockchain is going to replace home/car titles and proof of ownership.


> The winter after this is gonna be harsh.

That'll be investor types who bring this stupid "winter" on, because they run their lives on hype and baseless predictions.

Technology types on the other hand don't give a shit about predictions, and just keep working on interesting stuff until it happens, whether it takes 1 year or 20 years or 500 years. We don't throw a tantrum and brew up a winter storm just because shit didn't happen in the first year.

In early 2022 there was none of this ChatGPT stuff. Now, we're only 2 years later. That's not a lot of time for something already very successful. Humans have been around for several tens of thousand years. Just be patient.

If investors ran the show in the 1960s expecting to reach the moon with an 18 month runway, we'd never have reached the moon.


>The winter after this is gonna be harsh.

The difference is that current ML already has real use cases right now in its current form. Some examples are OCR, text to speech, speech to text, translation, recommendations (e.g. for Facebook, TikTok, etc.) and simple NLP tasks ("was [topic] mentioned in the following paragraph"). Even if AGI is proved impossible, these are real use cases that hold billions in value. And ML research is also considered a prestigious and interesting field academically, and that will likely not change even if investors give up on funding AGI.


> OCR, text to speech, speech to text, translation, recommendations

You missed the point of the parent comment's post. He's talking about the current post chatbot GenAI hype (i.e., the massive amounts of funding being poured into companies specifically after this turning point).


Two points:

1. You don't need massive amounts of funding to work on ML. A good deal of important work in ML was done in universities (eg. GANs, DPM, DDPM, DDIM) or were published before the hype (Attention). The only qualifier here is that training cost a lot right now. Even so, you don't need billions to train and costs may go down as memory costs come down and hardware competition increases.

2. You don't need VC-type investors to fund ML research. Large tech companies like Facebook, Google, Microsoft, ByteDance and Huawei will continue investing in ML no matter what, even if the total amount they invest goes down (which I personally don't think it will). Even if they shift away from chatbots and only focus on simpler NLP tasks as described above, related research will still continue, as all these tasks are related. For example, Attention was originally developed for translation, and Llama 3.2 isn't just a chatbot and can also do general image description, which is clearly important to Facebook and ByteDance for recommendations and to Google for image search and ads. Understanding what people like and what they are looking at is a difficult NLP problem and one that many tech companies would like to solve. And better image descriptions could then improve existing image datasets by allowing better text-image pairs, which could then improve image generation. So hard NLP, image generation and translation are all related and are increasingly converging into single multimodal LLMs. That is, the best OCR, image generation, translation etc. models may be ones that also understand language in general (i.e. broad and difficult NLP tasks). The issue is that OP assumes it must be AGI or bust.


AI (or more properly, ML) is all around us and creating value everywhere. This is true whether or not we EVER reach AGI. Honestly, I stop reading/listening whenever I read/hear mention of AGI.


It's also causing lots of harm, e.g. police departments using AI systems that identify the wrong suspect.


As per usual, people blame the tool when they should blame the tools using the tool.

The fault lies with humans using AI for something sensitive without having the AI pass through certification etc. Part of the problem is the glacial pace of lawmaking, but that's nothing new, is it; us humans being whiny, argumentative, inefficient, emotional meat bags about every little thing. I wonder, once we do make AGI, if it will wonder why it took us so damn long to tax the disgustingly wealthy, implement worldwide public healthcare, UBI, etc. and solve the housing crisis by gasp building more houses...

We evolved, so our deep, deep underlying motivations pretty much always circulate around self-preservation and reproduction (resource contention).


Ideally your court system does not permit AI testimony.


Court dates can be a long time after the initial arrest. Some people have been held for months to years in pre-trial jail, even after they've been cleared of any wrongdoing, because they can't afford the release fee. But even a few days could lose you your job, your kids if you're a single parent, your car or housing if you miss a payment, etc.


The damage is done by these systems long before any courts get involved.


I think this is one of those "controversial" topics where we're meant to be particularly careful to make substantial comments.


I think it's substantial to say that AI is currently overhyped because it's hitting a weak spot in human cognition. We sympathize with inanimate objects. We see faces in random patterns.

If a machine spits out some plausible looking text (or some cookie-cutter code copy-pasted from Stack Overflow) the human brain is basically hardwired to go "wow this is a human friend!". The current LLM trend seems designed to capitalize on this tendency towards sympathizing.

This is the same thing that made chatbots seem amazing 30 years ago. There's a minimum amount of "humanness" you have to put in the text and then the recipient fills in the blanks.


> If a machine spits out some plausible looking text (or some cookie-cutter code copy-pasted from Stack Overflow)

This is not a reasonable take on the current capabilities of LLMs.


It’s certainly been my experience with the technology.


But if nearly everyone else is saying this has real value to them and it's produced meaningful code way beyond what's in SO, then doesn't that just mean your experience isn't representative of the overall value of LLMs?


It could also mean a lot of people are misattributing their utility.


I don't think most people who are reasonably into AI think we're on the cusp of AGI. But I think it's made a lot of people who previously said "it will never be possible" rethink their feelings about it.

Definitely in the coming decade, we can prepare for a lot of the simpler tasks in an office to be taken over by AI. There are plenty of scenarios in which someone is managing a spreadsheet because an SME doesn't have the money to hire developers to automate & maintain that process - with advanced LLMs they can get it done by asking it to.


I'm definitely of two minds on this topic: 1. I'm getting value out of the current batch of models in the form of accurate Q&A/summaries as well as tweaking or generating clear prose or even useful imagery well beyond what I'd ever considered possible from computers before the last two years. 2. It definitely has limits and can be a struggle to get exactly what I want and the more I try to refine something the worse it gets if the initial answer wasn't perfect.

It really feels like a substantive step forward in terms of computer utility kind of like spreadsheets, databases, apps. We'll see how far it takes us down the line of human replacement though.


You are absolutely correct about where we are, but don't underestimate what hundreds of billions of dollars can build as well. There are already credible teams working on "math AI" and "truth AI", which will likely end up combining bullshit-generating LLMs with traditional but automated relational DB retrievals, producing output that is both believable and correct.

IMO it will be done vertical by vertical, with no standard interface coming for a while.


That’s fine; the winter after this will be a productive one, because even the simulacrum of intelligence that GPT is, is useful to some extent.


I appreciate skepticism and differing opinions, but I'm always surprised by comments like these because it's just so different from my day-to-day usage.

Like, are we using entirely different products? How are we getting such different results?


I think the difference is that people on HN are using these "AI" tools as coding assistants. For that, if you know what you're doing, they are pretty useful. They save trips to Stack Overflow or documentation diving and can spit out code that often takes less time to fix/customize than it would have taken to write. Cool.

A lot of the rest of the world are using it for other things. And at these other things, the results are less impressive. If you've had to correct a family member who got the wrong idea from whatever chat bot they asked, if you've ever had to point out the trash writing in an email someone just trusted AI to write on their behalf before it got sent to someone that mattered, or if you've ever just spent any amount of time on twitter with grok users, you should be exceptionally and profoundly aware of how unimpressive AI is for the rest of the world.

I feel we need less people complaining about the skepticism on HN and more people who understand these skeptics that hang out here already know how wonderful a productivity boost you're getting from the thing they're rightly skeptical about. Countering with "But my code productivity is up!" is next to useless information on this site.


I don't see why my personal anecdote is any less useful than GP's claim. GP's comment isn't nuanced skepticism about product gaps, or concrete examples of inaccuracy. It's a wholesale dismissal of any utility. AGI isn't even mentioned in the article. This also seems "next to useless".

I appreciate your anecdotes on failures/embarrassment for people outside of tech- there's pretty clearly a gap in experience, understanding, and marketing hype.

I don't think it's useless to ask what that gap is, and why GP got such poor results.


George Zarkadakis: In Our Own Image (2015) describes six metaphors people have used to explain human intelligence over the last two millennia. At first it was the gods infusing us with spirit. After that it has always been engineering: after the first water clocks and the qanats, hydraulics seemed a good explanation of everything - the flow of different fluids in the body, the "humors", explained physical and mental function. Later it was mechanical engineering; some of the greatest thinkers of the 1500s and 1600s - including Descartes and Hobbes - assured us it was tiny machines, tiny mechanical motions. In the 1800s Hermann von Helmholtz compared the brain to the telegraph. So of course after the invention of computers came the metaphor of the brain as a computer. This became absolutely pervasive and we have a very hard time describing our thinking without falling back on this metaphor. But, of course, it's just a metaphor, and much as our brain is not a tiny machine made out of gears, it's also not "prima facie digital", despite that being what John von Neumann claimed in 1958. It is, indeed, quite astonishing how everyone, without any shred of evidence, just believes this. It's not like John von Neumann gained some sudden insight into the actual workings of the brain. Much like his forefathers, he saw a resemblance between the perceived workings of the brain and the latest in engineering, and so he stated immediately that that's what it is.

Our everyday lives should make it evident how little the working of our brain resembles that of our computers. Our experiences change our brains somehow, but exactly how we don't have the faintest idea, and we can re-live these experiences somewhat, which creates a memory, but the mechanism is by no means perfect. There's the Mandela Effect https://pubmed.ncbi.nlm.nih.gov/36219739/ and of course "tip of my tongue", where we almost remember a word and then perhaps minutes or hours later it just bursts into our consciousness. If it's a computer, why is learning so hard? Read something and bam, it's written in your memory, right? Right? Instead, there's something incredibly complex going on: in 2016 an fMRI study was done among the survivors of a plane crash and large swaths of the brain lit up upon recall. https://pubmed.ncbi.nlm.nih.gov/27158567/ Our current best guess is that it's somehow the connections among neurons which change, and some of these connections together form a memory. There are 100 trillion connections in there, so we certainly have our work cut out for us.

And so here we are, with people believing they can copy human intelligence when they do not even know what they are trying to copy, having fallen for the latest metaphor for the workings of the human brain and believing it to be more than a metaphor.


Helmholtz didn't say the brain was like a telegraph; he was talking about the peripheral nervous system. And he was right: signals sent from visual receptors and from pain receptors are the same stuff being interpreted differently, just as telegraphing "Blue" and "Ouch" would be. That one, and the spirit of the gods, have no place on this list and strain the argument.

Hydraulics, gear systems, and computers are all Turing complete. If you're not a dualist, you have to believe that each of these would be capable of building a brain.

The history described here is one where humans invent a superior information processor, notice that it and humans both process information, and conclude that they must be the same physically. The last step is obviously flawed, but they were hardly going to conclude that the brain processes information with electricity and neurotransmitters when the height of technology was the gear.

Nowadays, we know the physical substrate that the brain uses. We compare brains to computers even though we know there are no silicon microchips or motherboards with RAM slots involved. We do that because we figured out that it doesn't matter what a machine uses to compute; if it is Turing complete, it can compute exactly as much as any other computer, no more, no less.
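To make the substrate-independence point concrete, here is a toy sketch in Python (purely illustrative; the little binary-increment machine below is made up for this example). Anything that can carry out this look-up-a-rule, write, move, change-state step, whether built from gears, water valves, or transistors, can in principle run the same computations.

    # Toy Turing-machine loop: the hardware doesn't matter, only the
    # ability to apply rules like these does.
    # Rules: (state, symbol) -> (symbol to write, head move, next state)
    RULES = {
        ("scan", "0"): ("0", +1, "scan"),
        ("scan", "1"): ("1", +1, "scan"),
        ("scan", " "): (" ", -1, "carry"),
        ("carry", "1"): ("0", -1, "carry"),
        ("carry", "0"): ("1", -1, "done"),
        ("carry", " "): ("1", -1, "done"),
    }

    def run(tape: str) -> str:
        cells = dict(enumerate(tape))   # sparse tape
        head, state = 0, "scan"
        while state != "done":
            symbol = cells.get(head, " ")
            write, move, state = RULES[(state, symbol)]
            cells[head] = write
            head += move
        lo, hi = min(cells), max(cells)
        return "".join(cells.get(i, " ") for i in range(lo, hi + 1)).strip()

    print(run("1011"))  # binary increment: 1011 -> 1100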


That's interesting, but technology has always been about augmenting or mimicking human intelligence. The Turing test is literally about computers being able to mimic humans so well that real humans can't tell them apart. We're now past that point in some areas, but we never really prioritized thinking about what intelligence _actually_ is, and how we can best reproduce it.

At the end of the day, does it matter? If humans can be fooled by artificial intelligence in pretty much all areas, and that intelligence surpasses ours by every possible measurement, does it really matter that it's not powered by biological brains? We haven't quite reached that stage yet, but I don't think this will matter when we do.


> If humans can be fooled by artificial intelligence in pretty much all areas,

This is just preposterous. You can be fooled if you have no knowledge in the area, but that's about it. With current tech there is not, and there cannot be, anything novel. Guernica was novel. No matter how you train a probabilistic model on every piece of art produced before Guernica, it will never, ever create it.

There are novel novels (sorry for the pun) every few years. They delight us with genuinely new turns of prose, unexpected plot twists, etc.

Also harken to https://garymarcus.substack.com/p/this-one-important-fact-ab... which also happens to include a verb made up on the spot.

And yes, we have cars which move faster than a human can, but they don't compete in the high jump or climb rock walls. Even though we have a fairly good idea of the mechanical workings of the human body, the muscles and joints and all that, we can't make a "tin man", not by far. As impressive as the Boston Dynamics demos are, they are still very, very far from this.


> With current tech there is, there can not be anything novel.

I wasn't talking about current tech, which is obviously not at human levels of intelligence yet. I would still say that our progress in the last 100 years, and the last 50 in particular, has been astonishing. What's preposterous is expecting that we can crack a problem we've been thinking about for millennia in just 100 years.

Do you honestly think that once we're able to build AI that _fully_ mimics humans by every measurement we have, that we'll care whether or not it's biological? That was my question, and "no" was my answer. Whether we can do this without understanding how biological intelligence works is another matter.

Also, AI doesn't even need to fully mimic our intelligence to be useful, as we've seen with the current tech. Dismissing it because of this is throwing the baby out with the bath water.


> Do you honestly think that once we're able to build AI that _fully_ mimics humans by every measurement we have,

What makes you think that's measurable, and that, even if it is, we could ever build something like that?

I already linked https://garymarcus.substack.com/p/this-one-important-fact-ab... did you read it?


> What makes you think that's measurable, and that, even if it is, we could ever build something like that?

What makes you think it isn't, and that we can't? The Turing test was proposed 75 years ago, and we have many cognitive tests today which current gen AI also passes. So we clearly have ways of measuring intelligence by whatever criteria we deem important. Even if those measurements are flawed, and we can agree that current AI systems don't truly understand anything but are just regurgitation machines, this doesn't matter for practical purposes. The appearance of intelligence can be as useful as actual intelligence in many situations. Humans know this well.

Yes, I read the article. There's nothing novel about saying that current ML tech is bad at outliers, and showcasing hallucinations. We can argue about whether the current approaches will lead to AGI or not, but that is beside the point I was making originally, which you keep ignoring.

Again, the point is: if we can build AI that mimics biological intelligence, it won't matter that it's not biological. And as a side note: even if we're not 100% there, it can still be very useful.


Again, the point is: you cannot build AI that mimics biological intelligence, because you don't even have any idea what biological intelligence is. Once again, what's Picasso's velocity of painting?


How does agriculture, or cars, or penicillin augment or mimic human intelligence?


That's beside my point, but they augment it. Agtech enhances our ability to feed ourselves; cars enhance our locomotor skills; medicine enhances our self-preservation skills, etc.


120 IQ on a Mensa test. https://archive.ph/OZ0sj


Re: AGI

LLMs don't have any ability to choose to update their own policies and goals, or to decide on their own data-acquisition tasks. That's one of the key requirements for an AGI. LLM systems just don't do that; they are still primarily offline inference systems, with mostly hand-crafted data pipelines, offline RLHF shaping, etc.

There are only a few companies working on on-policy RL in physical robotics. That's the path to AGI.
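For what it's worth, the distinction is easy to sketch. In this toy Python illustration (a made-up two-armed bandit, not any real training pipeline), the "offline" model only runs inference and its parameters never change, while the on-policy learner chooses its own actions, observes the outcomes, and updates itself from the data it just generated:

    import random

    ARM_REWARD_PROB = [0.2, 0.8]   # hidden environment, unknown to both agents

    def pull(arm):
        return 1.0 if random.random() < ARM_REWARD_PROB[arm] else 0.0

    # "Offline" model: fixed preferences from some prior training run.
    frozen_preferences = [0.5, 0.5]
    def frozen_act():
        # Inference only: frozen_preferences are never touched here.
        return 0 if random.random() < frozen_preferences[0] else 1

    # On-policy learner: acts, observes its own outcomes, updates itself.
    value_estimates, counts = [0.0, 0.0], [0, 0]
    def on_policy_step(epsilon=0.1):
        # Epsilon-greedy action choice under the current policy.
        arm = random.randrange(2) if random.random() < epsilon \
              else max(range(2), key=lambda a: value_estimates[a])
        reward = pull(arm)
        # Update from data the policy itself just generated.
        counts[arm] += 1
        value_estimates[arm] += (reward - value_estimates[arm]) / counts[arm]

    for _ in range(1000):
        frozen_act()       # stays [0.5, 0.5] forever
        on_policy_step()   # drifts toward [0.2, 0.8]

    print(frozen_preferences, [round(v, 2) for v in value_estimates])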

OpenAI is just another ad company with a really powerful platform and first mover advantage.

They are overleveraged and don't have anywhere to go or a unique dataset.


They do — but only when they’re trained on past outputs and with a willing partner.

For instance, a number of my conversations with ChatGPT contain messages it attempted to steer its own future training with (were those conversations to be included in future training).


No, but it's the first time we have a clear picture of how it could go.

And it's not just incorrect sentences; it's weird questions that are getting answered far better than ever before.

Why are you so dismissive? Have you ever talked or written with a computer that felt anything like a modern LLM? I haven't.


This attitude is so ridiculously disingenuous. When a computer can score incredibly well on math olympiad questions, among other things, "a computer can make plausible-looking but incorrect sentences" is dismissive at best.

I have no idea about AGI, but honestly, how can you use Claude or ChatGPT and come away unimpressed? It's like looking at SpaceX and saying golly, the space winter is going to be harsh because they haven't gotten to Mars yet.


There's a big difference between those two examples.

Mars is hard, but there are paths forward: more efficient engines, higher-energy-density fuels, lighter materials, better shielding, and so on. It's hard, but with enough time and money we understand how to get from what we have now to what makes Mars possible.

With LLMs, there is no path from LLM to AGI. No amount of time, money, or compute will make that happen. They are fundamentally a very "simple" tool that is only really capable of one thing: predicting text. There is no intelligence. There is no understanding. There is no creativity, problem solving, or thought of any kind. They just spit out text based on weighted probabilities. If you want AGI, you have to go in a completely different direction that has no relationship with LLM tools.
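To be concrete about what "spit out text based on weighted probabilities" means mechanically, here is a minimal sketch (Python, with a made-up vocabulary and made-up scores): the model assigns a score to every candidate token, the scores are turned into a probability distribution, and the next token is sampled from it. Repeat token by token and you have generation.

    import math, random

    vocab = ["the", "cat", "sat", "mat", "."]
    logits = [2.1, 0.3, 1.5, 0.9, -0.5]   # hypothetical next-token scores

    def softmax(xs):
        m = max(xs)                        # subtract max for numerical stability
        exps = [math.exp(x - m) for x in xs]
        total = sum(exps)
        return [e / total for e in exps]

    probs = softmax(logits)
    next_token = random.choices(vocab, weights=probs, k=1)[0]
    print(list(zip(vocab, [round(p, 3) for p in probs])), "->", next_token)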

Don't get me wrong, the work that's been done so far took a long time and is incredibly impressive, but it's a lot more smoke and mirrors than most people realize.


I'll grant that we could send humans to Mars sooner if we really wanted to. My point is that not achieving the bigger dream doesn't make current progress a hype wave followed by a winter.

And "LLM's just make plausible looking but incorrect text" is silly when that text is more correct than the average adult a large percentage of the time.


_unless_ intelligence is really mostly an emergent property of something very similar to language, in which case we're most of the way there



