
I can't understand how anyone can use these tools (copilot especially) to make entire projects from scratch and expand them later. They just lead you down the wrong path 90% of the time.

Personally I much prefer ChatGPT. I give it specific small problems to solve, plus some context. At most 100 lines of code. If it gets more than that, the quality goes to shit. In fact, Copilot feels like ChatGPT given too much context.




I hear it all the time on HN that people are producing entire apps with LLMs, but I just don't believe it.

All of my experience with LLMs has been that anything beyond a braindead-simple for loop comes back as unworkable garbage that takes more effort to fix than if you had just written it from scratch to begin with. And then you're immediately met with "You're using it wrong!", "You're using the wrong model!", "You're prompting it wrong!" and my favorite, "Well, it boosts my productivity a ton!".

I sat down with the "AI Guru", as he calls himself at work, to see how he works with it and... he doesn't. He'll ask it something, write an insanely comprehensive prompt, and it spits out... generic trash that looks the same as the output I get when I provide two sentences total, and it doesn't even work properly. But he still stands by it, even though I'm actively watching him dump everything he just wrote up for the AI and start implementing things himself. I don't know what to call this phenomenon, but it's shocking to me.

Even something that should be in its wheelhouse, like producing simple test cases, it often just can't do to a satisfactory level. I've tried every one of these shitty things on the market because my employer pays for them (I would never in my life spend my own money on this crap), and it just never works. I feel like I'm going crazy reading all the hype, but I'm slowly starting to suspect that most of it is just covert shilling by vested persons.


The other day I decided to write a script (that I needed for a project, but ancillary, not core code) entirely with CoPilot. It wasn't particularly long (maybe 100 lines of python). It worked. But I had to iterate so much with the LLM, repeating instructions, fixing stuff that didn't run, that it took a fair bit longer than if I had just written it myself. And this was a fairly vanilla data science type of script.


Most of the time these "entire apps" are just a timer app or something similarly simple, never a complex app with tons of logic in it. And if you're having to write paragraphs of text to produce something complex, you might as well just write it in a programming language; I mean, isn't that what high-level programming languages were built for? (heh). Also, you're not the only one who's had the thought that someone is vested in some way in overhyping this.


You can write the high level structure yourself and let it complete the boilerplate code within the functions, where it's less critical/complicated. Can save you time.
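For instance (a made-up Python sketch; the names are just placeholders), you write the signature and the part that actually matters, and let it fill in the mechanical body:

    import csv

    def load_records(path: str) -> list[dict]:
        # Mechanical boilerplate the assistant can usually complete fine:
        # open the file, parse the rows, return them as plain dicts.
        with open(path, newline="") as f:
            return [dict(row) for row in csv.DictReader(f)]

    def score_records(records: list[dict]) -> list[dict]:
        # The part worth writing yourself: the actual logic you care about.
        ...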


Oh for sure. I use it as smart(ish) autocomplete to avoid typing everything out or looking things up in the docs every time, but the thought of prompt-engineering my way to an app is just bizarre to me. It almost feels like it has more friction than actually writing the damn thing yourself.


You aren’t the only one that feels this way.

After 20 years of being held accountable for the quality of my code in production, I cannot help but feel a bit gaslit that decision-makers are so elated with these tools despite their flaws that they threaten to take away jobs.


Here is another example [0]: 95% of the code was taken as-is from the examples in the documentation. If you still need to read the code after it was generated, you may as well have read the documentation first.

When they say treat it like an intern, I'm so confused. An intern is there to grow and hopefully replace you as you get promoted or leave. The tasks you assign to him are purposely kept simple for him to learn the craft. The monotonous ones should be done by the computer.

[0]: https://gist.github.com/simonw/97e29b86540fcc627da4984daf5b7...


I think to the extent this works for some people it’s as a way to trick their brains into “fixing” something broken rather than having to start from scratch. And for some devs, that really is a more productive mode, so maybe it works in the end.

And that’s fine if the dev realizes what’s going on but when they attribute their own quirks to AI magic, that’s a problem.


As a non-programmer at a non-programming company:

I use it to write test systems for physical products. We used to contract the work out or just pay someone to manually do the tests. So far it has worked exceptionally well for this.

I think the core issue with the "do LLMs actually suck" question is that people set different (and often moving) goalposts for whether or not they suck.


I just wrote a fairly sizable app with an LLM. This is the first complete app I've written using one. I did write some of the core logic myself, leaving the standard CRUD functions and UI to the LLM.

I did it in little pieces and started over with fresh context each time the LLM started to get off in the weeds. I'm very happy with the result. The code is clean and well commented, the tests are comprehensive and the app looks nice and performs well.

I could have done all this manually too, but it would have taken longer, and I probably would have skimped on some tests and given up and hacked a few things in out of expedience.

Did the LLM get things wrong on occasion? Yes. Make up api methods that don't exist? Yes. Skip over obvious standard straightforward and simple solutions in favor of some rat's nest convoluted way to achieve the same goal? Yes.

But that is why I'm here. It's a different style of programming (and one that I don't enjoy nearly as much as pounding the keyboard): more high-level thinking and code review, less worrying about implementation detail.

It might not work as well in domains for which training data doesn't exist. And certainly, if someone expects to come in with no knowledge and just paste code without understanding, reading and pushing back, they will have a non-working mess pretty shortly. But overall, my opinion is that these tools dramatically increase productivity in some domains.


> but I'm slowly starting to suspect that most of it is just covert shilling by vested persons.

It's almost as if the horde of former kleptocurrency bros has found a promising new seam of fool's gold to mine.


I have the same observation as well. The hype is getting generated mostly by people who're selling AI courses or AI-related products.

It works well as a smart documentation search where you can ask follow-up questions, or when you'd recognize the right output if you saw it but can't type it out from memory.

For code assistants (aka Copilot / Cursor), it works if you don't care about the code at all and are OK with any solution as long as it barely works (I'm fine with such code for my Emacs configuration).


LLMs are great at going from 0 to 2, but you wanted to go to 1, so you remove and modify lots of things, get back to 1, and then go to 2.

Lots of people are terrible at going from 0 to 1 in any project. Me included. LLMs helped me a lot solving this issue. It is so much easier to iterate over something.


I think it’s more that if you want to believe it’s magic future tech then it looks like it.

If you aren’t on board then it looks impressive but flawed and not even close to living up to the hype.


Just for fun, give it a function you wrote, and ask it if it can make any improvements. I reckon I accept about a third of what it suggests.
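To give a made-up example of the kind of suggestion it tends to come back with (same behaviour, just tidier):

    # Something like what I'd have written:
    def dedupe(items):
        seen = []
        for item in items:
            if item not in seen:  # membership check on a list is O(n)
                seen.append(item)
        return seen

    # The sort of rewrite it proposes: same output, set for membership
    def dedupe(items):
        seen = set()
        out = []
        for item in items:
            if item not in seen:
                seen.add(item)
                out.append(item)
        return out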


Not a bad use, though I argue being able to do that critique yourself has a compounding effect over time that is worthwhile.


Well... I have to critique the critique, else how do I know which two thirds to reject?

In theory I'm learning from the LLM during this process (much like a real code review). In practice, it's very rare that it teaches me something, it's just more careful than I am. I don't think I'm ever going to be less slap-dash, unfortunately, so it's a useful adjunct for me.



