Am I spending too much time on HN or is every post/comment section filled with this same narrative? Basically, LLMs are exciting but they produce messy code for which the dev feels no ownership. Managing a codebase written by an LLM is difficult because you have not cognitively loaded the entire thing into your head as you do with code written yourself. They're okay for one-off scripts or projects you do not intend to maintain.
This is the blog post/comment section summary I encounter many times per day.
The other side of it is people who seem to have 'gotten it' and can dispatch multiple agents to plan/execute/merge changes across a project and want to tell you how awesome their workflow is without actually showing any code.
I think you described it much more succinctly than most people do. It's been my exact experience as well. The LLM can develop much faster than I can build a mental model. It's very easy to get to a point where you don't know what's going on, a bunch of bugs have been introduced and you can't easily fix them or refactor because you're essentially the new guy on your own project. I find myself adjusting by committing code very frequently and periodically asking the LLM to explain it to me. I often ask the LLM to confirm things are working the way it says they are and it tends to find its own bugs that way.
I use an LLM primarily for smaller, focused data analysis tasks so it's possible to move fast and still stay reasonably on top of things if I'm even a little bit careful. I think it would be really easy to trash a large code base in a hurry without some discipline and skill in using LLMs. I'm finding that developing prompts, managing context, controlling pace, staying organized and being able to effectively review the LLM's work are required skills for LLM-assisted coding. Nobody teaches this stuff yet so you have to learn it the hard way.
Now that I have a taste, I wouldn't give it up. There's so much tedious stuff I just don't want to have to do myself that I can offload to the LLM. After more than 20 years doing this, I don't have the same level of patience anymore. There are also situations where I know conceptually what I want to accomplish but may not know exactly how to implement it and I love the LLM for that. I can definitely accomplish more in less time than I ever did before.
One of my favorite ways to use AI is to get me started on things. I tend to drag my feet when starting something new, but LLMs can whip up something quick. Then I look at what it did and usually hate it. Maybe it structured the code in a way that doesn't mesh with the way I think, or it completely failed to use some new/esoteric library I rely on.
That hate fuels me to just do the work myself. It's like the same trick as those engagement-bait math problems that pop up on social media with the wrong answer.
The same. It’s mostly an example generator, where you know what to do, but can’t take the time to build a model of the language/framework/library. Then you look at the code and retain only the procedure and the symbols used.
I do the same thing, except if I hate something, I just ask the LLM to fix it. I can usually get to a starting point I'm pretty happy with, then I take over.
After that, I may ask an LLM to write particular functions, giving it data types and signatures to guide it.
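For illustration, the sort of thing I hand it might look like this (a hypothetical sketch; the Invoice type and outstanding_total name are made-up examples, not from a real project):

  from dataclasses import dataclass

  @dataclass
  class Invoice:
      id: str
      amount_cents: int
      paid: bool

  def outstanding_total(invoices: list[Invoice]) -> int:
      """Return the total amount_cents across unpaid invoices."""
      # Body intentionally left blank; the types and signature above
      # constrain what the LLM can plausibly generate.
      ...

The data types and signature carry most of the design, so reviewing what comes back is quick.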
I’m enjoying it - I wouldn’t be doing it otherwise. Perhaps you misunderstood me - I’m using it to automate things that are easy enough to do but time-consuming and generally uninteresting. It’s a useful assistant.
You could make the same comment about managers - does management sound like a fulfilling career to literally anyone? (To some, it doesn’t!) Or about working on a team, where colleagues do work that you depend on.
It’s also very similar to the situation with compilers and interpreters for high level languages. An assembly language or machine language programmer might ask “does writing in a high level language sound like a fulfilling career to literally anyone?”
This all makes me suspect that your comment is coming from a place where you’ve already reached a conclusion and are now looking for excuses to justify it. Typical change resistance, essentially.
> you're essentially the new guy on your own project
Holy shit that's the best description of this phenomenon I've heard so far. The most stark version of this I've experienced is working on a side project with someone who isn't a software engineer who vibe coded a bunch of features without my input. The code looked like 6-8 different people had worked on it with no one driving architecture and I had to untangle how it all got put together.
The sweet spot for me is using it in places where I know the exact pattern I want to use to solve a problem and I can describe it in very small discrete steps. That will often take something tedious that would have taken me an hour or two to hand code down to 5-10 minutes. I agree that there's no going back; even if all progress stopped now, that's too huge of a gain to ignore it as a tool.
I have found that you can't let the LLM do the thinking part. It's really fast at writing, but it only writes acceptable code if the thinking has been done for it.
In some cases, this approach might even be slower than writing the code.
Really good thoughts here. You do become like the "new guy" on the project. It's becoming a black box. I think that should be some sort of signal to people... but my fear is people not caring, being complacent with this, or not taking the time to review, read and learn. That's where the danger is.
I have retreated into only accepting small snippets from it. Asking it to write print functions, that sort of thing, or a specific loop that I hand-review. For the same reason.
> I'm finding that developing prompts, managing context, controlling pace, staying organized and being able to effectively review the LLM's work are required skills for LLM-assisted coding
Did you not need all these skills / approaches / frameworks for yourself / coding with a team?
This is, I think, the key difference between those (such as myself) who find LLMs to massively increase velocity / quality / quantity of output and those who don’t.
I was already highly effective at being a leader / communicator / delegating / working in teams ranging from small, intimate ones where we shared a mental model / context up to some of the largest teams on the planet.
If someone wasn’t already a highly effective IC/manager/leader pre LLM, an LLM will simply accelerate how fast they crash into the dirt.
It takes substantial work to be a highly effective contributor / knowledge worker at any level. Put effort into that, and LLMs become absolutely indispensable, especially as a solo founder.
I don't mind when other programmers use AI, and use it myself. What I mind is the abdication of responsibility for the code or result. I don't think we should be issuing a disclaimer when we use AI any more than I did when I used grep to do a log search. If we use it, we own the result of it as a tool and need to treat it as such. Extra important for generated code.
Isn't this what Brooks described more than 50 years ago: there's a fundamental shift when a system can no longer be held in a single mind, and a communication & coordination load results from adding people? It seems that a single person offloading the work to an LLM right at the start gives up this efficiency before even beginning, so unless you're getting AI to do all the work, it will eventually bite you...
Organizations which design systems (in the broad sense used here) are constrained to produce designs which are copies of the communication structures of these organizations.
— Melvin E. Conway, How Do Committees Invent?
> there's a fundamental shift when a system can no longer be held in a single mind
Should LLM users invest in both biological (e.g. memory palace) and silicon memory caches?
It took me several years to understand this law. Early on, I'd come into an infra situation and kind of incredulously say stuff like "Why not do (incredibly obvious thing)?" and be frustrated by it quite often.
Usually it's not because people think it can't be done, or shouldn't be done; it's because of this law. Like, yes, in an ideal world we'd do xyz, but the department head of product A is a complete anti-productive bozo that no one wants to talk to or deal with, so we'll engineer around him, kind of a thing. It's incredibly common; once you see it play out, you'll see it everywhere.
This analysis of the real-world effects of Conway's Law seems deeply horrifying, because the implication seems to be that there's nothing you can do to keep communication efficiency and design quality high while also growing an organisation.
I think you'd better link to a good article instead. Good grief, what a horror: a talking head rambling on for 60 minutes.
---
disclaimer: if low information density is your thing, then your mileage may vary. Videos are for documentaries, not for reading out an article into the camera.
After opening the "transcript" on these kinds of videos (from a link in the description, which may need to be expanded), a few lines of JavaScript can extract the actual transcript text without needing to wrestle with browser copy-and-paste. Presumably the entire process could be automated, without even visiting the link in a browser.
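For what it's worth, the fully automated version is also only a few lines; here's a minimal Python sketch, assuming the third-party youtube-transcript-api package (and its get_transcript helper) rather than anything YouTube officially supports:

  # Sketch only: depends on the third-party youtube-transcript-api package.
  from youtube_transcript_api import YouTubeTranscriptApi

  def transcript_text(video_id: str) -> str:
      # get_transcript returns a list of {"text", "start", "duration"} segments
      segments = YouTubeTranscriptApi.get_transcript(video_id)
      return " ".join(segment["text"] for segment in segments)

  print(transcript_text("VIDEO_ID_HERE"))  # placeholder video id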
And probably a few minutes of commercials too. I get the impression this is an emerging generational thing, but unless it's a recorded university course or a very interesting and reputable person... no thanks. What is weird is that the instinct to prefer video seems motivated by laziness, and laziness is actually an adaptive thing to deal with information overload... yet this noble impulse is clearly self-defeating in this circumstance. Why wait and/or click through ads for something that's low-density in the first place, that you can't search, etc.?
Especially now that you can transcribe the video and quickly get AI to clean it up into a post, creating/linking a video potentially telegraphs stuff like: nothing much to say but a strong desire to be in the spotlight / narcissism / an acquisitiveness for clicks / engagement. Patiently enduring infinite ads while you're pursuing educational goals, and assuming others are willing to, or assuming other people are paying for ad-free just because you do, all telegraphs a lack of respect for the audience, maybe also a lack of self-respect. Nothing against OP or this video in particular. More like a PSA about how this might come across to other people, because I can't be the only person who feels this way.
Always and entirely subjective of course, but I find Casey Muratori to be both interesting and reputable.
> What is weird is that the instinct to prefer video seems motivated by laziness, and laziness is actually an adaptive thing to deal with information overload...
What's even weirder is the instinct to not actually engage with the content of the linked video and a discussion on Conway's Law and organisational efficiency and instead head straight into a monologue about some kind of emerging generational phenomenon of laziness highlighted by a supposed preference for long video content, which seems somewhat ironic itself as ignoring the original subject matter to just post your preferences as 'PSA' is its own kind of laziness. To each their own I guess.
Although I do think the six-hour YouTube 'essays' really could do with some serious editing, so perhaps there's something there after all...
Okay, so you didn't even bother to take a few seconds to step through the video to see if there was anything other than the talking head (I'll help you out a bit, there is).
Either way, it's a step-by-step walk through of the ideas of the original article that introduced Conway's Law and a deeper inspection into ideas about _why_ it might be that way.
If that's not enough then my apologies but I haven't yet found an equivalent article that goes through the ideas in the same way but in the kind of information-dense format that I assume would help you hit your daily macros.
(I didn't downvote you btw)
But anyways, I did step through. And even in the section where he should make his point, he couldn't. I rage-quit this stuff.
Don't take it personally, you might have found great insight from it. But if you want to see my POV: I can scan, like most humans, a large text in seconds, processing it with a massive parallel network. When I find an anchor of interest, I can scan around for more context. I can go back to sections, to read it deeper.
A video is a gigabyte of download to convey a few bytes of information, dripping slowly over the span of an hour. A text is a few kilobytes, downloaded in an instant; it takes a few seconds to scan it, a minute to read some things more deeply, and then I can decide if it is worth it to mine deeper. Even then the additional cost will be like 3 minutes.
But, to be fair, I know quite some people that do not have this ability. They struggle to dissect a text, to chop it apart and quickly pull out the information. But that could also be an issue of not being able to give full bandwidth to an information source. Some people can't focus on a text, but like to listen to books while driving for example.
That's completely fair, and I actually completely agree about the information density thing. I honestly prefer well-written, concise documentation over tutorial videos for this same reason; however, I've not yet found the equivalent text form of this video (automatic transcription aside), so it's really the only example I've got that seems to extract some of the most salient points from Conway's paper and puts forward an idea as to _why_ this phenomenon is. Perhaps a blog post in the making.
Man, I don't know what kind of world you live in, but an hour long video is a little too much to swallow when reading HN comments. I even gave it a chance, but had to close the tab after the guy just prepared you for something and then steered away to explain "what is a law". That's absurd.
Self-regulating in a way that is designed to favour smaller independent groups with a more complete understanding and ownership of whatever <thing> that team does?
Even putting aside the ethical issues, it's rare that I want to copy/paste code that I find into my own project without doing a thorough review of it. Typically if I'm working off some example I've found, I will hand-type it in my project's established coding style and add comments to clarify things that are not obvious to me in that moment. With an LLM's output, I think I would have to adopt a similar workflow, and right now that feels slower than just solving the problem myself. I already have the project's domain in my mental map, and explaining it to the agent is tedious and a time waste.
I think this is often overlooked, because on the one hand it's really impressive what the predictive model can sometimes do. Maybe it's super handy as an autocomplete, or an exploration, or for rapidly building a prototype? But for real codebases, the code itself isn't the important part. What matters is documenting the business logic and setting it up for efficient maintenance by all stakeholders in the project. That's the actual task, right there. I spend more time writing documentation and unit tests to validate that business logic than I do actually writing the code that will pass those tests, and a lot of that time is specifically spent coordinating with my peers to make sure I understand those requirements, that they were specified correctly, that the customer will be satisfied with the solution... all stuff an LLM isn't really able to replace.
Thanks for sharing this beautiful essay which I have never come across. The essay and its citations are thought-provoking reading.
IMO, LLMs of today are not capable of building theories (https://news.ycombinator.com/item?id=44427757#44435126). And, if we view programming as theory building, then LLMs are really not capable of coding. They will remain useful tools.
LLMs are great at generating scaffolding and boilerplate code which I can then iterate upon. I'm not going to write
  describe User do
    it ".." do
    end
  end

for the thousandth time..
or write the controller files with CRUD actions..
LLMs can do these. I can then review the code, improve it and go from there.
They are also very useful for brainstorming ideas; I treat it as a better Google search. If I'm stuck trying to model my data, I can ask it questions and it gives me recommendations. I can then think about it and come up with an approach that makes sense.
I also noticed that LLMs really lack basic comprehension. For example, no matter how many times you provide the Schema file for it (or a part of it), it still doesn't understand that a column doesn't exist on a model and will try to shove it into the suggested code... very annoying.
All that being said, I have an issue with "vibe coding".. this is where the chaos happens as you blindly copy and paste everything and git push goodbye
We need to invent better languages and frameworks. Boilerplate code should be extremely minimal in the first place, but it appears to have exploded in the last decade.
- one big set of users who don't like it because it generates a lot of code and uses its own style of algorithms, and it's a whole lot of unfamiliar code that the user has to load up in their mind - as you said. Too much to comprehend, and quickly overwhelming.
And then to either side
- it unblocks users who simply couldn't have written the code on their own, who aren't even trying to load it into their head. They are now able to make working programs!
- it accelerates users who could have written it on their own, given enough time, but have figured out how to treat it as an army of junior coders, and learned to only maintain the high level algorithm in their head. They are now able to build far larger projects, fast!
More often than not the "AI" generates a large block of code that doesn't work, that I still have to read and understand - and it's more difficult to understand because it doesn't work, which is a huge waste of my time. Then I just end up writing the damn code myself, which I should have done in the first place - but my boss wants me to try using the AI.
The only thing the "AI" is marginally good at is as a fancy auto-complete that writes log statements based on the variable I just wrote into the code above it. And even this simple use case it gets it wrong a fair amount.
Overall the "AI" is a net negative for me, but maybe close to break-even thanks to the autocomplete.
That last bracket is basically the same as the tech based start-up story. You build the projects fast, but you build a ton of tech debt into it that you'll be forced to deal with unless it is a short lived project. Not that this is 100% bad, but something to know going in.
Depends. I think that becomes a question of the quality of the programmer - if they were doing it all themselves, the code quality of the (necessarily much smaller) projects would still vary between programmers. Now that variation is magnified, but if you're very good at what you do, I suspect it is still possible to create those projects without the tech debt. Though at the lower end of that bracket, I'd agree you tend to end up with giant balls of mud.
When you play architect and delegate all the work to junior developers it won't matter how good you are, you will incur a lot of tech debt. You simply cannot teach/guide every junior into writing good code as that would take more time than writing it yourself. This fact is baked into the juniors analogy.
IMHO it depends on how good you are at being a senior programmer / architect. Put the juniors where they can't do harm, and orchestrate them appropriately. The whole point of employing juniors is that you don't assign a senior to rewrite everything they do.
I'm in that last bracket. I don't really have LLMs do tasks that given enough time and scouring docs I couldn't have implemented myself. I set hard rules around architecture, components, general design patterns and then let the LLM go at it, after which review the result in multiple passes, like I would a junior's code. I could not care less about the minutiae of the actual implementation, as long as it conforms to my conventions and style guides and instructions.
Yeah. I think the trick is, you have to have been capable of doing it yourself, given time. Same as a senior engineer, they have to be capable of doing the tasks they assign to juniors.
> LLMs are exciting but they produce messy code for which the dev feels no ownership. [...] The other side of it is people who seem to have 'gotten it' and can dispatch multiple agents to plan/execute/merge changes across a project
Yup, can confirm, there are indeed people with differing opinions and experience/anecdotes on HN.
> want to tell you how awesome their workflow is without actually showing any code.
You might be having some AI-news-fatigue (I can relate) and missed a few, but there are also people who seem to have gotten it and do want to show code:
Here's one of my non-trivial open source projects where a large portion is AI built: https://github.com/senko/cijene-api (didn't keep stats, I'd eyeball it at conservatively 50% - 80%)
How is that different from working in a large codebase with 25+ other devs?
My org has 160 engineers working on our e-commerce frontend and middle tiers. I constantly dive into repos and code I have no ownership of. The git blame frequently shows a contractor who worked here 3 years ago.
Seems LLMs do well in small codebases, badly in medium ones, and well again as small modules within something big.
This is definitely something I feel is a choice. I've been experimenting quite a bit with AI generated code, and with any code that I intend to publish or maintain I've been very conscious in making the decision that I own the code and that if I'm not entirely happy with the AI generated output I have to fix it (or force the AI to fix it).
Which is a very different way of reviewing code than how you review another human's code, where you make compromises because you're equals.
I think this produces fine code, not particularly quickly, but used well it's probably somewhat quicker (and produces somewhat higher quality code) than not using AI.
On the flip side on some throwaway experiments and patches to personalize open source products that I have absolutely no intention of upstreaming I've made the decision that the "AI" owns the code, and gone much more down the vibe coding route. This produces unmaintainable sloppy code, but it works, and it takes a lot less work than doing it properly.
I suspect the companies that are trying to force people to use AI are going to get a lot more of the "no human ownership" code than individuals like me experimenting because they think it's interesting/fun.
Yes, it's very polarized. That being said, people have shown a lot of code produced by LLMs so I don't understand the dismissive argument you make at the end.
Below is a link to a great article by Simon Willison explaining an LLM assisted workflow and the resulting coded tools.
While I greatly appreciate all of Simon Willison's publishing, these tools don't meet the criteria of the OP's comment in my opinion. Willison's tools archive all do useful but ultimately small tasks which mostly fit the "They're okay for one-off scripts or projects you do not intend to maintain" caveat from OP.
Meanwhile, it's not uncommon to see people on HN saying they're orchestrating multiple major feature implementations in parallel. The impression we get here is that Simon Willison's entire `tools` featureset could be implemented in a couple of hours.
I'd appreciate some links to the second set of people. Happy to watch YouTube videos or read more in-depth articles.
There's a third category I'd place myself in which is doing day to day work in shipping codebases with some history, using the tools to do a faster and better job of the work I'd do anyway. I think the net result is better code, and ideally on average less of it relative to the functionality because refactors are less expensive.
Many big systems are comprised of tools that do a good job at solving small tasks, carefully joined. That LLMs are not especially good at that joinery just means that's a part of the building process that stays manual.
"f you assume that this technology will implement your project perfectly without you needing to exercise any of your own skill you’ll quickly be disappointed."
"They’ll absolutely make mistakes—sometimes subtle, sometimes huge. These mistakes can be deeply inhuman—if a human collaborator hallucinated a non-existent library or method you would instantly lose trust in them"
"Once I’ve completed the initial research I change modes dramatically. For production code my LLM usage is much more authoritarian: I treat it like a digital intern, hired to type code for me based on my detailed instructions."
"I got lucky with this example because it helped illustrate my final point: expect to need to take over. LLMs are no replacement for human intuition and experience. "
I've been experimenting with them quite a bit for the past two weeks. So far the best productivity i've found from them is very tight hand-holding and clear instructions, objectives, etc. Very, very limited thinking. Ideally none.
What that gets me, though, is less typing fatigue and fewer decisions driven partly by my wrists/etc. If it's a large (but simple!) refactor, the LLM generally does amazing at that. As good as i would do. But it does that with zero wrist fatigue. Things that i'd normally want to avoid or take my time on, it bangs out in minutes.
This, coupled with Claude Code's recently introduced Hooks[1], lets you curb a lot of behaviors that are difficult to make perfect from an LLM. I.e. making sure it tests, formats, doesn't include emojis (boy does it like those, lol), etc.
And of course a bunch of other practices for good software in general make the LLMs better, as has been discussed on HN plenty of times. Eg testing, docs, etc.
So yea, they're dumb and i don't trust their "thinking" at all. However i think they have huge potential to help us write and maintain large codebases, generally multiplying our productivity.
It's an art for sure though, and restraint is needed to prevent slop. They will put out so. much. slop. Ugh.
A lot of the time they are selling themselves as influencers on the subject. It’s often a way to get views or attention that they can use in the future.
The only way I've found LLMs to be useful for building real software, which isn't included in your list of use cases, is for "pseudo boiler-plate". That is there are some patterns that are tedious to write out, but not quite proper boiler-plate in the traditional sense, as so not as amenable to traditional solutions.
One example I deal with frequently is creating Pytorch models. Any real model is absolutely not something you want to leave in the hands of an LLM since the entire point of modeling is to incorporate your own knowledge into the design. But there is a lot of tedium, and room for errors, in getting the initial model wiring setup.
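To make that concrete, the "initial wiring" I mean is the sort of thing below (a toy sketch; the dimensions, layer choices and names are made up for illustration, not a real model):

  import torch
  import torch.nn as nn

  class ToyEncoder(nn.Module):
      # Purely illustrative wiring; the modeling decisions that actually matter
      # are exactly the part you would not hand off to an LLM.
      def __init__(self, vocab_size: int = 10_000, dim: int = 256, n_classes: int = 4):
          super().__init__()
          self.embed = nn.Embedding(vocab_size, dim)
          self.body = nn.Sequential(
              nn.Linear(dim, dim),
              nn.ReLU(),
              nn.Dropout(0.1),
              nn.Linear(dim, dim),
              nn.ReLU(),
          )
          self.head = nn.Linear(dim, n_classes)

      def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
          x = self.embed(token_ids).mean(dim=1)  # mean-pool over the sequence
          return self.head(self.body(x))

Typing this out by hand is error-prone tedium; pointing an LLM at it and then reviewing the result is where the time savings show up for me.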
While, big picture, this isn't the 10x (or more) improvement that people like to imagine, I find in practice I personally get really stuck on the "boring parts". Reducing the time I spend on tedious stuff tends to have a pretty notable improvement in my overall flow.
I suspect that's at least partially because all of that doesn't stop the hype from being pushed on and on without mercy. Which in turn is probably because the perverse amounts of investment that went into this have to be reclaimed somehow through monetization. Imagine all those VCs having to realize that hundreds of billions of $$$ were lost to wishful hallucinations. Before they concede that, there will of course be much astroturfing in the vein of your last paragraph.
And AI in general* poses existence-level questions (that could go either way: good or bad) regarding military applications, medical research, economic benefits, quality of life, human thriving, etc.
The idea that the future is going to “more or less be predictable” and “within the realm of normal” is a pretty bold claim when you look at history! Paradigm shifts happen. And many people think we’re in the middle of one — people that don’t necessarily have an economic interest in saying so.
* I’m not taking a position here about predicting what particular AI technologies will come next, for what price, with what efficiency and capabilities, and when. Lots of things could happen we can’t predict — like economic cycles, overinvestment, energy constraints, war, popular pushback, policy choices, etc. But I would probably bet that LLMs are just the beginning.
I believe it's the main topic because VCs have been trying to solve the problem of "expensive software developers" for a long time. The AI start-up hype train is real simply because that is how you get VC money these days. VC money contracted severely with the economy post-Covid, and seemingly what is available is going to AI something-or-other. Somehow, the VC-oriented startup hype train seems to have become the dominant voice in the zeitgeist of software development.
I think this narrative gets recycled because it's the shallow depth of reasoning afforded by thinking about technology only by thinking about the instruments of that technology and one's own personal experience with them, which is the perspective that is prioritized on HN.
Just one thought: I wonder if storing the prompt history together with the LLM code would make it easier to understand the thought process. I have noticed that I find it a little more difficult to read LLM code vs human code (that's written by decent devs)
I use LLM's for my developments constantly, so I have a pretty good grasp of what works and doesn't work.
Github copilot really gets it, as in: it's a "co-pilot", and you are the "pilot".
LLM's are able to generate code way faster than me, so in a lot of cases, writing a prompt, letting the LLM's generate a diff and me quickly reviewing, is faster than me writing/shuffling all the code. This is basically a "I know what has to be done, I just have to do it".
But let's be clear: LLM's are useful in small steps, not huge steps. Huge steps are only possible if you have a blank page.
I have code that was written by an LLM and I never really reviewed it: It's a visual effect in shaders that works, I quickly glanced over the code and it seemed complicated and fine. Since this is not crucial code, it works, and I don't have to touch it, it's fine for me.
Also an observation: once the LLM gets it wrong, it will continuously get it wrong after different instructions. After 1 or 2 failures, quickly decide to write it yourself.
The least amount of benefit any developer should get out of it is an "apply stack overflow suggestion". In the "old" days you searched google for your issue, read Stack Overflow comments, and try to apply it. LLM's let you shortcut this by going straight from prompt to diff suggestion.
You can push it a bit further than a "Stack Overflow on steroids", but don't expect it to maintain a codebase by itself.
"Also an observation: once the LLM gets it wrong, it will continuously get it wrong"
Oh yes, so very very much! I often see people keep iterating, not just with coding assistants but also general knowledge or research queries, when the first response is obviously wrong. I don't think that has ever ended up in a better response down the line. If the first response is garbage, then that means garbage is all this particular LLM is able to give you in return for what you ask. All the iterating does is travel through randomland - often until the person doing the prompting gets a response which they, for some reason, find more acceptable, even if it's just as wrong as all the others.
> is every post/comment section [related to AI/LLMs] filled with this same narrative? ... The other side of it is ...
I don't see why any of this should be surprising. I think it just reflects a lot of developers using this technology and having experiences that fall neatly into one of these two camps. I can imagine a lot of factors that might pull an individual developer in one direction or the other; most of them probably correlate, and people in the middle might not feel like they have anything interesting to say.
I've been saying this for a while. The issue is that if you don't intimately know your code, you can't truly maintain it. What happens when the LLM can't figure out some obscure bug that's costing you $$$,$$$ per minute? You think being unable to have the AI figure it out is an acceptable answer? Of course not. LLMs are good for figuring out bugs and paths forward, but don't bet your entire infrastructure on them. Use them as an assistant, not a hammer.
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."
- Brian Kernighan
So if there's a bug in code that an LLM wrote, simply wait 6 months until the LLMs are twice as smart?
I am bewildered by the posts where people claim to have 20+ agents running at the same time on their repo. I’ve used o3, Claude 4 Opus, Gemini 2.5 Pro, and they can’t run for more than 15 minutes in the best of cases without fucking up or getting caught in a loop. Tasks that are slightly more complicated than your average code are beyond their comprehension.
I work in a team of 8. So previously I wrote 1/8th of the code. Now I write 1/16th with the other 1/16th being written by an LLM. Figuring out how to work effectively together as a team requires solving the same problem you describe. For me, an LLM is like having another junior developer on the team.
How you get good results as a team is to develop a shared mental model, and that typically needs to exist in design docs. I find that without design docs, we all agree verbally, and then are shocked at what everyone else thought we'd agreed on. Write it down.
LLMs, like junior devs, do much better with design docs. You can even let the junior dev try writing some design docs.
So if you're a solo developer, I can see this would be a big change for you. Anyone working on a team has already had to solve this problem.
On the subject of ownership: if I commit it, I own it. If the internet "goes down" and the commit has got my name on it, "but AI" isn't going to cut it.
Weirdly this is pretty much the story of code generators.
When I first got into web development there were people using HTML generators based on photoshop comps. They would produce atrocious HTML. FE developers started rewriting the HTML because otherwise you'd end up with brittle layout that was difficult to extend.
By the time the "responsive web" became a thing HTML generators were dead and designers were expected to give developers wireframes + web-ready assets.
Same thing pretty much happened with UML->Code generators, with different details.
There's always been a tradeoff between the convenience and deskilling involved in generated code and the long term maintainability.
There's also the fact that coding is fundamentally an activity where you try to use abstractions to manage complexity. Ideally, you have interfaces that are good enough that the code reads like a natural language, because you're expressing what you want the computer to do at the exact correct layer of abstraction. Code generators tend to both cause and encourage bad interfaces. Often the impetus to use a code generator is that the existing interfaces are bad, bureaucratic, or obscure. But using code generators ends up creating more of the same.
But also, LLMs are incredibly powerful and capable tools for discovering what the architecture of things is. They have amazing abilities to analyze huge code bases & to build documents and diagrams to map out the system. They can answer all manner of questions, to let us probe in.
Now, whether LLMs generate well-architected systems is largely operator dependent. There are lots of low-effort, zero-shot ways to give LLMs very little guidance and get out who knows what. But when I reflect on the fact that, for now, most code is legacy code, most code is hideously under-documented, and most people reading code don't really have access to experts or artifacts to explain the code and its architecture, my hope and belief is that LLMs are incredible tools to radically increase maintainability versus where we are now, and that they are powerful peers in building the mental model of programming & systems.
> Managing a codebase written by an LLM is difficult because you have not cognitively loaded the entire thing into your head as you do with code written yourself
This happens with any sufficiently big/old codebase. We can never remember everything, even if we wrote it ourselves
I do agree with the sentiment and insight about the 2 branches of topics frequently seen lately on HN about AI-assisted coding
Would really like to see a live/video demo of semi-autonomous agents running in parallel and executing actual useful tasks on a decently complex codebase, ideally one that was entirely “manually” written by devs before agents are involved - and that actually runs a production system with either lots of users or paid customers
> This happens with any sufficiently big/old codebase. We can never remember everything, even if we wrote it ourselves
The important thing about a codebase wasn't ever really size or age, but whether it was a planned architecture or grown organically. The same is true post-LLM. Want to put AI in charge of tool-smithing inconsequential little widgets that are blocking you? Fine. Want to put AI in charge of deciding your overall approach and structure? Maybe fine. Worst of all is to put the AI in charge of the former, only to find later that you handed over architectural decisions at some point and without really intending to.
That sounds like a strict either/or, as if ten years of development of a large codebase was entirely known up-front, with not a single change to the structure over time as a result of new information.
"We build our computers the way we build our cities—over time, without a plan, on top of ruins." -- Ellen Ullman
The only goal of a code generator is the code. I don't care whether it works or not (for specific scenarios and it could break 90% of the time). I want to see the generated code and, so far, I have never seen anything interesting besides todo lists made with ReactJS.
People who do this don’t want to see the code and perhaps even don’t care about the code. Code is just a means to an end, which is the product. It might be the wrong take from a software engineer’s perspective, but it is a take that works in at least some cases.
That applies to just one segment: the developers. What about people who wouldn't be able to develop due to time constraints? I think they will be the majority of coding-LLM users. I am not talking about the third group, the non-developers, but about people who know languages, frameworks and design patterns. Should they just not develop at all?
> The other side of it is people who seem to have 'gotten it' and can dispatch multiple agents to plan/execute/merge changes across a project and want to tell you how awesome their workflow is without actually showing any code.
This is a great read on the situation. Do you think these people are just making it up/generating baseless hype?
I think people are rightly hesitant to share code that has their name on it but which they know nothing about.
I have seen a few of these full-blown llm-coded projects, and every one of them has some giant red flashing warning at the top of the README about the project being llm generated.
So I think it’s probably a mix of avoiding embarrassment and self preservation.
Interesting, to me it's still very much a human in the loop process and the person whose name is on the commit is ultimately responsible for what they commit.
> Managing a codebase written by an LLM is difficult because you have not cognitively loaded the entire thing into your head as you do with code written yourself.
I don't think that's the main reason. Well written code is easier to follow even when you haven't written it yourself (or maybe you did but forgot about it).
I argue you still need to cognitively load the solution, it's just that well written code allows you to (a) segment the code base effectively and (b) hold it at a higher level of abstraction.
Absolutely. The comment I was responding to argued the difficulty was that LLM code wasn't loaded cognitively. I'm arguing the problem is actually the code produced by LLMs tends to be messy and hard to follow beyond trivial examples.
I think there is a middle ground, where you still make the important decisions and let an LLM fill out the rest. You don't need every implementation detail in your head (though you should of course review it), but you need to keep control over the structure and data flow.
I'm somewhere between the two extremes.. it's not so bad that I want to put it down, and it is also not as good as what some claim. On average, it's better than hand coding everything though, and even for digging into code written by others
Laziness doesn’t stop just because technology improves. This is just intellectual laziness being blamed on LLMs, as usual. “I don’t bother to read my code anymore, so LLMs did it.” “I don’t practice coding anymore, it’s LLMs fault.” Blah. Blah Blah.
People will always blame someone or something else for laziness.
LLMs are fine with small things, such as creating or refactoring a small function, adding logs, writing a test, etc. But having them develop entire features or whole applications is stupid; being statistical models, the more input you feed them, the more errors they accumulate.
> Managing a codebase written by an LLM is difficult because you have not cognitively loaded the entire thing into your head as you do with code written yourself.
Wow you really nail the point, that's what I felt but I did not understand. Thanks for the comment.
Don't discount the number of articles that espouse a provocative position not held by the poster for the purpose of gaining traffic/attention/clout/influencer-points.
> The other side of it is people who seem to have 'gotten it' and can dispatch multiple agents to plan/execute/merge changes across a project and want to tell you how awesome their workflow is without actually showing any code.
There have been grifters hopping onto every trend. Have you noticed they never show you what exactly they built or whether it was ever useful?
honestly my theory is part of it is people who are very caught up in the "craft" part of it and now hate these LLMs for producing shit that pretty much works but isn't the "perfect specimen" of coding architecture that they now have to pore over.
honestly, for the vast majority of basically CRUD apps out there, we are inflating our skills a bit too much here. even if the code is junk you can adapt your mindset to accept what LLMs produce, clean it up a bit, and come out with something maintainable.
like do these people ever have to review code from other people or juniors? the feedback loop here is tighter (although the drawback is your LLM doesn't "learn").
i wouldn't use it for anything super novel or cutting edge i guess, but i don't know, i guess everyone on HN might be coding some super secret advanced project that an LLM can't handle....?
The fundamental limitation of LLMs writing code is that reading and understanding code is harder and slower than writing it. With other engineers that I work with there is an established level of trust where I do not need to deep dive into every PR. With LLMs it is like I am constantly doing code reviews for someone with whom I have zero trust. This is fundamentally a slow process, especially if you need to maintain this code in the long term and it is part of your 'core business code' that you work on 90% of the time. It also comes with all the downsides of no longer being an expert in your own codebase.
Ultimately I am responsible for any code I check in even if it was written by an LLM, so I need to perform these lengthy reviews. As others have said, if it is code that doesn't need to be maintained, then reviewing the code can be a much faster process. This is why it is so popular for hobby projects since you don't need to maintain the code if you don't want to, and it doesn't matter if you introduce subtle but catastrophic bugs.
Ultimately the tech feels like a net neutral. When you want to just throw the code away afterwards, it is very fast and good enough. If you are responsible for maintaining it, it's slower than writing it yourself.
which is weird to me because i'm using it in prod? literally if i care about style and structure i just say, look at these other few files and figure it out, and it's fine.
if i need to work on something mission critical or new i do it by hand first. tests catch everything else. or you can just run it so that you review every change (like in claude code) as it comes in and can still grok the entire thing vs having to review multiple large files at the end.
thus i literally wonder what people are working on that requires this 100% focused mission critical style stuff at all times. i mean i don't think it's magic or AGI, but the general argument is always 1) works for hobby projects but not "production" 2) the LLM produces "messy code" which you have to review line by line as if you wrote it yourself which i've found to not be true at all.
Honestly, they produce what they were trained on -- mediocre code. That doesn't mean people can't use that to make money, deliver customer value, etc. but is it code that's going to win any awards or be inspiring? No. Not in the slightest. Probably never. Because you are what you eat.
The question is, what do you expect from an LLM? What do you want to use it for?
They're plenty useful but, as with anything, you need to use them responsibly and with proper expectations.
I went from one camp to the other in the last month. I've been blown away by whats possible and here's what's working for me:
- Use Cline with Sonnet 4. Other models can work but this is the best balance of price and effectiveness.
- Always use "plan" mode first, and only after the plan mode looks good do you switch to "act" mode.
- Treat the LLM as though you are pair-programming with a junior engineer.
- Review every line that gets written as it gets written. Object or change it if you don't like it for any reason.
- Do test-driven development, and have the LLM always write tests first.
I have transitioned to using this full-time for coding and am loving the results. The code is better than what I used to write, because sometimes I can miss certain cases or get lazy. The code is better tested. The code gets written at least twice as fast. This is real production code that is being code reviewed by other humans.