You may be right in the general sentiment that not everyone with a PhD is a desirable candidate, but even if half of them were, that would be 5,000 fewer and that isn’t insignificant, especially in STEM fields.
> LLM coding will split up engineers based on those who primarily liked coding and those who primarily liked building.
I’ve always said I’m a builder even though I’ve also enjoyed programming (but for an outcome, never for the sake of the code)
This perfectly sums up what I’ve been observing between people like me (builders) who are ecstatic about this new world and programmers who talk about the craft of programming, sometimes butting heads.
One viewpoint isn’t necessarily more valid, just a difference of wiring.
I noticed the same thing, but wasn't able to put it into words before reading that. Been experimenting with LLM-based coding just so I can understand it and talk intelligently about it (instead of just being that grouchy curmudgeon), and the thought in the back of my mind while using Claude Code is always:
"I got into programming because I like programming, not whatever this is..."
Yes, I'm building stupid things faster, but I didn't get into programming because I wanted to build tons of things. I got into it for the thrill of defining a problem in terms of data structures and instructions a computer could understand, entering those instructions into the computer, and then watching victoriously while those instructions were executed.
If I was intellectually excited about telling something to do this for me, I'd have gotten into management.
Same. This kind of coding feels like it got rid of the building aspect of programming that always felt nice, and it replaced it entirely with business logic concerns, product requirements, code reviews, etc. All the stuff I can generally take or leave. It's like I'm always in a meeting.
>If I was intellectually excited about telling something to do this for me, I'd have gotten into management.
Exactly this. This is the simplest and tersest way of explaining it yet.
That's what I'm doing on my codebases, while I still can. I only use Claude if I need to work on a different team's code that uses it heavily. Nothing quite gets a groan from me like opening up a repo and seeing CLAUDE.md
I'd go one step higher, we're not builders, we're problem solvers.
Sometimes the problem needs building, sometimes not.
I'm an engineer: I see a problem and want to solve it. I don't care if I have to write code, have an LLM build something new, or maybe even destroy something. I want to solve the problem for the business and move on to the next one; most of the time, though, that means having an LLM write code.
Same same. Writing the actual code is always a huge motivator behind my side projects. Yes, producing the outcome is important, but the journey taken to get there is a lot of fun for me.
I used Claude Code to implement an OpenAI 4o-vision-powered receipt scanning feature in an expense tracking tool I wrote by hand four years ago. It did it in two or three shots while taking my codebase into account.
It was very neat, and it works great [^0], but I can't latch onto the idea of writing code this way. Powering through bugs while implementing a new library or learning how to optimize my test suite in a new language is thrilling.
Unfortunately (for me), it's not hard at all to see how the "builders" that see code as a means to an end would LOVE this, and businesses want builders, not crafters.
In effect, knowing the fundamentals is getting devalued at a rate I've never seen before.
[^0] Before I used Claude to implement this feature, my workflow for processing receipts looked like this: Tap iOS Shortcut, enter the amount, snap a pic of the receipt, type up the merchant, amount and description for the expense, then have the shortcut POST that to my expenses tracking toolkit which, then, POSTs that into a Google Sheet. This feature eliminated the need for me to enter the merchant and amount. Unfortunately, it often took more time to confirm that the merchant, amount and date details OpenAI provided were correct (and correct them when details were wrong, which was most of the time) than it did to type out those details manually, so I just went back to my manual workflow. However, the temptation to just glance at the details and tap "This looks correct" was extremely high, even if the info it generated was completely wrong! It's the perfect analogue to what I've been witnessing throughout the rise of the LLMs.
What I have enjoyed about programming is being able to get the computer to do exactly what I want. The possibilities are bounded by only what I can conceive in my mind. I feel like with AI that can happen faster.
The examples that you and others provide are always fundamentally uninteresting to me. Many, if not most, are some variant of a CRUD application. I have yet to see a single AI-generated thing that I personally wanted to use and/or spend time with. I also can't help but wonder what we might have accomplished if we devoted the same amount of resources to developing better tools, languages and frameworks for developers instead of automating the generation of boilerplate and selling developers' own skills back to them. Imagine if open source maintainers had instead been flooded with billions of dollars in capital. What might be possible?
And also, the capacities of LLMs are almost beside the point. I don't use LLMs, but I have no doubt that for any arbitrary problem that can be expressed textually and is computable in finite time, in the limit as time goes to infinity, an LLM will be able to solve it. The more important and interesting questions are what _should_ we build with LLMs and what should we _not_ build with them. These arguments about capacity are distracting from these more important questions.
Considering how much time developers spend building uninteresting CRUD applications, I would argue that if all LLMs can do is speed that process up, they're already worth their weight in bytes.
The impression I get from this comment is that no example would convince you that LLMs are worthwhile.
The problem with replying to the proof-demanders is that they'll always pick it apart and find some reason it doesn't fit their definition. You must be familiar with that at this point.
I looked closely enough to confirm there were no architectural mistakes or nasty gotchas. It's code I would have been happy to write myself, only here I got it written on my phone while riding the BART.
See this is a perfect example of OPs statement! I don't care about the lines, I care about the output! It was never about the lines of code.
Your comment makes it very clear there are different viewpoints here. We care about problem->solution. You care about the actual code more than the solution.
This gets at the heart of the quality-of-results issues a lot of people are talking about elsewhere here. Right now, if you treat them as a system where you can tell it what you want and it will do it for you, you're building a sandcastle. If instead you also describe the correct data structures and appropriate algorithms to use against them, as well as the particulars of how you want the problem solved, it's a different situation altogether. Like most systems, the quality of output is in some way determined by the quality of input.
There is a strange insistence, in the subtext of this question a lot of the time, on not helping the LLM arrive at the best outcome. I feel like we are living through the John Henry legend in real time.
> I got into it for the thrill of defining a problem in terms of data structures and instructions a computer could understand, entering those instructions into the computer, and then watching victoriously while those instructions were executed.
You can still do that with Claude Code. In fact, Claude Code works best the more granular your instructions get.
For some reason this makes me think of a jigsaw puzzle. People usually complete these puzzles because they enjoy the process, where at the end you get a picture that you can frame if you want to. Some people seem to just want to get the resulting picture. No interest in the process at all.
I guess that's the same people who went to all those coding camps during their heyday because they heard about software engineering salaries. They just want the money.
When I last bought a Lego Technic set because I wanted to play with making mechanisms with gears and stuff, I assembled it according to the instructions, which was fun, and then the final result was also cool and I couldn't bear to dismantle it.
IMO, this isn't entirely a "new world" either, it is just a new domain where the conversation amplifies the opinions even more (weird how that is happening in a lot of places)
What I mean by that: you had compiled vs interpreted languages, you had types vs untyped, testing strategies, all that, at least in some part, was a conversation about the tradeoffs between moving fast/shipping and maintainability.
But it isn't just tech, it is also in methodologies and the words we use, from "build fast and break things" and "yagni" to "design patterns" and "abstractions".
As you say, it is a different viewpoint... but my biggest concern with where we are as an industry is that these are not just "equally valid" viewpoints on how to build software... it is quite literally different stages of software that, AFAICT, pretty much all successful software has to go through.
Much of my career has been spent in teams at companies with products that are undergoing the transition from "hip app built by scrappy team" to "profitable, reliable software" and it is painful. Going from something where you have 5 people who know all the ins and outs and can fix serious bugs or ship features in a few days to something that has easy clean boundaries to scale to 100 engineers of a wide range of familiarities with the tech, the problem domain, skill levels, and opinions is just really hard. I am not convinced yet that AI will solve the problem, and I am also unsure it doesn't risk making it worse (at least in the short term)
> Much of my career has been spent in teams at companies with products that are undergoing the transition from "hip app built by scrappy team" to "profitable, reliable software" and it is painful. Going from something where you have 5 people who know all the ins and outs and can fix serious bugs or ship features in a few days to something that has easy clean boundaries to scale to 100 engineers of a wide range of familiarities with the tech, the problem domain, skill levels, and opinions is just really hard. I am not convinced yet that AI will solve the problem, and I am also unsure it doesn't risk making it worse (at least in the short term)
This perspective is crucial. Scale is the great equalizer / demoralizer: scale of the org and scale of the systems. Systems become complex quickly, and verifiability of correctness and function becomes harder. For companies that built from day one with AI and have AI influencing them as they scale, where does complexity begin to run up against the limitations of AI and cause regression? Or, if all goes well, amplification?
But how can you be a responsible builder if you don't have trust in the LLMs doing the "right thing"? Suppose you're the head of a software team where you've picked the best candidates for a given project. In that scenario I can see how one is able to trust the team members to orchestrate the implementation of your ideas and intentions without being intimately familiar with the details yourself.
Can we place the same trust in LLM agents? I'm not sure. Even if one could somehow prove that LLMs are very reliable, the fact that AI agents aren't accountable beings renders the whole situation vastly different from the human equivalent.
I test all of the code I produce via LLMs, usually doing fairly tight cycles. I also review the unit test coverage manually, so that I have a decent sense that it really is testing things - the goal is less perfect unit tests and more just quickly catching regressions. If I have a lot of complex workflows that need testing, I'll have it write unit tests and spell out the specific edge cases I'm worried about, or set up cheat codes I can invoke to test those workflows out in the UI/CLI.
Trust comes from using them often - you get a feeling for what a model is good and bad at, and what LLMs in general are good and bad at. Most of them are a bit of a mess when it comes to UI design, for instance, but they can throw together a perfectly serviceable "About This" HTML page. Any long-form text they write (such as that About page) is probably trash, but that's super-easy to edit manually. You can often just edit down what they write: they're actually decent writers, just very verbose and unfocused.
I find it similar to management: you have to learn how each employee works. Unless you're in the Top 1%, you can't rely on every employee giving 110% and always producing perfect PRs. Bugs happen, and even NASA-strictness doesn't bring that down to zero.
And just like management, some models are going to be the wrong employee for you because they think your style guide is stupid and keep writing code how they think it should be written.
You don't simply put a body in a seat and get software. There are entire systems enabling this trust: college, resumes, work samples, referrals, interviews, tests and CI, monitoring, mentoring, and performance feedback.
And accountability can still exist? Is the engineer that created or reviewed a Pull Request using Claude Code less accountable than one that used PICO?
> And accountability can still exist? Is the engineer that created or reviewed a Pull Request using Claude Code less accountable than one that used PICO?
The point is that in the human scenario, you can hold the human agents accountable. You cannot do that with AI. Of course, you as the orchestrator of agents will be accountable to someone, but you won't have the benefit of holding your "subordinates" accountable, which is what you do in a human team. IMO, this renders the whole situation vastly different (whether good or bad I'm not sure).
I remember leaving university and going into my first engineering job, thinking "Where is all the engineering? All the problem solving and building of complex systems? All the math and science? Have I been demoted to a lowly programmer?"
Took me a few years to realize that this wasn't a universal feeling, and that many others found the programming tasks more fulfilling than any challenging engineering. I suppose this is merely another manifestation of the same phenomenon.
I feel like this is the core issue that will actually stall LLM coding tools short of actually replacing coding work at large.
'Coders' make 'builders' keep the source code good enough so that 'builders' can continue building without breaking what they built.
If 'builders' become 10x productive and 'coders' become unable to keep up with the insurmountable pile of unmaintainable mess that 'builders' proudly churn out, 'builders' will start to run into the impossibility of building further without starting over and over again, hoping that agents will be able to get it right this time.
> > LLM coding will split up engineers based on those who primarily liked coding and those who primarily liked building.
> I’ve always said I’m a builder even though I’ve also enjoyed programming (but for an outcome, never for the sake of the code)
> This perfectly sums up what I’ve been observing between people like me (builders) who are ecstatic about this new world and programmers who talk about the craft of programming, sometimes butting heads.
That's one take, sure, but it's a specially crafted one to make you feel good about your position in this argument.
The counter-argument is that LLM coding splits up engineers based on those who primarily like engineering and those who like managing.
You're obviously one of the latter. I, OTOH, prefer engineering.
I prefer engineering too, I tried management and I hated it.
It's just the level of engineering we're split on. I like the type of engineering where I figure out the flow of data, maybe the data structures and how they move through the system.
Writing the code to do that is the most boring part of my job. The LLM does it now. I _know_ how to do it, I just don't want to.
It all boils down to communication in a way. Can you communicate what you want in a way others (in this case a language model) understands? And the parts you can't communicate in a human language, can you use tools to define those (linters, formatters, editorconfig)?
I've done all that with actual humans for ... a decade? So applying the exact same thing to a machine is weirdly more efficient, it doesn't complain about the way I like to have my curly braces - it just copies the defined style. With humans I've found out that using impersonal tooling to inspect code style and flaws has a lot less friction than complaining about it in PR reviews. If the CI computer says no, people don't complain, they fix it.
Maybe there's an intermediate category: people who like designing software? I personally find system design more engaging than coding (even though I enjoy coding as well). That's different from just producing an opaque artifact that seems to solve my problem.
I think he's really getting at something there. I've been thinking about this a lot (in the context of trying to understand the persistent-on-HN skepticism about LLMs), and the framing I came up with[1] is top-down vs. bottom-up dev styles, aka architecting code and then filling in implementations, vs. writing code and having architecture evolve.
I can't argue that. The scale was already imbalanced as well, and vibe coding has lowered the bar even more, so the gap will continue to grow for now.
I'm just saying that LLMs aren't causing the divide. Accelerating yes, but I think simply equating AI usage to poor quality is wrong. Craftsmen now have a powerful tool as well, to analyze, nitpick, and refactor in ways that were previously difficult to justify.
It also seems premature for so many devs to jump to hardline "AI bad" stances. So far the tech is improving quite well. We may not be able to 1-shot much of quality yet, but it remains to be seen if that will hold.
Personally, I have hopes that AI will eventually push code quality much higher than it's ever been. I might be totally wrong of course, but to me it feels logical that computers would be very good at writing computer programs once the foundation is built.
I think the division is more likely tied to writing. You have to fundamentally change how you do your job, from writing a formal language for a compiler to writing natural language for a junior-goldfish-memory-allstar developer, closer to management than to contributor.
This distinction to me separates the two primary camps
Yeah, I think this is a bit of insight I had not realized / been able to word correctly yet. There are developers who can let Claude go at it and be fearless about it, like me (though I mostly do it for side projects, but WOW), and then there are developers who will use it like a hammer or axe to help cut down or mold whatever is in their path.
I think both approaches are okay; the biggest thing for me is that the former needs to test way more and review the code more. As developers we don't read code enough, and with the "prompt and forget" approach we have a lot of free time we could spend reading the code and asking the model to refactor and refine it. I am shocked when I hear about hundreds of thousands of lines in some projects. I've rebuilt Beads from the ground up and I'm under 10 lines of code.
So we're going to have various level of AI Code Builders if you will: Junior, Mid, Senior, Architect. I don't know if models will ever pick up the slack for Juniors any time soon. We would need massive context windows for models, and who will pay for that? We need a major AI breakthrough to where the cost goes down drastically before that becomes profitable.
We have services deployed globally serving millions of customers where rigor is really important.
And we have internal users who're building browser extensions with AI that provide valuable information about the interface they're looking at including links to the internal record management, and key metadata that's affecting content placement.
These tools could be handed out on Zip drives in the street and they would just show our users some of the metadata already being served up to them. But it's amazing to strip out 75% of the process for certain things and just have our user (in this case, it's one user who is driving all of this, so it does take some technical inclination) build out these tools that save our editors so much time, when doing this before would have been months and months of discovery and coordination and designs that probably wouldn't actually be as useful in the end, after the wants of the user are diluted through 18 layers of process.
There's more to it than just coding vs. building though.
For a long time in my career now I've been in a situation where I'd be able to build more if I was willing to abstract myself and become a slide-merchant/coalition-builder. I don't want to do this though.
Yet, I'm still quite an enthusiastic vibe-coder.
I think it's less about coding vs. building and more about tolerance for abstraction and politics. And I don't think there are that many people who are so intolerant of abstraction that they won't let agents write a bunch of code for them.
I’ve heard something similar: “there are people who enjoy the process, and people who enjoy the outcome”. I think this saying comes more from artistic circles.
I’ve always considered myself a “process” person, I would even get hung-up on certain projects because I enjoyed crafting them so much.
LLMs have taken a bit of that “process” enjoyment from me, but I think they have also forced some more “outcome” thinking into my head, which I’m taking as a positive.
The new LLM centered workflow is really just a management job now.
Managers and project managers are valuable roles and have important skill sets. But there's really very little connection with the role of software development that used to exist.
It's a bit odd to me to include both of these roles under a single label of "builders", as they have so little in common.
I enjoy both and have ended up using AI a lot differently than vibe coders. I rarely use it for generating implementations, but I use it extensively for helping me understand docs/apis and more importantly, for debugging. AI saves me so much time trying to figure out why things aren’t working and in code review.
I deliberately avoid full vibe coding since I think doing so will rust my skills as a programmer. It also really doesn’t save much time in my experience. Once I have a design in mind, implementation is not the hard part.
To me this is similar to car enthusiasts. Some people absolutely love to build their project car; it's a major part of the hobby for them. Others just love the experience of driving, so they buy ready-made cars or just pay someone to work on the car.
Agree completely. I used to be (and still would love to be) a process person, enjoying hand-writing bulletproof artisanal code. Switching to startups many years ago gave me a whole new perspective, and it's been interesting watching the struggle between writing code and shipping, especially when you don't know how long the code you are writing will actually live. LLMs are fantastic in that space.
So far I haven't seen it actually be effective at "building" in a work context with any complexity, and this despite some on our team desperately trying to make that the case.
I have! You have to be realistic about the projects. The more irreducible local context it needs, the less useful it will be. Great for greenfield code, one-shots, write-once-read-once code that runs for months.
Agreed. I don’t care for engineering or coding, and would gladly give it up the moment I can. I’m also running a one man business where every hour counts (and where I’m responsible for maintaining every feature).
The fact of the matter is LLMs produce lower quality at higher volumes in more time than it would take to write it myself, and I’m a very mediocre engineer.
I find this separation of “coding” vs “building” so offensive. It’s basically just saying some people are only concerned with “inputs”, while others with “outputs”. This kind of rhetoric is so toxic.
It’s like saying LLM art is separating people into people who like to scribble, and people who like to make art.
I mean it’s closer, but I don’t think it’s right to equate commissioning an artist with paying a multi-billion dollar corporation to steal from artists.
These tools are just lazy shortcuts. And that’s fine, there’s no problem with taking the lazy way. I’m never going to put in the time to learn to draw, so it’s cool there’s an option for me.
I just take issue with people pretending it’s something grand and refined, or spitting in the face of the ones who are willing to put in the work.
I like building, but I don't fool myself into thinking it can be done by taking shortcuts. You could build something that looks like a house for half the cost but it won't be structurally sound. That's why I care about the details. Someone has to.
> I enjoy both and have ended up using AI a lot differently than vibe coders. I rarely use it for generating implementations, but I use it extensively for helping me understand docs/apis and more importantly, for debugging. AI saves me so much time trying to figure out why things aren’t working and in code review.
I had felt like this and still do, but man, at some point the management churn feels real and I just feel like I'm suffering from a new problem.
Suppose I actually end up having services literally deployed from a single prompt and nothing else. Earlier I used to have AI write code, but I was interested in the deployment and everything around it; now there are services which do that really neatly for you (I also really didn't give in to the agent hype and mostly used browser LLMs).
Like, on one hand you feel more free to build projects, but the whole joy of the project got completely reduced.
I mean, I guess I am one of the junior devs, so to me AI writing code on topics I didn't know / prototyping felt awesome.
I mean, I was still involved in, say, copy-pasting or looking at the code it generates, seeing the errors and sometimes trying things out myself. If AI is doing all that too, idk.
For some reason, recently I have been uninterested in AI. I have used it quite a lot for prototyping, but this completely out-of-the-loop programming feels very off to me with recent services.
I also feel like there is this sense that, if I pay for some AI thing, I have to maximally extract "value" out of it.
I guess the issue could be that I can give vague terms or a very small text file as input (like "just do X alternative in Y lang") and then I'm unable to understand the architectural decisions, and I feel overwhelmed by it.
It's probably gonna take either spec-driven development, where I clearly define the architecture, or something like what I saw Primeagen do recently, where the AI will only manipulate the code of one particular function (I am imagining it for a file as well). Somehow I feel like that's something I could enjoy more, because right now it feels like I don't know what I have built at times.
When I prototype with single-file projects, using say the browser for funsies or any idea, I get some idea of what the code uses, with its dependencies and function names from start to end, even if I didn't look at the middle.
A bit of a ramble I guess, but the thing which is kind of making me feel this way is that I was talking to somebody and showcasing them some service where AI + a server is there, and they asked for something in a prompt and I wrote it. Then I let it do its job, but I was also thinking how I would architect it (it was some "detect food and then find BMR" thing, and I was thinking first to use some API, but then I thought that might be hard, why not use AI vision models, okay what's the best, Gemini seems good/cheap).
And I went to the coding thing to see what it did, and it actually went even beyond by using the free tier of Gemini (which I guess didn't end up working, could be some rate limit on my own key, but honestly it would've been the thing I would've tried too).
So like, I used to pride myself on the architectural decisions I make even if AI could write code faster but now that is taken away as well.
I really don't want to read AI code, so much so that at this point I might as well write the code myself and learn hands-on. But I have a problem with the build-fast-in-public-like attitude that I have, and I'm just not finding it fun.
I feel like I should do a more active job in my projects & I am really just figuring out what's the perfect way to use AI in such contexts & when to use how much.
My answer to this is often to get the LLMs to do multiple rounds of code review (depending on the criticality of the code, doing reviews on every commit; but this was clearly a zero-impact hobby project).
They are remarkably good at catching things, especially if you do it every commit.
> My answer to this is to often get the LLMs to do multiple rounds of code review
So I am supposed to trust the machine, that I know I cannot trust to write the initial code correctly, to somehow do the review correctly? Possibly multiple times? Without making NEW mistakes in the review process?
Sorry no sorry, but that sounds like trying to clean a dirty floor by rubbing more dirt over it.
It sounds to me like you may not have used a lot of these tools yet, because your response sounds like pushback around theoreticals.
Please try the tools (especially either Claude Code with Opus 4.5, or OpenAI Codex 5.2). Not at all saying they're perfect, but they are much better than you currently think they might be (judging by your statements).
AI code reviews are already quite good, and are only going to get better.
Why is the go-to always "you must not have used it" in lieu of the much more likely experience of having already seen and rejected first-hand the slop that it churns out? Synthetic benchmarks can rise all they want; Opus 4.5 is still completely useless at all but the most trivial F# code and, in more mainstream affairs, continues to choke even on basic ASP.NET Core configuration.
> It sounds to me like you may not have used a lot of these tools yet
And this is more and more becoming the default answer I get whenever I point out obvious flaws of LLM coding tools.
Did it occur to you that I know these flaws precisely because I work a lot with, and evaluate the performance of, LLM based coding tools? Also, we're almost 4y into the alleged "AI Boom" now. It's pretty safe to assume that almost everyone in a development capacity has spent at least some effort evaluating how these tools do. At this point, stating "you're using it wrong" is like assuming that people in 2010 didn't know which way to hold a smartphone.
Sorry no sorry, but when every criticism towards a tool elicits the response that people are not using it well, then maybe, just maybe, the flaw is not with all those people, but with the tool itself.
> Spending 4 years evaluating something that’s changing every month means almost nothing, sorry.
No need to be sorry. Because, if we accept that premise, you just countered your own argument.
If me evaluating these things for the past 4 years "means almost nothing" because they are changing sooo rapidly... then by the same logic, any experience with them also "means almost nothing". If the timeframe to get any experience with these models before said experience becomes irrelevant is as short as 90 days, then there is barely any difference between someone with experience and someone just starting out.
Meaning, under that premise, as long as I know how to code, I can evaluate these models, no matter how little I use them.
Luckily for me though, that's not the case anyway because...
> It’s about “if you last tried it more than 3 months ago,
...guess what: I try these almost every week. It's part of my job to do so.
Implementation -> review cycles are very useful when iterating with CC. The point of the agent reviewer is not to take the place of your personal review, but to catch any low hanging fruit before you spend your valuable time reviewing.
> but to catch any low hanging fruit before you spend your valuable time reviewing.
And that would be great, if it weren't for the fact that I also have to review the reviewer's review. So even for the "low hanging fruit", I need to double-check everything it does.
That is not my perspective. I don't review every review, instead use a review agent with fresh context to find as much as possible. After all automated reviews pass, I then review the final output diff. It saves a lot of back and forth, especially with a tight prompt for the review agent. Give the reviewer specific things to check and you won't see nearly as much garbage in your review.
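Roughly, the loop looks like this (a minimal sketch in Python; run_agent() and the placeholder CLI command are hypothetical stand-ins for however you spawn a fresh-context agent, not any specific tool's API):

    # Sketch of an implement -> automated-review -> human-review loop.
    # run_agent() is a hypothetical stand-in for spawning a fresh-context agent;
    # swap in whatever CLI or SDK you actually use.
    import subprocess

    def run_agent(prompt: str) -> str:
        result = subprocess.run(
            ["your-agent-cli", prompt],  # placeholder command, not a real CLI
            capture_output=True, text=True, check=True,
        )
        return result.stdout

    REVIEW_PROMPT = (
        "Review only the current diff. Check error handling on external calls, "
        "off-by-one risks, and that new code paths have tests. "
        "Reply PASS if clean, otherwise list concrete findings."
    )

    def review_until_clean(max_rounds: int = 3) -> bool:
        for _ in range(max_rounds):
            findings = run_agent(REVIEW_PROMPT)  # fresh context every round
            if findings.strip().startswith("PASS"):
                return True   # automated reviews pass; human reviews the final diff
            run_agent("Address these review findings:\n" + findings)
        return False          # still not clean after max_rounds; take a manual look

The tight review prompt is the part doing most of the work; the loop itself is trivial.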
Well, you can review its reasoning. And you can passively learn enough about, say, Rust to know if it's making a good point or not.
Or you will be challenged to define your own epistemic standard: what would it take for you to know if someone is making a good point or not?
For things you don't understand enough to review as comfortably, you can look for converging lines of conclusions across multiple reviews and then evaluate the diff between them.
I've used Claude Code a lot to help translate English to Spanish as a hobby. Not being a native Spanish speaker myself, there are cases where I don't know the nuances between two different options that otherwise seem equivalent.
Maybe I'll ask 2-3 Claude Code to compare the difference between two options in context and pitch me a recommendation, and I can drill down into their claims infinitely.
At no point do I need to go "ok I'll blindly trust this answer".
Humans do have capacity for deductive reasoning and understanding, at least. Which helps. LLMs do not. So would you trust somebody who can reason or somebody who can guess?
People work differently than LLMs; they find things we don't, and the reverse is also obviously true. As an example, a stack use-after-free was found in a large monolithic C++98 codebase at my megacorp. None of the static analyzers caught it; even after modernizing it and getting clang-tidy modernize to pass, nothing found it. ASan would have found it if a unit test had covered that branch. As a human I found it, but mostly because I knew there was a problem to find. An LLM found and explained the bug succinctly. Having an LLM be a reviewer for merge requests makes a ton of sense.
Clawdbot is interesting, but I finally feel like those people who look at people like me raving about Claude Code while it barely works for them.
I have no doubt Clawdbot, when it works, must feel great. But I’ve had a tough time setting it up and found it to be very buggy.
My first couple of conversations? It forgot the context literally seconds later when I responded.
Nevertheless, I’m sure it’s improving by the day so I’m going to set it up on my existing Mac mini because I think it has the capacity to be really fascinating.
I built something similar (well… with a lot of integrations) but for running my company and continue to iterate on it.
I’ve been doing Vim + aider, and now Claude Code. Those tools I understood. I never got into Cursor because I’m too old to give up Vim.
Clawd.bot really annoyed me at first. The setup is super tedious and broken and not fun. That’s mostly because I’m too impatient to tinker like I used to.
However, once you tinker, it’s so-so. I don’t think it’s a lot better than Claude Code or anything, but I think it’s just a focused vector for the same AI model, one focused on being your personal assistant. It’s like Claude Code vs. Claude Cowork. They’re the same thing. But given the low cost of creating custom tools, why not give people something like Clawd.bot that gives them focused guardrails?
Anyway, I could end up abandoning all of this too. And it’s all a kludge around things that should really be an API. But I do like that I can run it on my Mac Mini and have it control my desktop. It’ll be a cold day if I let it message for me; I’d rather it write deterministic code that does that, rather than do it directly.
Maybe this is the issue I’m facing. I’m already using Claude, Claude projects, Claude cowork, and Claude code a lot.
I used Claude projects for an entire proposal. That was one of the best proposals I think I’ve ever written.
I’ve been using cowork to help organize my downloads folder, which had 1500 files and I just didn’t have the patience to organize them.
So maybe the difference with Clawdbot isn’t as big because I’m able to vibe code my way into things like integrations and other things that I’ve already been using?
For the app that I wrote to help manage my business, I exposed everything over MCP, so I’m able to do things like timesheets and adding and removing people and purchase orders and all that stuff using MCP. Which is why I’m already kind of feeling the magic with my existing stuff, maybe?
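For anyone curious, exposing a business operation as an MCP tool is pretty small. A minimal sketch with the Python MCP SDK's FastMCP helper (the timesheet logic and names here are made-up placeholders, not my actual app):

    # Minimal sketch: expose a business operation as an MCP tool a chat client can call.
    # Uses the Python MCP SDK's FastMCP helper; the timesheet store is a made-up placeholder.
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("business-tools")
    _timesheets: list[dict] = []  # stand-in for a real database

    @mcp.tool()
    def add_timesheet_entry(person: str, hours: float, project: str) -> str:
        """Record hours worked by a person on a project."""
        _timesheets.append({"person": person, "hours": hours, "project": project})
        return f"Recorded {hours}h for {person} on {project}."

    @mcp.tool()
    def list_timesheets(person: str) -> list[dict]:
        """Return all timesheet entries for a given person."""
        return [entry for entry in _timesheets if entry["person"] == person]

    if __name__ == "__main__":
        mcp.run()  # serves the tools over stdio by default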
The one thing in ClawdBot’s favor is the scheduled stuff maybe?
Aider isn't abandoned, but it feels like it's basically in maintenance mode at this point. Updates over the last year were limited to small improvements and fixes. There are some forks aimed at making it more "agentic" (more like Claude Code, etc). I haven't tried them personally.
Neovim is the only reason I've given vim a serious look. I love Emacs more, but Neovim lets me use any UI on top of it, which means I can have better visual indicators for things I don't know how to do in VIM. Emacs has a GUI but a lot of it is "beyond flat" and it just doesn't translate well to my brain. The best plugin for Emacs for me is still Spacemacs, and no I don't use it with the vim mode stuff, I prefer it with regular emacs commands (for anyone curious).
But Neovim just works for me every time; even vanilla it's fine.
I'm a strict Emacs-only user (although sometimes I'll jump into nano for quick edits of isolated files). When I just started out, I went with Spacemacs, which served me pretty well. But there were a few pain points that I can no longer remember, and eventually I gave Doom a try. Haven't looked back.
I cloned the clawdbot repo back when it was named warelay or clawdis, can't remember, but it was much less dense then. Mainly cloned it for the in-the-box Whatsapp implementation. Since then I've built it into a pretty awesome agent for my home and family, who all have their own privileged access which allows it access to different skills and a mixture of shared and personal information. I have no interest in reconciling the Frankenstein I've built with newer mainline features, but the custom nature of my build is one of the things I find so fun and helpful about it. It's become so much more "mine" by just asking it to build out xyz feature for itself, and now it can do a bunch of weird things that revolve around its persistent access to the information I provide it and my ability to interface with it through a regular messaging app.
Definitely. I got ad-blitzed the last two days by "wow" YT videos, which I admit is why I even ended up clicking through today's "bot news" to this site. It's been uber-hyped with marketing strategy for sure; it's only because it was OSS that I paid attention, but I was surprised by the marketing for OSS since that doesn't usually happen.
And now the data exfiltration stuff happening makes me put my tinfoil hat on and think this was actually a coordinated data exfiltration attack that leveraged AI hype lol.
The emergency docket is a preferred method for blatant partisanship because it lets them immediately prevent lower courts from stopping the administration but doesn’t require them to set a binding precedent or even explain the ruling. If it looks like they might be losing power, suddenly those “emergency” decisions which were subsequently back-burnered can be dropped to prevent a Democrat from using the same powers.
I don’t know what your stack is, but at least with elixir and especially typescript/nextJS projects, and properly documenting all those pieces you mentioned, it goes a long way. You’d be amazed.
I would never use, let alone pay for, a fully vibe-coded app whose implementation no human understands.
Whether you’re reading a book or using an app, you’re communicating with the author by way of your shared humanity in how they anticipate what you’re thinking as you explore the work. The author incorporates and plans for those predicted reactions and thoughts where it makes sense. Ultimately the author is conveying an implicit mental model to the reader.
The first problem is that many of these pathways and edge cases aren’t apparent until the actual implementation, and sometimes in the process the author realizes that the overall app would work better if it were re-specified from the start. This opportunity is lost without a hands on approach.
The second problem is that, the less human touch is there, the less consistent the mental model conveyed to the user is going to be, because a specification and collection of prompts does not constitute a mental model. This can create subconscious confusion and cognitive friction when interacting with the work.
> The second problem is that, the less human touch is there, the less consistent the mental model conveyed to the user is going to be, because a specification and collection of prompts does not constitute a mental model. This can create subconscious confusion and cognitive friction when interacting with the work.
Which is why, on the one hand, the talk about "this is replacing software devs" is going to still be premature for many use cases. Because there is more to software than just the code output.
But on the other hand, a lot of software is riddled with inconsistent mental models today, depending on its age, who the UX people were, etc. This is not something unique to vibe coded apps.
> The second problem is that, the less human touch is there, the less consistent the mental model conveyed to the user is going to be, because a specification and collection of prompts does not constitute a mental model. This can create subconscious confusion and cognitive friction when interacting with the work.
tbf, this is a trend I see more and more across the industry; LLM or not, so many processes get automated that teams just implement x because PdM y said so, and it's because they need to meet goal z for the quarter... and everyone is on scrum autopilot, so they can't see the forest for the trees anymore.
I feel like the massive automation afforded by these coding agents may make this worse.
If it involves Next.js then we aren't talking about the same category of software. Yes, it can make a website pretty darn well. Can it debug and fix excessive database connection creation in a way that won't make things worse? Maybe, but more often than not it can't, and that's why we are engineers and not craftsmen.
That example is from a recent bug I fixed without Cursor being able to help. It wanted to create a wrapper around the pool class that would have blocked all threads until a connection was free. Bug fixed! App broken!
If the software is, say, Audacity, whose target market isn't specifically software developers, sure. But seeing as how Claude Code's target market has a lot of people who can read code and write software (some of them for a living!), it becomes material. Especially when CC has numerous bugs that have gone unaddressed for months that people in their target market could fix. I mean, I have my own beliefs as to why they haven't opened it, but at the same time, it's frustrating hitting the same bugs day after day.
> ... numerous bugs that have gone unaddressed for months that people in their target market could fix.
THIS. I get so annoyed when there's a longstanding bug that I know how to fix, the fix would be easy for me, but I'm not given the access I need in order to fix it.
For example, I use Docker Desktop on Linux rather than native Docker, because other team members (on Windows) use it, and there were some quirks in how it handled file permissions that differed from Linux-native Docker; after one too many times trying to sort out the issues, my team lead said, "Just use Docker Desktop so you have the same setup as everyone else, I don't want to spend more time on permissions issues that only affect one dev on the team". So I switched.
But there's a bug in Docker Desktop that was bugging me for the longest time. If you quit Docker Desktop, all your terminals would go away. I eventually figured out that this only happened to gnome-terminal, because Docker Desktop was trying to kill the instance of gnome-terminal that it kicked off for its internal terminal functionality, and getting the logic wrong. Once I switched to Ghostty, I stopped having the issue. But the bug has persisted for over three years (https://github.com/docker/desktop-linux/issues/109 was reported on Dec 27, 2022) without ever being resolved, because 1) it's just not a huge priority for the Docker Desktop team (who aren't experiencing it), and 2) the people for whom it IS a huge priority (because it's bothering them a lot) aren't allowed to fix it.
Though what's worse is a project that is open-source, has open PRs fixing a bug, and lets those PRs go unaddressed, eventually posting a notice in their repo that they're no longer accepting PRs because their team is focusing on other things right now. (Cough, cough, githubactions...)
> I get so annoyed when there's a longstanding bug that I know how to fix, the fix would be easy for me, but I'm not given the access I need in order to fix it.
This exact frustration (in his case, with a printer driver) is responsible for provoking RMS to kick off the free software movement.
GitHub Actions is a bit of a special case, because it mostly runs on their systems, but that's when you just fork and, I mean, the problems with their (original) branch are their problem.
They are turning it into a distributed system that you'll have to pay to access. Anyone can see this. CLI is easy to make and easy to support, but you have to invest in the underlying infrastructure to really have this pay off.
Especially if they want to get into enterprise VPCs and "build and manage organizational intelligence"
The CLI is just the tip of the iceberg. I've been building a similar loop using LangGraph and Celery, and the complexity explodes once you need to manage state across async workers reliably. You basically end up architecting a distributed state machine on top of Redis and Postgres just to handle retries and long-running context properly.
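To make that concrete, even just the retry piece ends up looking something like this (a rough sketch with Celery and Redis; process_step() and the connection URLs are illustrative placeholders, not a real framework):

    # Rough sketch of the retry/state plumbing: Redis as the broker, Postgres as the
    # result backend, explicit retry policy on the task. process_step() is illustrative.
    from celery import Celery

    app = Celery(
        "agent_workers",
        broker="redis://localhost:6379/0",
        backend="db+postgresql://user:pass@localhost/results",
    )

    @app.task(bind=True, max_retries=5, acks_late=True)
    def process_step(self, run_id: str, step: dict) -> dict:
        try:
            # ...load long-running context for run_id, call the model, persist output...
            return {"run_id": run_id, "status": "done"}
        except Exception as exc:
            # Exponential backoff; once retries are exhausted you still need your own
            # state machine to decide what happens to the overall run.
            raise self.retry(exc=exc, countdown=2 ** self.request.retries)

And that is before dealing with partial progress, idempotency, or humans-in-the-loop, which is where the real complexity lives.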
But you don't have to be restricted to one model either? Codex being open source means you can choose to use Claude models, or Gemini, or...
It's fair enough to decide you want to just stick with a single provider for both the tool and the models, but surely still better to have an easy change possible even if not expecting to use it.
Codex CLI with Opus, or Gemini CLI with 5.2-codex, because they're open-sourced agents? Go ahead if you want, but show me where it actually happens with practical value.
This is a fun thought experiment. I believe that we are now at the $5 Uber (2014) phase of LLMs. Where will it go from here?
How much will a synthetic mid-level dev (Opus 4.5) cost in 2028, after the VC subsidies are gone? I would imagine as much as possible? Dynamic pricing?
Will the SOTA model labs even sell API keys to anyone other than partners/whales? Why even that? They are the personalized app devs and hosts!
Man, this is the golden age of building. Not everyone can do it yet, and every project you can imagine is greatly subsidized. How long will that last?
While I remember $5 Ubers fondly, I think this situation is significantly more complex:
- Models will get cheaper, maybe way cheaper
- Model harnesses will get more complex, maybe way more complex
- Local models may become competitive
- Capital-backed access to more tokens may become absurdly advantaged, or not
The only thing I think you can count on is that more money buys more tokens, so the more money you have, the more power you will have ... as always.
But whether some version of the current subsidy, which levels the playing field, will persist seems really hard to model.
All I can say is, the bad scenarios I can imagine are pretty bad indeed—much worse than that it's now cheaper for me to own a car, while it wasn't 10 years ago.
If the electric grid cannot keep up with the additional demand, inference may not get cheaper. The cost of electricity would go up for LLM providers, and VCs would have to subsidize them more until the price of electricity goes down, which may take longer than they can wait, if they have been expecting LLM's to replace many more workers within the next few years.
This is a super interesting dynamic! The CCP is really good at subsidizing and flooding global markets, but in the end, it takes power to generate tokens.
In my Uber comparison, it was physical hardware on location... taxis, but this is not the case with token delivery.
This is such a complex situation in that regard. However, once the market settles and monopolies are created, eventually the price will be what the market can bear. Will that actually create an increase in gross planet product, or will the SOTA token providers just eat up the existing gross planet product, with no increase?
I suppose whoever has the cheapest electricity will win this race to the bottom? But... will that ever increase global product?
___
Upon reflection, the comment above was likely influenced by this truly amazing quote from Satya Nadella's interview on the Dwarkesh podcast. This might be one of the most enlightened things that I have ever heard in regard to modern times:
> Us self-claiming some AGI milestone, that's just nonsensical benchmark hacking to me. The real benchmark is: the world growing at 10%.
With optimizations and new hardware, power is almost a negligible cost; $5/month would be sufficient for all users, contrary to people's belief. You can get 5.5M tokens/s/MW[1] for Kimi K2 (= 20M tokens/kWh = 181M tokens/$), which is 400x cheaper than current pricing even if you exclude architecture/model improvements. The thing is, currently Nvidia is swallowing up a massive share of the revenue, which China could possibly solve by investing in R&D.
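Spelling out the arithmetic behind those numbers (the ~$0.11/kWh electricity price is my inference from the 181M figure, not something stated explicitly):

    # Back-of-envelope from the figures above: 5.5M tokens/s per MW of power.
    tokens_per_sec_per_mw = 5.5e6
    tokens_per_kwh = tokens_per_sec_per_mw * 3600 / 1000   # ~19.8M tokens per kWh ("20M/kWh")
    electricity_usd_per_kwh = 0.11                          # assumed price implied by the 181M figure
    tokens_per_usd = tokens_per_kwh / electricity_usd_per_kwh
    print(f"{tokens_per_usd:,.0f} tokens per dollar of electricity")  # ~180,000,000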
I can run Minimax-m2.1 on my m4 MacBook Pro at ~26 tokens/second. It’s not opus, but it can definitely do useful work when kept on a tight leash. If models improve at anything like the rate we have seen over the last 2 years I would imagine something as good as opus 4.5 will run on similarly specced new hardware by then.
I appreciate this; however, as a ChatGPT, Claude.ai, Claude Code, and Windsurf user... who has tried nearly every single variation of Claude, GPT, and Gemini in those harnesses, and has tested all of those models via API for LLM integrations into my own apps... I just want SOTA, 99% of the time, for myself and my users.
I have never seen a use case where a "lower" model was useful, for me, and especially my users.
I am about to get almost the exact MacBook that you have, but I still don't want to inflict non-SOTA models on my code, or my users.
This is not a judgement against you, or the downloadable weights, I just don't know when it would be appropriate to use those models.
BTW, I very much wish that I could run Opus 4.5 locally. The best that I can do for my users is the Azure agreement that they will not train on their data. I also have that setting set on my claude.ai sub, but I trust them far less.
Disclaimer: No model is even close to Opus 4.5 for agentic tasks. In my own apps, I process a lot of text/complex context and I use Azure GPT 4.1 for limited LLM tasks... but for my "chat with the data" UX, Opus 4.5 all day long. It has tested as vastly superior.
The last I checked, it is exactly equivalent per token to direct OpenAI model inference.
The one thing I wish for is that Azure Opus 4.5 had JSON structured output. Last I checked that was in "beta" and only allowed via the direct Anthropic API. However, after many thousands of Opus 4.5 Azure API calls with the correct system and user prompts, not even one API call has returned invalid JSON.
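Since structured output isn't available there, the workaround is just prompt-and-validate on the client side; a minimal sketch (call_model() is a placeholder for whatever Azure/Anthropic client call you actually use):

    # Minimal sketch: ask for JSON in the system prompt, validate client-side, retry on failure.
    # call_model() is a placeholder for your actual API client, not a real SDK function.
    import json

    SYSTEM = "Reply with a single JSON object only. No prose, no code fences."

    def call_model(system: str, user: str) -> str:
        raise NotImplementedError("wire this up to your Azure/Anthropic client")

    def ask_json(user_prompt: str, max_attempts: int = 3) -> dict:
        last_err = None
        for _ in range(max_attempts):
            raw = call_model(SYSTEM, user_prompt)
            try:
                return json.loads(raw)
            except json.JSONDecodeError as err:
                last_err = err
                user_prompt += f"\n\nYour previous reply was not valid JSON ({err}). Return valid JSON only."
        raise ValueError(f"No valid JSON after {max_attempts} attempts") from last_err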
I love Postgres and use it for _everything_. I've also used SQL Server for a couple of years.
I've lost count of the number of times I'll read about some new Postgres or MySQL thing and find out that Oracle or SQL Server implemented it 20 years ago. Yes, they always have it behind expensive SKUs, but they're hardly slouches in the technical competence department.
I found Oracle to just be a lot more unwieldy from a tooling perspective than SQL Server (which IMO had excellent tools like SSMS and the query planner/profiler to do all your DB management).
But overall, these paid databases have been very technically sound and have been solving some of these problems many, many years ago. It's still nice to see the rest of us benefit from these features in free databases nowadays.
As others have said, the query planners I used 25 years ago with Oracle (cost based, rule based, etc.) were amazing. The Oracle one wasn't visual, but the MSSQL one was totally visual and actually gave you a whole graph of how the query was assembled. And I last used the MSSQL one 15 years ago.
Maybe pgAdmin does that now (I haven't used pgAdmin), but I miss the polished tools that came with SQL Server.
I disagree. As much as I hate speed cameras, the way they’ve been implemented (meaning, the fact that you get a letter with evidence and it’s clear you committed the offense, and usually no points, like you might get from a cop) seems to strike a balance of fair punishment.
Now, whether they’re that effective at reducing speeding is a bigger question. Because people just slam the brakes for the 100 feet around the camera and then resume speeding.
My guess is that it's more "we are using every talented individual right now to make sure our datacenters don't burn down from all the demand. we'll get to support soon once we can come up for air"
But at the same time, they have been hiring folks to help with Non Profits, etc.