I feel so weird not being the grumpy one for once.
Can't relate to GP's experience of one-shotting. I need a couple of tries to really home in on the right plan and constraints.
But I am getting so much done. My todo list used to grow every year. Now it shrinks every month.
And this is not mindless "vibe coding". I insist that what I deploy is high quality, and I use every tool that can help me achieve that (languages with strong types, TDD with tests that specify system behaviour, E2E tests where possible).
I regret using the term "one-shot", because my reality isn't really that. It's more that the first shot usually gets the code 80-90% of the way there, and it short-circuits a ton of the "code archaeology" I would normally have to do to get to that point.
Some bugs really can be one-shotted, but that's with the benefit of a lot of scaffolding our company has built and the prompting process. It's not as simple as Claude Code being able to do this out of the box.
I'm on my 5th draft of an essentially vibe-coded project. Maybe it's because I'm using non-frontier models to do the coding, but I have to take two or three tries to get the shape of a thing just right. Drafting like this is something I do when I code by hand as well: I have to implement a thing a few times before I begin to understand the domain I'm working in. Once I begin to understand the domain, the separation of concerns follows naturally, and so do the component APIs (and how those APIs hook together).
- like the sister comment says, use the best model available. For me that has been Opus, but YMMV. Some of my colleagues prefer the OpenAI models.
- iterate on the plan until it looks solid. This is where you should invest your time.
- watch the model closely and make sure it writes tests first, checks that they fail, and only then proceeds to implementation.
- the model should add pieces one by one, ensuring each step works before proceeding. Commit each step so you can easily retry if you need to. Each addition will involve a new plan that you go back and forth on until you're happy with it. The planning usually gets easier as the project moves along.
- this is sometimes controversial, but use the best language you can target. That can be Rust, Haskell, Erlang depending on the context. Strong types will make a big difference. They catch silly mistakes models are liable to make.
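To make the "tests first, check that they fail" step concrete, here is a minimal Python sketch of the red-then-green check. The `slugify_*` functions, `make_spec`, and `run_spec` are hypothetical names chosen only for illustration; in a real project the spec would live in its own test file and you'd run your normal test runner rather than a helper like this.

```python
import io
import unittest


def slugify_stub(title: str) -> str:
    # Step 1 (hypothetical): the function exists only as a stub.
    raise NotImplementedError


def slugify_impl(title: str) -> str:
    # Step 3 (hypothetical): the simplest code that satisfies the spec.
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in title.lower())
    return "-".join(cleaned.split())


def make_spec(fn):
    """Step 2: tests that specify behaviour, written before the implementation."""

    class SlugifySpec(unittest.TestCase):
        def test_lowercases_and_joins_words(self):
            self.assertEqual(fn("Hello World"), "hello-world")

        def test_drops_punctuation(self):
            self.assertEqual(fn("Ship it, now!"), "ship-it-now")

    return SlugifySpec


def run_spec(fn) -> bool:
    """Run the spec against `fn` quietly and report whether every test passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(make_spec(fn))
    result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
    return result.wasSuccessful()


if __name__ == "__main__":
    # Red: the spec must fail before there is any real implementation...
    assert not run_spec(slugify_stub)
    # ...green: only then is the implementation accepted.
    assert run_spec(slugify_impl)
    print("red, then green")
```

The point of checking the red step is that a test which passes against a stub isn't specifying anything, which is exactly the kind of test a model will happily write if you don't watch it.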
Cursor is great for trying out different models. If Opus is what you like, I have found Claude Code to be better value, and personally I prefer the CLI to the VS Code UI that Cursor builds on. It's not a panacea, though: the CLI has its own issues, like occasionally slowing to a crawl. It still gets the work done.
My options are 1) pay about a dollar per query from a frontier model, or 2) pay a fraction of that for a not-so-great model that makes my token spend last days/weeks instead of hours.
I spend a lot of time on plans, but unfortunately the gotchas are in the weeds, especially when it comes to complex systems. I don't trust these models with even marginally complex, non-standard architectures (my projects center around statecharts right now, and the semantics around those can get hairy).
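For anyone who hasn't worked with statecharts, here's a rough Python sketch of why the semantics get hairy even in a tiny hierarchical machine: an event is handled by the innermost active state that declares it (otherwise it bubbles up to the parent), and a transition exits states innermost-first and enters them outermost-first, relative to the least common ancestor of the handling state and the target. This is a simplified convention assumed for illustration; specifications such as SCXML pin these rules down more precisely (e.g. internal vs external transitions), and the sketch ignores default child states, history, and parallel regions. The state names are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)  # compare states by identity, not by field values
class State:
    name: str
    parent: Optional["State"] = None
    transitions: Dict[str, str] = field(default_factory=dict)  # event -> target name

    def chain(self) -> List["State"]:
        """This state followed by its ancestors, innermost first."""
        out, node = [], self
        while node is not None:
            out.append(node)
            node = node.parent
        return out


class Chart:
    def __init__(self, states: List[State], initial: str):
        self.by_name = {s.name: s for s in states}
        self.current = self.by_name[initial]  # assumed to be a leaf state
        self.log: List[str] = []

    def send(self, event: str) -> bool:
        # The innermost active state that handles the event wins; otherwise bubble up.
        for handler in self.current.chain():
            if event in handler.transitions:
                self._go(handler, self.by_name[handler.transitions[event]])
                return True
        return False  # no active state handles it: the event is dropped

    def _go(self, handler: State, target: State) -> None:
        target_chain = target.chain()
        # Transition domain: least common ancestor of the handling state and the target.
        domain = next((s for s in handler.chain() if s in target_chain), None)
        # Exit active states innermost-first, stopping at the domain (which stays active).
        for s in self.current.chain():
            if s is domain:
                break
            self.log.append(f"exit {s.name}")
        # Enter outermost-first, from just below the domain down to the target.
        entering = []
        for s in target_chain:
            if s is domain:
                break
            entering.append(s)
        for s in reversed(entering):
            self.log.append(f"enter {s.name}")
        self.current = target


if __name__ == "__main__":
    root = State("root")
    on = State("on", root, {"power": "off"})
    off = State("off", root, {"power": "playing"})
    playing = State("playing", on, {"pause": "paused"})
    paused = State("paused", on, {"pause": "playing"})
    chart = Chart([root, on, off, playing, paused], "playing")
    chart.send("power")  # handled by the parent "on", not by "playing"
    print(chart.log)  # ['exit playing', 'exit on', 'enter off']
```

Even here, small choices (whether the domain is computed from the handling state or the active leaf, whether a self-transition re-enters its state) change the exit/entry order, and those are exactly the details a model can get subtly wrong while the plan still looks fine.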
I git commit after each feature/bugfix, so we're on the same page here. If a feature is too big, or is made up of more than one "big" change, I chunk up the work and commit in small batches until the feature is complete.
I'm using Go for my projects right now. I could try a more strongly typed language, but that means learning a whole new language along with its gotchas and architectural constraints.
Right now I use claude-code-router and Claude Code on top of OpenRouter, so swapping models is trivial. I mostly use Grok-4.1 Fast or Kimi 2.5. Both of these choke less than Anthropic's own Sonnet (which is still more expensive than both).
With the AI. I read the whole thing, correct the model where it makes mistakes, and fill in the gaps where I find them.
I also always check that it explicitly states my rules (some from the global rules, some from the session up until that moment) so they're followed at implementation time.
In my experience Opus is great at understanding what you want and putting it in a plan, and it's also great at sticking to the plan. So just read through the entire thing and make sure it's a plan you feel confident about.
There will be some trial and error before you notice the kind of things the model gets wrong, and that will guide what you look for in the plan that it spits out.
> Maybe its because I'm using not-frontier models to do the coding
IMO it’s probably that. The difference between where this was a year ago and now is night and day, and not using frontier models is roughly like stepping back 6-12 months in time.
On what sort of hardware/RAM? I've been trying ollama and opencode with various local models on 16GB of RAM, but the speed and accuracy/behaviour just aren't good enough yet.
I never really used Codex (found it too slow), just 5.2, which I found to be an excellent model for my work. This looks like another step up.
This week, I'm all local though, playing with opencode and running qwen3 coder next on my little spark machine. With the way these local models are progressing, I might move all my llm work locally.