Yeah, that is one of my main uses for AI: getting the build stuff and scripts out of the way so that I can focus on the application code. That and brainstorming.
In both cases, it works because I can mostly detect when the output is bullshit. I'm just a little bit scared, though, that it will stop working if I rely too much on it, because I might lose the brain muscles I need to detect said bullshit.
I'm super interested to know how juniors get along. I have dealt with build systems for decades, and half the time it's just using Google or Stack Overflow to get past something quickly, or manually troubleshooting deps. Now I automate that entirely. And for code, I know what's good or not; I check its output and have it redo anything that doesn't pass my known standards. It makes using it so much easier. The article is so on point.
> have it learn your conventions, pull in best practices
What do you mean by "have it learn your conventions"? Is there a way to somehow automatically extract your conventions and store them in CLAUDE.md?
> For example, we have a custom UI library, and Claude Code has a skill that explains exactly how to use it. Same for how we write Storybooks, how we structure APIs, and basically how we want everything done in our repo. So when it generates code, it already matches our patterns and standards out of the box.
Did you have to develop these skills yourself? How much work was that? Do you have public examples somewhere?
> What do you mean by "have it learn your conventions"?
I'll give you an example: I use ruff to format my Python code, which has an opinionated way of formatting certain things. After an initial formatting, Opus 4.5, without prompting, will write code in this same style, so that the ruff formatter almost never has anything to do on new commits. Sonnet 4.5 is actually pretty good at this too.
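To make that concrete, here's the sort of shape it converges on (a made-up snippet, not from my codebase). Ruff's formatter is black-compatible: double quotes, and a magic trailing comma that keeps the parameter list exploded one-per-line:

    # Made-up example of the style ruff's (black-compatible) formatter
    # emits: double quotes, trailing comma keeping parameters one-per-line.
    def build_query(
        limit: int = 100,
        *,
        include_inactive: bool = False,
    ) -> dict[str, object]:
        return {
            "limit": limit,
            "include_inactive": include_inactive,
        }

    print(build_query(limit=10))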
Isn't this a meaningless example? Formatters already exist. Generating code that doesn't need to be formatted is exactly the same as generating code and then formatting it.
I care about the norms in my codebase that can't be automatically enforced by machine. How is state managed? How are end-to-end tests written to minimize change detectors? When is it appropriate to log something?
We have some tests in "GIVEN WHEN THEN" style, and others in other styles. Opus will try to match each style of testing by the project it is in by reading adjacent tests.
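For example, the GIVEN/WHEN/THEN ones look roughly like this (an illustrative pytest example with a made-up domain, not an actual test from one of our projects):

    import pytest

    class Account:
        def __init__(self, balance: int) -> None:
            self.balance = balance

        def withdraw(self, amount: int) -> None:
            if amount > self.balance:
                raise ValueError("insufficient funds")
            self.balance -= amount

    def test_withdrawal_rejected_when_balance_insufficient():
        # GIVEN an account holding 50
        account = Account(balance=50)
        # WHEN a withdrawal of 100 is attempted
        # THEN it is rejected and the balance is unchanged
        with pytest.raises(ValueError):
            account.withdraw(100)
        assert account.balance == 50

Opus picks up on exactly this kind of structure from the neighboring test files.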
The one caveat with this is that in messy codebases it will perpetuate bad things unless you're specific about what you want. Then again, human developers will often do the same, and are much harder to force to follow new conventions.
But I think it should be doable. You can tell it how YOU want the state to be managed and then have it write a custom "linter" that makes the check deterministic. I haven't tried this myself, but Claude did create some custom clippy lints in Rust when I wanted to enforce something that isn't automatically enforced by anything out there.
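As a sketch of what such a deterministic check could look like (untested, and the "no bare print() in library code" rule is just a hypothetical stand-in for whatever convention you actually care about):

    # Hypothetical convention check: ban bare print() calls under src/,
    # forcing everything through the logging module instead.
    import ast
    import sys
    from pathlib import Path

    def find_print_calls(source: str, filename: str) -> list[str]:
        violations = []
        for node in ast.walk(ast.parse(source, filename=filename)):
            if (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "print"
            ):
                violations.append(f"{filename}:{node.lineno}: use logging, not print()")
        return violations

    if __name__ == "__main__":
        problems: list[str] = []
        for path in Path("src").rglob("*.py"):
            problems += find_print_calls(path.read_text(), str(path))
        for problem in problems:
            print(problem, file=sys.stderr)
        sys.exit(1 if problems else 0)

Hook something like that into CI or a pre-commit hook and the convention stops depending on anyone's memory.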
Lints are typically well suited for syntactic properties or some local semantic properties. Almost all interesting challenges in software design and evolution involve nonlocal semantic properties.
Since starting to use Opus 4.5, I've reduced the instructions in CLAUDE.md and just ask Claude to look at the codebase to understand the patterns already in use. Going from prompts/docs to having the code be the "truth". Show, don't tell. I've found this pattern has made a huge leap with Opus 4.5.
"Model your application's behavior first, as data, and derive everything else automatically. Ash resources center around actions that represent domain logic."
I feel like I've been doing this since Sonnet 3.5 or Sonnet 4. I'll clone projects/modules/whatever into the working directory and tell claude to check it out. Voila, now it knows your standards and conventions.
When I ask Claude to do something, it independently, without me even asking or instructing it to, searches the codebase to understand what the convention is.
I’ve even found it searching node_modules to find the API of non-public libraries.
If they're using Opus then it'll be the $100/month Claude Max 5x plan (could be the more expensive 20x plan depending on how intensive their use is). It does consume a lot of tokens, but I've been using the $100/mo plan and get a lot done without hitting limits. It helps to be mindful of context (regularly amending/pruning your CLAUDE.md instructions, clearing context between tasks, sizing your tasks to stay within the Opus context window). Claude Code plans have token limits that work in 5-hour blocks (that start when you send your first token, so it's often useful to prime it as early in the morning as possible).
Claude Code will spawn sub-agents (which often use the cheap Haiku model) for exploration and planning tasks, with only the results imported into the main context.
I've found the best results come from a more interactive collaboration with Claude Code. As long as you describe the problem clearly, it does a good job on small-to-moderate tasks. I generally give two instances of Claude Code separate tasks and run them concurrently (unlike handing a task to a colleague, the interaction with Claude Code distracts me too much to do my own independent coding at the same time, though I do work on architecture/planning tasks in parallel).
The one matter of taste that I have had to compromise on is the sheer amount of code: it likes to write a lot of code. I have a better experience if I sweat the low-level code less and just periodically have it clean up areas where I think it has written too much or overly repetitive code.
As you give it more freedom it's more prone to failure (and can often get itself stuck in a fruitless spiral); however, as you use it more you get a sense of what it can do independently and what it's likely to choke on. A codebase with good human-designed unit & Playwright tests is very good.
Crucially, you get the best results where your tasks are complex but on the menial side of the spectrum - it can pay attention to a lot of details, but on the whole don't expect it to do great on senior-level tasks.
To give you an idea, in a little over a month "npx ccusage" shows that via my Claude Code 5x sub I've used 5M input tokens, 1.5M output, 121M Cache Create, 1.7B Cache Read. Estimated pay-as-you-go API cost equivalent is $1500 (N.B. for the tail end of December they doubled everybody's API limits, so I was using a lot more tokens on more experimental on-the-fly tool construction work)
FYI Opus is available and pretty usable in Claude Code on the $20/mo plan if you are at all judicious.
I exclusively use opus for architecture / speccing, and then mostly Sonnet and occasionally Haiku to write the code. If my usage has been light and the code isn't too straightforward, I'll have Opus write code as well.
The problem with current approaches is the lack of feedback loops with independent validators that never lose track of the acceptance criteria. That's the next level that will truly allow no-babysitting implementations that are feature complete and production grade. Check out this repo that offers that: https://github.com/covibes/zeroshot/
That's helpful to know, thanks! I gave Max 5x a go and didn't look back. My suspicion is that Opus 4.5 is subsidised, so good to know there's flexibility if prices go up.
The $20 plan for CC is good enough for 10-20 minutes of Opus every 5 hours, and you'll be out of your weekly limit after 4-5 days if you sleep at night. I wouldn't be surprised if Anthropic actually makes a profit here. (Yeah, probably not, but they aren't burning cash.)
I use the $200/month Claude Code plan, and in the last week I've had it generate about half a million words of documentation without hitting any session limits.
I have hit the weekly limit before, briefly, but that took running multiple sessions in parallel continuously for many days.
/init in Claude Code already automatically extracts a bunch, but for something more comprehensive, just tell it which additional types of things you want it to look for and document.
> Did you have to develop these skills yourself? How much work was that? Do you have public examples somewhere?
I don't know about the person above, but I tell Claude to write all my skills and agents for me. With some caveats, you can do this iteratively in a single session ("update the X agent, then re-run it. Repeat until it reliably does Y")
"Claude, clone this repo https://github.com/repo, review the coding conventions, check out any markdown or readme files. This is an example of coding conventions we want to use on this project"
The answer to that would very much be: "it depends".
Yes, of course, network I/O > local I/O > most things you'll do on your CPU. But regardless, the answer is always to measure performance (through benchmarking or telemetry), find your bottlenecks, then act upon them.
I recall a case in Firefox in which we were bitten by an O(n^2) algorithm running at startup, where n was the number of tabs to restore; another in which several threads were fighting each other to load components of Firefox and ended up hammering the I/O subsystem; but also cases of the executable being too large, data not fitting in the CPU cache, Windows requiring a disk access to normalize paths, etc.
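To illustrate the first of those (a toy reconstruction, not the actual Firefox code): the quadratic cost usually hides as a linear scan inside a loop, invisible at 5 tabs and painful at 500.

    # Toy reconstruction of an accidental O(n^2) restore loop (not real
    # Firefox code): `tab_id not in restored` scans the whole list on
    # every iteration, so restoring n tabs costs ~n^2/2 comparisons.
    def restore_tabs_quadratic(tab_ids: list[int]) -> list[int]:
        restored: list[int] = []
        for tab_id in tab_ids:
            if tab_id not in restored:
                restored.append(tab_id)
        return restored

    # The O(n) fix: a set gives O(1) membership tests.
    def restore_tabs_linear(tab_ids: list[int]) -> list[int]:
        seen: set[int] = set()
        restored: list[int] = []
        for tab_id in tab_ids:
            if tab_id not in seen:
                seen.add(tab_id)
                restored.append(tab_id)
        return restored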
> Every report and available evidence shows he is barely technologically astute, never mind genius; the accomplishments of his teams are despite him, not because of him.
In particular, nothing that comes out of his mouth regarding AI makes any sense.
And still, people listen to him as if he were an expert. Go figure.
His latest bullshit was about Tesla cameras and fog/rain/snow - on an investor call, no less - "Oh, we do photon counting directly from the sensor, so it's a non-issue".
First, Tesla cameras are not capable of that: you need a special sensor, one that's not useful for producing any real visual representation. And second, even if you did, photon counting requires a closed "box", so to speak; you can't count photons in "open air".
I just don't get it. Do people hang on his every word just because he's rich? What are they expecting from this worship… it's not like he's going to start throwing $100 bills to people who agree with him on Twitter.
Seen from the other side of the Atlantic, I've regularly felt that the US is rather prone to hero worship, see e.g. the passion dedicated to presidential candidates, former presidents, billionaires, but also how the main characters of pretty much all American biopics I recall can't ever be wrong.
If my observation is correct, I guess what we're witnessing with Musk could be a case of hero worship – and in any narrative in which Musk is a hero, he's of course right.
Can confirm, at least for Firefox. When I worked on it, I spent literal years shaving seconds from startup or shutdown, and milliseconds from tab switching.
Everybody likes to hate telemetry, and yes, it can be abused, but that's how Mozilla (and its competitors) manage to make users' lives more comfortable.
As a variant, I recently stumbled upon a post that basically sums up to "people who disagree with me on AI are clearly blinded by their prejudice, it's so sad."
Your argument is dumb because it's objectively better to optimize x conditioned on y than optimize y conditioned on x.
Maybe the worst variant of this is when people don't realize they're actually arguing for different things, but because it's the same general topic they assume everything is the same (duals are common). I feel like this describes many political arguments, and it feels partly intentional...
But still, a good gag gift takes effort. It's not like you walk into a random store and pick the first thing you see.
The whole aspect of stealing gifts demonstrates this. It'd be pointless if the gifts were all low-grade garbage; they'd be effectively fungible. Yet the theft part is critical to making white elephant fun, regardless of whether you're doing gag gifts or good gifts.
A white elephant is a gift that you cannot refuse, cannot regift, and is so expensive/complicated to take care of that it will become your primary concern for the rest of your life.
Well, yes, but it also means a gag gift; I'd hazard a guess that >99% of uses of the term in the past several decades have been of the "gag gift" persuasion. There are many white elephant parties thrown by people who care little for history.
Even then, intentionally ruining someone's financial life requires more care and attention than telling an AI agent to perform random acts of kindness (so far).
> Well, yes, but it also means a gag gift; I'd hazard a guess that >99% of uses of the term in the past several decades have been of the "gag gift" persuasion. There are many white elephant parties thrown by people who care little for history.
Is this an Americanism? I've never heard "white elephant" used with such a meaning.
> Even then, intentionally ruining someone's financial life requires more care and attention than telling an AI agent to perform random acts of kindness (so far).