Is this not a bash script, but run through a "maybe it won't work this time" randomizer?
Sometimes I feel like I live on another planet. Wouldn't you at least get Claude to write the bash script, confirm it works how you like, then run that? Why get an LLM to guess how to do it each and every time?
At least they are still manually approving, which the title made sound like something they'd move on from.
We're moving fast from "we do this with AI because it's useful" to "we do this with AI because it's cool despite the fact that it's slower, more expensive, and non-deterministic"
It's like the "we made over our product in order to use technology X because it's cool and modern" that we've seen multiple times with node, go, rust, k8s, blockchain, <enter technology here>.
On a more serious note: I personally have only found LLMs really interesting for double-checking my code in case there's something I overlooked in the general implementation (a different approach or so), never really for writing code directly for me. I feel like that kind of use is becoming less and less common nowadays, and it kind of makes me worry about the quality of code that was already questionable at times...
I know you used the /s but it's quite common that 0 temperature is believed to be deterministic. For others coming across this thread, it's not deterministic, it is simply less likely to return different tokens (it still absolutely will)
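For anyone who wants to check this themselves, here's a rough sketch (assuming the OpenAI chat completions endpoint, jq, and an illustrative model name; any similar API would do): send the same prompt twice at temperature 0 and diff the answers. Run it enough times and the outputs will eventually differ.

    #!/usr/bin/env bash
    # Call the same prompt twice at temperature 0 and compare the outputs.
    # Assumes OPENAI_API_KEY is set and jq is installed; the model name is illustrative.
    set -euo pipefail

    ask() {
      curl -s https://api.openai.com/v1/chat/completions \
        -H "Authorization: Bearer $OPENAI_API_KEY" \
        -H "Content-Type: application/json" \
        -d '{"model": "gpt-4o-mini", "temperature": 0,
             "messages": [{"role": "user", "content": "Describe a software release process."}]}' \
      | jq -r '.choices[0].message.content'
    }

    diff <(ask) <(ask) && echo "identical this run" || echo "different outputs at temperature 0"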
I did something similar in my hobby project - the agent was prompted, among other things, to copy a signature from a build artifact into a json file. It worked fine until it didn’t - one day Claude 4 Sonnet randomly flipped one letter in the signature to something else. It wasn’t the end of the world, and I caught the error because I always manually test whether the release worked, but it shows that AI tools should not be used as execution engines for CI/CD workflows. It’s slow, inefficient and error prone. Just ask the AI to help you write a proper workflow with code.
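For comparison, that particular step is a couple of lines once it's plain code; the paths and key name below are hypothetical, but something along these lines never flips a letter:

    # Copy a signature from a build artifact into a JSON file, deterministically.
    # Paths and the "signature" key are hypothetical placeholders.
    SIG="$(cat build/artifact.sig)"
    jq --arg sig "$SIG" '.signature = $sig' release.json > release.json.tmp \
      && mv release.json.tmp release.json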
Thanks for pointing out something that some consider an unpopular opinion. Use AI tools all you want, but wherever you want a deterministic outcome, the current generation isn't up to that level.
We must acknowledge, understand and work around a technology's limitations.
What is the deterministic alternative you suggest?
I’m not endorsing this release practice in particular, it scares me. But I have been involved in a lot of automation projects where perfection was the initial goal, and then abandoned because it was obvious that non-automated work was so imperfect. Human error is a fact of life.
If you really have to use an AI, at least use it to generate code once and use that. That way it's deterministic, and you get a chance to understand what happens and to debug issues.
Not sure why AI could create something you couldn't, however. And at least understanding what happens is part of the bundle.
Did it? I didn’t see a claim that doing this work manually had a zero error rate.
Again, I would probably not do this. But let’s not pretend that non-AI release processes prevent all issues. We’re really talking different kinds of errors, and the ai driven ones tend to be obviously wrong. At least right now.
I tell my teams and stakeholders that releasing software is like exercising a muscle. The more it's automated, and the more frequently we do it, the less of a "thing" it becomes. Releasing should not be reserved for a single senior dev: any dev on the team should be able to release at any time, and the system should be set up with the appropriate guard rails to avoid stupid mistakes. Rollbacks should be trivial and everyone should know how to do them.
Why do your releases involve non negligible manual work?
> We release 1-3 times per week.
If I were a customer, I would be concerned by this statement. Having an amateurish deployment strategy is one thing, but your release cadence implies low confidence and quality issues.
The cadence of releases is not related to quality. There will always be variance depending on the codebase involved, organisational constraints, and product velocity. I could just as easily say that more frequent or less frequent releases are a cause for concern.
We are talking about a Gen AI startup that has a handful of employees here. They have little excuse not to implement CI/CD, unless they lack confidence in their product's quality.
It means you release whatever you have first thing and then release a bunch of patch releases on top as the QA (outsourced to users, probably) results come back.
Sans security patches, why are your features not sized to roughly a sprint? What do you manage to prepare, build, and validate in a day or two? To me, releasing more often than every few weeks screams "whatever landed in master is good to go" and is a sign of mis- or un-managed development.
Yeah, could be user feedback. And if it's a public beta release why not?
3 devs each working 5 days on their own feature means 3 releases per week.
For the last 10+ years I have been working on the same project with 2 releases per year, so what do I know. But I have used projects with quick release cycles that work very closely with the community. Push new beta, feedback on discord. Was also fine from my (limited) perspective.
Because they should be spending one day writing scripts and GitHub Actions for their CI/CD system before pushing out new code by hand or with AI assistance several times a week.
Releasing 1 - 3 times a week means it's 1 - 3 times more important to have a deterministic release process than if you release 1 time a week.
Why would a startup CTO choose to deploy a couple of times a week with a Claude script instead of implementing basic CI/CD? He makes it sound like he's running a clusterfuck and he isn't qualified for the job.
Aren't you assuming a bit much here? They have implemented CI/CD; their Claude script triggers the git branch part of the release flow, which is what in turn triggers the CI/CD pipeline—a completely normal thing to do, even in established teams (save for the Claude part).
The guy automated the toil away using AI. Not that I would feel confident automating away that part in particular, but it speaks badly of neither the code quality nor his job qualifications.
Because there is more toil in writing a blog post about how you use AI to do a release, than there is in writing a script that makes the AI and the blog post unnecessary.
But then you wouldn't have something about using AI to blog about.
That’s an entirely different issue: people maintain a blog to improve their hireability, and that entails blogging about things that may not always be brilliant insights. It’s not this person's fault that that’s the state of tech hiring however. Don’t hate the player, hate the game.
Blogging is actually a good way for professionals to earn CEUs to maintain certifications.
Chances are, if you’re reading low-effort blog posts that are consistently in certain knowledge domains, they’re intending to apply for CEUs at renewal time.
Well looks like he had the last laugh on us: GitHub was down, so nobody with CI/CD GitHub Action deployment scripts could make a release unless they used AI or did it by hand!
Is this a psyop to check how HN readers will react to this insanity?
This could be a simple script. It should be a simple script. Please update us when this fails :)
The psyop is working. I am seriously considering quitting my two-decade coding career that I very much loved before this AI insanity. We have a mandate at work that all code must now be generated by AI in the first iteration.
I am going to open a Montessori school in India with my wife.
I’ll be honest, this sounds like something that could be completely automated without AI. Wouldn’t a simple shell script accomplish this? Merge, push a tag, deploy the release with that tag. I’m really not sure why AI is useful here at all, but maybe I’m missing something?
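For the sake of argument, a sketch of those steps as a script (branch names, tag scheme, and the deploy command are all assumptions):

    #!/usr/bin/env bash
    # Merge, push a tag, deploy the release with that tag.
    # Branch names, tag format, and the deploy command are placeholders.
    set -euo pipefail

    VERSION="${1:?usage: release.sh <version>}"

    git checkout main
    git pull --ff-only
    git merge --no-ff "release/${VERSION}" -m "Release ${VERSION}"
    git tag -a "v${VERSION}" -m "Release ${VERSION}"
    git push origin main "v${VERSION}"

    ./deploy.sh "v${VERSION}"   # replace with whatever actually ships the build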
I think the main "benefit" here is auto-documentation. I have, for example, been pretty impressed with how Claude Code writes git commits when coding with it.
I've just been testing it out by having it create a whole application by itself, so I can understand how well it works and what its limitations are with Sonnet 4. So far it is pretty impressive, but also limited in context retention/retrieval.
I just spent a day resurrecting an old iOS music app of mine that got kicked out of the app store because I didn't make time to keep up with changes and devices, and some music students kept asking "when is it going to get back on the app store?".
I used Claude Code in "do whatever you need to to modernize this code" mode for a bit and then went about correcting the things it got wrong. Many of the changes were mechanical and useful, and a few were incorrect. It botched up some core code big time with changes that didn't make any sense, and I had to revert and micromanage changes to that. In all, it was a win for a day's work during the weekend.
Approx $30 well spent, but I give my thanks to the OSS community whose code made Claude Code possible and wish I could've split that with them.
The fact that software systems are architected with tons of mini languages means LLMs can be of use to select from the space of possibilities without me having to learn incidental details I don't want to spend time or my neurons on.
Fortunately I wasn't in the market for an AI shopping assistant anyhow, but still thanks for letting me know what to avoid if I ever became interested in one.
[The keyword "PLEASE"] provides two reasons for the program's rejection by the compiler: if "PLEASE" does not appear often enough, the program is considered insufficiently polite, and the error message says this; if it appears too often, the program could be rejected as excessively polite. Although this feature existed in the original INTERCAL compiler, it was undocumented.
Please ask the agent to help write a workflow script (GitHub Actions yaml, a makefile, or similar) instead of using it as a runner; if you use it as a runner, the release pipeline changes with each execution. You do not want a non-deterministic release pipeline that's mostly correct. You want one that's checked into version control and always does exactly the same thing, with all logs and artefacts recorded.
By all means use whatever AI agent you have to help set that up.
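As a rough sketch of what "checked into version control" might look like here (the workflow name, scripts, and tag pattern are all illustrative), a tag push triggers the same steps every time:

    # Hypothetical .github/workflows/release.yml
    name: release
    on:
      push:
        tags:
          - "v*"

    jobs:
      release:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - name: Build
            run: ./build.sh
          - name: Deploy the tagged build
            run: ./deploy.sh "${GITHUB_REF_NAME}"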
Speaking of which - I was wondering if there is a catalogue of release processes? Variety, usual steps, checklists?
Asking because development processes are a hot market themselves (think of XP, Scrum, agile, etc.), so I'm wondering if there's something similarly documented about release processes.
This is the main problem with AI: we can no longer distinguish bad coders/managers, because they use AI heavily and can say knowledgeable-sounding things at times while having no idea what they are doing.
I now say things that would not make sense to a good coder (like describing steps out of order) to find such people out.
Eventually you will find the impostors (I have impostor syndrome, which gives it a whole new spin), because AI does not replace your knowledge of the underlying concepts.
> ...can say knowledgeable-sounding things at times while having no idea what they are doing
This has been a thing forever, before AI.
Lots of "programmers" at BigCorp have hacked the system and they know how to say the right words relevant to BigCorp's tech stack. Knowing how to actually program is entirely optional. (You can just copy-paste boilerplate from BigCorp's monorepo, writing boilerplate will be your day job anyways.)
This is cool and all, but I don't see why this can't be accomplished (and more deterministically) with a bunch of bash/python scripts. I've seen that done, and it worked well, in several firms for decades.
Letting AI control software releases might seem risky, but it may not be as problematic as it sounds—especially if you also use AI to handle issue support. :)