> value would be in analyzing those rich traces with another agent to extract (failure) patterns and learnings
Claude Code supports hooks. This allows me to run an agent skill at the end of every agent execution to automatically determine whether there were any lessons worth learning from the last session. If there were, new agent skills are automatically created, or existing ones updated, as appropriate.
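For the curious, here is a minimal sketch of what that wiring can look like in `.claude/settings.json`. The script path is hypothetical; Claude Code passes session details (including the transcript path) to Stop hooks on stdin, so the script can hand the transcript to a lessons-extraction skill:

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "~/.claude/scripts/extract-lessons.sh"
          }
        ]
      }
    ]
  }
}
```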
Another anecdote. Nobody in my extended family has more than 3 kids. My grandmothers from both sides had more. But the trend is pretty clear. Fewer kids for the modern generation. Regardless of the level of education and income. In fact, the lower education/income ones in my extended family have fewer kids.
> an amazing future of perfect code from agentic whatevers will come to fruition...
Nobody credible is promising you a perfect future. But a better future? Yes! If you do not see it, then know this: you have your head firmly planted in the sand and are intentionally refusing to see what is coming. You may not like it. You may not want it. But it is coming, and you will either have to adapt or become irrelevant.
Does Copilot spit out useless PR comments? 100% yes! Are there tools that are better than Copilot? 100% yes! These tools are not perfect. But even with their imperfections, they are very useful. You have to learn to harness them for their strengths and build processes to address their weaknesses. And yes, all of this requires learning and experimentation. Without that, you will not get good results, and you will complain about these tools not being good.
> I am still waiting to see where it increases productivity...
If you are a software engineer and you are not using AI to help with software development, then you are missing out. Like many other technologies, using AI agents for software dev work takes time to learn and master. You are not likely to get good results if you try it half-heartedly as a skeptic.
And no, nobody can teach you these skills in a comment in an online forum. This requires trial and error on your part. If well-known devs like Linus Torvalds are saying there is value here, and you are not seeing it, then the issue is not with the tool.
“Of all the points the other side makes, this one seems the most incoherent. Code is deterministic, AI isn’t. We don’t have to look at assembly, because a compiler produces the same result every time.”
This is a valid argument. However, if you create test harnesses using multiple LLMs validating each other’s work, you can get very close to compiler-like deterministic behavior today. And this process will improve over time.
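As a sketch of what I mean (the `complete()` function is a placeholder for whatever LLM client you use, and the consensus rule here is the simplest possible one):

```python
# One model proposes a solution; independent models verify it.
# complete(model, prompt) is a placeholder for your LLM client of choice.
def complete(model: str, prompt: str) -> str:
    raise NotImplementedError("wire up your provider here")

def cross_validated(task: str, proposer: str, verifiers: list[str]) -> str | None:
    candidate = complete(proposer, f"Solve this task:\n{task}")
    for verifier in verifiers:
        verdict = complete(
            verifier,
            f"Task:\n{task}\n\nProposed solution:\n{candidate}\n\n"
            "Reply with exactly PASS or FAIL: does the solution satisfy the task?",
        )
        if not verdict.strip().upper().startswith("PASS"):
            return None  # any dissenting verifier rejects the candidate
    return candidate  # survived every independent check
```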
It helps, but it doesn't make it deterministic. LLMs could all be misled together. A different story would be if we had deterministic models, where the exact same input always results in the exact same output. I'm not sure why we don't try this tbh.
Are you sure that it’s T=0? My comment’s first draft said “it can’t just be setting temp to zero, can it?” But I felt like T alone is not enough. Try running the same prompt in new sessions with T=0, like “write a poem”. Will it produce the same poem each time? (I’m not somewhere I can try it currently.)
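For anyone who can try it, here is roughly the experiment, sketched with the OpenAI Python client as one example provider; the model name is arbitrary and any API would do:

```python
# Same prompt, fresh requests, temperature=0: count distinct outputs.
# Even at T=0, server-side batching and floating-point non-associativity
# can flip token choices, so identical output isn't guaranteed.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # arbitrary example model
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content

outputs = {ask("Write a poem about rain.") for _ in range(5)}
print(f"{len(outputs)} distinct poem(s) in 5 runs")
```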
That’s the opposite of clean-room. The whole point of clean-room design is that you have your software written by people who have not looked into the competing, existing implementation, to prevent any claim of plagiarism.
“Typically, a clean-room design is done by having someone examine the system to be reimplemented and having this person write a specification. This specification is then reviewed by a lawyer to ensure that no copyrighted material is included. The specification is then implemented by a team with no connection to the original examiners.”
No they don't. One team meticulously documents and specs out what the original code does, and then a completely independent team, who has never seen the original source code, implements it.
What they don't do is read the product they're clean-rooming. That's kinda disqualifying. Impossible to know if the GCC source is in 4.6's training set, but it would be kinda weird if it wasn't.
True, but the human isn't allowed to bring 1TB of compressed data pertaining to what they are "redesigning from scratch/memory" into the clean room.
In fact the idea of a "clean room" implementation is that all you have to go on is the interface spec of what you are trying to build a clean (non-copyright violating) version of - e.g. IBM PC BIOS API interface.
You can't have previously read the IBM PC BIOS source code, then claim to have created a "clean room" clone!
If that's what clean room means to you, I do know AI can definitely replace you.
As even ChatGPT is better than that.
(prompt: what does a clean room implementation mean?)
From ChatGPT without login BTW!
> A clean room implementation is a way of building something (usually software) without copying or being influenced by the original implementation, so you avoid copyright or IP issues.
> The core idea is separation.
> Here’s how it usually works:
> The basic setup: two teams (or two roles).
>
> Specification team (the “dirty room”):
> - Looks at the original product, code, or behavior
> - Documents what it does, not how it does it
> - Produces specs, interfaces, test cases, and behavior descriptions
>
> Implementation team (the “clean room”):
> - Never sees the original code
> - Only reads the specs
> - Writes a brand-new implementation from scratch
> Because the clean team never touches the original code, their work is considered independently created, even if the behavior matches.
I'm not the commenter, but you could get different results from the same curl command depending on what the server wants to give you at the time. The bash script can make additional curl calls or set up jobs that run at other times.
I'm sure both of you understand this. I'm guessing it's just semantics.
Right. My point is that you only run it once, so there's only that one chance for a compromise. If you got lucky and talked to the right server and it gave you a good script, which is overwhelmingly likely, you're in the clear. That doesn't mean it's wise, but the danger is limited. Whereas with these agents, every piece of data they're exposed to is potentially interpreted as instructions.
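(And if you do want to shrink that one chance further, the usual move is to stop piping to bash: fetch the script, pin a checksum you obtained out-of-band, and only then execute. The URL and hash below are placeholders.)

```
curl -fsSL -o install.sh https://example.com/install.sh
echo "<expected-sha256>  install.sh" | sha256sum -c - && bash install.sh
```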
> I was an on-prem maxi (if that's a thing) for a long time. I've run clusters that cost more than $5M, but these days I am a changed man.
I have had a similar transformation. I still host non-critical services on-prem; they are exceptionally cheap to run. Everything else I host on Hetzner.
> Managing the PostgreSQL databases is a medium to low complexity task as I see it.
Same here. But I assume you have managed PostgreSQL in the past. I have. There are a large number of software devs who have not. For them, it is not a low complexity task. And I can understand that.
I am a software dev for our small org, and I run the servers and services we need. I use Ansible and Terraform to automate as much as I can, and recently I have added LLMs to the mix. If something goes wrong, I ask Claude to use the Ansible and Terraform skills that I created for it to find out what is going on. It is surprisingly good at this. Similarly, I use LLMs to create new services or change configuration on existing ones. I review the changes before they are applied, but this process greatly simplifies service management.
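To give a flavor of one of those skills (everything below is illustrative; the name, paths, and steps are stand-ins for our setup), a skill is just a SKILL.md with frontmatter plus instructions:

```markdown
---
name: ansible-diagnose
description: Investigate misbehaving services using our Ansible inventory and playbooks
---

When asked to diagnose a service:
1. Look up the host and group vars for the service under inventory/.
2. Run `ansible <host> -m ping` to confirm reachability.
3. Compare the live state against the relevant playbook in playbooks/.
4. Propose a fix as a playbook diff; do not apply anything without review.
```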
> Same here. But I assume you have managed PostgreSQL in the past. I have. There are a large number of software devs who have not. For them, it is not a low complexity task. And I can understand that.
I'd say needing to read the documentation for the first time is what bumps it up from low complexity to medium. And then at medium you should still do it if there's a significant cost difference.
You can ask an AI or StackOverflow or whatever for the correct way to start a standby replica, though I think the PostgreSQL documentation is very good.
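(For reference, the documented way boils down to roughly one command on the standby; the host, user, and data directory here are placeholders:)

```
# Clone the primary and write standby settings in one go.
# -R creates standby.signal and sets primary_conninfo, so the server
# starts in read-only recovery and streams WAL from the primary.
pg_basebackup -h primary.internal -U replicator -D /var/lib/postgresql/data -R -P
```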
But if you were in my team I'd expect you to have read at least some of the documentation for any service you provision (self-hosted or cloud) and be able to explain how it is configured, and to document any caveats, surprises or special concerns and where our setup differs / will differ from the documented default. That could be comments in a provisioning file, or in the internal wiki.
That probably increases our baseline complexity since "I pressed a button on AWS YOLO" isn't accepted. I think it increases our reliability and reduces our overall complexity by avoiding a proliferation of services.
I hope this is the correct service, "Amazon RDS for PostgreSQL"? [1]
The main pair of PostgreSQL servers we have at work each have two 32-core (64-vthread) CPUs, so I think that's 128 vCPU each in AWS terms. They also have 768GiB RAM. This is more than we need, and you'll see why at the end, but I'll be generous and leave this as the default the calculator suggests, which is db.m5.12xlarge with 48 vCPU and 192GiB RAM.
That would cost $6559/month, or less if reserved which I assume makes sense in this case — $106400 for 3 years.
Each server has 2TB of RAID disk, of which currently 1TB is used for database data.
That is an additional $245/month.
"CloudWatch Database Insights" looks to be more detailed than the monitoring tool we have, so I will exclude that ($438/month) and exclude the auto-failover proxy as ours is a manual failover.
With the 3-year upfront cost this is $115000, or $192000 for 5 years.
Alternatively, buying two of yesterday's [2] list-price [3] Dell servers which I think are close enough is $40k with five years warranty (including next-business-day replacement parts as necessary).
That leaves $150000 for hosting, which as you can see from [4] won't come anywhere close.
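Spelling the comparison out (the 5-year RDS figure assumes re-reserving at the same 3-year rate, which is how the numbers above line up):

```python
# Back-of-envelope totals from the figures above (USD).
rds_reserved_3yr = 106_400                 # db.m5.12xlarge, 3-year upfront
storage_per_month = 245                    # 2TB of storage
rds_3yr = rds_reserved_3yr + storage_per_month * 36                  # ~115,000
rds_5yr = round(rds_reserved_3yr * 5 / 3) + storage_per_month * 60   # ~192,000
dell_pair = 40_000                         # two servers, 5-year NBD warranty
print(f"RDS, 3 years:  ${rds_3yr:,}")
print(f"RDS, 5 years:  ${rds_5yr:,}")
print(f"Left over for hosting: ${rds_5yr - dell_pair:,}")
```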
We overprovision the DB server so it has the same CPU and RAM as our processing cluster nodes — that means we can swap things around in some emergency as we can easily handle one fewer cluster node, though this has never been necessary.
When the servers are out of warranty, depending on your business, you may be able to continue using them for a non-prod environment. Significant failures are still very unusual, but minor failures (HDDs) are more common and something we need to know how to handle anyway.
For what it's worth, I have also managed my own databases, but that's exactly why I don't think it's a good use of my time. Because it does take time! And managed database options are abundant, inexpensive, and perform well. So I just don't really see the appeal of putting time into this.
If you have a database, you still have work to do - optimizing, understanding indexes, etc. Managed services don't solve these problems for you magically and once you do them, just running the db itself isn't such a big deal and it's probably easier to tune for what you want to do.
Absolutely yes. But you have to do this either way. So it's just purely additive work to run the infrastructure as well.
I think if it were true that the tuning is easier if you run the infrastructure yourself, then this would be a good point. But in my experience, this isn't the case for a couple reasons. First of all, the majority of tuning wins (indexes, etc.) are not on the infrastructure side, so it's not a big win to run it yourself. But then also, the professionals working at a managed DB vendor are better at doing the kind of tuning that is useful on the infra side.
Maybe, but you're paying through the nose continually for something you could learn to do once, or someone on your team could easily learn with a little time and practice. Like, if this is a tradeoff you want to make, it's fine, but learning that extra 10% can halve your hosting costs, so at some point it's well worth it.
It's not the learning, it's the ongoing commitment of time and energy into something that is not a differentiator for the business (unless it is actually a database hosting business).
I can see how the cost savings could justify that, but I think it makes sense to bias toward avoiding investing in things that are not related to the core competency of the business.
If they could top it off by stealing a janitor from OpenAI or Anthropic, the VCs might wet their pants with excitement.