This is it for me. If you ask these models to write something new, the result ca...

unshavedyak · 2025-06-24T04:57:53

> because they never delete code. Never.

That's not true in my experience. Several times now I've given Claude Code a too-challenging task, and after trying repeatedly it eventually gave up, removed all the previous work on that subject, and chose an easier solution instead.

...Unfortunately that was not at all what I wanted, lol. I had told it "implement X feature with Y library", i.e., the specific implementation I wanted to make progress towards, and after a while it just decided that was too difficult and did it differently.

soulofmischief · 2025-06-24T00:51:37

You'd be surprised what a combination of structured review passes and agent rules (even simple ones such as "please consider whether old code can be phased out") might do to your agentic workflow.

> Show me a language model that can turn rube goldberg code into good readable code, and I'll suddenly become very interested in them.

They can already do this. If you have any specific code examples in mind, I can experiment for you and return my conclusions if it means you'll earnestly try out a modern agentic workflow.

jcalvinowens · 2025-06-24T01:20:05

> You'd be surprised

I doubt it. I've experimented with most of them extensively, and worked with people who use them. The atrocious results speak for themselves.

> They can already do this. If you have any specific code examples in mind

Sure. The bluetooth drivers in the Linux kernel contain an enormous amount of shoddy duplicated code that has amalgamated over the past decade with little oversight: https://code.wbinvd.org/cgit/linux/tree/drivers/bluetooth

An LLM capable of refactoring all the duplicated logic into the common core and restructuring the drivers to be simpler would be very, very useful to me. It ought to be able to remove a few thousand lines of code there.

It needs to do it iteratively, in a string of small patches that I can review and prove to myself are correct. If it spits out a single giant patch, that's worse than nothing: I do systems work that actually has to be 100% correct, and I can't trust it.

Show me what you can make it do :)