read it again. he criticizes the hype built around 2025 as the Year X for agents. many were thinking that "we'll carry PCs in our pockets" when Windows Mobile-powered devices came out. many predicted 2003 as the Year X for what we now call smartphones.
a stellar piece, Cal, as always. short and straight to the point.
I believe that Codex and the like took off (in comparison to e.g. "AI" browsers) because the bottleneck there was not reasoning about code, it was typing and processing walls of text. for a human, the interface of e.g. Google Calendar is ± intuitive. for an LLM, any graphical interface is an absolute hellscape from a performance standpoint.
CLI tools, which LLMs love to use, output text and only text: no images, no audio, no video. LLMs excel at text, hence they are confined to what text can do. yes, multimodal is a thing, but you lose a lot of information and/or burn context window space and speed.
LLMs are a flawed technology for general, true agents. 99% of the time, outside code, you need eyes and ears. so far we have only created self-writing paper.
Codex and the like took off because there existed a "validator" of their work - a collection of pre-existing non-LLM software: compilers, linters, code analyzers etc. And the second factor is the very limited and well-defined grammar of programming languages. Under such constraints it was much easier to build a text generator that validates itself using external tools in a loop, until the generated stream makes sense.
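A minimal sketch of that loop, assuming a placeholder generate_code() standing in for the model and Python's own compiler as the external validator:

```python
# Minimal sketch of the generate-then-validate loop described above.
# generate_code() is a placeholder for any LLM call; the validator is a real
# external tool (here Python's own compiler, via py_compile).
import py_compile
import tempfile


def generate_code(prompt: str, feedback: str | None = None) -> str:
    """Placeholder for an LLM call; returns a candidate source string."""
    raise NotImplementedError  # plug in your model of choice


def validate(source: str) -> str | None:
    """Run the external validator; return an error message, or None if it passes."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        py_compile.compile(path, doraise=True)
        return None
    except py_compile.PyCompileError as err:
        return str(err)


def generate_until_valid(prompt: str, max_rounds: int = 5) -> str:
    feedback = None
    for _ in range(max_rounds):
        candidate = generate_code(prompt, feedback)
        feedback = validate(candidate)
        if feedback is None:  # the external tool accepts the output
            return candidate
    raise RuntimeError("no valid candidate within the round budget")
```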
And the other "successful" industry being disrupted is the one where there is no need to validate the output, because errors are ok or irrelevant. Text without much factual content, like fiction or business lingo or spam. Or pictures, where it doesn't matter exactly what color a specific pixel is; a rough match will do just fine.
But outside of those two options, not many other industries can use an imprecise word or media generator at scale. Circular writing and parsing of business emails with no substance? Sure. Not much else.
This is the reasoning deficit. Models are very good at generating large quantities of truthy outputs, but are still too stupid to know when they've made a serious mistake. Or, when they are informed about a mistake, they sometimes don't "get it" and keep saying "you're absolutely right!" while doing nothing to fix the problem.
It's a matter of degree, not a qualitative difference. Humans have the exact same flaws, but amateur humans grow into expert humans with low error rates (or lose their job and go to work in KFC), whereas LLMs are yet to produce a true expert in anything because their error rates are unacceptably high.
Besides the ability to deal with text, I think there are several reasons why coding is an exceptionally good fit for LLMs.
Once LLMs gained access to tools like compilers, they started being able to iterate on code based on fast, precise and repeatable feedback on what works and what doesn't, be it failed tests or compiler errors. Compare this with tasks like composing a PowerPoint deck, where feedback to the LLM (when there is any) is slower and much less precise, and what's "good" is subjective at best.
Another example is how LLMs got very adept at reading and explaining existing code. That is an impressive and very useful ability, but code is one of the most precise ways we, as humans, can express our intent in instructions that can be followed millions of times in a nearly deterministic way (bugs aside). Our code is written in thoroughly documented languages with a very small vocabulary and much easier grammar than human languages. Compare this to taking notes in a zoom call in German and trying to make sense of inside jokes, interruptions and missing context.
But maybe most importantly, a developer must be the friendliest kind of human for an LLM. Breaking down tasks into smaller chunks, carefully managing and curating context to fit in "memory", orchestrating smaller agents with more specialized tasks, creating new protocols for them to talk to each other and to our tools... if it sounds like programming, it's because it is.
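A toy sketch of that orchestration pattern, with run_subagent() as a placeholder for whatever model call sits underneath:

```python
# Toy sketch of the pattern above: break a task into chunks, hand each chunk
# to a specialized sub-agent, and keep only a curated summary in "memory".
# run_subagent() is a placeholder, not any particular vendor's API.
def run_subagent(role: str, prompt: str) -> str:
    raise NotImplementedError  # plug in your LLM client of choice


def orchestrate(chunks: list[str]) -> list[str]:
    memory: list[str] = []  # the curated context, not the full transcript
    results = []
    for chunk in chunks:
        prompt = chunk + "\n\nRelevant context:\n" + "\n".join(memory[-3:])
        result = run_subagent(role="worker", prompt=prompt)
        memory.append(result[:500])  # curate: keep only a short excerpt
        results.append(result)
    return results
```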
I agree with that. For code, most of it was in a "public space" similar to driving down a street and training the model on trees and signs etc. The property is not yours but looking at it doesn't require ownership.
It was not a well-thought-out piece, and it discounts the agentic progress that has happened.
>The industry had reason to be optimistic that 2025 would prove pivotal. In previous years, AI agents like Claude Code and OpenAI’s Codex had become impressively adept at tackling multi-step computer programming problems.
It is easy to forget that Claude Code CAME OUT in 2025. The models and agents released in 2025 really DID prove how powerful and capable they are. The predictions were not really wrong. I AM using code agents in a literal fire and forget way.
Claude Code is a hugely capable agentic interface for solving almost any kind of problem or project you want to take on for personal use. I literally use it as the UX for many problems. It is essentially software that can modify itself on the fly.
Most people haven't really grasped the dramatic paradigm shift this creates. I haven't come up with a great analogy for it yet, but the term that I think best captures how it feels to work with Claude Code as a primary interface is "intelligence engine".
I'll use an example. I've created several systems harnessed around Claude Code, but the latest one I built is for stock portfolio management (this was primarily because it is a fun problem space and something I know a bit about). Essentially you just use Claude Code to build tools for itself in a domain. Let me show how this played out in this example.
Claude and I brainstorm a general flow for the process and the roles. Then we figure out what data each role would need and research which providers have that data at a reasonable price.
I purchase the API keys and Claude wires up tools (in this case Python scripts and documentation for the agents covering about 140 API endpoints), then builds the agents and also creates an initial version of the "skill" that will invoke the whole process.
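To give a flavor of what "wires up tools" means in practice, here is a made-up example in the same spirit; the provider, endpoint and key name are invented for illustration:

```python
# Hypothetical example of the kind of thin tool script the agent gets:
# one endpoint, one function, plain text out so the agent can read it.
# The provider URL, endpoint and env var name are made up for illustration.
import os
import sys

import requests

BASE_URL = "https://api.example-market-data.com/v1"  # placeholder provider


def fetch_daily_prices(ticker: str) -> str:
    resp = requests.get(
        f"{BASE_URL}/prices/daily",
        params={"symbol": ticker},
        headers={"Authorization": f"Bearer {os.environ['MARKET_DATA_API_KEY']}"},
        timeout=30,
    )
    resp.raise_for_status()
    rows = resp.json().get("prices", [])
    # emit plain text, one row per line -- the format agents handle best
    return "\n".join(f"{r['date']} close={r['close']}" for r in rows)


if __name__ == "__main__":
    print(fetch_daily_prices(sys.argv[1]))
```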
Obviously it isn't 100% great on the first pass and I have to lean on expertise I have in building LLM applications, but now I have a Claude Code instance that can orchestrate this whole research process and also handle ad-hoc changes on the fly.
Now I have evolved this system through about 5 significant iterations, but I can do it "in the app". If I don't like how part of it is working, I just have the main agent rewire stuff on the fly. This is a completely new way of working on problems.
there is a very similar app with a much longer history and (obviously) a greater reputation: BuzzKill. [0] it's paid, available on Google Play, and has tons of features and then some.
also, I bet that the Android platform forbids you from requesting the internet permission if you use certain "dangerous" permissions, e.g. reading notifications.
> The bottleneck is still knowing what to build, not building.
shit, I'm stealing that quote! it's easier to seize an opportunity (i.e. build a tool that fixes problem X without causing annoying side effects Y and Z), but finding one is almost as hard as it has been since the beginning of the world wide web.
I trust Claude in Chrome a lot more, and I trust my own hands and eyes most.