Javalin was inspired by https://sparkjava.com/ which was inspired by Sinatra (which I think also inspired Express?).
Anyway, libraries like this were only really feasible after Java 8 because of the reliance on lambdas. Having to instantiate anonymous nested classes for every "function" was a total pain before that.
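For illustration, a minimal sketch of that difference - the route and handler body are made up, and the package names are from recent Javalin versions, so treat it as a sketch rather than a copy-paste example:

```java
import io.javalin.Javalin;
import io.javalin.http.Context;
import io.javalin.http.Handler;

public class LambdaVsAnonymous {
    public static void main(String[] args) {
        Javalin app = Javalin.create().start(7070);

        // Java 8+: a route handler is just a lambda.
        app.get("/hello", ctx -> ctx.result("Hello, world"));

        // Pre-lambda style: the same handler as an anonymous inner class.
        app.get("/hello-verbose", new Handler() {
            @Override
            public void handle(Context ctx) {
                ctx.result("Hello, world");
            }
        });
    }
}
```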
Kinda, now that Java has lambdas, but async still doesn't work as easily as it does in JS, which is important. This is only recently starting to change with Project Loom.
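For context, a rough sketch of what Loom changes (Java 21 virtual threads; the sleep is just a stand-in for blocking I/O):

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class LoomSketch {
    public static void main(String[] args) {
        // Java 21: each submitted task runs on its own virtual thread, so plain
        // blocking code (JDBC, HTTP clients, sleeps) no longer ties up an OS thread.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(Duration.ofMillis(100)); // stand-in for blocking I/O
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
        } // close() waits for all submitted tasks before returning
    }
}
```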
Author here, we don't use generative AI for software development. We've been building since 2018, and our number one goal has always been ensuring our software remains maintainable.
Did you use the 'litmus' test suite? I found it very useful when building Fastmail's (perl) WebDAV file server implementation.
There were also a bunch of fun things with quirks around unicode filename handling which made me sad (that was just a matter of testing against a ton of clients).
As for CalDAV and CardDAV - as others have said, JMAP Calendars/Contacts will make building clients a lot easier eventually... but yeah. My implementation of syncing as a client now is to look for sync-collection and fall back to collecting etags to know which URLs to fetch. Either way, sync-collection ALSO gives a set of URLs and then I multi-get those in batches; meaning both the primary and fallback codepath revert to the multi-get (or even individual GETs).
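To make that concrete, here's a rough sketch of that client-side flow - every type and method name below is made up for illustration, not from any real WebDAV library:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the sync strategy described above; none of these
// types or method names come from a real WebDAV client library.
public class CalendarSyncSketch {

    record SyncResult(List<String> changedHrefs, String newSyncToken) {}

    interface DavClient {
        boolean supportsSyncCollection(String collectionUrl);
        // REPORT sync-collection: changed hrefs plus a new sync token.
        SyncResult syncCollection(String collectionUrl, String syncToken);
        // Fallback: href -> etag for every item in the collection.
        Map<String, String> listEtags(String collectionUrl);
        // calendar-multiget / addressbook-multiget for a batch of hrefs.
        List<String> multiGet(String collectionUrl, List<String> hrefs);
    }

    private static final int BATCH_SIZE = 50;

    static List<String> sync(DavClient client, String url, String syncToken,
                             Map<String, String> knownEtags) {
        List<String> hrefsToFetch = new ArrayList<>();
        if (client.supportsSyncCollection(url)) {
            hrefsToFetch.addAll(client.syncCollection(url, syncToken).changedHrefs());
        } else {
            // Fallback path: diff the server's etags against what we already have.
            client.listEtags(url).forEach((href, etag) -> {
                if (!etag.equals(knownEtags.get(href))) {
                    hrefsToFetch.add(href);
                }
            });
        }
        // Both code paths converge on batched multi-get requests.
        List<String> fetched = new ArrayList<>();
        for (int i = 0; i < hrefsToFetch.size(); i += BATCH_SIZE) {
            int end = Math.min(i + BATCH_SIZE, hrefsToFetch.size());
            fetched.addAll(client.multiGet(url, hrefsToFetch.subList(i, end)));
        }
        return fetched;
    }
}
```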
I've tried that (with Sonnet 4.5 at least, not Opus) and Claude isn't good at code analysis because it's too lazy. It just grepped for a few things and then made the rest of it up.
I think the issue is mostly that it desperately tries to avoid filling its context window, and Anthropic writes system prompts that are so long it's practically already full from the start.
A good harness to read code for you and write a report on it would certainly be interesting.
Those two things aren’t mutually exclusive. It may be worthwhile to at least have Claude (or whatever LLM you favor) to look at the other libraries and compare it to yours. It doesn’t have to write the code, but it could point out areas/features you’re missing.
We know what we're missing (a lot, we didn't implement the full spec). We don't know what weird edge cases the clients/servers will have, and I would bet you decent money a LLM won't either. That's why manual testing and validation is so important to us.
I wouldn’t be so sure about the LLM not helping. The LLM doesn’t need to know about the edge cases itself. Instead, you’d be relying on other client implementations knowing about the edge cases and the LLM finding the info in those code bases. Those other implementations have probably been through similar test cycles, so using an LLM to compare those implementations to yours isn’t a bad option.
> I don't understand why Hacker News is so dismissive about the coming of LLMs
I find LLMs incredibly useful, but if you were following along the last few years, the promise was for “exponential progress”, with world-destroying superintelligence as the teaser.
We objectively are not on that path. There is no “coming of LLMs”. We might get some incremental improvement, but we’re very clearly seeing sigmoid progress.
I can’t speak for everyone, but I’m tired of hyperbolic rants that are unquestionably not justified (the nice thing about exponential progress is you don’t need to argue about it)
I'm not sure I understand: we are _objectively on that path_ -- we are increasing exponentially on a number of metrics that may be imperfect but seem to paint a pretty consistent picture. Scaling laws are exponential. METR's time horizon benchmark is exponential. Lots of performance measures are exponential, so why do you say we're objectively not on that path?
> We might get some incremental improvement, but we’re very clearly seeing sigmoid progress.
again, if it is "very clear" can you point to some concrete examples to illustrate what you mean?
> I can’t speak for everyone, but I’m tired of hyperbolic rants that are unquestionably not justified (the nice thing about exponential progress is you don’t need to argue about it)
OK but what specifically do you have an issue with here?
I can’t point at many problems it has meaningfully solved for me. I mean real problems, not tasks that I have to do for my employer. It seems like it just made parts of my existence more miserable, poisoned many of the things I love, and generally made the future feel a lot less certain.
Language model capability at generating text output.
The model progress this year has been a lot of:
- “We added multimodal”
- “We added a lot of non AI tooling” (ie agents)
- “We put more compute into inference” (ie thinking mode)
So yes, there is still rapid progress, but these ^ make it clear, at least to me, that next gen models are significantly harder to build.
Simultaneously we see a distinct narrowing between players (openai, deepseek, mistral, google, anthropic) in their offerings.
That's usually a signal that the rate of progress is slowing.
Remind me what was so great about gpt 5? How about gpt4 from gpt 3?
Do you even remember the releases? Yeah. I don't. I had to look it up.
Just another model with more or less the same capabilities.
“Mixed reception”
That is not what exponential progress looks like, by any measure.
The progress this year has been in the tooling around the models, smaller faster models with similar capabilities. Multimodal add ons that no one asked for, because its easier to add image and audio processing than improve text handling.
That may still be on a path to AGI, but it is not an exponential path to it.
> Language model capability at generating text output.
That's not a metric, that's a vague non-operationalized concept that could be operationalized into an infinite number of different metrics. And an improvement that was linear in one of those possible metrics would be exponential in another one (well, actually, one that was linear in one would also be linear in an infinite number of others, as well as being exponential in an infinite number of others).
That’s why you have to define an actual metric, not simply describe a vague concept of a kind of capacity of interest, before you can meaningfully discuss whether improvement is exponential. Because the answer is necessarily entirely dependent on the specific construction of the metric.
I don’t think the path was ever exponential, but your claim here is almost as if the slowdown hit an asymptote-like wall.
Most of the improvements are intangible. Can we truly say how much more reliable the models are? We barely have quantitative measurements on this so it’s all vibes and feels. We don’t even have a baseline metric for what AGI is and we invalidated the Turing test also based on vibes and feels.
So my argument is that part of the slowdown is itself a hallucination, because the improvement is not actually measurable or definable outside of vibes.
I kind of agree in principle, but there are a multitude of clever benchmarks that try to measure lots of different aspects like robustness, knowledge, understanding, hallucinations, tool use effectiveness, coding performance, multimodal reasoning and generation, etc. All of these have lots of limitations, but they all paint a pretty compelling picture that complements the “vibes”, which are also important.
> Language model capability at generating text output.
That's not a quantifiable sentence. Unless you put it in numbers, anyone can argue exponential/not.
> next gen models are significantly harder to build.
That's not how we judge capability progress though.
> Remind me what was so great about gpt 5? How about gpt4 from gpt 3?
> Do you even remember the releases?
At gpt 3 level we could generate some reasonable code blocks / tiny features. (An example shown around at the time was "explain what this function does" for a "fib(n)") At gpt 4, we could build features and tiny apps. At gpt 5, you can often one-shot build whole apps from a vague description. The difference between them is massive for coding capabilities. Sorry, but if you can't remember that massive change... why are you making claims about the progress in capabilities?
> Multimodal add ons that no one asked for
Not only does multimodal input training improve the model overall, it's useful for (for example) feeding back screenshots during development.
Exactly, gpt5 was unimpressive not because of its leap from GPT4 but because of expectations based on the string of releases since GPT4 (especially the reasoning models). The leap from 4->5 was actually massive.
Next gen models are always hard to build, they are by definition pushing the frontier. Every generation of CPU was hard to build but we still had Moore's law.
> Simultaneously we see a distinct narrowing between players (openai, deepseek, mistral, google, anthropic) in their offerings.
> That's usually a signal that the rate of progress is slowing.
I agree with you on the fact in the first part but not on the second part…why would convergence of performance indicate anything about the absolute performance improvements of frontier models?
> Remind me what was so great about gpt 5? How about gpt4 from gpt 3?
> Do you even remember the releases? Yeah. I don't. I had to look it up.
3 -> 4 -> 5 were extraordinary leaps…not sure how one would be able to say anything else
> Just another model with more or less the same capabilities.
5 is absolutely not a model with more or less the same capabilities as gpt 4, what could you mean by this?
> “Mixed reception”
A mixed reception is an indication of model performance against a backdrop of market expectations, not against gpt 4…
> That is not what exponential progress looks like, by any measure.
Sure it is…exponential is a constant % improvement per year. We’re absolutely in that regime by a lot of measures
> The progress this year has been in the tooling around the models, smaller faster
Effective tool use is not somehow some trivial add on it is a core capability for which we are on an exponential progress curve.
> models with similar capabilities. Multimodal add ons that no one asked for, because its easier to add image and audio processing than improve text handling.
This is definitely a personal feeling of yours; multimodal models are not something no one asked for…they are absolutely essential. Text data is essential and data curation is non-trivial and continually improving, but we are also hitting the ceiling of internet text data. Yet we use an incredible amount of synthetic data for RL, and this continues to grow… you guessed it, exponentially. And multimodal data is incredibly information rich. Adding multimodality lifts all boats and provides core capabilities necessary for open world reasoning and even better text data (e.g. understanding charts and image context for text).
> exponential is a constant % improvement per year
I suppose if you pick a low enough exponent then the exp graph is flat for a long time and you're right, zero progress is “exponential” if you cherry pick your growth rate to be low enough.
Generally though, people understand “exponential growth” as “getting better/bigger faster and faster in an obvious way”
> 3 -> 4 -> 5 were extraordinary leaps…not sure how one would be able to say anything else
They objectively were not.
The metrics and the reception to them were very clear and overwhelming.
You're spitting some meaningless revisionist BS here.
Doesn’t sound like you're really interested in any sort of rational dialogue. Metrics were “objectively” not better? What are you talking about? Of course they were; have you even looked at benchmark progression for every benchmark we have?
You don’t understand what an exponential is, or apparently what the benchmark numbers even are, or possibly even how we actually measure model performance and the very real challenges and nuances involved, yet I’m “spitting some revisionist BS”. You have cited zero sources and are calling measured numbers “revisionist”.
You are also citing reception to models as some sort of indication of their performance, which is yet another confusing part of your reasoning.
I do agree that “metrics were very clear”, it just seems you don’t happen to understand what they are or what they mean.
>following along the last few years the promise was for “exponential progress”
I've been following for many years and the main exponential thing has been the Moore's-law-like growth in compute. Compute per dollar is probably the best tracking one and has done a steady doubling every couple of years or so for decades. It's exponential but quite a leisurely exponential.
The recent hype of the last couple of years is more dot com bubble like and going ahead of trend but will quite likely drop back.
I’ve been reading this comment multiple times a week for the last couple years. Constant assertions that we’re starting to hit limits, plateau, etc. But a cursory glance at where we are today vs a year ago, let alone two years ago, makes it wildly obvious that this is bullshit. The pace of improvement of both models and tooling has been breathtaking. I could give a shit whether you think it’s “exponential”, people like you were dismissing all of this years ago, meanwhile I just keep getting more and more productive.
People keep saying stuff like this. That the improvements are so obvious and breathtaking and astronomical and then I go check out the frontier LLMs again and they're maybe a tiny bit better than they were last year but I can't actually be sure bcuz it's hard to tell.
sometimes it seems like people are just living in another timeline.
You might want to be more specific because benchmarks abound and they paint a pretty consistent picture. LMArena "vibes" paint another picture. I don't know what you are doing to "check" the frontier LLMs but whatever you're doing doesn't seem to match more careful measurement...
You don't actually have to take peoples word for it, read epoch.ai developments, look into the benchmark literature, look at ARC-AGI...
That's half the problem though. I can see benchmarks. I can see number go up on some chart or that the AI scores higher on some niche math or programming test, but those results don't seem to actually connect much to meaningful improvements in daily usage of the software when those updates hit the public.
That's where the skepticism comes in, because one side of the discussion is hyping up exponential growth and the other is seeing something that looks more logarithmic instead.
I realize anecdotes aren't as useful as numbers for this kind of analysis, but there's such a wide gap between what people are observing in practice and what the tests and metrics are showing it's hard not to wonder about those numbers.
I would really appreciate it if people could be specific when they say stuff like this because it's so crazy out of line with all measurement efforts. There are an insane amount of serious problems with current LLM / agentic paradigms, but the idea that things have gotten worse since 2023? I mean come on.
Yeah, probably. But no chart actually shows it yet. For now we are firmly in the exponential zone of the sigmoid curve and can't really tell if it's going to end in a year, a decade or a century.
It doesn't even matter if the goal is extremely high. Talking about exponentials when we can clearly see energy needs having to grow to match proves there is no way we can maintain that pace without radical (and thus unpredictable) improvements.
My own "feeling" is that it's definitely not exponential but again, doesn't matter if it's unsustainable.
We're very clearly seeing exponential progress - even above trend, on METR, whose slope keeps getting revised to a higher and higher estimate each time. Explain your perspective on the objective evidence against exponential progress?
Because that requires adoption. Devs on hackernews are already the most up to date folks in the industry and even here adoption of LLMs is incredibly slow. And a lot of the adoption that does happen is still with older tech like ChatGPT or Cursor.
Writing the code itself was never the main bottleneck. Designing the bigger solution, figuring out tradeoffs, talking to affected teams, etc. takes as much time as it used to. But still, there's definitely a significant improvement in the code production part in many areas.
I think this is an open question still and very interesting. Ilya discussed this on the Dwarkesh podcast. But the capabilities of LLMs are clearly growing exponentially and perhaps super exponentially. We went from something that could barely string together coherent text in 2022 to general models helping people like Terence Tao and Scott Aaronson write new research papers. LLMs also beat the IMO and the ICPC. We have entered the John Henry era for intellectual tasks...
Very spurious claims, given that there was no effort made to check whether the IMO or ICPC problems were in the training set or not, or to quantify how far problems in the training set were from the contest problems. IMO problems are supposed to be unique, but since it's not at the frontier of math research, there is no guarantee that the same problem, or something very similar, was not solved in some obscure manual.
LLMs from late 2024 were nearly worthless as coding agents, so given they have quadrupled in capability since then (exponential growth, btw), it's not surprising to see a modestly positive impact on SWE work.
Also, I'm noticing you're not explaining yourself :)
I think this is happening by raising the floor for job roles which are largely boilerplate work. If you are on the more skilled side or work in more original/ niche areas, AI doesn't really help too much. I've only been able to use AI effectively for scaling refactors, not really much in feature development. It often just slows me down when I try to use it. I don't see this changing any time soon.
Hey, I'm not the OG commentator, why do I have to explain myself! :)
When Fernando Alonso (best rookie btw) goes from 0-60 in 2.4 seconds in his Aston Martin, is it reasonable to assume he will near the speed of light in 20 seconds?
> Hey, I'm not the OG commentator, why do I have to explain myself! :)
The issue is that you're not acknowledging or replying to people's explanations for _why_ they see this as exponential growth. It's almost as if you skimmed through the meat of the comment and then just re-phrased your original idea.
> When Fernando Alonso (best rookie btw) goes from 0-60 in 2.4 seconds in his Aston Martin, is it reasonable to assume he will near the speed of light in 20 seconds?
This comparison doesn't make sense because we know the limits of cars but we don't yet know the limits of LLMs. It's an open question. Whether or not an F1 engine can make it the speed of light in 20 seconds is not an open question.
It's not in me to somehow disprove claims of exponential growth when there isn't even evidence provided of it.
My point with the F1 comparison is to say that a short period of rapid improvement doesn't imply exponential growth and it's about as weird to expect that as it is for an f1 car to reach the speed of light. It's possible you know, the regulations are changing for next season - if Leclerc sets a new lap record in Australia by .1 ms we can just assume exponential improvements and surely Ferrari will be lapping the rest of the field by the summer right?
There is already evidence provided of it! METR time horizons is going up on an exponential trend. This is literally the most famous AI benchmark and already mentioned in this thread.
How long did it take before the introduction of computers led to increases in average productivity? How long for the internet? Business is just slow to figure out how to use anything for its benefit, but it eventually gets there.
The best example is that even ATM machines didn't reduce bank teller jobs.
Why? Because even the bank teller is doing more than taking and depositing money.
IMO there is an ontological bias that pervades our modern society that confuses the map for the territory and has a highly distorted view of human existence through the lens of engineering.
We don't see anything in this time series, because this time series itself is meaningless nonsense that reflects exactly this special kind of ontological stupidity:
As if the sum of human interaction in an economy is some kind of machine that we just need to engineer better parts for and then sum the outputs.
Any non-careerist, thinking person that studies economics would conclude we don't and will probably not have the tools to properly study this subject in our lifetimes. The high dimensional interaction of biology, entropy and time. We have nothing. The career economist is essentially forced to sing for their supper in a type of time series theater.
Then there is the method acting of pretending to be surprised when some meaningless reductionist aspect of human interaction isn't reflected in the fake time series.
Sir, we're in a modern economy, we don't ever ever look at productivity graphs (this is not to disparage LLMs, just a comment on productivity in general)
Based on quite a few comments recently, it also looks like many have tried LLMs in the past, but haven't seriously revisited either the modern or more expensive models. And I get it. Not everyone wants to keep up to date every month, or burn cash on experiments. But at the same time, people seem to have opinions formed in 2024. (Especially if they talk about just hallucinations and broken code - tell the agent to search for docs and fix stuff) I'd really like to give them Opus 4.5 as an agent to refresh their views. There's lots to complain about, but the world has moved on significantly.
This has been the argument since day one. You just have to try the latest model, that's where you went wrong. For the record I use Claude Code quite a bit and I can't see much meaningful improvement from the last few models. It is a useful tool but its shortcomings are very obvious.
Just last week Opus 4.5 decided that the way to fix a test was to change the code so that everything else but the test broke.
When people say ”fix stuff” I always wonder if it actually means fix, or just make it look like it works (which is extremely common in software, LLM or not).
Sure, I get an occasional bad result from Opus - then I revert and try again, or ask it for a fix. Even with a couple of restarts, it's going to be faster than me on average. (And that's ignoring the situations where I have to restart myself)
Basically, you're saying it's not perfect. I don't think anyone is claiming otherwise.
The problem is it’s imperfect in very unpredictable ways. Meaning you always need to keep it on a short leash for anything serious, which puts a limit on the productivity boost. And that’s fine, but does this match the level of investment and expectations?
It’s not about being perfect, it’s about not being as great as the marketing, and many proponents, claim.
The issue is that there’s no common definition of ”fixed”. ”Make it run no matter what” is a more apt description in my experience, which works to a point but then becomes very painful.
Nope, I did get a lot of fancy markdown with emojis though so I guess that was a nice tradeoff.
In general, even with access to the entire code base (which is very small), I find the inherent need in the models to satisfy the prompter to be their biggest flaw, since it tends to constantly lead down this path. I often have to correct over-convoluted SQL too, because my problems are simple and the training data seems to favor extremely advanced operations.
The negatives outweigh the positives, if only because the positives are so small. A bunch of coders making their lives easier doesn't really matter, but pupils and students skipping education does. As a meme said: you had better start eating healthy, because your future doctor vibed his way through med school.
The education part is on point. As a CS student, I see many of my colleagues using AI tools way too much for instant homework solving, without even processing the answers much.
The idea of HN being dismissive of impactful technology is as old as HN. And indeed, the crowd often appears stuck in the past with hindsight. That said, HN discussions aren't homogeneous, and as demonstrated by Karpathy in his recent blogpost "Auto-grading decade-old Hacker News", at least some commenters have impressive foresight: https://karpathy.bearblog.dev/auto-grade-hn/
So exactly 10 years ago a lot of people believed that the game Go would not be “conquered” by AI, but after just a few months it was. People will always be skeptical of new things, even people who are in tech, because many hyped things indeed go nowhere… while it may look obvious in hindsight, it’s really hard to predict what will and what won’t be successful. On the LLM front I personally think it’s extremely foolish to still consider LLMs as going nowhere. There’s a lot more evidence today of the usefulness of LLMs than there was of DeepMind being able to beat top human players in Go 10 years ago.
It feels like there are several conversations happening that sound the same but are actually quite different.
One of them is whether or not large models are useful and/or becoming more useful over time. (To me, clearly the answer is yes)
The other is whether or not they live up to the hype. (To me, clearly the answer is no)
There are other skirmishes around capability for novelty, their role in the economy, their impact on human cognition, if/when AGI might happen and the overall impact to the largely tech-oriented community on HN.
It is an overcorrection because of all the empty promises of LLMs. I use Claude and chatgpt daily at work and am amazed at what they can do and how far they have come.
BUT when I hear my executive team talk and see demos of "Agentforce" and every saas company becoming an AI company promising the world, I have to roll my eyes.
The challenge I have with LLMs is they are great at creating first draft shiny objects and the LLMs themselves over promise. I am handed half baked work created by non technical people that now I have to clean up. And they don't realize how much work it is to take something from a 60% solution to a 100% solution because it was so easy for them to get to the 60%.
Amazing, game changing tools in the right hands but also give people false confidence.
Not that they are not also useful for non-technical people but I have had to spend a ton of time explaining to copywriters on the marketing team that they shouldn't paste their credentials into the chat even if it tells them to and their vibe coded app is a security nightmare.
This seems like the right take. The claims of the imminence of AGI are exhausting and to me appear dissonant with reality. I've tried gemini-cli and Claude Code and while they're both genuinely quite impressive, they absolutely suffer from a kind of prototype syndrome. While I could learn to use these tools effectively for large-scale projects, I still at present feel more comfortable writing such things by hand.
The NVIDIA CEO says people should stop learning to code. Now if LLMs will really end up as reliable as compilers, such that they can write code that's better and faster than I can 99% of the time, then he might be right. As things stand now, that reality seems far-fetched. To claim that they're useless because this reality has not yet been achieved would be silly, but not more silly than claiming programming is a dead art.
It’s not the technology I’m dismissive about. It’s the economics.
25 years ago I was optimistic about the internet, web sites, video streaming, online social systems. All of that. Look at what we have now. It was a fun ride until it all ended up “enshittified”. And it will happen to LLMs, too. Fool me once.
Some developer tools might survive in a useful state on subscriptions. But soon enough the whole A.I. economy will centralise into 2 or 3 major players extracting more and more revenue over time until everyone is sick of them. In fact, this process seems to be happening at a pretty high speed.
Once the users are captured, they’ll orient the ad-spend market around themselves. And then they’ll start taking advantage of the advertisers.
I really hope it doesn’t turn out this way. But it’s hard to be optimistic.
Contrary to the case for the internet, there is a way out, however - if local, open-source LLMs get good. I really hope they do, because enshittification does seem unavoidable if we depend on commercial offerings.
Well the "solution" for that will be the GPU vendors focusing solely on B2B sales because it's more profitable, therefore keeping GPUs out of the hands of average consumers. There's leaks suggesting that nVidia will gradually hike the prices of their 5090 cards from $2000 to $5000 due to RAM price increases ( https://wccftech.com/geforce-rtx-5090-prices-to-soar-to-5000... ). At that point, why even bother with the R&D for newer consumer cards when you know that barely anyone will be able to afford them?
Speaking for myself: because if the hype were to be believed we should have no relational databases when there's MongoDB, no need for dollars when there's cryptocoins, all virtual goods would be exclusively sold as NFTs, and we would be all driving self-driving cars by now.
LLMs are being driven mostly by grifters trying to achieve a monopoly before they run out of cash. Under those conditions I find their promises hard to believe. I'll wait until they either go broke or stop losing money left and right, and whatever is left is probably actually useful.
The way I've been handling the deafening hype is to focus exclusively on what the models that we have right now can do.
You'll note I don't mention AGI or future model releases in my annual roundup at all. The closest I get to that is expressing doubt that the METR chart will continue at the same rate.
If you focus exclusively on what actually works the LLM space is a whole lot more interesting and less frustrating.
> focus exclusively on what the models that we have right now can do
I'm just a casual user, but I've been doing the same and have noticed the sharp improvements of the models we have now vs a year ago. I have OpenAI Business subscription through work, I signed up for Gemini at home after Gemini 3, and I run local models on my GPU.
I just ask them various questions where I know the answer well, or I can easily verify. Rewrite some code, factual stuff etc. I compare and contrast by asking the same question to different models.
AGI? Hell no. Very useful for some things? Hell yes.
Many people feel threatened by the rapid advancements in LLMs, fearing that their skills may become obsolete, and in turn act irrationally. To navigate this change effectively, we must keep open minds, stay adaptable, and embrace continuous learning.
I'm not threatened by LLMs taking my job as much as they are taking away my sanity. Every time I tell someone no and they come back to me with a "but copilot said.." followed by something entirely incorrect, it makes me want to autodefenestrate.
Many comments discussing LLMs involve emotions, sure. :) Including, obviously, comments in favour of LLMs.
But most discussion I see is vague and without specificity and without nuance.
Recognising the shortcomings of LLMs makes comments praising LLMs that much more believable; and recognising the benefits of LLMs makes comments criticising LLMs more believable.
I'd completely believe anyone who says they've found the LLM very helpful at greenfield frontend tasks, and I'd believe someone who found the LLM unable to carry out subtle refactors on an old codebase in a language that's not Python or JavaScript.
It isn't irrational to act in self-interest. If an LLM threatens someone's livelihood, it matters not one bit that it helps humanity overall - they will oppose it. I don't blame them. But I also hope that they cannot succeed in opposing it.
It's irrational to genuinely hold false beliefs about capabilities of LLMs. But at this point I assume around half of the skeptics are emotionally motivated anyway.
> I don't understand why Hacker News is so dismissive about the coming of LLMs.
Eh. I wouldn’t be so quick to speak for the entirety of HN. Several articles related to LLMs easily hit the front page every single day, so clearly there are plenty of HN users upvoting them.
I think you're just reading too much into what is more likely classic HN cynicism and/or fatigue.
Exactly. There was a stretch of 6 months or so right after ChatGPT was released where approximately 50% of front page posts at any given time were related to LLMs. And these days every other Show HN is some kind of agentic dev tool and Anthropic/OpenAI announcements routinely get 500+ comments in a matter of hours.
When an "AI skeptic" sees a very positive AI comment, they try to argue that it is indeed interesting but nowhere near close to AI/AGI/ASI or whatever the hype at the moment uses.
When an "AI optimistic" sees a very negative AI comment, they try to list all the amazing things they have done that they were convinced was until then impossible.
The internet and smartphones were immediately useful in a million different ways for almost every person. AI is not even close to that level. Very to somewhat useful in some fields (like programming) but the average person will easily be able to go through their day without using AI.
The most wide-appeal possibility is people loving 100%-AI-slop entertainment like that AI Instagram Reels product. Maybe I'm just too disconnected with normies but I don't see this taking off. Fun as a novelty like those Ring cam vids but I would never spend all day watching AI generated media.
> The internet and smartphones were immediately useful in a million different ways for almost every person. AI is not even close to that level.
Those are some very rosy glasses you've got on there. The nascent Internet took forever to catch on. It was for weird nerds at universities and it'll never catch on, but here we are.
The early internet and smartphones (the Japanese ones, not the iPhone) were definitely not "immediately" adopted by the masses, unlike LLMs.
If "immediate" usefulness is the metric we measure, then the internet and smartphones are pretty insignificant inventions compared to LLMs.
(of course it's not a meaningful metric, as there is no clear line between a dumb phone and a smart phone, or a moderately sized language model and a LLM)
In common with AI there was probably a long period when the hardware wasn't really good enough for it to be useful to most people. I remember 300 baud modems and rubber things to try to connect to your telephone handset back in the 80s.
That's all irrelevant. Is/was there tremendous value to be had by being able to transport data? Of course. No doubt about it. Everything else got figured out and investments were made because of that.
The same line of thinking does not hold with LLMs given their non-deterministic nature. Time will tell where things land.
How many pay? And out of that how many are willing to pay the amount to at least cover the inference costs (not loss leading?)
Outside the verifiable domains I think the impact is more assistance/augmentation than outright disruption (i.e. a novelty which is still nice). A little tiny bit of value sprinkled over a very large user base but each person deriving little value overall.
Even as they use it as search, it is at best an incremental improvement on what they used to do - not life changing.
Usage plunges on the weekends and during the summer, suggesting that a significant portion of users are students using ChatGPT for free or at heavily subsidized rates to do homework (i.e., extremely basic work that is extraordinarily well-represented in the training data). That usage will almost certainly never be monetizable, and it suggests nothing about the trajectory of the technology’s capability or popularity. I suspect ChatGPT, in particular, will see its usage slip considerably as the education system (hopefully) adapts.
Interesting, thank you for that. I’d be curious to see the data for 2025. I was basing my take off Google trends data - the kind of person who goes to ChatGPT by googling “chatGPT” seems to be using it less in the summer.
“Almost everyone will use it at free or effectively subsidized prices” and “It delivers utility which justifies its variable costs + fixed costs amortized over useful lifetime” are not the same thing, and it's not clear how much of the use is tied to novelty, such that if new and progressively more expensive to train releases at a regular cadence dropped off, usage, even at subsidized prices, would, too.
The adoption is just so weird to me. I cannot for the life of me get LLM chatbot to work for me. Every time I try I get into an argument with the stupid thing. They are still wrong constantly, and when I'm wrong they won't correct me.
I have great faith in AI in e.g. medical equipment, or otherwise as something built in, working on a single problem in the background, but the chat interface is terrible.
Kagi’s Research Assistant is pretty damn useful, particularly when I can have it poll different models. I remember when the first iPhone lacked copy-paste. This feels similar.
… the internet was not immediately useful in a million different ways for almost every person.
Even if you skip ARPAnet, you’re forgetting the Gopher days, and even if you jump straight to WWW+email==the internet, you’re forgetting the Mosaic days.
The applications that became useful to the masses emerged a decade+ after the public internet and even then, it took 2+ decades to reach anything approaching saturation.
Your dismissal is not likely to age well, for similar reasons.
the "usefulness" excuse is irrelevant, and the claim that phones/internet is "immediately useful" is just a post hoc rationalization. It's basically trying to find a reasonable reason why opposition to AI is valid, and is not in self-interest.
The opposition to AI is from people who feel threatened by it, because it either threatens their livelihood (or family/friends'), and that they feel they are unable to benefit from AI in the same way as they had internet/mobile phones.
The usefulness of mobile phones was identifiable immediately and it is absolutely not 'post hoc rationalization'. The issue was the cost - once low cost mobile telephones were produced they almost immediately became ubiquitous (see nokia share price from the release of the nokia 6110 onwards for example).
This barrier does not exist for current AI technologies which are being given away free. Minor thought experiment - just how radical would the uptake of mobile phones have been if they were given away free?
It's only low cost for general usage chat users. If you are using it for anything beyond that, you are paying or sitting in a long queue (likely both).
You may just be a little early to the renaissance. What happens when the models we have today run on a mobile device?
The nokia 6110 was released 15 years after the first commercial cell phone.
Yes although even those people paying are likely still being subsidized and not currently paying the full cost.
Interesting thought about current SOTA models running on my mobile device. I've given it some thought and I don't think it would change my life in any way. Can you suggest some way that it would change yours?
It will open up access to LLMs for developers in the same way smartphones opened up access to general mobile computing.
I really think most everyone misses the actual potential of llms. They aren't an app but an interface.
They are the new UI everyone has always known they wanted, going back as long as we've had computers. People wanted to talk to the computer and get results.
Think of the people already using them instead of search engines.
To me, and likely you, it doesn't add any value. I can get the same information at about the same speed as before with the same false positives to weed through.
To the person that couldn't use a search engine and filled the internet with easily answered questions before, it's a godsend. They can finally ask the internet in plain ole whatever language they use and get an answer. It can be hard to see, but this is the majority of people on this planet.
LLMs raise the floor of information access. When they become ubiquitous and basically free, people will forget they ever had to use a mouse or hunt for the right pixel to click a button on a tiny mobile device touch screen.
I think that's a nice reply and these products becoming the future of user computer interface is possible.
I can imagine them generating digital reality on the fly for users - no more dedicated applications, just pure creation on demand ('direct me via turn by turn 3d navigation to x then y and z', 'replay that goal that just was scored and overlay the 3 most recent similar goals scored like that in the bottom right corner of the screen', 'generate me a 3D adventure game to play in the style of zelda, but make it about gnomes').
I suspect the only limitation for a product like this is energy and compute.
Eh, quite the contrary. A lot of anti AI people genuinely wanted to use AI but run into the factual reality of the limitations of the software. It's not that it's going to take my job, it's that I was told it would redefine how I do work and is exponentially improving only to find out that it just kind of sucks and hasn't gotten much better this year.
> Very to somewhat useful in some fields (like programming) but the average person will easily be able to go through their day without using AI.
I know a lot of "normal" people who have completely replaced their search engine with AI. It's increasingly a staple for people.
Smartphones were absolutely NOT immediately useful in a million different ways for almost every person, that's total revisionist history. I remember when the iPhone came out, it was AT&T only, it did almost nothing useful. Smartphones were a novelty for quite a while.
I agree with most points but as a tech enthusiast, I was using a smart phone years before the iPhone, and I could already use the internet, make video calls, email etc around 2005. It was a small flip phone but it was not uncommon for phones to do that already at that time, at least in Australia and parts of Asia (a Singaporean friend told me about the phone).
A year after the iPhone came out… it didn’t have an App Store, barely was able to play video, barely had enough power to last a day. You just don’t remember or were not around for it.
A year after llms came out… are you kidding me?
Two years?
10 years?
Today, adding an MCP server to wrap the same API that’s been around forever for some system makes the users of that system prefer NLI over the GUI almost immediately.
LLMs hold some real utility. But that real utility is buried under a mountain of fake hype and over-promises to keep shareholder value high.
LLMs have real limitations that aren't going away any time soon - not until we move to a new technology fundamentally different and separate from them - sharing almost nothing in common. There's a lot of 'progress-washing' going on where people claim that these shortfalls will magically disappear if we throw enough data and compute at it when they clearly will not.
I think the missing ingredient is not something the LLMs lack, but something we as developers don't do - we need to constrain, channel, and guide agents by creating reactive test environments around them. Not vibes, but hard tests - they are the missing ingredient for coding agents. You can even use AI to write most of these tests, but the end result depends on how well you structured your code to be testable.
If you inherit 9000 tests from an existing project you can vibe code a replacement on your phone in a holiday, like Simon Willison's JustHTML port. We are moving from agents semi-randomly flailing around to constraint satisfaction.
I find opus 4.5 and gpt 5.2 mind blowing more often than I find them dumb as rocks. I don’t listen to or read any marketing material, I just use the tools. I couldn’t care less about what the promises are, what I have now available to me is fundamentally different from what I had in August and it changed completely how I work.
Markets never deliver. That isn't new. I do think LLMs are not far off from Google in terms of impact.
Search, as of today, is inferior to frontier models as a product. However, even the best case still misses expected returns by miles, which is where the grousing comes from.
Generative art/AI is still up in the air for staying power, but I'd predict it isn't going away.
I think the split between vibe coding and AI-assisted coding will only widen over time. If you ask LLMs to do something complex, they will fail and you waste your time. If you work with them as a peer, and you delegate tasks to them, they will succeed and you save your time.
For most books, but technical non-fiction in particular, the payout isn't nearly worth enough for the effort.
And by "most" there I mean "all". Yes, there are exceptions, but those exceptions prove the rule.
I've written 2 technical books, for incredibly niche audiences, where the total number of potential buyers is numbered in the low thousands.
I self-published as a PDF and charge $200 a copy, of which I keep $200. It's -marginally- worth it. But the hourly rate is much lower than my day job.
The marketing benefit (as it affects my actual business in the same field) is likely real, but hard to measure. Still, having "written the book" opens doors, and brings credibility.
Disagree, a blog that gets tens of thousands of unique visitors could clear huge numbers on KDP. Maybe your niche is too narrow (probably, given your TAM is in the thousands) but this post is about "timeless programming projects" and is going to be extremely broad. The number of hits to the blog is itself an indicator of a very big and very eager potential market.
Because no middle managers will get promoted for doing this. All large corporate structures are the same: What's the incentive for the mini warlords to expand their mini empire? Nothing else is worth doing (to them).
Mocks make it easy to record and assert on method invocations. Additionally, spies (instance mocks) are really useful when you need to forward to the real method or rely on some state.
At the moment I can't see anything Mockito gives that you technically couldn't implement yourself via subclassing and overriding, but it'd be a lot of boilerplate to proxy things and record the arguments.
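For what it's worth, a small sketch of both features using the standard Mockito API (plain JDK types, nothing project-specific):

```java
import static org.mockito.Mockito.doReturn;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.spy;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

import java.util.ArrayList;
import java.util.List;

public class MockAndSpyExample {
    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        // A mock records every interaction so you can assert on it afterwards.
        List<String> mockedList = mock(List.class);
        when(mockedList.size()).thenReturn(2);

        mockedList.add("hello");

        verify(mockedList).add("hello");        // passes: the invocation was recorded
        System.out.println(mockedList.size());  // 2, from the stub

        // A spy wraps a real instance: unstubbed calls forward to the real object.
        List<String> spiedList = spy(new ArrayList<>());
        spiedList.add("real value");                  // real add()
        doReturn(100).when(spiedList).size();         // but size() is overridden

        System.out.println(spiedList.get(0));         // "real value" from the real list
        System.out.println(spiedList.size());         // 100 from the stub
    }
}
```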
Subclassing and overriding is not a good idea. There is no compilation failure if you forget to override a function, which can lead to flaky tests at best and prod data impact at worst.
Credentials should only be provided at the application root, which is going to be a different root for a test harness.
Mockito shouldn't change whether or not this is possible; the code shouldn't have the prod creds (or any external resource references) hard coded in the compiled bytecode.
I totally agree, I’m being tongue in cheek, but given how poor some codebases can be, the more precautions the better ie compilation failures on non-mocked functions.
Mockito allows one to write mocks in tests for code that doesn't use dependency injection and isn't properly testable in any other way.
On the one hand, you should just design things to be testable from the start. On the other... I'm already working in this codebase with 20 years of legacy untestable design...
Google API libraries mark every class as "final" so it's not trivial to mock-extend it for tests. But third-party IO is exactly the thing you'd want to mock.
Probably because they zealously followed "Effective Java" book.
Once you start writing adapters you need a way to instantiate them to choose between implementations, and factories are often used for this. Then you might generalize the test suites to make the setup easier and you end up with the infamous FactoryFactory pattern.
No, some other library classes accept only their own, not my adapter.
Not to mention, of course, the needless copy-pasting of dozens of members in the adapter. And it must be in prod code, not tests, even though its documentation would say "Adapter for X, exists only for tests, to be able to mock X".
That's a lot of upfront work and maintenance, not to mention the friction of needing to mentally translate every occurrence of OurFooAdapter to Foo in order to find documentation.
Mockito uses a declarative matching style of specifying what should be mocked. You don't need to implement or even stub all of an interface's methods since Mockito can do it itself. It may be extremely concise. For example, interfaces may have tens of methods or even more, but only one method is needed (say, java.sql.ResultSet). And finally, probably the most important thing: interactions with mocks are recorded and can then be verified - whether certain methods were invoked with certain arguments.
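For example, a minimal sketch of that conciseness with java.sql.ResultSet - only the two methods the code under test touches get stubbed, and the interaction is verified afterwards:

```java
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.times;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class ResultSetMockExample {
    // Code under test: touches only next() and getString().
    static List<String> readNames(ResultSet rs) throws SQLException {
        List<String> names = new ArrayList<>();
        while (rs.next()) {
            names.add(rs.getString("name"));
        }
        return names;
    }

    public static void main(String[] args) throws SQLException {
        ResultSet rs = mock(ResultSet.class);            // dozens of methods, none hand-written
        when(rs.next()).thenReturn(true, true, false);   // two rows, then end of results
        when(rs.getString("name")).thenReturn("Ada", "Grace");

        System.out.println(readNames(rs));               // [Ada, Grace]
        verify(rs, times(2)).getString("name");          // the interactions were recorded
    }
}
```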
That’s the seductive power of mocking - you get a test up and running quickly. The benefit to the initial test writer is significant.
The cost is the pain - sometimes nightmarish - for other contributors to the code base since tests depending on mocking are far more brittle.
Someone changes code to check if the ResultSet is empty before further processing and a large number of your mock based tests break as the original test author will only have mocked enough of the class to support the current implementation.
Working on a 10+ year old code base, making a small simple safe change and then seeing a bunch of unit tests fail, my reaction is always “please let the failing tests not rely on mocks”.
> Someone changes code to check if the ResultSet is empty before further processing and a large number of your mock based tests break as the original test author will only have mocked enough of the class to support the current implementation.
So this change doesn't allow an empty result set, something that is no longer allowed by the new implementation but was allowed previously. Isn't that the sort of breaking change you want your regression tests to catch?
I used ResultSet because the comment above mentioned it. A clearer example of what I’m talking about might be say you replace “x.size() > 0” with “!x.isEmpty()” when x is a mocked instance of class X.
If tests (authored by someone else) break, I now have to figure out whether the breakage is due to the fact that not enough behavior was mocked or whether I have inadvertently broken something. Maybe it’s actually important that code avoid using “isEmpty”? Or do I just mock the isEmpty call and hope for the best? What if the existing mocked behavior for size() is non-trivial?
Typically you’re not dealing with something as obvious.
What is the alternative? If you write a complete implementation of an interface for test purposes, can you actually be certain that your version of x.isEmpty() behaves as the actual method? If it has not been used before, can you trust that a green test is valid without manually checking it?
When I use mocking, I try to always use real objects as return values. So if I mock a repository method, like userRepository.search(...) I would return an actual list and not a mocked object. This has worked well for me. If I actually need to test the db query itself, I use a real db
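A small sketch of that style, with a made-up UserRepository/User pair standing in for the real repository:

```java
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import java.util.List;

public class RealValuesFromMocks {
    // Hypothetical types for the example.
    record User(String name) {}

    interface UserRepository {
        List<User> search(String query);
    }

    public static void main(String[] args) {
        UserRepository repo = mock(UserRepository.class);

        // The repository call is mocked, but the return value is a real list of
        // real objects, so the code under test exercises genuine behaviour.
        when(repo.search("smith")).thenReturn(List.of(new User("Ann Smith")));

        List<User> result = repo.search("smith");
        System.out.println(result.get(0).name());   // Ann Smith
    }
}
```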
For example, one alternative is to let my IDE implement the interface (I don’t have to “write” a complete implementation), where the default implementations throw “not yet implemented” type exceptions - which clearly indicate that the omitted behavior is not a deliberate part of the test.
Any “mocked” behavior involves writing normal debuggable idiomatic Java code - no need to learn or use a weird DSL to express the behavior of a method body. And it’s far easier to diagnose what’s going on or expected while running the test - instead of the backwards mock approach where failures are typically reported in a non-local manner (test completes and you get unexpected invocation or missing invocation error - where or what should have made the invocation?).
My test implementation can evolve naturally - it’s all normal debuggable idiomatic Java.
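A sketch of that approach, assuming a made-up Inventory dependency - deliberately unimplemented methods fail loudly instead of silently returning defaults:

```java
import java.util.HashMap;
import java.util.Map;

public class FakeInsteadOfMock {
    // Hypothetical dependency of the code under test.
    interface Inventory {
        int stockOf(String sku);
        void reserve(String sku, int quantity);
        void restock(String sku, int quantity);
    }

    // IDE-generated skeleton: anything not deliberately implemented throws,
    // so omitted behaviour is clearly not part of the test.
    static class FakeInventory implements Inventory {
        private final Map<String, Integer> stock = new HashMap<>();

        @Override
        public int stockOf(String sku) {
            return stock.getOrDefault(sku, 0);
        }

        @Override
        public void reserve(String sku, int quantity) {
            stock.merge(sku, -quantity, Integer::sum);
        }

        @Override
        public void restock(String sku, int quantity) {
            throw new UnsupportedOperationException("not yet implemented in this fake");
        }
    }

    public static void main(String[] args) {
        Inventory inventory = new FakeInventory();        // plain, debuggable Java
        inventory.reserve("sku-1", 2);
        System.out.println(inventory.stockOf("sku-1"));   // -2: only the behaviour the test needs
        // inventory.restock("sku-1", 5) would throw, flagging the omitted behaviour
    }
}
```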
It doesn't have to be a breaking change -- an empty result set could still be allowed. It could simply be a perf improvement that avoids calling an expensive function with an empty result set, when it is known that the function is a no-op in this case.
If it's not a breaking change, why would a unit test fail as a result, whether or not using mocks/fakes for the code not under test? Unit tests should test the contract of a unit of code. Testing implementation details is better handled with assertions, right?
If the code being mocked changes its invariants the code under test that depends on that needs to be carefully re-examined. A failing unit test will alert one to that situation.
(I'm not being snarky, I don't understand your point and I want to.)
The problem occurs when the mock is incomplete. Suppose:
1. Initially codeUnderTest() calls a dependency's dep.getFoos() method, which returns a list of Foos. This method is expensive, even if there are no Foos to return.
2. Calling the real dep.getFoos() is awkward, so we mock it for tests.
3. Someone changes codeUnderTest() to first call dep.getNumberOfFoos(), which is always quick, and subsequently call dep.getFoos() only if the first method's return value is nonzero. This speeds up the common case in which there are no Foos to process.
4. The test breaks because dep.getNumberOfFoos() has not been mocked.
You could argue that the original test creator should have defensively also mocked dep.getNumberOfFoos() -- but this quickly becomes an argument that the complete functionality of dep should be mocked.
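A compact sketch of that failure mode, reusing the hypothetical Dep/getFoos names from above:

```java
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import java.util.List;

public class IncompleteMockSketch {
    // Hypothetical dependency from the example above.
    interface Dep {
        List<String> getFoos();       // expensive, even when there are no Foos
        int getNumberOfFoos();        // cheap, added later as a fast pre-check
    }

    // Step 3 of the scenario: codeUnderTest() now short-circuits on the cheap call.
    static int codeUnderTest(Dep dep) {
        if (dep.getNumberOfFoos() == 0) {
            return 0;
        }
        return dep.getFoos().size();
    }

    public static void main(String[] args) {
        Dep dep = mock(Dep.class);
        // The original test only mocked getFoos(). The unstubbed getNumberOfFoos()
        // returns Mockito's default (0), so the stub below is never reached and the
        // "two foos" expectation fails; under strict stubbing the now-unused stub
        // is itself reported as an error.
        when(dep.getFoos()).thenReturn(List.of("a", "b"));

        int result = codeUnderTest(dep);
        System.out.println(result == 2 ? "test passes" : "test breaks: got " + result);
    }
}
```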
Jumping ahead to the comments below: obviously, I mentioned `java.sql.ResultSet` only as an example of an extremely massive interface. But if someone from outside the Java world starts building theories based on what was left unsaid in the example, one could, for instance, assume that such brittle tests are simply poorly written, or that they fail to mitigate Mockito's default behavior.
In my view, one of the biggest mistakes when working with Mockito is relying on answers that return default values even when a method call has not been explicitly described, treating this as some kind of "default implementation". Instead, I prefer to explicitly forbid such behavior by throwing an `AssertionError` from the default answer. Then, if we really take "one method" literally, I explicitly state that `next()` must return `false`, clearly declaring my intent that I have implemented tests based on exactly this described behavior, which in practice most often boils down to a fluent-style list of explicitly expected interactions. Recording interactions is also critically important.
How many methods does `ResultSet` have today? 150? 200? As a Mockito user, I don't care.
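A sketch of that setup - the default answer rejects anything not explicitly described, and doReturn-style stubbing is used so the stubbing call itself doesn't trip the default answer:

```java
import static org.mockito.Mockito.doReturn;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;

import java.sql.ResultSet;
import java.sql.SQLException;

public class StrictDefaultAnswer {
    public static void main(String[] args) throws SQLException {
        // Any call that wasn't explicitly described fails the test immediately.
        ResultSet rs = mock(ResultSet.class, invocation -> {
            throw new AssertionError("unexpected interaction: " + invocation);
        });

        // Explicitly describe the one allowed interaction.
        doReturn(false).when(rs).next();

        // Code under test would use rs here; only next() is permitted.
        System.out.println(rs.next());   // false
        verify(rs).next();               // and the interaction was recorded

        // rs.getString("name") would throw AssertionError("unexpected interaction: ...")
    }
}
```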
As someone who has been out of Java for close to 10 years now, you certainly could do without Mockito, but you'd be writing a lot of boilerplate code repetitively. There's also the case of third-party libraries that you don't control, and Mockito has decent facilities for working with those, especially when you're working with a codebase that isn't pure DI and interfaces.
The point is to let you create mocks without having to go through the whole polymorphism rigmarole, without forcing classes to define a separate interface or anything like that.
because even supposing you have an interface for your thing under test (which you don't necessarily have, nor necessarily want to have to create), it lets you skip over having to do any fake implementations, have loads of variations of said fake implementations, have that code live somewhere, etc etc.
Instead your mocks are all just inline in the test code: ephemeral, basically declarative therefore readily readable & grokable without too much diversion, and easily changed.
A really good usecase for Java's 'Reflection' feature.
An anonymous inner class is also ephemeral, declarative, inline, capable of extending as well as implementing, and readily readable. What it isn't is terse.
Mocking's killer feature is the ability to partially implement/extend by having some default that makes some sense in a testing situation and is easily instantiable without calling a super constructor.
MagicMock in Python is the single best mocking library though; too many times have I really wanted Mockito to also default to returning a mock instead of null.
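For what it's worth, Mockito can approximate that with a non-default answer; a minimal sketch with made-up Config/Timeouts types:

```java
import static org.mockito.Mockito.RETURNS_DEEP_STUBS;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

public class DeepStubsSketch {
    // Hypothetical nested types for the example.
    interface Config { Timeouts timeouts(); }
    interface Timeouts { int connectMillis(); }

    public static void main(String[] args) {
        // Intermediate calls return mocks instead of null, so chains can be stubbed directly.
        Config config = mock(Config.class, RETURNS_DEEP_STUBS);
        when(config.timeouts().connectMillis()).thenReturn(500);

        System.out.println(config.timeouts().connectMillis()); // 500, no NullPointerException
    }
}
```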
Yeah, it's funny, I'm often arguing in the corner of being verbose in the name of plain-ness and greater simplicity.
I realise it's subjective, but this is one of the rare cases where I think the opposite is true, and using the 'magic' thing that shortcuts language primitives in a sort-of DSL is actually the better choice.
It's dumb, it's one or two lines, it says what it does, there's almost zero diversion. Sure you can do it by other means but I think the (what I will claim is) 'truly' inline style code of Mockito is actually a material value add in readability & grokability if you're just trying to debug a failing test you haven't seen in ages, which is basically the usecase I have in mind whenever writing test code.
I cannot put my finger on it exactly either. I also often find the mocking DSL the better choice in tests.
But when there are many tests where I instantiate a test fixture and return it from a mock when the method is called, I start to think that an in memory stub would have been less code duplication and boilerplate... When some code is refactored to use findByName instead of findById and a ton of tests fail because the mock knows too much implementation detail then I know it should have been an in memory stub implementation all along.
Before Mockito, it was common (where I worked) to create an interface just to support testing. This is an anti-pattern in my opinion. To create interfaces just for testing complicates the code and it is one of my pet peeves. It also encourages the factory pattern.
It’s definitely a bit annoying and verbose in Java but I think creating an interface to support testing is a net positive. That interface is the specification of what that concrete class requires its dependencies to do.
I think all the dependencies of a class should define behaviour not implementation so it’s not tightly coupled and can be modified in the future. If you have a class that injects LookUpService, why not put an interface LookUpper in front of it? It’s a layer of indirection but we have IDEs now and reading the interface should be easier or at least provide context.
There are many cases where you don't control the library code your code depends on that you want to test. Also, the FactoryFactoryFactory patterns can be quite cumbersome and simply mocking out something makes for a far simpler test. There are likely more common cases.