Sometimes the AI is all too good at writing tests.
I agree with the idea, and I do it too, but you need to make sure the tests don't just validate the incorrect behavior, and that the code isn't updated to pass the test in a way that actually "misses the point".
I've had this happen to me on one or two tests every time.
Even more important, those tests need to be useful. Often unit tests simply test that the code works as written, which generally does more harm than good.
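To make that concrete, here's a hypothetical Python sketch (the function and test names are made up, not from any real codebase): a test that merely restates the implementation passes even when the logic is wrong, while a test pinned to the intended behavior catches the bug.

    # apply_discount is supposed to take 10% off (hypothetical example).
    def apply_discount(price):
        return price * 1.1  # bug: adds 10% instead of subtracting it

    # Tautological test: mirrors the implementation, so it passes despite the bug.
    def test_discount_mirrors_code():
        assert apply_discount(100) == 100 * 1.1

    # Useful test: pins the intended behavior, so it fails and exposes the bug.
    def test_discount_takes_ten_percent_off():
        assert apply_discount(100) == 90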
To give some further advice to juniors: if somebody is telling you writing unit tests is boring, they haven't learned how to write good tests. There appears to be a large intersection between devs who think testing is a dull task and devs who see a self-proclaimed speed-up from AI. I don't think this is a coincidence.
Writing useful tests is just as important as writing app code, and should be reviewed with equal scrutiny.
For some reason Gemini seems to be worse at it than Claude lately. Since mostly moving to 3, I've had it go back and change the tests rather than fixing the bug on what seems to be a regular basis. It's like it's gotten smart enough to "cheat" more. You really do still have to pay attention that the tests are valid.
Yep. It's incredibly annoying that these AI companies are obviously turning the "IQ knob" on their models up and down without warning or recourse. First OpenAI, then Anthropic, and now Google. I'm guessing it's a cost optimization. OpenAI even said that part out loud.
Of course, for customers it is just one more reason you need to be looking at every AI output. Just because they did something perfectly yesterday doesn't mean they won't totally screw up the exact same thing today. Or you could say it's one more advantage of local models: you control the knobs.
Don't worry, the 99% reduction in battery materials is just a strategic pivot to an 'asset-light' approach. The 4680 supply chain isn't collapsing, it’s just being 'optimized' for a future where cars apparently don't need batteries—just FSD subscriptions and robotaxis that run on optimism.
Perhaps "our PR team is a prompt" is what they mean to convey? Or "let's make this obviously AI so more people comment pointing that out" is their social media strategy?
Since LLMs emulate human writing, what is it about that sentence that gives away that it was written by an LLM rather than a human? Haven't we seen plenty of hollow-sounding, self-aggrandizing marketing copy like this pre-LLMs? What is it that is wrong with this sentence?
It sounds like meaningless corporate drivel. Everyone is dogging on it because it's no different than when startups of yore would say "making the world a better place." As if the meaningless platitude were some incantation you had to whisper or the funding wouldn't close.
Would that be odd? AI companies are still staffed by people, and large announcements like acquihires certainly feel like they could use a slightly more human touch if they truly mean a lot to the company.
Eh, if anyone is all in on AI and it replacing human writing, it would be an AI company.
But then that means if you're a PR or communications person working at this startup (or at Meta?), your job is not secure and your days there are probably numbered, which I'm sure is great for morale...
To anyone who isn't deep in the AI hype space it reads like satire to include such an obvious AI tell, but I think it's a positive in the eyes of the AI hype world. It's like how anyone not a lizard is repulsed by LinkedIn speak, and yet it dominates the platform.
I saw this in a past hype cycle. What happens is that it becomes a "performative" art in an echo chamber of startups, startup founders, and VCs. Performative meaning doing things one thinks others want to see, rather than because they make sense.
Management is quizzing their tech teams on injecting agents into their workflows, whatever the f that means. Some of these big companies will acquire startups in the space so they are not left behind on the hype train, so they can claim to have agentic talent on their teams.
Those of us who have seen this movie play out know the ending.
To be honest, I'm a bit annoyed that I installed maybe 2-3 extensions, and in the last year or so whenever I open one of their IDEs I need to update anywhere from 10 to 25 extensions. What are these things? Where did they come from, and why do I have them? I used to see only the extensions that I actually installed, and now there's all kinds of stuff that I thought was basic functionality.
A lot of core functionality is implemented as bundled plugins (they ship with the IDE, but can receive updates separately). They can also be independently disabled (and older versions used to come with only some enabled and ask you which others you want enabled at first launch).
Or maybe it’s just strange classification. I see a lot of prompts on the internet looking like “act as a senior xxx expert with over 15 years of industry experience and answer the following: [insert simple question]”
I hope those are not classified as "roleplaying": the "roleplay" here is just a trick to get a better answer from the model, often in a professional setting that has nothing to do with creative writing or NSFW stuff.
I can't be sure, but this sounds entirely possible to me.
There are many, many people, and websites, dedicated to roleplaying, and those people will often have conversations lasting thousands of messages with different characters. I know people whose personal 'roleplay AI' budget is $1,000/month, as they want the best-quality AIs.
Would be good to look into those particular statistics, then. Seems like the category could include all sorts of stuff:
> This indicates that users turn to open models primarily for creative interactive dialogues (such as storytelling, character roleplay, and gaming scenarios) and for coding-related tasks. The dominance of roleplay (hovering at more than 50% of all OSS tokens) underscores a use case where open models have an edge: they can be utilized for creativity and are often less constrained by content filters, making them attractive for fantasy or entertainment applications. Roleplay tasks require flexible responses, context retention, and emotional nuance - attributes that open models can deliver effectively without being heavily restricted by commercial safety or moderation layers. This makes them particularly appealing for communities experimenting with character-driven experiences, fan fiction, interactive games, and simulation environments.
I could imagine something like D&D or other narrative adventures on demand, with a machine that never tires of exploring subplots or rewriting sections to be a bit different, being a pretty cool thing to have. Either that, or writing fiction: hopefully not entire slop books that are sold, but something to draw inspiration from in a back-and-forth.
As for the NSFW stuff, a while back people were clowning on OpenAI for suggesting that they'd provide adult writing content to adults, but it's a bunch of money that's otherwise left on the table. Note: I'm all for personal freedom, though one also has to wonder about the longer-term impact of those "AI girlfriend/boyfriend" trends; you sometimes see people making videos about those subreddits. Oh well, not my place to judge.
Edit: oh hey, there is more data there after all
> Among the highest-volume categories, roleplay stands out for its consistency and specialization. Nearly 60% of roleplay tokens fall under Games/Roleplaying Games, suggesting that users treat LLMs less as casual chatbots and more as structured roleplaying or character engines. This is further reinforced by the presence of Writers Resources (15.6%) and Adult content (15.4%), pointing to a blend of interactive fiction, scenario generation, and personal fantasy. Contrary to assumptions that roleplay is mostly informal dialogue, the data show a well-defined and replicable genre-based use case.
The way I think about it, the training data (i.e. the internet) has X% of people asking something like "explain it to me like I'm five years old" and Y% of people framing it like "I'm technical, explain this to me in detail". You use "act as a senior XXX" when you want to bias the output towards something more detailed.
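As a minimal sketch of that biasing trick (assuming the openai Python client; the model name, question, and persona wording are all illustrative choices, not recommendations):

    # Minimal sketch, assuming the openai Python package is installed and
    # OPENAI_API_KEY is set; model and personas are illustrative assumptions.
    from openai import OpenAI

    client = OpenAI()
    question = "How does TCP congestion control work?"

    for persona in ("Explain things like I'm five years old.",
                    "Act as a senior network engineer with 15 years of experience."):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "system", "content": persona},
                      {"role": "user", "content": question}],
        )
        # Same question; the persona shifts the depth and register of the answer.
        print(persona, "->", resp.choices[0].message.content[:200])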
> I guess that also selects for people who would use openrouter
It definitely does. OpenRouter is pretty popular among roleplayers and creative writers due to having a wide variety of models available, sometimes providing free access to quality models such as DeepSeek, and lacking any sort of rules against generating "adult" content.
Openrouter has an apps tab. If you look at the free, non-coding models, some of the apps that feature are janitor.ai, sillytavern, and chub.ai. I'd never heard of them, but people seem to be burning millions of tokens enjoying them.
If you rely on AI to write most of your code (instead of using it like Stack Overflow), Claude Code/OpenAI Codex subscriptions are cheaper than buying tokens. So those users are not on openrouter.
Both Claude Code and Codex steer you towards the monthly subscription. Last time I tried Codex, I remember several aspects of it being straight up broken if used with an API key instead of a subscription account.
The business model is likely built upon the assumption that most people aren't going to max out their limits every day, because if they were, it likely wouldn't be profitable.
I'm not surprised. Roleplay means endless sessions with huge context (character history, world, previous dialogues). On commercial APIs (OpenAI/Anthropic), that long context costs a fortune. On OpenRouter, many OSS models, especially via providers like DeepInfra or Fireworks, cost pennies or are even free, like some free-tier models. The RP community is very price-sensitive, so they massively migrate to cheap OSS models via aggregators. It skews the stats but highlights a real niche for cheap inference.
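A rough back-of-envelope makes the gap obvious (token counts and per-token prices are assumptions for illustration, not anyone's actual rate card):

    # Cost of one long roleplay session where the full context is resent
    # each turn. All numbers below are illustrative assumptions.
    turns = 200
    context_tokens = 40_000        # character sheets, world lore, prior dialogue

    commercial_per_mtok = 3.00     # assumed $/M input tokens on a frontier API
    oss_per_mtok = 0.30            # assumed $/M input tokens via a cheap OSS provider

    input_tokens = turns * context_tokens  # 8M tokens for a single session
    print(f"commercial: ${input_tokens * commercial_per_mtok / 1e6:.2f}")  # $24.00
    print(f"OSS:        ${input_tokens * oss_per_mtok / 1e6:.2f}")         # $2.40

Multiply that by a few sessions a week and the commercial bill climbs fast, which is exactly the price sensitivity that pushes RP users to aggregators.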
That also stuck out to me; I was wondering if it was video games using openrouter for uptime / inference switching. Video games would use a lot of tokens generating dialogue for a few programmers' villages.
I'm not surprised at all. The HN crowd thinks LLMs are mostly used for engineering because they live in a multi-layer bubble. Real people in the real world do all kinds of shit with LLMs that isn't productivity- or even work-related.