A key difference is that the cost of providing a cab ride has largely stayed the same. Gas to get you from point A to point B is ~$5, and there's a floor on what you can pay the driver. If your ride costs $8 today, you know that's unsustainable; it'll eventually climb to $10 or $12.
But inference costs are dropping dramatically over time, and that trend shows no signs of slowing. So even if a task costs $8 today thanks to VC subsidies, I can be reasonably confident that the same task will cost $8 or less without subsidies in the not-too-distant future.
Of course, by then we'll have much more capable models. So if you want SOTA, you might see the jump to $10-12. But that's a different value proposition entirely: you're getting significantly more for your money, not just paying more for the same thing.
>But inference costs are dropping dramatically over time,
Please prove this statement; so far there is no indication that it's actually true - the opposite seems to be the case. Here are some actual numbers [0] (and whether you like Ed or not, his sources have so far always been extremely reliable).
There is a reason the AI companies don't ever talk about their inference costs. They boast about everything they can find, but not about inference.
I believe OP's point is that for a given model quality, inference cost decreases dramatically over time. The article you linked talks about effective total inference costs which seem to be increasing.
Those are not contradictory: a company's inference costs can increase due to deploying more models (Sora), deploying larger models, doing more reasoning, and an increase in demand.
However, if we look purely at how much it costs to run inference on a fixed amount of requests for a fixed model quality, I am quite convinced that the inference costs are decreasing dramatically. Here's a model from late 2025 (see Model performance section) [1] with benchmarks comparing a 72B parameter model (Qwen2.5) from early 2025 to the late 2025 8B Qwen3 model.
The 9x smaller model outperforms the larger one from earlier the same year on 27 of the 40 benchmarks they were evaluated on, which is just astounding.
Anecdotally, I find you can tell if someone worked at a big AI provider or a small AI startup by proposing an AI project like this:
" First we'll train a custom trillion parameter LLM for HTML generation. Then we'll use it to render our homepage to our 10 million daily visitors. "
The startup people will be like "this is a bad idea because you don't have enough GPUs for training that LLM" and the AI lab folks will be like "How do you intend to scale inference if you're not Google?"
> But inference costs are dropping dramatically over time, and that trend shows no signs of slowing. So even if a task costs $8 today thanks to VC subsidies, I can be reasonably confident that the same task will cost $8 or less without subsidies in the not-too-distant future.
I'd like to see this statement plotted against current trends in hardware prices at iso-performance. RAM, for example, is not meaningfully better than it was 2 years ago, and yet is 3x the price.
I fail to see how costs can drop while valuations for all major hardware vendors continue to go up. I don't think the markets would price companies in this way if they thought all major hardware vendors were going to see their margins shrink to commodity levels, as you've implied.
"The energy consumed per text prompt for Gemini Apps has been reduced by 33x over the past 12 months."
My thinking is that if Google can give away LLM usage (which is obviously subsidized) it can't be astronomically expensive, in the realm of what we are paying for ChatGPT. Google has their own TPUs and company culture oriented towards optimizing the energy usage/hardware costs.
I tend to agree with the grandparent on this: LLMs will get cheaper at the level of intelligence we have now, and will get more expensive for SOTA models.
Google is a special case - ever since LLMs came out I've been pointing out that Google owns the entire vertical.
OpenAI, Anthropic, etc are in a race to the bottom, but because they don't own the vertical they are beholden to Nvidia (for chips), they obviously have less training data, they need a constant influx of cash just to stay in that race to the bottom, etc.
Google owns the entire stack - they don't need nvidia, they already have the data, they own the very important user-info via tracking, they have millions, if not billions, of emails on which to train, etc.
Google needs no one, not even VCs. Their costs must be a fraction of the costs of pure-LLM companies.
> OpenAI, Anthropic, etc are in a race to the bottom
There's a bit of nuance hiding in the "etc". Openai and anthropic are still in a race for the top results. Minimax and GLM are in the race to the bottom while chasing good results - M2.1 is 10x cheaper than Sonnet for example, but practically fairly close in capabilities.
> There's a bit of nuance hiding in the "etc". Openai and anthropic are still in a race for the top results.
That's not what is usually meant by "race to the bottom", is it?
To clarify, in this context I mean that they are all in a race to be the lowest margin provider.
They're at the bottom of the value chain - they sell tokens.
It's like being an electricity provider: if you buy $100 of electricity and produce 100 widgets, which you sell for $1k each, that margin isn't captured by the provider.
That's what being at the bottom of the value chain means.
I get what it means, but it doesn't look to me like they're trying that yet. They don't even care that people buy multiple highest level plans to rotate them every week, because they don't provide a high enough tier for the existing customers. I don't see any price war happening. We don't know what their real margins are, but I don't see the race there. What signs do you see that Anthropic and Openai are in the race to the bottom?
> I don't see any price war happening. What signs do you see that Anthropic and Openai are in the race to the bottom?
There don't need to be signs of a race (or a price war), only signs of commodification; all you need for something to turn into a commodity is a lack of differentiation between providers.
When you're buying a commodity, there's no big difference between getting your commodity delivered by $PROVIDER_1 and getting your commodity delivered by $PROVIDER_2.
The models are all converging quality-wise. Right now the number of people who swear by OpenAI models are about the same as the number of people who swear by Anthropic models, which are about the same as the number of people who swear by Google's models, etc.
When you're selling a commodity, the only differentiation is in the customer experience.
Right now, sure, there's no price war, but right now almost everyone who is interested is playing with multiple models anyway. IOW, the target consumers are already treating LLMs as a commodity.
Email seems like not only a pretty terrible training data set, since most of it is marketing spam with dubious value, but also an invasion of privacy, since information could possibly leak about individuals via the model.
> Email seems like not only a pretty terrible training data set, since most of it is marketing spam with dubious value
Google probably even has an advantage there: filter out everything except messages sent from one valid Gmail account to another. If you do that you drop most of the spam and marketing, and are left mostly with human-to-human interactions. On top of that, they have their spam filters.
I'd upgrade that "probably" leak to "will absolutely" leak, albeit with some loss of fidelity.
Imagine industrial espionage where someone is asking the model to roleplay a fictional email exchange between named corporate figures in a particular company.
> if Google can give away LLM usage (which is obviously subsidized) it can't be astronomically expensive
There is a recent piece by Linus Sebastian (LTT) talking about YouTube: it is almost impossible to cover the cost of building a competitor because it is astronomically expensive (vs. potential revenue).
I do not disagree that they will get cheaper, but I'm pointing out that none of this is being reflected in hardware pricing. You state LLMs are becoming more optimized (less expensive). I agree. This should have a knock-on effect on hardware prices, but it doesn't. Where is the disconnect? Are hardware prices a lagging indicator? Is Nvidia still a 5 trillion dollar company if we see another 33x improvement in "energy consumed per text prompt"?
Jevons paradox. As AI gets more efficient, its potential scope expands further and the hardware it runs on becomes even more valuable.
BTW, the absolute lowest "energy consumed per logical operation" is achieved with so-called 'neuromorphic' hardware that's dog slow in latency terms but more than compensates with extreme throughput. (A bit like an even more extreme version of current NPU/TPUs.) That's the kind of hardware we should be using for AI training once power use for that workload is measured in gigawatts. Gaming-focused GPUs are better than your average CPU, but they're absolutely not the optimum.
> I fail to see how costs can drop while valuations for all major hardware vendors continue to go up. I don't think the markets would price companies in this way if they thought all major hardware vendors were going to see their margins shrink to commodity levels, as you've implied.
This isn't hard to see. A company's overall profits are influenced – but not determined – by the per-unit economics. For example, increasing volume (quantity sold) at the same per-unit profit leads to more profits.
> So even if a task costs $8 today thanks to VC subsidies, I can be reasonably confident that the same task will cost $8 or less without subsidies in the not-too-distant future.
The same task on the same LLM will cost $8 or less. But that's not what vendors will be selling, nor what users will be buying. They'll be buying the same task on a newer LLM. The results will be better, but the price will be higher than the same task on the original LLM.
It's not the hardware getting cheaper; it's that LLMs were developed when we really didn't understand how they worked, and there is still some room to improve the implementations, particularly to do more with less RAM. And that's everything from doing more with fewer weights to lower-precision formats like FP16, not to mention that if you can 2x the speed, you can get twice as much done with the same RAM and all the other parts.
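To make the "less RAM" point concrete, here's a rough back-of-the-envelope sketch (the model sizes and precisions are illustrative only, not any vendor's actual numbers): weight memory is roughly parameter count times bytes per parameter, so shrinking the model or dropping precision directly shrinks the hardware needed to serve it.

```python
# Rough, illustrative estimate of weight memory only (ignores KV cache,
# activations, and runtime overhead). All numbers are hypothetical examples.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gib(num_params: float, precision: str) -> float:
    """Approximate GiB needed just to hold the weights."""
    return num_params * BYTES_PER_PARAM[precision] / (1024 ** 3)

for params, label in [(72e9, "72B"), (8e9, "8B")]:
    for precision in ("fp16", "int4"):
        print(f"{label} @ {precision}: ~{weight_memory_gib(params, precision):.0f} GiB")

# 72B @ fp16 is on the order of ~134 GiB of weights, while 8B @ int4 is ~4 GiB:
# same class of serving hardware, wildly different cost per request.
```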
> I fail to see how costs can drop while valuations for all major hardware vendors continue to go up.
Yeah, valuations for hardware vendors have nothing to do with costs. Valuations are a meaningless thing to fold into your thinking about something objective like whether the retail cost of inference will trend down (obviously yes).
What if we run out of GPUs? Out of RAM? Out of electricity?
AWS is already raising GPU prices, which never happened before. What if there is war in Taiwan? What if we want to get serious about climate change and start saving energy for vital things?
My guess is that, while they can do some cool stuff, we cannot afford LLMs in the long run.
You can't copy/paste a new ASML, no matter how hard you try (short of open-sourcing all of their IP). Even if you did, by the time you copied one generation of machines, they'd be on a new generation and you'd still have the bottleneck in the same place.
Not to mention that with these monopolies they can just keep increasing prices ad infinitum.
The outcome may seem like magic, but the input is "simply" hard work and a big budget: billions of dollars and years of investment into tuning the parameters like droplet size, frequency, etc...
The interviews make it clear that the real reason ASML's machines are (currently) unique is that few people had the vision, patience, and money to fund what seemed at the time impossible. The real magic was that ASML managed to hang on by a fingernail and get a successful result before the money ran out.
Now that tin droplet EUV lasers have not only been demonstrated to be possible, but have become the essential component of a hugely profitable AI chip manufacturing industry, obtaining funding to develop a clone will be much easier.
> ASML's secret sauce is not that secret or uncopyable.
You must've watched a different video. They took a decade to get there and they're happy to show all the how-to's because they know the devil is in the details.
If the US is ready to start a war against Europe to invade Greenland, it's certainly because they need more sand and plastic? Of course by weight it's probably mostly sand and plastic, but the interesting bits probably need palladium, copper, boron, cobalt, tungsten, etc.
Greenland is Trump’s Ukraine. He’s jealous of Putin, that is all.
There is nothing in Greenland worth breaking up the alliances with Europe over.
Trump is too stupid to realise this, he just wants land like it’s a Civ game.
PS: An entire rack of the most expensive NVIDIA equipment millions of dollars can buy has maybe a few grams of precious or rare metals in it. The cost of those is maybe a dollar or two. They don't even use gold any more!
The expensive part is making it, not the raw ingredients.
That alliance costs money. It doesn't bring anything good in return: the USSR (which this alliance was created against) is long gone. Trump is a genius if he somehow manages to kill two birds with one stone: make the OTHER parties of the alliance want to disband it AND get a piece of land with a unique strategic position all to himself/the U.S.
I think it's Putin who is going to be jealous of Trump, not the other way around.
The parent said: "Of course, by then we'll have much more capable models. So if you want SOTA, you might see the jump to $10-12. But that's a different value proposition entirely: you're getting significantly more for your money, not just paying more for the same thing."
SOTA improvements have been coming from additional inference due to reasoning tokens and not just increasing model size. Their comment makes plenty of sense.
Is it? Recent models tend to need fewer tokens to achieve the same outcome. The days of ultrathink are coming to an end; Opus is quite usable without it.
The only way to get a "perfect rating" is to go to your junior dev and bring another interruption (maybe the dev was 90% done!). So now he has been interrupted twice by two different managers, and you have contradicted your own boss in front of an employee. You just broke a cardinal rule of middle management: it's ok to tell your boss he is wrong, but not in front of someone else.
Additionally, you also need to tell him to f** off with his request to get the numbers (without even trying to understand whether the request was legitimate or not!), so that your precious sprint is saved. I don't see how he gets what he wants in your ideal handling. At best you seem to tell him you will "look into it" in two weeks.
A much better solution is to help your junior dev solve the problem so the interruption goes away as fast as possible and he can go back to contributing to the sprint. If the VP requires these numbers and went as far as back-channeling around you, there is probably a quite good reason for that. Maybe the last time he needed something you told him it was not possible because the sprint thingy is unmovable?
Once you have the result, you can go give those numbers to the VP yourself, highlight the work of the junior dev, and use this "I am giving you the very important data you asked for" as a foot in the door to show that you had to pull the dev from the other feature, that these interruptions also have a cost, and that you are more than happy to take care of them. He gets what he wants, the difficult conversation of "you did not do what you are supposed to do" happens behind closed doors, and you have a much better chance of getting results if he sees you as an ally for getting his important stuff done rather than a hindrance.
I agree that this is true 90% of the time, but once you include office politics in the equation, sometimes it is not.
If it is a deeply political institution, these are the initial questions I would start with:
What is the junior's relationship to the VP? What is the junior's relationship to you, the manager? What is your relationship, as the manager, to the VP? How respectful of boundaries is the VP? How likely is he to do this again, or to shove you out of the way next time? How much do you care about being sidelined, compared to the quality of the overall work? How many times has this happened before? How likely is the junior to bypass you anyway?
And as one can see, this is just too much to bother with. Sometimes it is easier to cry out that you need more money and/or time.
I would do the same, by the way. Make the distraction go away and try to put things back on the process track. If the process does not work and this keeps happening, there is no reason to tell the person who pays you that they are always wrong.
Eeeehhh, might be overestimating executives a bit =P
But yeah, my first instinct is also to tell Gary to fuck off. That said, I would default to a process reason, so the advice at the end wasn't totally useless for me.
And I'm with you—my default instinct is also to tell Gary to back off using a Process Reason (e.g. 'It's not in the sprint'). It feels safe because it's logical.
The 'Advice at the end' was just trying to highlight why that specific shield often cracks against a VP (because they think they own the process). Glad that specific breakdown was useful to read, even if the scenario felt a bit generous to the exec!
Again, it also depends on who "Gary" is in the real world!
This is a brilliant deconstruction. You’ve highlighted a flaw in my 'Correct' path: I optimized for Process Protection (Save the Sprint), but you are optimizing for Relationship Preservation (Save the VP's face).
You are absolutely right that contradicting the VP in front of the Junior Dev breaks the 'United Front' rule.
This highlights a key point: I built this to teach transferable heuristics (e.g., 'Protect the team'), not to be a rigid playbook. In real life, specific contexts (like 'Is the VP usually reasonable?') often override the default rule.
Your approach—facilitate the request to clear the distraction, then negotiate boundaries in private—is a more sophisticated heuristic than the one I initially coded. It trades short-term sprint purity for long-term political capital.
I love this. I’m going to add your 'Shield & Deliver' path as an alternative (and perhaps superior) winning state. This is exactly the nuance I wanted to surface.
>I love this. I’m going to add your 'Shield & Deliver' path as an alternative (and perhaps superior) winning state. This is exactly the nuance I wanted to surface.
I would be wary of this being the superior winning state, but definitely an alternative. I've done exactly this in my career as a tech lead only for it to burn me, and probably 2/3rds of the time the best thing for everyone is to simply "Save the Sprint" and not become mired in discussions that often are for personal empire building that strategic leadership would hate.
Maybe people have different experiences than me on this, feel free to speak up!
This is exactly why management is hard to unit test!
You are absolutely right. If you 'Shield & Deliver' every time, you risk becoming the 'Yes Man' who absorbs infinite scope creep for someone's vanity project (Empire Building).
The 'Correct' answer actually depends entirely on the Nature of the Request:
Legitimate Business Crisis? -> Shield & Deliver
Noise/Politics? -> Save the Sprint
Distinguishing between the two before you act is the master skill.
I think keeping both paths as valid strategies with different 'Trade-off' warnings or having 2 different contexts is the right move to reflect that ambiguity
When traffic was at its peak, I did use an LLM to polish my responses.
Later, I started replying without an LLM.
No, not AI agents at work. This is an individual at work!
> A much better solution is to help your junior dev solve the problem
Meanwhile there are five other subordinates and all the overhead that you're neglecting while you fiddle with your dev environment trying to get started on the task, as you've been away from direct engineering for a while.
>If the VP requires these numbers and went as far as back-channeling around you, there is probably a quite good reason for that.
This is good intuition, but in my experience people generally won't tell you whether their reason is good or silly/self-serving, and you can only really get them to surface that by comparing the request against the priority of other commitments and forcing them to deprioritize something.
I think the ratings might be a bit borked. There's a dialogue path that results in the A+ where you end up asking the VP directly whether the back-channel was worth another delay, and the VP says no. No junior gets interrupted.
Sometimes you don’t even need to surface it. You just force responsibility: “This will prevent us from being ready for Friday’s demo. If you’re cool with that I’ll run it by {project sponsor}.”
Now it’s between the VP and the project sponsor - as it rightfully should be.
You are comparing apples to oranges. Social media posts are static, don't watch you, etc. But the distribution platform does all these things.
With books it's exactly the same thing: do not believe for one second that the publishing industry does not watch engagement metrics (aka sales) and does not adapt to the taste of the market. It's also tuned to maximize outrage; see how popular unauthorized biographies of polarizing figures have become - who is next on Walter Isaacson's list? I am betting Trump must be somewhere on there, and it's gonna be a banger.
If these customers have set up an automated system to pay you $200 every single minute, that’s correct. If they haven’t and it was just a one-off sale, you are missing the “recurring” part of ARR.
> It seems like EU in general should be heavily invested in Mistral's development, but it doesn't seem like they are
The EU is extremely invested in Mistral's development: half of the effort is finding ways to tax them (hello Zucman tax), the other half is wondering how to regulate them (hello AI act)
The Zucman tax targets rich individuals (€100m+), not Mistral. The AI Act rules are not that difficult for GPAI model providers to comply with, as long as the model doesn't pose systemic risk... They have to spend a lot more time on PR and handshaking with French politicians than on AI compliance. They probably don't even have a single FTE for that... So that's just prejudice, I believe.
> Our first and default tool should be some form of lightweight automated testing
Manual verification isn't about skipping tests, it's about validating what to test in the first place.
You need to see the code work before you know what "working" even means. Does the screen render correctly? Does the API return sensible data? Does the flow make sense to users? Automated tests can only check what you tell them to check. If you haven't verified the behavior yourself first, you're just encoding your assumptions into test cases.
I'd take "no tests, but I verified it works end-to-end" over "full test coverage, but never checked if it solves the actual problem" every time. The first developer is focused on outcomes. The second is checking boxes.
Tests are crucial: they preserve known-good behavior, but you have to establish what "good" looks like first, and that requires human judgment. Automate the verification, not the discovery. So our first and default tool remains manual verification.
I suppose we could be talking in circles around each other, but I'd say much of what you've suggested as manual tests could be codified into automated tests just as easily.
Manual: `curl localhost:8080 | jq .` or whatever, brings value once.
Automated: `assert.ValidJSON(req.Body)` is basically identical, but can be repeated over and over again.
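For instance, here's a minimal sketch of what that codification could look like (the endpoint, port, and test framework are my own assumptions, not anything specific from this thread):

```python
# Minimal sketch: turn the one-off `curl localhost:8080 | jq .` check into a
# repeatable test. The endpoint and the expected shape are hypothetical.
import json
import unittest
import urllib.request

class HomepageApiTest(unittest.TestCase):
    def test_root_returns_valid_json(self):
        with urllib.request.urlopen("http://localhost:8080/") as resp:
            self.assertEqual(resp.status, 200)
            body = resp.read()
        # json.loads raises ValueError on invalid JSON, which is essentially
        # what piping the response through `jq .` was verifying by hand.
        payload = json.loads(body)
        self.assertIsInstance(payload, dict)

if __name__ == "__main__":
    unittest.main()
```

The manual check still has to happen once to decide what "correct" looks like; the test just freezes that decision so it keeps getting re-checked for free.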
I know a few countries that reject poisonous US social media in favor of better platforms that are safe for children, safe for news and information, and safe for society and for Democracy itself: the people's democracy of North Korea, the democratic republic of Iran, the not-at-all-authoritarian society of Russia, etc.
I see a tremendous correlation between restricting access to certain websites and straight-up dictatorships that pretend to protect their populations from the evils of foreign influence.
Or maybe it’s just strange classification. I see a lot of prompts on the internet looking like “act as a senior xxx expert with over 15 years of industry experience and answer the following: [insert simple question]”
I hope those are not classified as “roleplaying”; the “roleplay” here is just a trick to get a better answer from the model, often in a professional setting that has nothing to do with creative writing or NSFW stuff.
I can't be sure, but this sounds entirely possible to me.
There are many, many people, and websites, dedicated to roleplaying, and those people will often have conversations lasting thousands of messages with different characters. I know people whose personal 'roleplay AI' budget is $1,000/month, as they want the best quality AIs.
Would be good to look into those particular statistics, then. Seems like the category could include all sorts of stuff:
> This indicates that users turn to open models primarily for creative interactive dialogues (such as storytelling, character roleplay, and gaming scenarios) and for coding-related tasks. The dominance of roleplay (hovering at more than 50% of all OSS tokens) underscores a use case where open models have an edge: they can be utilized for creativity and are often less constrained by content filters, making them attractive for fantasy or entertainment applications. Roleplay tasks require flexible responses, context retention, and emotional nuance - attributes that open models can deliver effectively without being heavily restricted by commercial safety or moderation layers. This makes them particularly appealing for communities experimenting with character-driven experiences, fan fiction, interactive games, and simulation environments.
I could imagine that something like D&D, or other narrative adventures on demand, with a machine that never tires of exploring subplots or rewriting sections to be a bit different, is a pretty cool thing to have. Either that, or writing fiction, albeit hopefully not entire slop books that are sold, but something to draw inspiration from and do a back-and-forth with.
In regards to NSFW stuff, a while back people were clowning on OpenAI for suggesting that they'd provide adult writing content to adults, but it might as well be a bunch of money that's otherwise left on the table. Note: I'm all for personal freedom, though one also has to wonder about the longer term impact of those "AI girlfriend/boyfriend" trends, you sometimes see people making videos about those subreddits. Oh well, not my place to judge.
Edit: oh hey, there is more data there after all
> Among the highest-volume categories, roleplay stands out for its consistency and specialization. Nearly 60% of roleplay tokens fall under Games/Roleplaying Games, suggesting that users treat LLMs less as casual chatbots and more as structured roleplaying or character engines. This is further reinforced by the presence of Writers Resources (15.6%) and Adult content (15.4%), pointing to a blend of interactive fiction, scenario generation, and personal fantasy. Contrary to assumptions that roleplay is mostly informal dialogue, the data show a well-defined and replicable genre-based use case.
The way I think about it, the training data (i.e. the internet) has X% of people asking something like "explain it to me like I'm five years old" and Y% of people framing it like "I'm technical, explain this to me in detail". You use the "act as a senior XXX" when you want to bias the output towards something more detailed.