Sometimes the AI is all too good at writing tests.
I agree with the idea, and I do it too, but you need to make sure the tests don't just validate the incorrect behavior, and that the code isn't updated to pass the test in a way that actually "misses the point".
I've had this happen to me on one or two tests every time.
Even more important, those tests need to be useful. Often unit tests simply test that the code works as written, which generally does more harm than good.
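To make that concrete, here's a hypothetical Python sketch (the function and test names are made up, not from any real codebase): a test that merely restates the implementation passes even when the logic is wrong, while a test pinned to the intended behavior catches the bug.

    # apply_discount is supposed to take 10% off (hypothetical example).
    def apply_discount(price):
        return price * 1.1  # bug: adds 10% instead of subtracting it

    # Tautological test: mirrors the implementation, so it passes despite the bug.
    def test_discount_mirrors_code():
        assert apply_discount(100) == 100 * 1.1

    # Useful test: pins the intended behavior, so it fails and exposes the bug.
    def test_discount_takes_ten_percent_off():
        assert apply_discount(100) == 90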
To give some further advice to juniors: if somebody is telling you writing unit tests is boring, they haven't learned how to write good tests. There appears to be a large intersection between devs who think testing is a dull task and devs who see a self-proclaimed speed-up from AI. I don't think this is a coincidence.
Writing useful tests is just as important as writing app code, and should be reviewed with equal scrutiny.
For some reason Gemini seems to be worse at it than Claude lately. Since mostly moving to 3, I've had it go back and change the tests rather than fixing the bug on what seems to be a regular basis. It's like it's gotten smart enough to "cheat" more. You really do still have to pay attention that the tests are valid.
Yep. It's incredibly annoying that these AI companies are obviously turning the "IQ knob" on their models up and down without warning or recourse. First OpenAI, then Anthropic, and now Google. I'm guessing it's a cost optimization. OpenAI even said that part out loud.
Of course, for customers it is just one more reason you need to be looking at every AI output. Just because they did something perfectly yesterday doesn't mean they won't totally screw up the exact same thing today. Or you could say it's one more advantage of local models: you control the knobs.
Don't worry, the 99% reduction in battery materials is just a strategic pivot to an 'asset-light' approach. The 4680 supply chain isn't collapsing, it’s just being 'optimized' for a future where cars apparently don't need batteries—just FSD subscriptions and robotaxis that run on optimism.
Perhaps "our PR team is a prompt" is what they mean to convey? Or "let's make this obviously AI so more people comment pointing that out" is their social media strategy?
Since LLMs emulate human writing, what is it about that sentence that gives away that it was written by an LLM rather than a human? Haven't we seen plenty of hollow-sounding, self-aggrandizing marketing copy like this pre-LLMs? What is it that is wrong with this sentence?
It sounds like meaningless corporate drivel. Everyone is dogging on it because it's no different than when startups of yore would say "making the world a better place." As if the meaningless platitude were some incantation you had to whisper or the funding wouldn't close.
Would that be odd? AI companies are still staffed by people, and large announcements like acquihires certainly feel like they could use a slightly more human touch if they truly mean a lot to the company.
Eh, if anyone is all in on AI and it replacing human writing, it would be an AI company.
But then that means if you're a PR or communications person working at this startup (or at Meta?), your job is not secure and your days there are probably numbered, which I'm sure is great for morale...
To anyone who isn't deep in the AI hype space it reads like satire to include such an obvious AI tell, but I think it's a positive in the eyes of the AI hype world. It's like how anyone not a lizard is repulsed by LinkedIn speak, and yet it dominates the platform.
I saw this in a past hype cycle. What happens is that it becomes a "performative" art in an echo chamber of startups, startup founders, and VCs. Performative meaning doing things one thinks others want to see, rather than because they make sense.
Management is quizzing their tech teams on injecting agents into their workflows, whatever the f that means. Some of these big companies will acquire startups in the space so they are not left behind on the hype train, so they can claim to have agentic talent on their teams.
Those of us who have seen this movie play out know the ending.
To be honest, I'm a bit annoyed that I installed maybe 2-3 extensions, and in the last year or so whenever I open one of their IDEs I need to update anywhere from 10 to 25 extensions. What are these things? Where did they come from, and why do I have them? I used to see only the extensions that I actually installed, and now there's all kinds of stuff that I thought was basic functionality.
A lot of core functionality is implemented as bundled plugins (they ship with the IDE, but can receive updates separately). They can also be independently disabled (and older versions used to come with only some enabled and ask you which others you want enabled at first launch).
Or maybe it’s just strange classification. I see a lot of prompts on the internet looking like “act as a senior xxx expert with over 15 years of industry experience and answer the following: [insert simple question]”
I hope those are not classified as "roleplaying": the "roleplay" here is just a trick to get a better answer from the model, often in a professional setting that has nothing to do with creative writing or NSFW stuff.
I can't be sure, but this sounds entirely possible to me.
There are many, many people, and websites, dedicated to roleplaying, and those people will often have conversations lasting thousands of messages with different characters. I know people whose personal 'roleplay AI' budget is $1,000/month, as they want the best-quality AIs.
Would be good to look into those particular statistics, then. Seems like the category could include all sorts of stuff:
> This indicates that users turn to open models primarily for creative interactive dialogues (such as storytelling, character roleplay, and gaming scenarios) and for coding-related tasks. The dominance of roleplay (hovering at more than 50% of all OSS tokens) underscores a use case where open models have an edge: they can be utilized for creativity and are often less constrained by content filters, making them attractive for fantasy or entertainment applications. Roleplay tasks require flexible responses, context retention, and emotional nuance - attributes that open models can deliver effectively without being heavily restricted by commercial safety or moderation layers. This makes them particularly appealing for communities experimenting with character-driven experiences, fan fiction, interactive games, and simulation environments.
I could imagine something like D&D or other narrative adventures on demand, with a machine that never tires of exploring subplots or rewriting sections to be a bit different, being a pretty cool thing to have. Either that, or writing fiction: hopefully not entire slop books that are sold, but something to draw inspiration from in a back-and-forth.
As for the NSFW stuff, a while back people were clowning on OpenAI for suggesting that they'd provide adult writing content to adults, but it's a bunch of money that's otherwise left on the table. Note: I'm all for personal freedom, though one also has to wonder about the longer-term impact of those "AI girlfriend/boyfriend" trends; you sometimes see people making videos about those subreddits. Oh well, not my place to judge.
Edit: oh hey, there is more data there after all
> Among the highest-volume categories, roleplay stands out for its consistency and specialization. Nearly 60% of roleplay tokens fall under Games/Roleplaying Games, suggesting that users treat LLMs less as casual chatbots and more as structured roleplaying or character engines. This is further reinforced by the presence of Writers Resources (15.6%) and Adult content (15.4%), pointing to a blend of interactive fiction, scenario generation, and personal fantasy. Contrary to assumptions that roleplay is mostly informal dialogue, the data show a well-defined and replicable genre-based use case.
The way I think about it, the training data (i.e. the internet) has X% of people asking something like "explain it to me like I'm five years old" and Y% of people framing it like "I'm technical, explain this to me in detail". You use "act as a senior XXX" when you want to bias the output towards something more detailed.
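As a minimal sketch of that biasing trick (assuming the openai Python client; the model name, question, and persona wording are all illustrative choices, not recommendations):

    # Minimal sketch, assuming the openai Python package is installed and
    # OPENAI_API_KEY is set; model and personas are illustrative assumptions.
    from openai import OpenAI

    client = OpenAI()
    question = "How does TCP congestion control work?"

    for persona in ("Explain things like I'm five years old.",
                    "Act as a senior network engineer with 15 years of experience."):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "system", "content": persona},
                      {"role": "user", "content": question}],
        )
        # Same question; the persona shifts the depth and register of the answer.
        print(persona, "->", resp.choices[0].message.content[:200])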
> I guess that also selects for people who would use openrouter
It definitely does. OpenRouter is pretty popular among roleplayers and creative writers due to having a wide variety of models available, sometimes providing free access to quality models such as DeepSeek, and lacking any sort of rules against generating "adult" content.
Openrouter has an apps tab. If you look at the free, non-coding models, some of the apps that feature are janitor.ai, sillytavern, and chub.ai. I'd never heard of them, but people seem to be burning millions of tokens enjoying them.
If you rely on AI to write most of your code (instead of using it like Stack Overflow), Claude Code/OpenAI Codex subscriptions are cheaper than buying tokens. So those users are not on openrouter.
Both Claude Code and Codex steer you towards the monthly subscription. Last time I tried Codex, I remember several aspects of it being straight up broken if used with an API key instead of a subscription account.
The business model is likely built upon the assumption that most people aren't going to max out their limits every day, because if they were, it likely wouldn't be profitable.
I'm not surprised. Roleplay means endless sessions with huge context (character history, world, previous dialogues). On commercial APIs (OpenAI/Anthropic), that long context costs a fortune. On OpenRouter, many OSS models, especially via providers like DeepInfra or Fireworks, cost pennies or are even free, like some free-tier models. The RP community is very price-sensitive, so they massively migrate to cheap OSS models via aggregators. It skews the stats but highlights a real niche for cheap inference.
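A rough back-of-envelope makes the gap obvious (token counts and per-token prices are assumptions for illustration, not anyone's actual rate card):

    # Cost of one long roleplay session where the full context is resent
    # each turn. All numbers below are illustrative assumptions.
    turns = 200
    context_tokens = 40_000        # character sheets, world lore, prior dialogue

    commercial_per_mtok = 3.00     # assumed $/M input tokens on a frontier API
    oss_per_mtok = 0.30            # assumed $/M input tokens via a cheap OSS provider

    input_tokens = turns * context_tokens  # 8M tokens for a single session
    print(f"commercial: ${input_tokens * commercial_per_mtok / 1e6:.2f}")  # $24.00
    print(f"OSS:        ${input_tokens * oss_per_mtok / 1e6:.2f}")         # $2.40

Multiply that by a few sessions a week and the commercial bill climbs fast, which is exactly the price sensitivity that pushes RP users to aggregators.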
That also stuck out to me; I was wondering if it was video games using openrouter for uptime / inference switching. Video games would use a lot of tokens generating dialogue for a few programmers' villages.
I'm not surprised at all. The HN crowd thinks LLMs are mostly used for engineering because they live in a multi-layer bubble. Real people in the real world do all kinds of shit with LLMs that isn't productivity- or even work-related.