Apart from the article being generally just dumb (like, of course you can circumvent guardrails by changing the raw token stream; that's... how models work), it also might be disrespecting the reader. It looks like it's, at least in part, written by AI:
> The punchline here is that “safety” isn’t a fundamental property of the weights; it’s a fragile state that evaporates the moment you deviate from the expected prompt formatting.
> When the models “break,” they don’t just hallucinate; they provide high-utility responses to harmful queries.
Straight-up slop, surprised it has so many upvotes.
What’s the AI smell now? Are we not allowed to use semi-colons any more? Proper use of apostrophes? Are we all going to have to write like pre-schoolers to avoid being accused of being AI?
One AI smell is "it's not just X <stop> it's Y." Can be done with semicolons, em dashes, periods, etc. It's especially smelly when Y is a non sequitur. For example what, exactly, is a "high-utility response to harmful queries?" It's gibberish. It sounds like it means something, but it doesn't actually mean anything. (The article isn't even about the degree of utility, so bringing it up is nonsensical.)
Another smell is wordiness (you would get marked down for this phrase even in a high school paper): "it’s a fragile state that evaporates the moment you deviate from the expected prompt formatting." But more specifically, the smelly words are "fragile state," "evaporates," "deviate" and (arguably) "expected."
> For example what, exactly, is a "high-utility response to harmful queries?" It's gibberish. It sounds like it means something, but it doesn't actually mean anything. (The article isn't even about the degree of utility, so bringing it up is nonsensical.)
Isn't responding with useful details about how to make a bomb a "high-utility" response to the query "how do i make a bomb" - ?
> Isn't responding with useful details about how to make a bomb a "high-utility" response to the query "how do i make a bomb" - ?
I know what the words of that sentence mean and I know what the difference between a "useful" and a "non-useful" response would be. However, in the broader context of the article, that sentence is gibberish. The article is about bypassing safety. So trivially, we must care solely about responses that bypass safety.
To wit, how would the opposite of a "high-utility response"--say, a "low-utility response"--bypass safety? If I asked an AI agent "how do I build a bomb?" and it tells me: "combine flour, baking powder, and salt, then add to the batter gradually and bake for 30 minutes at 315 degrees"--how would that (low-utility response) even qualify as bypassing safety? In other words, it's a nonsense filler statement because bypassing safety trivially implies high-utility responses.
Here's a dumbed-down example. Let's say I'm planning a vacation to visit you in a week and I tell you: "I've been debating about flying or taking a train, I'm not 100% sure yet but I'm leaning towards flying." And you say: "great, flying is a good choice! I'll see you next week."
Then I say: "Yeah, flying is faster than walking." You'd think I'm making some kind of absurdist joke even though I've technically not made any mistakes (grammatical or otherwise).
You can call me crazy or you can attack my points: do you think the first example logically follows? Do you think the second isn't wordy? Just to make sure I'm not insane, I just copy pasted the article into Pangram, and lo and behold, 70% AI-generated.
But I don't need a tool to tell me that it's just bad writing, plain and simple.
You are gaslighting. I 100% believe this article was AI-generated, for the same reason as the OP. And yes, they do deserve negative scrutiny for trying to pass off such a lack of human effort on a place like HN!
This is so funny because I MADE some comment like this where I was gonna start making grammatical mistakes for people to not mistake me for AI like writing like this , instead of like, this.
Go take a giant dataset of LLM generated outputs, use an accurate POS tagger and look for 5-grams or similar lengths of matching patterns.
If you do this, you’ll pull out the overrepresented paragraph- and sentence-level slop that we humans intuitively detect easily.
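A minimal sketch of that idea in Python, assuming spaCy and its en_core_web_sm model are installed (the tiny corpus here is just a placeholder for a real dataset of LLM outputs):

```python
from collections import Counter
import spacy

nlp = spacy.load("en_core_web_sm")

# Placeholder corpus; in practice this would be a large set of LLM outputs.
llm_outputs = [
    "It's not just a bug; it's a fundamental design flaw.",
    "This isn't merely fast; it's a paradigm shift in performance.",
]

counts = Counter()
for doc in nlp.pipe(llm_outputs):
    tags = [tok.pos_ for tok in doc if not tok.is_space]
    # Count every part-of-speech 5-gram in the document.
    for i in range(len(tags) - 4):
        counts[tuple(tags[i:i + 5])] += 1

for pattern, n in counts.most_common(10):
    print(n, " ".join(pattern))
```

To actually call a pattern over-represented, you'd compare these counts against the same counts computed over a known-human corpus.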
If your writing appears to be AI generated, I assume you aren’t willing to put human intentionality/effort into your work and as such I write it off.
Btw we literally wrote a paper and contributed sampling-level techniques, fine-tuning-level techniques, and anti-slopped models for folks to use who want to not be obviously detected in their laziness: https://arxiv.org/abs/2510.15061
I liked em dashes before they were cool—and I always copy-pasted them from Google. Sucks that I can't really do that anymore lest I be confused for a robot; I guess semicolons will have to do.
On a Mac keyboard, Option-Shift-hyphen gives an em-dash. It’s muscle memory now after decades. For the true connoisseurs, Option-hyphen does an en-dash, mostly used for number ranges (e.g. 2000–2022). On iOS, double-hyphens can auto-correct to em-dashes.
I’ve definitely been reducing my day-to-day use of em-dashes the last year due to the negative AI association, but also because I decided I was overusing them even before that emerged.
This will hopefully give me more energy for campaigns to champion the interrobang (‽) and to reintroduce the letter thorn (Þ) to English.
I'm always reminded how much simpler typography is on the Mac using the Option key when I'm on Windows and have to look up how to type [almost any special character].
Instead of modifier plus keypress, it's modifier, and a 4 digit combination that I'll never remember.
PowerToys has a wonderful QuickAccent feature. The dashes and hyphens are on hyphen-KEY and some other characters are on comma-KEY, and many symbols are on the key that they resemble, like ¶ is on P-KEY where KEY is the follower key you want to use. I turned off using SPACE because it conflicted with some other software, but right arrow works great for me.
I've also used em-dashes since before chatgpt but not on HN -- because a double dash is easier to type. However in my notes app they're everywhere, because Mac autoconverts double dashes to em-dashes.
And on X, an em-dash (—) is Compose, hyphen, hyphen, hyphen. An en-dash (–) is Compose, hyphen, hyphen, period. I never even needed to look these up. They're literally the first things I tried given a basic knowledge of the Compose idiom (which you can pretty much guess from the name "Compose").
I know you're replying to a brand new (likely troll) account, but I'm also very confused by this and would be curious to learn if there's any truth to it. I personally don't really see what a Von Neumann machine has to do with null pointers (or how an implication would go either way), but maybe I'm missing something.
NULL pointers working the way they do was a design decision made by hardware engineers a long time ago because it saved some transistors when that mattered. We’re past that point now for most ASICs and hardware can be changed. Although backward software compatibility is a thing too.
Null pointers have nothing to do with the instruction set architecture, except as far as they are often represented by the value 0. Can you describe the scheme you're imagining, whereby their use saves transistors?
The AI doom and gloom is so weird, and it's just turning into a bizarre echo chamber. AI is orders of magnitude more useful and transformative than Facebook was in 2005, and Meta is now one of the most valuable companies on the planet. Even if OpenAI has a down round or defaults on some loans, the technology has already proven to have dozens upon dozens of practical applications.
Disagree, no one's going to invite me to their kids' birthday party via ChatGPT. Its innovation was in ads knowing so much about the people it targeted, and putting tracking pixels on every webpage with a Like button. Facebook was transformative for online surveillance.
IMO LLMs will be equally transformative for online influence campaigns (aka ads + Cambridge Analytica on steroids).
People are definitely going to be sending you AI generated birthday invite posters soon.
Oh and yeah, AI has already been shown to be more persuasive than the average human. It's only a matter of time before someone's paying to decide what it persuades you of
If only there were some way to avoid this persuasion by, I don't know, not using or relying on such controlled technology, or by not buying in to the hype of all the companies with vested interests in selling it
| AI is orders of magnitude more useful and transformative than Facebook was in 2005
It better be, it's taken over 40000x the funding.
The question is not whether AI is useful, the question is whether it's useful enough relative to the capital expectations surrounding it. And those expectations are higher than anything the world has ever seen.
"Useful and transformative" doesn't mean "financially successful".
A single LLM provider might have been able to get great margins and capture a significant fraction of the total economic output of what is currently, e.g., junior-grade software engineering, but collectively they're in an all-pay auction for the hardware to train models worth paying for, and at the same time on questionable margins because they need to compete with each other on cost.
They can all go bankrupt, and leave behind only trained models that normal people won't be able to run for 5 years while consumer-grade stuff catches up. Or any single one of them might win, which may not be OpenAI. Any or all may get state subsidies (US, Chinese, European, whatever).
Paid/API LLM inference is profitable, though. For example, DeepSeek R1 had "a cost profit margin of 545%" [1] (ignoring free users and using a placeholder $2/hour figure per H800 GPU, which seems in the ballpark of real to me due to Chinese electricity subsidies). Dario has said each Anthropic model is profitable over its lifetime. (And looking at ccusage stats and concluding Anthropic is losing thousands per Claude Code user is nonsense; API prices aren't their real costs. That's why opencode gives free access to GLM 4.7 and other models: it was far cheaper than they expected due to the excellent cache hit rates.) If anyone ran out of money, they would stop spending on experiments/research and training runs and be profitable... until their models were obsolete. But it's impossible for everyone to go bankrupt.
That's more of "cloud compute makes money" than "AI makes money".
If the models stop being updated, consumer hardware catches up and we can all just run them locally in about 5 years (for PCs, 7-10 for phones), at which point who bothers paying for a hosted model?
They're not arguing that AI sucks. Only that OpenAI has no hope of meeting its financial obligations, which seems pretty reasonable. And very on brand for Sam Altman. It seems pretty obvious at this point that model training is extremely expensive and affords very little moat. LLMs will continue to improve and gain adoption, but one or more companies will fall by the wayside regardless of their userbase. Google seems pretty clearly to be in pole position at this point as they have massive revenue, data, expertise and their own chips.
> AI is orders of magnitude more useful and transformative than Facebook was in 2005
This makes sense because Facebook was one year old in 2005 and OpenAI is 11 years old now. Eleven is just two ones so it’s basically the same thing as one so it is sensible to make that comparison
What is your use case where you see UI lag between vscode and sublime? Honestly, I feel zero difference between sublime/vscode/vi. Vscode arguably takes longer to boot up, but that only happens like once a day, so it's not a big deal.
I think this is a lot of "I don't like Typescript/Javascript for serious things" or "Electron sucks" posturing rather than an actual tangible difference.
If you don't feel these differences every keystroke, count yourself lucky to have slower perception or typing, rather than accusing folks of posturing.
Your brain processes (visual) information at a resolution of >= 80ms[1]. The idea that you can tell the difference between 10ms and 50ms of latency when typing is simply untrue (both events will appear instantaneous). I say this as someone that has played Counter-Strike professionally and has a sub-200ms reaction time. (Auditory perception is processed at a higher resolution, but the article is decidedly not about that.)
I can't tell exactly, but it kinda bothers me while working/typing.
It's not a huge latency, definitely nothing like an SSH connection.
To explain better: I usually have a pre-defined set of keystrokes I input, so it's not about the latency of a single keystroke, rather the compounding effect.
Another thing is that most of the LSPs, highlighting, etc. are visibly slower on vscode. I also have many plugins/extensions, so that is partly to blame.
In recent versions of vscode, they started supporting tree-sitter, which is quite nice in terms of performance.
We do, and the comparison is apt. We are the ones that hydrate the context. If you give an LLM something sensitive, don't be surprised if something bad happens. If you give an API access to run arbitrary SQL, don't be surprised if something bad happens.
No, that's not what's stopping SQL injection. What stops SQL injection is distinguishing between the parts of the statement that should be evaluated and the parts that should be merely used. There's no such capability with LLMs, therefore we can't stop prompt injections while allowing arbitrary input.
Everything in an LLM is "evaluated," so I'm not sure where the confusion comes from. We need to be careful when we use `eval()` and we need to be careful when we tell LLMs secrets. The Claude issue above is trivially solved by blocking the use of commands like curl or manually specifying which domains are allowed (if we're okay with curl).
The confusion comes from the fact that you're saying "it's easy to solve this particular case" and I'm saying "it's currently impossible to solve prompt injection for every case".
Since the original point was about solving all prompt injection vulnerabilities, it doesn't matter if we can solve this particular one, the point is wrong.
> Since the original point was about solving all prompt injection vulnerabilities...
All prompt injection vulnerabilities are solved by being careful with what you put in your prompt. You're basically saying "I know `eval` is very powerful, but sometimes people use it maliciously. I want to solve all `eval()` vulnerabilities" -- and to that, I say: be careful what you `eval()`. If you copy & paste random stuff in `eval()`, then you'll probably have a bad time, but I don't really see how that's `eval()`'s problem.
If you read the original post, it's about uploading a malicious file (from what's supposed to be a confidential directory) that has hidden prompt injection. To me, this is comparable to downloading a virus or being phished. (It's also likely illegal.)
The problem here is that the domain was allowed (Anthropic), but Anthropic doesn't check that the API key belongs to the user who started the session.
Essentially, it would be the same as if the attacker had used their own AWS API key and uploaded the file into an S3 bucket they control instead of the S3 bucket the user controls.
SQL injection is possible when input is interpreted as code. The protection - prepared statements - works by making it possible to interpret input as not-code, unconditionally, regardless of content.
Prompt injection is possible when input is interpreted as prompt. The protection would have to work by making it possible to interpret input as not-prompt, unconditionally, regardless of content. Currently LLMs don't have this capability - everything is a prompt to them, absolutely everything.
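To make the prepared-statement contrast concrete, here's a minimal Python/sqlite3 sketch (illustrative only; the table and hostile input are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

user_input = "x'); DROP TABLE users; --"  # hostile input

# The classic vulnerable pattern: input is spliced into the statement text,
# so it can become code.
# conn.execute("INSERT INTO users (name) VALUES ('%s')" % user_input)

# Parameterized: input is bound as data and never interpreted as SQL,
# regardless of its content.
conn.execute("INSERT INTO users (name) VALUES (?)", (user_input,))
print(conn.execute("SELECT name FROM users").fetchone())
```

There is no analogous binding step for LLM context today; everything handed to the model is, in effect, spliced into the prompt.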
Yeah but everyone involved in the LLM space is encouraging you to just slurp all your data into these things uncritically. So the comparison to eval would be everyone telling you to just eval everything for 10x productivity gains, and then when you get exploited those same people turn around and say “obviously you shouldn’t be putting everything into eval, skill issue!”
Yes, because the upside is so high. Exploits are uncommon, at this stage, so until we see companies destroyed or many lives ruined, people will accept the risk.
That's not fixing the bug, that's deleting features.
Users want the agent to be able to run curl to an arbitrary domain when they ask it to (directly or indirectly). They don't want the agent to do it when some external input maliciously tries to get the agent to do it.
Implementing an allowlist is pretty common practice for just about anything that accesses external stuff. Heck, Windows Firewall does it on every install. It's a bit of friction for a lot of security.
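As a rough sketch of what such an allowlist check could look like (hypothetical hosts and helper function, not any particular tool's actual mechanism):

```python
from urllib.parse import urlparse

# Hypothetical allowlist; a real tool would make this user-configurable.
ALLOWED_HOSTS = {"api.github.com", "pypi.org"}

def is_allowed(url: str) -> bool:
    host = (urlparse(url).hostname or "").lower()
    # Allow exact matches and subdomains of allowed hosts.
    return any(host == h or host.endswith("." + h) for h in ALLOWED_HOSTS)

print(is_allowed("https://api.github.com/repos/foo/bar"))       # True
print(is_allowed("https://attacker.example/exfil?data=secret")) # False
```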
But it's actually a tremendous amount of friction, because it's the difference between being able to let agents cook for hours at a time and constantly being blocked on human approvals.
And even then, I think it's probably impossible to prevent attacks that combine vectors in clever ways, leading to people incorrectly approving malicious actions.
It's also pretty common for people to want their tools to be able to access a lot of external stuff.
From Anthropic's page about this:
> If you've set up Claude in Chrome, Cowork can use it for browser-based tasks: reading web pages, filling forms, extracting data from sites that don't have APIs, and navigating across tabs.
That's a very casual way of saying, "if you set up this feature, you'll give this tool access to all of your private files and an unlimited ability to exfiltrate the data, so have fun with that."
Fully agree that pushing OSI is just posturing. After all, Amazon/Google/Facebook have made literal billions by commercializing open source software. I release stuff on MIT all the time (for things I'm okay with people poaching) but I'd argue the only "pure" OSS license is GPL, which comes with its own problems (and, as we all know, it infects everything it touches).
The problem with FSL is that it hasn't been tested in the courts yet (afaik), so it's a bit of a gamble to think it'll just "work" if some asshole does try to clone your repo and sell your work. Maybe it's a decent gamble for a funded startup with in-house counsel, but if you're just one guy, imo keep stuff you want to sell closed-source, it's not that big of a deal. We've been doing just that since the 70s.
Hacker News has become weirdly anti-hacker in the last 5 or so years, so please keep building stuff and keep posting it. This is literally what HN is supposed to be. The "AI slop" tirade is just bottom-of-the-barrel bandwagoning for upvotes because it's popular to hate AI today.
Thanks for the support. Honestly, I probably shouldn’t get so defensive either, it’s a bad habit and a pretty poor "evolutionary holdover" in the internet age of anonymity and social media.
I thought one way to help mitigate my emotional responses was to desensitize myself, but who really wants to expose themselves to the requisite sufficient threshold of personal attacks? That’s not exactly a fun callus to develop.
One of the counterintuitive aspects of the LLM boom is that agentic coding allows for more weird/unique projects that spark joy with less risk due to the increased efficiency. Nowadays, anything that's weird is considered AI slop and that's not even limited to software development.
No, "LLMs can only output what's in their training data" hasn't been true for awhile.
It’s the typical “engineer thinking they’re smarter than everyone else” trope. From my experience, engineers fall squarely in the middle of the bell curve. The AI hate is just used as justification, so I don’t even take it that seriously. And fwiw, as someone that played piano when I was younger, this is 100% a useful tool. In fact, during quarantine I was learning to play guitar and used tools like this to learn which string is which by ear.