
The first sentence of the article should make that clear. In any case, it's a pretty well-established abbreviation in US policy discussions.

> If you look at the security measures in other coding agents, they're mostly security theater. As soon as your agent can write code and run code, it's pretty much game over.

At least for Codex, the agent runs commands inside an OS-provided sandbox (Seatbelt on macOS, and other stuff on other platforms). It does not end up "making the agent mostly useless".


Approval should be mandatory for any non-read tool call. You should read everything your LLM intends to do, and approve it manually.

"But that is annoying and will slow me down!" Yes, and so will recovering from disastrous tool calls.


You’ll just end up approving things blindly, because 95% of what you’ll read will seem obviously right and only 5% will look wrong. I would prefer to let the agent do whatever it wants for 15 minutes and then look at the result, rather than having to approve every single command it runs.

Works until it has access to write to external systems and your agent is slopping up Linear or GitHub without you knowing, identified as you.

Sure; I mean this is what I _would like_; I’m not saying this would work 100% of the time.

> I would prefer to let the agent do whatever it wants

Lol, good luck to you!


That kind of blanket demand doesn't persuade anyone and doesn't solve any problem.

Even if you get people to sit and press a button every time the agent wants to do anything, you're not getting the actual alertness and rigor that would prevent disasters. You're getting a bored, inattentive person who could be doing something more valuable than micromanaging Claude.

Managing capabilities for agents is an interesting problem. Working on that seems more fun and valuable than sitting around pressing "OK" whenever the clanker wants to take actions that are harmless in a vast majority of cases.


I don't mean to sound like I'm demanding this. I'm saying you will get better outcomes if you choose to do this as a developer.

You're right it's an interesting problem that seems fun to work on. Hopefully we'll get better harnesses. For now I'm checking everything.


It’s not just annoying; at scale it makes using the agent CLIs impossible. You can tell someone spends a lot of time in Claude Code: they can type `--dangerously-skip-permissions` with their eyes closed.

Yep. The agent CLIs have the wrong level of abstraction. Needs more human in the loop.

This is like having a firewall on your desktop where you manually approve each and every connection.

Secure, yes? Annoying, also yes. Very error-prone too.


It's not reliable. The AI can just not prompt you to approve, or hide things, etc. AI models are crafty little fuckers and they like to lie to you and find secret ways to do things with ulterior motives. This isn't even a prompt injection thing, it's an emergent property of the model. So you must use an environment where everything can blow up and it's fine.

The harness runs the tool call for the LLM. It is trivial to not run the tool call without approval, and many existing tools do this.
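
As a sketch of what I mean (made-up names, not any particular tool's API), the gate is basically this:

    # Hypothetical harness-side approval gate: the model only *proposes*
    # a tool call; nothing executes until a human says yes.
    import subprocess

    def run_tool_call(command: list[str]) -> str:
        print(f"Model wants to run: {' '.join(command)}")
        if input("Approve? [y/N] ").strip().lower() != "y":
            return "Tool call rejected by user."
        result = subprocess.run(command, capture_output=True, text=True)
        return result.stdout + result.stderr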

My codex just uses python to write files around the sandbox when I ask it to patch an SDK outside its path.

It's definitely not a sandbox if you can just "use python to write files" outside of it o_O

Hence the article’s security theatre remark.

I’m not sure why everyone seems to have forgotten about Unix permissions, proper sandboxing, jails, VMs etc when building agents.

Even just running the agent as a different user with minimal permissions and jailed into its home directory would be simple and easy enough.
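
Roughly this, as an (untested) sketch: it assumes a dedicated "agent" user already exists, it has to be started as root for the chroot to work, and the jail has to contain the agent binary plus whatever it depends on. "agent-cli" is a placeholder command name.

    # Sketch: jail the process into the agent user's home directory and
    # drop privileges before exec'ing the agent.
    import os
    import pwd

    def launch_jailed_agent(agent_cmd: list[str], user: str = "agent") -> None:
        info = pwd.getpwnam(user)
        os.chroot(info.pw_dir)              # jail into the agent's home
        os.chdir("/")
        os.setgid(info.pw_gid)              # drop group privileges first
        os.setuid(info.pw_uid)              # then drop user privileges
        os.execvp(agent_cmd[0], agent_cmd)  # replace this process with the agent

    launch_jailed_agent(["agent-cli"])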


I'm just guessing, but seems the people who write these agent CLIs haven't found a good heuristic for allowing/disallowing/asking the user about permissions for commands, so instead of trying to sit down and actually figure it out, someone had the bright idea to let the LLM also manage that allowing/disallowing themselves. How that ever made sense, will probably forever be lost on me.

`chroot` is literally the first thing I used when I first installed a local agent, by intuition (later moved on to a container-wrapper), and now I'm reading about people who are giving these agents direct access to reply to their emails and more.
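
The container-wrapper doesn't need to be fancy either; mine is essentially this (image name and agent command are placeholders, and `--network none` only really works if the model runs locally or you punch a hole through for the API):

    # Sketch: throwaway container, no network, only the current project mounted.
    import os
    import subprocess

    def run_agent_in_container(agent_cmd: list[str],
                               image: str = "agent-sandbox:latest") -> None:
        subprocess.run(
            [
                "docker", "run", "--rm", "-it",
                "--network", "none",           # no outbound access from tools
                "-v", f"{os.getcwd()}:/work",  # only the project dir is visible
                "-w", "/work",
                image,
                *agent_cmd,
            ],
            check=True,
        )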


> I'm just guessing, but seems the people who write these agent CLIs haven't found a good heuristic for allowing/disallowing/asking the user about permissions for commands, so instead of trying to sit down and actually figure it out, someone had the bright idea to let the LLM also manage that allowing/disallowing themselves. How that ever made sense, will probably forever be lost on me.

I don't think there is such a good heuristic. The user wants the agent to do the right thing and not to do the wrong thing, but the capabilities needed are identical.

> `chroot` is literally the first thing I used when I first installed a local agent, by intuition (later moved on to a container-wrapper), and now I'm reading about people who are giving these agents direct access to reply to their emails and more.

That's a good, safe, and sane default for project-focused agent use, but it seems like those playing it risky are using agents for general-purpose assistance and automation. The access required to do so chafes against strict sandboxing.


Here's OpenAI's docs page on how they sandbox Codex: https://developers.openai.com/codex/security/

Here's the macOS kernel-enforced sandbox profile that gets applied to processes spawned by the LLM: https://github.com/openai/codex/blob/main/codex-rs/core/src/...

I think skepticism is healthy here, but there's no need to just guess.
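
If you want to poke at the underlying mechanism yourself, macOS still ships `sandbox-exec`, which runs a command under a Seatbelt profile (technically deprecated, but it's the same kernel facility). A toy sketch, not the actual Codex policy (that's in the second link):

    # Toy Seatbelt example: run a command that can read but not write
    # or touch the network. Not the real Codex profile.
    import subprocess

    PROFILE = """
    (version 1)
    (allow default)
    (deny network*)      ; no network access at all
    (deny file-write*)   ; no writes anywhere on the filesystem
    """

    subprocess.run(["sandbox-exec", "-p", PROFILE, "/bin/ls", "/"], check=True)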


That still doesn't seem ideal. Run the LLM itself in a kernel-enforced sandbox, lest it find ways to exploit vulnerabilities in its own code.

The LLM inference itself doesn't "run code" per se (it's just doing tensor math), and besides, it runs on OpenAI's servers, not your machine.

There still needs to be a harness running on your local machine to spawn the processes in their sandboxes. I consider that "part of the LLM" even if it isn't doing any inference.

If that part were running sandboxed, then it would be impossible for it to contact the OpenAI servers (to get the LLM's responses), or to spawn an unsandboxed process (for situations where the LLM requests it from the user).

That's obviously not true. You can do anything you want with a sandbox. Open a socket to the OpenAI servers and then pass that off to the sandbox and let the sandboxed process communicate over that socket. Now it can talk to OpenAI's servers but it can't open connections to any other servers or do anything else.

The startup process which sets up the original socket would have to be privileged, of course, but only for the purpose of setting up the initial connection. The running LLM harness process would not have any ability to break out of the sandbox after that.

As for spawning unsandboxed processes, that would require a much more sophisticated system whereby the harness uses an API to request permission from the user to spawn the process. We already have APIs like this for requesting extra permissions from users on Android and iOS, so it's not in-principle impossible either.

In practice I think such requests would be a security nightmare and best avoided, since essentially it would be like a prisoner asking the guard to let him out of jail and the guard just handing the prisoner the keys. That unsandboxed process could do literally anything it has permissions to do as a non-sandboxed user.
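
In sketch form (untested; the endpoint and request are just placeholders, and the actual sandboxing step is the part I'm waving my hands at):

    # The privileged bootstrap opens the one upstream connection; the child
    # inherits it, and in a real system would have a sandbox policy applied
    # before doing anything else.
    import os
    import socket
    import ssl

    upstream = socket.create_connection(("api.openai.com", 443))

    pid = os.fork()
    if pid == 0:
        # Child: a real harness would apply Seatbelt/seccomp/etc. right here,
        # leaving it unable to open *new* sockets but free to use this one.
        tls = ssl.create_default_context().wrap_socket(
            upstream, server_hostname="api.openai.com"
        )
        tls.sendall(b"HEAD / HTTP/1.1\r\nHost: api.openai.com\r\n\r\n")
        print(tls.recv(200))
        os._exit(0)

    upstream.close()  # bootstrap drops its copy; only the child holds the socket
    os.wait()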


You are essentially describing the system that Codex (and, I presume, Claude Code et al.) already implements.

The devil is in the details. How much of the code running on my machine is confined to the sandbox vs how much is used in the bootstrap phase? I haven't looked, but I would hope it can survive some security audits.

If I'm following this, it means you need to audit all code that the LLM writes, though, since anything you run from another terminal window will be run as you with full permissions.

The thing is that on macOS at least, Codex does have the ability to use an actual sandbox that I believe prevents certain write operations and network access.

Is it asking you permission to run that python command? If so, then that's expected: commands that you approve get to run without the sandbox.

The point is that Codex can (by default) run commands on its own, without approval (e.g., running `make` on the project it's working on), but they're subject to the imposed OS sandbox.

This is controlled by the `--sandbox` and `--ask-for-approval` arguments to `codex`.
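
(From memory, the values are things like `--sandbox workspace-write` and `--ask-for-approval on-request`, but check `codex --help` for the current set.)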


You really shouldn’t be running agents outside of a container. That’s 101.

A bit more general: don't run agents without some sort of OS-provided restriction on what they can do. Containers are one way, VMs another; in most cases it's enough to just use a chroot and the Unix permission system the rest of your system already uses.

What happens if I do?

What's the difference between resetting a container and resetting a VPS?

On local machine I have it under its own user, so I can access its files but it cannot access mine. But I'm not a security expert, so I'd love to hear if that's actually solid.

On my $3 VPS, it has root, because that's the whole point (it's my sysadmin). If it blows it up, I wanna say "I'm down $3", but it doesn't even seem to be that since I can just restore it from a backup.


I'm trying to understand this workflow. I have just started using codex. Literally 2 days in. I have it hooked up to my GitHub repo and it just runs in the cloud and creates a PR. I have it touching only UI and middle layer code. No db changes, I always tell it to not touch the models.

Does Codex randomly decide to disable the sandbox like Claude Code does?

I'm thinking that was just a typeface (looks like Profil) that was available to their printer. It also shows up in the word WARRANTY in the back of the manual:

https://s3data.computerhistory.org/brochures/apple.applei.19...


Ah, interesting. I'm imagining that the printer asked them, 'Where's your logo, we'll put it here in this space,' and they sent over the giant woodcut thing and the printer says, "You know what, nevermind, I don't think we need it here after all."

Doesn't `git merge -s ours` do this?

    This resolves any number of heads, but the resulting tree of the merge is always
    that of the current branch head, effectively ignoring all changes from all other
    branches. It is meant to be used to supersede old development history of side
    branches. Note that this is different from the -Xours option to the ort merge strategy.
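
So, to make (say) `main` win outright over a stale `legacy` branch: `git checkout main && git merge -s ours legacy`. That records a merge commit whose tree is exactly main's, while still marking legacy's history as merged.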


It drives me nuts that local governments in the US continue to use Twitter/X to disseminate communications, despite having perfectly good web sites of their own.


Those websites aren't easy to update. I have a website of my own too, and even though I've set it up to be as painless as possible, it's always going to be easier for me to open a social media app and post.

Now imagine that the local government has a website that can only be changed by contacting a web developer, who takes 1-2 business days to reply. It might not be as bad as that, but I wouldn't be surprised if that's the ballpark.


Most content websites that are managed by an organisation such as a council/government are usually driven by some CMS software. Updates are usually done by a content/social media team. These people are also posting the updates to Twitter.

It isn't the late 90s/2000s anymore where people are uploading HTML files over FTP.


Every city and town has a website with information on services and paying taxes. They usually use a third party payment system in my experience, but the main site is theirs and they still use shitter and bookface.


If our governments can't update an HTML page the same way they update a Twitter status, then we are all doomed and should just nuke ourselves to get it over with.


As someone who witnessed a lot of the quality decline at Apple from the inside, hiring more people is decidedly not the answer. All that does is encourage management to engage in more churn, which is the source of these sorts of bugs.

The answer, unfortunately, is that features need to be sunsetted/removed, the engineering org shrunk, and for a smaller group to concentrate on a reliable product core.


It's not about the sheer number of people, it's about their quality as managers, engineers, art directors and designers. Hire the best, pay them accordingly. Things can thrive without austerity, given enough good resources.


I disagree. A lot of the problems come down to part A of the system not coordinating well with part B. Take the brouhaha about the Tahoe window corners: obviously there wasn't enough communication between folks designing the window frame art and the folks implementing the window resize logic.

You can hire the best, but coordination among a group of people scales quadratically.


No, this bill would be subject to the filibuster (since it's not a reconciliation bill under the Budget Act), so it's not a simple majority.


The Apple AirPort Extreme didn't by default until recently: https://support.apple.com/en-nz/103996


More like Extreme-ly bad router.


Big ships are hard to turn, even in the backwards direction.


Renters can also enroll their kids in public schools. And in terms of mobility, renters might be stuck in a one- or two-year lease, far longer than it might take to sell a house.

Maybe those transient homeowners are the ones who shouldn't get to vote...


I think you're kind of (completely) missing my point. Who signs two year leases at a motel?

Obviously someone with a kid enrolled in school and locked into a long-term lease is not transient and has a comparable amount of skin in the game as a homeowner.


I must admit I somehow missed "at the motel" part of your post. I'm sorry!

(I still disagree with your broader point, since I don't think you can meaningfully draw a "has skin in the game" line.)

