My default mental model is that a permissive toolset can be fine if the sandbox is strong, since the worst failures should still be contained to the sandbox. I agree the tricky part is when the harness crosses the boundary and mutates external state, like making API calls or touching production resources.
In those cases, I try to make the tool interface restrictive by design, since it’s hard to know the right guardrails in advance. The goal is to eliminate entire classes of failure by making certain actions impossible, not just discouraged.
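To make that concrete, here is a rough sketch of what I mean by restrictive-by-design. The names (ReadOnlyTools, fetch_record, Record) are made up for illustration, not any real framework: the agent only ever receives a tool object with no mutating methods, so touching external state isn't a rule it has to follow, it's simply not callable.

```python
# Hypothetical sketch: the only tool surface handed to the agent for a step.
# ReadOnlyTools, Record, fetch_record are illustrative names, not a real API.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    id: str
    payload: dict


class ReadOnlyTools:
    """The sole object the agent loop receives for this step.

    There is no write, delete, or deploy method to call, so "don't touch
    production" is enforced by construction rather than by prompt text.
    """

    def __init__(self, store: dict[str, Record]):
        self._store = store

    def fetch_record(self, record_id: str) -> Optional[Record]:
        # Look up a single record; returns None if it doesn't exist.
        return self._store.get(record_id)

    def list_record_ids(self) -> list[str]:
        # Enumerate what exists without exposing the underlying store.
        return sorted(self._store)
```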
What were the actual failure modes you saw at GTWY.ai that motivated the step-gating approach?
Good framing. The failures that pushed us toward this weren't dramatic hacks or sandbox escapes; they were boring, quiet mistakes that only showed up over time.
While building GTWY.ai, we saw agents slowly accumulate authority across steps. A step would start as read-only, but context and assumptions leaked forward, and a later reasoning turn would act a little more broadly than intended. Nothing malicious, just confident extrapolation.
What helped wasn’t better prompts or heavier isolation. It was forcing each step to explicitly declare its inputs, tools, and outputs, then tearing all of that down before the next step ran. Once an agent couldn’t carry permissions or assumptions forward, whole classes of bugs disappeared. The system became less clever, but far more predictable.
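A rough sketch of the shape, with hypothetical names (StepSpec, run_gated, agent_turn) rather than GTWY's actual code: each step declares its inputs, tools, and outputs up front, the runner builds exactly that scope, and only the declared outputs survive into the next step.

```python
# Hypothetical sketch of step-gating; not GTWY's actual implementation.
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class StepSpec:
    name: str
    inputs: list[str]                     # state keys this step may read
    tools: dict[str, Callable[..., Any]]  # the only tools this step may call
    outputs: list[str]                    # state keys this step may write


def run_gated(
    steps: list[StepSpec],
    agent_turn: Callable[[dict[str, Any], dict[str, Callable[..., Any]]], dict[str, Any]],
    state: dict[str, Any],
) -> dict[str, Any]:
    """agent_turn(inputs, tools) runs one step (e.g. one model call) and
    returns a dict of proposed outputs."""
    for spec in steps:
        # Build a fresh, minimal scope: only the declared inputs and tools exist.
        scoped_inputs = {k: state[k] for k in spec.inputs if k in state}
        result = agent_turn(scoped_inputs, dict(spec.tools))

        # Only declared outputs survive; undeclared writes are dropped, so
        # permissions and assumptions can't accumulate across steps.
        state.update({k: v for k, v in result.items() if k in spec.outputs})
    return state
```

The point of the sketch is that the teardown is structural: the next step's agent_turn never even sees the previous step's tools or scratch state.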
If this matches what you’re seeing, GTWY has a demo session that walks through the step-gating model end to end. Watching the permissions appear and disappear per step cleared up a lot of things for me. Worth a look if you’re curious.