Stop Treating Agent Sandboxes as Cattle
Utpal Nadiger · May 4, 2026
This article is a direct response to “The agent harness belongs outside the sandbox” by Andrea Luzzardi. The premise here is that, well, you can (and in most cases should) run the agent harness inside the sandbox.
I'll follow it up with three specific rebuttals, each quoting the original post and explaining what I think is fundamentally flawed in the argument. Lastly, to the author: Mendral looks incredible, more power to you!
Now for the rebuttals!
Rebuttal 1 of 3
Andrea Luzzardi, Mendral
“Running the harness outside the sandbox gets you things the inside model can't. Your credentials stay out of the sandbox. The loop holds the LLM API keys, the user tokens, the database access. The sandbox holds only the environment the agent needs to do its work. There's nothing in there for the agent to escape to, so there's no permission model to enforce and no credential leak to contain.”
We think this is a solved problem, and has been for years.
What you essentially want is credentials that are never at rest in the sandbox. A network egress proxy gives you that (there are open source ones, like Fly's Tokenizer or mitmproxy):
- The sandbox holds a handle (an opaque token, a placeholder, a short lived metadata service response). No real credential material.
- Outbound traffic routes through the proxy. The proxy substitutes the real token at the boundary.
- The upstream service sees a properly authenticated request. The sandbox NEVER sees the real secret.
Figure: Credentials. The sandbox never sees the real secret. The sandbox holds an opaque handle (a placeholder or short lived token, no real credential material). The egress proxy intercepts outbound traffic, substitutes the real secret, and applies scope and rate limits. The upstream service receives a properly authenticated request and never knows about the sandbox. The platform owns the root credential.
This is what Fly's Tokenizer does. It's similar to what AWS IMDS does for EC2 with short lived role credentials. It's also the pattern HashiCorp Vault popularized over a decade ago. It's the default for human developers and CI systems already, and it transfers cleanly to agent sandboxes.
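To make the substitution step concrete, here is a minimal sketch of what the proxy does at the boundary. It is not Fly's Tokenizer API or any real proxy's interface: the `Secret` type, the `inject_credentials` function, the handle format, and the per-host scoping rule are all hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Secret:
    token: str         # real credential, held only by the proxy (hypothetical shape)
    allowed_host: str  # assumed scoping rule: one upstream host per credential


def inject_credentials(host: str, headers: dict, vault: dict) -> dict:
    """Return outbound headers with the opaque handle swapped for the real token.

    `vault` maps handles (what the sandbox holds) to Secret objects
    (what only the proxy holds). Header names are matched exactly here;
    a real proxy would match them case-insensitively.
    """
    scheme, _, handle = headers.get("Authorization", "").partition(" ")
    secret = vault.get(handle)
    if scheme != "Bearer" or secret is None:
        return dict(headers)  # no known handle: forward the request unchanged
    if host != secret.allowed_host:
        # The handle is only valid for the host it was scoped to.
        raise PermissionError(f"handle {handle!r} is not scoped to {host}")
    out = dict(headers)
    out["Authorization"] = f"Bearer {secret.token}"
    return out
```

The sandbox only ever sends `Bearer handle-github`; the real token exists in the proxy's memory and on the wire to the upstream service. Exfiltrating the handle gets an attacker nothing outside the proxy, and the host check means it can't be replayed against an arbitrary destination.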
Rebuttal 2 of 3
Andrea Luzzardi, Mendral
“A lot of what an agent does doesn't need a sandbox at all: thinking, calling APIs, summarizing, waiting for CI. Some sessions never touch a sandbox. With the harness outside, you provision one only when the agent needs to run a command, and suspend it whenever it's idle. When the harness lives inside the sandbox you can't do any of this, because you can't suspend the thing the loop is running on.”
Precisely right on the cost concern: idle compute shouldn't bill. But this isn't an argument for running the harness outside the sandbox. It's an argument about hibernation and elasticity, both of which are properties of the sandbox primitive and have nothing to do with where the harness runs.
“You can't suspend the thing the loop is running on” is only true if your sandbox can't hibernate the whole VM. Opencomputer can:
- Hibernation drops idle sandboxes to disk. The entire VM, i.e. the process tree, in-memory state, file handles, and the loop itself, is frozen and resumable in ~25ms. While hibernated, you're billed for snapshot storage, not compute. So CI waits, LLM round-trips, and multi-minute “thinking” stretches all happen while the sandbox is essentially “off”.
- Elasticity scales the live VM between 1GB/1vCPU and 16GB/4vCPU based on observed memory pressure, with a 1 min cooldown on scale up and 15 min of low utilization data required to scale down. Idle agent reasoning runs at the bottom tier. A cargo build or npm install triggers a scale-up; it drops back when the work is done. The harness lives inside throughout, and it just rides the resize!
- For workloads that know their own shape better than the autoscaler can infer, there's an in-VM scaling API at 169.254.169.254 so the agent can scale itself up before a known heavy step and back down after. We think this is especially valuable in an era where agents are becoming more ambitious and have more autonomy.
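The elasticity rule above can be sketched as a small state machine. This is an illustration of the stated policy, not Opencomputer's actual autoscaler: the 1 minute scale-up cooldown and the 15 minute low-utilization window come from the text, while the intermediate tiers, the utilization thresholds, and the one-tier-per-step behavior are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

TIERS_GB = [1, 2, 4, 8, 16]    # endpoints from the text; intermediate tiers assumed
SCALE_UP_COOLDOWN_S = 60       # from the text: 1 min cooldown on scale up
SCALE_DOWN_WINDOW_S = 15 * 60  # from the text: 15 min of low utilization data
HIGH_WATER = 0.85              # assumed memory-pressure threshold
LOW_WATER = 0.30               # assumed low-utilization threshold


@dataclass
class Autoscaler:
    tier: int = 0                         # index into TIERS_GB
    last_scale_up: float = float("-inf")  # timestamp of the last scale up
    low_since: Optional[float] = None     # start of the current low streak

    def observe(self, now: float, used_gb: float) -> int:
        """Feed one memory sample; return the memory size in GB after any resize."""
        utilization = used_gb / TIERS_GB[self.tier]
        if utilization >= HIGH_WATER:
            self.low_since = None
            can_grow = self.tier < len(TIERS_GB) - 1
            cooled_down = now - self.last_scale_up >= SCALE_UP_COOLDOWN_S
            if can_grow and cooled_down:
                self.tier += 1
                self.last_scale_up = now
        elif utilization <= LOW_WATER:
            if self.low_since is None:
                self.low_since = now
            if self.tier > 0 and now - self.low_since >= SCALE_DOWN_WINDOW_S:
                self.tier -= 1
                self.low_since = now  # need a fresh window before the next step down
        else:
            self.low_since = None     # mid-range sample breaks the low streak
        return TIERS_GB[self.tier]
```

The asymmetry is the point: scaling up is fast because a build that runs out of memory fails, while scaling down is slow because a briefly quiet process is not necessarily done. Nothing in this logic cares whether the process consuming memory is a compiler or the agent loop itself.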
Figure: Lifecycle. A live sandbox can be paused, resized, and resumed in place. Panels: Hibernate idle sandboxes to disk; Elastic resize between 1 GB and 16 GB; In-VM scaling API for agents that know their shape.
Figure: Resource profile. An agent session, hour by hour. Long idle stretches at 1 GB. Brief bursts to 16 GB for builds and installs.
Rebuttal 3 of 3
Andrea Luzzardi, Mendral
“Sandboxes become cattle. If one dies mid-session, the loop provisions a new one and keeps going. When the harness runs inside, the sandbox is the session, and losing it loses the session.”
This is also a real concern. No one wants to lose a multi-hour session because a host went down.
But this ALSO isn't an argument about where the harness runs. It's an argument about whether your sandbox primitive has “durability” built into it.
“Cattle vs pets” offers two options and asks you to pick one. There's a third, and we think of it as git branches for VMs. With Opencomputer.dev:
Figure: Three options. Pets (hand fed, irreplaceable), cattle (ephemeral, restart anywhere), and git branches for VMs (the third option).
- Hibernation freezes the entire VM (process tree, in-memory state, file handles, the loop itself) and resumes it in ~25ms. Rolling deploys, scale events, planned restarts, etc. are all survivable. The loop barely notices anything happened.
- For unexpected failures, Checkpoints snapshot the filesystem and installed state at any point in the session, and you can have up to 10 of them per sandbox. If a sandbox dies hard (host failure, kernel panic), you fork a fresh sandbox from the most recent checkpoint and resume. The harness re-reads on-disk state, i.e. conversation history, planning state, and todo list, the same way Claude Code does after you close your laptop and open it back up.
- Forks aren't just for recovery, either. You can branch from any checkpoint to explore alternative paths in parallel: three migration strategies, two debugging hypotheses, two different refactors, without paying to bootstrap each one from scratch. The original keeps running.
All this to say that losing a sandbox isn't losing the session.
It's restoring from a snapshot, the same coordination primitive every production database has used for the last forty years!
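For recovery from a checkpoint to work, the harness has to keep its session state on disk rather than only in memory. Here is a minimal sketch of that habit. The file layout, the state schema, and the function names are hypothetical; this is not how Opencomputer or Claude Code actually store sessions.

```python
import json
import os
import tempfile
from pathlib import Path


def save_state(path: Path, state: dict) -> None:
    """Write state via a temp file and rename, so a filesystem snapshot
    sees either the old file or the new one, never a half-written one."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before the rename
    os.replace(tmp, path)     # atomic replacement of the previous state file


def load_state(path: Path) -> dict:
    """Return the last saved state, or a fresh session if none exists."""
    if not path.exists():
        return {"history": [], "todo": [], "turn": 0}
    return json.loads(path.read_text())


def run_turn(path: Path, message: str) -> dict:
    """One loop iteration: reload from disk, apply the turn, persist it."""
    state = load_state(path)
    state["history"].append(message)
    state["turn"] += 1
    save_state(path, state)
    return state
```

A harness written this way resumes the same way whether it was hibernated (nothing to do, memory is intact) or forked from a checkpoint after a hard failure (`load_state` picks up from the last persisted turn). At worst, the work since the last checkpoint is repeated.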
The original article spends a whole section on durable execution: agent loops are long running and have to survive deploys, and Mendral solved it with Inngest, checkpointing each turn as a step. That's durable execution infrastructure they had to build because the loop lives on the backend. With the agent running inside a computer with checkpoints, the unit of durability is the sandbox, i.e. the entire compute environment, rather than a function call. Inngest is a great tool, but the problem it's solving here doesn't exist if the sandbox is the host.
Andrea's article isn't really ‘the harness belongs outside the sandbox.’ It's ‘the harness belongs outside an ephemeral sandbox.’ The thesis is sort of tautological once you state the assumption. Persistent sandboxes (à la computers) don't have these problems.
Figure: Reframe. What changes when the sandbox is the host: with an ephemeral sandbox, the harness must live outside; with a persistent sandbox, the harness lives inside.
Between right and easy, users will always default to easy. And we are on a mission to make building agents as easy and delightful as possible!