Stop Treating Agent Sandboxes as Cattle

Utpal Nadiger · May 4, 2026

This article is a direct response to “The agent harness belongs outside the sandbox” by Andrea Luzzardi. The premise here is that, well, you can (and in most cases should) run the agent harness inside the sandbox.

I'll follow with three specific rebuttals, each quoting the original post and explaining what I think is fundamentally flawed in the argument. And to the author: Mendral looks incredible, more power to you!

Now for the rebuttals!

Rebuttal 1 of 3

Andrea Luzzardi, Mendral

“Running the harness outside the sandbox gets you things the inside model can't. Your credentials stay out of the sandbox. The loop holds the LLM API keys, the user tokens, the database access. The sandbox holds only the environment the agent needs to do its work. There's nothing in there for the agent to escape to, so there's no permission model to enforce and no credential leak to contain.”

The agent harness belongs outside the sandbox →

We think this is a solved problem, and has been for years.

What you essentially want is for credentials to never be at rest in the sandbox. A network egress proxy gives you that (some are open source, like Fly's Tokenizer or mitmproxy):

The sandbox holds a handle (an opaque token, a placeholder, a short lived metadata service response). No real credential material.
Outbound traffic routes through the proxy. The proxy substitutes the real token at the boundary.
The upstream service sees a properly authenticated request. The sandbox NEVER sees the real secret.

Figure: Credentials. The sandbox never sees the real secret.
Sandbox: holds an opaque handle; placeholder or short lived token; no real credential material.
→ Egress proxy: intercepts outbound traffic; substitutes the real secret; applies scope and rate limits.
→ Upstream service: receives a properly authenticated request; never knows about the sandbox; platform owns the root credential.

This is what Fly's Tokenizer does. It's what AWS IMDS does for EC2 (and what execution roles do for Lambda) with short-lived role credentials. It's also the pattern HashiCorp Vault popularized more than a decade ago. It's already the default for human developers and CI systems, and it transfers cleanly to agent sandboxes.
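To make the substitution step concrete, here is a minimal sketch of what the proxy does at the boundary. It is not how Fly's Tokenizer or mitmproxy are implemented; the `handle:` prefix, the `Secret` type, and the `vault` mapping are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical marker for the opaque handle the sandbox holds.
PLACEHOLDER_PREFIX = "handle:"


@dataclass(frozen=True)
class Secret:
    value: str                # real credential, known only to the proxy
    allowed_hosts: frozenset  # upstream hosts this credential may be sent to


def substitute(headers: dict, host: str, vault: dict) -> dict:
    """Return a copy of headers with the handle swapped for the real secret."""
    auth = headers.get("Authorization", "")
    scheme, _, token = auth.partition(" ")
    if not token.startswith(PLACEHOLDER_PREFIX):
        return dict(headers)  # nothing to substitute, pass through unchanged
    secret = vault.get(token[len(PLACEHOLDER_PREFIX):])
    # Scope check: a handle is only honored for the hosts it was issued for.
    if secret is None or host not in secret.allowed_hosts:
        raise PermissionError(f"handle not valid for {host}")
    out = dict(headers)
    out["Authorization"] = f"{scheme} {secret.value}"
    return out
```

The sandbox only ever sends `Bearer handle:github`. The real value is attached on the way out, and a handle pointed at a host outside its scope is refused rather than forwarded.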

Rebuttal 2 of 3

Andrea Luzzardi, Mendral

“A lot of what an agent does doesn't need a sandbox at all: thinking, calling APIs, summarizing, waiting for CI. Some sessions never touch a sandbox. With the harness outside, you provision one only when the agent needs to run a command, and suspend it whenever it's idle. When the harness lives inside the sandbox you can't do any of this, because you can't suspend the thing the loop is running on.”

The agent harness belongs outside the sandbox →

Precisely right on the cost concern: idle compute shouldn't bill. But this isn't an argument for running the harness outside the sandbox. It's an argument about hibernation and elasticity, both of which are properties of the sandbox primitive and have nothing to do with where the harness lives.

“You can't suspend the thing the loop is running on” is only true if your sandbox can't hibernate the whole VM. Opencomputer can:

Hibernation drops idle sandboxes to disk. The entire VM (process tree, in-memory state, file handles, the loop itself) is frozen and resumable in ~25ms. While hibernated, you're billed for snapshot storage, not compute. So CI waits, LLM round-trips, and multi-minute “thinking” stretches all happen while the sandbox is essentially “off”.
Elasticity scales the live VM between 1GB/1vCPU and 16GB/4vCPU based on observed memory pressure, with a 1 minute cooldown on scale-up and 15 minutes of low-utilization data required to scale down. Idle agent reasoning runs at the bottom tier. A cargo build or npm install triggers a scale-up, and the VM drops back when the work is done. The harness lives inside throughout, and it just rides the resize!
For workloads that know their own shape better than the autoscaler can infer, there's an in-VM scaling API at 169.254.169.254 so the agent can scale itself up before a known heavy step and back down after. We think this is especially valuable in an era where agents are becoming more ambitious and have more autonomy.
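The elasticity rules above can be sketched as a small decision function. This is a rough model, not Opencomputer's autoscaler: only the 1 GB and 16 GB endpoints, the 1 minute cooldown, and the 15 minute window come from the text. The intermediate tiers and the `HIGH`/`LOW` pressure thresholds are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

TIERS_GB = [1, 2, 4, 8, 16]    # endpoints from the text; middle tiers are assumed
SCALE_UP_COOLDOWN_S = 60       # 1 minute cooldown on scale-up (from the text)
SCALE_DOWN_WINDOW_S = 15 * 60  # 15 minutes of low utilization (from the text)
HIGH, LOW = 0.85, 0.40         # assumed memory-pressure thresholds


@dataclass
class Autoscaler:
    tier: int = 0                      # index into TIERS_GB
    last_scale_up: float = float("-inf")
    low_since: Optional[float] = None  # when utilization first dropped to LOW

    def observe(self, now: float, used_gb: float) -> int:
        """Feed one memory sample; return the memory size (GB) to run at."""
        util = used_gb / TIERS_GB[self.tier]
        if util >= HIGH:
            self.low_since = None
            can_grow = self.tier < len(TIERS_GB) - 1
            if can_grow and now - self.last_scale_up >= SCALE_UP_COOLDOWN_S:
                self.tier += 1
                self.last_scale_up = now
        elif util <= LOW:
            if self.low_since is None:
                self.low_since = now
            elif now - self.low_since >= SCALE_DOWN_WINDOW_S and self.tier > 0:
                self.tier -= 1
                self.low_since = now  # need a fresh window before the next step down
        else:
            self.low_since = None
        return TIERS_GB[self.tier]
```

The asymmetry is the point: scaling up is quick because a build starved of memory is expensive, while scaling down waits for sustained quiet so a brief lull between build steps doesn't cause flapping.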

Figure: Lifecycle. A live sandbox can be paused, resized, and resumed in place.
Hibernate idle sandboxes to disk (~25ms resume)
↓
Elastic resize between 1 GB and 16 GB (autoscale)
↓
In-VM scaling API for agents that know their shape (169.254.169.254)

Figure: Resource profile. An agent session, hour by hour. Long idle stretches at 1 GB. Brief bursts to 16 GB for builds and installs.
Legend: 1 GB idle; 16 GB burst; hibernated, snapshot storage only.
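For the self-scaling case, where the agent asks the in-VM scaling API for more memory before a known heavy step, the shape of the harness code might look like the sketch below. The request format of that API isn't described here, so `set_size` is a hypothetical stand-in for whatever call it actually exposes.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def scaled_to(size_gb: int, baseline_gb: int,
              set_size: Callable[[int], None]) -> Iterator[None]:
    """Scale up for the duration of a heavy step, then return to baseline."""
    set_size(size_gb)
    try:
        yield
    finally:
        set_size(baseline_gb)  # runs even if the heavy step raises


# Usage with a recording stub in place of the real scaling call.
calls = []
with scaled_to(16, 1, calls.append):
    calls.append("cargo build")
print(calls)  # [16, 'cargo build', 1]
```

Wrapping the step in a context manager means the scale-down still happens when the build fails, so a crashed step doesn't leave the sandbox parked at the top tier.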

Rebuttal 3 of 3

Andrea Luzzardi, Mendral

“Sandboxes become cattle. If one dies mid-session, the loop provisions a new one and keeps going. When the harness runs inside, the sandbox is the session, and losing it loses the session.”

The agent harness belongs outside the sandbox →

This is also a real concern. No one wants to lose a multi-hour session because a host went down.

But this ALSO isn't an argument about where the harness runs. It's an argument about whether your sandbox primitive has “durability” built into it.

“Cattle vs pets” offers two options and asks you to pick one. There's a third, and we think of it as git branches for VMs. With Opencomputer.dev:

Figure: Three options. Pets, cattle, and git branches for VMs.
Pets (hand fed, irreplaceable): single host failure loses the session; no story for planned restarts; ops cost grows linearly with sessions.
Cattle (ephemeral, restart anywhere): harness must live outside to survive; session state lives in the orchestrator; every recovery is a cold start.
Git branches for VMs (the third option): hibernate to survive planned restarts; checkpoint to recover from hard failures; fork to explore alternatives in parallel.

Hibernation freezes the entire VM (process tree, in-memory state, file handles, the loop itself) and resumes it in ~25ms. Rolling deploys, scale events, and other planned restarts are all survivable. The loop doesn't notice anything happened.
For unexpected failures, checkpoints snapshot the filesystem and installed state at any point in the session, and you can have up to 10 of them per sandbox. If a sandbox dies hard (host failure, kernel panic), you fork a fresh sandbox from the most recent checkpoint and resume. The harness re-reads its on-disk state (conversation history, planning state, todo list), the same way Claude Code does after you close your laptop and open it back up.
Forks aren't just for recovery, either. You can branch from any checkpoint to explore alternative paths in parallel: three migration strategies, two debugging hypotheses, two different refactors, without paying to bootstrap each one from scratch. The original keeps running.
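Recovery from a checkpoint only works if the harness keeps its session state on disk in the first place. Here is a minimal sketch of that habit, assuming a single JSON file; the `history` and `todo` layout and the function names are made up for illustration and are not how Claude Code or Opencomputer store state.

```python
import json
import os
import tempfile
from pathlib import Path


def save_state(path: Path, state: dict) -> None:
    """Write state atomically: readers see the old file or the new one, never half."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic rename on the same filesystem


def load_state(path: Path) -> dict:
    """Return the last saved state, or a fresh one on first boot."""
    if not path.exists():
        return {"history": [], "todo": []}
    return json.loads(path.read_text())
```

A harness that calls `save_state` after every turn and `load_state` on startup loses, at worst, the work done since the last checkpoint when it is forked onto a fresh sandbox.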

All this to say that losing a sandbox isn't losing the session.

It's restoring from a snapshot, the same recovery primitive every production database has used for the last forty years!

The original article spends a whole section on durable execution: agent loops are long running and have to survive deploys, and Mendral solved it with Inngest, checkpointing each turn as a step. That's durable execution infrastructure they had to build because the loop lives on the backend. With the agent running inside a computer with checkpoints, the unit of durability is the sandbox (i.e. the entire compute environment) rather than a function call. Inngest is a great tool, but the problem it's solving here doesn't exist if the sandbox is the host.

Andrea's article isn't really ‘the harness belongs outside the sandbox.’ It's ‘the harness belongs outside an ephemeral sandbox.’ The thesis is sort of tautological once you state the assumption. Persistent sandboxes (à la computers) don't have these problems. Session durability is also the axis we used to compare seven sandbox platforms after CodeSandbox shut down its CI and Repos products; most of the field still assumes the cattle model.

Figure: Reframe. What changes when the sandbox is the host.
Ephemeral sandbox: harness must live outside; credentials live in the orchestrator; idle billing handled by not provisioning; durability handled by the backend loop.
Persistent sandbox: harness lives inside; credentials live in the egress proxy; idle billing handled by hibernation; durability handled by checkpoints and forks.

Between right and easy, users will always default to easy. And we are on a mission to make building agents as easy and delightful as possible!
