
    Stop Treating Agent Sandboxes as Cattle

    Utpal Nadiger · May 4, 2026

    This article is a direct response to “The agent harness belongs outside the sandbox” by Andrea Luzzardi. The premise here is that you can (and in most cases should) run the agent harness inside the sandbox.

    I'll follow up with three specific rebuttals, each quoting the original post and explaining what I think is fundamentally flawed in the argument. Lastly, to the author: Mendral looks incredible, more power to you!

    Now for the rebuttals!

    Rebuttal 1 of 3

    Andrea Luzzardi, Mendral

    “Running the harness outside the sandbox gets you things the inside model can't. Your credentials stay out of the sandbox. The loop holds the LLM API keys, the user tokens, the database access. The sandbox holds only the environment the agent needs to do its work. There's nothing in there for the agent to escape to, so there's no permission model to enforce and no credential leak to contain.”
    The agent harness belongs outside the sandbox →

    We think this is a solved problem, and has been for years.

    What you essentially want is for credentials to never be at rest in the sandbox. A network egress proxy gives you that (open source options include Fly's Tokenizer and mitmproxy):

    • The sandbox holds a handle (an opaque token, a placeholder, a short-lived metadata service response). No real credential material.
    • Outbound traffic routes through the proxy. The proxy substitutes the real token at the boundary.
    • The upstream service sees a properly authenticated request. The sandbox NEVER sees the real secret.
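
    The three bullets above can be sketched as a toy substitution step. This is a minimal illustration, not how Fly's Tokenizer or mitmproxy are implemented: `EgressProxy`, `SecretBinding`, the handle format, and the host names are all hypothetical, and a real proxy would also handle TLS and request forwarding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecretBinding:
    """Maps an opaque handle to a real secret and the one host allowed to receive it."""
    real_secret: str
    allowed_host: str


class EgressProxy:
    """Toy model of the substitution step only (hypothetical, for illustration)."""

    def __init__(self, bindings: dict[str, SecretBinding]) -> None:
        # The bindings live in the proxy process, never inside the sandbox.
        self._bindings = bindings

    def rewrite(self, host: str, headers: dict[str, str]) -> dict[str, str]:
        """Return the headers to forward upstream, with the handle swapped for the secret."""
        scheme, _, handle = headers.get("Authorization", "").partition(" ")
        binding = self._bindings.get(handle)
        if scheme != "Bearer" or binding is None:
            return dict(headers)  # nothing to substitute; forward unchanged
        if host != binding.allowed_host:
            # Scope check: a handle is only honored for its bound destination.
            raise PermissionError(f"handle not valid for host {host!r}")
        out = dict(headers)
        out["Authorization"] = f"Bearer {binding.real_secret}"
        return out


proxy = EgressProxy({"handle-abc123": SecretBinding("sk-real-secret", "api.example.com")})
sandbox_headers = {"Authorization": "Bearer handle-abc123"}  # all the sandbox ever holds
upstream_headers = proxy.rewrite("api.example.com", sandbox_headers)
```

    The host check is the part that matters most: even if the agent exfiltrates the handle, it is useless anywhere except through the proxy and toward the bound destination.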

    Credentials: the sandbox never sees the real secret

    • Sandbox: holds an opaque handle (a placeholder or short-lived token); no real credential material.
    • Egress proxy: intercepts outbound traffic; substitutes the real secret; applies scope and rate limits.
    • Upstream service: receives a properly authenticated request; never knows about the sandbox; the platform owns the root credential.

    This is what Fly's Tokenizer does. It's what AWS IMDS does for EC2 with short-lived role credentials. It's also the pattern HashiCorp Vault popularized about a decade ago. It's the default for human developers and CI systems already, and it transfers cleanly to agent sandboxes.

    Rebuttal 2 of 3

    Andrea Luzzardi, Mendral

    “A lot of what an agent does doesn't need a sandbox at all: thinking, calling APIs, summarizing, waiting for CI. Some sessions never touch a sandbox. With the harness outside, you provision one only when the agent needs to run a command, and suspend it whenever it's idle. When the harness lives inside the sandbox you can't do any of this, because you can't suspend the thing the loop is running on.”
    The agent harness belongs outside the sandbox →

    Precisely right on the cost concern: idle compute shouldn't bill. But this isn't an argument for running the harness outside the sandbox. It's an argument about hibernation and elasticity, both of which are properties of the sandbox primitive and have nothing to do with where the harness runs.

    “You can't suspend the thing the loop is running on” is only true if your sandbox can't hibernate the whole VM. Opencomputer can:

    • Hibernation drops idle sandboxes to disk. The entire VM (process tree, in-memory state, file handles, the loop itself) is frozen and resumable in ~25ms. While hibernated, you're billed for snapshot storage, not compute. So CI waits, LLM round trips, and multi-minute “thinking” stretches all happen while the sandbox is essentially “off”.
    • Elasticity scales the live VM between 1 GB / 1 vCPU and 16 GB / 4 vCPU based on observed memory pressure, with a 1 min cooldown on scale up and 15 min of low utilization data required to scale down. Idle agent reasoning runs at the bottom tier. A cargo build or npm install triggers a scale-up; it drops back when the work is done. The harness lives inside throughout, and it just rides the resize!
    • For workloads that know their own shape better than the autoscaler can infer, there's an in-VM scaling API at 169.254.169.254 so the agent can scale itself up before a known heavy step and back down after. We think this is especially valuable in an era where agents are becoming more ambitious and have more autonomy.
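
    The elasticity rules in the second bullet read naturally as a small state machine. Here is a minimal sketch, assuming a few things the text doesn't specify: the intermediate tiers, the high and low pressure thresholds, and one-tier-at-a-time steps are all guesses for illustration, and `Autoscaler` is a hypothetical name rather than anything Opencomputer ships.

```python
from dataclasses import dataclass
from typing import Optional

TIERS_GB = [1, 2, 4, 8, 16]      # assumed tiers; the text only gives the 1 GB and 16 GB ends
SCALE_UP_COOLDOWN_S = 60         # from the text: 1 min cooldown on scale up
SCALE_DOWN_WINDOW_S = 15 * 60    # from the text: 15 min of low utilization data
HIGH_PRESSURE = 0.85             # assumed threshold
LOW_PRESSURE = 0.30              # assumed threshold


@dataclass
class Autoscaler:
    tier: int = 0
    last_scale_up: float = float("-inf")
    low_since: Optional[float] = None

    def observe(self, now: float, mem_used_fraction: float) -> int:
        """Feed one memory sample; return the VM size in GB after the decision."""
        if mem_used_fraction >= HIGH_PRESSURE:
            self.low_since = None
            can_grow = self.tier < len(TIERS_GB) - 1
            if can_grow and now - self.last_scale_up >= SCALE_UP_COOLDOWN_S:
                self.tier += 1
                self.last_scale_up = now
        elif mem_used_fraction <= LOW_PRESSURE:
            if self.low_since is None:
                self.low_since = now
            elif now - self.low_since >= SCALE_DOWN_WINDOW_S and self.tier > 0:
                self.tier -= 1
                self.low_since = now  # require a fresh window before the next step down
        else:
            self.low_since = None  # utilization is neither high nor low; reset the window
        return TIERS_GB[self.tier]


scaler = Autoscaler()
burst = [scaler.observe(t, 0.95) for t in (0, 30, 60)]      # build starts, pressure stays high
idle = [scaler.observe(t, 0.10) for t in (100, 999, 1000)]  # build done, pressure stays low
```

    The asymmetry is the point: scale up reacts within a minute, scale down waits for sustained evidence, so a bursty build doesn't thrash between sizes.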

    Lifecycle: a live sandbox can be paused, resized, and resumed in place

    1. Hibernate idle sandboxes to disk (~25ms resume)
    2. Elastic resize between 1 GB and 16 GB (autoscale)
    3. In-VM scaling API for agents that know their shape (169.254.169.254)

    Resource profile: an agent session, hour by hour

    Long idle stretches at 1 GB, brief bursts to 16 GB for builds and installs, and hibernated periods billed as snapshot storage only.
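
    To make the billing shape concrete, here is a back-of-the-envelope sketch. The prices are made up purely for illustration (they are not Opencomputer's pricing), the snapshot is assumed to be billed by its size in GB, and `session_cost` is a hypothetical helper.

```python
# Hypothetical prices in micro-dollars per GB-minute; not real pricing.
COMPUTE_PRICE = 400
SNAPSHOT_PRICE = 2


def session_cost(segments: list[tuple[str, int, int]]) -> int:
    """Sum the cost of (state, size_gb, minutes) segments, in micro-dollars."""
    total = 0
    for state, size_gb, minutes in segments:
        if state == "live":
            total += COMPUTE_PRICE * size_gb * minutes
        elif state == "hibernated":
            total += SNAPSHOT_PRICE * size_gb * minutes
        else:
            raise ValueError(f"unknown state: {state!r}")
    return total


# One hour of an agent session: mostly waiting, some light work, one short build.
hour = [
    ("hibernated", 1, 40),  # CI waits and LLM round trips
    ("live", 1, 15),        # idle reasoning at the bottom tier
    ("live", 16, 5),        # cargo build or npm install burst
]
elastic = session_cost(hour)
always_on = session_cost([("live", 16, 60)])  # same hour, provisioned for the peak
```

    Under these assumed prices the hibernating, elastic hour costs roughly a tenth of the always-on one, and almost all of what remains is the five-minute burst.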

    Rebuttal 3 of 3

    Andrea Luzzardi, Mendral

    “Sandboxes become cattle. If one dies mid-session, the loop provisions a new one and keeps going. When the harness runs inside, the sandbox is the session, and losing it loses the session.”
    The agent harness belongs outside the sandbox →

    This is also a real concern. No one wants to lose a multi-hour session because a host went down.

    But this ALSO isn't an argument about where the harness runs. It's an argument about whether your sandbox primitive has “durability” built into it.

    “Cattle vs pets” offers two options and asks you to pick one. There's a third, and we think of it as git branches for VMs. With Opencomputer.dev:

    Three options: pets, cattle, and git branches for VMs

    • Pets (hand fed, irreplaceable): a single host failure loses the session; no story for planned restarts; ops cost grows linearly with sessions.
    • Cattle (ephemeral, restart anywhere): the harness must live outside to survive; session state lives in the orchestrator; every recovery is a cold start.
    • Git branches for VMs (the third option): hibernate to survive planned restarts; checkpoint to recover from hard failures; fork to explore alternatives in parallel.

    • Hibernation freezes the entire VM (process tree, in-memory state, file handles, the loop itself) and resumes it in ~25ms. Rolling deploys, scale events, and other planned restarts are all survivable. The loop doesn't notice anything happened.
    • For unexpected failures, checkpoints snapshot the filesystem and installed state at any point in the session, and you can have up to 10 of them per sandbox. If a sandbox dies hard (host failure, kernel panic), you fork a fresh sandbox from the most recent checkpoint and resume. The harness re-reads its on-disk state (conversation history, planning state, todo list) the same way Claude Code does after you close your laptop and open it back up.
    • Forks aren't just for recovery, either. You can branch from any checkpoint to explore alternative paths in parallel: three migration strategies, two debugging hypotheses, two different refactors, without paying to bootstrap each one from scratch. The original keeps running.
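
    The checkpoint and fork semantics described above can be modeled in a few lines. This is an in-memory toy, not the Opencomputer API: the `Sandbox` class and its methods are hypothetical, the filesystem is a plain dict, and refusing an eleventh checkpoint is an assumption (the text only says up to 10 are allowed).

```python
import copy
from dataclasses import dataclass, field

MAX_CHECKPOINTS = 10  # from the text: up to 10 checkpoints per sandbox


@dataclass
class Sandbox:
    name: str
    files: dict[str, str] = field(default_factory=dict)
    checkpoints: list[dict[str, str]] = field(default_factory=list)

    def checkpoint(self) -> int:
        """Snapshot the current filesystem state and return the checkpoint id."""
        if len(self.checkpoints) >= MAX_CHECKPOINTS:
            raise RuntimeError("checkpoint limit reached")  # assumed behavior at the limit
        self.checkpoints.append(copy.deepcopy(self.files))
        return len(self.checkpoints) - 1

    def fork(self, name: str, checkpoint_id: int = -1) -> "Sandbox":
        """Create a new sandbox from a checkpoint (the most recent one by default)."""
        return Sandbox(name=name, files=copy.deepcopy(self.checkpoints[checkpoint_id]))


main = Sandbox("main")
main.files["todo.md"] = "1. migrate schema"
first = main.checkpoint()
main.files["todo.md"] = "1. migrate schema (done)"  # work since the last checkpoint

# Recovery: the host dies, so fork a fresh sandbox from the most recent checkpoint.
recovered = main.fork("recovered")

# Exploration: branch from the same checkpoint to try an alternative in parallel.
alt = main.fork("alt-strategy", first)
alt.files["notes.md"] = "try an online migration instead"
```

    Note what recovery does and doesn't give you: the fork starts from the checkpointed state, so anything written after the last checkpoint is gone. How often to checkpoint is the knob that bounds that loss.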

    All this to say that losing a sandbox isn't losing the session.

    It's restoring from a snapshot, the same coordination primitive every production database has used for the last forty years!
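
    The “harness re-reads its on-disk state” half of that recovery story is ordinary file handling. A minimal sketch, assuming the harness keeps its session in a single JSON file; the file name, the state layout, and the helper functions are all hypothetical, and the model call is replaced by a stand-in string.

```python
import json
import tempfile
from pathlib import Path


def load_state(path: Path) -> dict:
    """Load the saved session, or start a fresh one if no file exists yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {"turn": 0, "history": []}


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file, then rename, so a crash never leaves a half-written file."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(path)


def run_turns(path: Path, turns: int) -> dict:
    """Run a few loop iterations, persisting the session after each one."""
    state = load_state(path)
    for _ in range(turns):
        state["turn"] += 1
        state["history"].append(f"turn {state['turn']} done")  # stand-in for a model call
        save_state(path, state)
    return state


state_file = Path(tempfile.mkdtemp()) / "session.json"
run_turns(state_file, 2)            # the first sandbox runs two turns, then dies
resumed = run_turns(state_file, 1)  # a fork of the checkpoint picks up at turn 3
```

    Nothing here knows about sandboxes at all. The harness just reads and writes local files, and the checkpoint is what carries those files across a failure.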

    The original article spends a whole section on durable execution: agent loops are long running and have to survive deploys, and Mendral solved it with Inngest, checkpointing each turn as a step. That's durable execution infrastructure they had to build because the loop lives on the backend. With the agent running inside a computer with checkpoints, the unit of durability is the sandbox itself: the entire compute environment, not an individual function call. Inngest is a great tool, but the problem it's solving here doesn't exist if the sandbox is the host.

    Andrea's article isn't really ‘the harness belongs outside the sandbox.’ It's ‘the harness belongs outside an ephemeral sandbox.’ The thesis is sort of tautological once you state the assumption. Persistent sandboxes (à la computers) don't have these problems.

    Reframe: what changes when the sandbox is the host

    • Ephemeral sandbox (harness must live outside): credentials live in the orchestrator; idle billing is handled by not provisioning; durability is handled by the backend loop.
    • Persistent sandbox (harness lives inside): credentials live in the egress proxy; idle billing is handled by hibernation; durability is handled by checkpoints and forks.

    Between right and easy, users will always default to easy. And we are on a mission to make building agents as easy and delightful as possible!

    References