Where Should the Agent(s) Live?

Utpal Nadiger, Mohamed Habib, Igor Zalutski · March 20, 2026

In The Agentic Workload, we explained why software agent workloads are fundamentally different from traditional applications. Agents introduce a new set of deployment needs and constraints, and this post explores those constraints and what they mean for designing agentic systems.

The ideas here apply broadly, but they matter most for platforms that give end users direct access to generative AI capabilities, like Lovable or Bolt, where untrusted input can reach the agent. Internal background agents often operate under a more trusted user model, but many of the same isolation and placement tradeoffs still come up.

We will explore a few core questions:

What should the isolation model look like?
Should the agent be separated from the tool-call environment?
What state should persist inside the compute environment?
How should the platform trade off speed, safety, and cost?

Security

The broad access that makes agents so powerful also makes them dangerous. For many applications, agents need filesystem access, process control, network access, and the ability to generate and execute arbitrary code.

Isolation usually happens in two layers: an OS sandbox around the agent process itself, and a stronger execution boundary around the whole environment it operates in. Anthropic's sandboxing guidance and OpenAI's Codex docs are useful references for the first layer; Luis Cardoso, Pierce Freeman, and our Sandbox Fingerprinting writeup are good references for the second.

Security model: isolation works in layers

Layer: OS sandbox

Constrains the agent process itself with filesystem, process, and network boundaries.

Example: Claude Code policy

{
  "sandbox": {
    "enabled": true,
    "filesystem": {
      "allowWrite": ["/tmp/build"],
      "denyRead": ["~/.aws/credentials"]
    },
    "network": {
      "allowedDomains": ["github.com", "*.npmjs.org"]
    }
  }
}

This example uses Claude Code's sandbox settings, but other agent tools expose similar OS-level controls. Here, the policy allows writes to `/tmp/build`, blocks reads from local AWS credentials, and limits outbound access to a small domain allowlist.
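To make the three rules concrete, here is a minimal sketch of how a policy like the one above could be evaluated. It is an illustration only, not Claude Code's enforcement logic: real OS sandboxes enforce these rules in the kernel and also have to handle symlinks, which this sketch ignores. The `HOME` value and the function names `can_write`, `can_read`, and `can_connect` are assumptions made for the example.

```python
import posixpath
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Mirrors the example policy above. The evaluation logic below is a
# hypothetical illustration, not how any real agent tool enforces it.
POLICY = {
    "filesystem": {
        "allowWrite": ["/tmp/build"],
        "denyRead": ["~/.aws/credentials"],
    },
    "network": {"allowedDomains": ["github.com", "*.npmjs.org"]},
}

HOME = "/home/agent"  # assumed home directory used to expand "~"


def _normalize(path: str) -> PurePosixPath:
    """Expand a leading "~" and collapse ".." segments (symlinks are not resolved)."""
    if path == "~" or path.startswith("~/"):
        path = HOME + path[1:]
    return PurePosixPath(posixpath.normpath(path))


def _is_within(path: PurePosixPath, root: PurePosixPath) -> bool:
    """True if path equals root or sits somewhere beneath it."""
    return path == root or root in path.parents


def can_write(path: str) -> bool:
    """Writes are denied unless the target is under an allowWrite root."""
    target = _normalize(path)
    roots = POLICY["filesystem"]["allowWrite"]
    return any(_is_within(target, _normalize(root)) for root in roots)


def can_read(path: str) -> bool:
    """Reads are allowed unless the target is under a denyRead entry."""
    target = _normalize(path)
    denied = POLICY["filesystem"]["denyRead"]
    return not any(_is_within(target, _normalize(entry)) for entry in denied)


def can_connect(host: str) -> bool:
    """Outbound connections are denied unless the host matches the allowlist."""
    patterns = POLICY["network"]["allowedDomains"]
    return any(fnmatchcase(host.lower(), pattern) for pattern in patterns)
```

Note the asymmetry: writes and network access are deny-by-default with an allowlist, while reads are allow-by-default with a denylist. A path such as `/tmp/build/../../etc/passwd` is normalized before the check, so it does not slip through the write allowlist.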

Layer: Execution environment

Container or VM isolation defines the boundary around the whole computer the agent operates in.

Kernel
Container profile: Shared with the host.
VM profile: Separate guest kernel and a stronger isolation boundary.

Filesystem
Container profile: Mounted workspace plus a small writable scratch area.
VM profile: Full guest filesystem that behaves more like a real machine.

Network
Container profile: Optional and policy-controlled, often narrowed or disabled entirely.
VM profile: Optional and policy-controlled at the machine boundary.

Tooling
Container profile: Good for packaged environments, but some developer setups need extra work.
VM profile: Usually closer to "just works" for broad developer tooling.

Nested containers
Container profile: Possible, but often awkward without extra privileges or indirection.
VM profile: Typically more natural for Docker-style inner workflows.


Use both layers together: OS sandboxing limits what the agent process can do, while the execution boundary provides another layer of defense and contains the full machine-level blast radius.

Aside: Credentials

Even with strong isolation, agents still need a safe way to access external services. At a minimum, the agent needs to authenticate to the model provider, so it is important to design the system to minimize blast radius if that environment is compromised. Prompt injection, policy bypass attempts, and the simple fact that the agent can execute arbitrary code all push in the same direction: assume the environment may eventually be coerced into trying to exfiltrate whatever credentials it can reach.

A common answer is to give the sandbox only a short-lived session token and route privileged operations through a proxy or control plane. Browser Use pushes this idea further with what they call a "zero-secret sandbox" approach: the agent inside the sandbox holds no credentials at all, and every privileged operation, including model inference, is routed through an external control plane that owns the secrets on the agent's behalf.

Credentials: token design goals

Short-lived: If leaked, the token expires quickly.
Session-bound: A stolen token only works for one environment.
Scoped: Usage stays attributable to the right user or org.
Revocable: The control plane can kill a token immediately if a session looks compromised.

Proxy flow

Sandbox agent: gets per-session token; cannot see provider secret; token bound to user or org.
↓
Auth proxy: validates short TTL; checks session scope; applies attribution + limits.
↓
Model provider: receives proxied request; uses platform-owned credential; never exposes root key to agent.
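The token goals above can be sketched with an HMAC-signed session token that the auth proxy both issues and checks. This is a minimal illustration under stated assumptions, not the design of any platform mentioned here: the signing key, the claim names (`sid`, `org`, `exp`), and the functions `mint_token` and `validate_token` are all hypothetical, and a production system would more likely use an established token format and a shared revocation store.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical proxy-side signing key. It lives in the control plane and
# never enters the sandbox; the sandbox only ever sees minted tokens.
PROXY_SIGNING_KEY = b"replace-with-a-random-secret"

# Sessions the control plane has killed (the "revocable" goal).
REVOKED_SESSIONS = set()


def _sign(body: bytes) -> str:
    return hmac.new(PROXY_SIGNING_KEY, body, hashlib.sha256).hexdigest()


def mint_token(session_id: str, org_id: str, ttl_seconds: int = 300,
               now: Optional[float] = None) -> str:
    """Issue a short-lived token bound to one session and attributed to one org."""
    issued = time.time() if now is None else now
    claims = {"sid": session_id, "org": org_id, "exp": issued + ttl_seconds}
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    return f"{body.decode()}.{_sign(body)}"


def validate_token(token: str, session_id: str,
                   now: Optional[float] = None) -> Optional[dict]:
    """Return the claims only if the token is authentic, unexpired,
    bound to the calling session, and not revoked; otherwise None."""
    current = time.time() if now is None else now
    try:
        body, signature = token.rsplit(".", 1)
    except ValueError:
        return None
    expected = _sign(body.encode())
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    claims = json.loads(base64.urlsafe_b64decode(body))
    if claims["exp"] <= current:
        return None  # short-lived: expired tokens are useless
    if claims["sid"] != session_id:
        return None  # session-bound: token does not work elsewhere
    if claims["sid"] in REVOKED_SESSIONS:
        return None  # revocable: control plane killed this session
    return claims
```

After validation, the proxy would attach the platform-owned provider credential to the outbound request and use the `org` claim for attribution and rate limits, so the provider secret never appears inside the sandbox.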

Agent Placement

Once isolation is a given, the next design choice is where the agent should live relative to the environment where code actually executes. Should the agent run inside the same isolated computer where it reads files, installs dependencies, and executes commands? Or should it remain outside that environment and send tool calls across the boundary into a separate execution target?

That distinction turns out to matter a lot. It affects latency, security boundaries, state management, and how naturally the environment behaves like a real computer. The biggest performance question is whether repeated tool calls have to cross the sandbox boundary over and over again, or whether they can execute locally alongside the agent.

The Basic Agent Loop

Before getting into the specific placement models, it helps to define the basic loop most coding agents follow:

Core loop: the agent alternates between reasoning and execution

1. Agent to model: The agent sends task state, context, and prior tool results to decide what should happen next.
2. Model back to agent: The model returns a plan, a tool call, or a partial response for the agent to interpret.
3. Agent to tool: The agent invokes a tool like filesystem access, shell execution, package install, tests, or an external API.
4. Tool back to agent: The tool returns output, side effects, or errors, and that result becomes new working context.
5. Back to the model: The updated state goes back into the next model call, and the loop repeats until the task is complete.

↺ back to step 1
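The loop can be written down in a few lines. The sketch below is a simplified illustration, not any particular agent framework: `ModelReply`, `AgentState`, `run_agent`, and the scripted stand-in for the model are hypothetical names, and a real agent would call a model API where `scripted_model` is used here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class ModelReply:
    """What the model returns: either a tool call or a final answer."""
    tool: Optional[str] = None
    argument: str = ""
    final: Optional[str] = None


@dataclass
class AgentState:
    """Task plus accumulated (tool, argument, result) tuples from earlier steps."""
    task: str
    history: list = field(default_factory=list)


def run_agent(task: str,
              model: Callable[[AgentState], ModelReply],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 20) -> str:
    state = AgentState(task=task)
    for _ in range(max_steps):
        reply = model(state)                      # agent to model, model back to agent
        if reply.final is not None:
            return reply.final                    # task complete
        try:
            result = tools[reply.tool](reply.argument)   # agent to tool
        except Exception as exc:
            result = f"error: {exc}"              # errors become working context too
        state.history.append((reply.tool, reply.argument, result))  # tool back to agent
    raise RuntimeError("step budget exhausted before the task completed")


def scripted_model(state: AgentState) -> ModelReply:
    """Stand-in for a real model call: run one command, then finish."""
    if not state.history:
        return ModelReply(tool="shell", argument="run tests")
    return ModelReply(final=f"done after {len(state.history)} tool call(s)")


TOOLS = {"shell": lambda command: f"ran: {command}"}
answer = run_agent("fix the failing test", scripted_model, TOOLS)
```

Every pass through the `for` loop pays for one model call and one tool call, which is why the placement of the tool environment relative to the agent matters for total task time.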

There are four main sources of time in that loop:

Model network latency (network): Network latency to the model provider on each request and response.

Agent-sandbox hop latency (network): Network latency between the agent and sandbox when the tool-call environment is separated.

Model runtime (compute): Time the model spends producing the next action, plan, or response.

Tool runtime (compute): Time spent actually running commands, editing files, installing dependencies, or running tests.

Agent placement mostly changes what happens around tool calls. Local tools add little overhead beyond the work itself; remote tools add another network hop each time. On real tasks, that compounds quickly across dozens of loops, which is why agent placement is a performance decision as much as a security one.

The agent can live outside the sandbox, inside it, or in a hybrid setup.

1/ Agent outside the sandbox

In this model, the agent lives in its own environment and reaches across the sandbox boundary whenever it needs to execute tools or touch the filesystem. This keeps orchestration separate, but every tool call pays the cost of crossing that boundary.

The security upside is that durable credentials, orchestration logic, and conversation state can stay outside the sandbox. That reduces blast radius, but it does not make the system immune to prompt injection: a manipulated agent can still abuse whatever bridges and permissions the control plane exposes.

2/ Agent inside the sandbox

Here, the agent is co-located with the tool and filesystem environment. Tool calls stay local, which removes the repeated sandbox round-trip penalty, but the agent itself now lives inside the stronger isolation boundary.

This shrinks the gap between reasoning and execution, but a compromised agent also sits next to the same files and tools it is using. This pattern works best with a zero-trust posture toward the sandbox itself: even inside the isolation boundary, the agent should not be assumed trustworthy enough to hold durable secrets.

3/ Hybrid placement

A hybrid model keeps safe, common tool calls close to the agent while routing risky actions into a sandboxed execution target. This preserves much of the latency benefit of co-location without giving every capability the same trust boundary.

Here, "safe" means low-blast-radius operations such as workspace file access or routine local commands. "Risky" means actions that cross trust boundaries, such as networked installs, privileged system access, or anything that could expose secrets or affect external systems. For example, reading and writing project files might execute locally alongside the agent, while dependency installs and arbitrary code execution get routed into the sandboxed environment where network access and process execution are more tightly controlled.
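A hybrid setup needs a router that decides where each tool call runs. The sketch below is one possible shape for that decision, assuming a hypothetical allowlist (`SAFE_TOOLS`) and two hypothetical flags on each call; the tool names are illustrative. It defaults to the sandbox, so an unrecognized or misclassified tool fails toward the stricter boundary.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    LOCAL = "local"      # runs alongside the agent
    SANDBOX = "sandbox"  # crosses into the isolated execution target


@dataclass(frozen=True)
class ToolCall:
    name: str
    needs_network: bool = False  # e.g. dependency installs
    privileged: bool = False     # e.g. system-level access


# Hypothetical allowlist of low-blast-radius workspace operations.
SAFE_TOOLS = frozenset({"read_file", "write_file", "list_dir"})


def route(call: ToolCall) -> Target:
    """Keep a call local only if it is explicitly safe; everything else is sandboxed."""
    if call.name in SAFE_TOOLS and not call.needs_network and not call.privileged:
        return Target.LOCAL
    return Target.SANDBOX
```

For example, `route(ToolCall("read_file"))` stays local, while `route(ToolCall("pip_install", needs_network=True))` and any tool name missing from the allowlist go to the sandbox.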

Head-to-Head Comparison

Speed and Latency

The effect of these placement decisions becomes clearer when the task is held constant and only the system design varies. The interactive comparison below runs the same workload across all three placement models. The latency assumptions are adjustable, so it is possible to see how placement alone changes execution behavior and total task completion time.

Interactive comparison

Settings
Sandbox boundary: 160 ms
Model API latency: 120 ms
Model response: 1400 ms
Tool execution: 650 ms
Tool calls per task: 30 (Medium)
Safe tool calls: 60%

Total completion
Agent outside sandbox: 81.8 s
Agent inside sandbox: 69.2 s
Hybrid: 74.0 s
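For a rough feel of why the ordering comes out this way, here is a deliberately simplified latency model. It assumes one model call per tool call, treats each setting as a single per-call cost, and charges the sandbox boundary once for every tool call that runs remotely. The formula behind the comparison above is not stated, so this sketch does not reproduce its totals; it only shows the same ordering. `Workload`, `total_ms`, and `compare` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    boundary_ms: int = 160        # assumed cost per tool call that crosses the boundary
    model_api_ms: int = 120       # network latency per model call
    model_response_ms: int = 1400 # model runtime per call
    tool_execution_ms: int = 650  # tool runtime per call
    tool_calls: int = 30
    safe_percent: int = 60        # share of tool calls a hybrid router keeps local


def total_ms(w: Workload, remote_calls: int) -> int:
    """Every loop pays for the model and the tool; only remote calls pay the boundary."""
    per_loop = w.model_api_ms + w.model_response_ms + w.tool_execution_ms
    return w.tool_calls * per_loop + remote_calls * w.boundary_ms


def compare(w: Workload) -> dict:
    risky_calls = w.tool_calls - (w.tool_calls * w.safe_percent) // 100
    return {
        "outside": total_ms(w, remote_calls=w.tool_calls),  # every call crosses
        "inside": total_ms(w, remote_calls=0),              # no call crosses
        "hybrid": total_ms(w, remote_calls=risky_calls),    # only risky calls cross
    }


totals = compare(Workload())
```

With the settings listed above, this model gives 65.1 s inside, 67.0 s hybrid, and 69.9 s outside. The absolute numbers depend entirely on the assumptions, but the gap between placements always scales with the number of boundary crossings.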

Security Boundaries

The broader security model does not change, but the risk concentrates in different places depending on where the agent runs. On the surface, placing the agent outside the sandbox is the safest default, hybrid sits in the middle, and placing the agent inside is the riskiest if the question is simply what a compromised agent can touch directly. This is because an agent that can execute arbitrary code from within the environment is also closer to the files, tools, networked systems, and credentials that exist there.

In practice, though, that ordering does not materially change the security posture you need to design for. The attack surfaces differ across the three patterns, but all of them still need strong environment isolation and strong secret isolation, especially because prompt injection can be routed through whatever tools and permissions the system exposes.

System Complexity

Latency is only part of the tradeoff. Every additional execution boundary also increases operational complexity.

Once the agent, the safe tool environment, and the sandboxed tool environment are no longer the same place, logs, traces, credentials, filesystem state, and failures are spread across multiple systems. That makes reproduction and debugging much harder, even if the latency cost is acceptable.

Operational cost: each boundary adds another thing to track

Agent outside (more bridges): credentials + conversation stay in control plane; tool calls cross into sandbox; debugging spans multiple environments.

Agent inside (fewest boundaries): agent and tools share one environment; less file movement and proxy glue; fewer places for state to drift.

Hybrid (most moving parts): router decides safe vs risky tools; state must stay coherent across environments; misclassification becomes a failure mode.

Across all three comparisons

Taking speed, security boundaries, and system complexity together, we favor placing the agent inside the isolated compute environment where it will actually execute code. It is usually the fastest and simplest approach, and it does not materially change the security posture as long as the sandbox is strongly isolated, treated as untrusted, and durable credentials stay outside it.

Sandbox Lifecycle Patterns

Beyond securing the isolated environment and deciding where to place the agent, there is still the question of how that environment should live over time. Anthropic's Agent SDK hosting guidance and related docs on secure deployment lay out a useful set of patterns here: ephemeral sessions, long-running sessions, hybrid sessions, and shared containers. These patterns describe how compute is created, how long it persists, and how state carries across work.

The right choice depends on the shape of the workload. Some agents do one-shot work and can safely disappear when they are done. Others need to stay warm, preserve state, or serve a continuous stream of user and system events. The diagrams below make those differences concrete.

1/ Ephemeral sessions

A new sandbox is created for a task, the agent does its work, and the environment is destroyed when the task completes. This is operationally simple, but it breaks down once a task needs more than a single prompt can accomplish.

2/ Long-running sessions

The sandbox stays alive across tasks and interactions. This reduces repeated startup cost, keeps state close to the work, and is often the best fit for high-frequency or proactive agents.

3/ Hybrid sessions

The sandbox can shut down between bursts of work, but state is preserved and reloaded when the user or system returns. This pattern trades some startup overhead for much better economics than keeping everything hot all the time.
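A toy version of this pattern: state lives in memory while the session is hot, is written to a snapshot when the session goes idle, and is restored on the next request. This is a sketch under simplifying assumptions, not how any particular platform implements it. The `HybridSession` class is hypothetical, a JSON file stands in for a real filesystem or memory snapshot, and timestamps are passed in explicitly to keep the example deterministic.

```python
import json
import tempfile
from pathlib import Path


class HybridSession:
    """Toy session: state is in memory while hot and in a snapshot file while stopped."""

    def __init__(self, session_id: str, snapshot_dir: Path, idle_timeout_s: float = 300.0):
        self.snapshot_path = snapshot_dir / f"{session_id}.json"
        self.idle_timeout_s = idle_timeout_s
        self.state = None        # None means the sandbox is stopped
        self.last_active = 0.0

    def resume(self, now: float) -> None:
        """Start the sandbox if needed, restoring the snapshot when one exists."""
        if self.state is None:
            if self.snapshot_path.exists():
                self.state = json.loads(self.snapshot_path.read_text())
            else:
                self.state = {}
        self.last_active = now

    def run(self, key: str, value: str, now: float) -> None:
        """Do some work; a stopped session is resumed transparently first."""
        self.resume(now)
        self.state[key] = value

    def reap_if_idle(self, now: float) -> bool:
        """Snapshot and stop the sandbox once it has been idle past the timeout."""
        if self.state is not None and now - self.last_active >= self.idle_timeout_s:
            self.snapshot_path.write_text(json.dumps(self.state))
            self.state = None
            return True
        return False


with tempfile.TemporaryDirectory() as tmp:
    session = HybridSession("s1", Path(tmp), idle_timeout_s=60)
    session.run("branch", "fix-login", now=0.0)
    stopped = session.reap_if_idle(now=90.0)     # idle past the timeout: snapshot and stop
    session.run("step", "run-tests", now=100.0)  # returning work restores the snapshot
    restored = dict(session.state)
```

The startup overhead mentioned above shows up in `resume`: a stopped session pays for the snapshot read before it can do any work, while a hot session skips it.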

4/ Shared container

Multiple agents or processes share one long-lived environment. This can work when close collaboration is essential, but it increases the risk of conflicts and makes isolation between agents much weaker.

Unlike the patterns above, shared containers are less about lifecycle and more about tenancy. A shared environment can itself be ephemeral, long-running, or hybrid; the key difference is that multiple agents or users share the same boundary, which weakens isolation in exchange for tighter collaboration.

Long-lived and hybrid patterns tend to make the most sense from both a performance and economic perspective. Long-lived sessions minimize repeated startup and hydration costs for agents that are active continuously, while hybrid sessions preserve most of the same statefulness benefits without paying to keep every environment hot during idle periods.

Ephemeral sessions are still useful for narrow one-shot tasks, and shared containers can make sense for tightly coordinated multi-agent systems, but for most user-facing agent products a long-lived or hybrid approach is best.

OpenComputer takes a middle path between long-lived and hybrid designs. Where possible, environments are paused and resumed on the same host so they can preserve local state and avoid unnecessary cold starts. At the same time, state can also be persisted and restored externally when a workload is more intermittent or when the environment needs to move between hosts.

Conclusion

Agentic systems have operational needs that do not line up neatly with traditional application architectures. They need isolated execution, broad tool access, persistent working state, and a setup where reasoning and execution stay close enough that repeated tool calls do not pile on avoidable latency.

For most user-facing agent products, this means combining OS-level sandboxing, strong environment isolation, and trust-minimized credentials. Colocating the agent with its execution environment inside that boundary keeps latency low and operational complexity to a minimum.

On the lifecycle side, the most practical production choices are usually always-on, long-lived environments. They preserve continuity where agents need it while allowing elastic resizing at runtime to match the workload.

The broader takeaway is simple: agents do not just need sandboxes. They need computers with the right isolation, the right placement, and the right lifecycle model for the workload they are serving. That is what we are building with OpenComputer.
