← Back to blog

    Where Should the Agent(s) Live?

    Utpal Nadiger, Mohamed Habib, Igor Zalutski · March 20, 2026

    In The Agentic Workload, we explained why software agent workloads are fundamentally different from traditional applications. Agents introduce a new set of deployment needs and constraints, and this post explores those constraints and what they mean for designing agentic systems.

    The ideas here apply broadly, but they matter most for platforms that give end users direct access to generative AI capabilities, like Lovable or Bolt, where untrusted input can reach the agent. Internal background agents often operate under a more trusted user model, but many of the same isolation and placement tradeoffs still come up.

    We will explore a few core questions

    1. What should the isolation model look like?
    2. Should the agent be separated from the tool-call environment?
    3. What state should persist inside the compute environment?
    4. How should the platform trade off speed, safety, and cost?

    Security

    The broad access that makes agents so powerful also makes them dangerous. For many applications, agents need filesystem access, process control, network access, and the ability to generate and execute arbitrary code.

    Isolation usually happens in two layers: an OS sandbox around the agent process itself, and a stronger execution boundary around the whole environment it operates in. Anthropic's sandboxing guidance and OpenAI's Codex docs are useful references for the first layer; Luis Cardoso, Pierce Freeman, and our Sandbox Fingerprinting writeup are good references for the second.

    Security model

    Isolation works in layers

    Layer

    OS sandbox

    Constrains the agent process itself with filesystem, process, and network boundaries.

    Example: Claude Code policy

    {
      "sandbox": {
        "enabled": true,
        "filesystem": {
          "allowWrite": ["/tmp/build"],
          "denyRead": ["~/.aws/credentials"]
        },
        "network": {
          "allowedDomains": ["github.com", "*.npmjs.org"]
        }
      }
    }

    This example uses Claude Code's sandbox settings, but other agent tools expose similar OS-level controls. Here, the policy allows writes to `/tmp/build`, blocks reads from local AWS credentials, and limits outbound access to a small domain allowlist.

    Layer

    Execution environment

    Container or VM isolation defines the boundary around the whole computer the agent operates in.

    Kernel

    Container profile

    Shared with the host.

    VM profile

    Separate guest kernel and a stronger isolation boundary.

    Filesystem

    Container profile

    Mounted workspace plus a small writable scratch area.

    VM profile

    Full guest filesystem that behaves more like a real machine.

    Network

    Container profile

    Optional and policy-controlled, often narrowed or disabled entirely.

    VM profile

    Optional and policy-controlled at the machine boundary.

    Tooling

    Container profile

    Good for packaged environments, but some developer setups need extra work.

    VM profile

    Usually closer to "just works" for broad developer tooling.

    Nested containers

    Container profile

    Possible, but often awkward without extra privileges or indirection.

    VM profile

    Typically more natural for Docker-style inner workflows.

    Use both layers together: OS sandboxing limits what the agent process can do, while the execution boundary provides another layer of defense and contains the full machine-level blast radius.

    Aside: Credentials

    Even with strong isolation, agents still need a safe way to access external services. At a minimum, the agent needs to authenticate to the model provider, so it is important to design the system to minimize blast radius if that environment is compromised. Prompt injection, policy bypass attempts, and the simple fact that the agent can execute arbitrary code all push in the same direction: assume the environment may eventually be coerced into trying to exfiltrate whatever credentials it can reach.

    A common answer is to give the sandbox only a short-lived session token and route privileged operations through a proxy or control plane. Browser Use pushes this idea further with what they call a "zero-secret sandbox" approach: the agent inside the sandbox holds no credentials at all, and every privileged operation, including model inference, is routed through an external control plane that owns the secrets on the agent's behalf.

    Credentials

    Token design goals

    1

    Short-lived

    If leaked, the token expires quickly.

    2

    Session-bound

    A stolen token only works for one environment.

    3

    Scoped

    Usage stays attributable to the right user or org.

    4

    Revocable

    The control plane can kill a token immediately if a session looks compromised.

    Proxy flow

    Sandbox agent

    gets per-session token

    cannot see provider secret

    token bound to user or org

    Auth proxy

    validates short TTL

    checks session scope

    applies attribution + limits

    Model provider

    receives proxied request

    uses platform-owned credential

    never exposes root key to agent

    Agent Placement

    Once isolation is a given, the next design choice is where the agent should live relative to the environment where code actually executes. Should the agent run inside the same isolated computer where it reads files, installs dependencies, and executes commands? Or should it remain outside that environment and send tool calls across the boundary into a separate execution target?

    That distinction turns out to matter a lot. It affects latency, security boundaries, state management, and how naturally the environment behaves like a real computer. The biggest performance question is whether repeated tool calls have to cross the sandbox boundary over and over again, or whether they can execute locally alongside the agent.

    The Basic Agent Loop

    Before getting into the specific placement models, it helps to define the basic loop most coding agents follow:

    Core loop

    The agent alternates between reasoning and execution

    1

    Agent to model

    The agent sends task state, context, and prior tool results to decide what should happen next.

    2

    Model back to agent

    The model returns a plan, a tool call, or a partial response for the agent to interpret.

    3

    Agent to tool

    The agent invokes a tool like filesystem access, shell execution, package install, tests, or an external API.

    4

    Tool back to agent

    The tool returns output, side effects, or errors, and that result becomes new working context.

    5

    Back to the model

    The updated state goes back into the next model call, and the loop repeats until the task is complete.

    back to step 1

    There are four main sources of time in that loop:

    Latency + runtime

    Four sources of time accumulate in every loop

    Model network latency

    network

    Network latency to the model provider on each request and response.

    Agent-sandbox hop latency

    network

    Network latency between the agent and sandbox when the tool-call environment is separated.

    Model runtime

    compute

    Time the model spends producing the next action, plan, or response.

    Tool runtime

    compute

    Time spent actually running commands, editing files, installing dependencies, or running tests.

    Agent placement mostly changes what happens around tool calls. Local tools add little overhead beyond the work itself; remote tools add another network hop each time. On real tasks, that compounds quickly across dozens of loops, which is why agent placement is a performance decision as much as a security one.

    The agent can live outside the sandbox, inside it, or in a hybrid setup.

    1/ Agent outside the sandbox

    In this model, the agent lives in its own environment and reaches across the sandbox boundary whenever it needs to execute tools or touch the filesystem. This keeps orchestration separate, but every tool call pays the cost of crossing that boundary.

    The security upside is that durable credentials, orchestration logic, and conversation state can stay outside the sandbox. That reduces blast radius, but it does not make the system immune to prompt injection: a manipulated agent can still abuse whatever bridges and permissions the control plane exposes.

    Sandbox boundaryAgent environment🤖 Agent (Outside Sandbox)Model Provider APIthinkingTools + FS

    2/ Agent inside the sandbox

    Here, the agent is co-located with the tool and filesystem environment. Tool calls stay local, which removes the repeated sandbox round-trip penalty, but it also means the agent itself lives inside the stronger isolation boundary.

    This shrinks the gap between reasoning and execution, but it also means a compromised agent sits next to the same files and tools it is using. This pattern works best with a zero-trust posture toward the sandbox itself: even inside the isolation boundary, the agent should not be assumed trustworthy enough to hold durable secrets.

    Sandbox boundary🤖 Agent (Inside Sandbox)Model Provider APIthinkingTools + FS

    3/ Hybrid placement

    A hybrid model keeps safe, common tool calls close to the agent while routing risky actions into a sandboxed execution target. This preserves much of the latency benefit of co-location without giving every capability the same trust boundary.

    Here, "safe" means low-blast-radius operations such as workspace file access or routine local commands. "Risky" means actions that cross trust boundaries, such as networked installs, privileged system access, or anything that could expose secrets or affect external systems. For example, reading and writing project files might execute locally alongside the agent, while dependency installs and arbitrary code execution get routed into the sandboxed environment where network access and process execution are more tightly controlled.

    Agent environmentSandbox boundary🤖 Agent (Hybrid)Model Provider APIthinking"Safe" Tools + FS"Risky" Tools + FS

    Head-to-Head Comparison

    Speed and Latency

    The effect of these placement decisions becomes clearer when the task is held constant and only the system design varies. The interactive comparison below runs the same workload across all three placement models. The latency assumptions are adjustable, so it is possible to see how placement alone changes execution behavior and total task completion time.

    Interactive comparison

    Use the shared controls below to compare how the three placement models respond to the same workload and latency assumptions.

    Sandbox boundary160ms
    Model API latency120ms
    Model response1400ms
    Tool execution650ms
    Tool calls per task30 (Medium)
    Safe tool calls (%)60%

    Agent Outside Sandbox

    Sandbox boundaryAgent environment🤖 Agent (Outside Sandbox)Model Provider APIthinkingTools + FS

    Total completion

    81.8s

    Total

    81.8s

    Agent Inside Sandbox

    Sandbox boundary🤖 Agent (Inside Sandbox)Model Provider APIthinkingTools + FS

    Total completion

    69.2s

    Total

    69.2s

    Agent Hybrid Sandbox

    Agent environmentSandbox boundary🤖 Agent (Hybrid)Model Provider APIthinking"Safe" Tools + FS"Risky" Tools + FS

    Total completion

    74.0s

    Total

    74.0s

    Security Boundaries

    The broader security model does not change, but the risk concentrates in different places depending on where the agent runs. On the surface, placing the agent outside the sandbox is the safest default, hybrid sits in the middle, and placing the agent inside is the riskiest if the question is simply what a compromised agent can touch directly. This is because an agent that can execute arbitrary code from within the environment is also closer to the files, tools, networked systems, and credentials that exist there.

    In practice, though, that ordering does not materially change the security posture you need to design for. The attack surfaces differ across the three patterns, but all of them still need strong environment isolation and strong secret isolation, especially because prompt injection has the potential to be routed through whatever tools and permissions the system exposes.

    System Complexity

    Latency is only part of the tradeoff. Every additional execution boundary also increases operational complexity.

    Once the agent, the safe tool environment, and the sandboxed tool environment are no longer the same place, logs, traces, credentials, filesystem state, and failures are spread across multiple systems. That makes reproduction and debugging much harder, even if the latency cost is acceptable.

    Operational cost

    Each boundary adds another thing to track

    Agent outside

    more bridges

    credentials + conversation stay in control plane
    tool calls cross into sandbox
    debugging spans multiple environments

    Agent inside

    fewest boundaries

    agent and tools share one environment
    less file movement and proxy glue
    fewer places for state to drift

    Hybrid

    most moving parts

    router decides safe vs risky tools
    state must stay coherent across environments
    misclassification becomes a failure mode

    Across all three comparisons

    Taking speed, security boundaries, and system complexity together, we favor placing the agent inside the isolated compute environment where it will actually execute code. It is usually the fastest and simplest approach, and it does not materially change the security posture as long as the sandbox is strongly isolated, treated as untrusted, and durable credentials stay outside it.

    Sandbox Lifecycle Patterns

    Beyond securing the isolated environment and deciding where to place the agent, there is still the question of how that environment should live over time. Anthropic's Agent SDK hosting guidance and related docs on secure deployment lay out a useful set of patterns here: ephemeral sessions, long-running sessions, hybrid sessions, and shared containers. These patterns describe how compute is created, how long it persists, and how state carries across work.

    The right choice depends on the shape of the workload. Some agents do one-shot work and can safely disappear when they are done. Others need to stay warm, preserve state, or serve a continuous stream of user and system events. The diagrams below make those differences concrete.

    1/ Ephemeral sessions

    A new sandbox is created for a task, the agent does its work, and the environment is destroyed when the task completes. This is operationally simple, but it starts to break down when complexity extends beyond what is possible to accomplish with a single prompt.

    User InputPrompt or taskSystem EventsAutomated triggersControl PlaneRoutes workSandbox AInitializingSandbox B

    2/ Long-running sessions

    The sandbox stays alive across tasks and interactions. This reduces repeated startup cost, keeps state close to the work, and is often the best fit for high-frequency or proactive agents.

    User InputPrompt or taskSystem EventsAutomated triggersControl PlaneRoutes workSandbox AProcessingSandbox BWaiting

    3/ Hybrid sessions

    The sandbox can shut down between bursts of work, but state is preserved and reloaded when the user or system returns. This pattern trades some startup overhead for much better economics than keeping everything hot all the time.

    User InputPrompt or taskSystem EventsAutomated triggersControl PlaneRoutes workSession StateHistory + filesystemSandbox AHydratingSandbox B

    4/ Shared container

    Multiple agents or processes share one long-lived environment. This can work when close collaboration is essential, but it increases the risk of conflicts and makes isolation between agents much weaker.

    Unlike the patterns above, shared containers are less about lifecycle and more about tenancy. A shared environment can itself be ephemeral, long-running, or hybrid; the key difference is that multiple agents or users share the same boundary, which weakens isolation in exchange for tighter collaboration.

    User InputPrompt or taskSystem EventsAutomated triggersControl PlaneRoutes workGlobal SandboxShared environmentAgent 01RunningAgent 02SpawnedAgent 03Spawned

    Long-lived and hybrid patterns tend to make the most sense from both a performance and economic perspective. Long-lived sessions minimize repeated startup and hydration costs for agents that are active continuously, while hybrid sessions preserve most of the same statefulness benefits without paying to keep every environment hot during idle periods.

    Ephemeral sessions are still useful for narrow one-shot tasks, and shared containers can make sense for tightly coordinated multi-agent systems, but for most user-facing agent products a long-lived or hybrid approach is best.

    OpenComputer takes a middle path between long-lived and hybrid designs. Where possible, environments are paused and resumed on the same host so they can preserve local state and avoid unnecessary cold starts. At the same time, state can also be persisted and restored externally when a workload is more intermittent or when the environment needs to move between hosts.

    Conclusion

    Agentic systems have operational needs that do not line up neatly with traditional application architectures. They need isolated execution, broad tool access, persistent working state, and a setup where reasoning and execution stay close enough that repeated tool calls do not pile on avoidable latency.

    For most user-facing agent products, this means combining OS-level sandboxing, strong environment isolation, and trust-minimized credentials. Colocating the agent with its execution environment inside that boundary keeps latency low and operational complexity to a minimum.

    On the lifecycle side, the most practical production choices are usually always-on, long-lived environments. They preserve continuity where agents need it while allowing elastic resizing at runtime to match the workload.

    The broader takeaway is simple: agents do not just need sandboxes. They need computers with the right isolation, the right placement, and the right lifecycle model for the workload they are serving. That is what we are building with OpenComputer.

    References