What "elastic compute" means in 2026

Igor Zalutski · April 7, 2026

EC2 is the backbone of the modern internet; there's little doubt about that. Amazon started the cloud revolution back in 2006 by introducing scalable object storage decoupled from compute, followed shortly by compute with reliable provisioning that you could scale as you needed. It grew out of the internal platform that Amazon had to build for itself to support the scale of e-commerce at its own web store.

20 years later, cloud VMs like EC2 are running the majority of SaaS applications. Often via more specialised higher-level services like container orchestrators or lambda functions, but the core remains more or less the same - VMs decoupled from the underlying hardware via hypervisors, highly optimised operating system images, and all the insane complexity of making a highly reliable host operating system with enough nines of reliability guarantees to support whatever you want to build, even if it's another Netflix with comparable popularity and load profile.

But now we have agents. A growing number of workloads are not even applications - they are ephemeral one-off artifacts that agents produce, or the agents themselves, often replicated across isolated sessions to ensure users' data is immune from the uncertainties introduced by LLMs. Do the tried and proven compute paradigms stay the same? Or do they change?

The elasticity timeline

2006 · EC2: provision a VM in minutes
2013 · Containers: spin up in seconds, share the kernel
2014 · Lambda: per-invocation, millisecond billing
2024 · Type-1 Sandboxes: ephemeral VMs for untrusted code, ~100ms boot
2025 · Type-2 Sandboxes: isolate the agent itself, persistent sessions
2026 · Elastic from inside: agent controls its own CPU/RAM on demand

Each generation made "elastic" mean something faster and more granular.

What's similar

Many things didn't change at all. Agents need the same environment a web server would thrive in - an agent is a process like any other. Windows vs Linux aside, there's really nothing special that the host needs to provide for the agent to run on it.

The best coding agents, at least as of April 2026, are CLIs - they read and write files, use bash, start processes. They can go much further than a web server in what they can do on the host system - more on that later - but fundamentally they are just like any other *nix process. You do not need anything special to run a coding agent. A fresh EC2 instance, for example, would be a reasonably good environment for one agent.

But this "one agent" scenario is pretty much where similarities end.

More elastic than EC2

The first bottleneck to be exposed by agents was the need to run untrusted code generated by an LLM on the fly. Say you've built your agent with LangChain or ai-sdk or another similar framework. Fundamentally it's a web application like any other - meaning it's likely deployed in some sort of a compute environment, perhaps containers or functions, with some degree of sharing the underlying compute across multiple requests.

Where do you run code generated by the LLM? Even if there is no resource sharing across requests (e.g. Lambda), running AI-generated code on the same host that handles the request risks secret exfiltration at a minimum; worse, if you get prompt-injected, the code could get access to all users' data. You'd want bullet-proof isolation, ideally at the kernel level - like a VM provides.

What counted as "elastic" in 2006 - provision a new VM in a minute or so - is not nearly elastic enough in 2026.

So... start a new VM for every piece of code generated by an LLM? That'd take time to boot. Maintain a pool of warmed up VMs and share them to some extent across multiple LLM sessions? Possible but rather complex plus a tradeoff on security. This is how we realised that not all compute problems are solved yet. Agents introduced the need for compute that is more elastic than EC2.
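To make the warm-pool tradeoff concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `WarmVM` and `WarmPool` are names made up for this example, and a plain object with a `files` dict stands in for a pre-booted VM. It only illustrates why reuse is fast and why skipping the cleanup step leaks state between sessions.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WarmVM:
    """Stand-in for a pre-booted VM (hypothetical); `files` models leftover state."""
    vm_id: int
    files: dict[str, str] = field(default_factory=dict)


class WarmPool:
    def __init__(self, size: int) -> None:
        # Boot cost is paid up front, so acquire() is instant while the pool has capacity.
        self._idle = deque(WarmVM(vm_id=i) for i in range(size))

    def acquire(self) -> WarmVM:
        if not self._idle:
            raise RuntimeError("pool exhausted: caller must wait or cold-boot a VM")
        return self._idle.popleft()

    def release(self, vm: WarmVM, scrub: bool = True) -> None:
        # Skipping the scrub is the security tradeoff: the next session sees old state.
        if scrub:
            vm.files.clear()
        self._idle.append(vm)


pool = WarmPool(size=1)
vm = pool.acquire()
vm.files["secret.txt"] = "session A data"
pool.release(vm, scrub=False)   # reuse without cleanup
print(pool.acquire().files)     # the next session can read session A's file
```

The complexity mentioned above lives in what this sketch leaves out: sizing the pool, handling exhaustion, and making the scrub step trustworthy enough to count as isolation.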

Enter sandboxes! Or "type-1 sandboxes" as I call them, or the "as-tool" scenario (see two patterns) - ephemeral environments that spin up on demand for untrusted code execution. An agent loop with something like ai-sdk would create a new sandbox every time the LLM generates a new piece of code. The faster such a sandbox starts, the faster your agent is - hence the competition on cold start latency amongst vendors.

Type-1 Sandbox: code execution as a tool

LLM generates code ("run this Python script") → Spin up sandbox (~100ms cold start) → Execute & return (stdout → agent context) → Destroy sandbox (ephemeral, no state)
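The type-1 lifecycle (spin up, execute, return, destroy) can be sketched in a few lines of Python. This is a minimal sketch, not any vendor's API: `run_in_ephemeral_sandbox` is a hypothetical name, and a temporary directory plus a child process stands in for the sandbox. A child process is not a security boundary; a real type-1 sandbox would put a VM behind the same lifecycle.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_ephemeral_sandbox(code: str, timeout_s: float = 10.0) -> str:
    """Create a throwaway workspace, run the code, return its output, destroy everything.

    Stand-in only: a child process is NOT real isolation.
    """
    # 1. "Spin up": a fresh, empty working directory for each piece of code.
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "main.py"
        script.write_text(code)
        # 2. "Execute": run the generated code with the workspace as its cwd.
        result = subprocess.run(
            [sys.executable, str(script)],
            cwd=workdir,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
        # 3. "Return": stdout (or the error text) goes back into the agent context.
        output = result.stdout if result.returncode == 0 else result.stderr
    # 4. "Destroy": leaving the with-block deletes the workspace; no state survives.
    return output


if __name__ == "__main__":
    print(run_in_ephemeral_sandbox("print(sum(range(10)))"))
```

Because every call starts from an empty workspace, a file written by one piece of generated code is gone by the time the next one runs.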

Sandboxing the agent itself

As-tool sandboxes were the first of the new kinds of compute needed by agents, but not the last. Claude Code started the coding agent revolution; most agree that November 2025 was a bit of a watershed moment - before and after feel distinctly different. Many engineers switched from IDEs to CLI-first agents like Claude Code and Codex and never looked back. And these agents turned out to be surprisingly good! Autonomy is an ongoing debate, but few would challenge their utility with a human still at the wheel.

CLI agents turned out to be far more than coding assistants though. As engineers started using Claude Code as their primary daily driver, they also started throwing other problems at it - it was already open, so why not. And oftentimes Claude Code performed better than the specialised domain-specific agents these engineers were building from the ground up! This is because surprisingly many non-coding problems generalise well into coding ones - for example spreadsheets, all sorts of data analyses, reports and so on. The model doesn't have to know the answer - it can generate code to arrive at the answer.

That emergent property led to the development of some counter-intuitive patterns. At AIE, Bill Chen from OpenAI encouraged using Codex as a reusable building block - and it makes a lot of sense! Turns out you don't really need to build a custom harness for many if not most use cases. You can just customise Claude Code or Codex or OpenCode with skills and it'll do the job just fine.

Naturally, seeing how people used their harnesses, the big labs introduced SDKs to make this pattern easier to implement - enter Claude Agent SDK / Codex SDK (and Codex App Server). These SDKs are a bit counterintuitive in that they require the underlying CLI to be present in the environment. In some sense they are "coding agent wrappers" that expose convenience methods to build apps around them. OpenAI takes this pattern one step further by formalising the low-level API of the harness with Codex App Server.

But how do you run agents built with these SDKs? Since the SDKs rely on the CLIs, they could read and write any files on the host machine. So some sort of isolation is needed. Anthropic proposes 3 deployment patterns, none of which look like a typical web app - so, for the security reasons mentioned above, you cannot really run an agent built with Claude Agent SDK on Google Cloud Run or Lambda or any other popular runtime for web applications. Instead, you need to securely isolate them - enter "type-2 sandboxes".

With CLI-based SDKs for building agents, it is no longer enough to isolate the code that the agent generates. You also need to isolate the agent itself.

Type-1 · as-tool
Lifetime: seconds
Purpose: run untrusted code
Created by: your app
Key metric: cold start latency

Type-2 · agent isolation
Lifetime: hours to days
Purpose: isolate the agent itself
Created by: your app, per user/session
Key metric: persistent state + security

The semantics here are somewhat different from "type-1" - a typical LLM-generated piece of code runs for a few seconds, whereas an agent session could run for hours or even days. A type-2 sandbox is much more like a persistent VM, but one created on demand by the application, often one for each user or even one for each agent session for added security.
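The difference in lifetime is easiest to see side by side with code. Below is a minimal sketch of the type-2 shape, with the same caveats as any toy example: `SessionSandboxes` is a hypothetical helper, and per-session directories stand in for sandboxes without isolating anything. What it shows is the lifecycle: created on demand per session, state kept across commands, destroyed explicitly.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


class SessionSandboxes:
    """One long-lived workspace per agent session (hypothetical helper).

    Directories stand in for sandboxes here; they do NOT isolate anything.
    """

    def __init__(self) -> None:
        self._workdirs: dict[str, Path] = {}

    def get_or_create(self, session_id: str) -> Path:
        # Created on demand by the application, then reused for the whole session.
        if session_id not in self._workdirs:
            self._workdirs[session_id] = Path(tempfile.mkdtemp(prefix="session-"))
        return self._workdirs[session_id]

    def run(self, session_id: str, code: str) -> str:
        # Every command in a session runs against the same persistent workspace.
        result = subprocess.run(
            [sys.executable, "-c", code],
            cwd=self.get_or_create(session_id),
            capture_output=True,
            text=True,
            timeout=30,
        )
        return result.stdout

    def destroy(self, session_id: str) -> None:
        # State lives until the application explicitly ends the session.
        workdir = self._workdirs.pop(session_id, None)
        if workdir is not None:
            shutil.rmtree(workdir, ignore_errors=True)


boxes = SessionSandboxes()
boxes.run("alice", "open('notes.txt', 'w').write('draft')")
print(boxes.run("alice", "print(open('notes.txt').read())"))            # same session: file is still there
print(boxes.run("bob", "import os; print(os.path.exists('notes.txt'))"))  # other session: separate workspace
boxes.destroy("alice")
boxes.destroy("bob")
```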

Elasticity from inside the box

So now that we have CLI agents running inside "type-2" sandboxes, are we done with compute? Is it finally solved? No, not really.

Say you are building a coding agent that resolves issues by reacting to events from Slack or Linear. And say you have a large Rust codebase; Rust builds are notoriously heavy on CPU and RAM. Which box do you put your agents in? 16 GB for the whole duration of your agent session? This seems wasteful (and expensive) because most of the time it'll be idle waiting for LLM responses. But any less and you risk your Rust build failing. You could run builds in a separate ad-hoc box, but then you need to move all the files around, which will be rather slow.

What if the agent could control the resources it has from inside the box it already runs in?

Agent session resource usage over time

Rust codebase scenario: mostly idle at 1 GB, with brief bursts to 16 GB for cargo build (>10x savings).
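How large the saving is depends on how much of the session is spent building. Here is a rough sketch of the arithmetic, assuming cost scales linearly with GB-seconds of RAM and ignoring CPU (both are simplifications, and `savings_factor` is a name made up for this example):

```python
def savings_factor(idle_gb: float, burst_gb: float, burst_fraction: float) -> float:
    """Cost of holding burst_gb for the whole session, divided by the cost of
    staying at idle_gb and bursting only for burst_fraction of the time.

    Assumes billing is linear in GB-seconds of RAM (an illustrative assumption).
    """
    if not 0.0 <= burst_fraction <= 1.0:
        raise ValueError("burst_fraction must be between 0 and 1")
    fixed_cost = burst_gb
    elastic_cost = idle_gb * (1.0 - burst_fraction) + burst_gb * burst_fraction
    return fixed_cost / elastic_cost


for fraction in (0.01, 0.04, 0.10):
    factor = savings_factor(idle_gb=1, burst_gb=16, burst_fraction=fraction)
    print(f"{fraction:.0%} of session building -> {factor:.1f}x cheaper")
```

Under these assumptions the ceiling is 16x (no builds at all), and the saving stays above 10x only while builds take up less than about 4% of the session.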

This seems very natural, but understandably none of the traditional compute providers were built for that scenario. Compute was always meant to be controlled externally - unsurprisingly, because until very recently we only had human intelligence!

At opencomputer we are solving this by exposing elasticity endpoints to the agent inside the box. It can request more CPU/RAM as it sees fit, and very granularly (billing is per-second). This way, in the heavy Rust codebase scenario, you can stay at 1 GB most of the time and only burst to 16 GB for the duration of the build - saving more than 10x on compute. Give it a try!
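From the agent's side, "burst for the duration of the build" is a scale-up, a build, and a guaranteed scale-down. The sketch below shows that shape only. It does not use opencomputer's actual API: `FakeElasticityClient` and `burst` are hypothetical names, and the client just records requests in memory instead of calling an endpoint.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class FakeElasticityClient:
    """In-memory stand-in for an elasticity endpoint (hypothetical, not a real API)."""
    memory_gb: int = 1
    history: list[int] = field(default_factory=list)

    def resize(self, memory_gb: int) -> None:
        # A real client would call the provider here; this one only records the request.
        self.memory_gb = memory_gb
        self.history.append(memory_gb)


@contextmanager
def burst(client: FakeElasticityClient, memory_gb: int) -> Iterator[None]:
    """Scale up for the duration of the block, then always scale back down."""
    baseline = client.memory_gb
    client.resize(memory_gb)
    try:
        yield
    finally:
        # Runs even if the build fails, so a failed build does not leave the box at 16 GB.
        client.resize(baseline)


client = FakeElasticityClient(memory_gb=1)
with burst(client, memory_gb=16):
    pass  # the agent would run `cargo build` here
print(client.history)  # [16, 1]
```

Putting the scale-down in a `finally` block matters with per-second billing: the expensive size is held only while the build is actually running.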

Written by a human