Sandboxes
An exploration of the ever-growing sandbox market
Throw a dart at a map of SF and you’ll hit someone building a sandbox. That’s what it feels like at least.
Sandboxes started as safe environments for running untrusted code. As agentic workloads become ever more complex, sandboxes are now evolving into the operating environments agents actually work inside.
What makes this market interesting is that there is no longer a single sandbox workload. Every major provider is optimizing around a different use case and implicitly betting on a different agentic future.
Workloads
Sandbox workloads typically differ across two axes: how long they run, and how much state they require. In practice, this collapses into three dominant use cases:
Simple code execution
Short-lived environments that spin up, execute code, and disappear. This is the default for most LLM use cases today.
Stateful environments
Longer-lived environments where state matters. These look more like dev environments: persistent filesystems, multi-step workflows, and sessions that last hours.
Compute-heavy workloads
These look like traditional infrastructure. Training small models, running inference, batch jobs, and CI-like workloads. They are often GPU-bound, parallelizable, and cost-intensive.
The important shift now is that sandboxes are persisting far beyond a single request. With agents becoming more stateful, this execution layer starts looking less like a disposable utility and more like an operating system.
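To make the two axes concrete, here is a minimal Python sketch that sorts a workload into one of the three use cases above. The 60-second cutoff, the field names, and the use of a GPU flag as a stand-in for "compute-heavy" are all illustrative assumptions, not thresholds any provider publishes.

```python
from dataclasses import dataclass
from enum import Enum


class UseCase(Enum):
    SIMPLE_EXECUTION = "simple code execution"
    STATEFUL_ENVIRONMENT = "stateful environment"
    COMPUTE_HEAVY = "compute-heavy workload"


@dataclass(frozen=True)
class Workload:
    expected_runtime_s: float      # axis 1: how long the sandbox runs
    needs_persistent_state: bool   # axis 2: must state survive between steps?
    needs_gpu: bool = False        # assumed proxy for "compute-heavy"


# Hypothetical cutoff between "short-lived" and "long-lived"; purely illustrative.
SHORT_LIVED_S = 60.0


def classify(w: Workload) -> UseCase:
    """Map a workload onto the three use cases described in the text."""
    if w.needs_gpu:
        return UseCase.COMPUTE_HEAVY
    if w.needs_persistent_state or w.expected_runtime_s > SHORT_LIVED_S:
        return UseCase.STATEFUL_ENVIRONMENT
    return UseCase.SIMPLE_EXECUTION
```

A two-second tool call with no state lands in simple execution, while an hour-long session with a persistent filesystem lands in the stateful bucket. Real workloads blur these lines, which is part of why providers diverge.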
Design Space
Stateless code execution is relatively straightforward, but persistent agent environments are not. Once agents need to maintain filesystems, memory, and long-running workflows, the execution environment itself becomes part of the system architecture.
Isolation Primitive
Different workloads require different isolation guarantees. Containers are lightweight and fast. MicroVMs optimize for stronger isolation boundaries and are generally seen as the default “right” primitive. gVisor sits somewhere in the middle.
Performance
Once a sandbox persists beyond a single request, performance has to be evaluated along several dimensions.
For simple code execution workloads, cold start time matters the most. If a tool call takes several seconds before execution even begins, the entire experience will feel sluggish.
For longer-running workloads, resume time often matters more. When a user pauses and later resumes a session, they expect it to reappear instantly. This pushes providers towards snapshotting, warm pools, and persistent execution environments.
At scale, providers also begin optimizing around an entirely different set of constraints: concurrent sandbox orchestration, CPU performance, disk performance, and workload density.
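The warm-pool idea mentioned above can be sketched in a few lines of Python. This is a toy model under stated assumptions: `WarmPool`, `_boot`, and the `cold_boots` counter are hypothetical names, and the "boot" step is a placeholder for whatever expensive VM or container start a real provider performs.

```python
import itertools
from collections import deque


class WarmPool:
    """Keeps a few pre-booted sandboxes ready so acquire() can skip a cold start."""

    def __init__(self, target_size: int) -> None:
        self._ids = itertools.count(1)
        self._target_size = target_size
        self._ready: deque[str] = deque()
        self.cold_boots = 0  # boots that happened on the request path
        self.refill()

    def _boot(self) -> str:
        # Placeholder for the expensive step (starting a VM or container).
        return f"sandbox-{next(self._ids)}"

    def refill(self) -> None:
        # In a real system this would run in the background, off the request path.
        while len(self._ready) < self._target_size:
            self._ready.append(self._boot())

    def acquire(self) -> str:
        if self._ready:
            return self._ready.popleft()  # warm hit: no boot cost for the caller
        self.cold_boots += 1              # pool exhausted: caller pays the cold start
        return self._boot()
```

The design tension is visible even in the toy: a larger `target_size` hides more cold starts but keeps more idle sandboxes around, which is exactly the workload-density cost providers optimize at scale.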
Statefulness
Agent workloads are increasingly stateful.
Early agentic systems largely operated through isolated tool calls: execute code, return a result, destroy the sandbox. But as agents become more long-running, they begin accumulating context across files, logs, and cached data. Across many agent architectures today, the filesystem has become the default memory layer.
This fundamentally changes what the sandbox needs to be. The execution environment is no longer disposable; it starts behaving like a persistent workspace. This means snapshotting and restoring environments across sessions, branching and forking execution states, mounting external storage systems (S3, blob storage, databases), and preserving long-running state across workflows.
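As a rough illustration of snapshot, restore, and fork semantics, the sketch below models a workspace as an in-memory mapping from paths to file contents. The `Workspace` class and its method names are hypothetical; real providers snapshot disk and memory images rather than Python dictionaries.

```python
import copy
from typing import Optional


class Workspace:
    """Toy persistent workspace: files live in a dict of path -> contents."""

    def __init__(self, files: Optional[dict] = None) -> None:
        self.files: dict = dict(files or {})
        self._snapshots: dict = {}

    def write(self, path: str, data: str) -> None:
        self.files[path] = data

    def snapshot(self, name: str) -> None:
        # Capture the current state so a later session can return to it.
        self._snapshots[name] = copy.deepcopy(self.files)

    def restore(self, name: str) -> None:
        # Roll the workspace back to a previously captured state.
        self.files = copy.deepcopy(self._snapshots[name])

    def fork(self) -> "Workspace":
        # Branch execution: the child starts from the same files but diverges freely.
        return Workspace(copy.deepcopy(self.files))
```

Forking is what lets an agent try two approaches from the same starting point without one attempt contaminating the other; restore is what makes a paused session resumable.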
Portability and Pluggability
Sandboxes will need to be compatible with any harness.
Anthropic’s Claude Managed Agents and OpenAI’s Agents SDK already treat sandboxes as entirely interchangeable.
This portability means that teams will optimize around whichever provider best matches their workload, budget, and orchestration model.
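One way to picture this interchangeability is a harness written against a small interface rather than a specific provider. The Python sketch below is an assumption-laden illustration: `Sandbox`, `ExecResult`, `InProcessSandbox`, `EchoSandbox`, and `run_tool_call` are hypothetical names, not the API of any SDK mentioned here, and the in-process backend provides no isolation at all.

```python
import contextlib
import io
from dataclasses import dataclass
from typing import Protocol


@dataclass
class ExecResult:
    stdout: str
    exit_code: int


class Sandbox(Protocol):
    """Hypothetical minimal contract a harness might require of any provider."""

    def run(self, code: str) -> ExecResult: ...


class InProcessSandbox:
    """Stand-in backend for illustration only. exec() is NOT a security boundary."""

    def run(self, code: str) -> ExecResult:
        buf = io.StringIO()
        try:
            with contextlib.redirect_stdout(buf):
                exec(code, {})
            return ExecResult(buf.getvalue(), 0)
        except Exception:
            return ExecResult(buf.getvalue(), 1)


class EchoSandbox:
    """Fake backend that returns the code unchanged; useful for harness tests."""

    def run(self, code: str) -> ExecResult:
        return ExecResult(code, 0)


def run_tool_call(sandbox: Sandbox, code: str) -> str:
    # The harness depends only on the Sandbox protocol, so backends are swappable.
    result = sandbox.run(code)
    if result.exit_code != 0:
        raise RuntimeError("sandboxed code failed")
    return result.stdout
```

Because `run_tool_call` never names a concrete backend, swapping providers is a one-line change at the call site, which is the property that pushes switching costs toward zero.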
Where the harness lives
One of the more interesting architectural questions is around where the harness itself should live.
If the harness lives outside the sandbox, orchestration can remain always-on while environments spin up only when execution is required. This improves responsiveness but creates more complicated trust boundaries between infrastructure and untrusted execution. This is the direction we’ve seen Claude Managed Agents and OpenAI’s Agents SDK take.
Alternatively, if the harness lives inside the sandbox, isolation becomes dramatically simpler but startup latency becomes critical. The tradeoff is less flexibility around orchestration.
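The first arrangement, with the harness outside the sandbox, can be sketched as an always-on loop that provisions an environment only when a step actually needs to execute code. Everything here is hypothetical: `OutsideHarness`, the `("think", ...)` and `("exec", ...)` step shapes, and the factory callable are illustrative choices, not how either SDK is implemented.

```python
from typing import Callable, Optional, Tuple

# A "sandbox" here is just a callable that takes code and returns its output.
SandboxFn = Callable[[str], str]


class OutsideHarness:
    """Always-on orchestrator that starts a sandbox lazily, on first execution."""

    def __init__(self, sandbox_factory: Callable[[], SandboxFn]) -> None:
        self._factory = sandbox_factory
        self._sandbox: Optional[SandboxFn] = None
        self.sandboxes_started = 0

    def handle(self, step: Tuple[str, str]) -> str:
        kind, payload = step
        if kind == "think":
            # Planning happens in the harness; no sandbox is needed yet.
            return f"planned: {payload}"
        if self._sandbox is None:
            # First execution step: pay the startup cost only now.
            self._sandbox = self._factory()
            self.sandboxes_started += 1
        # Untrusted code crosses the trust boundary into the sandbox here.
        return self._sandbox(payload)
```

The line where `payload` is handed to the sandbox is the trust boundary the text refers to. With the harness inside the sandbox that boundary disappears, but nothing can run until the sandbox has started.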
Today’s Independent Players
The major sandbox providers today have organized around different assumptions about what the future of agent workloads looks like.
Modal
Modal is the closest thing to a scaled incumbent. It started as a serverless compute platform and treats sandboxes as another workload on top of a broader compute layer. It supports CPU, GPU, and serverless jobs.
The tradeoff is that it is not optimized for agent semantics. If you want granular control over where your workloads run (in a VPC or on-prem), Modal’s default model can become constraining.
Modal is effectively betting that the dominant execution model for agents will look like traditional infrastructure workloads.
e2b
e2b goes back to the core sandbox use case: running arbitrary code safely. It uses Firecracker microVMs for isolation and is particularly strong for code execution.
The tradeoff is operational complexity. Teams now need to manage images, networking, and Kubernetes integrations. You’re no longer just calling an API.
e2b is betting that most agent execution will remain stateless.
Daytona
Daytona is making a more opinionated bet that sandboxes should feel like real environments. The product behaves more like a cloud devbox: persistent filesystems, long-lived sessions, and fast startup times. They use a container-based primitive to power this.
The tradeoff is that this model is heavier than what most workloads need right now. If you’re executing a quick tool call, this is overkill.
Daytona is effectively betting that the future of agents looks like persistent operating environments with filesystems, memory, and long-running state.
Beyond these three, there is a growing long tail of specialized players.
Build over Buy
At enough scale, companies stop buying sandboxes and start building them in-house.
I spoke with an AI-native search company that initially used e2b. Their early workloads centered on running untrusted code. Over time, their products became more agent-centric, and their marquee product required longer-running sessions, persistent environments, and complex orchestration. What they needed was starting to look more like an operating environment for their agents.
Eventually, they replaced e2b with a homegrown sandbox platform, hyper-optimized for their specific workloads and traffic patterns.
Once a company reaches sufficient scale, or develops sufficiently opinionated workloads, the economics and architectural incentives shift from buy to build. At that point, the sandbox layer becomes far too important to outsource to a third party.
Where the Market is Going
Market Dynamics
First, the baseline product is commoditizing. Spinning up a sandbox is no longer hard. Cold starts across providers are converging and isolation primitives are widely available. For the average workload, sandboxes will largely be interchangeable.
Second, the real competition is not other sandbox startups. Hyperscalers are already beginning to enter the market. AWS already exposes most of the primitives needed to build a custom sandbox through Lambda, ECS, and Firecracker. They already own the surrounding infrastructure and can offer far more customization than any independent provider.
Third, BYOS (bring your own sandbox) is becoming the default interface. OpenAI’s recent Agents SDK update supports multiple sandbox providers, and so does Claude Managed Agents. Switching costs will approach zero.
The execution abstraction itself is also still evolving alongside agents, which makes long-term differentiation harder to predict.
What this means
Given these dynamics, the sandbox market will likely split: commoditization for the average case and specialization for niche cases.
Simple execution workloads will be serviceable by anyone. Smaller teams will default to whatever is bundled into their hosting platform or cloud provider because the marginal benefit of choosing a standalone sandbox provider is low.
The more interesting workloads are the ones becoming persistent and stateful. Long-running agents are still early, but this is clearly where the market is heading. This is where isolation, persistence, and orchestration really matter. These workloads demand much more opinionated infrastructure.
As a result, the number of meaningful independent providers will shrink. A handful of specialized players will continue serving niche use cases. Hyperscalers will absorb the undifferentiated long tail of sandbox workloads. And companies whose sandboxing is tightly coupled with the core product will build in-house.
Today, there’s still a large part of the market that is underserved. Most teams do not have the resources to build in-house. Enterprises need significantly more control than current products provide. And agent workloads are evolving so quickly that many teams are struggling to understand what “correct” even looks like. That leaves a very large window where independent sandbox providers can continue capturing meaningful value.
The sandbox market isn’t hot for no reason. As agent development continues trending towards persistence and statefulness, sandboxes will become the operating system for agents.

