
How Baponi Works

Baponi is a sandboxed code execution platform for AI agents. You send code, it runs in an isolated sandbox with multi-layer security, and you get results back. One HTTP call. Sub-20ms overhead. No container lifecycle to manage.

Baponi runs as a cloud service or deploys entirely inside your own infrastructure. Same platform, same Helm chart. For organizations where no data can leave the VPC, Baponi runs on your Kubernetes cluster, with your identity provider, your database, and your monitoring.

This guide covers the execution model, every API parameter, how state persistence works, and how to integrate Baponi with your AI agent.

curl -X POST https://api.baponi.ai/v1/sandbox/execute \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "code": "print(sum(range(100)))",
    "language": "python"
  }'

{
  "success": true,
  "stdout": "4950\n",
  "stderr": "",
  "exit_code": 0
}

That’s a real execution in a fully isolated sandbox. Sub-20ms to set up the isolation. The rest is your code running.

If you’ve built an AI agent, you’ve probably used your LLM provider’s built-in code execution: OpenAI’s code_interpreter, Anthropic’s code_execution, and Google’s code_execution. They’re great for getting started:

# Most LLM providers make code execution a one-liner
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    tools=[{"type": "code_execution_20250825", "name": "code_execution"}],
    messages=[{
        "role": "user",
        "content": "Plot a sine wave and save it as plot.png",
    }],
)

Simple. The LLM writes code, runs it, returns the result. For a quick calculation, this is perfect. But the moment your agent needs to do real work, you hit walls:

  • No file access. Your data lives in S3 or GCS. The sandbox can’t see it. You end up base64-encoding CSVs into the prompt or building upload/download plumbing around every call.
  • No internet access. These sandboxes are fully locked down. Your agent wants to pip install a package, call an API, or fetch a URL? Can’t do it. With Baponi, you choose per sandbox: blocked (default) or unrestricted (full outbound internet).
  • No control over the runtime or resources. You get whatever Python version, packages, CPU, and memory the provider decided on. Need pandas 2.2, a custom ML model, Node.js, a longer timeout, or more compute? Can’t change any of it.
  • No access to your infrastructure. The sandbox can’t reach your databases, cloud services, or internal APIs, and you’d never want to paste credentials into LLM-generated code anyway. With Baponi, connectors inject credentials securely at runtime. Your agent just runs psql or bq query, no passwords in code.
  • Container lifecycle headaches. The provider’s sandbox stays alive for a few minutes after your call, but you never really know how long. Is it still running? Can you resume it? If it timed out, did you lose the files in it? You end up writing defensive code to pull results out before the container dies, or re-uploading everything to start over. With Baponi, there’s no container to keep alive. Every call gets a fresh sandbox in under 20ms, state is snapshotted automatically, and nothing runs between calls. No lifecycle to manage, no idle billing, no state to lose.

So you look at third-party sandbox providers. Now you get control, but the simplicity is gone and the bill starts climbing:

# Third-party sandbox SDKs: control, but you manage the lifecycle
from sandbox_provider import Sandbox

sandbox = Sandbox(image="python:3.14")         # boot a container (200-500ms)
sandbox.upload("data.csv", "/home/data.csv")   # upload files one by one
sandbox.run("pip install pandas")              # install packages
result = sandbox.run("import pandas; ...")     # finally, actual work
output = sandbox.download("/home/output.csv")  # download results
sandbox.close()                                # don't forget this

You’re now managing container lifecycle. What happens when your agent makes 50 tool calls in a conversation? What happens when the sandbox crashes mid-session? When you forget to close it? When you need to handle files going in and out of every execution?

Your agent code is suddenly more about sandbox management than about what the agent actually does.

We asked a simple question: why should you manage containers at all?

With Baponi, your API key already knows everything: what image to use, how much CPU and RAM to allocate, what storage to mount, what credentials to inject, what network policy to apply. You set that up once in the admin console. After that, your code just calls execute.

There’s no sandbox object to create in code. No container setup. No state to track. No lifecycle to manage. When you call the API, Baponi spins up a fresh isolated environment, runs your code, and tears it down. This works because our isolation boots in under 20ms, fast enough that there’s no reason to keep anything alive between calls.

Here’s how you give Claude code execution with our Python SDK:

from anthropic import Anthropic
from baponi.anthropic import code_sandbox

client = Anthropic()
response = client.beta.messages.tool_runner(
    model="claude-sonnet-4-6",
    tools=[code_sandbox],
    messages=[{
        "role": "user",
        "content": "Analyze the PDF I uploaded and give me a summary."
    }],
).get_final_response()
print(response.content[0].text)

That’s the entire integration. Claude writes code, Baponi runs it, Claude analyzes the result, all handled automatically.

Same pattern for every major framework:

from baponi.openai import code_sandbox # OpenAI Agents SDK
from baponi.google import code_sandbox # Google Gemini
from baponi.langchain import code_sandbox # LangChain
from baponi.crewai import code_sandbox # CrewAI

No container setup. No files to upload. No state to track. No cleanup.

This is where the approach really pays off. With other sandbox providers, getting data in and out means upload APIs, download APIs, temporary URLs, and file-transfer code around every call.

With Baponi, your data is already mounted:

# Your GCS bucket is mounted at /data/my-bucket/
# LLM-generated code just uses it:
import pandas as pd
df = pd.read_csv("/data/my-bucket/customers.csv")
result = df.groupby("region").agg({"revenue": "sum"})
result.to_csv("/data/my-bucket/reports/summary.csv")

Bring Your Own Bucket (BYOB): Connect your S3, GCS, or Azure bucket in the admin console. It mounts into every sandbox automatically. Your code reads and writes with normal file I/O. No upload API, no download API, no temporary URLs.

Managed volumes: Baponi-hosted storage that works exactly like BYOB, no cloud bucket setup required. First 10 GB free, expandable as you grow. Your agent writes files during execution, you upload via the API’s signed-URL endpoints, and everything is available in your sandboxes.

No file-transfer code. No base64 encoding in prompts. The LLM writes open("/data/bucket/file.csv") and it works.

Need the sandbox environment to persist between calls (installed packages, working files, in-progress computations)? Add one parameter:

from baponi import Baponi

client = Baponi(api_key="sk-...")

# Generate a unique thread_id per session: descriptive prefix + random suffix
thread_id = "data-analysis-x8k2m9p4z1"

# Call 1: install packages and start work
result = client.execute(
    "pip install --user pandas scikit-learn && echo done",
    language="bash",
    thread_id=thread_id,
)

# Call 2: everything from call 1 is still there
result = client.execute(
    "import pandas, sklearn; print(pandas.__version__)",
    language="python",
    thread_id=thread_id,
)

Each thread_id should be unique. We recommend a descriptive prefix plus a random suffix (like data-analysis-x8k2m9p4z1) so it’s easy to identify in logs.
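A small helper following that convention; the prefix and suffix length here are illustrative choices, not requirements of the API:

```python
import secrets

def new_thread_id(prefix: str) -> str:
    # Descriptive prefix plus a random suffix. Thread IDs allow alphanumerics,
    # hyphens, and underscores, up to 128 characters.
    suffix = secrets.token_hex(5)  # 10 lowercase hex characters
    return f"{prefix}-{suffix}"

thread_id = new_thread_id("data-analysis")  # e.g. "data-analysis-9f3c1a07be"
```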

Baponi saves only the diff of the home directory to cloud storage after each call and restores it on the next. Between calls, nothing is running. It’s a saved state, not a running process. Diffs are typically small (installed packages, working files), so thread snapshots barely cost anything. Pick it up five minutes later or five days later, same result.

And here’s the key insight: most use cases don’t even need thread_id. If your data lives in mounted storage (/data/*), sandboxes are fully ephemeral while your data is always persistent. Mount your bucket, keep sandboxes stateless. You get the simplicity of ephemeral execution with the durability of cloud storage.

All of this (the image, resources, storage mounts, credentials, network policy) is configured once in the admin console and attached to a sandbox. Your API key points to that sandbox. When your code calls execute, it inherits everything automatically.

The developer writing the agent never thinks about infrastructure. The admin who set up the sandbox never thinks about agent logic. Clean separation.

For self-hosted deployments, the same admin console runs inside your cluster. Same configuration, same API, same developer experience, just deployed on infrastructure you control.

| Parameter | Required | Default | Description |
|---|---|---|---|
| code | Yes | - | The code to execute. 1 byte to 1 MB. |
| language | No | python | python, node, or bash. Must be supported by the sandbox’s image. |
| timeout | No | 60 | Maximum execution time in seconds (1–3600, plan-dependent). |
| thread_id | No | - | Enables persistent /home directory across calls. Alphanumeric, hyphens, underscores, max 128 chars. See stateful execution. |
| metadata | No | - | Key-value pairs for audit logging. Up to 10 keys. Not sent to the sandbox, purely for your records. |

Five parameters. One required. See the Execute API reference for the full request/response specification, error codes, and worked examples.
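As a sketch, here is a request body that exercises all five parameters at once (the values are illustrative); only code is required:

```python
import json

# All five Execute API parameters in one request body.
payload = {
    "code": "print(sum(range(10)))",
    "language": "python",                 # python, node, or bash
    "timeout": 120,                       # seconds, within the plan's limit
    "thread_id": "report-job-a1b2c3",     # opt in to a persistent /home
    "metadata": {"user_id": "user_abc"},  # audit-only, never sent to the sandbox
}
body = json.dumps(payload)
# Send with any HTTP client, e.g.:
# requests.post("https://api.baponi.ai/v1/sandbox/execute",
#               headers={"Authorization": "Bearer sk-..."}, data=body)
```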

Everything else (the runtime image, CPU, RAM, network policy, storage mounts, injected credentials, environment variables) lives in the sandbox configuration behind your API key. Your execute call doesn’t need to know about any of it.

We deliberately kept the request surface minimal. The reasoning:

  • code + language are obviously required to run anything.
  • timeout belongs on the request because different code snippets need different time budgets. A quick calculation needs 5 seconds; a data pipeline might need 10 minutes. The caller knows this at invocation time; the sandbox admin doesn’t.
  • thread_id is per-conversation state, not per-sandbox config. The AI agent decides whether this execution belongs to an ongoing session. It would be wrong to configure this on the sandbox.
  • metadata is caller context (which user triggered this, which model, which session). It’s audit data that the sandbox has no business seeing.

Everything else (image, CPU, RAM, network, storage, credentials) is an infrastructure decision. It doesn’t change between requests and shouldn’t be in the hot path. Configure it once, forget about it.

| Field | Description |
|---|---|
| success | true if exit code is 0. |
| stdout | Captured standard output. |
| stderr | Captured standard error. |
| exit_code | Process exit code (0 = success). |
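A minimal way to consume that response shape in agent code (the helper name is ours, not part of the SDK):

```python
def summarize_result(result: dict) -> str:
    # "success" mirrors exit_code == 0, so either field works as the check.
    if result["success"]:
        return result["stdout"].strip()
    return f"failed (exit {result['exit_code']}): {result['stderr'].strip()}"

ok = summarize_result({"success": True, "stdout": "4950\n", "stderr": "", "exit_code": 0})
err = summarize_result({"success": False, "stdout": "", "stderr": "NameError: name 'x' is not defined\n", "exit_code": 1})
```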

Ephemeral by default, stateful when you need it


Every execution has three filesystem zones with different persistence rules:

| Path | Without thread_id | With thread_id |
|---|---|---|
| /tmp | Ephemeral (RAM disk) | Ephemeral (RAM disk) |
| /home/baponi | Ephemeral (RAM disk) | Persistent, synced to cloud storage |
| /data/* | Always persistent (cloud storage) | Always persistent (cloud storage) |

Use thread_id when you need the sandbox environment to survive between calls: installed packages, downloaded files, in-progress work in the home directory.

# Call 1: install pandas
curl -X POST .../v1/sandbox/execute \
  -d '{"code": "pip install --user pandas && echo done", "language": "bash", "thread_id": "session-1"}'

# Call 2: pandas is already there
curl -X POST .../v1/sandbox/execute \
  -d '{"code": "import pandas; print(pandas.__version__)", "language": "python", "thread_id": "session-1"}'

Between these calls, nothing is running. Baponi saves only the diff of /home/baponi, not the entire image, to cloud storage after the first call and restores it before the second. The thread isn’t a live process sitting idle. It’s a saved filesystem state. Diffs are typically small (installed packages, config files, working data), so thread snapshots add very little to your storage usage.

Only one execution can use a thread_id at a time. A second request with the same thread_id will fail if one is already running. Different thread IDs execute fully in parallel.
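The rule is easy to picture with per-thread locks. This is a local, illustrative sketch of the semantics, not Baponi's implementation (enforcement happens server-side):

```python
import threading

_locks: dict[str, threading.Lock] = {}

def try_begin(thread_id: str) -> bool:
    # One in-flight execution per thread_id; a non-blocking acquire models
    # the "second concurrent request fails" behavior.
    lock = _locks.setdefault(thread_id, threading.Lock())
    return lock.acquire(blocking=False)

first = try_begin("session-1")      # first caller wins
duplicate = try_begin("session-1")  # same thread_id already running: rejected
parallel = try_begin("session-2")   # different thread_ids are independent
```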

How long does thread state last? That’s configured on the sandbox when you create it in the admin console. Options: 24 hours, 7 days, 30 days, or forever (the default). Thread snapshots count toward your storage billing. First 10 GB is free.

If your data lives in mounted storage (/data/*), you often don’t need thread_id at all:

# No thread_id needed - /data/my-bucket/ is always there
import json
data = json.load(open("/data/my-bucket/input.json"))
result = process(data)
json.dump(result, open("/data/my-bucket/output/result.json", "w"))

The sandbox is fully ephemeral. Your data lives in the bucket. Baponi mounts it fresh on every call and syncs writes back before returning.

Metadata: audit trail, not execution context


The metadata field accepts up to 10 key-value pairs:

{
  "metadata": {
    "user_id": "user_abc",
    "session_id": "chat_xyz",
    "model": "claude-sonnet-4"
  }
}

These are stored in your execution logs for auditing and filtering. They are not sent to the sandbox. Your code never sees them. Use them to trace which user, session, or model triggered each execution.

Your API key points to one sandbox configuration. That configuration determines the full execution environment:

| Setting | Where it’s configured | What it does |
|---|---|---|
| Runtime image | Sandbox | Python 3.14, Node.js 25, Bash, or your own OCI image with custom runtimes and packages. |
| CPU | Sandbox | 0.5–4 cores. RAM scales 1:1 (2 CPU = 2 GiB RAM). |
| Max timeout | Plan tier | 60s (Free), 3600s (Pro), unlimited (Enterprise). |
| Network policy | Sandbox + API key | blocked (default) or unrestricted. API key can restrict further, never relax. |
| Storage connections | Admin console | Which S3/GCS/Azure buckets mount where, with per-key path scoping. |
| Connectors | Admin console | Which database, cloud, or API credentials to inject. |
| Environment variables | Sandbox, API key, or request | Custom key-value pairs injected into the sandbox (e.g., API_BASE_URL, LOG_LEVEL). Set at sandbox level (applies to all keys), API key level (applies to all requests with that key), or per-request in the env_vars body field. Most specific scope wins when the same key appears at multiple levels. |

All set-and-forget. Configure once, use the API key forever. Per-request env_vars are the exception - they let callers pass execution-specific configuration without changing the admin console setup.
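The "most specific scope wins" rule amounts to a simple layered merge. An illustrative sketch, not the server's actual code:

```python
def effective_env(sandbox: dict, api_key: dict, request: dict) -> dict:
    # Later updates overwrite earlier ones: sandbox < API key < request.
    merged = dict(sandbox)
    merged.update(api_key)
    merged.update(request)
    return merged

env = effective_env(
    {"LOG_LEVEL": "info", "API_BASE_URL": "https://internal.example"},  # sandbox level
    {"LOG_LEVEL": "warn"},   # API key level
    {"LOG_LEVEL": "debug"},  # per-request env_vars
)
```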

The API key can narrow what the sandbox allows (restrict storage to a sub-path, override network to blocked, limit connector access) but can never broaden it. This lets you issue scoped keys for different agents or tenants from a single sandbox configuration.

Baponi gives you two ways to mount persistent data into sandboxes.

Don’t want to manage your own cloud bucket? Baponi gives every organization storage that works exactly like BYOB, with the first 10 GB free and expandable as you grow. Same mount behavior, same file I/O, same persistence. Files get there the same way they get anywhere: your agent writes them during execution, or you upload via the Files API signed-URL endpoints. The admin console lets you browse and manage what’s stored. Your data is private to your organization.

Connect your own S3, GCS, or Azure Blob Storage bucket in the admin console. It mounts into every sandbox automatically, so your code uses normal file operations:

import pandas as pd
# GCS bucket "company-data" mounted at /data/company-data/
df = pd.read_csv("/data/company-data/customers/q1-2026.csv")
summary = df.groupby("region").agg({"revenue": "sum"})
summary.to_csv("/data/company-data/reports/q1-summary.csv")

No cloud SDK in your code. No credentials in your code. Just file paths. Writes sync back to your bucket automatically before the execution response returns.

Baponi enforces path constraints at three levels. Each level can only narrow the scope - never widen it:

  1. Connection prefix - When you add a bucket in the admin console, you can set an optional prefix that constrains all access to a specific subtree. This is the broadest boundary. Use it to lock down a connection to a specific folder - for example, when you’ve agreed with a partner to only access a specific path in their bucket, or when you want to guarantee no API key can wander outside a designated area.

  2. API key prefix - When you create an API key, you can set a deeper prefix per storage connection. It must start with the connection prefix. Use it for per-agent or per-tenant isolation.

  3. Request sub_paths - Callers can pass sub_paths in the request body to narrow the mount to a specific subdirectory for a single execution. It must respect both the connection and API key prefixes.

Example: a partner shares their bucket partner-data restricted to shared/acme/. You issue per-tenant API keys:

| Level | Prefix | What the sandbox sees at /data/partner-data/ |
|---|---|---|
| Connection | shared/acme/ | All keys confined to shared/acme/ |
| API key for Tenant A | shared/acme/tenant-a/ | Only Tenant A’s data |
| API key for Tenant B | shared/acme/tenant-b/ | Only Tenant B’s data |

Path scoping is enforced server-side. The AI agent can’t break out of its assigned prefix. It doesn’t even know other prefixes exist. Prefix matching is segment-aware (tenants/a matches tenants/a/subdir but not tenants/ab), and path traversal attempts (.., URL-encoded sequences) are rejected.
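The segment-aware check can be sketched in a few lines. This is an illustrative re-implementation of the stated behavior, not Baponi's actual code:

```python
def within_prefix(path: str, prefix: str) -> bool:
    # Reject path traversal outright, then match on whole path segments:
    # "tenants/a" covers "tenants/a/subdir" but not "tenants/ab".
    if ".." in path.split("/"):
        return False
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")

inside = within_prefix("tenants/a/subdir/file.csv", "tenants/a")
sibling = within_prefix("tenants/ab/file.csv", "tenants/a")
escape = within_prefix("tenants/a/../b/file.csv", "tenants/a")
```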

For the full constraint model, worked examples, and security properties, see the Storage Path Scoping guide.

BYOB storage is always free and unlimited. You pay your cloud provider for the bucket; Baponi doesn’t charge for the mount.

Connectors: credentials your code never sees


Connectors solve a common problem: how do you let an AI agent query your database or access cloud services without exposing credentials in code?

Configure connectors in the admin console: PostgreSQL, MySQL, BigQuery, Redis, MongoDB, S3, GCS, Azure, GitHub (9 types today). Attach them to an API key. Baponi injects the credentials as standard config files that CLI tools expect:

| Connector | What gets injected | What your code runs |
|---|---|---|
| PostgreSQL | .pgpass + PGPASSFILE | psql -h host -d db -c "SELECT ..." |
| AWS S3 | .aws/credentials + AWS_PROFILE | aws s3 cp s3://bucket/file . |
| BigQuery | ADC JSON + GOOGLE_APPLICATION_CREDENTIALS | bq query "SELECT ..." |
| GitHub | GH_TOKEN | gh repo clone org/repo |
| Redis | REDIS_HOST + REDISCLI_AUTH | redis-cli GET key |

Your code uses standard tools and standard environment variables. No Baponi-specific SDK or wrapper.

Credentials exist only in a RAM-based mount during execution. They’re never written to the sandbox filesystem, never visible in environment variable dumps, and destroyed the moment execution completes. Even with thread_id, credentials are injected fresh on every call and never persisted in thread state.

Self-hosted deployments can leverage your existing Workload Identity and IAM, no service account keys to manage at all.

Blocked (default): No network interface at all. No DNS, no outbound connections. The sandbox processes data from storage mounts and returns results. This is the right choice for most AI agent workloads.

Unrestricted: Full outbound internet access. Internal networks (private IPs, cloud metadata endpoints) are always blocked at the network layer. Bandwidth scales with CPU allocation: 200 Mbps per core.

The network policy is set on the sandbox configuration. An API key can downgrade unrestricted to blocked, but never the reverse, so you can issue a “read-only analysis” key from a sandbox that normally has network access.

The SDK has ready-made integrations for every major LLM framework. Here’s a complete example with Anthropic using tool_runner. Claude writes code, Baponi runs it, Claude analyzes the result, all in one call:

from anthropic import Anthropic
from baponi.anthropic import code_sandbox

client = Anthropic()
response = client.beta.messages.tool_runner(
    model="claude-sonnet-4-6",
    tools=[code_sandbox],
    messages=[{
        "role": "user",
        "content": "Analyze the PDF I uploaded and give me a summary."
    }],
).get_final_response()
print(response.content[0].text)

Same pattern for OpenAI Agents SDK, Google Gemini, LangChain, and CrewAI. Each has a code_sandbox tool you import and plug in:

from baponi.openai import code_sandbox # Agent(tools=[code_sandbox])
from baponi.google import code_sandbox # config={"tools": [code_sandbox]}
from baponi.langchain import code_sandbox # create_react_agent(llm, tools=[code_sandbox])
from baponi.crewai import code_sandbox # Agent(role="coder", tools=[code_sandbox])

For custom configuration (API key, default thread_id, base URL for self-hosted), each module has a create_code_sandbox() factory.

No SDK needed. The API is a single endpoint. Any HTTP client works:

import requests

BAPONI_URL = "https://api.baponi.ai/v1/sandbox/execute"
BAPONI_KEY = "sk-..."

# Define the tool yourself
tool = {
    "name": "run_code",
    "description": "Run Python, Node.js, or Bash code in a sandboxed environment with access to data files at /data/.",
    "input_schema": {
        "type": "object",
        "properties": {
            "code": {"type": "string", "description": "Code to execute"},
            "language": {"type": "string", "enum": ["python", "node", "bash"], "default": "python"},
        },
        "required": ["code"],
    },
}

# When the LLM calls the tool:
result = requests.post(
    BAPONI_URL,
    headers={"Authorization": f"Bearer {BAPONI_KEY}"},
    json={"code": tool_call.input["code"], "language": tool_call.input.get("language", "python")},
).json()
# result["stdout"], result["stderr"], result["success"]

Baponi is a native MCP (Model Context Protocol) server. If your tool supports MCP (Claude Desktop, Claude Code, Cursor, Windsurf, or any MCP-compatible agent framework), add Baponi with a server URL and an API key. The client discovers available tools automatically (sandbox_execute, files_download) and the LLM calls them directly.

No tool definitions. No forwarding code. No SDK. The LLM talks to Baponi directly. See the MCP Protocol reference for client configuration, transport details, and all available methods.

What languages does Baponi support? Python 3.14, Node.js 25, and Bash out of the box. Import your own OCI image through the admin console for custom runtimes.

How is this different from running code in Docker? Docker gives you containers you manage: pull images, start, stop, clean up. Baponi gives you an API endpoint. You send code, get results. No images to pull, no lifecycle to track. The isolation is deeper than Docker’s defaults (multi-layer sandboxing, seccomp syscall filtering, cgroups resource limits, zero capabilities) with less operational work on your end.

What happens if two requests use the same thread_id at once? The second request fails. Only one execution can use a given thread_id at a time to prevent filesystem conflicts. Different thread_id values execute fully in parallel.

How long is thread_id state retained? Retention is configured on the sandbox when you create it in the admin console. Options: 24 hours, 7 days, 30 days, or forever (the default). Only the diff is stored, not the full image, so thread snapshots are typically very small. They count toward your storage billing (first 10 GB free).

Can sandboxes access the internet? Only when configured with unrestricted network policy. The default is blocked, no outbound connections at all.

What’s the maximum execution time? 60 seconds on Free, 1 hour on Pro, configurable on Enterprise. See pricing for full tier details.

Can I use my own Docker image? Yes. Import any OCI-compatible image through the admin console. Baponi auto-discovers available interpreters, system libraries, and architecture.

Are credentials safe? Credentials exist only in a RAM-based mount during execution. They’re never written to disk, never persisted in thread state, and destroyed when execution completes. Only allowlisted environment variables (like PGPASSFILE or AWS_PROFILE) are injected. No raw passwords in the environment.

What’s the overhead per execution? Typically 12–18ms for sandbox setup. This is the time to create the isolated environment and set up mounts. Everything after that is your code running.

Can I run Baponi in my own infrastructure? Yes. Baponi deploys to any Kubernetes cluster with a single Helm chart. Your cloud, your VPC, your rules. Bring your own SSO/OIDC provider, your own PostgreSQL, your own monitoring stack. The same platform, same API, same admin console, just running on infrastructure you control. We also support air-gapped deployments for the most restricted environments.

Is there a Python or Node.js SDK? The Python SDK (pip install baponi) provides execute() for inline results, execute_stream() for real-time NDJSON streaming, and execute_webhook() for async webhook delivery. It also includes ready-made tool integrations for Anthropic, OpenAI, Google Gemini, LangChain, and CrewAI. The API is also a single HTTP endpoint, any HTTP client works without the SDK. For MCP-compatible tools, integration is zero-code.
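Since execute_stream() delivers NDJSON, each line of the stream is an independent JSON record you can parse as it arrives. The event schema below is an assumption for illustration, not the SDK's documented format:

```python
import json

# Parse a hypothetical NDJSON stream of execution events line by line.
raw = (
    b'{"stream": "stdout", "data": "step 1\\n"}\n'
    b'{"stream": "stdout", "data": "step 2\\n"}\n'
    b'{"stream": "stderr", "data": "warning\\n"}\n'
)
events = [json.loads(line) for line in raw.splitlines() if line.strip()]
stdout = "".join(e["data"] for e in events if e["stream"] == "stdout")
```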