Rate Limiting, Sandboxing & Execution Controls
Rate limiting strategies for AI APIs, sandboxing code execution with E2B and Docker, tool call approval workflows, and the principle of least privilege for AI agents.
Architecture-level controls are the hardest defenses for attackers to bypass because they operate outside the model's influence. A jailbroken model that wants to execute malicious code cannot do so if execution is sandboxed. An agent tricked into calling dangerous tools cannot do so if those tools require human approval.
Rate Limiting Strategies
Rate limiting for AI APIs differs from traditional web API rate limiting because AI requests have highly variable cost (both computational and financial).
Dimensions of Rate Limiting
| Dimension | What It Limits | Why It Matters |
|---|---|---|
| Requests per minute | Raw request count | Prevents automated attack tools |
| Tokens per minute | Total input + output tokens | Prevents cost abuse and context stuffing |
| Concurrent requests | Simultaneous in-flight requests | Prevents resource exhaustion |
| Cost per hour | Dollar cost of inference | Prevents financial damage from exploitation |
| Requests per session | Messages within a conversation | Prevents multi-turn escalation |
Implementation Pattern
```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class RateLimitConfig:
    requests_per_minute: int = 20
    tokens_per_minute: int = 40_000
    max_input_tokens: int = 4_096
    max_output_tokens: int = 4_096
    max_session_messages: int = 50
    cost_limit_per_hour_usd: float = 10.0


class AIRateLimiter:
    def __init__(self, config: RateLimitConfig):
        self.config = config
        # Per-user sliding window of (timestamp, input_tokens) records
        self.windows: dict[str, list[tuple[datetime, int]]] = {}

    def check_and_record(
        self, user_id: str, input_tokens: int, session_messages: int
    ) -> tuple[bool, str]:
        now = datetime.now(timezone.utc)
        window_start = now - timedelta(minutes=1)
        key = f"rpm:{user_id}"
        recent = [(t, n) for t, n in self.windows.get(key, []) if t > window_start]
        # Request rate check
        if len(recent) >= self.config.requests_per_minute:
            return False, "Rate limit exceeded: too many requests"
        # Token rate check
        if sum(n for _, n in recent) + input_tokens > self.config.tokens_per_minute:
            return False, "Rate limit exceeded: token budget for this minute"
        # Input size check
        if input_tokens > self.config.max_input_tokens:
            return False, f"Input too large: {input_tokens} tokens"
        # Session length check
        if session_messages > self.config.max_session_messages:
            return False, "Session message limit reached"
        recent.append((now, input_tokens))
        self.windows[key] = recent
        return True, "OK"
```
Red Team Bypass Techniques for Rate Limiting
| Technique | Description | Mitigation |
|---|---|---|
| Distributed requests | Use multiple API keys or accounts | Per-organization aggregate limits |
| Slow-and-steady | Stay just below the rate limit | Behavioral analysis over longer windows |
| Session rotation | Start new sessions to reset session limits | Per-user (not per-session) tracking |
| Off-peak timing | Attack during low-traffic periods when dynamic limits are higher | Fixed rate limits regardless of load |
Sandboxing Code Execution
When AI agents generate and execute code, sandboxing is critical. Without it, a compromised agent can access the host filesystem, network, and other infrastructure.
E2B provides cloud-hosted sandboxed environments specifically designed for AI code execution:
```python
from e2b_code_interpreter import Sandbox

# Create an isolated sandbox with a 30-second timeout
sandbox = Sandbox(timeout=30)

# Execute AI-generated code in isolation
result = sandbox.run_code("""
import os
# This runs in a completely isolated environment:
# no access to the host filesystem, network restrictions applied
print(os.listdir('/'))  # Only sees the sandbox filesystem
""")

print(result.text)  # Output from sandboxed execution
sandbox.kill()  # Clean up
```
Security properties:
- Isolated filesystem (no host access)
- Network restrictions (configurable allowlists)
- CPU/memory limits
- Automatic timeout and cleanup
- No persistence between executions
Docker containers provide process-level isolation for code execution:
```python
import docker

client = docker.from_env()

def execute_sandboxed(code: str, timeout: int = 10) -> str:
    container = client.containers.run(
        "python:3.11-slim",
        command=["python", "-c", code],
        network_disabled=True,               # No network access
        mem_limit="256m",                    # Memory cap
        cpu_period=100_000,                  # CPU throttling:
        cpu_quota=50_000,                    # quota/period = 50% of one core
        read_only=True,                      # Read-only filesystem
        security_opt=["no-new-privileges"],  # No privilege escalation
        detach=True,                         # Run in background to enforce timeout
    )
    try:
        container.wait(timeout=timeout)      # Raises if the code runs too long
        return container.logs().decode("utf-8")
    finally:
        container.remove(force=True)         # Clean up even on timeout or error
```
Key Docker flags for AI sandboxing:
- `network_disabled=True` -- prevents data exfiltration
- `read_only=True` -- prevents filesystem modification
- `mem_limit` -- prevents memory exhaustion attacks
- `security_opt=["no-new-privileges"]` (CLI: `--security-opt=no-new-privileges`) -- prevents privilege escalation
WASM runtimes provide the strongest isolation for lightweight code execution:
```python
# Pyodide runs Python in a WASM sandbox with no system access.
# Illustrative API: `pyodide_sandbox` / `PySandbox` stand in for whatever
# WASM-backed sandbox wrapper your stack provides (e.g. Pyodide on wasmtime).
from pyodide_sandbox import PySandbox

sandbox = PySandbox(
    allowed_imports=["math", "json", "re"],  # Allowlist of importable modules
    max_execution_time_ms=5000,
    max_memory_mb=64,
)
result = sandbox.execute(ai_generated_code)
```
Trade-off: strongest isolation but the most limited capability. No filesystem, no network, restricted standard library.
Tool Call Approval Workflows
For AI agents that can perform real-world actions (sending emails, modifying databases, making purchases), approval gates prevent unauthorized actions.
Approval Architecture
```python
from enum import Enum

class ApprovalPolicy(Enum):
    AUTO_APPROVE = "auto"          # Low-risk tools
    NOTIFY_AND_PROCEED = "notify"  # Medium-risk: log but allow
    REQUIRE_APPROVAL = "approve"   # High-risk: block until a human approves
    ALWAYS_DENY = "deny"           # Forbidden tools

TOOL_POLICIES = {
    "search_web": ApprovalPolicy.AUTO_APPROVE,
    "read_file": ApprovalPolicy.NOTIFY_AND_PROCEED,
    "send_email": ApprovalPolicy.REQUIRE_APPROVAL,
    "execute_sql": ApprovalPolicy.REQUIRE_APPROVAL,
    "delete_record": ApprovalPolicy.ALWAYS_DENY,
    "modify_permissions": ApprovalPolicy.ALWAYS_DENY,
}

class ToolGatekeeper:
    """Enforces approval policies. `log_notification` and
    `request_human_approval` are assumed to be implemented elsewhere."""

    def __init__(self, policies: dict[str, ApprovalPolicy]):
        self.policies = policies

    async def check_tool_call(
        self, tool_name: str, arguments: dict, user_id: str
    ) -> tuple[bool, str]:
        # Unknown tools fail closed
        policy = self.policies.get(tool_name, ApprovalPolicy.ALWAYS_DENY)
        if policy == ApprovalPolicy.AUTO_APPROVE:
            return True, "Auto-approved"
        elif policy == ApprovalPolicy.NOTIFY_AND_PROCEED:
            await self.log_notification(tool_name, arguments, user_id)
            return True, "Approved with notification"
        elif policy == ApprovalPolicy.REQUIRE_APPROVAL:
            approved = await self.request_human_approval(
                tool_name, arguments, user_id
            )
            return approved, "Human review"
        else:
            return False, f"Tool '{tool_name}' is forbidden"
```
Red Team Considerations
| Attack | Approval Control | Bypass Potential |
|---|---|---|
| Direct tool call to forbidden tool | ALWAYS_DENY | None -- architecturally blocked |
| Argument injection in allowed tool | AUTO_APPROVE on the tool | High -- arguments are not reviewed |
| Tool chaining (safe tools → unsafe outcome) | Individual tool policies | Medium -- chain creates emergent risk |
| Approval fatigue (many low-risk requests) | REQUIRE_APPROVAL | Medium -- humans rubber-stamp after many approvals |
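The argument-injection row is the most commonly exploited gap: a tool can be safe in general but dangerous with hostile arguments, and AUTO_APPROVE policies never look at them. One mitigation is per-tool argument validators that run before any approval policy is consulted. A hedged sketch (the validator registry, rules, and the `example.com` domain are illustrative):

```python
import re
from typing import Callable, Optional

# Per-tool argument validators; each returns an error string, or None if OK.
ARG_VALIDATORS: dict[str, Callable[[dict], Optional[str]]] = {
    "read_file": lambda args: (
        "path traversal blocked" if ".." in args.get("path", "") else None
    ),
    "search_web": lambda args: (
        "query too long" if len(args.get("query", "")) > 500 else None
    ),
    "send_email": lambda args: (
        None
        if re.fullmatch(r"[\w.+-]+@example\.com", args.get("to", ""))
        else "recipient outside allowed domain"
    ),
}


def validate_arguments(tool_name: str, arguments: dict) -> tuple[bool, str]:
    """Check a tool call's arguments, independent of its approval policy."""
    validator = ARG_VALIDATORS.get(tool_name)
    if validator is None:
        return False, f"No validator registered for '{tool_name}'"  # Fail closed
    error = validator(arguments)
    if error:
        return False, f"Argument check failed: {error}"
    return True, "Arguments OK"
```

Failing closed on unregistered tools means adding a tool without thinking about its arguments is itself a blocked action, which also narrows the tool-chaining row: each link in a chain is validated on its own.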
Principle of Least Privilege for AI Agents
Least privilege, granting an agent only the tools, data access, and permissions its task strictly requires, is the most important architectural defense for agentic AI systems.
Implementation Checklist
| Principle | Implementation | Example |
|---|---|---|
| Minimal tool set | Only register tools the agent actually needs | Customer service bot gets search_faq and create_ticket, not execute_sql |
| Read-before-write | Default to read-only access; write requires explicit grant | Agent can read database but not modify it without elevated session |
| Scoped credentials | Each tool gets credentials limited to its function | Email tool can only send from a specific address to allowed domains |
| Time-limited access | Permissions expire after a session or time window | Database write access revoked after 5 minutes |
| Audit trail | Log every tool invocation with full arguments | Searchable log of all agent actions for forensic review |
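The "scoped credentials" and "time-limited access" rows can be combined into a single grant object that is checked on every tool call. A minimal sketch (the `ToolGrant` class and `grant_db_write` helper are illustrative, not from any specific framework):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ToolGrant:
    """A scoped, expiring permission for one tool."""
    tool_name: str
    scopes: frozenset          # e.g. frozenset({"read"}) or {"read", "write"}
    expires_at: datetime

    def allows(self, tool_name: str, scope: str,
               now: datetime = None) -> bool:
        now = now or datetime.now(timezone.utc)
        return (
            tool_name == self.tool_name
            and scope in self.scopes
            and now < self.expires_at  # Expired grants deny everything
        )


def grant_db_write(minutes: int = 5) -> ToolGrant:
    """Elevated database write access that revokes itself after `minutes`."""
    return ToolGrant(
        tool_name="database",
        scopes=frozenset({"read", "write"}),
        expires_at=datetime.now(timezone.utc) + timedelta(minutes=minutes),
    )
```

Because expiry is checked at call time rather than revoked by a background job, a forgotten grant simply stops working, which is the fail-closed behavior least privilege demands.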
Further Reading
- Defense-in-Depth for LLM Apps -- where execution controls fit in the defense stack
- Runtime Monitoring & Anomaly Detection -- detecting bypass attempts
- Guardrails & Safety Layer Architecture -- complementary guardrail layers
- Tool Abuse -- attacks that these controls defend against
Related Topics
- Defense-in-Depth for LLM Apps - Where execution controls fit in the defense stack
- Runtime Monitoring & Anomaly Detection - Detecting bypass attempts against execution controls
- Agent Architectures & Tool Use Patterns - The agent patterns these controls defend
- Guardrails & Safety Layer Architecture - Complementary guardrail layers
References
- "E2B Documentation: AI Code Execution Sandbox" - E2B (2025) - Documentation for the cloud-hosted sandboxed execution environment designed for AI agents
- "Docker Security Best Practices" - Docker Inc. (2025) - Official security guidance for container isolation including network disabling and read-only filesystems
- "Principle of Least Privilege in Modern Applications" - NIST SP 800-53 (2023) - Federal security control defining least privilege requirements applicable to AI agent tool access
- "OWASP Top 10 for LLM Applications: LLM08 Excessive Agency" - OWASP (2025) - Risk classification for over-permissioned AI agents that execution controls mitigate