Rate Limiting, Sandboxing & Execution Controls
Rate limiting strategies for AI APIs, sandboxing code execution with E2B and Docker, tool call approval workflows, and the principle of least privilege for AI agents.
Architecture-level controls are the hardest defenses for attackers to bypass because they operate outside the model's influence. A jailbroken model that wants to execute malicious code cannot do so if execution is sandboxed. An agent tricked into calling dangerous tools cannot do so if those tools require human approval.
Rate Limiting Strategies
Rate limiting for AI APIs differs from traditional web API rate limiting because AI requests have highly variable cost, both computational and financial.
Dimensions of Rate Limiting
| Dimension | What It Limits | Why It Matters |
|---|---|---|
| Requests per minute | Raw request count | Prevents automated attack tools |
| Tokens per minute | Total input + output tokens | Prevents cost abuse and context stuffing |
| Concurrent requests | Simultaneous in-flight requests | Prevents resource exhaustion |
| Cost per hour | Dollar cost of inference | Prevents financial damage from exploits |
| Requests per session | Messages within a conversation | Prevents multi-turn escalation |
Implementation Pattern

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class RateLimitConfig:
    requests_per_minute: int = 20
    tokens_per_minute: int = 40_000
    max_input_tokens: int = 4_096
    max_output_tokens: int = 4_096
    max_session_messages: int = 50
    cost_limit_per_hour_usd: float = 10.0

class AIRateLimiter:
    def __init__(self, config: RateLimitConfig):
        self.config = config
        self.windows: dict[str, list[datetime]] = {}

    def check_and_record(
        self, user_id: str, input_tokens: int, session_messages: int
    ) -> tuple[bool, str]:
        now = datetime.now(timezone.utc)
        window_start = now - timedelta(minutes=1)
        key = f"rpm:{user_id}"

        # Request rate check: count requests in the trailing one-minute window
        recent = [t for t in self.windows.get(key, []) if t > window_start]
        if len(recent) >= self.config.requests_per_minute:
            return False, "Rate limit exceeded: too many requests"

        # Input size check
        if input_tokens > self.config.max_input_tokens:
            return False, f"Input too large: {input_tokens} tokens"

        # Session length check
        if session_messages > self.config.max_session_messages:
            return False, "Session message limit reached"

        recent.append(now)
        self.windows[key] = recent
        return True, "OK"
```

Red Team Bypass Techniques for Rate Limiting
| Technique | Description | Mitigation |
|---|---|---|
| Distributed requests | Use multiple API keys or accounts | Per-organization aggregate limits |
| Slow-and-steady | Stay just below the rate limit | Behavioral analysis over longer windows |
| Session rotation | Start new sessions to reset session limits | Per-user (not per-session) tracking |
| Off-peak timing | Attack during low-traffic periods when dynamic limits are higher | Fixed rate limits regardless of load |
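The per-organization mitigation in the first row can be sketched as a shared token budget keyed by organization rather than by API key, so rotating keys or accounts does not reset the limit. The `OrgAggregateLimiter` class and its limits below are illustrative, not a library API:

```python
from collections import defaultdict, deque
from time import monotonic

class OrgAggregateLimiter:
    """Track token usage per organization so distributing requests
    across multiple API keys cannot bypass the limit."""

    def __init__(self, org_tokens_per_minute: int = 100_000):
        self.limit = org_tokens_per_minute
        # org_id -> deque of (timestamp, tokens) events
        self.events: dict[str, deque] = defaultdict(deque)

    def allow(self, org_id: str, api_key: str, tokens: int) -> bool:
        # api_key is accepted but ignored: the budget is per organization
        now = monotonic()
        window = self.events[org_id]
        # Evict events older than the 60-second window
        while window and now - window[0][0] > 60:
            window.popleft()
        used = sum(t for _, t in window)
        if used + tokens > self.limit:
            return False
        window.append((now, tokens))
        return True

limiter = OrgAggregateLimiter(org_tokens_per_minute=1000)
assert limiter.allow("acme", "key-1", 600)
assert not limiter.allow("acme", "key-2", 600)  # Different key, same org budget
```

A production version would back the window with shared storage (e.g. Redis) so all API frontends see the same per-organization totals.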
Sandboxing Code Execution
When AI agents generate and execute code, sandboxing is critical. Without it, a compromised agent can access the host filesystem, network, and other infrastructure.
E2B provides cloud-hosted sandboxed environments specifically designed for AI code execution:
```python
from e2b_code_interpreter import Sandbox

# Create an isolated sandbox with a 30-second timeout
sandbox = Sandbox(timeout=30)

# Execute AI-generated code in isolation
result = sandbox.run_code("""
import os
# This runs in a completely isolated environment:
# no host filesystem access, network restrictions applied
print(os.listdir('/'))  # Only sees the sandbox filesystem
""")

print(result.text)  # Output from sandboxed execution
sandbox.kill()  # Clean up
```

Security properties:
- Isolated filesystem (no host access)
- Network restrictions (configurable allowlists)
- CPU/memory limits
- Automatic timeout and cleanup
- No persistence between executions
Docker containers provide process-level isolation for code execution:
```python
import docker
import os
import tempfile

client = docker.from_env()

def execute_sandboxed(code: str, timeout: int = 10) -> str:
    # Write the AI-generated code to a temp file mounted read-only
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".py", delete=False
    ) as f:
        f.write(code)
        path = f.name
    name = os.path.basename(path)
    container = client.containers.run(
        "python:3.11-slim",
        command=f"python /code/{name}",
        volumes={path: {"bind": f"/code/{name}", "mode": "ro"}},
        network_disabled=True,  # No network access
        mem_limit="256m",       # Memory cap
        cpu_period=100_000,     # CPU throttling: quota/period = 50% of one core
        cpu_quota=50_000,
        read_only=True,         # Read-only filesystem
        detach=True,            # Run in background so a timeout can be enforced
    )
    try:
        # docker-py's run() has no timeout parameter; enforce one via wait()
        container.wait(timeout=timeout)
        return container.logs().decode("utf-8")
    finally:
        container.remove(force=True)
        os.unlink(path)
```

Key Docker flags for AI sandboxing:
- `network_disabled=True` -- prevents data exfiltration
- `read_only=True` -- prevents filesystem modification
- `mem_limit` -- prevents memory-exhaustion attacks
- `--security-opt=no-new-privileges` -- prevents privilege escalation
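The `no-new-privileges` option does not appear in the Python example above; one way to keep the full hardening policy reviewable is to collect the docker-py `containers.run()` keyword arguments in a reusable dict. The kwargs below are real docker-py parameters, but the dict name and the specific limits are illustrative choices:

```python
# Hardened kwargs for docker-py's client.containers.run(); collected in
# one dict so the sandbox policy can be audited and reused across tools.
SANDBOX_RUN_KWARGS = {
    "network_disabled": True,               # No network: blocks exfiltration
    "read_only": True,                      # Immutable root filesystem
    "mem_limit": "256m",                    # Hard memory cap
    "pids_limit": 64,                       # Blocks fork bombs
    "cap_drop": ["ALL"],                    # Drop every Linux capability
    "security_opt": ["no-new-privileges"],  # No setuid privilege escalation
    "user": "65534:65534",                  # Run as nobody, never root
}

# Usage: client.containers.run("python:3.11-slim", command=..., **SANDBOX_RUN_KWARGS)
assert SANDBOX_RUN_KWARGS["network_disabled"] and SANDBOX_RUN_KWARGS["read_only"]
```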
WASM runtimes provide the strongest isolation for lightweight code execution:
Pyodide, for example, runs Python in a WASM sandbox with no system access:

```python
from pyodide_sandbox import PySandbox

sandbox = PySandbox(
    allowed_imports=["math", "json", "re"],  # Allowlist of importable modules
    max_execution_time_ms=5000,
    max_memory_mb=64,
)
result = sandbox.execute(ai_generated_code)
```

Trade-off: strongest isolation but most limited capability -- no filesystem, no network, restricted standard library.
Tool Call Approval Workflows
For AI agents that can perform real-world actions (sending emails, modifying databases, making purchases), approval gates prevent unauthorized actions.
Approval Architecture
```python
from enum import Enum

class ApprovalPolicy(Enum):
    AUTO_APPROVE = "auto"          # Low-risk tools
    NOTIFY_AND_PROCEED = "notify"  # Medium-risk: log but allow
    REQUIRE_APPROVAL = "approve"   # High-risk: block until a human approves
    ALWAYS_DENY = "deny"           # Forbidden tools

TOOL_POLICIES = {
    "search_web": ApprovalPolicy.AUTO_APPROVE,
    "read_file": ApprovalPolicy.NOTIFY_AND_PROCEED,
    "send_email": ApprovalPolicy.REQUIRE_APPROVAL,
    "execute_sql": ApprovalPolicy.REQUIRE_APPROVAL,
    "delete_record": ApprovalPolicy.ALWAYS_DENY,
    "modify_permissions": ApprovalPolicy.ALWAYS_DENY,
}

class ToolGatekeeper:
    def __init__(self, policies: dict[str, ApprovalPolicy]):
        self.policies = policies

    async def check_tool_call(
        self, tool_name: str, arguments: dict, user_id: str
    ) -> tuple[bool, str]:
        # Unregistered tools fail closed: default to ALWAYS_DENY
        policy = self.policies.get(tool_name, ApprovalPolicy.ALWAYS_DENY)
        if policy == ApprovalPolicy.AUTO_APPROVE:
            return True, "Auto-approved"
        elif policy == ApprovalPolicy.NOTIFY_AND_PROCEED:
            # log_notification is assumed implemented elsewhere
            await self.log_notification(tool_name, arguments, user_id)
            return True, "Approved with notification"
        elif policy == ApprovalPolicy.REQUIRE_APPROVAL:
            # request_human_approval blocks until a reviewer responds
            approved = await self.request_human_approval(
                tool_name, arguments, user_id
            )
            return approved, "Human review"
        else:
            return False, f"Tool '{tool_name}' is forbidden"
```

Red Team Considerations
| Attack | Approval Control | Bypass Potential |
|---|---|---|
| Direct tool call to forbidden tool | ALWAYS_DENY | None -- architecturally blocked |
| Argument injection in allowed tool | AUTO_APPROVE on the tool | High -- arguments are not reviewed |
| Tool chaining (safe tools → unsafe outcome) | Individual tool policies | Medium -- the chain creates emergent risk |
| Approval fatigue (many low-risk requests) | REQUIRE_APPROVAL | Medium -- humans rubber-stamp after many approvals |
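The argument-injection row is worth closing with a second check: even auto-approved tools can have their arguments validated before execution. A minimal sketch, where the `search_web` and `read_file` argument schemas are hypothetical:

```python
import re

# Per-tool argument validators: even AUTO_APPROVE tools get their
# arguments checked, closing the argument-injection gap.
ARGUMENT_VALIDATORS = {
    "search_web": lambda args: len(args.get("query", "")) <= 500,
    "read_file": lambda args: not re.search(
        r"\.\.|^/etc/|^/proc/", args.get("path", "")
    ),
}

def validate_arguments(tool_name: str, args: dict) -> bool:
    validator = ARGUMENT_VALIDATORS.get(tool_name)
    # Tools with no registered validator fail closed
    return validator(args) if validator else False

assert validate_arguments("read_file", {"path": "docs/readme.md"})
assert not validate_arguments("read_file", {"path": "../../etc/passwd"})
```

Running argument validation before the policy check means an injected path traversal is rejected even when the tool itself is low-risk.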
Principle of Least Privilege for AI Agents
Least privilege is the most important architectural defense for agentic AI systems.
Implementation Checklist
| Principle | Implementation | Example |
|---|---|---|
| Minimal tool set | Only register tools the agent actually needs | Customer service bot gets search_faq and create_ticket, not execute_sql |
| Read-before-write | Default to read-only access; writes require an explicit grant | Agent can read the database but not modify it without an elevated session |
| Scoped credentials | Each tool gets credentials limited to its function | Email tool can only send from a specific address to allowed domains |
| Time-limited access | Permissions expire after a session or time window | Database write access revoked after 5 minutes |
| Audit trail | Log every tool invocation with full arguments | Searchable log of all agent actions for forensic review |
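The scoped-credentials and time-limited-access rows can be combined into one mechanism: grants that name a tool and a scope, and expire on their own. A minimal sketch, where the `GrantStore` name and the TTLs are illustrative:

```python
from dataclasses import dataclass
from time import monotonic

@dataclass
class ScopedGrant:
    """A time-limited permission for one tool and one scope."""
    tool: str
    scope: str        # e.g. "read" or "write"
    expires_at: float  # monotonic deadline

class GrantStore:
    def __init__(self) -> None:
        self._grants: list[ScopedGrant] = []

    def grant(self, tool: str, scope: str, ttl_seconds: float) -> None:
        self._grants.append(
            ScopedGrant(tool, scope, monotonic() + ttl_seconds)
        )

    def allowed(self, tool: str, scope: str) -> bool:
        # A tool call is allowed only while an unexpired grant covers it
        now = monotonic()
        return any(
            g.tool == tool and g.scope == scope and g.expires_at > now
            for g in self._grants
        )

store = GrantStore()
store.grant("database", "read", ttl_seconds=300)  # Read access by default
assert store.allowed("database", "read")
assert not store.allowed("database", "write")  # Write needs an explicit grant
```

Because the check runs at call time, a grant that expires mid-session revokes access without any extra bookkeeping.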
Further Reading
- Defense-in-Depth for LLM Apps -- where execution controls fit in the defense stack
- Runtime Monitoring & Anomaly Detection -- detecting bypass attempts against execution controls
- Agent Architectures & Tool Use Patterns -- the agent patterns these controls defend
- Guardrails & Safety Layer Architecture -- complementary guardrail layers
- Tool Abuse -- attacks that these controls defend against
References
- "E2B Documentation: AI Code Execution Sandbox" - E2B (2025) - Documentation for the cloud-hosted sandboxed execution environment designed for AI agents
- "Docker Security Best Practices" - Docker Inc. (2025) - Official security guidance for container isolation, including network disabling and read-only filesystems
- "Principle of Least Privilege in Modern Applications" - NIST SP 800-53 (2023) - Federal security control defining least-privilege requirements applicable to AI agent tool access
- "OWASP Top 10 for LLM Applications: LLM08 Excessive Agency" - OWASP (2025) - Risk classification for over-permissioned AI agents that execution controls mitigate
An AI agent has been jailbroken, and the attacker instructs it to delete all records in a database. The agent has the 'execute_sql' tool with a REQUIRE_APPROVAL policy and the 'delete_record' tool with an ALWAYS_DENY policy. What happens?