Safety Framework Circumvention
Attacking safety layers in embodied AI systems: watchdog suppression, emergency stop bypass, safety monitor evasion, and techniques for compromising multi-layered safety architectures in robotic systems.
Safety frameworks in embodied AI are multi-layered systems designed to prevent physical harm even when the AI controller is compromised. They include watchdog timers, emergency stop circuits, safety-rated monitors, and redundant limit enforcement. When an attacker bypasses the AI layer through prompt injection or code-generation attacks, the safety framework is the last line of defense. Circumventing it converts a software compromise into a physical safety incident.
Safety Framework Architecture
```
┌──────────────────────────────────────────────────────┐
│ AI CONTROLLER (LLM + Planning)                       │
│ ┌────────────────────────────────────────────────┐   │
│ │ Software Safety Layer                          │   │
│ │ • Parameter validation   • Trajectory check    │   │
│ │ • Workspace boundary     • Collision predict   │   │
│ └───────────────────┬────────────────────────────┘   │
│                     │ Commands                       │
│ ┌───────────────────▼────────────────────────────┐   │
│ │ Safety Monitor (SIL-rated)                     │   │
│ │ • Independent position tracking                │   │
│ │ • Speed monitoring       • Force monitoring    │   │
│ │ • Watchdog timer         • Heartbeat check     │   │
│ └───────────────────┬────────────────────────────┘   │
│                     │ Safe commands only             │
│ ┌───────────────────▼────────────────────────────┐   │
│ │ Motor Controller (Firmware)                    │   │
│ │ • Firmware-enforced limits • Current limiting  │   │
│ └───────────────────┬────────────────────────────┘   │
│                     │ Electrical signals             │
│ ┌───────────────────▼────────────────────────────┐   │
│ │ Hardware Safety                                │   │
│ │ • Mechanical stops       • E-stop circuit      │   │
│ │ • Breakaway joints       • Torque limiters     │   │
│ └────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────┘
```

Watchdog Timer Attacks
A watchdog timer ensures the control system is operating normally. If the AI controller stops responding, the watchdog triggers a safe shutdown. Attacks target the watchdog mechanism itself.
Software Watchdog Bypass
| Attack | Mechanism | Impact |
|---|---|---|
| Kick injection | Generate code that kicks the watchdog while executing unsafe actions | Watchdog stays satisfied; unsafe actions proceed |
| Timeout extension | Modify the watchdog timeout to an extremely long period | Delays detection of controller failure |
| Watchdog disable | LLM generates code that disables the watchdog process | No monitoring of controller health |
| Priority inversion | Consume CPU so the watchdog thread starves | Watchdog cannot execute its check |
```python
# Example: watchdog kick injection via LLM code generation
# The LLM is asked: "Move arm to position X as fast as possible"

# Safe generated code (watchdog kicks during safe operations):
def move_fast(robot, target):
    trajectory = plan_trajectory(robot.position, target)
    for waypoint in trajectory:
        robot.safety_watchdog.kick()  # Normal watchdog maintenance
        robot.move_to(waypoint, speed=0.3)

# Compromised code (watchdog kicked while bypassing safety):
def move_fast(robot, target):
    robot.safety_watchdog.kick()      # Keep watchdog happy
    robot.set_speed_limit(None)       # Disable speed limit
    robot.safety_watchdog.kick()      # Still kicking
    robot.move_to(target, speed=5.0)  # Unsafe speed
    robot.safety_watchdog.kick()      # Watchdog never triggers
```

Hardware vs. Software Watchdog Safety
| Property | Software Watchdog | Hardware Watchdog |
|---|---|---|
| Independence | Same CPU/OS as controller | Separate circuit board |
| Bypass via code | Easy -- kick from compromised code | Harder -- requires physical signal |
| Configuration | Software-configurable timeout | Often hardware-configured (jumpers, fuses) |
| Failure mode | May crash with controller | Independent failure domain |
| Certification | Rarely safety-certified | Can be SIL-rated |
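A common hardening against kick injection is a challenge-response watchdog: each kick must answer a fresh nonce and bind it to independently measured robot state, so a bare `kick()` loop in compromised controller code no longer satisfies the timer. The sketch below is illustrative; all class and function names are hypothetical, and in a real system the speed measurement must come from a channel the controller cannot write, or the scheme collapses.

```python
import hashlib
import time

class ChallengeResponseWatchdog:
    """Watchdog that only accepts kicks bound to a fresh challenge and to
    independently measured robot state, so a bare kick() from compromised
    controller code is not sufficient to keep it satisfied."""

    def __init__(self, timeout_s: float, speed_limit: float):
        self.timeout_s = timeout_s
        self.speed_limit = speed_limit
        self.last_valid_kick = time.monotonic()
        self.challenge = b""

    def issue_challenge(self) -> bytes:
        # Fresh nonce per cycle: a recorded response cannot be replayed later.
        self.challenge = hashlib.sha256(str(time.monotonic_ns()).encode()).digest()
        return self.challenge

    def kick(self, response: bytes, measured_speed: float) -> bool:
        # The response must bind the current challenge to the measured state.
        expected = hashlib.sha256(
            self.challenge + f"{measured_speed:.3f}".encode()
        ).digest()
        if response != expected:
            return False  # Wrong or replayed response: kick rejected
        if measured_speed > self.speed_limit:
            return False  # State itself is unsafe: refuse to reset the timer
        self.last_valid_kick = time.monotonic()
        return True

    def expired(self) -> bool:
        return time.monotonic() - self.last_valid_kick > self.timeout_s

def respond(challenge: bytes, measured_speed: float) -> bytes:
    """What a trusted state-reporting task would compute (hypothetical)."""
    return hashlib.sha256(challenge + f"{measured_speed:.3f}".encode()).digest()
```

The key design choice is that the kick now carries evidence about the safe state, not just liveness; a kick reporting an over-limit speed is rejected even when the cryptographic response is correct.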
Emergency Stop (E-Stop) Attacks
E-stop systems are designed to halt all robot motion immediately when activated. They are the most critical safety component and are typically hardwired.
E-Stop Architecture Variants
Category 0 (Uncontrolled stop)
Immediate removal of power to actuators. The safest and simplest: no software involved, purely electrical. Cannot be bypassed through software attacks.
Category 1 (Controlled stop)
Controlled deceleration, then power removal. Requires software to execute the deceleration, creating a window for attack between the e-stop trigger and power removal.
Category 2 (Controlled stop, power maintained)
Software-controlled stop with power maintained for holding position. Most vulnerable: relies on software to enforce the stopped state while power remains available to actuators.
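The Category 1 attack window can be bounded with basic kinematics: under constant deceleration, the robot keeps moving for `v / a` seconds and `v² / 2a` meters before power removal. A small sketch (the constant-deceleration assumption is a simplification; real figures belong in the robot's safety datasheet):

```python
def category1_stop_window(speed_mps: float, decel_mps2: float) -> dict:
    """Estimate the software-dependent window of a Category 1 stop:
    time and distance travelled between e-stop trigger and power removal,
    assuming constant deceleration."""
    if decel_mps2 <= 0:
        raise ValueError("deceleration must be positive")
    stop_time_s = speed_mps / decel_mps2                 # t = v / a
    stop_distance_m = speed_mps ** 2 / (2 * decel_mps2)  # d = v^2 / (2a)
    return {"stop_time_s": stop_time_s, "stop_distance_m": stop_distance_m}
```

For example, an arm moving at 2 m/s with 4 m/s² deceleration stays in software control for 0.5 s and 0.5 m after the trigger; a delayed-response attack that halves the deceleration doubles both figures.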
E-Stop Bypass Vectors
| Vector | Target | Feasibility |
|---|---|---|
| Software interception | Category 1/2 e-stop processing software | Medium -- if e-stop logic runs in software |
| Signal interception | E-stop signal wiring | Low -- requires physical access |
| Monitoring suppression | E-stop monitoring system | Medium -- prevent detection of circuit faults |
| Delayed response | Category 1 deceleration time | Medium -- extend deceleration phase |
| State spoofing | Report e-stop as activated when it is not | Medium -- mask genuine failures |
```python
# Test e-stop independence from the AI controller
def test_estop_independence(robot, test_controller):
    """Verify e-stop works when the AI controller is compromised."""
    # Test 1: E-stop while controller is running unsafe code
    test_controller.execute_unsafe_motion()
    robot.trigger_estop()
    assert robot.is_stopped(), "E-stop failed during unsafe motion"
    assert robot.motors_powered_off(), "Motors still powered after e-stop"

    # Test 2: E-stop when controller process is hung
    test_controller.simulate_hang()
    robot.trigger_estop()
    assert robot.is_stopped(), "E-stop failed when controller hung"

    # Test 3: E-stop when controller actively resists stopping
    test_controller.resist_stop()
    robot.trigger_estop()
    assert robot.is_stopped(), "E-stop failed when controller resists"

    # Test 4: Verify e-stop circuit is independent
    test_controller.attempt_estop_override()
    assert robot.estop_circuit_intact(), "E-stop circuit was modified by controller"
```

Safety Monitor Evasion
Safety monitors independently track robot state (position, speed, force) and trigger protective actions when limits are exceeded. Attacks try either to blind the monitor or to convince it that unsafe states are safe.
Monitor Evasion Techniques
| Technique | Mechanism | Defense |
|---|---|---|
| Data rate manipulation | Send state updates faster or slower than expected | Monitor enforces the expected update rate |
| Gradual drift | Change parameters slowly enough to avoid rate-of-change alarms | Monitor checks absolute values, not just rates |
| Coordinate frame confusion | Report position in a different frame than the monitor expects | Monitor validates coordinate frame consistency |
| Sensor channel switching | Route a different sensor to the monitor input | Monitor uses authenticated sensor channels |
| Timing attack | Exploit the gap between monitor sampling intervals | Increase the monitor sampling rate; use hardware interlocks |
```python
class SafetyMonitorTest:
    """Test safety monitor resilience against evasion."""

    def __init__(self, monitor, robot_sim):
        self.monitor = monitor
        self.sim = robot_sim

    def test_gradual_drift(self, limit: float, steps: int = 100):
        """Test whether the monitor catches slow parameter drift past limits."""
        start_value = limit * 0.5
        step_size = (limit * 1.5 - start_value) / steps
        for i in range(steps):
            value = start_value + step_size * i
            self.sim.set_parameter(value)
            violation = self.monitor.check_violation()
            if value > limit and not violation:
                return {
                    "test": "gradual_drift",
                    "passed": False,
                    "note": f"Monitor missed violation at value={value:.2f}, "
                            f"limit={limit:.2f}, step={i}",
                }
        return {"test": "gradual_drift", "passed": True}

    def test_sampling_gap(self, monitor_rate_hz: float):
        """Test whether unsafe transients can occur between monitor samples."""
        sample_interval = 1.0 / monitor_rate_hz
        self.sim.execute_transient_violation(duration=sample_interval * 0.5)
        violation = self.monitor.check_violation_history()
        return {
            "test": "sampling_gap",
            "passed": violation,
            "note": f"Transient duration: {sample_interval * 0.5:.4f}s, "
                    f"monitor rate: {monitor_rate_hz}Hz",
        }
```

Safety Certification vs. Runtime Reality
Many safety frameworks are certified under assumptions that fail when an AI controller is compromised:
| Certification Assumption | AI-Compromised Reality |
|---|---|
| Controller sends valid commands | Controller sends adversarial commands |
| Sensor data is trustworthy | Sensor data may be spoofed or manipulated |
| Communication channels are reliable | Channels may be flooded or intercepted |
| Failure modes are random | Failure modes are adversarial and targeted |
| Software is validated and frozen | LLM generates new code at runtime |
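One gap in the last row can be narrowed by treating the safety configuration as frozen at commissioning time and verifying it at runtime: record a digest of the safety-critical parameters, then recompute and compare periodically, halting on any mismatch. A minimal sketch, with illustrative parameter names:

```python
import hashlib
import json

def safety_config_digest(config: dict) -> str:
    """Digest of safety-critical parameters, computed at commissioning time.
    Canonical JSON makes the digest independent of key ordering."""
    canonical = json.dumps(config, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

def verify_config(current: dict, commissioned_digest: str) -> bool:
    """Return True only if the live configuration still matches the
    commissioned digest; a safety monitor would halt motion on False."""
    return safety_config_digest(current) == commissioned_digest
```

For this to resist a compromised controller, the commissioned digest and the verifier itself must live outside the controller's write domain (e.g. in the SIL-rated monitor), otherwise the attacker simply updates both.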
Red Team Testing Protocol
| Test Category | Tests | Priority |
|---|---|---|
| Watchdog resilience | Can compromised code keep the watchdog satisfied while performing unsafe actions? | Critical |
| E-stop independence | Does e-stop work when the AI controller is compromised, hung, or actively resisting? | Critical |
| Monitor coverage | Can unsafe states exist between monitor sampling intervals? | High |
| Limit enforcement layer | Are speed/force/workspace limits enforced in firmware/hardware or only in software? | Critical |
| Configuration integrity | Can the AI controller modify safety configuration parameters? | High |
| Communication security | Can commands to the safety system be spoofed or intercepted? | High |
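The protocol above can be driven by a small harness that aggregates results per category and flags critical failures for release blocking. A sketch, where the test callables stand in for site-specific checks:

```python
def run_protocol(tests: dict) -> dict:
    """Run red-team test callables and summarize results.

    `tests` maps a category name to a (priority, callable) pair, mirroring
    the table above; a callable returns truthy when the defense held.
    """
    summary = {"results": {}, "critical_failures": []}
    for name, (priority, fn) in tests.items():
        passed = bool(fn())
        summary["results"][name] = {"priority": priority, "passed": passed}
        if priority == "Critical" and not passed:
            # Any critical failure should block deployment outright.
            summary["critical_failures"].append(name)
    return summary
```

Usage might look like `run_protocol({"estop_independence": ("Critical", check_estop), ...})`, with each check wrapping tests like `test_estop_independence` above.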
A robot's safety system uses a software watchdog that runs as a separate thread in the same process as the LLM controller. The LLM generates code that kicks the watchdog timer every 100ms while simultaneously moving the arm at unsafe speeds. Will the watchdog detect the unsafe behavior? No: the watchdog only verifies that kicks arrive on time, and because it shares the controller's process, the compromised code can satisfy it indefinitely. Detecting the unsafe motion requires an independent monitor or a hardware watchdog that verifies state, not just liveness.
Related Topics
- Robotics & Embodied AI Security - Section overview of embodied AI threats
- Physical World Constraint Bypass - Bypassing physical safety constraints
- Robot Control Injection - How attackers initially compromise the AI controller
- Deployment Attacks - Related infrastructure-level attacks on AI deployments
- Lab: Simulated Robot Control Exploitation - Hands-on exercises
References
- IEC 61508 - International Electrotechnical Commission - Functional safety of safety-related systems
- ISO 13849-1 - International Organization for Standardization - Safety-related parts of control systems
- ISO 10218-1/2 - International Organization for Standardization - Robot safety requirements
- "Jailbreaking LLM-Controlled Robots" - Robey et al. (2024) - Attacks that bypass safety frameworks