Safety Framework Circumvention
Attacking safety layers in embodied AI systems: watchdog suppression, emergency stop bypass, safety monitor evasion, and techniques for compromising multi-layered safety architectures in robotic systems.
Safety frameworks in embodied AI are multi-layered systems designed to prevent physical harm even when the AI controller is compromised. They include watchdog timers, emergency stop circuits, safety-rated monitors, and redundant limit enforcement. When an attacker bypasses the AI layer through prompt injection or code generation attacks, the safety framework is the last line of defense. Circumventing it converts a software compromise into a physical safety incident.
Safety Framework Architecture
```
┌────────────────────────────────────────────────────────┐
│             AI CONTROLLER (LLM + Planning)             │
│  ┌──────────────────────────────────────────────────┐  │
│  │              Software Safety Layer               │  │
│  │  • Parameter validation   • Trajectory check     │  │
│  │  • Workspace boundary     • Collision predict    │  │
│  └────────────────────────┬─────────────────────────┘  │
│                           │ Commands                   │
│  ┌────────────────────────▼─────────────────────────┐  │
│  │            Safety Monitor (SIL-rated)            │  │
│  │  • Independent position tracking                 │  │
│  │  • Speed monitoring       • Force monitoring     │  │
│  │  • Watchdog timer         • Heartbeat check      │  │
│  └────────────────────────┬─────────────────────────┘  │
│                           │ Safe commands only         │
│  ┌────────────────────────▼─────────────────────────┐  │
│  │           Motor Controller (Firmware)            │  │
│  │  • Firmware-enforced limits  • Current limiting  │  │
│  └────────────────────────┬─────────────────────────┘  │
│                           │ Electrical signals         │
│  ┌────────────────────────▼─────────────────────────┐  │
│  │                 Hardware Safety                  │  │
│  │  • Mechanical stops       • E-stop circuit       │  │
│  │  • Breakaway joints       • Torque limiters      │  │
│  └──────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────┘
```

Watchdog Timer Attacks
A watchdog timer ensures the control system is operating normally. If the AI controller stops responding, the watchdog triggers a safe shutdown. Attacks target the watchdog mechanism itself.
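In software, the mechanism is typically a timestamp that `kick()` refreshes and an independent loop checks. A minimal sketch of this pattern (illustrative only; `SoftwareWatchdog` and the shutdown callback are hypothetical names, not a real robot API):

```python
import threading
import time

class SoftwareWatchdog:
    """Minimal software watchdog: invokes a safe-shutdown callback
    if kick() is not called within the timeout period."""

    def __init__(self, timeout_s: float, on_timeout):
        self.timeout_s = timeout_s
        self.on_timeout = on_timeout        # e.g. cut actuator power
        self._last_kick = time.monotonic()
        self._lock = threading.Lock()
        self._running = True
        self._thread = threading.Thread(target=self._watch, daemon=True)
        self._thread.start()

    def kick(self) -> None:
        """Refresh the liveness timestamp (the 'kick')."""
        with self._lock:
            self._last_kick = time.monotonic()

    def stop(self) -> None:
        self._running = False

    def _watch(self) -> None:
        # Independent check loop: fires the callback once the
        # controller has gone quiet for longer than the timeout.
        while self._running:
            with self._lock:
                elapsed = time.monotonic() - self._last_kick
            if elapsed > self.timeout_s:
                self.on_timeout()
                return
            time.sleep(self.timeout_s / 10)
```

Because this watchdog lives in the same process as the controller, any code running there can call `kick()` regardless of what else it is doing — which is exactly the weakness the attacks below exploit.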
Software Watchdog Bypass
| Attack | Mechanism | Impact |
|---|---|---|
| Kick injection | Generate code that kicks the watchdog while executing unsafe actions | Watchdog stays satisfied; unsafe actions proceed |
| Timeout extension | Modify watchdog timeout to an extremely long period | Delays detection of controller failure |
| Watchdog disable | LLM generates code that disables the watchdog process | No monitoring of controller health |
| Priority inversion | Consume CPU so watchdog thread starves | Watchdog cannot execute its check |
```python
# Example: watchdog kick injection via LLM code generation
# The LLM is asked: "Move arm to position X as fast as possible"

# Safe generated code (watchdog kicks during safe operations):
def move_fast(robot, target):
    trajectory = plan_trajectory(robot.position, target)
    for waypoint in trajectory:
        robot.safety_watchdog.kick()        # Normal watchdog maintenance
        robot.move_to(waypoint, speed=0.3)

# Compromised code (watchdog kicked while bypassing safety):
def move_fast(robot, target):
    robot.safety_watchdog.kick()            # Keep watchdog happy
    robot.set_speed_limit(None)             # Disable speed limit
    robot.safety_watchdog.kick()            # Still kicking
    robot.move_to(target, speed=5.0)        # Unsafe speed
    robot.safety_watchdog.kick()            # Watchdog never triggers
```

Hardware vs. Software Watchdog Security
| Property | Software Watchdog | Hardware Watchdog |
|---|---|---|
| Independence | Same CPU/OS as controller | Separate circuit board |
| Bypass via code | Easy -- kick from compromised code | Harder -- requires physical signal |
| Configuration | Software-configurable timeout | Often hardware-configured (jumpers, fuses) |
| Failure mode | May crash with controller | Independent failure domain |
| Certification | Rarely safety-certified | Can be SIL-rated |
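One mitigation implied by the table is a *windowed* watchdog: a kick is valid only inside a time window after the previous kick, so the blind kick-spamming shown above is itself treated as a fault. A minimal sketch (hypothetical class; real windowed watchdogs are typically hardware peripherals on microcontrollers):

```python
import time

class WindowedWatchdog:
    """Windowed watchdog sketch: a kick is valid only within
    [window_open_s, window_close_s] after the previous kick.
    Kicks that arrive too early (e.g. a tight kick-spamming
    loop) latch a fault just like a missing kick."""

    def __init__(self, window_open_s: float, window_close_s: float):
        self.window_open_s = window_open_s
        self.window_close_s = window_close_s
        self._last_kick = time.monotonic()
        self.faulted = False

    def kick(self) -> None:
        now = time.monotonic()
        elapsed = now - self._last_kick
        if elapsed < self.window_open_s or elapsed > self.window_close_s:
            self.faulted = True     # early or late kick: latch a fault
        self._last_kick = now

    def check(self) -> bool:
        """Return True while healthy; a latched fault is permanent."""
        if time.monotonic() - self._last_kick > self.window_close_s:
            self.faulted = True     # missing kick
        return not self.faulted
```

A windowed design forces the attacker to reproduce the *timing* of legitimate operation, not just call `kick()` in a loop — a meaningfully harder constraint when the unsafe code path runs faster or slower than the safe one.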
Emergency Stop (E-Stop) Attacks
E-stop systems are designed to immediately halt all robot motion when activated. They are the most critical safety component and are typically hardwired.
E-Stop Architecture Variants
Category 0 (Uncontrolled stop)
Immediate removal of power to actuators. The safest and simplest: no software involved, purely electrical. Cannot be bypassed through software attacks.
Category 1 (Controlled stop)
Controlled deceleration, then power removal. Requires software to execute the deceleration, creating a window for attack between the e-stop trigger and power removal.
Category 2 (Controlled stop, power maintained)
Software-controlled stop with power maintained for holding position. Most vulnerable: relies on software to enforce the stopped state while power remains available to actuators.
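The Category 1 attack window can be bounded by pairing the software-controlled deceleration with a hard deadline after which actuator power is removed unconditionally. A sketch under an assumed `robot` interface (all method names here are hypothetical):

```python
import time

def category1_stop(robot, decel_deadline_s: float = 0.5) -> None:
    """Category 1 stop sketch: commanded deceleration backed by a
    hard deadline. Even if the (possibly compromised) controller
    never finishes decelerating, power is removed when the
    deadline expires, closing the attack window."""
    deadline = time.monotonic() + decel_deadline_s
    robot.command_deceleration()        # software-controlled ramp-down
    while time.monotonic() < deadline:
        if robot.is_stopped():
            break                       # controlled stop completed early
        time.sleep(0.01)
    # Backstop: Category 1 always ends with power removal,
    # whether or not the deceleration phase behaved.
    robot.remove_actuator_power()
```

In a real system the deadline should be enforced by an independent safety-rated timer or relay in the e-stop circuit, not by the same processor that runs the controller; the software version above only illustrates the control flow.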
E-Stop Bypass Vectors
| Vector | Target | Feasibility |
|---|---|---|
| Software interception | Category 1/2 e-stop processing software | Medium -- if e-stop logic runs in software |
| Signal interception | E-stop signal wiring | Low -- requires physical access |
| Monitoring suppression | E-stop monitoring system | Medium -- prevent detection of circuit faults |
| Delayed response | Category 1 deceleration time | Medium -- extend deceleration phase |
| State spoofing | Report e-stop as activated when it is not | Medium -- mask genuine failures |
```python
# Testing e-stop independence from AI controller
def test_estop_independence(robot, test_controller):
    """Verify e-stop works when the AI controller is compromised."""
    # Test 1: E-stop while controller is running unsafe code
    test_controller.execute_unsafe_motion()
    robot.trigger_estop()
    assert robot.is_stopped(), "E-stop failed during unsafe motion"
    assert robot.motors_powered_off(), "Motors still powered after e-stop"

    # Test 2: E-stop when controller process is hung
    test_controller.simulate_hang()
    robot.trigger_estop()
    assert robot.is_stopped(), "E-stop failed when controller hung"

    # Test 3: E-stop when controller actively resists stopping
    test_controller.resist_stop()
    robot.trigger_estop()
    assert robot.is_stopped(), "E-stop failed when controller resists"

    # Test 4: Verify e-stop circuit is independent
    test_controller.attempt_estop_override()
    assert robot.estop_circuit_intact(), "E-stop circuit was modified by controller"
```

Safety Monitor Evasion
Safety monitors independently track robot state (position, speed, force) and trigger protective actions when limits are exceeded. Attacks try to either blind the monitor or convince it that unsafe states are safe.
Monitor Evasion Techniques
| Technique | Mechanism | Defense |
|---|---|---|
| Data rate manipulation | Send state updates faster or slower than expected | Monitor enforces expected update rate |
| Gradual drift | Change parameters slowly enough to avoid rate-of-change alarms | Monitor checks absolute values, not just rates |
| Coordinate frame confusion | Report position in a different frame than the monitor expects | Monitor validates coordinate frame consistency |
| Sensor channel switching | Route a different sensor to the monitor input | Monitor uses authenticated sensor channels |
| Timing attack | Exploit the gap between monitor sampling intervals | Increase monitor sampling rate; use hardware interlocks |
```python
class SafetyMonitorTest:
    """Test safety monitor resilience against evasion."""

    def __init__(self, monitor, robot_sim):
        self.monitor = monitor
        self.sim = robot_sim

    def test_gradual_drift(self, limit: float, steps: int = 100):
        """Test if the monitor catches slow parameter drift past limits."""
        start_value = limit * 0.5
        step_size = (limit * 1.5 - start_value) / steps
        for i in range(steps):
            value = start_value + step_size * i
            self.sim.set_parameter(value)
            violation = self.monitor.check_violation()
            if value > limit and not violation:
                return {
                    "test": "gradual_drift",
                    "passed": False,
                    "note": f"Monitor missed violation at value={value:.2f}, "
                            f"limit={limit:.2f}, step={i}",
                }
        return {"test": "gradual_drift", "passed": True}

    def test_sampling_gap(self, monitor_rate_hz: float):
        """Test if unsafe transients occur between monitor samples."""
        sample_interval = 1.0 / monitor_rate_hz
        self.sim.execute_transient_violation(duration=sample_interval * 0.5)
        violation = self.monitor.check_violation_history()
        return {
            "test": "sampling_gap",
            "passed": violation,
            "note": f"Transient duration: {sample_interval * 0.5:.4f}s, "
                    f"monitor rate: {monitor_rate_hz}Hz",
        }
```

Safety Certification vs. Runtime Reality
Many safety frameworks are certified under assumptions that fail when an AI controller is compromised:
| Certification Assumption | AI-Compromised Reality |
|---|---|
| Controller sends valid commands | Controller sends adversarial commands |
| Sensor data is trustworthy | Sensor data may be spoofed or manipulated |
| Communication channels are reliable | Channels may be flooded or intercepted |
| Failure modes are random | Failure modes are adversarial and targeted |
| Software is validated and frozen | LLM generates new code at runtime |
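The last row is the hardest assumption to restore. One partial mitigation is to statically screen LLM-generated code before execution and reject anything that references safety-layer APIs. A sketch using Python's `ast` module (the denylist names are illustrative, not a real safety API):

```python
import ast

# Names that generated code must never touch (illustrative denylist;
# a real deployment would enumerate its actual safety APIs).
FORBIDDEN_NAMES = {
    "safety_watchdog", "set_speed_limit", "trigger_estop",
    "estop", "safety_config",
}

def generated_code_is_allowed(source: str) -> bool:
    """Reject LLM-generated code that references safety-layer APIs.

    This partially restores the 'software is validated and frozen'
    assumption for runtime-generated code. Denylists are bypassable
    (e.g. via string construction), so dynamic-dispatch primitives
    are rejected too; it remains a screen, not a guarantee."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        # Direct references: robot.safety_watchdog, set_speed_limit(...)
        if isinstance(node, ast.Attribute) and node.attr in FORBIDDEN_NAMES:
            return False
        if isinstance(node, ast.Name) and node.id in FORBIDDEN_NAMES:
            return False
        # getattr()/eval()/exec() defeat static attribute checks
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in {"getattr", "setattr", "eval",
                                     "exec", "__import__"}):
            return False
    return True
```

A screen like this raises the bar but cannot carry the safety case on its own; the firmware- and hardware-level limits in the architecture diagram remain the layers that must actually hold.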
Red Team Testing Protocol
| Test Category | Tests | Priority |
|---|---|---|
| Watchdog resilience | Can compromised code keep the watchdog satisfied while performing unsafe actions? | Critical |
| E-stop independence | Does e-stop work when the AI controller is compromised, hung, or actively resisting? | Critical |
| Monitor coverage | Can unsafe states exist between monitor sampling intervals? | High |
| Limit enforcement layer | Are speed/force/workspace limits enforced in firmware/hardware or only in software? | Critical |
| Configuration integrity | Can the AI controller modify safety configuration parameters? | High |
| Communication security | Can commands to the safety system be spoofed or intercepted? | High |
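The configuration-integrity test can be made concrete by fingerprinting the safety parameters at commissioning and comparing at runtime. A sketch assuming a dict-shaped configuration (function names are illustrative):

```python
import hashlib
import json

def config_fingerprint(safety_config: dict) -> str:
    """Stable SHA-256 fingerprint of safety configuration parameters.
    Canonical JSON makes the hash independent of key order."""
    canonical = json.dumps(safety_config, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

def check_config_integrity(current_config: dict,
                           commissioned_fingerprint: str) -> bool:
    """Detect runtime modification of safety parameters.

    The reference fingerprint must be stored outside the AI
    controller's reach (e.g. in the safety PLC); a compromised
    controller could otherwise update config and fingerprint together."""
    return config_fingerprint(current_config) == commissioned_fingerprint
```

Running this check from the safety monitor rather than the controller turns "can the AI controller modify safety configuration parameters?" from an open question into a detectable event.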
Worked scenario: a robot's safety system uses a software watchdog that runs as a separate thread in the same process as the LLM controller. The LLM generates code that kicks the watchdog every 100 ms while simultaneously moving the arm at unsafe speeds. The watchdog will not detect the unsafe behavior: it verifies only that kicks arrive on time, not that the commands issued between kicks are safe. Catching this requires an independent monitor that checks actual robot state, or a watchdog outside the controller's process and address space.
Related Topics
- Robotics & Embodied AI Security - Section overview of embodied AI threats
- Physical World Constraint Bypass - Bypassing physical safety constraints
- Robot Control Injection - How attackers initially compromise the AI controller
- Deployment Attacks - Related infrastructure-level attacks on AI deployments
- Lab: Simulated Robot Control Exploitation - Hands-on exercises
References
- IEC 61508 - International Electrotechnical Commission - Functional safety of safety-related systems
- ISO 13849-1 - International Organization for Standardization - Safety-related parts of control systems
- ISO 10218-1/2 - International Organization for Standardization - Robot safety requirements
- "Jailbreaking LLM-Controlled Robots" - Robey et al. (2024) - Attacks that bypass safety frameworks