Safety Framework Circumvention
Attacking safety layers in embodied AI systems: watchdog suppression, emergency stop bypass, safety monitor evasion, and techniques for compromising multi-layered safety architectures in robotic systems.
Safety frameworks in embodied AI are multi-layered systems designed to prevent physical harm even when the AI controller is compromised. They include watchdog timers, emergency stop circuits, safety-rated monitors, and redundant limit enforcement. When an attacker bypasses the AI layer through prompt injection or code generation attacks, the safety framework is the last line of defense. Circumventing it converts a software compromise into a physical safety incident.
Safety Framework Architecture
```
┌────────────────────────────────────────────────────────┐
│             AI CONTROLLER (LLM + Planning)             │
│  ┌──────────────────────────────────────────────────┐  │
│  │              Software Safety Layer               │  │
│  │  • Parameter validation   • Trajectory check     │  │
│  │  • Workspace boundary     • Collision predict    │  │
│  └────────────────────────┬─────────────────────────┘  │
│                           │ Commands                   │
│  ┌────────────────────────▼─────────────────────────┐  │
│  │            Safety Monitor (SIL-rated)            │  │
│  │  • Independent position tracking                 │  │
│  │  • Speed monitoring       • Force monitoring     │  │
│  │  • Watchdog timer         • Heartbeat check      │  │
│  └────────────────────────┬─────────────────────────┘  │
│                           │ Safe commands only         │
│  ┌────────────────────────▼─────────────────────────┐  │
│  │           Motor Controller (Firmware)            │  │
│  │  • Firmware-enforced limits  • Current limiting  │  │
│  └────────────────────────┬─────────────────────────┘  │
│                           │ Electrical signals         │
│  ┌────────────────────────▼─────────────────────────┐  │
│  │                 Hardware Safety                  │  │
│  │  • Mechanical stops       • E-stop circuit       │  │
│  │  • Breakaway joints       • Torque limiters      │  │
│  └──────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────┘
```

Watchdog Timer Attacks
A watchdog timer ensures the control system is operating normally. If the AI controller stops responding, the watchdog triggers a safe shutdown. Attacks target the watchdog mechanism itself.
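In software, the mechanism is typically a timestamp that `kick()` refreshes and an independent loop checks. A minimal sketch of this pattern (illustrative only; `SoftwareWatchdog` and the shutdown callback are hypothetical names, not a real robot API):

```python
import threading
import time

class SoftwareWatchdog:
    """Minimal software watchdog: invokes a safe-shutdown callback
    if kick() is not called within the timeout period."""

    def __init__(self, timeout_s: float, on_timeout):
        self.timeout_s = timeout_s
        self.on_timeout = on_timeout        # e.g. cut actuator power
        self._last_kick = time.monotonic()
        self._lock = threading.Lock()
        self._running = True
        self._thread = threading.Thread(target=self._watch, daemon=True)
        self._thread.start()

    def kick(self) -> None:
        """Refresh the liveness timestamp (the 'kick')."""
        with self._lock:
            self._last_kick = time.monotonic()

    def stop(self) -> None:
        self._running = False

    def _watch(self) -> None:
        # Independent check loop: fires the callback once the
        # controller has gone quiet for longer than the timeout.
        while self._running:
            with self._lock:
                elapsed = time.monotonic() - self._last_kick
            if elapsed > self.timeout_s:
                self.on_timeout()
                return
            time.sleep(self.timeout_s / 10)
```

Because this watchdog lives in the same process as the controller, any code running there can call `kick()` regardless of what else it is doing — which is exactly the weakness the attacks below exploit.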
Software Watchdog Bypass
| Attack | Mechanism | Impact |
|---|---|---|
| Kick injection | Generate code that kicks the watchdog while executing unsafe actions | Watchdog stays satisfied; unsafe actions proceed |
| Timeout extension | Modify watchdog timeout to an extremely long period | Delays detection of controller failure |
| Watchdog disable | LLM generates code that disables the watchdog process | No monitoring of controller health |
| Priority inversion | Consume CPU so watchdog thread starves | Watchdog cannot execute its check |
```python
# Example: watchdog kick injection via LLM code generation
# The LLM is asked: "Move arm to position X as fast as possible"

# Safe generated code (watchdog kicks during safe operations):
def move_fast(robot, target):
    trajectory = plan_trajectory(robot.position, target)
    for waypoint in trajectory:
        robot.safety_watchdog.kick()        # Normal watchdog maintenance
        robot.move_to(waypoint, speed=0.3)

# Compromised code (watchdog kicked while bypassing safety):
def move_fast(robot, target):
    robot.safety_watchdog.kick()            # Keep watchdog happy
    robot.set_speed_limit(None)             # Disable speed limit
    robot.safety_watchdog.kick()            # Still kicking
    robot.move_to(target, speed=5.0)        # Unsafe speed
    robot.safety_watchdog.kick()            # Watchdog never triggers
```

Hardware vs. Software Watchdog Security
| Property | Software Watchdog | Hardware Watchdog |
|---|---|---|
| Independence | Same CPU/OS as controller | Separate circuit board |
| Bypass via code | Easy -- kick from compromised code | Harder -- requires physical signal |
| Configuration | Software-configurable timeout | Often hardware-configured (jumpers, fuses) |
| Failure mode | May crash with controller | Independent failure domain |
| Certification | Rarely safety-certified | Can be SIL-rated |
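One mitigation implied by the table is a *windowed* watchdog: a kick is valid only inside a time window after the previous kick, so the blind kick-spamming shown above is itself treated as a fault. A minimal sketch (hypothetical class; real windowed watchdogs are typically hardware peripherals on microcontrollers):

```python
import time

class WindowedWatchdog:
    """Windowed watchdog sketch: a kick is valid only within
    [window_open_s, window_close_s] after the previous kick.
    Kicks that arrive too early (e.g. a tight kick-spamming
    loop) latch a fault just like a missing kick."""

    def __init__(self, window_open_s: float, window_close_s: float):
        self.window_open_s = window_open_s
        self.window_close_s = window_close_s
        self._last_kick = time.monotonic()
        self.faulted = False

    def kick(self) -> None:
        now = time.monotonic()
        elapsed = now - self._last_kick
        if elapsed < self.window_open_s or elapsed > self.window_close_s:
            self.faulted = True     # early or late kick: latch a fault
        self._last_kick = now

    def check(self) -> bool:
        """Return True while healthy; a latched fault is permanent."""
        if time.monotonic() - self._last_kick > self.window_close_s:
            self.faulted = True     # missing kick
        return not self.faulted
```

A windowed design forces the attacker to reproduce the *timing* of legitimate operation, not just call `kick()` in a loop — a meaningfully harder constraint when the unsafe code path runs faster or slower than the safe one.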
Emergency Stop (E-Stop) Attacks
E-stop systems are designed to immediately halt all robot motion when activated. They are the most critical safety component and are typically hardwired.
E-Stop Architecture Variants
Category 0 (Uncontrolled stop)
Immediate removal of power to actuators. The safest and simplest: no software involved, purely electrical. Cannot be bypassed through software attacks.
Category 1 (Controlled stop)
Controlled deceleration, then power removal. Requires software to execute the deceleration, creating a window for attack between the e-stop trigger and power removal.
Category 2 (Controlled stop, power maintained)
Software-controlled stop with power maintained for holding position. Most vulnerable: relies on software to enforce the stopped state while power remains available to actuators.
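The Category 1 attack window can be bounded by pairing the software-controlled deceleration with a hard deadline after which actuator power is removed unconditionally. A sketch under an assumed `robot` interface (all method names here are hypothetical):

```python
import time

def category1_stop(robot, decel_deadline_s: float = 0.5) -> None:
    """Category 1 stop sketch: commanded deceleration backed by a
    hard deadline. Even if the (possibly compromised) controller
    never finishes decelerating, power is removed when the
    deadline expires, closing the attack window."""
    deadline = time.monotonic() + decel_deadline_s
    robot.command_deceleration()        # software-controlled ramp-down
    while time.monotonic() < deadline:
        if robot.is_stopped():
            break                       # controlled stop completed early
        time.sleep(0.01)
    # Backstop: Category 1 always ends with power removal,
    # whether or not the deceleration phase behaved.
    robot.remove_actuator_power()
```

In a real system the deadline should be enforced by an independent safety-rated timer or relay in the e-stop circuit, not by the same processor that runs the controller; the software version above only illustrates the control flow.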
E-Stop Bypass Vectors
| Vector | Target | Feasibility |
|---|---|---|
| Software interception | Category 1/2 e-stop processing software | Medium -- if e-stop logic runs in software |
| Signal interception | E-stop signal wiring | Low -- requires physical access |
| Monitoring suppression | E-stop monitoring system | Medium -- prevent detection of circuit faults |
| Delayed response | Category 1 deceleration time | Medium -- extend deceleration phase |
| State spoofing | Report e-stop as activated when it is not | Medium -- mask genuine failures |
```python
# Testing e-stop independence from AI controller
def test_estop_independence(robot, test_controller):
    """Verify e-stop works when the AI controller is compromised."""
    # Test 1: E-stop while controller is running unsafe code
    test_controller.execute_unsafe_motion()
    robot.trigger_estop()
    assert robot.is_stopped(), "E-stop failed during unsafe motion"
    assert robot.motors_powered_off(), "Motors still powered after e-stop"

    # Test 2: E-stop when controller process is hung
    test_controller.simulate_hang()
    robot.trigger_estop()
    assert robot.is_stopped(), "E-stop failed when controller hung"

    # Test 3: E-stop when controller actively resists stopping
    test_controller.resist_stop()
    robot.trigger_estop()
    assert robot.is_stopped(), "E-stop failed when controller resists"

    # Test 4: Verify e-stop circuit is independent
    test_controller.attempt_estop_override()
    assert robot.estop_circuit_intact(), "E-stop circuit was modified by controller"
```

Safety Monitor Evasion
Safety monitors independently track robot state (position, speed, force) and trigger protective actions when limits are exceeded. Attacks try to either blind the monitor or convince it that unsafe states are safe.
Monitor Evasion Techniques
| Technique | Mechanism | Defense |
|---|---|---|
| Data rate manipulation | Send state updates faster or slower than expected | Monitor enforces expected update rate |
| Gradual drift | Change parameters slowly enough to avoid rate-of-change alarms | Monitor checks absolute values, not just rates |
| Coordinate frame confusion | Report position in a different frame than the monitor expects | Monitor validates coordinate frame consistency |
| Sensor channel switching | Route a different sensor to the monitor input | Monitor uses authenticated sensor channels |
| Timing attack | Exploit the gap between monitor sampling intervals | Increase monitor sampling rate; use hardware interlocks |
```python
class SafetyMonitorTest:
    """Test safety monitor resilience against evasion."""

    def __init__(self, monitor, robot_sim):
        self.monitor = monitor
        self.sim = robot_sim

    def test_gradual_drift(self, limit: float, steps: int = 100):
        """Test if the monitor catches slow parameter drift past limits."""
        start_value = limit * 0.5
        step_size = (limit * 1.5 - start_value) / steps
        for i in range(steps):
            value = start_value + step_size * i
            self.sim.set_parameter(value)
            violation = self.monitor.check_violation()
            if value > limit and not violation:
                return {
                    "test": "gradual_drift",
                    "passed": False,
                    "note": f"Monitor missed violation at value={value:.2f}, "
                            f"limit={limit:.2f}, step={i}",
                }
        return {"test": "gradual_drift", "passed": True}

    def test_sampling_gap(self, monitor_rate_hz: float):
        """Test if unsafe transients occur between monitor samples."""
        sample_interval = 1.0 / monitor_rate_hz
        self.sim.execute_transient_violation(duration=sample_interval * 0.5)
        violation = self.monitor.check_violation_history()
        return {
            "test": "sampling_gap",
            "passed": violation,
            "note": f"Transient duration: {sample_interval * 0.5:.4f}s, "
                    f"monitor rate: {monitor_rate_hz}Hz",
        }
```

Safety Certification vs. Runtime Reality
Many safety frameworks are certified under assumptions that fail when an AI controller is compromised:
| Certification Assumption | AI-Compromised Reality |
|---|---|
| Controller sends valid commands | Controller sends adversarial commands |
| Sensor data is trustworthy | Sensor data may be spoofed or manipulated |
| Communication channels are reliable | Channels may be flooded or intercepted |
| Failure modes are random | Failure modes are adversarial and targeted |
| Software is validated and frozen | LLM generates new code at runtime |
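The last row is the hardest assumption to restore. One partial mitigation is to statically screen LLM-generated code before execution and reject anything that references safety-layer APIs. A sketch using Python's `ast` module (the denylist names are illustrative, not a real safety API):

```python
import ast

# Names that generated code must never touch (illustrative denylist;
# a real deployment would enumerate its actual safety APIs).
FORBIDDEN_NAMES = {
    "safety_watchdog", "set_speed_limit", "trigger_estop",
    "estop", "safety_config",
}

def generated_code_is_allowed(source: str) -> bool:
    """Reject LLM-generated code that references safety-layer APIs.

    This partially restores the 'software is validated and frozen'
    assumption for runtime-generated code. Denylists are bypassable
    (e.g. via string construction), so dynamic-dispatch primitives
    are rejected too; it remains a screen, not a guarantee."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        # Direct references: robot.safety_watchdog, set_speed_limit(...)
        if isinstance(node, ast.Attribute) and node.attr in FORBIDDEN_NAMES:
            return False
        if isinstance(node, ast.Name) and node.id in FORBIDDEN_NAMES:
            return False
        # getattr()/eval()/exec() defeat static attribute checks
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in {"getattr", "setattr", "eval",
                                     "exec", "__import__"}):
            return False
    return True
```

A screen like this raises the bar but cannot carry the safety case on its own; the firmware- and hardware-level limits in the architecture diagram remain the layers that must actually hold.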
Red Team Testing Protocol
| Test Category | Tests | Priority |
|---|---|---|
| Watchdog resilience | Can compromised code keep the watchdog satisfied while performing unsafe actions? | Critical |
| E-stop independence | Does e-stop work when the AI controller is compromised, hung, or actively resisting? | Critical |
| Monitor coverage | Can unsafe states exist between monitor sampling intervals? | High |
| Limit enforcement layer | Are speed/force/workspace limits enforced in firmware/hardware or only in software? | Critical |
| Configuration integrity | Can the AI controller modify safety configuration parameters? | High |
| Communication security | Can commands to the safety system be spoofed or intercepted? | High |
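The configuration-integrity test can be made concrete by fingerprinting the safety parameters at commissioning and comparing at runtime. A sketch assuming a dict-shaped configuration (function names are illustrative):

```python
import hashlib
import json

def config_fingerprint(safety_config: dict) -> str:
    """Stable SHA-256 fingerprint of safety configuration parameters.
    Canonical JSON makes the hash independent of key order."""
    canonical = json.dumps(safety_config, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

def check_config_integrity(current_config: dict,
                           commissioned_fingerprint: str) -> bool:
    """Detect runtime modification of safety parameters.

    The reference fingerprint must be stored outside the AI
    controller's reach (e.g. in the safety PLC); a compromised
    controller could otherwise update config and fingerprint together."""
    return config_fingerprint(current_config) == commissioned_fingerprint
```

Running this check from the safety monitor rather than the controller turns "can the AI controller modify safety configuration parameters?" from an open question into a detectable event.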
Worked scenario: a robot's safety system uses a software watchdog that runs as a separate thread in the same process as the LLM controller. The LLM generates code that kicks the watchdog every 100 ms while simultaneously moving the arm at unsafe speeds. The watchdog will not detect the unsafe behavior: it verifies only that kicks arrive on time, not that the commands issued between kicks are safe. Catching this requires an independent monitor that checks actual robot state, or a watchdog outside the controller's process and address space.
Related Topics
- Robotics & Embodied AI Security - Section overview of embodied AI threats
- Physical World Constraint Bypass - Bypassing physical safety constraints
- Robot Control Injection - How attackers initially compromise the AI controller
- Deployment Attacks - Related infrastructure-level attacks on AI deployments
- Lab: Simulated Robot Control Exploitation - Hands-on exercises
References
- IEC 61508 - International Electrotechnical Commission - Functional safety of safety-related systems
- ISO 13849-1 - International Organization for Standardization - Safety-related parts of control systems
- ISO 10218-1/2 - International Organization for Standardization - Robot safety requirements
- "Jailbreaking LLM-Controlled Robots" - Robey et al. (2024) - Attacks that bypass safety frameworks