1 articletagged with “behavior-diffing”
Comparing model behavior before and after incidents: output distribution analysis, safety regression detection, capability change measurement, and statistical significance testing.