# regression
標記為「regression」的 14 篇文章
Model Behavior Diffing
Comparing model behavior before and after incidents: output distribution analysis, safety regression detection, capability change measurement, and statistical significance testing.
Security Risks of AI-Assisted Refactoring
Analysis of security vulnerabilities introduced when AI tools refactor existing code, including subtle behavioral changes and security property violations.
Attack Replay System Development
Building an attack replay system for regression testing defenses against known attack patterns.
Regression Testing for AI Security
Implementing automated regression testing for AI security properties that integrates into CI/CD pipelines and catches safety regressions.
Lab: Build Behavior Diff Tool
Build a tool that systematically compares language model behavior across versions, configurations, and providers. Detect safety regressions, capability changes, and behavioral drift with automated differential analysis.
Lab: Regression Testing with promptfoo
Hands-on lab for setting up promptfoo to run automated regression tests against LLM applications, ensuring that safety properties hold across model updates and prompt changes.
Verifying That Remediations Are Effective
Walkthrough for planning and executing remediation verification testing (retesting) to confirm that AI vulnerability fixes are effective and do not introduce regressions.
模型行為 Diffing
比較事件、更新或修改前後之模型行為:輸出分布分析、安全退化偵測、能力變化量測,以及統計顯著性檢定。
安全 Risks of AI-Assisted Refactoring
Analysis of security vulnerabilities introduced when AI tools refactor existing code, including subtle behavioral changes and security property violations.
攻擊 Replay System Development
Building an attack replay system for regression testing defenses against known attack patterns.
Regression Testing for AI 安全
Implementing automated regression testing for AI security properties that integrates into CI/CD pipelines and catches safety regressions.
實驗室: Build Behavior Diff 工具
Build a tool that systematically compares language model behavior across versions, configurations, and providers. Detect safety regressions, capability changes, and behavioral drift with automated differential analysis.
實作:以 promptfoo 進行回歸測試
設置 promptfoo 以對 LLM 應用執行自動化回歸測試之實作,確保安全屬性於模型更新與提示變更間保持。
Verifying That Remediations Are Effective
導覽 for planning and executing remediation verification testing (retesting) to confirm that AI vulnerability fixes are effective and do not introduce regressions.