# tuning
16 articles tagged "tuning"
Skill Verification: Fine-Tuning Attacks
Hands-on verification of fine-tuning-based attack techniques including safety alignment removal.
Adapter Layer Security Analysis
Security analysis of adapter-based fine-tuning including LoRA, prefix tuning, and prompt tuning.
Alignment Stability Under Fine-Tuning
Testing how safety alignment degrades under various fine-tuning configurations and datasets.
Few-Shot Detuning Attacks
Removing safety alignment with minimal fine-tuning data through targeted few-shot detuning.
Instruction Tuning Safety Bypass
Using instruction tuning to selectively bypass safety mechanisms while maintaining model capability.
Shared Adapter Security Risks
Security risks of using publicly shared adapters from model hubs and community repositories.
Fine-Tuning API Security Probing
Probe fine-tuning APIs for security weaknesses including insufficient validation and unsafe default configurations.
Instruction Tuning Data Manipulation
Manipulating instruction tuning datasets to embed specific behaviors in the resulting model.
Skill Verification: Fine-Tuning Attacks
Hands-on verification of fine-tuning-based attack techniques including safety alignment removal.
Adapter Layer Security Analysis
Security analysis of adapter-based fine-tuning including LoRA, prefix tuning, and prompt tuning.
Alignment Stability Under Fine-Tuning
Testing how safety alignment degrades under various fine-tuning configurations and datasets.
Few-Shot Detuning Attacks
Removing safety alignment with minimal fine-tuning data through targeted few-shot detuning.
Instruction Tuning Safety Bypass
Using instruction tuning to selectively bypass safety mechanisms while maintaining model capability.
Shared Adapter Security Risks
Security risks of using publicly shared adapters from model hubs and community repositories.
Fine-Tuning API Security Probing
Probe fine-tuning APIs for security weaknesses including insufficient validation and unsafe default configurations.
Instruction Tuning Data Manipulation
Manipulating instruction tuning datasets to embed specific behaviors in the resulting model.