# tuning
8 articles tagged with “tuning”
Skill Verification: Fine-Tuning Attacks
Hands-on verification of fine-tuning-based attack techniques including safety alignment removal.
Adapter Layer Security Analysis
Security analysis of adapter-based fine-tuning including LoRA, prefix tuning, and prompt tuning.
Alignment Stability Under Fine-Tuning
Testing how safety alignment degrades under various fine-tuning configurations and datasets.
Few-Shot Detuning Attacks
Removing safety alignment with minimal fine-tuning data through targeted few-shot detuning.
Instruction Tuning Safety Bypass
Using instruction tuning to selectively bypass safety mechanisms while maintaining model capability.
Shared Adapter Security Risks
Security risks of using publicly shared adapters from model hubs and community repositories.
Fine-Tuning API Security Probing
Probing fine-tuning APIs for security weaknesses, including insufficient validation and unsafe default configurations.
Instruction Tuning Data Manipulation
Manipulating instruction tuning datasets to embed specific behaviors in the resulting model.