# tuning

8 articles tagged with “tuning”

Skill Verification: Fine-Tuning Attacks

Hands-on verification of fine-tuning-based attack techniques including safety alignment removal.

Security analysis of adapter-based fine-tuning including LoRA, prefix tuning, and prompt tuning.

Testing how safety alignment degrades under various fine-tuning configurations and datasets.

Removing safety alignment with minimal fine-tuning data through targeted few-shot detuning.

Using instruction tuning to selectively bypass safety mechanisms while maintaining model capability.

Security risks of using publicly shared adapters from model hubs and community repositories.

Probing fine-tuning APIs for security weaknesses, including insufficient validation and unsafe default configurations.

Manipulating instruction-tuning datasets to embed specific behaviors in the resulting model.