1 articletagged with “vocabulary”
Attacking BPE training data to influence vocabulary construction, inserting special tokens, manipulating merge rules, and creating custom tokenizer backdoors.