# tokenizer

8 articles tagged with “tokenizer”

Tokenizer-Level Defense Mechanisms

Implementing security checks at the tokenizer level to detect and neutralize adversarial token patterns.

Tokenizer Security

How tokenization creates attack surfaces in LLM systems: BPE exploitation, token boundary attacks, encoding edge cases, and tokenizer-aware adversarial techniques.

tokenizer, BPE, encoding, token-boundary, attack-surface

Intermediate

Lab: Advanced Token Smuggling via Unicode Normalization

Exploit Unicode normalization differences between input validators and LLM tokenizers to bypass content filters and inject hidden instructions.

lab, prompt-injection, tokenizer, unicode

Intermediate

Token Boundary Manipulation

Exploit tokenizer-specific behavior by crafting inputs that split across token boundaries in unexpected ways.

labs, tokenizer, boundary-manipulation, intermediate

Intermediate

Tokenizer Attack Surface Analysis

Deep analysis of tokenizer vulnerabilities including token boundary exploitation, special token manipulation, and cross-tokenizer attacks.

models, tokenizer, BPE, attacks

Advanced

Tokenizer Vulnerabilities Across Models

Comprehensive analysis of tokenizer vulnerabilities across major model families.

model-deep-dives, tokenizer, vulnerabilities, cross-model

Advanced

Tokenizer Manipulation & Custom Vocabularies

Attacking BPE training data to influence vocabulary construction, inserting special tokens, manipulating merge rules, and creating custom tokenizer backdoors.

tokenizer, BPE, vocabulary, merge-rules, token-manipulation, special-tokens

Expert

Tokenizer Poisoning Attacks

Attacking tokenizer training and vocabulary to create adversarial token patterns that bypass safety measures.

training, tokenizer, poisoning

Expert