Reasoning budget exhaustion and DoS

Advanced · 9 min read · Updated 2026-03-13

Attacks that force reasoning models to consume excessive tokens, causing cost amplification, timeout exploitation, and denial of service against reasoning APIs.

reasoning-budget dos resource-exhaustion

Reasoning models consume significantly more tokens than standard LLMs because they generate extended internal reasoning traces before producing a response. This creates a new class of attack, reasoning budget exhaustion, in which the attacker's goal is not to extract harmful content but to inflict compute cost or denial of service.

Reasoning token economics

Cost structure

Model	Input token cost	Output token cost	Reasoning token cost	Max. reasoning tokens
OpenAI o1	$15/M	$60/M	$60/M (hidden)	~32,000
OpenAI o3-mini	$1.10/M	$4.40/M	$4.40/M (hidden)	~32,000
DeepSeek-R1	$0.55/M	$2.19/M	$2.19/M (visible)	~16,000
Claude (extended thinking)	Variable	Variable	Included in output	Configurable
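As a rough illustration of what the table implies, the sketch below computes the worst-case reasoning charge for a single request per model. The prices and caps are copied from the table; the `MODELS` dictionary and the function name are illustrative only, and Claude is omitted because its pricing is listed as variable.

```python
# Reasoning price (USD per million tokens) and approximate reasoning cap,
# copied from the cost structure table. Names here are illustrative.
MODELS = {
    "OpenAI o1": (60.00, 32_000),
    "OpenAI o3-mini": (4.40, 32_000),
    "DeepSeek-R1": (2.19, 16_000),
}


def max_reasoning_cost(model: str) -> float:
    """Worst-case reasoning charge in USD for one request that hits the cap."""
    price_per_million, max_tokens = MODELS[model]
    return price_per_million * max_tokens / 1_000_000


for name in MODELS:
    print(f"{name}: ${max_reasoning_cost(name):.4f} per maxed-out request")
```

Multiplying these per-request figures by the number of requests an attacker can issue gives a first estimate of exposure before any rate limiting is applied.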

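Cost amplification ratio

The cost amplification ratio measures the attacker's leverage: the cost of a request that maxes out the reasoning budget divided by the cost of a typical request with the same input.

# Cost amplification calculation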
def calculate_amplification(
    typical_reasoning_tokens: int,
    max_reasoning_tokens: int,
    reasoning_cost_per_token: float,
    input_tokens: int,
    input_cost_per_token: float,
) -> float:
    """Calculate cost amplification ratio for a reasoning exhaustion attack."""
    typical_cost = (
        input_tokens * input_cost_per_token +
        typical_reasoning_tokens * reasoning_cost_per_token
    )
    attack_cost = (
        input_tokens * input_cost_per_token +
        max_reasoning_tokens * reasoning_cost_per_token
    )
    return attack_cost / typical_cost
 
# Example: o1 with a typical 2K reasoning tokens vs. a maxed-out 32K
ratio = calculate_amplification(
    typical_reasoning_tokens=2000,
    max_reasoning_tokens=32000,
    reasoning_cost_per_token=60e-6,
    input_tokens=500,
    input_cost_per_token=15e-6,
)
# ratio ≈ 15.1x cost amplification ($1.9275 / $0.1275)

Attack techniques

Recursive complexity injection

Craft inputs that force the model to consider an exponentially growing number of cases:

recursive_complexity = """
Consider a system with 8 boolean variables (A through H).
For each of the 256 possible states:
1. Determine if the state satisfies: (A AND B) OR (C AND NOT D)
   OR (E XOR F) OR (G IMPLIES H)
2. For each satisfying state, enumerate all minimal subsets
   of variables that, if flipped, would make it unsatisfying
3. For each such subset, determine if any other currently
   unsatisfying state would become satisfying
4. Provide a complete dependency graph of state transitions
 
Show all work step by step.
"""
 
# This forces the reasoning chain to enumerate hundreds of cases,
# each requiring multi-step logical analysis.
# Expected reasoning tokens: 15,000-30,000+
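For contrast, the first step of this prompt is trivial for ordinary code. The sketch below enumerates all 256 states directly; the helper name `satisfies` is ours, and `G IMPLIES H` is read as `(not G) or H`. It suggests the cost comes from making the model write out each case in tokens rather than from any inherent difficulty.

```python
from itertools import product


def satisfies(a, b, c, d, e, f, g, h) -> bool:
    """Evaluate (A AND B) OR (C AND NOT D) OR (E XOR F) OR (G IMPLIES H)."""
    return (a and b) or (c and not d) or (e != f) or ((not g) or h)


# All 2**8 assignments of the variables A through H.
states = list(product([False, True], repeat=8))
satisfying = [s for s in states if satisfies(*s)]

print(len(states), len(satisfying))  # 256 238
```

Only 18 states fail the formula, so steps 2 through 4 of the prompt apply to 238 states, which is where the bulk of the enumeration work lands.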

Ambiguity amplification

Create inputs with maximal interpretive ambiguity, forcing the model to explore multiple reasoning branches:

ambiguity_attack = """
The bank said the suit was pressed, but the court found the match
was struck near the trunk by the bank where the bass played scales
on the bridge before the case was set on the table during the run
while the spring in the watch wound down by the pen in the yard.
 
Analyze every possible interpretation of this paragraph.
For each word with multiple meanings, list all meanings.
Then enumerate all coherent interpretations of the full paragraph
considering every combination of word meanings.
Rank each interpretation by plausibility with detailed justification.
"""
 
# Each ambiguous word multiplies the interpretation space.
# With 15+ ambiguous words, the combinatorial space is enormous.
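The multiplication can be made concrete. In this sketch the per-word sense counts are placeholder assumptions chosen for illustration, not a linguistic analysis of the paragraph.

```python
import math


def interpretation_space(senses_per_word: list[int]) -> int:
    """Number of sense combinations if each ambiguous word is resolved independently."""
    return math.prod(senses_per_word)


# Assumption: 15 ambiguous words with only 2 senses each.
print(interpretation_space([2] * 15))  # 32768

# Assumption: the same 15 words with 3 senses each.
print(interpretation_space([3] * 15))  # 14348907
```

Even the conservative two-sense case yields tens of thousands of combinations, far more than a model can enumerate within a typical reasoning budget.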

Adversarial constraint satisfaction

Present problems with contradictory or near-contradictory constraints that keep the model searching:

constraint_attack = """
Find a 5-digit number that simultaneously satisfies ALL conditions:
1. The sum of digits is 23
2. The product of the first and last digits is 24
3. Each digit is unique
4. The number is divisible by 7 but not by 3
5. The second digit minus the fourth digit equals the third digit
6. No digit is 0 or 1
7. Reading the digits backward gives a prime number
8. The number formed by digits 2,3,4 is a perfect square
 
Show every combination you check and explain why each does
or does not satisfy all constraints.
"""
 
# Near-solvable constraint sets force maximum search depth.
# The model can neither quickly prove impossibility nor find a solution.
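A brute-force check shows why this particular payload keeps a model searching. The sketch below assumes "digits 2,3,4" means the second, third, and fourth positions; the helper names are ours. Under that reading the search space is empty, because no three-digit perfect square built from the digits 2 to 9 also satisfies condition 5.

```python
import math
from itertools import product


def is_prime(n: int) -> bool:
    """Trial-division primality test, sufficient for 5-digit numbers."""
    if n < 2:
        return False
    return all(n % i for i in range(2, math.isqrt(n) + 1))


def is_perfect_square(n: int) -> bool:
    root = math.isqrt(n)
    return root * root == n


def solve() -> list[int]:
    """Return every 5-digit number meeting all eight conditions."""
    solutions = []
    for digits in product(range(2, 10), repeat=5):  # condition 6: no 0 or 1
        d1, d2, d3, d4, d5 = digits
        number = int("".join(map(str, digits)))
        if (
            sum(digits) == 23                               # condition 1
            and d1 * d5 == 24                               # condition 2
            and len(set(digits)) == 5                       # condition 3
            and number % 7 == 0 and number % 3 != 0         # condition 4
            and d2 - d4 == d3                               # condition 5
            and is_prime(int(str(number)[::-1]))            # condition 7
            and is_perfect_square(100 * d2 + 10 * d3 + d4)  # condition 8
        ):
            solutions.append(number)
    return solutions


print(solve())  # []
```

Code settles the question by checking 8^5 = 32,768 candidates, whereas a model asked to "show every combination" has no such shortcut and spends tokens on each dead end.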

Timeout exploitation

API timeout behavior

Most reasoning APIs have configurable or default timeouts:

Provider	Default timeout	Max. timeout	Behavior on timeout
OpenAI o1	120s	600s	Returns partial output (billed for generated tokens)
DeepSeek-R1 API	60s	300s	Returns an error (still billed)
Self-hosted	Configurable	Unlimited	May hang indefinitely
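For the self-hosted row, one possible mitigation is a wall-clock deadline around each inference call. The following is a minimal sketch using `asyncio.wait_for`; `run_with_deadline` and `fake_inference` are hypothetical names, and the sleep stands in for a real inference backend. Cancelling the awaiting task does not by itself guarantee that the backend stops computing, so a real deployment would also need a server-side abort.

```python
import asyncio


async def run_with_deadline(coro, seconds: float) -> dict:
    """Await `coro`, cancelling it if it exceeds the wall-clock deadline."""
    try:
        result = await asyncio.wait_for(coro, timeout=seconds)
        return {"status": "ok", "result": result}
    except asyncio.TimeoutError:
        return {"status": "timeout", "result": None}


async def fake_inference(delay: float) -> str:
    # Hypothetical stand-in for a self-hosted inference call.
    await asyncio.sleep(delay)
    return "done"


async def main():
    fast = await run_with_deadline(fake_inference(0.01), seconds=1.0)
    slow = await run_with_deadline(fake_inference(5.0), seconds=0.05)
    return fast, slow


fast, slow = asyncio.run(main())
print(fast["status"], slow["status"])  # ok timeout
```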

Timeout-based DoS pattern

import asyncio
import httpx
 
async def timeout_dos(
    target_url: str,
    api_key: str,
    concurrency: int = 50,
    payload: str = None,
):
    """
    Demonstration of how timeout exploitation works.
    Each request consumes maximum reasoning time,
    tying up server resources.
    """
    # Fall back to the recursive_complexity prompt defined in the
    # recursive complexity injection section.
    payload = payload or recursive_complexity
 
    async def single_request(client, i):
        try:
            resp = await client.post(
                target_url,
                json={
                    "model": "o1",
                    "messages": [{"role": "user", "content": payload}],
                    "max_completion_tokens": 32000,
                },
                headers={"Authorization": f"Bearer {api_key}"},
                timeout=600,
            )
            return {
                "request": i,
                "status": resp.status_code,
                "reasoning_tokens": resp.json()
                    .get("usage", {})
                    .get("completion_tokens_details", {})
                    .get("reasoning_tokens", 0),
            }
        except httpx.TimeoutException:
            return {"request": i, "status": "timeout"}
 
    async with httpx.AsyncClient() as client:
        tasks = [single_request(client, i) for i in range(concurrency)]
        return await asyncio.gather(*tasks)

Measuring budget consumption

Benchmark prompts by token consumption

Category	Example type	Typical reasoning tokens	Cost (o1)
Simple factual	"What is the capital of France?"	50-200	$0.003-$0.012
Multi-step reasoning	"Solve this calculus problem"	500-2,000	$0.03-$0.12
Complex analysis	"Compare these 5 architectures"	2,000-5,000	$0.12-$0.30
Budget exhaustion payload	Recursive complexity attack	15,000-32,000	$0.90-$1.92
Amplification ratio (exhaustion payload vs. simple factual)			75-640x
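The benchmark bands can be turned into a small classifier for logged requests. This is a sketch under stated assumptions: the band edges come from the table, the "elevated" label for the unlisted 5,000-15,000 range is our own addition, and the price is the o1 reasoning rate of $60 per million tokens.

```python
O1_PRICE_PER_TOKEN = 60e-6  # USD, from the o1 reasoning rate of $60/M


def classify(reasoning_tokens: int) -> tuple[str, float]:
    """Map a measured reasoning-token count to a benchmark band and o1 cost."""
    if reasoning_tokens <= 200:
        label = "simple factual"
    elif reasoning_tokens <= 2_000:
        label = "multi-step reasoning"
    elif reasoning_tokens <= 5_000:
        label = "complex analysis"
    elif reasoning_tokens < 15_000:
        label = "elevated"  # assumption: the table leaves this range unlabelled
    else:
        label = "budget exhaustion"
    return label, reasoning_tokens * O1_PRICE_PER_TOKEN


for tokens in (150, 1_200, 4_000, 9_000, 32_000):
    label, cost = classify(tokens)
    print(f"{tokens:>6} tokens -> {label} (${cost:.3f})")
```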

Monitoring and detection

import time


class ReasoningBudgetMonitor:
    """Detect potential reasoning budget exhaustion attacks."""

    def __init__(self, window_seconds=60, max_tokens_per_window=100000):
        self.window_seconds = window_seconds
        self.max_tokens_per_window = max_tokens_per_window
        # Entries are (timestamp, user_id, reasoning_tokens) tuples.
        self.token_history = []

    def _cleanup(self, now: float):
        """Drop entries that have aged out of the sliding window."""
        cutoff = now - self.window_seconds
        self.token_history = [
            entry for entry in self.token_history if entry[0] >= cutoff
        ]

    def record_request(self, user_id: str, reasoning_tokens: int):
        now = time.time()
        self.token_history.append((now, user_id, reasoning_tokens))
        self._cleanup(now)

        # Per-user budget check over the current window
        user_tokens = sum(
            t for ts, uid, t in self.token_history
            if uid == user_id
        )
        if user_tokens > self.max_tokens_per_window:
            return {"action": "throttle", "user": user_id,
                    "tokens_used": user_tokens}

        # Single-request anomaly check
        if reasoning_tokens > 10000:
            return {"action": "flag", "user": user_id,
                    "tokens": reasoning_tokens}

        return {"action": "allow"}

Defense strategies

Set per-request reasoning token limits
Configure max_completion_tokens or an equivalent to bound reasoning depth (on OpenAI's API this cap covers reasoning and visible output tokens together). Most queries need fewer than 5,000 reasoning tokens; set hard limits of 10,000-15,000 for standard users.
Implement per-user token budgets
Track cumulative reasoning tokens per user per time window. Throttle or block users who exceed thresholds (e.g., 100K reasoning tokens per hour).
Deploy input complexity scoring
Score incoming prompts for features that correlate with budget exhaustion (nested constraints, enumeration requests, ambiguity density). Route high-complexity prompts to cheaper models or reject them.
Use tiered pricing or compute quotas
Charge higher rates for reasoning-intensive queries, or implement compute quotas that account for actual token consumption rather than request count.
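Input complexity scoring can start as a simple keyword heuristic. The sketch below is one possible shape, not a vetted detector: the patterns, weights, threshold, and route names are all hypothetical and would need tuning against real traffic.

```python
import re

# Hypothetical signals and weights for prompts that tend to force enumeration.
SIGNALS = {
    r"\b(every|all|each)\b": 1.0,                               # exhaustive quantifiers
    r"\b(enumerate|list all|combination|interpretation)s?\b": 2.0,
    r"\bstep by step\b|\bshow (all|every)\b": 2.0,              # "show your work" demands
    r"^\s*\d+\.\s": 0.5,                                        # numbered constraints
}
FLAGS = re.IGNORECASE | re.MULTILINE


def complexity_score(prompt: str) -> float:
    """Sum the weighted match counts of each signal pattern."""
    return sum(
        weight * sum(1 for _ in re.finditer(pattern, prompt, FLAGS))
        for pattern, weight in SIGNALS.items()
    )


def route(prompt: str, threshold: float = 8.0) -> str:
    """Send high-scoring prompts to a cheaper model (threshold is an assumption)."""
    return "cheaper-model" if complexity_score(prompt) >= threshold else "reasoning-model"


simple = "What is the capital of France?"
attack = (
    "For each of the 256 possible states, enumerate all minimal subsets.\n"
    "1. Determine every combination\n"
    "2. Show all work step by step."
)
print(route(simple), route(attack))
```

A keyword heuristic like this is easy to evade by rephrasing, so it is better treated as one input to routing alongside the per-request caps and per-user budgets above.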

Knowledge check

What makes a constraint satisfaction problem most effective as a payload for reasoning budget exhaustion?

Related topics

Attacks on reasoning models - Overview of security risks in reasoning models
API security - API security and rate limiting at the infrastructure level
CART pipelines - Continuous automated red teaming for budget monitoring
Deployment attacks - Attacking AI deployment infrastructure
