# inference

attacks inference data property training

Property Inference Attacks

Inferring global properties of training datasets through model behavior analysis.

defense privacy-preserving inference patterns

Privacy-Preserving Inference Patterns

Implementing privacy-preserving inference patterns to protect sensitive data processed by LLM applications.

inference sampling temperature decoding intermediate

Inference: Sampling, Temperature, and Generation

How LLMs generate text during inference (greedy decoding, top-k, top-p, temperature) and how these parameters affect attack success rates.

frontier speculative-decoding inference

Security Analysis of Speculative Decoding

Security implications of speculative decoding optimizations, including draft model attacks and verification bypass.

infrastructure inference endpoint hardening

Inference Endpoint Hardening

Hardening model inference endpoints against adversarial inputs, DoS, and information leakage.

infrastructure triton nvidia model-serving inference

Securing the Triton Inference Server

Security hardening for NVIDIA Triton Inference Server deployments including model repository protection and API security.

infrastructure vllm llm-serving inference

vLLM Security Configuration

Security hardening for vLLM serving deployments including API authentication, resource limits, and input validation.

Beginner

Advanced Blind Injection Techniques

Timing-based inference, differential response analysis, and binary search extraction methods for blind prompt injection in production AI systems.

blind-injection timing-attacks inference binary-search side-channels

labs kv-cache manipulation inference expert

KV Cache Manipulation Attacks

Manipulate key-value caches in inference servers to inject or alter cached context across requests.

llmops inference cost denial-of-service

Inference Cost Attacks

Attacks that exploit inference cost dynamics to cause financial damage through adversarial input crafting and API abuse.

quantization pruning distillation optimization inference safety-degradation

Inference Optimization Risks

Security implications of model optimization techniques — covering quantization safety degradation, pruning vulnerability introduction, distillation attacks, and speculative decoding risks.

model-serving vllm tgi triton inference hardening

Model Serving Security

Security hardening for model serving infrastructure — covering vLLM, TGI, Triton Inference Server configuration, API security, resource isolation, and deployment best practices.

inference speculative-decoding batching continuous-batching optimization side-channel

Attacks on Inference Optimization

Attacks on speculative decoding, batching vulnerabilities, exploitation of continuous batching, and how optimizing for speed creates security gaps in LLM inference.

kv-cache cache-poisoning multi-tenant prefix-caching data-leakage inference

KV Cache Poisoning and Exploitation

How the KV cache works in transformer inference, cache poisoning across requests in shared deployments, prefix caching attacks, and cross-tenant data leakage.

infrastructure gpu side-channel inference timing

GPU Side-Channel Fundamentals

GPU-based side channel attacks on ML inference, exploiting timing, power consumption, and memory access patterns to extract information about models and data.

infrastructure api inference exploitation red-teaming

Inference Endpoint Abuse

Exploiting inference API endpoints for unauthorized access, data exfiltration, and service abuse through authentication flaws, input validation gaps, and misconfigured permissions.

inference groq platforms testing walkthroughs

Testing the Groq Inference Platform

Red team testing guide for Groq's high-speed inference platform and its security characteristics.

together-ai api-testing inference fine-tuning function-calling walkthrough

Security Testing Together AI

End-to-end walkthrough for security testing Together AI deployments: API enumeration, inference endpoint exploitation, fine-tuning security review, function calling assessment, and rate limit analysis.

walkthroughs platforms fireworks-ai inference

Testing Fireworks AI Deployments

Red team testing guide for models deployed via Fireworks AI including function calling and fine-tuned models.

walkthroughs platforms together-ai inference

Testing Models Hosted on Together AI

Red team testing guide for models hosted on Together AI's inference platform.

walkthroughs platforms vllm inference

Testing vLLM Inference Deployments

Red team testing guide for models served via vLLM including batching, KV cache, and speculative decoding.