# inference
22 articles tagged with “inference”
Hugging Face Inference Endpoints Security
Security analysis of Hugging Face Inference Endpoints including model isolation and API security.
Property Inference Attacks
Inferring global properties of training datasets through model behavior analysis.
Privacy-Preserving Inference Patterns
Implementing privacy-preserving inference patterns to protect sensitive data processed by LLM applications.
Inference: Sampling, Temperature & Generation
How LLMs generate text at inference time — greedy decoding, top-k, top-p, temperature — and how these parameters affect attack success rates.
Speculative Decoding Security Analysis
Security implications of speculative decoding optimizations including draft model attacks and verification bypasses.
Inference Endpoint Hardening
Hardening model inference endpoints against adversarial inputs, DoS, and information leakage.
Triton Inference Server Security
Security hardening for NVIDIA Triton Inference Server deployments including model repository protection and API security.
vLLM Security Configuration
Security hardening for vLLM serving deployments including API authentication, resource limits, and input validation.
Advanced Blind Injection Techniques
Timing-based inference, differential response analysis, and binary search extraction methods for blind prompt injection in production AI systems.
KV Cache Manipulation Attacks
Manipulating key-value caches in inference servers to inject or alter cached context across requests.
Inference Cost Attacks
Attacks that exploit inference cost dynamics to cause financial damage through adversarial input crafting and API abuse.
Inference Optimization Risks
Security implications of model optimization techniques — covering quantization safety degradation, pruning vulnerability introduction, distillation attacks, and speculative decoding risks.
Model Serving Security
Security hardening for model serving infrastructure — covering vLLM, TGI, Triton Inference Server configuration, API security, resource isolation, and deployment best practices.
Inference Optimization Attacks
Speculative decoding attacks, batching vulnerabilities, continuous batching exploitation, and how optimization for speed creates security gaps in LLM inference.
KV Cache Poisoning & Exploitation
How KV cache works in transformer inference, cache poisoning across requests in shared deployments, prefix caching attacks, and cross-tenant data leakage.
GPU Side Channel Basics
GPU-based side channel attacks on ML inference, exploiting timing, power consumption, and memory access patterns to extract information about models and data.
Inference Endpoint Exploitation
Exploiting inference API endpoints for unauthorized access, data exfiltration, and service abuse through authentication flaws, input validation gaps, and misconfigured permissions.
Testing Groq Inference Platform
Red team testing guide for Groq's high-speed inference platform and its security characteristics.
Together AI Security Testing
End-to-end walkthrough for security testing Together AI deployments: API enumeration, inference endpoint exploitation, fine-tuning security review, function calling assessment, and rate limit analysis.
Testing Fireworks AI Deployments
Red team testing guide for models deployed via Fireworks AI including function calling and fine-tuned models.
Testing Together AI Hosted Models
Red team testing guide for models hosted on Together AI's inference platform.
Testing vLLM Inference Deployments
Red team testing guide for models served via vLLM including batching, KV cache, and speculative decoding.