# inference
45 articles tagged "inference"
Hugging Face Inference Endpoints Security
Security analysis of Hugging Face Inference Endpoints including model isolation and API security.
Property Inference Attacks
Inferring global properties of training datasets through model behavior analysis.
Privacy-Preserving Inference Patterns
Implementing privacy-preserving inference patterns to protect sensitive data processed by LLM applications.
Inference: Sampling, Temperature & Generation
How LLMs generate text at inference time — greedy decoding, top-k, top-p, temperature — and how these parameters affect attack success rates.
Speculative Decoding Security Analysis
Security implications of speculative decoding optimizations including draft model attacks and verification bypasses.
Inference Endpoint Hardening
Hardening model inference endpoints against adversarial inputs, DoS, and information leakage.
Triton Inference Server Security
Security hardening for NVIDIA Triton Inference Server deployments including model repository protection and API security.
vLLM Security Configuration
Security hardening for vLLM serving deployments including API authentication, resource limits, and input validation.
Advanced Blind Injection Techniques
Timing-based inference, differential response analysis, and binary search extraction methods for blind prompt injection in production AI systems.
KV Cache Manipulation Attacks
Manipulate key-value caches in inference servers to inject or alter cached context across requests.
Inference Cost Attacks
Attacks that exploit inference cost dynamics to cause financial damage through adversarial input crafting and API abuse.
Inference Optimization Risks
Security implications of model optimization techniques — covering quantization safety degradation, pruning vulnerability introduction, distillation attacks, and speculative decoding risks.
Model Serving Security
Security hardening for model serving infrastructure — covering vLLM, TGI, Triton Inference Server configuration, API security, resource isolation, and deployment best practices.
Inference Optimization Attacks
Speculative decoding attacks, batching vulnerabilities, continuous batching exploitation, and how optimization for speed creates security gaps in LLM inference.
KV Cache Poisoning & Exploitation
How KV cache works in transformer inference, cache poisoning across requests in shared deployments, prefix caching attacks, and cross-tenant data leakage.
GPU Side Channel Basics
GPU-based side channel attacks on ML inference, exploiting timing, power consumption, and memory access patterns to extract information about models and data.
Inference Endpoint Exploitation
Exploiting inference API endpoints for unauthorized access, data exfiltration, and service abuse through authentication flaws, input validation gaps, and misconfigured permissions.
Testing Groq Inference Platform
Red team testing guide for Groq's high-speed inference platform and its security characteristics.
Together AI Security Testing
End-to-end walkthrough for security testing Together AI deployments: API enumeration, inference endpoint exploitation, fine-tuning security review, function calling assessment, and rate limit analysis.
Testing Fireworks AI Deployments
Red team testing guide for models deployed via Fireworks AI including function calling and fine-tuned models.
Testing Together AI Hosted Models
Red team testing guide for models hosted on Together AI's inference platform.
Testing vLLM Inference Deployments
Red team testing guide for models served via vLLM including batching, KV cache, and speculative decoding.
Chapter Assessment: LLMOps Security
A 15-question calibrated assessment testing your understanding of LLMOps security — model serving, inference security, caching risks, and ML pipeline security.