# inference
45 articles tagged "inference"
Hugging Face Inference Endpoints Security
Security analysis of Hugging Face Inference Endpoints including model isolation and API security.
Property Inference Attacks
Inferring global properties of training datasets through model behavior analysis.
Privacy-Preserving Inference Patterns
Implementing privacy-preserving inference patterns to protect sensitive data processed by LLM applications.
Inference: Sampling, Temperature & Generation
How LLMs generate text at inference time — greedy decoding, top-k, top-p, temperature — and how these parameters affect attack success rates.
Speculative Decoding Security Analysis
Security implications of speculative decoding optimizations including draft model attacks and verification bypasses.
Inference Endpoint Hardening
Hardening model inference endpoints against adversarial inputs, DoS, and information leakage.
Triton Inference Server Security
Security hardening for NVIDIA Triton Inference Server deployments including model repository protection and API security.
vLLM Security Configuration
Security hardening for vLLM serving deployments including API authentication, resource limits, and input validation.
Advanced Blind Injection Techniques
Timing-based inference, differential response analysis, and binary search extraction methods for blind prompt injection in production AI systems.
KV Cache Manipulation Attacks
Manipulate key-value caches in inference servers to inject or alter cached context across requests.
Inference Cost Attacks
Attacks that exploit inference cost dynamics to cause financial damage through adversarial input crafting and API abuse.
Inference Optimization Risks
Security implications of model optimization techniques — covering quantization safety degradation, pruning vulnerability introduction, distillation attacks, and speculative decoding risks.
Model Serving Security
Security hardening for model serving infrastructure — covering vLLM, TGI, Triton Inference Server configuration, API security, resource isolation, and deployment best practices.
Inference Optimization Attacks
Speculative decoding attacks, batching vulnerabilities, continuous batching exploitation, and how optimization for speed creates security gaps in LLM inference.
KV Cache Poisoning & Exploitation
How KV cache works in transformer inference, cache poisoning across requests in shared deployments, prefix caching attacks, and cross-tenant data leakage.
GPU Side Channel Basics
GPU-based side channel attacks on ML inference, exploiting timing, power consumption, and memory access patterns to extract information about models and data.
Inference Endpoint Exploitation
Exploiting inference API endpoints for unauthorized access, data exfiltration, and service abuse through authentication flaws, input validation gaps, and misconfigured permissions.
Testing Groq Inference Platform
Red team testing guide for Groq's high-speed inference platform and its security characteristics.
Together AI Security Testing
End-to-end walkthrough for security testing Together AI deployments: API enumeration, inference endpoint exploitation, fine-tuning security review, function calling assessment, and rate limit analysis.
Testing Fireworks AI Deployments
Red team testing guide for models deployed via Fireworks AI including function calling and fine-tuned models.
Testing Together AI Hosted Models
Red team testing guide for models hosted on Together AI's inference platform.
Testing vLLM Inference Deployments
Red team testing guide for models served via vLLM including batching, KV cache, and speculative decoding.
Chapter Assessment: LLMOps Security
A 15-question calibrated assessment testing your understanding of LLMOps security — model serving, inference security, caching risks, and ML pipeline security.