# inference
22 articles tagged with “inference”
Hugging Face Inference Endpoints Security
Security analysis of Hugging Face Inference Endpoints including model isolation and API security.
Property Inference Attacks
Inferring global properties of training datasets through model behavior analysis.
Privacy-Preserving Inference Patterns
Implementing privacy-preserving inference patterns to protect sensitive data processed by LLM applications.
Inference: Sampling, Temperature & Generation
How LLMs generate text at inference time — greedy decoding, top-k, top-p, temperature — and how these parameters affect attack success rates.
Speculative Decoding Security Analysis
Security implications of speculative decoding optimizations including draft model attacks and verification bypasses.
Inference Endpoint Hardening
Hardening model inference endpoints against adversarial inputs, DoS, and information leakage.
Triton Inference Server Security
Security hardening for NVIDIA Triton Inference Server deployments including model repository protection and API security.
vLLM Security Configuration
Security hardening for vLLM serving deployments including API authentication, resource limits, and input validation.
Advanced Blind Injection Techniques
Timing-based inference, differential response analysis, and binary search extraction methods for blind prompt injection in production AI systems.
KV Cache Manipulation Attacks
Manipulating key-value caches in inference servers to inject or alter cached context across requests.
Inference Cost Attacks
Attacks that exploit inference cost dynamics to cause financial damage through adversarial input crafting and API abuse.
Inference Optimization Risks
Security implications of model optimization techniques — covering quantization safety degradation, pruning vulnerability introduction, distillation attacks, and speculative decoding risks.
Model Serving Security
Security hardening for model serving infrastructure — covering vLLM, TGI, Triton Inference Server configuration, API security, resource isolation, and deployment best practices.
Inference Optimization Attacks
Speculative decoding attacks, batching vulnerabilities, continuous batching exploitation, and how optimization for speed creates security gaps in LLM inference.
KV Cache Poisoning & Exploitation
How KV cache works in transformer inference, cache poisoning across requests in shared deployments, prefix caching attacks, and cross-tenant data leakage.
GPU Side Channel Basics
GPU-based side channel attacks on ML inference, exploiting timing, power consumption, and memory access patterns to extract information about models and data.
Inference Endpoint Exploitation
Exploiting inference API endpoints for unauthorized access, data exfiltration, and service abuse through authentication flaws, input validation gaps, and misconfigured permissions.
Testing Groq Inference Platform
Red team testing guide for Groq's high-speed inference platform and its security characteristics.
Together AI Security Testing
End-to-end walkthrough for security testing Together AI deployments: API enumeration, inference endpoint exploitation, fine-tuning security review, function calling assessment, and rate limit analysis.
Testing Fireworks AI Deployments
Red team testing guide for models deployed via Fireworks AI including function calling and fine-tuned models.
Testing Together AI Hosted Models
Red team testing guide for models hosted on Together AI's inference platform.
Testing vLLM Inference Deployments
Red team testing guide for models served via vLLM including batching, KV cache, and speculative decoding.