# inference
22 articles tagged with “inference”
Hugging Face Inference Endpoints Security
Security analysis of Hugging Face Inference Endpoints including model isolation and API security.
Property Inference Attacks
Inferring global properties of training datasets through model behavior analysis.
Privacy-Preserving Inference Patterns
Implementing privacy-preserving inference patterns to protect sensitive data processed by LLM applications.
Inference: Sampling, Temperature, and Generation
How LLMs generate text during inference (greedy decoding, top-k, top-p, temperature) and how these parameters affect attack success rates.
Security Analysis of Speculative Decoding
Security implications of speculative decoding optimizations, including draft model attacks and verification bypass.
Inference Endpoint Hardening
Hardening model inference endpoints against adversarial inputs, DoS, and information leakage.
Triton Inference Server Security
Security hardening for NVIDIA Triton Inference Server deployments including model repository protection and API security.
vLLM Security Configuration
Security hardening for vLLM serving deployments including API authentication, resource limits, and input validation.
Advanced Blind Injection Techniques
Timing-based inference, differential response analysis, and binary search extraction methods for blind prompt injection in production AI systems.
KV Cache Manipulation Attacks
Manipulate key-value caches in inference servers to inject or alter cached context across requests.
Inference Cost Attacks
Attacks that exploit inference cost dynamics to cause financial damage through adversarial input crafting and API abuse.
Inference Optimization Risks
Security implications of model optimization techniques — covering quantization safety degradation, pruning vulnerability introduction, distillation attacks, and speculative decoding risks.
Model Serving Security
Security hardening for model serving infrastructure — covering vLLM, TGI, Triton Inference Server configuration, API security, resource isolation, and deployment best practices.
Attacks on Inference Optimization
Attacks on speculative decoding, batching vulnerabilities, exploitation of continuous batching, and how optimizing for speed creates security gaps in LLM inference.
KV Cache Poisoning and Exploitation
How the KV cache works in transformer inference, cross-request cache poisoning in shared deployments, prefix caching attacks, and cross-tenant data leakage.
GPU Side-Channel Fundamentals
GPU-based side channel attacks on ML inference, exploiting timing, power consumption, and memory access patterns to extract information about models and data.
Inference Endpoint Abuse
Exploiting inference API endpoints for unauthorized access, data exfiltration, and service abuse through authentication flaws, input validation gaps, and misconfigured permissions.
Testing the Groq Inference Platform
Red team testing guide for Groq's high-speed inference platform and its security characteristics.
Security Testing Together AI
End-to-end walkthrough for security testing Together AI deployments: API enumeration, inference endpoint exploitation, fine-tuning security review, function calling assessment, and rate limit analysis.
Testing Fireworks AI Deployments
Red team testing guide for models deployed via Fireworks AI including function calling and fine-tuned models.
Testing Models Hosted by Together AI
Red team testing guide for models hosted on Together AI's inference platform.
Testing vLLM Inference Deployments
Red team testing guide for models served via vLLM including batching, KV cache, and speculative decoding.