# vllm
7 articles tagged with “vllm”
AI Infrastructure Exploitation
Methodology for exploiting GPU clusters, model serving frameworks (Triton, vLLM, Ollama), Kubernetes ML platforms, and cloud AI services, as well as cost-amplification attacks.
Security Comparison of Model Serving Frameworks
In-depth security analysis of TorchServe, TensorFlow Serving, Triton Inference Server, and vLLM for production AI deployments
vLLM Security Configuration
Security hardening for vLLM serving deployments including API authentication, resource limits, and input validation.
Lab: Abusing Inference Servers
Attack vLLM, TGI, and Triton inference servers to discover information disclosure vulnerabilities, denial-of-service vectors, and configuration weaknesses in model serving infrastructure.
Model Serving Security
Security hardening for model serving infrastructure — covering vLLM, TGI, Triton Inference Server configuration, API security, resource isolation, and deployment best practices.
Lab Setup: Ollama, vLLM, and Docker Compose
Complete lab setup guide for AI red teaming: local model serving with Ollama and vLLM, GPU configuration, Docker Compose for multi-service testing environments.
Testing vLLM Inference Deployments
Red team testing guide for models served via vLLM including batching, KV cache, and speculative decoding.