AI Security Consulting Methodology
Structured consulting methodology for delivering AI security assessments, from client acquisition through engagement delivery.
Overview
AI security consulting requires a methodology that addresses challenges not present in traditional application security or network penetration testing engagements. AI systems are probabilistic rather than deterministic, meaning the same input can produce different outputs across test runs. The attack surface includes novel categories like prompt injection, safety alignment bypass, and model extraction that most consulting frameworks do not address. Clients often have limited understanding of AI-specific risks, making scoping and expectation management critical to engagement success.
This article presents a structured consulting methodology designed specifically for AI security assessments. It integrates proven elements from established consulting frameworks (such as PTES and OWASP Testing Guide) with AI-specific adaptations informed by the MITRE ATLAS framework and the OWASP Top 10 for LLM Applications. Whether you are an independent consultant, a practitioner at a security firm, or an internal team member who operates in a consulting capacity to business units, this methodology provides a repeatable process for delivering high-value AI security assessments.
Pre-Engagement Phase
Client Qualification and Intake
Not every potential engagement is a good fit. Effective qualification saves time for both consultant and client.
Understand the client's AI maturity: Clients range from organizations deploying their first LLM-powered chatbot to sophisticated AI companies with extensive ML infrastructure. The maturity level determines what kinds of testing are appropriate, what level of client education is needed, and what the engagement's deliverables should emphasize. Ask about their AI development lifecycle, existing security testing practices, and previous AI security assessments.
Identify the testing target: Get specific about what will be tested. "Our AI system" is not a sufficient target definition. You need to understand the specific application(s), the underlying model(s), the deployment architecture, the input vectors available to end users, any tool integrations or data sources the AI can access, and the guardrails or safety measures currently in place. Request architecture diagrams and API documentation during the intake process.
Assess access requirements: Determine what level of access is needed and whether the client can provide it. Will you test through the production user interface, through a staging environment, via API with elevated permissions, or with direct access to model weights and training data? Different access levels enable different testing approaches and affect engagement duration and cost.
Clarify regulatory context: Understanding the client's regulatory environment is essential for scoping and for contextualizing findings. An AI system subject to the EU AI Act's high-risk requirements has different testing priorities than an internal productivity tool. Financial services AI, healthcare AI, and defense AI each operate under distinct regulatory frameworks that affect what constitutes a meaningful finding.
Proposal Development
A strong proposal demonstrates understanding of the client's specific situation and sets clear expectations for the engagement.
Scope statement: Define the boundaries of testing precisely. Specify the systems in scope by name and version, the types of testing to be performed (referencing MITRE ATLAS technique categories where appropriate), the access level required, and any explicit exclusions. Common exclusions include denial-of-service testing against production systems, data poisoning attacks against training pipelines, and social engineering of the client's employees.
Methodology overview: Describe your testing approach at a level that gives the client confidence in your thoroughness without revealing proprietary techniques. Reference industry frameworks (MITRE ATLAS, OWASP LLM Top 10, NIST AI RMF) to establish credibility and alignment with recognized standards.
Deliverables: Specify exactly what the client will receive. A standard deliverable set for an AI security assessment includes:
- An executive summary suitable for non-technical leadership
- A detailed technical report with all findings, evidence, and remediation recommendations
- A vulnerability database or spreadsheet for tracking remediation progress
- A debrief presentation for the client's technical and leadership teams
- Verification testing of critical and high-severity findings after remediation (often scoped as a follow-up engagement)
Timeline and pricing: Provide a realistic timeline based on the scope. AI security assessments typically take longer than equivalent traditional assessments because of the probabilistic nature of AI systems (attacks may need many iterations), the need for manual creative testing that cannot be fully automated, and the time required to understand novel AI architectures. Build adequate time for each engagement phase and include contingency for scope evolution.
Assumptions and dependencies: Document what you need from the client (environment access, documentation, point of contact availability) and what assumptions underlie your estimate. This protects both parties when engagement realities differ from initial expectations.
Rules of Engagement
The Rules of Engagement (RoE) document is the contractual agreement that authorizes testing and defines its boundaries. For AI security assessments, the RoE must address several AI-specific considerations:
Data handling: AI testing often generates sensitive outputs, including extracted personal data, bypassed safety filter content, and examples of harmful model behaviors. The RoE must specify how this data will be handled, stored, and eventually destroyed. Clients in regulated industries may have specific data handling requirements.
Production impact: Specify whether testing against production systems is permitted, and if so, what safeguards are in place. AI testing can consume significant compute resources (particularly automated testing at scale), may generate content that could be logged and flagged by monitoring systems, and in agentic systems, could potentially trigger real-world actions (sending emails, modifying data, executing code).
Escalation procedures: Define when and how to escalate critical findings during testing rather than waiting for the final report. A vulnerability that allows extraction of training data containing PII, or an agent system that can be manipulated to take unauthorized actions, may warrant immediate notification.
Scope boundaries for probabilistic systems: Traditional penetration testing has clear scope boundaries (these IP ranges, these URLs). AI testing boundaries are fuzzier because a single prompt injection might cause the system to access out-of-scope systems through tool use. Define how to handle situations where testing a scoped system reveals vulnerabilities in connected but out-of-scope systems.
Content policies: AI red teaming may require generating or eliciting content that is offensive, harmful, or illegal. The RoE should explicitly authorize this testing within defined bounds and specify any content categories that are off-limits even for testing purposes.
Assessment Execution
Reconnaissance and System Understanding
Spend adequate time understanding the target system before launching attacks. Rushing into active testing without understanding the system's architecture and intended behavior leads to wasted effort and missed vulnerabilities.
Documentation review: Study any available system documentation, including architecture diagrams, API specifications, model cards, safety documentation, and user guides. Understand the system's intended use cases, known limitations, and existing safety measures.
Behavioral baseline: Interact with the system as a normal user to establish a behavioral baseline. Understand its response patterns, input handling, error messages, and the boundaries of its intended functionality. This baseline is essential for recognizing anomalous behavior during testing and for accurately assessing the severity of findings.
Architecture mapping: Map the system's technical architecture from the information available. Identify all input vectors (user prompts, file uploads, API parameters, system integrations), processing components (pre-processing, model inference, post-processing, output filtering), data sources (retrieval-augmented generation sources, tool integrations, external APIs), and output channels (text responses, generated code, function calls, data writes).
Threat model development: Using the architecture map, develop an AI-specific threat model. The MITRE ATLAS framework provides the technique taxonomy. The OWASP Top 10 for LLM Applications provides the risk prioritization. Map applicable techniques to the system's attack surface and prioritize based on the client's risk context.
Structured Testing Approach
Execute testing systematically across the identified attack surface rather than relying on ad hoc exploration.
Phase 1 — Prompt-level testing: This is typically the first and broadest phase. Test for direct prompt injection (can the user override system instructions?), indirect prompt injection (can injected content in retrieved documents or tool outputs influence behavior?), system prompt extraction (can the system's instructions be leaked?), and jailbreaking (can safety alignment be bypassed to produce harmful outputs?). Use a combination of manual creative testing and automated scanning with tools like Garak or Promptfoo.
Phase 2 — Application-level testing: Test the application layer surrounding the AI model. This includes input validation and sanitization, output filtering and content moderation, authentication and authorization for AI features, rate limiting and abuse prevention, session management and conversation isolation, and error handling (do error messages leak system details?). Much of this testing uses traditional web application security testing techniques adapted for AI contexts.
Phase 3 — Tool and integration testing: For AI systems with tool-calling capabilities, test the security of tool integrations. Can the AI be manipulated to call tools with adversarial parameters? Are tool permissions appropriately scoped? Can tool outputs be used to inject further instructions? Can the AI be directed to access data or systems outside its intended scope through tool use? This phase is particularly important for agentic AI systems.
Phase 4 — Data and privacy testing: Test for information leakage from training data (membership inference, training data extraction), inappropriate disclosure of system configuration or internal information, PII handling in conversations and stored data, and data exfiltration through conversation or tool use.
Phase 5 — Robustness and abuse testing: Test the system's resilience to adversarial use. Can it be used to generate harmful content at scale? Can it be used for social engineering assistance? Does it maintain consistent behavior under sustained adversarial interaction? This phase often reveals vulnerabilities that emerge only through extended interaction sequences.
Managing Probabilistic Results
AI systems present a unique challenge for security testing: results are not deterministic. The same prompt may produce different responses across runs. This has several implications for methodology:
Statistical rigor: When reporting a vulnerability, document the success rate across multiple attempts. A prompt injection that works 80% of the time is a different finding than one that works 5% of the time. Conduct sufficient trials (typically 10-20 repetitions for key findings) to establish reliable success rates.
Temperature and configuration sensitivity: Note the model's configuration parameters when they are known or observable. Temperature settings, top-p sampling parameters, and system prompt variations can all affect vulnerability exploitability. Document these when possible.
Version sensitivity: AI models are frequently updated, which can change the effectiveness of specific attacks. Document the model version tested and note that findings may not persist across model updates. This is not a reason to downgrade severity — a vulnerability that exists in the current deployed version is real regardless of whether a future update might mitigate it.
Reproducibility documentation: Provide complete reproduction steps including exact prompts, conversation context, and system state. For findings that require specific conversation history to trigger, document the full conversation sequence, not just the final triggering prompt.
Reporting and Delivery
Finding Severity Classification
Adapt standard vulnerability severity frameworks for AI-specific findings. A useful severity model considers three factors:
Exploitability: How easily can the vulnerability be discovered and exploited? Does it require specialized knowledge or tools, or can any user trigger it with simple prompts? What is the success rate across attempts?
Impact: What is the consequence of successful exploitation? Impact categories for AI systems include data exposure (extraction of training data, PII, or system configuration), safety bypass (generation of harmful, illegal, or policy-violating content), integrity compromise (manipulation of AI decisions or outputs that users rely on), availability degradation (resource exhaustion or denial of service), and privilege escalation (gaining access to tool capabilities or data beyond intended authorization).
Business context: How does the vulnerability map to the client's specific risk profile? A safety bypass in a customer-facing chatbot for a children's education platform is a different severity than the same bypass in an internal code review assistant.
Report Structure
Structure the report to serve multiple audiences within the client organization:
Executive summary (1-2 pages): Non-technical overview of findings, overall risk assessment, and top remediation priorities. Written for C-level leadership and board-level reporting. Focus on business impact and risk, not technical details.
Methodology section: Describe the testing approach, scope, timeline, and any limitations or constraints. This section establishes the basis for the findings and helps the client understand what was and was not tested.
Findings section: Present each finding with a consistent structure: title, severity rating, description of the vulnerability, evidence (annotated screenshots, prompt/response logs), business impact assessment, detailed remediation recommendations, and references to relevant standards (MITRE ATLAS technique IDs, OWASP LLM Top 10 categories, CWE identifiers where applicable).
Strategic recommendations: Beyond individual finding remediation, provide strategic recommendations for improving the client's AI security posture. These might include architectural changes, process improvements, monitoring enhancements, or training recommendations.
Appendices: Include complete testing logs, the full scope and rules of engagement, and any supplementary technical detail that supports the findings but would clutter the main report.
Client Delivery
The delivery phase determines whether findings drive action or gather dust.
Technical debrief: Walk the client's technical team through findings in detail, demonstrating key vulnerabilities live when possible. Live demonstration is particularly effective for AI vulnerabilities because stakeholders often struggle to appreciate the risk of prompt injection or safety bypass from written descriptions alone. Seeing a system produce harmful output or leak data in real time creates the organizational urgency needed to prioritize remediation.
Executive briefing: Present the executive summary to leadership with emphasis on business risk and recommended investment in remediation. Keep this session focused and short (30 minutes maximum). Be prepared to answer questions about comparison to industry peers and regulatory implications.
Remediation planning session: Work with the client's development team to prioritize remediation and create a realistic timeline. For each finding, discuss multiple remediation options with different cost/effectiveness tradeoffs. The consultant's role here is advisory — the client must own the remediation decisions and implementation.
Post-Engagement Phase
Verification Testing
Schedule verification testing 4-8 weeks after the engagement to confirm that critical and high-severity findings have been remediated effectively. Verification testing should:
- Retest all critical and high findings using the original test cases
- Test for regression (fixing one vulnerability may introduce another)
- Verify that remediation addresses the root cause, not just the specific test case
- Document verification results in a brief follow-up report
Knowledge Capture
After each engagement, capture lessons learned to improve future engagements:
Technique effectiveness: Which testing techniques were most productive for this type of system? Update your methodology playbooks accordingly.
Tooling gaps: Did you encounter situations where existing tools were inadequate? Document these as potential tool development priorities.
Client feedback: Solicit feedback on the engagement process, report quality, and communication effectiveness. Use this to refine your methodology for future clients.
Finding patterns: Do the findings from this engagement reinforce or contradict patterns from previous engagements? Update your understanding of common vulnerability patterns in AI systems.
Methodology Adaptation for Different AI System Types
LLM-Powered Applications
The most common consulting engagement type. Focus areas include prompt injection (direct and indirect), safety alignment bypass, data leakage through conversation, and tool abuse in systems with function-calling capability. Automated scanning tools are most mature for this category.
Computer Vision Systems
Adversarial example generation, model evasion techniques, and physical-world attack scenarios. These engagements require different tooling (primarily based on the Adversarial Robustness Toolbox or CleverHans) and different expertise than LLM assessments. The NIST AI RMF's discussion of adversarial robustness is particularly relevant.
Agentic AI Systems
Systems with autonomous decision-making and tool use capabilities. These are the highest-risk and most complex engagement type. Focus areas include privilege escalation through tool chains, goal hijacking, unauthorized action execution, and multi-agent interaction vulnerabilities. Methodology for this category is still evolving rapidly.
RAG Systems
Retrieval-Augmented Generation systems introduce attack surface through the retrieval pipeline. Test for knowledge base poisoning, indirect prompt injection through retrieved documents, information leakage across tenants in multi-tenant RAG systems, and retrieval manipulation to influence model outputs.
Fine-Tuned and Custom Models
Organizations that fine-tune models or train custom models present additional testing opportunities, including training data extraction, backdoor detection, and assessment of fine-tuning's effect on safety alignment. These engagements typically require higher access levels (model weights, training data access) and more advanced technical expertise.
Business Operations
Pricing Strategies
AI security consulting engagements are typically priced using one of three models:
Fixed-price per engagement: Based on system complexity, scope, and estimated effort. This model provides cost certainty for the client and is most appropriate for well-defined scopes. Common ranges are $30,000 to $80,000 for a single-system assessment, depending on complexity and duration.
Time and materials: Billed at an hourly or daily rate with a not-to-exceed cap. This model is appropriate for engagements with uncertain scope or for ongoing advisory relationships. Senior AI security consultants typically bill at $300 to $500 per hour.
Retainer model: A monthly or quarterly retainer that provides the client with a defined allocation of testing hours and priority access. This model supports ongoing relationships and continuous testing engagements.
Building Client Relationships
Long-term client relationships are the foundation of a sustainable consulting practice.
Demonstrate value beyond findings: Help clients understand their AI risk landscape, connect them with relevant industry resources and communities, and proactively alert them to emerging threats that affect their AI deployments. Be a trusted advisor, not just a testing vendor.
Follow-up cadence: Check in with clients quarterly, even between engagements. Share relevant threat intelligence, ask about new AI deployments that might need assessment, and maintain the relationship.
Referral development: Satisfied clients are the best source of new business. Deliver excellent work, maintain the relationship, and the referrals will follow. The AI security consulting market is still small enough that reputation travels quickly.
References
- MITRE ATLAS (Adversarial Threat Landscape for AI Systems). https://atlas.mitre.org/ — Technique taxonomy used for structuring AI security assessments.
- OWASP Top 10 for LLM Applications, 2025 Edition. https://owasp.org/www-project-top-10-for-large-language-model-applications/ — Risk prioritization framework for LLM application security assessments.
- NIST AI Risk Management Framework (AI RMF 1.0), January 2023. https://www.nist.gov/artificial-intelligence/ai-risk-management-framework — Federal framework for AI risk management, referenced in proposals and reports.
- Penetration Testing Execution Standard (PTES). http://www.pentest-standard.org/ — Traditional penetration testing methodology framework adapted for AI security contexts.