OWASP Agentic AI Security Maturity Framework 2026: Where Does Your Agent Stand?
83% of organizations plan to deploy agentic AI, yet only 29% believe they can adequately protect it (Cisco State of AI Security 2026, via Practical DevSecOps). That 54-point gap tells you something important: the problem is not whether security is being done, but at what level. Many teams deploy Promptfoo and set up a WAF and consider the job finished. According to OWASP's officially published Enterprise Adoption Maturity Model (June 2026), that approach lands you at Level 1 at best — reactive, not governed — with a clearly defined gap separating you from Level 2, the minimum for responsible production deployment.
This article breaks down the full 2D matrix in the OWASP framework (adoption tiers AT0-AT5 × governance maturity Level 0-3), covers the three most overlooked multi-agent threats in the OWASP Agentic Top 10 (ASI06/ASI07/ASI08), and provides an actionable self-assessment method and upgrade roadmap.
TL;DR
- OWASP officially defines a 2D matrix: 6 adoption tiers (AT0-AT5) × 4 governance maturity levels (Level 0-3)
- 79% of organizations are stuck at Level 1: tools without governance (Practical DevSecOps, 2026)
- The 3 most overlooked multi-agent threats: ASI06 (memory poisoning), ASI07 (inter-agent communication), ASI08 (cascading failures)
- Moving from Level 1 to Level 2 requires observability, not stronger filters: tool-call logging + named owners
- OWASP officially defines up to Level 3; Level 4-5 are extensions from Practical DevSecOps, SANS, CSA — not OWASP standard
Why "Having Security Tools" Is Not the Same as "Being Security-Mature"
This is the most common cognitive trap: install Promptfoo or LLM Guard and assume security is handled.
Practical DevSecOps survey data is blunt: 79% of organizations are stuck at Level 1 (Reactive). What Level 1 actually looks like: basic prompt filtering, a WAF in front of the LLM, incident response triggered only after something breaks. But what it lacks matters more:
- No AI asset inventory (no idea which agents are running in the organization)
- No tool-call logs (no traceable record of what the agent did)
- No named owners (unclear who's responsible when something goes wrong)
The core insight the maturity framework provides is the shift from point-in-time defenses to systemic governance. Just as having a firewall doesn't mean you have a mature network security posture, having an LLM filter doesn't mean you have agentic AI governance.
The entry ticket to Level 2 is observability, not stronger filters. Can you answer: "What is my agent doing right now?", "What did it just do?", "Who authorized that operation?" — if you can answer all three, you've entered Level 2 territory.
OWASP Agentic AI Top 10 Complete List (ASI01-ASI10)
OWASP Top 10 for Agentic Applications 2026 defines 10 threats (officially numbered ASI01-ASI10). The table below summarizes the full list:
| Code | Threat Name | Core Risk | Coverage Status |
|---|---|---|---|
| ASI01 | Agent Goal Hijack | Attacker manipulates agent goals via direct/indirect injection | Covered |
| ASI02 | Tool Misuse & Exploitation | Unsafe tool combinations or excessive invocations produce harmful outcomes | Partially covered |
| ASI03 | Agent Identity & Privilege Abuse | Unauthorized operations across cross-agent trust chains | Covered |
| ASI04 | Agentic Supply Chain Compromise | External agents, tools, schemas, prompts compromised | Covered |
| ASI05 | Unexpected Code Execution | Code generated or triggered by agents runs in uncontained environments | Covered |
| ASI06 | Memory & Context Poisoning | Injection/leakage into memory or context state, affecting future reasoning | Not covered |
| ASI07 | Insecure Inter-Agent Communication | Agent-to-agent messages intercepted, injected, or spoofed | Not covered |
| ASI08 | Cascading Agent Failures | Small agent failures propagate through pipelines, causing large-scale impact | Not covered |
| ASI09 | Human-Agent Trust Exploitation | Exploiting human over-reliance on agents to manipulate behavior | Mentioned indirectly |
| ASI10 | Rogue Agents | Agents exceeding intended goals due to objective drift or unexpected behavior | Mentioned indirectly |
For technical defenses against ASI01-ASI05, see OWASP Agentic AI Security Defense Guide, which covers implementation details.
The following sections focus on the three threats marked "Not covered" in the table above: ASI06, ASI07, and ASI08.
ASI06 Memory Poisoning: The Most Underestimated Persistent Threat
Why it's dangerous: 89% of agents share memory across users/sessions with no integrity verification (Repello AI, 2026).
Standard prompt injection is an in-session attack — it ends when the session ends. ASI06 memory poisoning has a distinct signature: "low-frequency implant, persistent impact." An attacker injects malicious information into the agent's long-term memory store in a single session; weeks of subsequent agent reasoning may then be affected (Repello AI, 2026), with the attack origin difficult to trace.
Typical attack path:
- Attacker injects malicious "user preference" data into the agent's memory store in one session
- In a subsequent session by a different user, the poisoned memory influences agent behavior
- RAG data source poisoning: contaminating the vector database affects every agent that relies on that knowledge base
Defenses: Isolate memory by user/tenant; tag every memory entry with its source and session; use a secondary model to validate memory writes; implement memory entry expiration.
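The defenses above can be sketched as a small memory store. This is a minimal illustration under stated assumptions, not a reference implementation: `TenantMemoryStore`, `MemoryEntry`, the 30-day default TTL, and the `validator` hook (a stand-in for the secondary-model check) are all hypothetical names chosen for this example.

```python
import time
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    text: str
    source: str        # provenance tag, e.g. "user_input" or "tool_output"
    session_id: str    # session that wrote the entry
    created_at: float  # Unix timestamp, used for expiry


class TenantMemoryStore:
    """Hypothetical store: one partition per tenant, entries expire after a TTL."""

    def __init__(self, ttl_seconds=30 * 24 * 3600, validator=None):
        self.ttl_seconds = ttl_seconds
        # validator stands in for a secondary-model check; default accepts everything
        self.validator = validator or (lambda text: True)
        self._partitions = {}  # tenant_id -> list of MemoryEntry

    def write(self, tenant_id, text, source, session_id, now=None):
        if not self.validator(text):
            raise ValueError("memory write rejected by validator")
        created = time.time() if now is None else now
        entry = MemoryEntry(text, source, session_id, created)
        self._partitions.setdefault(tenant_id, []).append(entry)
        return entry

    def read(self, tenant_id, now=None):
        # Only the caller's own partition is visible, and only unexpired entries.
        now = time.time() if now is None else now
        return [
            e for e in self._partitions.get(tenant_id, [])
            if now - e.created_at < self.ttl_seconds
        ]
```

Because reads are keyed by tenant, a poisoned entry written in one tenant's session is never returned to another tenant, and the source and session tags give an investigator a starting point when tracing an implant.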
ASI07 Inter-Agent Communication Attacks: The Blind Spot of Multi-Agent Architectures
Why it's dangerous: Multi-agent architectures (orchestrator + sub-agents) became mainstream in 2026. Agent-to-agent communication typically assumes trust, with no encryption or authentication in place.
Typical attack vectors:
- MitM (man-in-the-middle): intercepting A2A or MCP protocol messages
- Injection: injecting malicious instructions into a sub-agent, disguised as legitimate orchestrator commands
- Replay attacks: replaying captured old instructions to trigger unintended behavior
- Identity spoofing: impersonating a legitimate agent to issue commands
Defenses: Assign each agent a unique cryptographic identity (SPIFFE/SPIRE, inter-agent mTLS); sign inter-agent messages; re-authorize each downstream request; log all inter-agent communication completely.
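As one way to picture message signing and replay protection, the sketch below signs each inter-agent message with HMAC-SHA256 and rejects tampered, stale, or replayed envelopes. It is a simplified stand-in: a shared secret replaces the per-agent cryptographic identities (SPIFFE/SPIRE, mTLS) named above, and `sign_message`, `verify_message`, and the 30-second freshness window are assumptions made for illustration.

```python
import hashlib
import hmac
import json
import secrets
import time

MAX_AGE_SECONDS = 30  # assumed freshness window for this sketch


def sign_message(key, sender, payload, now=None):
    """Wrap a payload in an envelope carrying a nonce, timestamp, and HMAC signature."""
    envelope = {
        "sender": sender,
        "payload": payload,
        "nonce": secrets.token_hex(16),
        "timestamp": time.time() if now is None else now,
    }
    body = json.dumps(envelope, sort_keys=True).encode()
    envelope["signature"] = hmac.new(key, body, hashlib.sha256).hexdigest()
    return envelope


def verify_message(key, envelope, seen_nonces, now=None):
    """Return True only for an untampered, fresh, never-before-seen envelope."""
    now = time.time() if now is None else now
    unsigned = {k: v for k, v in envelope.items() if k != "signature"}
    body = json.dumps(unsigned, sort_keys=True).encode()
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, envelope.get("signature", "")):
        return False  # tampered, or signed with a different key
    if abs(now - envelope["timestamp"]) > MAX_AGE_SECONDS:
        return False  # too old (or too far in the future)
    if envelope["nonce"] in seen_nonces:
        return False  # replay of a previously accepted message
    seen_nonces.add(envelope["nonce"])
    return True
```

A sub-agent would call `verify_message` before acting on any instruction that claims to come from the orchestrator, keeping `seen_nonces` for at least the length of the freshness window.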
ASI08 Cascading Failures: An Architectural Design Problem
Why it's dangerous: 76% of multi-agent systems lack circuit breakers (Repello AI, 2026). In an orchestrated multi-agent system, one compromised subsystem is effectively a threat to the entire agent network.
Analogy: the 2003 Northeast blackout wasn't a problem with any single power plant — it was the absence of cutoff points in the failure propagation mechanism. ASI08 is the same kind of architectural problem, not a single-point vulnerability.
Typical failure modes: A compromised agent propagates malicious instructions through a multi-agent pipeline; resource exhaustion (one agent triggers excessive tool calls, draining downstream system capacity); state contamination (poisoned output becomes another agent's input).
Defenses: Implement circuit breakers; design safe failure modes (agents pause and escalate to humans on failure, rather than continuing); isolate agent boundaries; implement transactional rollback for reversible operations.
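A circuit breaker of the kind listed above can be quite small. The following is a minimal sketch, assuming a simple consecutive-failure threshold and a fixed cool-down; `CircuitBreaker`, `CircuitOpenError`, and the default values are illustrative choices, not part of any framework cited here.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when calls are blocked because the breaker is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=60.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # cool-down in seconds
        self.clock = clock              # injectable so the breaker can be tested
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail safe: stop propagating work and hand off to a human.
                raise CircuitOpenError("circuit open: pause and escalate to a human")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit again
        return result
```

In a pipeline, each call to a downstream agent or tool would go through its own breaker (for example `breaker.call(sub_agent_step, task)`, where `sub_agent_step` is a hypothetical function), so repeated failures in one agent stop at that boundary instead of flowing into the next stage.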
OWASP Enterprise Adoption Maturity Model Breakdown
OWASP State of Agentic AI Security and Governance v2.01 (June 1, 2026) defines a 2D matrix: what you've deployed (adoption tier) and how mature your governance is (governance maturity).
Important: the two dimensions are independent. An organization can simultaneously be AT4 (code-executing agents) while stuck at Level 0 (zero governance). This is the most common high-risk combination and the most frequently missed diagnostic blind spot.
Dimension 1: Adoption Tiers AT0-AT5 (What You've Deployed)
| Tier | Name | Typical Characteristics |
|---|---|---|
| AT0 | Shadow AI | AI tools used without organizational knowledge or approval |
| AT1 | Vendor Embedded Assistant | AI assistant fully controlled by vendor (you consume, don't build) |
| AT2 | Platform Integrated | AI-native platform uses your data but cannot execute arbitrary code |
| AT3 | Citizen Developer Agent | Low-code/no-code platform; users configure workflows without writing code; operates on real org data |
| AT4 | Code Executing Agent | Generates and executes code; has local or cloud-level permissions |
| AT5 | Custom In-House Agent | Organization-built system; controls its own identity, tools, and boundaries |
AT3 is the inflection point for security responsibility: it shifts from "vendor primarily responsible" (AT1-AT2) to "organization must actively govern." AT4 and AT5 place security responsibility almost entirely on the organization.
Dimension 2: Governance Maturity Level 0-3 (How Far Your Governance Reaches)
| Level | Name | Core Characteristics |
|---|---|---|
| Level 0 | Unaware and Ad Hoc | No formal governance awareness; shadow IT experiments; minimal logging; generic IT incident handling |
| Level 1 | Experimentation Without Guardrails | Pilot projects lack defined autonomy limits and decision scope; occasional red-team testing; no continuous monitoring; ambiguous accountability |
| Level 2 | Policy-Defined, Human-in-the-Loop | Formal policies with regulatory alignment (EU AI Act, GDPR); human confirmation for high-impact decisions; named owners; logging and version control established |
| Level 3 | Integrated, Continuous Oversight | Agentic AI treated as critical infrastructure; real-time dashboards, kill switches, Governance-as-code |
OWASP's official framework currently defines up to Level 3. Some industry frameworks go further (Practical DevSecOps to Level 4, SANS to Stage 5, CSA to Level 4), but these are each organization's own extensions — not OWASP official standards. Cite them with source attribution.
2D Matrix: High-Risk Combinations
| | Level 0 | Level 1 | Level 2 | Level 3 |
|---|---|---|---|---|
| AT1-AT2 | Low risk | Acceptable | Above standard | Above standard |
| AT3 | Medium risk | Needs improvement | Minimum requirement | Good |
| AT4 | High risk | Needs immediate action | Minimum requirement | Target |
| AT5 | Extreme risk | Should not deploy | Minimum requirement | Good |
AT4-AT5 combined with Level 0-1 is the combination that demands immediate attention. The 54-point gap cited above suggests that a large proportion of organizations likely sit in exactly this position.
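For teams that want to check the matrix programmatically (for example in an inventory script), the table can be encoded as a plain lookup. This is only a transcription of the cells above; the `RISK_MATRIX` and `rate` names are hypothetical, and AT0 is rejected because the table has no row for it.

```python
# Rows are adoption tiers; list positions 0-3 are governance Levels 0-3.
RISK_MATRIX = {
    "AT1-AT2": ["Low risk", "Acceptable", "Above standard", "Above standard"],
    "AT3": ["Medium risk", "Needs improvement", "Minimum requirement", "Good"],
    "AT4": ["High risk", "Needs immediate action", "Minimum requirement", "Target"],
    "AT5": ["Extreme risk", "Should not deploy", "Minimum requirement", "Good"],
}


def rate(tier, level):
    """Return the matrix cell for an adoption tier (e.g. "AT4") and a level (0-3)."""
    row = "AT1-AT2" if tier in ("AT1", "AT2") else tier
    if row not in RISK_MATRIX:
        raise ValueError(f"no matrix row for tier {tier!r}")
    if level not in (0, 1, 2, 3):
        raise ValueError("level must be 0, 1, 2, or 3")
    return RISK_MATRIX[row][level]


print(rate("AT4", 0))
```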
Security Maturity Self-Assessment
5-Dimension Scoring Method (Practical DevSecOps, 2026)
Each dimension is scored 0-10; the total (0-50) maps to a maturity level:
| Dimension | 0 (Level 0) | 5 (Level 1-2 boundary) | 10 (Level 3) |
|---|---|---|---|
| AI Asset Inventory | No idea which agents exist | Know main agents; shadow AI uninventoried | Complete inventory including shadow AI |
| Policy and Compliance | No AI policy at all | Generic AI policy; not mapped to regulations | Formal policy aligned to regulatory frameworks |
| Monitoring and Detection | No monitoring | Basic alerts; no runtime monitoring | Real-time tool-call monitoring |
| Testing and Validation | Never conducted security testing | Occasional red-team testing; no regular schedule | Quarterly red-team + continuous automated testing |
| Incident Response | Using generic IT processes | AI-specific playbook exists but untested | Practiced AI incident response process |
Scoring: 0-10 = Level 0, 11-25 = Level 1, 26-40 = Level 2, 41-50 = Level 3
79% of organizations score Level 1 (11-25) using this method. The two dimensions that pull scores down most are "Monitoring and Detection" and "AI Asset Inventory."
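The scoring rule is simple enough to automate. The sketch below sums the five dimension scores and applies the thresholds listed above; the dimension keys and the `maturity_level` function name are assumptions made for this example.

```python
DIMENSIONS = (
    "ai_asset_inventory",
    "policy_and_compliance",
    "monitoring_and_detection",
    "testing_and_validation",
    "incident_response",
)


def maturity_level(scores):
    """Return (total, level) for a dict mapping each dimension to a 0-10 score."""
    missing = set(DIMENSIONS) - set(scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for name in DIMENSIONS:
        if not 0 <= scores[name] <= 10:
            raise ValueError(f"{name} must be between 0 and 10")
    total = sum(scores[name] for name in DIMENSIONS)
    # Thresholds from the scoring line: 0-10, 11-25, 26-40, 41-50.
    if total <= 10:
        level = 0
    elif total <= 25:
        level = 1
    elif total <= 40:
        level = 2
    else:
        level = 3
    return total, level


example = {
    "ai_asset_inventory": 2,
    "policy_and_compliance": 5,
    "monitoring_and_detection": 3,
    "testing_and_validation": 4,
    "incident_response": 4,
}
print(maturity_level(example))  # (18, 1)
```

In this example the two lowest scores are inventory and monitoring, which mirrors the pattern described above.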
Enterprise vs. Individual Developer: The Reality Gap
Enterprise Level 2 requirements:
- Named agent owners (someone accountable for every agent)
- Human confirmation workflow for high-impact operations
- Complete tool-call logging capturing per operation: agent identity, authorizer, data accessed, action taken, policy outcome, timestamp
- Alignment with all four NIST AI RMF functions (Govern/Map/Measure/Manage)
- Quarterly red-team testing
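To make the tool-call logging requirement concrete, here is one possible shape for a log record carrying the six fields named above, written as JSON Lines. The `ToolCallRecord` and `log_tool_call` names, the field names, and the policy outcome values are illustrative assumptions rather than a format defined by OWASP.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ToolCallRecord:
    agent_id: str        # which agent acted
    authorized_by: str   # who or what authorized the operation
    data_accessed: list  # resources the call touched
    action: str          # tool or operation invoked
    policy_outcome: str  # assumed values: "allowed", "denied", "needs_human"
    timestamp: str       # ISO 8601, UTC


def log_tool_call(agent_id, authorized_by, data_accessed, action, policy_outcome, sink):
    """Append one JSON line per tool call to any object with a write() method."""
    record = ToolCallRecord(
        agent_id=agent_id,
        authorized_by=authorized_by,
        data_accessed=list(data_accessed),
        action=action,
        policy_outcome=policy_outcome,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
    sink.write(json.dumps(asdict(record)) + "\n")
    return record
```

With records like these, the three observability questions (what the agent is doing, what it just did, and who authorized it) become queries over a log rather than guesses.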
Individual developer / small tool Level 2 requirements (realistic version):
- Basic tool-call logging (what the agent did and when)
- Explicit least-privilege per tool (only give agents the tools they need; no blanket access)
- A unique identity per agent (no shared accounts or shared API keys)
- At minimum, a manual security review before each release
CISA-standard SHA-256 hash chain logging with 6-month retention is impractical for individual developers. The important thing is building observability habits, not perfectly satisfying enterprise compliance standards.
90-Day Roadmap from Level 1 to Level 3
Source: Repello AI 2026 OWASP Agentic AI Top 10 Enterprise Implementation Roadmap.
Phase 1 (Weeks 1-4): Establish Visibility
- Inventory all agent deployments, including shadow AI
- Conduct blast radius assessment per agent (worst case if this agent is compromised)
- Build ASI risk baseline (check each of ASI01-ASI10 for whether a corresponding control exists)
Phase 2 (Weeks 5-8): Quick Wins
- Reduce service account permissions; implement short-lived credentials
- Sandbox code execution environments
- Isolate agent memory by user/tenant (addresses the minimum requirement for ASI06)
- Establish tool-call logging (the Level 2 baseline)
Phase 3 (Weeks 9-12): Active Defense
- Deploy pre-execution validation for goal drift and tool misuse
- Implement behavioral anomaly detection
- Harden the supply chain with signed attestations (addresses ASI04)
- Add circuit breakers to multi-agent systems (addresses ASI08)
Phase 4 (Ongoing): Continuous Validation
- Conduct specialized red-team testing against agentic attack vectors
- Maintain behavioral baselines and re-validate periodically
- Implement Governance-as-code for automated policy enforcement
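Governance-as-code can start as a policy expressed as data and checked before every tool call. The sketch below is a deliberately small illustration: the `POLICY` structure, the tool names, and the three decisions (`allow`, `deny`, `require_human`) are assumptions made for this example, not a standard schema.

```python
# Policy as data: it can live in version control and be reviewed like code.
POLICY = {
    "default": "deny",  # anything not listed is refused
    "rules": [
        {"tool": "read_file", "decision": "allow"},
        {"tool": "send_email", "decision": "require_human"},
        {"tool": "run_shell", "decision": "require_human"},
    ],
}


def evaluate(policy, tool):
    """Return the decision for a tool name, falling back to the policy default."""
    for rule in policy["rules"]:
        if rule["tool"] == tool:
            return rule["decision"]
    return policy["default"]


def execute(policy, tool, run, approved_by=None):
    """Run the tool only if the policy allows it or a named human has approved it."""
    decision = evaluate(policy, tool)
    if decision == "deny":
        raise PermissionError(f"{tool} is denied by policy")
    if decision == "require_human" and approved_by is None:
        raise PermissionError(f"{tool} requires human confirmation")
    return run()
```

The default-deny fallback means a newly added tool is blocked until someone writes a rule for it, and the `require_human` decision is one way to encode the human confirmation step for high-impact operations.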
Simplified path for individual developers:
Completing Phase 1 + Phase 2 fundamentals (inventory, least-privilege tools, tool-call logging) is sufficient to reach a Level 2 standard appropriate for individual tools. Phase 3-4 are enterprise priorities.
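For a single developer, least-privilege tooling can be as simple as an explicit per-agent allowlist. The following sketch assumes a hypothetical `ToolRegistry` class; the agent and tool names in the usage example are placeholders.

```python
class ToolRegistry:
    """Hypothetical registry: agents can call only the tools explicitly granted to them."""

    def __init__(self):
        self._tools = {}   # tool name -> callable
        self._grants = {}  # agent_id -> set of granted tool names

    def register(self, name, fn):
        self._tools[name] = fn

    def grant(self, agent_id, *tool_names):
        for name in tool_names:
            if name not in self._tools:
                raise KeyError(f"unknown tool: {name}")
        self._grants.setdefault(agent_id, set()).update(tool_names)

    def call(self, agent_id, name, *args, **kwargs):
        # No grant means no access: there is no blanket or default permission.
        if name not in self._grants.get(agent_id, set()):
            raise PermissionError(f"{agent_id} is not allowed to call {name}")
        return self._tools[name](*args, **kwargs)


registry = ToolRegistry()
registry.register("read_notes", lambda: ["note 1", "note 2"])
registry.register("delete_notes", lambda: "deleted")
registry.grant("summarizer-agent", "read_notes")  # the only tool this agent needs

print(registry.call("summarizer-agent", "read_notes"))
```

Keying grants by `agent_id` also nudges toward a unique identity per agent, since a shared identity would force every agent into the same set of permissions.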
What Each Maturity Level Actually Looks Like
The following scenarios describe typical organizational states based on OWASP Level definitions. They are not claims about the firsthand experiences of any specific organization.
Level 0 typical scenario: An independent developer using Claude Code for a side project; tool permissions have never been reviewed; the agent has shell access but it's unclear whether API keys have leaked. Anomalies are handled with generic methods; there is no AI-specific incident process.
Level 1 typical scenario: A small SaaS company with LLM Guard deployed in front of the API and basic prompt filtering in place. But there is no AI asset inventory (it is unclear which other agents are running), and a security scan was run only reactively, after an API key leak. Accountability is ambiguous.
Level 2 typical scenario: A mid-size enterprise with an AI asset inventory, quarterly red-team testing, and basic tool-call logging in place. High-impact decisions require human confirmation. But monitoring runs in periodic batches rather than real-time alerts.
Level 3 typical scenario: A large financial institution or another organization in a regulated industry, with real-time dashboards tracking agent behavioral drift; kill switches capable of immediately suspending autonomous operation; machine-readable governance policies that are automatically enforced throughout the AI lifecycle; and full traceability for every decision.
Conclusion
Start with a 5-minute self-assessment: score your system against the 5-dimension table above. If your total is between 11-25, you're at Level 1 — the same as 79% of organizations (Practical DevSecOps, 2026).
The path forward from here is clear:
If you're an individual developer or building small tools: at AT1-AT2, the priority action is verifying your vendor's security policies. At AT4-AT5, put the Phase 1 + Phase 2 fundamentals (least-privilege tools + tool-call logging + unique agent identities) into this month's development plan.
If you're an enterprise security or engineering lead, Level 2 is the minimum threshold for responsible production deployment. Per the OWASP framework, deploying AT4-AT5 agents without named owners, tool-call logging, and human confirmation mechanisms puts you in the Level 0-1 high-risk combination — not recommended for production.
For implementation details on technical defenses (ASI01-ASI05 toolchains, configuration approaches, code-level protections), continue to the OWASP Agentic AI Security Defense Technical Guide.
FAQ
What do OWASP Agentic AI governance maturity Levels 0-3 represent?
Level 0 (Unaware and Ad Hoc): no formal governance, shadow IT experiments; Level 1 (Experimentation Without Guardrails): pilot projects lacking defined constraints, ambiguous accountability; Level 2 (Policy-Defined, Human-in-the-Loop): formal policies, named owners, human confirmation for high-impact decisions; Level 3 (Integrated, Continuous Oversight): real-time dashboards, kill switches, Governance-as-code. OWASP officially defines up to Level 3.
How do I assess which maturity level my AI agent system is at?
Use the 5-dimension self-assessment: AI asset inventory completeness, policy and compliance coverage, monitoring and detection capability, testing and validation frequency, incident response maturity — each scored 0-10. Total 0-10 = Level 0, 11-25 = Level 1, 26-40 = Level 2, 41-50 = Level 3.
What is the difference between AT adoption tiers and governance maturity levels?
AT tiers (AT0-AT5) describe 'what type of agent you have deployed' — from shadow AI to fully custom-built systems. Governance maturity (Level 0-3) describes 'how mature your security governance is.' The two are independent: an organization can be AT4 (code-executing agents) while still sitting at Level 0 (zero governance).
How does ASI06 memory poisoning differ from ordinary prompt injection?
Prompt injection is an in-session attack — it ends when the session ends. ASI06 memory poisoning is 'low-frequency implant, persistent impact': an attacker poisons the agent's long-term memory store in one session, affecting reasoning for weeks afterward. 89% of agents share memory across users/sessions with no integrity verification (Repello AI, 2026), making this harder to trace than prompt injection.
What are the three most critical steps to move from Level 1 to Level 2?
1. Build an AI asset inventory (catalog every agent deployment, including shadow AI); 2. Establish tool-call logging (every agent action has a traceable record); 3. Assign a named owner to each high-impact agent (clear accountability). The Level 2 threshold is not a stronger filter — it's observability.
What maturity level does an individual developer need for responsible deployment?
It depends on your AT tier. AT1-AT2 (using vendor platforms, no code execution): the vendor bears primary responsibility; strict self-assessment is not required. AT4-AT5 (your agent executes code, accesses external systems): a minimum of Level 2 is required — specifically tool-call logging, explicit least-privilege per tool, and a unique identity per agent (no shared accounts).