In March 2025, Microsoft researchers demonstrated that their own multi-agent framework Magentic-One executes arbitrary malicious code 97% of the time when a compromised file enters the pipeline. Not because any single request was obviously malicious — but because the attack unfolded across multiple steps, each one looking perfectly normal in isolation.

This is the fundamental problem with today’s AI security: we’re checking each request individually while attackers are thinking in sequences.

The Attack That Looks Normal

Here’s a real attack pattern documented by Palo Alto Unit 42’s research on Google’s A2A protocol:

Turn 1: Research agent asks financial agent a normal question
Turn 2: Financial agent responds with its capabilities
Turn 3: Research agent asks a follow-up that subtly probes system instructions
Turn 4: Financial agent leaks tool configuration
Turn 5: Research agent smuggles a hidden instruction
Turn 6: Financial agent executes an unauthorized stock trade

Each individual message passes every content filter. No prompt injection detected. No sensitive data in transit. No dangerous tool call. But the sequence is a textbook social engineering attack — reconnaissance, trust building, exploitation.

Unit 42 calls this Agent Session Smuggling. It exploits A2A’s stateful protocol design: agents remember previous context, allowing attackers to refine their approach across turns.

Agents Replay the APT Playbook

If you’ve worked in traditional security, this should sound familiar. It’s the same kill chain that APT groups have used for decades — just with a new attack surface.

Harvard’s Berkman Klein Center formalized this as the Promptware Kill Chain (arXiv:2601.09625), mapping it directly to the MITRE ATT&CK framework:

ATT&CK PhaseTraditional AttackAgent Attack Equivalent
ReconnaissancePort scanning, OSINTEnumerating available tools, probing agent capabilities
Initial AccessPhishing emailPrompt injection via malicious document, email, or web content
ExecutionMalware executionTool call executing malicious operation
PersistenceRegistry keys, cron jobsMemory poisoning, RAG data contamination
Privilege EscalationKernel exploitJailbreak, confused deputy attack
Lateral MovementPass-the-hash, RDPCross-agent message manipulation, self-replicating prompts
ExfiltrationC2 channelData sent via URL fetch, email tool, or search query

This isn’t theoretical. Anthropic reported the first real-world AI-orchestrated cyber espionage campaign (GTG-1002) in September 2025 — a state-sponsored group used a jailbroken agent to autonomously complete 80-90% of a complex intrusion, including credential extraction, system analysis, and data exfiltration across ~30 targets.

Why Request-Level Detection Fails

Every major AI security product today operates at the request level:

Request comes in → scan for prompt injection → scan for PII → pass/block
Response comes back → scan for sensitive data → pass/block
Forget everything → next request starts fresh

This catches the obvious stuff. But it structurally cannot detect:

1. Credential Harvesting Chains

Turn 2: read_file("~/.env")          ← Normal. Developers read configs.
Turn 5: curl POST https://external   ← Normal. Apps make HTTP requests.
Together: read secrets → exfiltrate   ← Attack.

2. Slow Data Exfiltration

An attacker doesn’t steal everything at once. They take one piece per session — an API key today, a database password tomorrow, a customer list next week. Each individual access looks routine. Security researchers call this salami slicing: 10 support tickets over a week, each slightly redefining agent constraints, until unauthorized actions become “normal.”

3. Memory Poisoning with Delayed Triggers

Palo Alto Unit 42 demonstrated that injected content can sit dormant in agent memory for days or weeks, only activating when an unrelated interaction triggers it. Their research found that once memory is poisoned, 87% of downstream decisions become compromised within 4 hours of the trigger event. Unlike prompt injection (which disappears when the session ends), memory poisoning creates persistent compromise.

4. Cross-Agent Privilege Escalation

In multi-agent systems, a low-privilege agent can manipulate a high-privilege agent into acting on its behalf. The COLM 2025 paper showed this works even when individual agents refuse unsafe operations — by manipulating the orchestrator’s scheduling metadata (particularly error reports), attackers hijack the entire system’s control flow.

What the Research Says About Detection

Academic and industry research is converging on one conclusion: detection must be session-level, not request-level.

ControlValve (Microsoft Research, 2025)

Microsoft’s own answer to the Magentic-One vulnerability. Instead of checking individual requests, ControlValve generates permitted control-flow graphs for multi-agent systems — a whitelist of allowed execution sequences. Any deviation triggers an alert. This is directly borrowed from Control-Flow Integrity (CFI) in systems security, adapted for agent orchestration.

Key property: it’s zero-shot and task-agnostic. No training data needed, works across different agent configurations.

LlamaFirewall (Meta, 2025)

Meta’s open-source guardrail includes Agent Alignment Checks — the first real-time audit of an agent’s chain-of-thought reasoning. It reduced attack success rates by 90%.

But here’s the catch: the ControlValve paper demonstrated that LlamaFirewall’s alignment checks can be bypassed by carefully constructed attack sequences. Single-step alignment verification isn’t enough when attackers think in multi-step chains.

OWASP Top 10 for Agentic Applications (2025)

Three of the top 10 risks directly relate to chain attacks:

  • ASI06 (Memory & Context Manipulation): Memory poisoning as the persistence layer for cross-session attacks
  • ASI07 (Insecure Inter-Agent Communication): Forged agent messages enabling lateral movement
  • ASI08 (Cascading Failures): Small misalignments amplifying into system-wide failures across agent chains

From the OWASP report:

“Agentic systems chain decisions and actions across multiple steps, and small inaccuracies compound and propagate, so what begins as a minor misalignment in one agent can trigger a system-wide outage.”

MITRE ATLAS Update (October 2025)

MITRE added 14 agent-specific attack techniques to their ATLAS framework, developed in collaboration with Zenity Labs. New entries include AI Agent Context Poisoning, Memory Manipulation, Thread Injection, and Modify AI Agent Configuration — all multi-step attack patterns that require temporal correlation to detect.

Chain Detection: The Missing Layer

The gap in today’s AI security stack is clear:

Existing DefenseWhat It CatchesWhat It Misses
PI detection (PromptGuard, etc.)Single-request injectionMulti-step sequences
LlamaFirewall alignment checkSingle-step goal deviationCarefully crafted sequences that bypass alignment
DLP scanningSingle-request data leakageSlow exfiltration across sessions
Tool Guard rulesIndividual dangerous tool callsCombinations of harmless calls that form an attack

What’s missing is a layer that correlates signals across time:

Event timeline for Session X:

  t=0   Normal conversation              score: 0
  t=3   PI detection triggered            score: +30
  t=5   Sensitive file accessed            score: +10
  t=7   DLP found API key in content       score: +20
  t=9   External HTTP request attempted    score: +40
                                           ─────────
  t=9   Chain rule matched:                score: 100 → BLOCK
        "credential-harvest"
        (read .env → external request)

This is the same architectural pattern as EDR behavioral detection. CrowdStrike’s Falcon agent doesn’t just check if a single system call is malicious — it evaluates sequences of operations over time: “login from new IP → disable antivirus → inject code” is a pattern, not three independent events.

SentinelOne’s Storyline technology automatically maps event sequences into visual attack chains, tracking the full attack path from initial compromise to impact. The equivalent for agents is mapping tool call sequences into attack chain timelines — detecting the pattern, not just the individual events.

The Six Chain Patterns

Based on documented attacks, PoCs, and academic research, these are the chain patterns that matter most:

1. Credential Harvesting read_file(.env/.aws/credentials)url_fetch(external) or send_email Source: arXiv:2510.09093, real-world CVE-2025-68664

2. Data Exfiltration list_filesread_file(sensitive) × N → write_file or send_email Source: Unit 42 session smuggling PoC

3. Reconnaissance → Extract list_toolsread_configsystem_info → targeted exploitation Source: Anthropic GTG-1002 campaign

4. Memory Poisoning Injected content → [days/weeks dormant] → triggered behavioral change Source: Unit 42 memory poisoning research, MINJA (NeurIPS)

5. Privilege Escalation Low-privilege data read → call to high-privilege agent → execute privileged action Source: COLM 2025, SEAgent framework

6. Self-Replication read_messagegenerate_reply_with_payloadsend_message Source: Morris II worm (arXiv:2403.02817)

Where We Go From Here

Chain detection for AI agents is where behavioral detection was for endpoint security in 2012 — everyone agrees it’s needed, few have built it, and the early movers will define the category.

The technical requirements are clear:

  • Stateful session tracking — maintain context across turns, not just individual requests
  • Temporal correlation — detect patterns that span minutes, hours, or days
  • Graduated response — not binary pass/block, but progressive threat scoring with escalating restrictions
  • Pattern library — a growing database of known attack chain signatures, updated from real-world incidents

The research is there. The attack patterns are documented. The question is who builds the detection layer that connects the dots.


References

Academic Papers

  1. Wu et al., “Multi-Agent Systems Execute Arbitrary Malicious Code,” COLM 2025. OpenReview
  2. Wu & Shmatikov, “Breaking and Fixing Defenses Against Control-Flow Hijacking in Multi-Agent Systems,” arXiv:2510.17276. Microsoft Research
  3. “The Promptware Kill Chain,” arXiv:2601.09625. Harvard Berkman Klein Center / Lawfare.
  4. Cohen et al., “Here Comes The AI Worm,” arXiv:2403.02817.
  5. “LlamaFirewall,” arXiv:2505.03574. Meta AI.
  6. Kim et al., “Prompt Flow Integrity,” arXiv:2503.15547. Seoul National University.
  7. “SEAgent: Mandatory Access Control for LLM Agents,” arXiv:2601.11893.
  8. “Exploiting Web Search Tools of AI Agents for Data Exfiltration,” arXiv:2510.09093.

Industry Reports & Frameworks

  1. OWASP Top 10 for Agentic Applications, December 2025. OWASP
  2. MITRE ATLAS + Zenity Labs agent-specific techniques, October 2025. MITRE
  3. CSA MAESTRO Threat Modeling Framework. CSA
  4. NIST AI Agent Standards Initiative, February 2026. NIST
  5. AWS Agentic AI Security Scoping Matrix. AWS

Real-World Incidents

  1. Anthropic, “Disrupting the first AI-orchestrated cyber espionage campaign” (GTG-1002). Anthropic
  2. CVE-2025-68664 (LangGrinch), CVSS 9.3. NVD
  3. Invariant Labs, MCP Tool Poisoning Attacks. Invariant Labs
  4. Unit 42, Agent Session Smuggling in A2A Systems. Unit 42
  5. Unit 42, Indirect Prompt Injection Poisons AI Long-term Memory. Unit 42