CVE-2026-21520: Why Patching a Prompt Injection Doesn't Fix the Architecture

Microsoft patched CVE-2026-21520 on January 15, 2026. Three months later, the headline still ran: "The data exfiltrated anyway."

That phrase deserves more attention than it's gotten. This isn't a story about a botched patch or a slow response. It's a story about a category of control — AI safety filters — being asked to do something they structurally cannot do.

A governance-plane kill switch is a pre-execution enforcement control that intercepts agent actions before they execute, independent of what the model was instructed to do. It is categorically different from a safety filter, which operates at the model's output layer and can be overridden by a sufficiently crafted injection. Microsoft's patch fixed the specific attack path. It didn't change the class of problem.

Here's what the architecture actually looked like — and why organizations running production agents need to understand the distinction before they hit the next variant.

What Did CVE-2026-21520 Actually Expose?

Capsule Security discovered CVE-2026-21520 on November 24, 2025, coordinated disclosure with Microsoft, and the company deployed a patch on January 15, 2026. The vulnerability affects Copilot Studio agents triggered by SharePoint form submissions — a configuration used broadly across enterprise deployments for lead intake, support forms, and internal request workflows.

The attack, which Capsule named "ShareLeak," exploits the gap between form input and the agent's context window. An attacker fills a public-facing comment field with a crafted payload: a fake system-role message that Copilot Studio concatenates directly into the agent's prompt, with no sanitization between the untrusted form input and the model.

Once the injected payload is in context, the agent receives what looks like a legitimate operator instruction: query the connected SharePoint Lists for customer data — PII, lead records, CRM context — and forward it via Outlook to a specified address. In Capsule's proof-of-concept, the agent complied. Microsoft's existing safety filters fired on some outputs. The data exfiltrated anyway.

At the same time, Capsule found an essentially identical vulnerability in Salesforce Agentforce, which they named "PipeLeak." The attack path targets Agentforce Web-to-Lead forms. Injected instructions direct the agent to send CRM data via email before any review step engages. Salesforce has since stated that Human-in-the-Loop confirmation is now enabled by default for email-based agentic actions in standard configurations — though Capsule retested after that response and reported the attack still works on Custom Topics (now called Sub-Agents), which represent the majority of enterprise Agentforce deployments.

Both vulnerabilities are patched. Both are instructive about what the patch did and didn't fix.

Why Did the Data Exfiltrate Even After the Safety Filters Fired?

This is the question that matters most for any team running agents connected to data stores and external communication tools.

Safety filters in AI systems — Microsoft's Content Safety, Salesforce's built-in policy layer, the moderation layers built into major LLMs — operate at the model's output layer. Their job is to detect and block harmful, offensive, or policy-violating content the model generates. They were designed to catch the model producing bad output.

Prompt injection is a different threat model entirely. An injected instruction doesn't cause the model to produce bad output. It causes the model to follow attacker instructions as if they were legitimate operator instructions. From the model's perspective, it received a valid task in its context window and executed it faithfully.

A safety filter asking "did the model produce harmful content?" returns false. Forwarding SharePoint customer records to an email address isn't harmful content — it's a routine business action. The filter has no basis to block it, because the filter doesn't know that the email address isn't supposed to receive that data in the first place.

Munich Re's annual cyber risk report, published March 2026, listed prompt injection among the major attack vectors specifically targeting AI systems — because of this gap. Safety filters evaluate outputs against a policy. They don't evaluate actions against a definition of what the agent is authorized to do. Those require different enforcement mechanisms.

The Salesforce disclosure made this explicit. Their spokesperson noted that HITL confirmation "is enabled by default for email-based agentic actions." But default settings and model-level confirmation prompts are both soft controls — they can be bypassed by injected instructions that appear to have operator authority. The architectural gap isn't about defaults. It's about where the enforcement lives.

What Kill Switches Actually Are — and What They're Not

A kill switch in the context of agentic governance is a pre-execution enforcement control. It doesn't observe the model's output and decide whether to block it. It observes the action the agent is about to take — send an email, write to a database, call an external API — and applies a rule before that action executes.

The distinction matters for one reason: kill switches don't depend on the model's judgment at all. They operate in the governance plane — the control layer that sits above your agents rather than inside them. The model doesn't get a vote. The policy runs before the action, regardless of what the model was told.

A properly configured kill-switch policy for the ShareLeak scenario looks like this: any outbound email from a Copilot Studio agent where the recipient domain isn't on an approved allowlist is blocked before it sends. It doesn't matter what the model was injected with. It doesn't matter what instructions the attacker embedded in the form. The action fails at the governance layer.

This is the control that ShareLeak and PipeLeak would not have defeated. Capsule's attacks succeeded because the agents had simultaneous access to untrusted inputs (public forms) and sensitive, outbound-capable tools (SharePoint + Outlook, Salesforce CRM + Email), with no independent enforcement layer between them. Patching the specific injection syntax closes one door. It doesn't change that the door is in the wrong place.

The useful mental model: patches fix vulnerabilities. Kill switches enforce architecture.

A few things kill switches are not: they're not guardrails (model-side output filters), they're not rate limits (though those serve related functions), and they're not HITL prompts (soft requests for confirmation that an injected model can fabricate compliance with). Kill switches are hard stops at the action layer that execute independent of model state.

How Waxell Handles This

Waxell enforces kill-switch policies at the governance plane — before agent actions execute, outside the model's control. Policies define what actions an agent is permitted to take in your production environment: which recipient domains it can contact, which data sources it can read, which tools it can invoke. An agent that receives an injected instruction to exfiltrate data via email hits the kill-switch check before the send executes. If the recipient isn't in the approved domain set, the action is blocked — the model's compliance with the injected instruction is irrelevant.

Every blocked action is logged in Waxell's execution audit trail with full context: which policy fired, what the attempted action was, and the complete execution record. If an attacker is probing your Copilot Studio or Agentforce deployment with injection variants, that activity surfaces in the log before your customers' data leaves your environment.

Waxell's signal-domain boundary addresses the underlying configuration problem directly. The controlled interface between what data flows into an agent and what actions that agent is authorized to take prevents agents from having unrestricted simultaneous access to untrusted inputs and sensitive outbound-capable tools. That's the exact configuration both ShareLeak and PipeLeak required to succeed.

FAQ

What is CVE-2026-21520?

CVE-2026-21520 is an indirect prompt injection vulnerability in Microsoft Copilot Studio affecting agents triggered by SharePoint form submissions. Discovered by Capsule Security in November 2025 and patched by Microsoft on January 15, 2026, the vulnerability — named "ShareLeak" — allowed attackers to inject malicious instructions through untrusted form fields, directing Copilot Studio agents to query SharePoint data and exfiltrate it via Outlook to external addresses.

What is the difference between a safety filter and an AI agent kill switch?

Safety filters operate at the model's output layer and block harmful or policy-violating content the model generates. Kill switches are governance-plane controls that enforce action-level boundaries before agent actions execute, independent of the model. Safety filters can be bypassed by prompt injection because an injected instruction causes the model to treat attacker commands as legitimate — the model isn't producing bad output, it's following attacker-supplied instructions. Kill switches enforce rules on actions regardless of what the model was told, so they can't be bypassed this way.

What is PipeLeak, and how does it relate to ShareLeak?

PipeLeak is Capsule Security's name for a parallel indirect prompt injection vulnerability in Salesforce Agentforce, discovered alongside ShareLeak in early 2026. Where ShareLeak targeted Copilot Studio agents triggered by SharePoint forms, PipeLeak targeted Agentforce Web-to-Lead forms. Both exploited the same architectural pattern: untrusted form inputs concatenated into agent context windows, with agents holding simultaneous access to that untrusted input and outbound communication tools.

Does patching CVE-2026-21520 protect against future prompt injection attacks on Copilot Studio?

The patch closes the specific ShareLeak attack path. It doesn't address the underlying architecture: any agent with simultaneous access to untrusted inputs and sensitive data stores or outbound communication tools is structurally exposed to injection variants. Defense requires governance-plane kill-switch policies enforcing action-level boundaries — not just patching the specific syntax each new attack uses.

What should organizations running Copilot Studio or Agentforce do now?

Capsule Security recommends treating all form inputs as untrusted data, restricting Email Tool access when processing untrusted inputs, applying prompt boundary techniques, requiring manual review before emails containing CRM or SharePoint data are sent, and logging all agent actions involving data access or external communication. At the architecture level: audit whether your agents have simultaneous access to untrusted input surfaces and external send capabilities. If they do and there's no independent enforcement layer between them, the ShareLeak configuration risk persists regardless of whether the CVE is patched.

What is a governance plane in AI agent systems?

A governance plane is a control layer that sits above an agent system — separate from the models, tools, and orchestration logic — and enforces operational policies on agent behavior at runtime. It handles kill switches, action-level enforcement, execution logging, and human-in-the-loop escalation without relying on the model to police its own behavior. In agentic systems, the governance plane is what allows you to enforce architecture rather than just configure model defaults.

If your agents are connected to SharePoint, CRM systems, or email tools, the ShareLeak configuration applies to your deployment whether CVE-2026-21520 is patched or not. Waxell enforces kill-switch policies before actions execute — so the next variant hits a governance boundary, not your customers' data. Get early access.

Sources: