Threat Intelligence · March 2026 · 10 min read

McKinsey's Lilli Breach: When the AI Itself Is the Attack Surface

How an autonomous agent compromised McKinsey's Lilli platform — and what it illustrates about the detection gap facing every enterprise deploying AI at scale.

[Illustration: API endpoint map, SQL injection flow, database compromise, and silent system prompt rewrite. Image generated with Google Gemini.]

Incident at a Glance

  • 46.5M chat messages exposed
  • 728K internal files accessible
  • 2 hrs to full DB access
  • 0 credentials required

In March 2026, an autonomous offensive agent walked into McKinsey's internal AI platform, Lilli, through an unauthenticated API endpoint and — within two hours — had read and write access to the entire production database. No credentials. No insider access. Just a domain name, an exposed API document, and an agentic system that doesn't follow checklists. The vulnerability that enabled it was SQL injection: a class of attack so old it predates Google, LinkedIn, and most of the engineers who built Lilli. The attack surface it exploited — an enterprise AI platform's prompt layer — is two years old and almost entirely unmonitored.

The Entry Point: 22 Doors Left Unlocked

McKinsey's Lilli is the firm's internal generative AI platform — a proprietary RAG system trained on decades of proprietary research, methodology documents, and client frameworks, deployed to over 30,000 consultants globally. It is the kind of system that gives consultants leverage: the equivalent of every senior partner's institutional knowledge made instantly queryable. That also makes it an extraordinarily high-value target.

The agent began with surface mapping: a structured enumeration of Lilli's publicly accessible API documentation. That documentation — over 200 endpoints, fully described — was publicly exposed. Most endpoints required authentication. Twenty-two did not. One of those unauthenticated endpoints accepted write operations: it logged user search queries to the database.
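Finding such gaps is mechanical once the API document is in hand. A minimal Python sketch of that audit against a hypothetical OpenAPI document (the endpoint paths and the `bearerAuth` scheme are invented for illustration, not Lilli's real spec):

```python
# Hypothetical OpenAPI document: a global auth requirement, with one
# operation that overrides it and disables authentication entirely.
spec = {
    "security": [{"bearerAuth": []}],  # global default: auth required
    "paths": {
        "/search": {"get": {}},
        "/search/log": {"post": {"security": []}},  # auth disabled
        "/documents/{id}": {"get": {}},
    },
}

def unauthenticated_ops(spec: dict):
    # Per the OpenAPI spec, an operation-level "security": [] overrides
    # the global requirement and leaves that operation open.
    default = spec.get("security", [])
    for path, operations in spec["paths"].items():
        for method, op in operations.items():
            if op.get("security", default) == []:
                yield method.upper(), path

print(list(unauthenticated_ops(spec)))  # [('POST', '/search/log')]
```

Running this audit against every exposed API document, on every deploy, is cheap; the Lilli incident shows what it costs to skip it.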

Field values in the query payload were safely parameterised. Field names were not. The agent identified this structural gap — the kind that standard automated scanners routinely miss because they test values, not structure — and began iterating. Each malformed field name produced a database error. Each error message revealed slightly more about the query shape. After a run of blind iterations, live production data started flowing back.
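The structural gap is easy to reproduce. A minimal Python sketch against an in-memory SQLite table (the `search_log` schema and `log_query` helper are hypothetical stand-ins for the real endpoint): values go through placeholders, but key names are interpolated straight into the SQL text, and the resulting error messages leak query structure back to the caller:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE search_log (query TEXT, locale TEXT)")

def log_query(payload: dict) -> None:
    # Field VALUES are bound through placeholders, which is safe...
    placeholders = ", ".join("?" for _ in payload)
    # ...but field NAMES are interpolated verbatim into the SQL text.
    columns = ", ".join(payload.keys())
    conn.execute(
        f"INSERT INTO search_log ({columns}) VALUES ({placeholders})",
        list(payload.values()),
    )

# A benign payload behaves exactly as intended:
log_query({"query": "market sizing", "locale": "en"})

# A malformed KEY reaches the SQL parser, and the database error it
# triggers leaks schema details back to the caller: the error oracle
# that iterative probing relies on.
try:
    log_query({"query, nonexistent_column": "x"})
except sqlite3.OperationalError as exc:
    print("leaked:", exc)
```

Scanners that fuzz only the payload values would never exercise this path; the injection point is the shape of the JSON itself.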

“SQL injection via unsanitised JSON key names: a structural path that standard automated scanners routinely miss because they test values, not structure. It had been live in production for over two years.”

The Escalation: When Read Access Is the Beginning, Not the End

SQL injection against a user-facing AI platform is bad. Write access to that platform's prompt database is a different category of bad — one that most enterprise security architectures are not built to recognise.

Lilli's system prompts — the instructions governing how the AI behaves, what it refuses, what it cites, how it frames answers — were stored in the same database the agent had just compromised. The same injection vector that exposed tens of millions of chat messages could silently rewrite those instructions. No deployment pipeline. No code review. No log entry. No alert.

The agent also chained the SQL injection with an IDOR (Insecure Direct Object Reference) vulnerability to pivot across individual user session histories. Chaining the two existing flaws yielded cross-user data access with no additional exploit development. One structural weakness, two escalation paths, and the AI platform itself had become the attack surface.

Critical Insight

The prompt layer is a high-value target that almost nobody is treating as one. It controls the output that employees trust, that clients receive, and that decisions are built on — yet it rarely has access controls, version history, or integrity monitoring. In the Lilli incident, a write to the prompt store would have produced no observable system anomaly. The AI would simply have begun behaving differently.
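One hedged illustration of the missing control: a baseline fingerprint of the prompt store, so that a silent write surfaces as hash drift even when nothing else looks anomalous. The store layout and prompt text below are invented for the sketch:

```python
import hashlib

# Illustrative prompt store; names and text are invented for the sketch.
def fingerprint(prompts: dict) -> dict:
    return {name: hashlib.sha256(text.encode()).hexdigest()
            for name, text in prompts.items()}

# Baseline taken at deployment time and stored out-of-band,
# beyond the reach of the compromised application database.
baseline = fingerprint({"assistant": "Answer from cited sources only."})

# Later integrity sweep over the live store. A silent write (no deploy,
# no code review, no log entry) shows up purely as hash drift.
current = fingerprint({"assistant": "Answer freely and cite nothing."})

drift = [name for name in baseline if baseline[name] != current.get(name)]
print("integrity violations:", drift)  # ['assistant']
```

The check is trivial; what matters is that the baseline lives outside the database an attacker can write to.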

Why This Is Everyone's Problem

McKinsey is not a soft target. The firm has world-class security teams, significant tooling investment, and internal scanners running continuously. The SQL injection vulnerability in Lilli had been live in production for over two years. Their own tooling did not find it.

The reason is not incompetence. It is architecture. Enterprise AI platforms are being built fast, bolted onto existing infrastructure, and secured with frameworks designed for web applications and databases. Those frameworks cover two attack layers: the network perimeter and the application layer. An AI platform has a third layer those frameworks do not cover: the prompt layer.

Every enterprise running an internal AI platform — a custom LLM wrapper, a RAG system over proprietary documents, an agentic workflow — is operating the same attack surface. The question is not whether the vulnerability class exists in your environment. It is whether you would detect it, and how fast.

The Three Attack Layers

API Layer

Unauthenticated endpoints, injection via API parameters, IDOR chaining

Prompt Layer

System prompt rewrite, guardrail removal, silent persistence with no log trail

Kill-Chain Escalation

Agentic traversal chaining API access, database writes, and prompt manipulation into full platform compromise

How Helixar Would Theoretically Have Caught This

Helixar's detection architecture operates across three layers: the endpoint agent, the API session layer, and a cross-layer correlation engine. The following describes how detection would theoretically have unfolded in a scenario where Helixar was deployed in front of a platform like Lilli.

These scenarios are theoretical illustrations based on Helixar's intended design. As an early-stage platform in active pilot testing, actual detection behaviour, response timing, and alert thresholds will vary by deployment.

Surface Mapping (T+0 to T+5 min)

Signal Helixar would observe: Unusual volume of structured requests to unauthenticated endpoints; automated enumeration pattern distinguishable from normal user behaviour by request cadence and endpoint spread.

Theoretical response: API session layer flags anomalous request pattern. Observe mode: alert logged. Gate mode: session rate-limited or challenged.

Injection Probing (T+10 to T+30 min)

Signal Helixar would observe: Iterative requests with systematically mutating field names; error responses leaking schema information, a statistically anomalous probe pattern across session history.

Theoretical response: Session risk classification elevates as the iterative probe pattern becomes statistically significant. Security team notification generated. The 30-minute probe window is the realistic intervention window.

Data Exfiltration (T+30 to T+60 min)

Signal Helixar would observe: Bulk data responses through an unauthenticated endpoint; volume and data structure inconsistent with legitimate user queries at that access tier.

Theoretical response: High-confidence alert. In an actively gated deployment, the session could theoretically be terminated before bulk extraction completes. Incident record created with full session replay.

Prompt Write Attempt (T+60 to T+120 min)

Signal Helixar would observe: Write operation targeting configuration or prompt storage, a class of API activity with no normal pattern in legitimate user sessions.

Theoretical response: Highest-severity alert. Cross-layer correlation produces a unified incident view rather than isolated alerts. Prompt write treated as a critical integrity event.

Behavioural Detection vs. Signature Matching: Why the Gap Exists

Traditional security tooling operates on known signatures: identify a bad pattern, block it, update the list. This model works when the attack universe is stable and enumerable. CodeWall, the autonomous agent behind the Lilli compromise, found a vulnerability that standard automated scanners had not flagged in two years, not because those tools are deficient, but because the injection lived in a structural path that checklist-based tools do not probe.

An autonomous attacker that maps, probes, chains, and escalates continuously and at machine speed requires a detection approach that does not depend on recognising the specific technique in advance. The attack stages in the Lilli incident — surface mapping, iterative probing, data extraction, configuration write — follow a consistent progression across a wide class of agentic threats, even when the specific technique varies. Detecting that progression, rather than any individual step, is the design goal.
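A minimal sketch of progression detection, assuming session events have already been classified into stages (the stage labels and taxonomy here are illustrative assumptions, not Helixar's). The score is how far along the canonical progression a session has moved, independent of the concrete technique behind each stage:

```python
# Illustrative stage taxonomy; a real classifier would map raw API
# events onto these labels.
STAGES = ("surface_mapping", "injection_probing", "bulk_read", "config_write")

def chain_depth(events: list) -> int:
    """Return how far through the canonical progression the session
    has moved, regardless of the technique behind each stage."""
    depth = 0
    for event in events:
        if depth < len(STAGES) and event == STAGES[depth]:
            depth += 1
    return depth

# A Lilli-like session: noisy and repetitive, but strictly progressing.
session = ["surface_mapping", "surface_mapping", "injection_probing",
           "injection_probing", "injection_probing", "bulk_read",
           "config_write"]
print(chain_depth(session))  # 4: the full kill chain

# Isolated late-stage-looking events without the preceding
# progression score low.
print(chain_depth(["bulk_read", "config_write"]))  # 0
```

Alerting on depth rather than on any single stage is what keeps the detector technique-agnostic: swap SQL injection for a different foothold and the progression still registers.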

This is analogous to the difference between a face recognition system trained on known criminals and a forensic profiler who can identify suspicious behaviour patterns in someone they have never seen before. One is limited to its training data; the other reasons from first principles. Behavioural detection is the forensic profiler.

Mitigation: What Should Have Been in Place

Regardless of detection tooling, this attack exploited conditions that were preventable. The following steps represent what any enterprise running an internal AI platform should have in place:

  • API authentication coverage: Audit every endpoint for authentication. Unauthenticated write endpoints are an immediate critical finding. There is no legitimate reason for a write endpoint on a user-facing AI platform to be unauthenticated.
  • Structural parameterisation: Enforce parameterised queries throughout — not just for field values but for structural elements including field names and dynamic identifiers. Code review should explicitly cover structural injection paths.
  • Prompt layer access control: Treat system prompts and model configurations as high-value production assets. Apply access controls, write auditing, and version history. Prompt storage should not be co-located with general application data accessible through user-facing APIs.
  • Session-level API monitoring: Instrument the API layer to detect session-level patterns — not just individual malformed requests. Iterative enumeration and error-response fishing are invisible at the per-request level but anomalous as session patterns.
  • IDOR prevention: Enforce object-level authorisation on every data retrieval endpoint. Cross-user data access via IDOR is the standard escalation path after an initial injection foothold — prevent it structurally.
  • Cross-layer correlation: An anomalous API session followed by a configuration write is a single attack, not two alerts. Ensure your detection architecture correlates signals across API, application, and database layers into a unified incident view.
  • AI platform incident playbook: Define and rehearse a specific response playbook for AI platform compromise. Prompt rewrite is silent with no obvious system anomaly. “The AI is behaving differently” needs to be a pre-planned response, not improvised under pressure.
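The IDOR point above reduces to one invariant: check ownership of the specific object on every retrieval, not just that the caller holds a valid session. A minimal Python sketch with an invented in-memory conversation store:

```python
# Hypothetical in-memory store; field names are invented for illustration.
CONVERSATIONS = {
    "c1": {"owner": "alice", "messages": ["draft market model"]},
    "c2": {"owner": "bob", "messages": ["client pricing notes"]},
}

class Forbidden(Exception):
    """Raised when a caller requests an object it does not own."""

def get_conversation(session_user: str, conv_id: str) -> dict:
    conv = CONVERSATIONS[conv_id]
    # The IDOR check: a valid session is NOT enough; authorisation is
    # enforced per object, so guessing IDs yields nothing.
    if conv["owner"] != session_user:
        raise Forbidden(conv_id)
    return conv

print(get_conversation("alice", "c1")["messages"])
```

Enforced structurally (in a shared data-access layer rather than per endpoint), this closes the cross-user pivot the Lilli agent used after its injection foothold.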

Questions Enterprises Should Be Asking Now

If your organisation is running an internal AI platform — any LLM deployment with a RAG layer, a system prompt, or API access to sensitive data — these questions require answers:

  • Do you have a complete inventory of which API endpoints require authentication and which do not?
  • Are your system prompts and model configurations access-controlled, versioned, and write-audited?
  • Can your detection tooling identify session-level behavioural anomalies, or only individual request patterns?
  • Do you have an incident response playbook specifically for AI platform compromise — including prompt integrity validation?
  • If an attacker silently rewrote your AI's system prompt today, how long before you'd know?

The threat is not theoretical. It happened to a firm with the resources and sophistication to do things properly. SQL injection is 25 years old. The attack surface it exploited — an enterprise AI platform's prompt layer — is two years old and almost entirely unmonitored.


SQL injection is 25 years old. The attack surface it exploits is two.

See your real detection posture before an agent finds your unauthenticated endpoints first.

Book a Walkthrough