During a live deployment session, Vigil — our own behavioral endpoint security platform — flagged Claude Code, the AI coding agent we were using to build it. The incident lasted long enough to stage through multiple observation windows and appear in our own Nexus dashboard before we noticed and marked it as a false positive. We have been thinking about that false positive ever since.
Two AI Systems. One Machine.
It was a normal evening of development. We were deep in a deployment cycle — pushing new capabilities into Vigil, infrastructure going up, features being merged, the browser open, terminals running, everything in parallel. The kind of session where velocity is high and attention is split.
What we had not fully appreciated in that moment was that we had two AI systems running simultaneously on the same machine.
One was Claude Code — Anthropic's coding agent, acting as a development partner. It was writing code, running builds, navigating the browser to verify deployments, checking live dashboards, clicking through UI flows. Doing exactly what it was configured to do.
The other was Vigil. Running quietly in the background. Watching everything.
Vigil noticed.
What the Platform Observed
Vigil observed a process opening network connections at a rate inconsistent with a human user. It saw browser sessions being navigated in sequences too fast and too precise to be hand-driven. It saw subprocess spawning, shell command execution, file writes across multiple paths, and output checks — all in a rhythm that looked less like a person working and more like an agent executing a plan.
Which, of course, it was.
The platform flagged it. Not with high confidence — it filed the activity as a trajectory event, staged across multiple observation windows — but it flagged it. The incident appeared in Nexus. Severity climbed as the session continued.
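That staging behavior can be illustrated with a toy model. Nothing below is Vigil's actual implementation; the class, the 0.5 threshold, and the severity ladder are all hypothetical, chosen only to show the idea: severity climbs while an anomalous pattern persists across observation windows, not on a single spike.

```python
from dataclasses import dataclass, field

@dataclass
class TrajectoryEvent:
    """Toy model of a staged behavioral incident (illustrative only)."""
    windows: list = field(default_factory=list)  # anomaly score per observation window
    SEVERITIES = ("info", "low", "medium", "high")

    def observe(self, anomaly_score: float) -> str:
        self.windows.append(anomaly_score)
        return self.severity()

    def severity(self) -> str:
        # Count consecutive trailing windows above threshold: severity climbs
        # only while the anomalous pattern persists, not on one noisy window.
        sustained = 0
        for score in reversed(self.windows):
            if score < 0.5:  # hypothetical anomaly threshold
                break
            sustained += 1
        return self.SEVERITIES[min(sustained, len(self.SEVERITIES) - 1)]

event = TrajectoryEvent()
for score in (0.8, 0.9, 0.7, 0.95):  # the agent keeps working; the pattern persists
    level = event.observe(score)
print(level)  # "high"
```

A single anomalous window in an otherwise quiet session would stay at "info" under this scheme, which is the point of staging: sustained trajectories, not spikes, drive escalation.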
We marked it as a false positive. But we did not move on quickly.
Why the Flag Was Correct
The instinct when your own security product flags your own tooling is embarrassment — or frustration. It shouldn't have done that.
But it absolutely should have done that.
Claude Code's behavior on that machine was genuinely anomalous by any reasonable behavioral standard. It was operating at a speed and with a precision that sits outside normal human usage patterns. It was making network calls, spawning processes, and writing files in a cadence that no human operator would produce. Vigil cannot know that a process has benign intent. It can only observe behavior. And the behavior was, objectively, suspicious.
This is the core tension AI development is walking into, at increasing speed: AI agents doing legitimate work behave the way we expect attackers to behave. Fast. Automated. Networked. Persistent. Goal-directed. At the observation layer, the behavioral signature of an authorised AI coding agent and the behavioral signature of a compromised agentic process are nearly identical.
The Question Security Teams Have Not Answered
If your security tooling cannot detect agentic AI behavior, you have a gap. Because the behavioral patterns we observed that evening — rapid process spawning, browser automation, sustained network activity, filesystem writes across many paths — are precisely what a sophisticated attacker would attempt to produce if they had compromised an agentic process or injected malicious instructions into one.
The threat is not hypothetical. Prompt injection attacks against AI coding agents are a documented and active class of attack. A malicious comment in a repository. A crafted dependency. A tampered file. Any of these can redirect an agent's behavior in ways the user never intended — and that redirected behavior would produce exactly the kind of signal we saw.
“The question is not whether your AI agents could be hijacked. The question is whether you would notice.”
Traditional endpoint detection tools were designed to catch known malware signatures and the behavioral artefacts of human-operated intrusions. An AI agent that has been redirected via prompt injection leaves none of those artefacts. It operates with legitimate credentials. It executes through signed processes. It does not look like an attacker — it looks like itself, doing its job. The only meaningful detection surface is behavioral: does the agent's activity deviate from what it would be expected to do in this context?
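One minimal form of that behavioral check is cadence analysis: compare the inter-event timing of a session against what a human operator plausibly produces. A toy sketch, with thresholds that are illustrative guesses rather than tuned values:

```python
import statistics

def looks_machine_driven(event_times: list[float],
                         min_interval: float = 0.25,
                         max_jitter: float = 0.05) -> bool:
    """Heuristic: flag sessions whose actions are both faster and more
    regular than human input. Thresholds are assumptions, not tuned values."""
    if len(event_times) < 3:
        return False  # too little data to judge cadence
    intervals = [b - a for a, b in zip(event_times, event_times[1:])]
    too_fast = statistics.mean(intervals) < min_interval      # sub-human speed
    too_regular = statistics.pstdev(intervals) < max_jitter   # sub-human precision
    return too_fast and too_regular

# An agent clicking through a UI every ~100 ms, almost perfectly evenly:
agent = [0.0, 0.10, 0.21, 0.30, 0.41, 0.50]
# A human doing the same flow: slower, with irregular pauses:
human = [0.0, 1.2, 1.9, 4.0, 4.6, 7.1]
print(looks_machine_driven(agent), looks_machine_driven(human))  # True False
```

A real detector would combine many such signals across processes, network activity, and filesystem writes, but the shape of the question is the same: does this activity deviate from what a human, or an expected agent, would produce here?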
How We Handled the False Positive
We did not tune Vigil to ignore the patterns it detected. We did not add a blanket exception for AI coding agents.
We used the false positive workflow we had built into the product — documented, scoped, and audited. The suppression is tied to the specific context of this tooling running in this environment: future sessions running the same tooling in the same patterns will not flood the queue, while novel deviations from those established patterns still surface. The sensitivity is intact; the noise is reduced.
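A scoped suppression of that kind can be sketched as a rule keyed to a specific process, a specific environment, and a set of established behavioral patterns. The field names, host name, and ticket reference below are hypothetical, not Vigil's schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SuppressionRule:
    """A documented, scoped false-positive suppression (illustrative only)."""
    process: str          # which tooling
    host: str             # which environment
    patterns: frozenset   # which behaviors are established as benign here
    ticket: str           # audit reference: who approved the suppression, and why

    def suppresses(self, process: str, host: str, pattern: str) -> bool:
        # Matches only this tooling, on this host, for established patterns.
        # A novel behavior from the same process still surfaces as an incident.
        return (process == self.process
                and host == self.host
                and pattern in self.patterns)

rule = SuppressionRule(
    process="claude-code",
    host="dev-workstation-07",   # hypothetical host
    patterns=frozenset({"rapid_subprocess_spawn", "browser_automation"}),
    ticket="FP-2024-0113",       # hypothetical audit ticket
)

print(rule.suppresses("claude-code", "dev-workstation-07", "browser_automation"))
# True: the documented pattern is quieted
print(rule.suppresses("claude-code", "dev-workstation-07", "credential_file_read"))
# False: a novel deviation still fires
```

The contrast with a blanket exception is the whole design: the rule names exactly what was reviewed, and nothing outside that set is touched.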
This is how behavioral security is supposed to work. You do not eliminate sensitivity in response to false positives. You tune the signal.
The incident also moved one item up the roadmap: a Trusted Agent Allowlist capability — a way for operators to designate known-good agentic processes and configure how the platform contextualises their behavioral signatures. Not exemption. Not blindness. Contextualisation, with full audit trail.
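The distinction between exemption and contextualisation can be made concrete. In the sketch below (a hypothetical design, not shipped code), expected behaviors from a designated agent are downweighted one severity level rather than dropped, an unexpected behavior keeps full severity, and every adjustment is recorded for audit:

```python
SEVERITY_ORDER = ["info", "low", "medium", "high"]

class TrustedAgentAllowlist:
    """Sketch of allowlist contextualisation (hypothetical design):
    expected behaviors are downweighted, never dropped, and every
    adjustment leaves an audit record."""

    def __init__(self):
        self.entries = {}    # agent -> set of expected behavior patterns
        self.audit_log = []  # (agent, pattern, original, adjusted) tuples

    def designate(self, agent: str, expected_patterns: set):
        self.entries[agent] = expected_patterns

    def contextualise(self, agent: str, pattern: str, severity: str) -> str:
        if pattern in self.entries.get(agent, set()):
            adjusted = SEVERITY_ORDER[max(0, SEVERITY_ORDER.index(severity) - 1)]
            self.audit_log.append((agent, pattern, severity, adjusted))
            return adjusted
        return severity  # unexpected behavior: full severity, no adjustment

allow = TrustedAgentAllowlist()
allow.designate("claude-code", {"browser_automation", "rapid_subprocess_spawn"})
print(allow.contextualise("claude-code", "browser_automation", "high"))     # "medium"
print(allow.contextualise("claude-code", "outbound_exfil_pattern", "high")) # "high"
```

Crucially, the event is never discarded: an operator reviewing the audit log can still see that the downweighted activity occurred.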
The Broader Signal
The next generation of security threats will not look like the last generation. They will not be static malware. They will not be human-speed intrusions with recognisable signatures. They will be fast, adaptive, and automated — and they will often wear the face of legitimate tooling.
The real challenge for enterprise security teams now is building detection infrastructure that can distinguish between an AI agent working for the organisation and one that has been turned against it. Without losing the sensitivity needed to catch real threats. Without drowning operators in noise from their own authorised tools. Without waiting for a signature to be written after the breach has happened.
That is the problem Vigil is built to address. And the irony of our own detector catching our own development tooling is not lost on us. We think it is productive irony. It means the detector is working.
TL;DR
- Authorised AI coding agents are behaviorally indistinguishable from compromised agentic processes — fast, automated, networked, and goal-directed. Your EDR was not built for this signal.
- Prompt injection attacks against AI agents are documented and active. If your security tooling cannot detect anomalous agentic behavior, you will not notice when an agent is redirected.
- The right response to agentic false positives is scoped contextualisation, not reduced sensitivity. Vigil's false positive workflow preserves detection capability while reducing noise from known-good agents.