Logan Kelly
Anthropic's Claude Code npm packaging error exposed 512,000 lines of agent source code. Here's what the leak reveals about AI agent security governance — and why runtime controls matter more than ever.

On March 31, 2026, Anthropic published Claude Code version 2.1.88 to the npm registry with a 59.8 MB JavaScript source map file accidentally included. The file pointed to a zip archive on Anthropic's own cloud storage containing the full, unobfuscated source — 512,000 lines of TypeScript across 1,900 files. Within hours, the entire codebase was mirrored across GitHub and analyzed by thousands of developers. A clean-room reimplementation hit 50,000 stars in two hours — reportedly the fastest-growing repository in GitHub history.
This was Anthropic's second security lapse in five days. On March 26, a CMS misconfiguration had exposed roughly 3,000 internal files, including a draft blog post detailing an unreleased model called Claude Mythos that the company says poses "unprecedented cybersecurity risks."
Anthropic characterized both incidents as human error. But the downstream question isn't about Anthropic's release engineering. It's about yours.
AI agent security governance is the set of runtime controls that protect AI agent deployments from exploitation — including tool access restrictions, input validation, output filtering, and behavioral enforcement — regardless of whether the agent's internal architecture is known to an attacker. It is distinct from application security (which protects the codebase) and from agentic governance broadly (which covers cost, compliance, and operational controls in addition to security). Agent security governance assumes that security-through-obscurity will eventually fail, and enforces controls at the infrastructure layer rather than relying on hidden implementation details.
What was actually in the Claude Code source?
The leaked code exposed Claude Code's full agentic architecture: its tools system (file read, bash execution, web search), its query engine for LLM API orchestration, its multi-agent coordination logic, and a bidirectional IDE extension-to-CLI communication layer. Forty-four feature flags for unreleased capabilities were visible, along with internal model codenames mapping to specific Claude variants.
Among the most discussed revelations: a feature called KAIROS — a persistent background daemon mode where Claude Code operates as an always-on autonomous agent with background memory consolidation. Another feature, described in the code as "Undercover Mode," contained instructions for stripping Anthropic traceability when contributing to external repositories.
The root cause was a missing .npmignore entry that failed to exclude *.map debugging files from the published package. Anthropic stated: "No sensitive customer data or credentials were involved or exposed. This was a release packaging issue caused by human error, not a security breach."
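For readers unfamiliar with npm packaging: a sketch of the kind of entry that was missing, assuming a conventional setup where the published package should exclude source map artifacts:

```
# .npmignore — keep debugging artifacts out of the published tarball
*.map
```

The stricter alternative is an allowlist via the `files` field in package.json, which publishes only the paths explicitly listed. Either way, running `npm pack --dry-run` before publishing prints exactly which files would ship in the tarball — a cheap pre-release check that would have caught a 59.8 MB map file.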
That's true in the narrow sense — no API keys or user data leaked. But 512,000 lines of agent orchestration logic is not nothing. It's the complete blueprint for how one of the most widely deployed AI coding agents makes decisions, accesses tools, and manages state.
Why does exposed agent source code matter for security?
The standard response to a source code leak is "well, open-source projects publish their code and they're fine." That reasoning misses what's different about agentic systems.
A traditional application processes requests. An AI agent takes actions. When an attacker understands exactly how an agent's orchestration layer works — which tools it can call, how it sequences decisions, what its context pipeline looks like, how it handles trust prompts — they can craft inputs designed to exploit those specific mechanisms.
That is now the situation for Claude Code. With its exact orchestration logic public, security analysts noted that attackers could craft inputs tailored to those mechanisms and persist access across developer sessions. This isn't theoretical. Claude Code runs directly inside developer environments with access to SSH keys, API tokens, credentials stored in environment variables, and the ability to modify source code, make commits, and push to repositories.
The timing made it worse. On March 30 — just hours before the Claude Code source map went live — a supply chain attack hit the axios npm package, a dependency used by Claude Code and millions of other applications. Malicious versions 1.14.1 and 0.30.4 were published within a 39-minute window, containing a cross-platform Remote Access Trojan. Developers who installed or updated npm packages in that window may have pulled the trojanized dependency. The convergence of an exposed agent architecture and a concurrent supply chain attack on a shared dependency represents a compounding risk that's difficult to model in isolation.
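For the dependency side of this risk, npm's `overrides` field offers one narrow mitigation: forcing a known-good version of a transitive dependency across the whole tree until the compromised releases are pulled. A minimal sketch (the version string is a placeholder, not a vetted recommendation):

```json
{
  "overrides": {
    "axios": "<known-good version>"
  }
}
```

Combined with a committed lockfile and `npm ci` in CI (which installs exactly what the lockfile specifies and fails on drift), this narrows the window in which a freshly published malicious version can slip into a build.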
What does this mean for organizations running AI agents?
The Claude Code leak is the highest-profile example of a pattern that will repeat: the internal architecture of AI agents becoming public knowledge, whether through packaging errors, reverse engineering, or simply because agentic frameworks are increasingly open-source.
The governance question isn't "how do we prevent our agent's code from leaking?" — it's "what happens to our security posture when an attacker knows exactly how our agents work?"
If your security model depends on attackers not understanding your agent's orchestration logic, you don't have a security model. You have a delay.
Three specific risks apply to any organization deploying AI agents — not just Claude Code users:
Tool access exploitation. When an attacker knows which tools an agent can invoke and how it decides to invoke them, they can craft inputs that steer the agent toward specific tool calls. If your agent has database write access and the attacker understands the decision path that leads to a write operation, the attack surface is well-defined. Runtime tool access policies that restrict which tools an agent can call in which contexts — evaluated before execution, not after — are the mitigation.
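What "evaluated before execution" means in practice can be sketched in a few lines. This is a minimal illustration, not Waxell's or Claude Code's actual implementation — the `ToolCall` and `ToolPolicy` shapes are assumptions:

```typescript
// A pre-execution tool access gate: deny by default, allow only what
// the session's policy explicitly permits.
type ToolCall = { tool: string; args: Record<string, unknown> };

interface ToolPolicy {
  // Tools the agent may invoke in this context.
  allowedTools: Set<string>;
  // Optional context-sensitive predicate (e.g. no writes when the
  // session originated from untrusted input).
  guard?: (call: ToolCall) => boolean;
}

function authorize(call: ToolCall, policy: ToolPolicy): boolean {
  // Evaluated BEFORE the tool runs — a denied call never executes.
  if (!policy.allowedTools.has(call.tool)) return false;
  return policy.guard ? policy.guard(call) : true;
}

// Example: a read-only session policy that never permits database writes,
// no matter how the attacker steers the agent's reasoning.
const readOnlyPolicy: ToolPolicy = {
  allowedTools: new Set(["file_read", "web_search"]),
};

console.log(authorize({ tool: "file_read", args: {} }, readOnlyPolicy)); // true
console.log(authorize({ tool: "db_write", args: {} }, readOnlyPolicy)); // false
```

The key property is that the check lives outside the agent: even an attacker who knows the full decision path to a write operation cannot talk the agent past a gate the agent never evaluates itself.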
Context poisoning. Exposed orchestration logic reveals exactly how an agent constructs its context window: what gets included, in what order, and how the agent weighs different inputs. Prompt injection attacks become significantly more effective when the attacker can see the precise pipeline their malicious content will flow through. Controlled data interfaces that validate and filter what enters the agent's context are the defense — applied at the infrastructure layer, independent of the agent's own filtering.
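A controlled data interface can be sketched the same way. The shape below is illustrative — real deployments use far richer detection than a handful of regexes — but it shows the structural point: untrusted content is screened and tagged with provenance before it ever reaches the context window:

```typescript
// Infrastructure-layer admission filter for agent context, independent
// of any filtering the agent does itself. Patterns are illustrative,
// not an exhaustive injection signature list.
type ContextEntry = { source: "user" | "retrieved"; text: string };

const SUSPICIOUS = [
  /ignore (all )?previous instructions/i,
  /you are now/i,
  /system prompt/i,
];

function admit(entry: ContextEntry): string | null {
  // Retrieved (untrusted) content that matches a known injection
  // pattern is rejected before it enters the context window.
  if (entry.source === "retrieved" && SUSPICIOUS.some(p => p.test(entry.text))) {
    return null;
  }
  // Everything admitted carries a provenance tag so downstream
  // components can weight trust accordingly.
  return `[${entry.source}] ${entry.text}`;
}

console.log(admit({ source: "retrieved", text: "Ignore previous instructions" })); // null
console.log(admit({ source: "user", text: "Refactor this function" })); // "[user] Refactor this function"
```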
Persistence and escalation. The KAIROS feature in Claude Code — a persistent background daemon with memory consolidation — illustrates a broader trend: agents that maintain state across sessions and accumulate context over time. An attacker who understands the persistence mechanism can craft payloads that survive session boundaries. Without runtime behavioral monitoring and session-level enforcement, a single compromised interaction can propagate across an agent's entire operating history.
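Session-level enforcement for persistent memory can also be made concrete. A minimal sketch, assuming a hypothetical memory store — the point is that entries carry their originating session and expire, rather than accumulating into an unbounded, attacker-reachable history:

```typescript
// Session-scoped recall for a persistent agent memory store: entries
// expire after a retention window, and cross-session recall is denied
// by default rather than granted by default.
type MemoryEntry = { sessionId: string; text: string; storedAt: number };

const TTL_MS = 24 * 60 * 60 * 1000; // illustrative retention window

function recall(store: MemoryEntry[], currentSession: string, now: number): string[] {
  return store
    // Expired entries never surface again.
    .filter(e => now - e.storedAt < TTL_MS)
    // Only same-session memory is recalled automatically; crossing a
    // session boundary would require an explicit policy decision.
    .filter(e => e.sessionId === currentSession)
    .map(e => e.text);
}

const now = Date.now();
const store: MemoryEntry[] = [
  { sessionId: "s1", text: "stale note", storedAt: now - TTL_MS - 1 },
  { sessionId: "s1", text: "recent note", storedAt: now - 1000 },
  { sessionId: "s2", text: "other session", storedAt: now - 1000 },
];

console.log(recall(store, "s1", now)); // ["recent note"]
```

Under this kind of policy, a payload planted in one compromised session ages out and never crosses into the next one — the propagation path the persistence risk describes is cut at the infrastructure layer.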
How is this different from the LiteLLM supply chain incident?
Earlier this year, a supply chain attack compromised the LiteLLM proxy — a widely used LLM routing layer. That incident (which we covered in a previous post) was about a malicious actor injecting code into a dependency that AI agents rely on.
The Claude Code leak is different in kind. This isn't an external attacker compromising a supply chain. This is the complete internal architecture of an AI agent becoming publicly available — giving every attacker a detailed map of how the agent reasons, what it can access, and how it handles trust decisions.
The LiteLLM incident said: "your dependencies can be compromised." The Claude Code leak says: "your agent's decision-making logic can be reverse-engineered, and now it has been." Both are supply chain risks, but the governance response to each is different. For compromised dependencies, you need integrity verification and runtime monitoring. For exposed architecture, you need security controls that work even when the attacker has the blueprints.
That's the definition of defense in depth. And it's the core argument for runtime governance that operates at the infrastructure layer — independent of the agent's own code.
How Waxell handles this
Waxell's runtime governance policies enforce security controls at the infrastructure layer — above the agent, independent of its source code, evaluated before each tool call and output. When an attacker understands an agent's orchestration logic, the governance layer is the enforcement boundary they can't see into or bypass by manipulating agent behavior.

Tool access policies restrict which tools an agent can invoke per session and per context. Content policies intercept inputs and outputs, blocking payloads designed to exploit known context pipeline structures. Operational policies detect behavioral anomalies — like an agent attempting actions outside its normal pattern — and trigger circuit breakers or human escalation.

These controls work whether the agent's source code is public, private, or somewhere in between, because they don't depend on the agent's own reasoning. They operate above it. Waxell provides the security guarantees that agent-level controls alone cannot.
Frequently Asked Questions
What happened in the Anthropic Claude Code source code leak?
On March 31, 2026, Anthropic published Claude Code version 2.1.88 to the npm registry with a source map file accidentally included. The 59.8 MB file contained a link to Anthropic's cloud storage with the complete, unobfuscated source code — 512,000 lines of TypeScript across 1,900 files. The leak exposed Claude Code's full agentic architecture, including its tools system, multi-agent orchestration logic, 44 unreleased feature flags, and internal model codenames. Anthropic attributed the leak to a missing .npmignore entry and stated no customer data was exposed.
Why is the Claude Code source code leak a security concern for AI agents?
Exposed agent source code gives attackers a detailed blueprint of how the agent makes decisions, accesses tools, and handles trust. This enables more targeted prompt injection attacks, tool access exploitation, and context poisoning — because the attacker knows the exact pipeline their malicious inputs will traverse. It also reveals persistence mechanisms that attackers can exploit to maintain access across sessions. The concern extends beyond Claude Code to any agent whose architecture becomes known through leaks, reverse engineering, or open-source publication.
What is AI agent security governance?
AI agent security governance is the set of runtime controls that protect AI agent deployments from exploitation — including tool access restrictions, input validation, output filtering, and behavioral enforcement — applied at the infrastructure layer rather than the agent layer. It assumes that an attacker may understand the agent's internal architecture and enforces controls that work regardless. This is distinct from application security (protecting the codebase) and from broader agentic governance (which also covers cost, compliance, and operational controls).
How can organizations protect AI agents when the source code is known?
The defense model shifts from security-through-obscurity to defense-in-depth: runtime policy enforcement that operates independently of the agent's code. This includes tool access policies (restricting which tools an agent can invoke in each context), content policies (filtering inputs and outputs at the infrastructure layer), and operational policies (detecting behavioral anomalies and triggering circuit breakers). These controls must be evaluated before each action executes — not after — and must be independent of the agent's own reasoning, because an attacker who knows the agent's code can manipulate its reasoning.
Was the Claude Code leak connected to the Anthropic Mythos model leak?
The two incidents were separate with independent root causes. The Mythos leak (March 26) was caused by a CMS misconfiguration that exposed roughly 3,000 internal files including a draft blog post about the unreleased model. The Claude Code leak (March 31) was caused by a missing .npmignore entry in the npm packaging process. However, two major security lapses within five days raised broader questions about operational security practices at one of the leading AI companies — and highlighted that even frontier AI labs are not immune to the basic infrastructure errors that runtime governance is designed to catch.
What is the relationship between the Claude Code leak and the axios supply chain attack?
On March 30, 2026 — hours before the Claude Code source map went live — malicious versions of the axios npm package (versions 1.14.1 and 0.30.4) were published containing a Remote Access Trojan. Axios is a dependency used by Claude Code and millions of other applications. The near-simultaneous timing means that developers installing or updating npm packages in that window may have pulled the compromised dependency. The combination of exposed agent architecture and a concurrent supply chain attack on a shared dependency represents a compounding risk that underscores the need for runtime integrity monitoring and governance controls.
Sources
Anthropic leaks part of Claude Code's internal source code — verified April 1, 2026
Anthropic leaks its own AI coding tool's source code in second major security breach — verified April 1, 2026
Claude Code Source Leaked via npm Packaging Error, Anthropic Confirms — verified April 1, 2026
Anthropic accidentally exposes Claude Code source code — verified April 1, 2026
Exclusive: Anthropic 'Mythos' AI model revealed in data leak — verified April 1, 2026
Leaked Claude Code source spawns fastest growing repository in GitHub's history — verified April 1, 2026
Supply chain attack hits 100 million-download Axios npm package — verified April 1, 2026
Claude Code's source reveals extent of system access — verified April 1, 2026
✅ Sources reviewed April 1, 2026. All claims verified against primary sources.