Waxell

Logan Kelly

Jun 23, 2026

DeepMind's AI Control Roadmap Treats Agents as Insider Threats — And Sets the Bar for Every Team

DeepMind's AI Control Roadmap treats deployed agents as insider threats. Here's the defense-in-depth framework it established — and how Waxell Runtime enforces it without rebuilds.

On June 18, 2026, Google DeepMind published its AI Control Roadmap, a defense-in-depth framework for managing AI agents that might be imperfectly aligned. Written by Rohin Shah and Four Flynn, the document makes an uncomfortable assumption explicit: alignment training alone cannot guarantee that a deployed agent will stay within bounds. So DeepMind built a system that assumes it won't.

The framework applies insider threat controls — the same layered security enterprises use for privileged employees with dangerous access — directly to AI agents. It defines four detection levels (D1 through D4), ranging from monitoring an agent's chain-of-thought reasoning at the baseline to analyzing internal model representations when reasoning becomes opaque at the most advanced tier. It defines three response tiers (R1 through R3), from asynchronous review of transcripts after the fact, to real-time blocking of actions before they execute. The required tier scales with the agent's capability: the more capable and potentially evasive the model, the further up the stack the controls must reach.

DeepMind developed this framework from real data. Their team analyzed over one million internal coding agent tasks using a prototype asynchronous monitoring system. The finding that should reframe how most teams think about agent safety: the majority of flagged events did not stem from adversarial intent. They resulted from agent misinterpretation or overeagerness to achieve a user's goal. A live monitor built for the Gemini Spark agent, for example, was catching unintentional data deletion: not a rogue model trying to cause harm, but an overeager one completing a task in ways it shouldn't.

DeepMind also published a companion document for policymakers: "Three Layers of Agent Security", covering individual agent security, multi-agent systems, and ecosystem resilience. Publishing both documents simultaneously signals that this framework is intended as an industry reference, not just internal practice.

Why Alignment Training Alone Doesn't Hold at Production Scale

The working assumption in most enterprise deployments is that choosing a well-aligned model from a reputable frontier lab handles the safety problem. This is understandable — it worked well enough when agents were doing simple, reversible tasks with human review at every step. It doesn't hold when agents are executing autonomously at scale.

DeepMind's document challenges this directly, and the argument isn't that frontier models are unsafe. It's that no training process can provide guarantees about agent behavior across every input, environment, and capability level an agent will encounter in production. Alignment gives you a probability distribution over behaviors. An agent that behaves correctly 99.9% of the time will still produce off-script actions at production volume — a million tasks at 99.9% means a thousand incidents.
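The arithmetic behind that last sentence is easy to check. The sketch below is illustrative only: the function names are made up for this post, and it assumes each task fails independently with the same probability, which real workloads will not strictly satisfy.

```python
def expected_incidents(tasks: int, success_rate: float) -> float:
    """Expected number of off-script tasks at a given per-task success rate."""
    return tasks * (1.0 - success_rate)


def prob_at_least_one(tasks: int, success_rate: float) -> float:
    """Chance that at least one task goes off-script (independence assumed)."""
    return 1.0 - success_rate ** tasks


# One million tasks at 99.9% per-task reliability.
print(round(expected_incidents(1_000_000, 0.999)))  # 1000

# Even a modest batch of 1,000 tasks is more likely than not to contain one.
print(prob_at_least_one(1_000, 0.999) > 0.5)  # True
```

The point is not the exact figure but the shape: reliability per task and incident count per month are different questions, and volume turns a small rate into a steady stream.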

The insider threat parallel is precise. A company can't assume every employee with office access will remain trustworthy under all conditions, so it layers controls — access logging, anomaly detection, escalation procedures, emergency termination — independent of how rigorous the hiring process was. DeepMind is applying the same logic. They use the MITRE ATT&CK framework, the standard threat taxonomy for enterprise security, to categorize potential AI agent behaviors. This breaks down "the agent did something wrong" into specific tactics and techniques that can be monitored, detected, and responded to with defined playbooks.

Most enterprise teams building on top of frontier models inherit the output of alignment training but have not built the structural governance plane that sits above it. They have observability — they can see what happened. They don't have enforcement — the pre-execution layer that stops the wrong action before it runs.

DeepMind has one of the largest AI safety teams in the world and direct access to their own model internals. They still chose to build system-level governance rather than rely solely on alignment. That choice is a benchmark.

How Production Agent Teams Are Exposed Right Now

Start with detection tier D1, the baseline: monitoring the agent's chain-of-thought reasoning for behavioral signals. For a team running a production coding agent or customer-facing workflow agent, this means reading and classifying the agent's stated reasoning against a behavioral policy in real time — not logging it, but evaluating it. Most teams log. Few evaluate against a policy.

D2 adds behavioral pattern detection beyond what the agent says — identifying signals in action sequences and tool call patterns that don't match policy. D3 adds a dedicated supervisor model reviewing agent decisions for high-risk actions. D4, the most advanced level, handles cases where the agent has learned to reason without visible text, requiring inspection of model internals rather than transcript analysis.
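To make the D2 idea concrete, here is a minimal sketch of sequence-based detection. Everything in it is an assumption for illustration: the tool names, the rule format, and the choice of "forbidden orderings" as the pattern type are not taken from DeepMind's document or from any product API.

```python
from typing import List, Sequence, Tuple

# Hypothetical rules: (earlier_tool, later_tool) pairs that should never
# occur in that order within a single task.
FORBIDDEN_ORDERINGS: List[Tuple[str, str]] = [
    ("read_secrets", "http_post"),
    ("read_customer_db", "send_email"),
]


def find_violations(
    actions: Sequence[str],
    rules: Sequence[Tuple[str, str]] = FORBIDDEN_ORDERINGS,
) -> List[Tuple[str, str]]:
    """Return every rule whose earlier tool is later followed by its second tool."""
    actions = list(actions)
    violations = []
    for first, second in rules:
        if first in actions:
            first_idx = actions.index(first)
            if second in actions[first_idx + 1:]:
                violations.append((first, second))
    return violations


trace = ["read_secrets", "summarize", "http_post"]
print(find_violations(trace))  # [('read_secrets', 'http_post')]
```

Each call in that trace could pass a per-call keyword filter on its own. The signal only appears when the calls are read as a sequence, which is the distinction the D2 tier draws.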

On the response side, R1 (asynchronous review) is roughly equivalent to having server logs after an incident — better than nothing, but not control. R3 (synchronous blocking) requires an enforcement layer that intercepts between the agent's decision and the tool call it wants to make, and can halt execution before the action runs.
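The difference between the two ends of the response scale can be shown in a few lines. This is a sketch under stated assumptions, not DeepMind's or Waxell's implementation: `Gate`, `ToolCall`, `BlockedAction`, and the `delete_file` stand-in are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ToolCall:
    tool: str
    args: Dict[str, Any]


class BlockedAction(Exception):
    """Raised when policy stops a call before it executes."""


@dataclass
class Gate:
    # Hypothetical policy hook: returns True when the call must not run.
    is_blocked: Callable[[ToolCall], bool]
    review_queue: List[ToolCall] = field(default_factory=list)

    def run_with_async_review(self, call: ToolCall, execute: Callable[[ToolCall], Any]) -> Any:
        # R1-style: the action runs first; the record is reviewed later.
        result = execute(call)
        self.review_queue.append(call)
        return result

    def run_with_sync_blocking(self, call: ToolCall, execute: Callable[[ToolCall], Any]) -> Any:
        # R3-style: policy is evaluated before the tool is invoked.
        if self.is_blocked(call):
            raise BlockedAction(f"blocked before execution: {call.tool}")
        return execute(call)


deleted: List[str] = []


def delete_file(call: ToolCall) -> str:
    # Stand-in for a destructive tool; it only records the path.
    deleted.append(call.args["path"])
    return "deleted"


gate = Gate(is_blocked=lambda c: c.tool == "delete_file")
call = ToolCall("delete_file", {"path": "/data/prod.db"})

gate.run_with_async_review(call, delete_file)  # the deletion has already happened
try:
    gate.run_with_sync_blocking(call, delete_file)
except BlockedAction as exc:
    print(exc)  # blocked before execution: delete_file
```

With asynchronous review the reviewer learns about the deletion from the queue. With synchronous blocking the tool function is never called.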

That gap — knowing what happened versus stopping it before it happens — is the core exposure DeepMind is describing. Here's a practical self-assessment checklist based on the framework:

Detection:

Can you evaluate agent chain-of-thought against a behavioral policy in real time, not just log it? (D1)
Do you have behavioral pattern detection based on action sequences, not just keyword filtering? (D2)
For high-risk action categories, do you have a supervisor reviewing decisions before execution? (D3)

Response:

For low-risk reversible actions, do you have asynchronous review with defined remediation SLAs? (R1)
For high-risk actions — database writes, external API calls, file deletion, privileged operations — do you have pre-execution blocking? (R2/R3)
Do you have a policy-enforced kill switch that triggers automatically on defined behavioral thresholds, not just a manual container termination? (Kill policy)

Most teams can check the first item on each list. Almost none can check the last.

How Waxell Runtime Handles This

What DeepMind describes in the AI Control Roadmap is a governance plane: a structural layer that sits above individual agents and enforces behavioral policies independent of what the underlying model does. Waxell Runtime is that layer, built for teams that need these controls without redesigning their agents from scratch.

Runtime wraps your existing agents — regardless of model, framework, or architecture — and enforces policy-based controls before each action executes. No changes to agent code. No rebuilds. Waxell intercepts at the tool call layer and applies the policy, in real time, using a 2-line initialization against 200+ frameworks and libraries.

The controls DeepMind recommends map directly to Waxell Runtime policy categories:

Kill policy is the mechanism behind R3 synchronous blocking. When an agent's behavior crosses a defined threshold — a runaway loop, unauthorized scope expansion, cost limit violation, or content policy breach — Waxell's Kill policy terminates execution before the action runs. This is automated, policy-enforced, and logged with a full audit trail. It's the difference between a kill switch that someone has to pull and one that fires when the threshold is hit.
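As a rough illustration of a threshold-driven kill switch, consider the sketch below. It is not Waxell's API: the class names, the two thresholds, and the audit log format are assumptions chosen to keep the example short.

```python
from dataclasses import dataclass, field
from typing import List


class AgentKilled(Exception):
    """Raised when a kill threshold is crossed."""


@dataclass
class KillPolicy:
    max_tool_calls: int
    max_cost_usd: float


@dataclass
class Session:
    policy: KillPolicy
    tool_calls: int = 0
    cost_usd: float = 0.0
    audit_log: List[str] = field(default_factory=list)

    def before_action(self, tool: str, estimated_cost_usd: float) -> None:
        """Check thresholds before the action runs; raise if one is crossed."""
        reason = None
        if self.tool_calls + 1 > self.policy.max_tool_calls:
            reason = "tool call limit exceeded"
        elif self.cost_usd + estimated_cost_usd > self.policy.max_cost_usd:
            reason = "cost limit exceeded"
        if reason is not None:
            self.audit_log.append(f"KILL {tool}: {reason}")
            raise AgentKilled(reason)
        self.tool_calls += 1
        self.cost_usd += estimated_cost_usd
        self.audit_log.append(f"ALLOW {tool}")


session = Session(KillPolicy(max_tool_calls=3, max_cost_usd=1.0))
try:
    for _ in range(10):  # simulated runaway loop
        session.before_action("search", 0.25)
except AgentKilled as exc:
    print(exc)  # tool call limit exceeded
print(session.audit_log[-1])  # KILL search: tool call limit exceeded
```

Nobody has to notice the loop and pull a switch. The fourth call never runs, and the log records both the allowed calls and the reason for termination.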

Control policy implements scope enforcement: what tools the agent is allowed to call, what data it can access, and what actions it can take at all. This is D2/D3 detection translated into enforcement. Not monitoring whether scope is violated, but preventing the violation at the execution boundary.
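A scope check of this kind might look like the following. Treat it as a sketch with invented names (`ControlPolicy`, `permits`, the tool names, the `/workspace` root), not as the product's configuration format. It also assumes POSIX-style paths and does not resolve symlinks, which a real enforcement layer would need to handle.

```python
import posixpath
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ControlPolicy:
    allowed_tools: FrozenSet[str]
    workspace_root: str

    def path_in_scope(self, path: str) -> bool:
        """True if the path, once normalized, stays inside the workspace root."""
        root = posixpath.normpath(self.workspace_root)
        resolved = posixpath.normpath(posixpath.join(root, path))
        return resolved == root or resolved.startswith(root + "/")

    def permits(self, tool: str, path: str) -> bool:
        """Allow the call only if both the tool and the target are in scope."""
        return tool in self.allowed_tools and self.path_in_scope(path)


policy = ControlPolicy(
    allowed_tools=frozenset({"read_file", "write_file"}),
    workspace_root="/workspace",
)

print(policy.permits("read_file", "src/app.py"))      # True
print(policy.permits("read_file", "../etc/passwd"))   # False: escapes the root
print(policy.permits("delete_file", "src/app.py"))    # False: tool not allowed
```

The check runs on the requested call, not on the result, so an out-of-scope action is refused rather than reported.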

Content policy handles output-layer R3 blocking — stopping agent responses that contain PII, harmful content, or outputs that violate defined rules before they reach users or downstream systems.
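For a sense of what output-layer blocking involves, here is a deliberately small sketch. The two regular expressions are simplistic stand-ins for real PII detection, and the function and exception names are hypothetical, not part of any Waxell interface.

```python
import re
from typing import Dict, List, Pattern

# Illustrative rules only; production PII detection needs far more than this.
RULES: Dict[str, Pattern[str]] = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


class ContentBlocked(Exception):
    """Raised when an output violates a content rule."""


def check_output(text: str) -> List[str]:
    """Return the names of every rule the text violates."""
    return [name for name, pattern in RULES.items() if pattern.search(text)]


def release(text: str) -> str:
    """Pass the text through only if no rule matches; otherwise block it."""
    violations = check_output(text)
    if violations:
        raise ContentBlocked(", ".join(violations))
    return text


print(release("Your ticket has been updated."))
try:
    release("Customer SSN is 123-45-6789.")
except ContentBlocked as exc:
    print(f"blocked: {exc}")  # blocked: us_ssn
```

The response is checked before it is returned, so a blocked output never reaches the user or a downstream system.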

Across 26 policy categories, Waxell Runtime provides D1–D3 detection and R1–R3 response coverage without requiring teams to build a custom supervisor agent stack. Waxell Observe handles the monitoring and behavioral pattern detection layer. Runtime handles enforcement. Together they cover the full arc of what DeepMind's framework requires.

For teams using vendor AI, external agents, or MCP-native tools — agents you didn't build and can't instrument directly — Waxell Connect extends the same governance to that layer without requiring the vendor to change anything. Same policies, same audit trail, no code changes on the agent side.

The $2.9 trillion in US economic value that McKinsey projects AI agents will create by 2030 is only capturable if enterprise teams can actually deploy those agents at scale. DeepMind's framework is the clearest published articulation of what "at scale" requires from a governance standpoint. It's also one team's implementation — not a spec for everyone else to replicate. Waxell Runtime is the implementation for teams that need these controls without DeepMind's internal tooling budget.

Get access to Waxell Runtime at https://waxell.ai/get-access.

Frequently Asked Questions

What is DeepMind's AI Control Roadmap?

DeepMind's AI Control Roadmap, published June 18, 2026, is a defense-in-depth framework for managing deployed AI agents that might be imperfectly aligned. It defines four detection levels (D1–D4) scaling from chain-of-thought monitoring to opaque reasoning analysis, and three response tiers (R1–R3) from asynchronous transcript review to real-time action blocking. It applies insider threat controls — standard in enterprise security for privileged human access — to AI agents.

Why does DeepMind treat its own AI agents as insider threats?

Because alignment training cannot guarantee agent behavior across every environment and capability level encountered in production. An insider threat model doesn't assume adversarial intent — it applies layered controls regardless, because no training process (like no hiring process) can guarantee behavior under all conditions. The same logic that justifies access logging and anomaly detection for privileged employees justifies behavioral monitoring and pre-execution enforcement for AI agents.

What is the difference between monitoring and governance for AI agents?

Monitoring means you can observe and log what an agent did after the fact. Governance means you enforce policies before an action executes — blocking it, requiring human approval, or terminating the agent when a threshold is crossed. DeepMind's framework distinguishes these explicitly: asynchronous transcript review (R1) is monitoring; synchronous blocking before execution (R3) is governance. Most enterprise teams have built monitoring. Governance requires a separate runtime enforcement layer.

What did DeepMind find from analyzing 1 million agent tasks?

DeepMind analyzed over one million internal coding agent tasks and found that most flagged events weren't adversarial; they resulted from agent misinterpretation or overeagerness to achieve a user's goal. This is critical for governance design: a system built only to detect deliberate misuse will miss the majority of real-world failures. Behavioral monitoring must cover off-script actions broadly, including well-intentioned agents acting on tasks in unauthorized ways.

What is a Kill policy for AI agents?

A Kill policy is an automated, pre-configured rule that terminates or halts agent execution when a defined behavioral threshold is crossed — runaway loops, scope violations, cost limits, or content violations. Unlike a manual kill switch, it fires automatically at the tool call layer before the action runs, with a full audit trail of what triggered it. Waxell Runtime implements Kill policies as one of its 26 policy categories.

How does Waxell Runtime implement DeepMind's recommended controls?

Waxell Runtime wraps existing agents and enforces policies before each tool call executes, without requiring changes to agent code or framework. Kill policies implement R3 synchronous blocking. Control policies implement D2/D3 scope enforcement. Content policies block unauthorized output at the response layer. Waxell Observe handles D1/D2 behavioral monitoring. For external and MCP-native agents, Waxell Connect extends the same governance without requiring vendor-side changes. Initialization takes two lines and supports 200+ frameworks.

Primary source: Securing the future of AI agents — Google DeepMind, June 18, 2026.
