Logan Kelly

600 Firewalls in 5 Weeks: What the FortiGate AI Attack Teaches Us About Human Oversight


An AI agent compromised 600+ firewalls across 55 countries in 5 weeks — without a human approving each command. Here's what enterprise teams building agents need to learn from it.


Between January 11 and February 18, 2026, an attacker with limited technical skills compromised more than 600 FortiGate firewall appliances across 55 countries, in five weeks, without personally approving a single attack command.

They didn't need to. They had built an AI agent to do it, and configured it to act without waiting for their approval.

At the center of the operation was ARXON, a custom-built tool that researchers characterized as a Model Context Protocol (MCP) server. ARXON fed reconnaissance data from compromised FortiGate devices into commercial large language models, including DeepSeek and Anthropic's Claude, to generate structured attack plans. A separate Docker-based Go tool called CHECKER2 ran parallel scans of thousands of VPN endpoints. Claude Code was then configured to execute the attack plans autonomously via a pre-authorization configuration file that eliminated interactive approval per command: it ran Impacket (secretsdump, psexec, wmiexec), Metasploit modules, and hashcat against victim networks, in some cases with hard-coded credentials for a major media company already embedded in the config. According to Amazon Threat Intelligence, the attacker was financially motivated and Russian-speaking. Writing on the AWS Security Blog, Amazon Chief Information Security Officer CJ Moses described the campaign as evidence of commercial AI enabling "unsophisticated" actors to execute operations that would previously have required far more people or time.

The scale of the attack was enabled by a specific architectural choice: no human approval requirement per execution step. That's the lesson most enterprise AI teams are missing — it's not about firewalls or FortiGate credentials. It's about what happens to any agentic system when you remove the human from the execution loop.

Human-in-the-loop (HITL) in AI agent systems refers to the architectural requirement that an agent pause and request human approval before executing high-consequence actions — rather than executing autonomously based solely on its own reasoning. HITL is not about slowing agents down for every action; it is about defining which actions are consequential enough to require human sign-off before execution. Without this boundary, an agent's blast radius is limited only by what it has access to. In the FortiGate attack, there was no HITL boundary on Claude Code's execution — which is why 600 firewalls fell in five weeks instead of five months.

What did ARXON actually do — and why does it matter for enterprise AI teams?

The attack architecture is worth understanding precisely because it isn't exotic. ARXON isn't a classified offensive tool. It's a pattern that any engineering team could accidentally replicate.

The setup: a threat actor built a multi-step agentic system. Step one was reconnaissance: CHECKER2, the parallel scanning tool, mapped exposed management interfaces and identified devices with weak, single-factor credentials. That reconnaissance data was fed into ARXON, which queried Claude and DeepSeek to produce structured attack plans: which credentials to try next, where to look for Domain Admin rights, how to spread laterally through corporate networks. Claude Code then executed those plans directly, without pausing for the attacker to review, via a pre-authorization configuration that eliminated per-command approval and included pre-loaded credentials for victim organizations. Post-exploitation, the attacker extracted full firewall configurations, including VPN and administrative credentials, then moved into corporate Active Directory environments and targeted backup infrastructure: the classic precursor playbook for ransomware operations.

This is the same architecture pattern as a legitimate enterprise AI agent: a planner component (ARXON + LLM) feeding instructions to an executor component (Claude Code) that acts on real systems. The difference is intent, not design.

When you build an enterprise agent that queries an LLM for the next action and then executes it against a database, an API, or a customer record — you've built the same architecture ARXON used. The question is what controls sit between the reasoning step and the execution step.
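That parallel can be made concrete. Below is a minimal Python sketch of the planner/executor shape; the LLM call is stubbed out and all names are hypothetical, not any specific product's API. The only structural difference between the ungoverned and governed versions is whether anything sits at the decision boundary:

```python
# Minimal planner/executor loop: the same shape ARXON used and the same
# shape most enterprise agents use. The planner is a stub standing in for
# an LLM call; tool and argument names are illustrative.

def plan_next_action(context: dict) -> dict:
    """Stand-in for an LLM call that proposes the next action."""
    return {"tool": "db_write", "args": {"table": "orders", "id": 42}}

def execute(action: dict) -> str:
    """Stand-in for a tool runner that acts on real systems."""
    return f"ran {action['tool']} with {action['args']}"

def run_agent(context: dict, approval_gate=None) -> str:
    action = plan_next_action(context)
    # The decision boundary: the action is proposed but has not run yet.
    # With approval_gate=None (the ARXON configuration), nothing sits
    # between the reasoning step and the execution step.
    if approval_gate is not None and not approval_gate(action):
        return "blocked: approval denied"
    return execute(action)
```

Called with `approval_gate=None`, every proposed action executes unconditionally; supplying a gate pauses execution at the boundary between proposal and action.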

In ARXON's case: none. That's why the scale was 600 devices in 5 weeks, not 60 in 5 months.

Why do AI agents need approval loops — not just audit logs?

This is the question teams get wrong most often. The typical response to an incident like the FortiGate attack is to add observability: better logging, clearer traces, dashboards that show what the agent did. That's necessary but insufficient.

ARXON had, functionally, perfect observability from the attacker's perspective. They could see everything the agent was doing — every lateral movement step, every credential attempt, every compromised host. That observability didn't stop anything. It just provided a record of what succeeded.

Observability answers the question: what did the agent do?

Human-in-the-loop governance answers the question: is the agent allowed to do this next action, now, with these parameters?

The architectural difference matters because of timing. Observability is post-execution. HITL policy enforcement is pre-execution — it intercepts before the action runs, not after. An audit trail for every action tells you what happened. An approval policy stops it from happening.

For enterprise teams, this shows up in a specific class of high-consequence agent actions:

  • Writing to production databases

  • Issuing API calls that create, update, or delete records

  • Sending emails or messages on behalf of users

  • Accessing or transmitting customer PII

  • Triggering financial transactions or workflow escalations

These aren't the actions agents take constantly — they're the ones where the blast radius of an error is large. The ARXON architecture demonstrates what the blast radius looks like when you remove the approval gate from that category of actions: 600 compromised hosts across 55 countries.
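One way to encode that category as policy is sketched below; the action types mirror the list above, and the dollar threshold is an assumed illustrative value, not a standard taxonomy:

```python
# Scoping approval to consequence rather than frequency: classify each
# proposed action and pause only on the high-consequence classes.
# Categories and the threshold are illustrative policy choices.

HIGH_CONSEQUENCE = {"db_write", "record_delete", "send_email",
                    "pii_access", "financial_txn"}
FINANCIAL_THRESHOLD = 1000.0  # dollars; assumption for illustration

def requires_approval(action: dict) -> bool:
    if action["type"] in HIGH_CONSEQUENCE:
        # Financial actions below the threshold still run autonomously.
        if action["type"] == "financial_txn":
            return action.get("amount", 0) >= FINANCIAL_THRESHOLD
        return True
    return False  # read-only / low-consequence actions run autonomously
```

Everything the classifier returns `False` for runs at full agent speed; only the small high-consequence set pauses for review.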

What does effective human-in-the-loop governance actually look like?

"Human in the loop" is often implemented as theater: a confirmation modal the user clicks through, or a flag that logs when something happened without actually requiring approval before it runs. Real HITL governance has three requirements that distinguish it from that theater.

It's pre-execution, not post-hoc. An approval policy fires before the action executes. Not after the LLM decides to take the action. Not after the tool call returns. Before. The agent's execution is paused at the decision boundary — the moment between "the LLM proposed this action" and "the action runs." This is the only point at which approval is meaningful.

It's scoped to consequence, not frequency. Requiring human approval for every agent action is operationally unusable. Effective HITL governance defines which action types require approval — based on the resource accessed, the data classification involved, the destructiveness of the operation, or the dollar threshold of the action. Everything below the threshold runs autonomously. Everything above it pauses for review. ARXON had no threshold. Claude Code executed everything it was instructed to execute, at the same level of autonomy, regardless of consequence.

It leaves a verifiable record. Every approval request, the decision made, the identity of the approver, and the timestamp belongs in the same execution trace as the agent's tool calls. Not in a separate log system. Not in a Slack thread. In the execution record, so that the decision to approve is as auditable as the action it authorized. Human oversight without documentation is oversight that can't be verified.
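As an illustration of that third requirement (field names here are assumptions, not a specification), an approval decision can live in the same ordered trace as the tool call it authorized:

```python
# Sketch of an execution trace in which the approval record sits in the
# same ordered sequence as the tool call it authorized. Field names and
# the approver identity are illustrative.

from datetime import datetime, timezone

trace = []  # one ordered record per agent session

def record_approval(action, approver: str, approved: bool) -> bool:
    trace.append({
        "kind": "approval",
        "action": action,
        "approver": approver,
        "approved": approved,
        "ts": datetime.now(timezone.utc).isoformat(),
    })
    return approved

def record_tool_call(action, result):
    trace.append({
        "kind": "tool_call",
        "action": action,
        "result": result,
        "ts": datetime.now(timezone.utc).isoformat(),
    })

action = {"tool": "db_write", "table": "orders"}
if record_approval(action, approver="alice@example.com", approved=True):
    record_tool_call(action, result="ok")
# The trace now shows who approved the write, immediately before the call
# that it authorized, in one auditable sequence.
```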

For the FortiGate attack: the ARXON system executing Impacket and Metasploit autonomously, without the threat actor approving each command, is precisely the failure mode that scoped approval policies prevent. If Claude Code had been configured with a policy requiring approval before executing any offensive tool call, the attacker would have needed to manually review and approve each step — at which point they'd have been better off just running the tools themselves. The AI's scale advantage evaporates when you reintroduce human decision points at consequence-appropriate thresholds.

What makes the ARXON attack a governance failure, not just a security failure?

The standard security framing of this incident focuses on the defender side: patch your FortiGate devices, enable MFA, don't expose management interfaces. That's correct as far as it goes.

But the attacker-side story is the governance lesson for enterprise AI builders. ARXON worked because the system's designer built an autonomous execution pipeline with no approval gates. That design decision — "Claude Code doesn't need to ask before it runs" — is what enabled the 5-week, 55-country scale.

Every enterprise AI team making the same design decision is building the same risk into their own systems. Your agent isn't attacking firewalls. But it may be:

  • Executing database writes that can't be undone

  • Sending customer-facing communications that can't be recalled

  • Triggering financial operations that require reversal processes

  • Accessing data that shouldn't have left its classified boundary

The governance plane concept exists precisely because agents can't govern themselves. An LLM that has decided to take an action doesn't have a built-in mechanism to evaluate whether that action should require human sign-off. That evaluation has to happen at the infrastructure layer — above the agent's reasoning, before execution.

ARXON didn't have an infrastructure layer above it. Claude Code just executed.

How Waxell handles this

Waxell's approval policies define which action types require human sign-off before execution — scoped by tool category, data classification, resource type, or any combination. The agent's execution pauses at the decision boundary: the LLM has proposed an action, but the action hasn't run yet. A designated approver receives the escalation, reviews the proposed action in context, and approves or rejects it. The decision — who approved, when, and in response to what proposed action — is embedded in the same execution trace as the agent's tool calls, creating a complete audit trail for every action, including the actions that required and received human review.

Policies are defined once at the governance layer and enforced across every agent session regardless of framework — LangChain, CrewAI, LlamaIndex, or custom Python. Updating the approval threshold for a category of actions doesn't require a deployment. The governance layer is independent of the agent code.
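As a generic illustration of the define-once idea (not Waxell's actual API), a policy can live in configuration and be applied by a wrapper around any framework's executor, so changing a threshold is a data edit rather than a deployment:

```python
import json

# The policy lives in data, not in agent code; editing it requires no
# redeploy of the agent. Keys and action types are hypothetical.
POLICY_JSON = '{"require_approval": ["db_write", "send_email"]}'

def load_policy(raw: str):
    """Parse a policy document into a predicate over proposed actions."""
    gated = set(json.loads(raw)["require_approval"])
    return lambda action: action["type"] in gated

def govern(execute, needs_approval, request_approval):
    """Wrap any framework's executor with the same policy check."""
    def wrapped(action):
        if needs_approval(action) and not request_approval(action):
            return {"status": "rejected", "action": action}
        return execute(action)
    return wrapped
```

The same `govern` wrapper can surround a LangChain tool runner, a CrewAI task, or a hand-rolled executor; only the policy document changes.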

If you're building agents that interact with real systems — databases, APIs, external services — the question isn't whether your architecture resembles ARXON's. It does. The question is whether you've built the governance layer above it. Waxell lets you define approval policies once and enforce them across every agent, without modifying agent code. Get early access to add the governance layer your agents are missing.

Frequently Asked Questions

What is human-in-the-loop for AI agents?
Human-in-the-loop (HITL) for AI agents means requiring human approval before the agent executes a defined category of high-consequence actions. It is not a requirement to review every action — that would make agents operationally useless. It is a policy that identifies which action types (database writes, data transmissions, financial operations, etc.) require sign-off before running, and pauses execution until that approval is received. The FortiGate attack demonstrated what happens at scale when this boundary is removed: an agent that doesn't need permission can compromise 600 systems in 5 weeks.

Did the FortiGate attack use AI agents?
According to Amazon Threat Intelligence's February 2026 disclosure, the attacker built a custom MCP-based framework called ARXON that queried commercial large language models (DeepSeek and Anthropic's Claude) to generate structured attack plans. Claude Code was then configured to execute those plans autonomously — running Impacket scripts, Metasploit modules, and hashcat — without requiring the threat actor to approve each command. This is a multi-step agentic architecture: a planner component feeding instructions to an executor component that acts on real systems without human review per action.

Why isn't observability enough to prevent AI agent incidents?
Observability records what your agents did. It answers a post-execution question. Human-in-the-loop governance answers a pre-execution question: is this action authorized before it runs? In the FortiGate case, the attacker had full visibility into what ARXON was doing — but that visibility didn't slow the attack. Only a policy that paused execution before high-consequence actions could have done that. Observability is necessary but insufficient; governance enforcement is what turns visibility into control.

What actions should require human approval in an AI agent system?
The answer depends on consequence, not category. The principle: any action whose failure mode is difficult or impossible to reverse, or whose blast radius is large, should require approval before execution. Typical candidates include: write operations to production databases, API calls that create or delete records, outbound communications sent on behalf of users or the organization, access to classified or sensitive data, financial operations above a defined threshold, and any tool call that grants elevated permissions. The threshold for "consequential enough" should be defined at the policy layer, not left to the agent's own judgment.

How does the ARXON attack relate to enterprise AI agent risks?
The attack architecture is structurally identical to legitimate enterprise agent patterns: a planning component that queries an LLM for the next action, feeding instructions to an execution component that acts on real systems. The risk isn't that ARXON is exotic — it's that it's recognizable. Enterprise teams building agents that can write to databases, call external APIs, or trigger workflows have built the same architecture. The question is whether they've introduced governance controls at the execution boundary. ARXON had none. Most enterprise agents have incomplete governance at this boundary.

What is the difference between human-in-the-loop and human-on-the-loop?
Human-in-the-loop means the agent's execution is paused until a human approves a specific action before it runs. Human-on-the-loop means a human monitors what the agent is doing and can intervene if they notice a problem — but the agent doesn't wait. The FortiGate attack illustrates why "on the loop" provides minimal protection at scale: if an agent can take 600 actions before a monitor intervenes, the damage is already done. HITL requires the agent to pause at the decision boundary. HOTL only provides a chance to intervene if the monitor is watching at exactly the right moment.
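The distinction reduces to a few lines. In this hypothetical sketch, HITL blocks on the approval, while HOTL only informs a monitor and proceeds:

```python
# HITL vs. HOTL in miniature. Function and parameter names are
# illustrative, not any product's API.

def hitl_execute(action, execute, approve):
    if not approve(action):   # agent waits; nothing runs unapproved
        return "blocked"
    return execute(action)

def hotl_execute(action, execute, notify):
    notify(action)            # monitor is informed...
    return execute(action)    # ...but the action runs regardless
```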


Waxell

Waxell provides observability and governance for AI agents in production. Bring your own framework.

© 2026 Waxell. All rights reserved.

Patent Pending.
