Logan Kelly
Feb 20, 2026
Most AI teams have shipped agents but haven't built governance. There's a maturity gap between observing what agents do and actually controlling it. Here's what it looks like.

Here's a question worth sitting with: what would you do right now if your agent started behaving badly?
Not catastrophically — not the science fiction version where it goes rogue. The mundane version. It starts hallucinating on a specific class of queries. It's calling a downstream service more aggressively than you expected. It's occasionally including information in its responses that it probably shouldn't have access to. The behavior is subtle enough that it wouldn't trigger any alert you currently have configured.
How do you find it? How fast? What do you do when you do?
The agentic governance gap is the space between having operational visibility into AI agents (knowing what they did) and having actual control over them (defining and enforcing what they're allowed to do). Most teams with production agents have reached Stage 3 — observable — but not Stage 4 — governed. The difference is an enforcement layer: real-time policies that prevent bad behavior before it propagates, not dashboards that surface it after the fact. Based on Waxell's assessment of teams moving from prototype to production, fewer than 20% have implemented systematic governance controls by the time their agents are live. (See also: What is agentic governance →)
For most teams that have shipped agents in the last year, the honest answer involves some combination of: someone notices something off, engineers dig through logs manually, the cause is eventually identified, a patch is deployed. The timeline is hours to days. The damage — to users, to data, to cost budgets, to reputation — is already done.
This gap is wider than most teams realize, because it's easy to hide behind genuine engineering work that feels like it should be sufficient.
Why Observability Isn't the Same as Governance
Here's the dynamic that keeps the gap invisible for so long: teams that have invested in observability feel like they have governance. They have traces. They have session logs. They have dashboards. They can answer questions about what happened after it happened. This feels like control.
It isn't.
Governance isn't retrospective visibility. It's the capacity to define what acceptable behavior looks like, enforce it in real time, and intervene when it's violated — before the violation propagates into a user-visible problem or an audit-triggering incident.
The analogy I reach for is financial controls. A bank that only reviews transactions after they're complete has auditing. A bank that also runs real-time fraud scoring, enforces transaction limits, and can block suspicious transactions in flight has controls. The audit capability is table stakes. The controls are the differentiator.
Your observability stack is the audit capability. You're probably still missing the controls.
What Does AI Agent Governance Maturity Look Like?
It helps to have a map. Here's how agent deployments actually mature — which is to say, here's the spectrum most teams move through, not always in order and not always intentionally:
Stage 1: Prototype. One environment. Direct API calls. No logging, no monitoring. You're iterating fast. Governance isn't the point; proving the concept is.
Stage 2: Production-deployed, unmonitored. The agent is live. Real users. No meaningful observability. You find out about problems from user complaints. Most teams move through this stage faster than they'd like to admit.
Stage 3: Observable. Logging in place. Session traces. Some alerting on errors and latency. You can diagnose problems after they happen. This feels like a significant improvement — and it is — but it's still not governance.
Stage 4: Governed. Policies defined. Enforcement at the runtime layer. Real-time visibility into policy violations. Budget guardrails. PII controls. Audit trail that's usable by non-engineers. You can answer questions about agent behavior on a timeline of minutes, not hours.
Most teams with production agents are at Stage 3. They believe they're at Stage 4 because they've invested in observability tooling. The distinction between 3 and 4 is the enforcement layer — not more dashboards, but real controls.
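To make the Stage 3 / Stage 4 distinction concrete, here's a minimal sketch of the difference between logging an action after it runs and evaluating a policy before it runs. All names here are hypothetical illustrations, not any particular product's API:

```python
def log_session(event: dict) -> None:
    """Stage 3: record what happened. The action has already executed."""
    print(f"[trace] {event}")

class SpendCeiling:
    """Stage 4: a policy evaluated *before* the action executes."""
    def __init__(self, ceiling_usd: float):
        self.ceiling_usd = ceiling_usd
        self.spent_usd = 0.0

    def allow(self, estimated_cost_usd: float) -> bool:
        # Enforcement happens here, ahead of the call, not in a dashboard later.
        return self.spent_usd + estimated_cost_usd <= self.ceiling_usd

    def record(self, actual_cost_usd: float) -> None:
        self.spent_usd += actual_cost_usd

def governed_call(policy: SpendCeiling, estimated_cost: float, call):
    """Wrap an agent action: block it if the policy says no, audit it either way."""
    if not policy.allow(estimated_cost):
        raise PermissionError("policy violation: spend ceiling would be exceeded")
    result = call()
    policy.record(estimated_cost)
    log_session({"cost_usd": estimated_cost, "result": "ok"})  # the audit trail still matters
    return result
```

The point of the sketch is the ordering: the policy check runs before the action, so a violation is a blocked call, not a line in next month's incident report.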
What Flying Blind Actually Looks Like
It's not that you have no information. It's that the information you have isn't sufficient for the decisions you need to make, and the information you'd need is either not collected or not actionable in time.
A few patterns that show up repeatedly in teams that don't know they're at Stage 3:
You find cost anomalies in the monthly billing cycle. Spend spiked three weeks ago. You're only finding it now because the bill arrived. The sessions that caused the spike are cold. Whatever caused them is either fixed or still happening.
You can't answer regulatory questions quickly. A user requests deletion of their data under GDPR. You need to locate every place their PII appears in your agent's logs and processing history. You know it's in there. You don't have a tool that lets you find it systematically. A task that should take an hour takes a team three days.
You learn about behavioral regressions from users. A code change three weeks ago altered a system prompt. It changed the agent's behavior in a subtle but consistent way. Users started noticing last week. You're figuring it out this week. There's no mechanism to detect behavioral drift; you're relying on user feedback as your canary.
You don't know what you'd do if something were actively wrong. The bad session is happening right now. What's the intervention? If the answer is "stop the service and redeploy," that's not governance — that's a blunt instrument. Governance gives you targeted interventions: terminate a specific session, apply a policy update without a redeploy, block a specific tool call pattern while everything else continues.
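A targeted intervention of the kind just described can be sketched in a few lines: a runtime blocklist that stops one tool-call pattern while every other call continues, updatable without a redeploy. The names are illustrative assumptions, not a real agent framework:

```python
import fnmatch

class ToolCallGuard:
    """A runtime blocklist of tool-call patterns (hypothetical names).
    Patterns can be added while the service is live — no redeploy."""
    def __init__(self):
        self.blocked_patterns: list[str] = []

    def block(self, pattern: str) -> None:
        self.blocked_patterns.append(pattern)

    def check(self, tool_name: str) -> bool:
        """Return True if the call is allowed to proceed."""
        return not any(fnmatch.fnmatch(tool_name, p) for p in self.blocked_patterns)

guard = ToolCallGuard()
guard.block("payments.*")                 # targeted: only payment tools are stopped
assert guard.check("search.web")          # everything else continues
assert not guard.check("payments.refund") # the suspect pattern is blocked in flight
```

The design choice worth noting is the scope: the guard stops one pattern of behavior, not the whole service, which is what separates an intervention from an outage.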
What the Gap Costs
The gap has a cost structure that's easy to underestimate because many of its costs are probabilistic and hypothetical until they're not.
Regulatory exposure. The EU AI Act, GDPR, HIPAA, and emerging state-level AI regulations all have something to say about AI systems that process personal data, make consequential decisions, or operate in high-risk domains. Organizations that can demonstrate systematic governance — defined policies, documented enforcement, auditable records — are in a defensible position. Organizations that can't are exposed.
Customer trust incidents. When an agent behaves badly in a visible way — surfaces data it shouldn't, gives harmful advice, produces output that's offensive or factually wrong in a damaging way — the customer relationship takes a hit that's out of proportion to the technical severity of the failure. The absence of governance is the story that gets told: "they didn't have controls in place."
Engineering drag. Teams without governance infrastructure spend disproportionate time on ad hoc incident response. Every anomaly is a manual investigation. Every compliance question is a one-off project. Every cost spike is a fire drill. This is engineering time that doesn't compound — it's spent, and then the next incident arrives.
The compounding cost of retrofitting. Governance that's designed in from the start costs a fraction of governance that's bolted on after the fact to a system that wasn't designed for it. Every month you delay is another month of technical debt accumulating against the governance retrofit.
How Fast Is Regulatory Pressure Building?
For teams in regulated industries (financial services, healthcare, legal), the timeline for governance becoming non-optional is already short. For everyone else, it's short-to-medium. The pace of AI regulation is accelerating in the EU, the UK, and in US states — and the direction of travel is toward higher requirements for documented controls on AI systems, not lower.
The good news is that governance infrastructure built for your own operational needs maps reasonably well to what regulators are asking for. Defined policies, enforcement logs, audit trails, incident response procedures — these aren't compliance theater, they're legitimate operational assets that also happen to satisfy what your auditor will eventually ask for.
Building governance because you need it operationally, and getting compliance coverage as a side effect, is a much better path than building it reactively under deadline pressure because regulators are asking.
The governance gap is closable. It requires a clear-eyed assessment of where you actually are on the maturity spectrum (most teams find they're a stage behind where they thought), and an intentional move toward enforcement infrastructure rather than more monitoring.
The teams that do this now do it on their own terms. Everyone else does it eventually, under conditions they didn't get to choose.
How Waxell handles this: Waxell is the enforcement layer that closes the gap between Stage 3 (observable) and Stage 4 (governed). You define policies — spend ceilings, PII rules, tool constraints — and Waxell enforces them in real time across every agent session, with a full audit trail documenting every governance decision. The operational questions that previously required investigation become answerable on demand. See how teams use it →
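As a purely hypothetical illustration of what a policy set covering those three areas might contain — the field names below are assumptions for the sake of the example, not Waxell's actual configuration schema:

```python
# Illustrative only: a policy set expressed as plain data. Every field
# name here is an assumption, not a real product's configuration format.
policies = {
    "spend_ceiling": {"per_session_usd": 2.00, "per_day_usd": 500.00},
    "pii": {"redact": ["email", "ssn"], "block_outbound": True},
    "tools": {"allow": ["search.*", "crm.read_*"], "deny": ["crm.delete_*"]},
}
```

Whatever the concrete format, the useful property is that the rules live in one declared place the enforcement layer can evaluate per session, rather than being scattered through prompts and application code.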
Frequently Asked Questions
What is the AI agent governance gap? The governance gap is the difference between observing what your AI agents do and actually controlling what they're allowed to do. Teams that have invested in observability — logs, traces, dashboards — often believe they have governance. They don't. Governance requires enforcement: real-time policies that prevent bad behavior before it occurs, not monitoring that surfaces it afterward.
What is the difference between AI agent observability and governance? Observability is retrospective visibility — you can see what happened after it happened. Governance is prospective control — you define what's allowed to happen and enforce those rules in real time. The analogy: a bank that reviews transactions after they complete has auditing. A bank that also enforces transaction limits and runs real-time fraud scoring has controls. You probably have the first. You likely don't have the second.
What does AI agent governance maturity look like? Governance maturity moves through four stages: prototype (no monitoring), production-deployed but unmonitored (live but blind), observable (logging and traces, problems diagnosed after the fact), and governed (policies defined, enforcement in real time, operational questions answerable on demand). Most teams with production agents are at Stage 3 believing they're at Stage 4. The diagnostic question: can you answer behavioral, cost, and data questions about your agents in minutes without engineering investigation?
How do you know if your AI team has a governance gap? Four signals: you find cost anomalies in monthly billing rather than in real time; you can't answer GDPR data subject requests without a multi-day engineering investigation; you learn about behavioral regressions from users rather than monitoring; and you don't know what targeted intervention you'd take if an agent were actively misbehaving right now — your only option is a full service restart.
What does it cost to close the governance gap later versus now? Governance designed in from the start costs a fraction of governance retrofitted onto a system that wasn't designed for it. The compounding cost: every month without governance is another month of technical debt, plus the probabilistic cost of incidents that happen in the gap — regulatory exposure, customer trust incidents, engineering time spent on manual incident response, and the cost of the incident itself.