Logan Kelly
Feb 27, 2026
PII enters agent context in ways most teams haven't mapped. Here are the three vectors, how to detect them, and how to enforce data policies without killing performance.

Your agent probably saw a social security number last Tuesday. Do you know what it did with it?
If the answer is "I'd have to check the logs," that's the problem. If the answer is "we don't log inputs at that granularity," that's a bigger problem. And if you're not certain PII entered your agent context in the first place, that's the problem you need to solve first.
PII protection for AI agents means systematically preventing personally identifiable information from entering, propagating through, and leaking out of an agent's processing pipeline. Unlike traditional application security, PII in agent systems arrives through three distinct vectors — direct user input, tool call responses, and session context accumulation — each requiring a different detection and handling strategy. Effective protection requires pre-execution interception, not retroactive log scanning. (See also: What is agentic governance →)
Here's what's happening at most companies shipping customer-facing agents right now: the engineering team built something that works. The agent handles customer queries, retrieves relevant data, calls tools, writes coherent responses. In the process, it regularly encounters — and often processes, logs, and sometimes echoes back — personally identifiable information. And there's no systematic control over what happens to it. This is fixable — but only if you understand how PII actually enters agent context in the first place.
How Does PII Actually Enter AI Agent Context?
PII doesn't always arrive in the way you'd expect. The obvious case — a user typing "my SSN is 123-45-6789" — is actually the one teams tend to handle, because it's visible in the input and easy to reason about. The subtle cases are where things go wrong.
Vector 1: Direct user input. Yes, people type PII into chat interfaces. Not always intentionally. A customer troubleshooting a billing issue might paste in a full invoice including name, address, and account number. A user following up on a medical inquiry might include dates of birth, prescription numbers, or insurance IDs. The input field is not a sanitized environment.
Vector 2: Tool call responses. This one catches teams off guard. Your agent calls a CRM lookup tool, and the tool returns a full customer record — including fields the agent didn't need and the user shouldn't have access to. Or the agent calls a document retrieval tool, and the retrieved chunk happens to include PII from an adjacent record that got indexed alongside the relevant content. Your agent didn't ask for PII. It got it anyway, because the tools aren't selective.
Vector 3: Context accumulation over a session. This is the quietest failure mode. In a multi-turn conversation, context from early messages persists. If a user mentioned their date of birth in turn 2, that information is still in the context window at turn 17 when the agent is answering a completely different question. If you're using session-level logging (and you should be), that PII is now in your logs. If the agent is doing anything with the accumulated context — summarizing, synthesizing, passing it to tools — it's moving through your system.
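To make the accumulation concrete, here's a minimal sketch. The message-list shape is illustrative, but the point holds for any chat-style context: nothing ever removes the early-turn PII.

```python
# Minimal illustration of context accumulation: PII mentioned in an
# early turn is still present many turns later, and will appear in any
# session-level log of this message list.
messages = [
    {"role": "user", "content": "My DOB is 1990-04-12, update my record."},
    {"role": "assistant", "content": "Done."},
]

# ...many turns later, on a completely unrelated question...
messages.append({"role": "user", "content": "What are your support hours?"})

# The full context window still carries the PII from the first turn:
assert any("1990-04-12" in m["content"] for m in messages)
```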
What You're Actually Protecting Against
Before diving into detection, it's worth being precise about the risks. There are four places PII can end up that you don't want it to be.
LLM training data. Depending on your API provider and your agreement terms, inputs to the model may or may not be used for training. Most enterprise agreements opt out of this, but verify rather than assume.
Your own logs. This is the most common failure. You're logging LLM inputs for debugging purposes. Those logs contain PII. You now have a data store you probably aren't treating with appropriate access controls, retention policies, or deletion mechanisms.
Tool call payloads. If your agent passes PII to third-party tools — analytics systems, CRMs, ticketing systems — that data is now in those systems. With all of their data retention policies, which you may not have visibility into.
Model responses. The agent echoes PII back in a response that gets cached, indexed, or sent somewhere it shouldn't go. Customer support systems that store chat transcripts are the typical culprit here.
Each of these destinations is a distinct exposure with a different remediation approach.
How Do You Detect PII Before It Enters Agent Context?
There's a natural instinct to add PII detection after the fact — scan logs for patterns, run a batch job, flag violations retroactively. Don't do this. Retroactive detection tells you what already went wrong. You want to prevent it from going wrong.
Pre-execution detection intercepts data before it reaches the LLM. This is where you catch PII in user input before it enters the context window, and where you inspect tool call responses before they're appended to the conversation.
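Here's a sketch of what interception looks like in practice. The function names (`scan_for_pii`, `guarded_append`) are hypothetical, and the detector is a stand-in for a real detection stack; the point is that the scan happens before anything reaches the context window.

```python
import re

def scan_for_pii(text: str) -> list[str]:
    """Placeholder detector: real systems combine regex, NER, and
    classification. This stub matches only the SSN pattern."""
    return re.findall(r"\b\d{3}-\d{2}-\d{4}\b", text)

def guarded_append(context: list[dict], role: str, content: str) -> None:
    """Inspect content BEFORE it enters the context window.
    Applies to user input and tool responses alike."""
    for finding in scan_for_pii(content):
        content = content.replace(finding, "[REDACTED]")
    context.append({"role": role, "content": content})

context: list[dict] = []
guarded_append(context, "tool", "Account holder SSN: 123-45-6789")
# The context only ever holds the redacted form, never the raw value.
```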
The technical approaches here exist on a spectrum between speed and accuracy:
Regex-based detection is fast and deterministic. Pattern matching for SSNs, credit card numbers, phone numbers, email addresses, dates of birth. It works reliably for structured PII. It doesn't catch unstructured PII ("my sister Sarah who works at Mayo Clinic") and generates false positives on things like product IDs that happen to match patterns.
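A minimal version of the regex tier looks like this. These patterns are illustrative, not exhaustive; production pattern libraries cover many more formats and locale variants.

```python
import re

# Common structured-PII patterns. Illustrative only: real libraries
# handle far more formats (IBANs, passport numbers, locale variants).
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_phone": re.compile(r"\b\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b"),
}

def regex_scan(text: str) -> dict[str, list[str]]:
    """Return every structured-PII match, keyed by pattern name."""
    hits = {name: pat.findall(text) for name, pat in PII_PATTERNS.items()}
    return {name: found for name, found in hits.items() if found}

print(regex_scan("Reach me at jane@example.com, SSN 123-45-6789."))
```

Note the trade-off the patterns make visible: the credit card pattern will happily match a sixteen-digit product ID, which is exactly the false-positive class mentioned above.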
Named entity recognition (NER) is more sophisticated: models trained to classify entities like PERSON, LOCATION, ORGANIZATION, and DATE. They handle unstructured text better than regex but add latency. A small NER model running inline adds 20–50ms per check on typical hardware, which is usually acceptable.
LLM-based classification — asking a small, fast model to classify whether a given input contains PII — is the most accurate approach and the most expensive. You're running an LLM call to check whether to run an LLM call. This makes sense for high-risk operations (before writing to a database, before sending to a third-party tool) but not for every turn in a conversation.
For most production deployments, a combination works best: fast regex for common structured patterns, NER for unstructured text, LLM classification as a final check before high-consequence actions.
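The tiered combination can be sketched as a single entry point. The `ner_scan` and `llm_classify` functions here are hypothetical stubs standing in for real model calls; the structure is what matters.

```python
import re

STRUCTURED = re.compile(r"\b(\d{3}-\d{2}-\d{4}|[\w.+-]+@[\w-]+\.[\w.-]+)\b")

def ner_scan(text: str) -> bool:
    # Hypothetical stub: in production this calls a small NER model
    # (PERSON/LOCATION/DATE tagger) on the input.
    return False

def llm_classify(text: str) -> bool:
    # Hypothetical stub: an LLM-based classifier, reserved for
    # high-consequence operations because of its cost and latency.
    return False

def contains_pii(text: str, high_consequence: bool = False) -> bool:
    if STRUCTURED.search(text):             # tier 1: fast regex, always runs
        return True
    if len(text) > 40 and ner_scan(text):   # tier 2: NER above a threshold
        return True
    if high_consequence and llm_classify(text):  # tier 3: final LLM check
        return True
    return False
```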
What to Do When You Find PII
Detection without a response strategy is theater. When you detect PII in the pipeline, you have a few options, and the right choice depends on context.
Redact and proceed. Replace the PII with a placeholder ([REDACTED] or a synthetic substitute) and let the conversation continue. This works when the PII isn't necessary for the agent to complete its task. It breaks when the agent needs the actual value (e.g., looking up an account by email address).
Mask with a reversible token. Replace the PII with a token that maps back to the real value in a secure store. The agent operates on the token; the real value never enters the LLM context. You look up the real value when you actually need it — to make a database call, to send a communication. This is more complex but preserves functionality.
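A reversible-token vault can be sketched in a few lines. The in-memory dict here stands in for a real secure store with encryption, access controls, and retention policies; the `PIIVault` name is an assumption for illustration.

```python
import secrets

class PIIVault:
    """Maps opaque tokens back to real values. In-memory dict for
    illustration only; production needs an encrypted store with
    access controls and retention policies."""
    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def mask(self, value: str) -> str:
        token = f"<pii:{secrets.token_hex(8)}>"
        self._store[token] = value
        return token

    def unmask(self, token: str) -> str:
        return self._store[token]

vault = PIIVault()
token = vault.mask("jane@example.com")

# The LLM context only ever sees the token...
prompt = f"Look up the account for {token}"

# ...and the real value is resolved only at the tool boundary,
# e.g. immediately before the actual database call.
real = vault.unmask(token)
```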
Halt and request rephrasing. Block the agent's response and ask the user to rephrase without personal information. Appropriate for some contexts (a general-purpose assistant), disruptive in others (a support agent where the customer expects it to already know their details).
Allow with audit flag. In some cases — particularly where the agent legitimately needs PII to complete its task — you allow the processing to continue but flag the event for audit. This is a governance decision, not a detection failure. You're making an intentional choice to process PII and creating a record of that choice.
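The four options can be wired into a single dispatch point, which keeps the policy decision explicit and makes sure every path produces an audit record. This is a sketch under assumed names (`Action`, `handle_detection`), and the mask placeholder here is illustrative, not a real vault token.

```python
from enum import Enum

class Action(Enum):
    REDACT = "redact"
    MASK = "mask"
    HALT = "halt"
    ALLOW_WITH_AUDIT = "allow_with_audit"

def handle_detection(text: str, finding: str, action: Action, audit_log: list) -> str:
    """Apply one of the four response strategies to a detected PII span.
    Every path appends an audit record first."""
    audit_log.append({"finding": finding, "action": action.value})
    if action is Action.REDACT:
        return text.replace(finding, "[REDACTED]")
    if action is Action.MASK:
        # Illustrative placeholder; a real system uses a reversible
        # token that maps back to the value in a secure store.
        return text.replace(finding, "<pii:masked>")
    if action is Action.HALT:
        raise ValueError("PII detected; please rephrase without personal information")
    return text  # ALLOW_WITH_AUDIT: proceed, record already written
```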
Does Pre-Execution PII Detection Add Latency?
The most common objection to pre-execution PII detection is latency. Adding a detection step before every LLM call slows things down.
This is real but manageable. A few approaches that work in practice:
Async detection for low-risk paths. Run PII detection in parallel with early conversation processing for turns where PII is unlikely. Only block if detection returns a positive result.
Tiered detection. Fast regex runs on everything. NER runs only on inputs that pass a length or complexity threshold. LLM classification runs only before high-consequence operations.
Caching. For tool responses, you can cache the PII scan result for a given response hash. If the same tool call returns the same data (common in CRM lookups), you've already scanned it.
Small dedicated models. Purpose-built PII detection models are significantly smaller and faster than general-purpose LLMs. Running a 200M parameter NER model is not the same computational cost as running a GPT-4-class model.
The teams that treat PII detection as a hard engineering problem — not a compliance checkbox — find ways to make it work at acceptable latency. The teams that treat it as overhead tend to skip it.
What Does a PII Compliance Audit Trail Need to Capture?
If your organization is subject to GDPR, HIPAA, CCPA, or any of the emerging AI-specific regulations, you're going to be asked to demonstrate what your agents do with personal data. "We have a PII detection step" is not sufficient. "We have a PII detection step, and here is a record of every detection event, what data was flagged, what action was taken, and where the processed data went" — that's what compliance looks like.
Your audit trail needs to capture: the detection event, the policy applied, the outcome, and the disposition of the data. Not just "this happened" but "this happened, we handled it this way, here's the record."
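A minimal record shape that captures those four elements might look like this. The field names and policy identifier are assumptions for illustration; the structure (one append-only record per detection event) is the point.

```python
import datetime
import json
from dataclasses import asdict, dataclass

@dataclass
class PIIAuditRecord:
    """One record per detection event: what was flagged, which policy
    applied, what action was taken, and where the data ended up."""
    timestamp: str
    detection: str   # what was flagged and where, e.g. "ssn in tool_response"
    policy: str      # which policy applied (identifier is illustrative)
    action: str      # redact | mask | halt | allow_with_audit
    disposition: str # where the processed data went

record = PIIAuditRecord(
    timestamp=datetime.datetime.now(datetime.timezone.utc).isoformat(),
    detection="ssn in tool_response",
    policy="content-policy-v3",
    action="redact",
    disposition="redacted before context append; raw value never logged",
)

# One JSON line per event, written to an append-only store.
line = json.dumps(asdict(record))
```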
This is harder to retrofit than to build in from the start.
The good news on all of this: it's solved engineering, not research. The detection models exist. The pattern libraries exist. The masking and tokenization approaches are well understood. What's missing, for most teams, is a governance layer that makes them practical to deploy without building a custom integration for every agent.
That's what runtime governance is for.
How Waxell handles this: Waxell's Content policy type includes pre-execution PII detection on both user inputs and tool call responses — regex, NER, and configurable classification — with redact, mask, halt, or allow-with-audit-flag response actions. Every detection event is logged with the data flagged, the policy applied, and the outcome, giving you the audit trail that compliance actually requires. Deployed as a layer over your existing agents, no rewrites needed. Learn more →
Frequently Asked Questions
What are the three ways PII enters AI agent context? PII enters agent context through three vectors: direct user input (customers typing personal information into chat interfaces), tool call responses (CRM lookups or document retrievals that return more data than needed), and session context accumulation (PII mentioned early in a conversation persisting in the context window across later turns). Most teams focus on the first vector and miss the other two.
How do you keep PII out of AI agents without slowing them down? The performance-preserving approach uses tiered detection: fast regex for structured PII patterns (SSNs, credit cards, phone numbers) runs on all inputs; NER models handle unstructured text above a complexity threshold; LLM-based classification runs only before high-consequence operations like database writes or third-party API calls. A small dedicated NER model adds roughly 20–50ms per check — usually acceptable in production.
What should you do when an AI agent detects PII? You have four options: redact and proceed (replace PII with a placeholder and continue — works when the agent doesn't need the actual value), mask with a reversible token (the agent operates on a token that maps back to the real value in a secure store), halt and request rephrasing (block the response and ask the user to rephrase), or allow with audit flag (permit processing for cases where the agent legitimately needs the PII, but log it explicitly). The right choice depends on whether the agent requires the actual value to complete its task.
What does a PII compliance audit trail for AI agents need to capture? A compliant PII audit trail needs to capture: the detection event (what was flagged and where), the policy applied (redact, mask, halt, or allow), the outcome (what happened to the flagged data), and the disposition of the data (where processed PII ended up — log, API call, model response). "We have a detection step" is not sufficient for GDPR or HIPAA. The record of how each detection was handled is what auditors actually need.
Does PII protection apply to tool call results, not just user inputs? Yes — and this is the vector most teams miss. When an agent calls a CRM lookup tool, the response may contain full customer records including fields the agent didn't request. When a document retrieval tool returns a chunk, it may include PII from adjacent records. Pre-execution inspection should run on tool call responses before they're appended to the agent's context, not just on user-supplied inputs.