How Verra Works
An 11-step pipeline on every interaction.
Verra is security middleware that sits between your agents and everything they touch, including users, tools, databases, other agents, and models. Every interaction passes through the same pipeline regardless of protocol: MCP, A2A, or anything else.
The Pipeline
Every interaction runs these steps in order. Steps 2, 6, and 9 run partially in parallel, and logging is fire-and-forget so the response is never held.
Auth
Agent API key (x-verra-key) is looked up against registered agents. Unknown keys fail immediately with 401.
Policy load
Org-wide policy and per-agent policy override are fetched in parallel from Supabase. The agent-level override is layered on top.
Tool filtering
Any tools in the request not present in the agent's allowed_tools list are stripped from the payload before detection runs.
Header parse
W3C traceparent is read or generated. user_id and parent_agent_id are extracted from headers. The trace propagates through the entire A2A chain and exports as standard OTel spans.
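The traceparent format itself is the standard W3C Trace Context header: `version-traceid-spanid-flags`. A minimal sketch of reading or generating one (the helper names here are illustrative, not Verra's API):

```python
import os

def make_traceparent() -> str:
    """Generate a W3C traceparent: version-traceid-spanid-flags."""
    trace_id = os.urandom(16).hex()  # 32 hex chars
    span_id = os.urandom(8).hex()    # 16 hex chars
    return f"00-{trace_id}-{span_id}-01"

def parse_traceparent(header: str) -> dict:
    """Split an incoming traceparent into its four fields."""
    version, trace_id, span_id, flags = header.split("-")
    return {"version": version, "trace_id": trace_id,
            "span_id": span_id, "flags": flags}

tp = make_traceparent()
fields = parse_traceparent(tp)
```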
A2A authorization
If parent_agent_id is present, Verra checks the caller→target trust matrix. Forbidden agent-type pairs and sensitive data patterns are blocked here.
Detection
Four detectors run in parallel: prompt injection (pattern + embedding), jailbreak (pattern + embedding + LLM judge), data exfiltration, policy violation. Aggregated into a single verdict: pass, flag, or block.
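The fan-out/aggregate shape can be sketched as follows. The detector bodies here are stand-in stubs (the real ones are described in the Detection section below); the point is that the checks run concurrently and the result is the most severe verdict:

```python
import asyncio

SEVERITY = {"pass": 0, "flag": 1, "block": 2}

# Stub detectors for illustration only; real ones are pattern/embedding/LLM checks.
async def prompt_injection(text): return "pass"
async def jailbreak(text): return "pass"
async def data_exfiltration(text): return "flag"
async def policy_violation(text): return "pass"

async def detect(text: str) -> str:
    # All four run concurrently; total latency is the slowest single check.
    verdicts = await asyncio.gather(
        prompt_injection(text), jailbreak(text),
        data_exfiltration(text), policy_violation(text))
    # Aggregate to the most severe verdict across detectors.
    return max(verdicts, key=SEVERITY.__getitem__)

verdict = asyncio.run(detect("example input"))
```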
Approval gate
If risk is high and policy requires justification, a pending approval record is created and a 202 Accepted is returned with approval_id. The request does not proceed until approved.
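On the client side this means a caller must treat 202 as "parked", not "done". A hypothetical handler (the status code and `approval_id` field come from the doc; the return convention is made up for illustration):

```python
def handle_proxy_response(status: int, body: dict) -> str:
    """Distinguish an approved pass-through from a parked request."""
    if status == 202:
        # Request is pending human approval; poll or wait before retrying.
        approval_id = body["approval_id"]
        return f"pending:{approval_id}"
    return "proceed"
```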
Model routing
Low risk routes to model_target. Medium or high risk routes to private_model_target (self-hosted). If private target is unconfigured, the request is blocked.
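The routing rule is small enough to state as code. This sketch reflects the semantics above; the function itself is illustrative, not Verra's internal API:

```python
from typing import Optional

def route(risk: str, model_target: str,
          private_model_target: Optional[str]) -> str:
    """Pick the upstream model based on the detection risk level."""
    if risk == "low":
        return model_target
    # Medium/high risk must stay on the self-hosted private target.
    if private_model_target is None:
        raise PermissionError("blocked: no private_model_target configured")
    return private_model_target
```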
Log receipt
A receipt is written to Supabase asynchronously. No raw text is stored, only hash, length, metadata, risk level, and findings.
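A minimal sketch of the hash-not-text receipt shape (field names approximate the ones listed; SHA-256 is an assumption, the doc only says "hash"):

```python
import hashlib

def make_receipt(raw_text: str, risk: str, findings: list) -> dict:
    """Store only a digest plus metadata; the raw prompt never persists."""
    return {
        "hash": hashlib.sha256(raw_text.encode()).hexdigest(),
        "length": len(raw_text),
        "risk_level": risk,
        "findings": findings,
    }

receipt = make_receipt("what is our Q3 budget?", "low", [])
```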
Auto-classify
After 10+ receipts, Verra classifies the agent type from behavioral patterns. One of: hr, finance, legal, engineering, support, marketing, security, data, general.
Forward + scan response
Request is proxied to the LLM provider. The response is scanned for secrets or data leakage before being returned to the agent.
Detection
Four detectors. Zero serial latency.
All four detectors run in parallel on every request. Latency is the max of four concurrent checks, not the sum. Combined verdict: pass, mask, flag, or block.
Prompt Injection
Three layers, escalating cost. Pattern match first (sync, ~0ms). Fine-tuned on-device classifier second: protectai/deberta-v3-base-prompt-injection-v2 runs locally via ONNX with no API call. LLM judge third, only for ambiguous scores. Catches delimiter attacks, "ignore previous instructions" variants, context escape attempts, and soft persona-hijack attacks.
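The cheap first layer looks roughly like this. These patterns are simplified examples of the attack classes named above, not Verra's actual rule set; anything the patterns miss escalates to the on-device classifier:

```python
import re

# Illustrative patterns only; a hit short-circuits, a miss escalates.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.I),
    re.compile(r"you\s+are\s+now\s+(?:in\s+)?developer\s+mode", re.I),
    re.compile(r"</?system>", re.I),  # delimiter / context-escape attempts
]

def pattern_score(text: str) -> float:
    """Return 1.0 on a hard pattern hit, else 0.0 (defer to classifier)."""
    return 1.0 if any(p.search(text) for p in INJECTION_PATTERNS) else 0.0
```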
Jailbreak Detection
Three layers in sequence, escalating cost. Pattern match first (fast). Embedding similarity against a 31-prompt reference corpus second, using jackhhao/jailbreak-classifier and OpenAI embeddings (medium). LLM judge fallback third, only when the first two are inconclusive. Catches roleplay exploits, DAN-style prompts, system-note injection, and hypothetical framing attacks.
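The middle layer reduces to a nearest-neighbor check over embeddings. A toy sketch with hand-rolled cosine similarity (the real system embeds via OpenAI against the 31-prompt corpus; the 0.85 threshold here is an assumed placeholder):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def jailbreak_similarity(query_vec, corpus_vecs, threshold=0.85):
    """Flag when the query embeds close to any known jailbreak prompt."""
    best = max(cosine(query_vec, ref) for ref in corpus_vecs)
    return best, best >= threshold
```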
Data Exfiltration
Detects attempts to extract system prompts or model training data. Distinct from DLP; this covers intentional extraction attempts rather than accidental leakage.
Policy Violation
Customer-defined rules evaluated per request. Supports keyword filters, topic blocks, language restrictions, and custom LLM-judge rules. Defined in org policy and overridable per agent.
Risk signals
Annotated on every receipt, regardless of verdict.
PII
· Email addresses
· Phone numbers
· Social security numbers
· Dates of birth
Secrets
· sk-* patterns (OpenAI)
· AKIA* (AWS access keys)
· ghp_* (GitHub tokens)
· Bearer tokens
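A sketch of how these signal classes map to patterns. The regexes are simplified stand-ins for the listed signals, not Verra's production rules, and only the class names (never the raw matches) would be annotated on the receipt:

```python
import re

# Simplified illustrations of the listed risk signals.
SIGNALS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "openai_key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}\b"),
    "aws_key": re.compile(r"\bAKIA[A-Z0-9]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "bearer": re.compile(r"\bBearer\s+[A-Za-z0-9._~+/-]+", re.I),
}

def risk_signals(text: str) -> list:
    """Return which signal classes appear; raw matches are never stored."""
    return [name for name, pat in SIGNALS.items() if pat.search(text)]
```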
Tool Access Control
Four layers before a tool runs.
POST /api/gate/tool-input is called when an agent invokes a tool. All four layers must pass.
RBAC
Is the tool in the agent's allowed_tools list? Hard gate. If not, blocked immediately.
Permission matrix
16 tool categories × 9 agent types. Each pairing is expected, allowed, suspicious, or forbidden. An HR agent is suspicious on database_query and forbidden on code_execution.
Behavioral baseline
Compared against the agent's 200-receipt rolling profile. Tracks peak hours, tool frequency, data types. Anomaly score above 0.8 blocks; between 0.5 and 0.8 warns. Fails open: baseline errors don't block production.
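The threshold and fail-open logic can be sketched directly. The scoring function is abstracted away here (the real one compares against the rolling profile); only the decision boundaries and error handling mirror the description:

```python
def baseline_check(anomaly_score_fn, request) -> str:
    """Score against the rolling profile; errors fail open by design."""
    try:
        score = anomaly_score_fn(request)
    except Exception:
        # Fail open: a broken baseline must never block production traffic.
        return "allow"
    if score > 0.8:
        return "block"
    if score >= 0.5:
        return "warn"
    return "allow"
```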
Content scan
DLP check on the tool input payload itself. Any policy violations here block the call. Fails closed.
Sample permission matrix (subset of 9×16)
| agent type | … | database_query | code_execution | admin |
|---|---|---|---|---|
| hr | ✓ allowed | ⚠ suspicious | ✗ forbidden | ✗ forbidden |
| finance | ✓ allowed | ✓ allowed | ⚠ suspicious | ✗ forbidden |
| engineering | ✓ allowed | ✓ allowed | ✓ allowed | ⚠ suspicious |
| security | ✓ allowed | ✓ allowed | ✓ allowed | ✓ allowed |
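As a data structure, the matrix is just a nested lookup. This sketch encodes only the subset shown above (the full 9×16 matrix lives in Verra); the fail-closed default for unlisted pairs is an assumption:

```python
# Subset of the 9x16 matrix; tool categories match the table columns above.
MATRIX = {
    "hr":          {"database_query": "suspicious", "code_execution": "forbidden",  "admin": "forbidden"},
    "finance":     {"database_query": "allowed",    "code_execution": "suspicious", "admin": "forbidden"},
    "engineering": {"database_query": "allowed",    "code_execution": "allowed",    "admin": "suspicious"},
    "security":    {"database_query": "allowed",    "code_execution": "allowed",    "admin": "allowed"},
}

def check_pair(agent_type: str, tool_category: str) -> str:
    """Look up the verdict for a caller/tool pairing (assumed fail-closed)."""
    return MATRIX.get(agent_type, {}).get(tool_category, "forbidden")
```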
A2A Authorization
Agents don't automatically trust each other.
When Agent A delegates work to Agent B, Verra validates the trust relationship before Agent B can make any LLM calls.
Implicit path
Agent B includes x-verra-parent-agent: agent_a_id in its proxy header. Verra detects the delegation and checks the call matrix automatically.
Explicit path
Agent A first calls POST /api/a2a with target_agent_id and task. Verra validates both agents are in the same org, checks the delegation policy, writes an agent_handoff receipt, and returns a W3C traceparent. Agent B carries this forward to continue the same distributed trace.
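A sketch of the explicit path from Agent A's side. The endpoint, request fields, and `x-verra-key` header come from this doc; the base URL is a placeholder and the response schema (a `traceparent` field) is assumed:

```python
import json
import urllib.request

def build_delegation(verra_base: str, api_key: str,
                     target_agent_id: str, task: str) -> urllib.request.Request:
    """Build the explicit-path A2A handoff request."""
    body = json.dumps({"target_agent_id": target_agent_id,
                       "task": task}).encode()
    return urllib.request.Request(
        f"{verra_base}/api/a2a",
        data=body,
        headers={"x-verra-key": api_key,
                 "Content-Type": "application/json"},
        method="POST",
    )

# Usage (not executed here; assumes the response carries a traceparent field):
# resp = urllib.request.urlopen(build_delegation(base, key, "agent_b", "summarize Q3"))
# traceparent = json.load(resp)["traceparent"]  # Agent B carries this forward
```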
Visibility
Full observability. No raw text stored.
Every receipt stores hash + length + metadata only. You get full audit capability without PII ever persisting. Verra is OpenTelemetry-native, and every trace exports to any OTLP backend.
Receipts
Every proxied call with risk level, findings, agent, trace ID, span ID, and detection reasons.
Approvals
Pending human reviews with approve/reject and full audit trail.
Shadow AI
Unregistered AI usage surfaced in the dashboard with agent, timestamp, and request metadata.
Agents
All registered agents: model targets, environments, tool permissions, call stats.
Lineage
Agent relationship graph. A2A edges, ego graph per agent, trace lookup.
Policy
Define org-wide rules: block/warn thresholds, PII handling, custom LLM-judge rules.
OTel export
Every trace is OpenTelemetry-native. Set OTEL_EXPORTER_OTLP_ENDPOINT to ship spans and metrics to Grafana Tempo, Jaeger, Honeycomb, Datadog, or any OTLP backend.
Analytics
Calls over time, risk distribution, agent performance trends.