How to secure AI agents in production
A practical playbook for teams shipping agents to real customers. Five controls that turn a probabilistic LLM into a system you can put on call.
The problem
AI agents are non-deterministic processes with production credentials. The same prompt can refund $4,500 today and email 500 leads tomorrow. Traditional observability tells you what broke only after the fact: Stripe has already processed the refund, and your customer has already received the email.
Securing agents is not about better prompts. It's about putting a deterministic gate between the agent and the tools that can hurt customers.
The five controls
Write rules in YAML or code that scope each tool by agent, environment, and argument shape. Hot-reload them in production without redeploying the agent.
Pause any action over a threshold (refunds, mass emails, customer deletions) and route it to Slack, Linear, or PagerDuty for approval. Approvers can edit the arguments before resuming.
Detect repeated tool calls and runaway API spend. Trip the breaker before the bill arrives. The same control catches prompt-injection–driven retry storms.
Step through any agent run after the fact. Inspect tool args, policy decisions, latency, and cost per step. Required for incident review and SOC 2 evidence.
Every prompt, tool call, decision, and response — searchable, filterable, exportable. The audit trail your security team will ask for.
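To make the first two controls concrete, here is a minimal sketch of how a rule scoped by agent, environment, and argument value might be evaluated. The `Rule` shape, the `evaluate` function, and the `amountUsd` argument are assumptions made for illustration; they are not the actual policy format.

```typescript
// Hypothetical rule shape, for illustration only.
type Decision = "allow" | "pause" | "deny";
type Environment = "dev" | "staging" | "prod";

interface ToolCall {
  agent: string;
  environment: Environment;
  tool: string;
  args: Record<string, unknown>;
}

interface Rule {
  tool: string;
  agents: string[]; // agents allowed to call this tool
  environments: Environment[]; // environments where the rule applies
  // Optional numeric argument that requires human approval above a limit.
  approvalOver?: { arg: string; limit: number };
}

function evaluate(rules: Rule[], call: ToolCall): Decision {
  const rule = rules.find(
    (r) =>
      r.tool === call.tool &&
      r.agents.includes(call.agent) &&
      r.environments.includes(call.environment),
  );
  // Default-deny: a call that matches no rule never runs.
  if (!rule) return "deny";
  if (rule.approvalOver) {
    const value = call.args[rule.approvalOver.arg];
    // A missing or non-numeric amount is rejected rather than waved through.
    if (typeof value !== "number" || !Number.isFinite(value)) return "deny";
    if (value > rule.approvalOver.limit) return "pause";
  }
  return "allow";
}

const rules: Rule[] = [
  {
    tool: "refund",
    agents: ["support-bot"],
    environments: ["prod"],
    approvalOver: { arg: "amountUsd", limit: 500 },
  },
];
```

With these example rules, a $4,500 refund from `support-bot` in prod is paused for approval, a $50 refund is allowed, and the same call from any other agent is denied.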
Where the gate sits
Inline. Before the tool runs. p95 latency under 50 ms.
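As a rough sketch of what "inline, before the tool runs" means in code: the tool function is wrapped so that the policy decision is made first, and a denied call never reaches the tool. The names `gate`, `GateDecision`, and `PolicyDenied`, and the stand-in `refund` tool, are hypothetical and not part of any real SDK.

```typescript
type GateDecision = "allow" | "deny";

class PolicyDenied extends Error {}

// Wraps a tool so the decision happens before the tool body executes.
// The wrapper returns whatever the tool returns, including a Promise.
function gate<A, R>(
  tool: (args: A) => R,
  decide: (args: A) => GateDecision,
): (args: A) => R {
  return (args: A): R => {
    if (decide(args) !== "allow") {
      throw new PolicyDenied("blocked by policy before execution");
    }
    return tool(args);
  };
}

// Stand-in tool that records each refund it actually issues.
const issued: number[] = [];
const refund = (args: { amountUsd: number }): string => {
  issued.push(args.amountUsd);
  return `refunded ${args.amountUsd}`;
};

// Example policy for this sketch: refunds up to $500 pass, anything larger is blocked.
const gatedRefund = gate(refund, (a) => (a.amountUsd <= 500 ? "allow" : "deny"));
```

Because the check sits in the call path rather than in a log consumer, a blocked refund has no side effects to roll back.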
Three lines to get started
import { saferun } from "@saferun/sdk";
// `myAgent` and `slack` are assumed to be defined elsewhere in your app.
const agent = saferun.wrap(myAgent, {
  policies: ["./policies.yaml"],
  onPause: async (a) => slack.notify("#ops-approvals", a),
});
Common questions
Scope policies per environment (dev / staging / prod). Require approval for destructive tools in prod only. Capture replays of every run for post-mortem. Enforce loop and spend breakers before any tool executes.
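One way a loop and spend breaker could work is sketched below, assuming each tool call comes with a cost estimate. The `Breaker` class, its limits, and the `estimatedCostUsd` parameter are illustrative assumptions, not a description of the product's internals.

```typescript
class Breaker {
  private repeats = new Map<string, number>();
  private spentUsd = 0;
  private tripped = false;
  private maxRepeats: number;
  private maxSpendUsd: number;

  constructor(maxRepeats: number, maxSpendUsd: number) {
    this.maxRepeats = maxRepeats;
    this.maxSpendUsd = maxSpendUsd;
  }

  // Returns true if the call may proceed. Once tripped, the breaker
  // stays open for the rest of the run.
  check(tool: string, args: unknown, estimatedCostUsd: number): boolean {
    if (this.tripped) return false;
    // Identical tool + args counts as a repeat. JSON.stringify is
    // key-order sensitive, which is acceptable for a sketch.
    const key = `${tool}:${JSON.stringify(args)}`;
    const count = (this.repeats.get(key) ?? 0) + 1;
    const spend = this.spentUsd + estimatedCostUsd;
    if (count > this.maxRepeats || spend > this.maxSpendUsd) {
      this.tripped = true;
      return false;
    }
    this.repeats.set(key, count);
    this.spentUsd = spend;
    return true;
  }
}
```

Calling `check` before every tool execution means the call that would exceed a limit is the one that gets refused, instead of being noticed on the next invoice.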
Treat the LLM as untrusted input. Validate tool arguments against schemas. Scope each tool by agent and environment. Route high-impact actions to a human. Keep a tamper-evident audit log.
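A minimal sketch of treating model output as untrusted input: the arguments the LLM proposes are parsed from `unknown` into a typed value before any tool sees them. The `RefundArgs` shape and its field names are assumptions for illustration. A schema library could replace the hand-written checks.

```typescript
interface RefundArgs {
  chargeId: string;
  amountCents: number;
}

type Parsed<T> = { ok: true; value: T } | { ok: false; error: string };

function parseRefundArgs(raw: unknown): Parsed<RefundArgs> {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    return { ok: false, error: "args must be an object" };
  }
  const obj = raw as Record<string, unknown>;

  // Reject unknown fields so injected extras cannot ride along.
  const allowed = ["chargeId", "amountCents"];
  const extra = Object.keys(obj).filter((k) => !allowed.includes(k));
  if (extra.length > 0) {
    return { ok: false, error: `unexpected field: ${extra[0]}` };
  }

  const { chargeId, amountCents } = obj;
  if (typeof chargeId !== "string" || chargeId.length === 0) {
    return { ok: false, error: "chargeId must be a non-empty string" };
  }
  if (
    typeof amountCents !== "number" ||
    !Number.isInteger(amountCents) ||
    amountCents <= 0
  ) {
    return { ok: false, error: "amountCents must be a positive integer" };
  }
  return { ok: true, value: { chargeId, amountCents } };
}
```

Only the `value` of a successful parse should be passed on to policy evaluation and the tool itself. A failed parse is a denied call.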
Wrap the agent with an inline policy evaluator deployed at the edge. Isolate tool credentials behind the gate so the agent never holds raw API keys. Stream every action to a searchable log with replay.
Prompt injection is dangerous because the LLM can be tricked into calling a tool. The gate doesn't trust the LLM — it trusts the policy. If the injected instruction tells the agent to refund $50,000, the policy blocks the refund regardless of what the prompt says.
Ship agents to production without losing sleep.
Currently onboarding our first design partners. Free during early access.
