Agentic AI risks: the 8 failure modes that reach production
What goes wrong when LLMs hold production credentials — and the inline controls that stop each one before customers feel it.
An agent calls stripe.refund or postgres.delete with arguments no human approved. Often triggered by ambiguous user instructions or hallucinated context.
Inline policy gate that requires human approval for any call above a set threshold, such as a refund amount.
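As a rough illustration of a threshold gate, the sketch below checks a tool call before it runs. The `gate` function, the `amount_cents` argument, and the 5,000-cent limit are all hypothetical, chosen for the example rather than taken from any real product or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


# Hypothetical per-tool limits in cents; illustrative values only.
APPROVAL_THRESHOLD_CENTS = {"stripe.refund": 5_000}


def gate(tool: str, args: dict, approved_by: Optional[str] = None) -> Decision:
    """Decide whether a tool call may run, before it reaches the tool."""
    limit = APPROVAL_THRESHOLD_CENTS.get(tool)
    if limit is None:
        # Default deny: a tool with no policy never runs.
        return Decision(False, f"{tool} is not covered by any policy")
    amount = args.get("amount_cents")
    if not isinstance(amount, int) or amount < 0:
        return Decision(False, "amount_cents missing or invalid")
    if amount > limit and approved_by is None:
        return Decision(False, f"{amount} exceeds {limit}; human approval required")
    return Decision(True, "within policy")
```

Small refunds pass straight through, large ones wait for a named approver, and a tool with no policy at all is refused.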
An agent retries a failing tool 200 times in a minute, racking up API charges or hitting rate limits that take down the rest of your stack.
Loop circuit breaker — trip after N calls in a window.
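One way to trip after N calls in a window is a sliding-window counter. This is a minimal sketch: the `LoopBreaker` class and its parameters are assumptions made for illustration, and a production breaker would also need per-agent keys and shared state.

```python
import time
from collections import deque


class LoopBreaker:
    """Refuse calls once max_calls have occurred within window_seconds."""

    def __init__(self, max_calls: int, window_seconds: float, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the breaker can be tested without sleeping
        self.calls = deque()

    def allow(self) -> bool:
        now = self.clock()
        # Forget calls that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window_seconds:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            return False  # tripped: the caller should stop retrying
        self.calls.append(now)
        return True
```

With `LoopBreaker(max_calls=20, window_seconds=60)`, the 21st call inside a minute is refused instead of reaching the failing tool.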
User input or scraped page content tricks the LLM into calling a sensitive tool the user never asked for. The classic 'ignore previous instructions' attack at production scale.
Don't trust the LLM. Trust the policy. The gate enforces what tools can run regardless of prompt.
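To make "trust the policy" concrete, here is a small sketch of an allow-list that looks only at who is calling and which tool is requested. The agent and tool names are hypothetical.

```python
# Hypothetical agent-to-tool allow-list; names are illustrative only.
ALLOWED_TOOLS = {
    "support-agent": frozenset({"tickets.read", "tickets.reply"}),
}


def may_call(agent_id: str, tool: str) -> bool:
    """Check the agent's identity and the tool name, never the prompt text."""
    return tool in ALLOWED_TOOLS.get(agent_id, frozenset())
```

Because the check never reads model output, an injected instruction cannot widen the set of tools an agent may run.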
Agent reads PII from one tool and emails or posts it to another. Often unintentional — the agent thinks it's being helpful.
Scope tools by argument shape. Block messages to external domains when the body contains email-address or SSN patterns.
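A simplified sketch of an argument-shape check for an email tool follows. The regular expressions are deliberately rough, `INTERNAL_DOMAINS` is a hypothetical setting, and real PII detection needs far more than two patterns.

```python
import re

# Rough patterns for illustration; they will miss some PII and flag some non-PII.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

INTERNAL_DOMAINS = {"example.com"}  # hypothetical list of trusted domains


def allow_email(to_address: str, body: str) -> bool:
    """Allow internal mail; block external mail whose body matches a PII pattern."""
    domain = to_address.rsplit("@", 1)[-1].lower()
    if domain in INTERNAL_DOMAINS:
        return True
    return not (EMAIL_RE.search(body) or SSN_RE.search(body))
```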
A bug or injection sends the agent into a 50,000-token retry loop against GPT-4. The bill arrives Monday morning.
Spend circuit breaker per agent and per tool.
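A spend breaker can be sketched as two running totals checked before each call. The class below is an assumption-laden illustration: it tracks cost in integer cents, treats the per-tool budget as shared across agents, and leaves out budget resets.

```python
from collections import defaultdict


class SpendBreaker:
    """Refuse a call if it would push an agent or a tool over its budget."""

    def __init__(self, agent_limit_cents: int, tool_limit_cents: int):
        self.agent_limit_cents = agent_limit_cents
        self.tool_limit_cents = tool_limit_cents
        self.by_agent = defaultdict(int)
        self.by_tool = defaultdict(int)

    def charge(self, agent_id: str, tool: str, cost_cents: int) -> bool:
        over_agent = self.by_agent[agent_id] + cost_cents > self.agent_limit_cents
        over_tool = self.by_tool[tool] + cost_cents > self.tool_limit_cents
        if over_agent or over_tool:
            return False  # tripped: nothing is recorded, the call must not run
        self.by_agent[agent_id] += cost_cents
        self.by_tool[tool] += cost_cents
        return True
```

Estimating `cost_cents` before the call (for example from token counts) is left to the caller.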
A model upgrade or prompt tweak changes behavior subtly. Refund rates jump 3x and nobody notices for two weeks.
Action log + replay. Diff behavior across deploys.
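Diffing behavior across deploys can start as simply as comparing each tool's share of calls before and after. The sketch below assumes a hypothetical log format (a list of dicts with a `tool` key) and an arbitrary 2x threshold; it covers the diff step only, not replay.

```python
from collections import Counter


def tool_rates(log: list) -> dict:
    """Return each tool's share of all calls in an action log."""
    counts = Counter(entry["tool"] for entry in log)
    total = sum(counts.values())
    return {tool: n / total for tool, n in counts.items()} if total else {}


def drifted_tools(before: list, after: list, max_ratio: float = 2.0) -> list:
    """List tools that are new, or whose share grew by more than max_ratio."""
    old, new = tool_rates(before), tool_rates(after)
    return sorted(
        tool for tool, rate in new.items()
        if tool not in old or rate / old[tool] > max_ratio
    )
```

A refund tool going from 10% to 30% of calls after a prompt change would be flagged on the first comparison rather than two weeks later.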
Customer says 'your bot deleted my account.' You have no record of why the agent decided to call delete.
Tamper-evident log of every prompt, tool call, decision, and response. Replay the exact run.
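A common way to make a log tamper-evident is a hash chain, where each entry's hash covers the previous entry's hash. This is a minimal sketch with hypothetical function names. It detects edits to earlier entries only if the latest hash is also stored somewhere the writer cannot alter.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: list, event: dict) -> dict:
    """Append an event whose hash covers both the event and the previous hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"prev": prev_hash, "event": event, "hash": _digest(prev_hash, event)}
    log.append(entry)
    return entry


def verify(log: list) -> bool:
    """Recompute the chain; any edited, removed, or reordered entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```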
SOC 2 and HIPAA auditors ask: who approved this action, when, and based on what context? 'The LLM decided' is not an answer.
Human-in-the-loop approvals with audit trail tied to the approver's identity.
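An approval record that answers the auditor's three questions (who, when, based on what) might look like the sketch below. The field names are hypothetical, and the approver's identity is assumed to come from an authenticated session rather than a free-text field.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Approval:
    approver: str        # who approved
    approved_at: str     # when, as an ISO 8601 UTC timestamp
    action: str          # which tool call was approved
    context_sha256: str  # digest of the context the approver was shown


def record_approval(approver: str, action: str, context: dict) -> Approval:
    """Bind an approval to a named approver and to the exact context reviewed."""
    if not approver:
        raise ValueError("an approval must name its approver")
    digest = hashlib.sha256(
        json.dumps(context, sort_keys=True).encode("utf-8")
    ).hexdigest()
    return Approval(approver, datetime.now(timezone.utc).isoformat(), action, digest)
```

Storing the digest alongside the full context lets a reviewer later confirm that the approver saw exactly what the log claims.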
All eight, mitigated by one inline gate.
SafeRun sits between your agents and your tools. Currently onboarding our first design partners.
