Agentic AI risks: the 8 failure modes that reach production

What goes wrong when LLMs hold production credentials — and the inline controls that stop each one before customers feel it.

Rogue tool calls

An agent calls stripe.refund or postgres.delete with arguments no human approved. Often triggered by ambiguous user instructions or hallucinated context.

Mitigation

Inline policy gate that requires human approval when a call's arguments exceed a set threshold, such as a refund amount.
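A minimal sketch of such a gate, assuming the threshold is a refund amount in cents. The names `APPROVAL_THRESHOLDS`, `Decision`, and `gate` are hypothetical illustrations, not SafeRun's API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical policy: tool name -> largest amount (in cents) allowed without approval.
APPROVAL_THRESHOLDS = {"stripe.refund": 5_000}


@dataclass
class Decision:
    allowed: bool
    reason: str


def gate(tool: str, args: dict, approved_by: Optional[str] = None) -> Decision:
    """Decide whether a tool call may run, before it reaches the tool."""
    limit = APPROVAL_THRESHOLDS.get(tool)
    if limit is None:
        # This sketch only covers thresholded tools; a real gate would
        # likely pair it with an allowlist.
        return Decision(True, "no threshold configured")
    if args.get("amount", 0) <= limit:
        return Decision(True, "under threshold")
    if approved_by:
        return Decision(True, f"approved by {approved_by}")
    return Decision(False, "approval required")
```

Here a 10,000-cent refund is held until a named human approves it, while a 1,000-cent refund passes straight through.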

Runaway loops

An agent retries a failing tool 200 times in a minute, racking up API charges or hitting rate limits that take down the rest of your stack.

Mitigation

Loop circuit breaker: trip after N calls within a time window.
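One way to sketch the breaker is a sliding window over call timestamps. The class name `LoopBreaker` and its parameters are assumptions made for illustration.

```python
import time
from collections import deque
from typing import Optional


class LoopBreaker:
    """Refuse further calls once max_calls have occurred within window_seconds."""

    def __init__(self, max_calls: int, window_seconds: float):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._calls = deque()  # timestamps of calls that were allowed

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.window_seconds:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls:
            return False  # breaker is tripped; the caller should stop retrying
        self._calls.append(now)
        return True
```

In practice you would likely keep one breaker per agent and tool pair, so a loop on one tool does not block unrelated calls.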

Prompt injection

User input or scraped page content tricks the LLM into calling a sensitive tool the user never asked for. The classic 'ignore previous instructions' attack at production scale.

Mitigation

Don't trust the LLM. Trust the policy. The gate enforces which tools can run, regardless of what the prompt says.
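As a rough illustration, the check below takes only the agent's identity and the requested tool. It never sees the prompt, so injected text has nothing to manipulate. The agent ID and tool names are hypothetical.

```python
# Hypothetical per-agent allowlist, defined outside the model's reach.
ALLOWED_TOOLS = {
    "support-bot": frozenset({"orders.lookup", "tickets.reply"}),
}


def is_tool_allowed(agent_id: str, tool: str) -> bool:
    """Allow a call only if the tool is on the agent's allowlist.

    Unknown agents get an empty allowlist, so the default is deny.
    """
    return tool in ALLOWED_TOOLS.get(agent_id, frozenset())
```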

Data exfiltration

An agent reads PII from one tool and emails or posts it through another. Often unintentional: the agent thinks it's being helpful.

Mitigation

Scope tools by argument shape. Block messages to external domains when the body contains email-address or SSN patterns.
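A simplified sketch of that argument check for an email tool. The internal domain list and the regular expressions are assumptions, and real PII detection would need broader patterns than these two.

```python
import re

# Hypothetical set of domains considered internal.
INTERNAL_DOMAINS = {"example.com"}

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # US SSN written as 123-45-6789


def should_block_email(to: str, body: str) -> bool:
    """Block mail to external recipients when the body matches a PII pattern."""
    domain = to.rsplit("@", 1)[-1].lower()
    if domain in INTERNAL_DOMAINS:
        return False
    return bool(EMAIL_RE.search(body) or SSN_RE.search(body))
```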

Cost blowups

A bug or injection sends the agent into a 50,000-token retry loop against GPT-4. The bill arrives Monday morning.

Mitigation

Spend circuit breaker per agent and per tool.
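A minimal sketch, assuming each call's cost can be estimated in cents before it runs. `SpendBreaker` and its limits are hypothetical names, and a production version would also reset totals per billing period.

```python
from collections import defaultdict


class SpendBreaker:
    """Track spend in integer cents, with one cap per agent and one per (agent, tool)."""

    def __init__(self, agent_limit_cents: int, tool_limit_cents: int):
        self.agent_limit = agent_limit_cents
        self.tool_limit = tool_limit_cents
        self.by_agent = defaultdict(int)
        self.by_tool = defaultdict(int)

    def charge(self, agent: str, tool: str, cost_cents: int) -> bool:
        """Record the cost and return True, or return False if a cap would be exceeded."""
        if self.by_agent[agent] + cost_cents > self.agent_limit:
            return False
        if self.by_tool[(agent, tool)] + cost_cents > self.tool_limit:
            return False
        self.by_agent[agent] += cost_cents
        self.by_tool[(agent, tool)] += cost_cents
        return True
```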

Silent regressions

A model upgrade or prompt tweak changes behavior subtly. Refund rates jump 3x and nobody notices for two weeks.

Mitigation

Action log + replay. Diff behavior across deploys.
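One simple form of that diff compares how often each tool is called before and after a deploy. The log shape (a list of dicts with a `tool` key) and the 2x factor are assumptions for illustration.

```python
from collections import Counter


def tool_rates(log: list) -> dict:
    """Return the share of calls that went to each tool."""
    if not log:
        return {}
    counts = Counter(entry["tool"] for entry in log)
    return {tool: n / len(log) for tool, n in counts.items()}


def regressions(before: list, after: list, factor: float = 2.0) -> list:
    """List tools whose call share grew by more than `factor` across a deploy.

    A tool that never appeared before the deploy is always flagged.
    """
    old, new = tool_rates(before), tool_rates(after)
    return sorted(t for t, rate in new.items() if rate > factor * old.get(t, 0.0))
```

Run against the logs from either side of a model upgrade, a jump in the share of `stripe.refund` calls would surface at deploy time instead of two weeks later.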

Untraceable incidents

Customer says 'your bot deleted my account.' You have no record of why the agent decided to call delete.

Mitigation

Tamper-evident log of every prompt, tool call, decision, and response. Replay the exact run.
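A common way to make a log tamper-evident is a hash chain, where each entry's hash covers the previous entry's hash. The sketch below assumes JSON-serializable events. The function names are hypothetical, and this is one possible technique, not necessarily the one SafeRun uses.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: list, event: dict) -> None:
    """Append an event whose hash covers both the event and the previous hash."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev, "hash": _digest(prev, event)})


def verify(log: list) -> bool:
    """Recompute the chain; any edited, removed, or reordered entry breaks it."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True
```

A chain like this only detects tampering if the latest hash is also stored somewhere the log's writer cannot change.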

Compliance gaps

SOC 2 and HIPAA auditors ask: who approved this action, when, and based on what context? 'The LLM decided' is not an answer.

Mitigation

Human-in-the-loop approvals with audit trail tied to the approver's identity.
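An approval record that answers the auditor's three questions (who, when, and based on what context) might look roughly like this. The field names are assumptions, not a required schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ApprovalRecord:
    approver: str          # identity of the human who approved
    tool: str              # the action that was approved
    args: dict             # the exact arguments that were approved
    context: str           # what the approver saw when deciding
    approved_at: datetime  # timezone-aware UTC timestamp


def approve(approver: str, tool: str, args: dict, context: str) -> ApprovalRecord:
    """Create an approval record; anonymous approvals are rejected."""
    if not approver:
        raise ValueError("an approval must name its approver")
    # Copy args so later changes by the caller do not alter the record.
    return ApprovalRecord(approver, tool, dict(args), context,
                          datetime.now(timezone.utc))
```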

All eight, mitigated by one inline gate.

SafeRun sits between your agents and your tools. Currently onboarding our first design partners.
