Failure mode library

When your production AI agent is broken, start here.

A growing field reference of production AI agent failure modes. Each entry covers the symptom in your words, what is actually happening, the three fixes that do not work, and the one that does. Published openly.

LangGraph recursion limit of 20 reached: the actual fix (not what the docs say)
May 19, 2026
Why your RAG pipeline returns wrong chunks (and the 5-minute diagnostic)
May 18, 2026
Claude Opus 4.7 quietly raised your token bill ~35% — the migration audit nobody is doing
May 18, 2026
OpenAI silently downgraded your agent's default model — the one-line audit every team should run today
May 17, 2026