Failure mode library
When your production AI agent is broken, start here.
A growing field reference for production AI agent failure modes. Each entry gives the symptom in your words, what is actually happening, the three fixes that do not work, and the one that does. Published openly.
LangGraph recursion limit of 20 reached: the actual fix (not what the docs say)
Why your RAG pipeline returns wrong chunks (and the 5-minute diagnostic)
Claude Opus 4.7 quietly raised your token bill ~35% — the migration audit nobody is doing
OpenAI silently downgraded your agent's default model — the one-line audit every team should run today