Guides
The processes we run, published openly.
In-depth, vendor-neutral guides to how we audit and harden production AI agents. The same processes we run on every engagement — written down so you can learn them, or hire us to run them on your agent.
AI Agent Evaluation: The Complete Process We Run on Every Production Agent
- 01 Building Evaluation Datasets for AI Agents
- 02 AI Agent Evaluators: LLM-as-Judge, Heuristics, Human Review, and Pairwise
- 03 Offline Evaluation for AI Agents: Experiments, Regression Tests, Backtesting
- 04 Online Evaluation and Production Monitoring for AI Agents
- 05 AI Agent Evaluation Criteria and Metrics That Actually Matter