I research evaluation methodologies and build infrastructure that makes AI systems reliable enough for production. I am currently focused on agent orchestration and self-improving pipelines.
Vanta / ThreatKey (acquired) / Snap
Metrics and methodologies for measuring agent reliability
Self-improving pipelines with programmatic optimization
Guardrails and circuit breakers for production AI
Working on agent reliability?
I consult on evaluation infrastructure and agent architecture.