i am jonathan haas.

i run EvalOps, a lab focused on making AI systems less brittle. right now that means stress-testing language models, measuring how they drift, and wiring guardrails back into production.

before this i shipped code at Snap, Carta, and DoorDash, then built ThreatKey. most days you can find me pairing with teams that want fewer surprises when they ship.

recent writing

I Tested 5 Embedding Models on 10K Developer Questions

An empirical comparison of OpenAI, Cohere, BGE, E5, and Instructor embeddings on real developer documentation queries, with cost, latency, and accuracy analysis.

#ai #research #embeddings

The Complete Guide to Developer Experience

A comprehensive synthesis of 21 posts on DX: patterns, principles, and practices for building exceptional developer tools and experiences.

#developer-experience #engineering #product

The 10-Minute AI POC That Becomes a 10-Month Nightmare

It started with a Jupyter notebook. 'Look, I built a chatbot in 10 minutes!' Nine months later, three engineers had quit and the company almost folded.

#ai #technical-debt #poc

see more

projects i'm proud of

EvalOps lab

applied research shop pressure-testing evaluation guardrails with real teams.

security engineering series

field notes on hardening production systems before they fall apart.

cognitive dissonance detection

multi-agent probes that flag conflicting model behavior before users see it.

dspy 0-to-1 guide

hands-on playbook for shipping self-improving LLM apps without guesswork.