
Building production agents

An agent that dazzles in a demo and falls over in production isn't an agent — it's a liability. The real work is memory, state, and orchestration: the architecture that makes autonomy dependable when no one's watching.

Demo-perfect, production-fragile

Almost every agent demo works. That’s the trap. A scripted path, a clean input, a forgiving audience — anything looks autonomous for ten minutes. Production is the opposite: messy inputs, half-failed tool calls, state that has to survive a restart, and nobody watching at 3am. The gap between the two isn’t the model. It’s the architecture around it.

That architecture is what I build.

What makes an agent dependable

The harness, not the prompt

I call this harness engineering: building the loop the model runs inside (plan, gather context, reason, act, observe, learn), a loop that goes around a thousand times a night across a fleet. The model owns one beat. The other five are architecture, and they’re where projects live or die.
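
To make the six beats concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a real harness: `RunState`, `run_harness`, the stub tools, and the choice of "reason" as the model's single beat are all hypothetical names and simplifications introduced for this example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RunState:
    goal: str
    plan: list[str] = field(default_factory=list)
    context: dict[str, str] = field(default_factory=dict)
    history: list[tuple[str, str]] = field(default_factory=list)  # (action, observation)
    lessons: list[str] = field(default_factory=list)


# Assumption: the model's one beat is "reason", i.e. choosing the next action.
Reason = Callable[[RunState], str]
Tool = Callable[[RunState], str]


def run_harness(goal: str, reason: Reason, tools: dict[str, Tool],
                max_steps: int = 10) -> RunState:
    state = RunState(goal=goal)
    state.plan = [f"achieve: {goal}"]              # plan (trivial stand-in planner)
    for step in range(max_steps):                  # hard step budget, never unbounded
        state.context["step"] = str(step)          # gather context (stand-in)
        action = reason(state)                     # reason: the only model call
        if action == "finish":
            break
        tool = tools.get(action)
        if tool is None:
            observation = f"error: unknown tool {action!r}"
        else:
            try:
                observation = tool(state)          # act
            except Exception as exc:               # a failed tool is data, not a crash
                observation = f"error: {exc}"
        state.history.append((action, observation))  # observe
        if observation.startswith("error"):           # learn (naive: avoid what failed)
            state.lessons.append(f"avoid {action}")
    return state


# Stub "model" and tools so the sketch runs without any external service.
def stub_reason(state: RunState) -> str:
    if "avoid flaky" not in state.lessons:
        return "flaky"
    if not any(action == "search" for action, _ in state.history):
        return "search"
    return "finish"


def flaky(state: RunState) -> str:
    raise TimeoutError("tool timed out")


def search(state: RunState) -> str:
    return "found 3 results"


final = run_harness("demo", stub_reason, {"flaky": flaky, "search": search})
print(final.history)
```

Only one line in `run_harness` calls the model. Everything else (the step budget, the error capture, the record of what happened) is ordinary code that can be tested without a model at all.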

Get the harness right and the model becomes a config value: you swap it in an afternoon, and everything you built around it keeps working. Get it wrong and you have a clever demo that falls over the first time a tool call times out.
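
A rough sketch of both halves of that claim, in Python. The model registry, `load_model`, `call_tool`, and the two stub backends are hypothetical names for illustration. The timeout uses the standard library's `concurrent.futures`, which can stop waiting on a slow call but cannot interrupt the thread that is running it.

```python
import concurrent.futures as cf
import time
from typing import Callable

ModelFn = Callable[[str], str]

# Hypothetical stand-ins for real model clients. Each hides its vendor behind
# the same call signature, so nothing else in the harness knows which one runs.
MODELS: dict[str, ModelFn] = {
    "model-a": lambda prompt: f"[a] {prompt}",
    "model-b": lambda prompt: f"[b] {prompt}",
}


def load_model(config: dict) -> ModelFn:
    """Swapping models means changing config["model"] and nothing else."""
    return MODELS[config["model"]]


# Shared pool: the caller never blocks on cleanup of a hung tool thread.
_POOL = cf.ThreadPoolExecutor(max_workers=4)


def call_tool(tool: Callable[[str], str], arg: str,
              timeout_s: float = 0.2, retries: int = 2) -> str:
    """Run a tool under a deadline; return an error string instead of hanging."""
    for _ in range(retries + 1):
        future = _POOL.submit(tool, arg)
        try:
            return future.result(timeout=timeout_s)
        except cf.TimeoutError:
            future.cancel()  # best effort; an already running call is not interrupted
        except Exception as exc:
            return f"error: {exc}"
    return f"error: timed out after {retries + 1} attempts"


# Demo tools (hypothetical): one recovers on retry, one never meets the deadline.
attempts = {"n": 0}


def slow_then_fast(arg: str) -> str:
    attempts["n"] += 1
    if attempts["n"] == 1:
        time.sleep(0.5)  # first call blows the deadline
    return f"ok: {arg}"


def always_slow(arg: str) -> str:
    time.sleep(0.5)
    return "too late"


recovered = call_tool(slow_then_fast, "lookup")
gave_up = call_tool(always_slow, "lookup", timeout_s=0.05, retries=1)
answer = load_model({"model": "model-b"})("summarise")
print(recovered, gave_up, answer, sep="\n")
```

The design choice worth noting is that a timeout comes back as a value the loop can reason about, not as an exception that ends the run.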

Security is a property of the build

Because agents act, the threat model is real: prompt injection, over-broad tokens, irreversible tool calls. I treat every input as hostile, give each agent its own scoped identity and short-lived credentials, classify every tool by blast radius, and put high-impact actions behind approval. None of this is bolted on afterwards; it is built into the architecture. A dependable agent is a secure agent, because both come from the same discipline: knowing exactly what the system can do, and proving it.
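
One way those controls can show up in code, sketched in Python under stated assumptions: the `BlastRadius` levels, `Credential`, `ToolSpec`, `invoke`, and the example tools and scopes are all hypothetical, and a real system would issue and verify credentials through an identity provider rather than a dataclass.

```python
import time
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Optional


class BlastRadius(IntEnum):
    READ_ONLY = 0      # cannot change anything
    REVERSIBLE = 1     # changes that can be undone
    IRREVERSIBLE = 2   # changes that cannot be undone


@dataclass(frozen=True)
class Credential:
    agent_id: str
    scopes: frozenset[str]
    expires_at: float  # epoch seconds; short-lived by construction

    def allows(self, scope: str, now: float) -> bool:
        return scope in self.scopes and now < self.expires_at


@dataclass(frozen=True)
class ToolSpec:
    name: str
    scope: str                  # the single scope this tool needs
    radius: BlastRadius
    run: Callable[[str], str]


class Denied(Exception):
    pass


Approver = Callable[[ToolSpec, str], bool]


def invoke(tool: ToolSpec, arg: str, cred: Credential,
           approve: Optional[Approver] = None,
           now: Optional[float] = None) -> str:
    """Every tool call passes the same gate: scope, expiry, then approval."""
    now = time.time() if now is None else now
    if not cred.allows(tool.scope, now):
        raise Denied(f"{cred.agent_id} has no valid credential for {tool.scope}")
    if tool.radius >= BlastRadius.IRREVERSIBLE:
        # Default is deny: no approver, or an approver that says no, blocks the call.
        if approve is None or not approve(tool, arg):
            raise Denied(f"{tool.name} requires approval")
    return tool.run(arg)


# Hypothetical tools and a credential that expires at t=1000.
read_ticket = ToolSpec("read_ticket", "tickets:read", BlastRadius.READ_ONLY,
                       lambda a: f"ticket {a}")
delete_repo = ToolSpec("delete_repo", "repos:admin", BlastRadius.IRREVERSIBLE,
                       lambda a: f"deleted {a}")
cred = Credential("triage-agent", frozenset({"tickets:read", "repos:admin"}),
                  expires_at=1000.0)

print(invoke(read_ticket, "42", cred, now=500.0))
```

Because the gate is a single function, "what can this agent do?" becomes a question answered by reading the tool table and the credential, which is the kind of proof the paragraph above asks for.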

What it unlocks

When the architecture is right, autonomy stops being scary. You can give an agent more scope because you can see what it’s doing, stop it when you need to, and trust that it will pick up cleanly when something fails. Dependable beats impressive every time the demo ends and the real work starts.