#Agentic AI · #AI Engineering · #Memory · #Governance · #Evaluation · #MCP

The 5 Pillars of Agentic AI: From Prompting Models to Engineering Systems

The prompt era is ending. What replaces it isn't a cleverer prompt — it's engineering. Five foundations separate a demo that impresses from a system you can run in production: memory, state, orchestration, governance, and evaluation.

For two years the job was prompting — find the magic words, paste them into a model, marvel at the reply. That era is closing. The interesting work now isn’t a cleverer prompt; it’s engineering the system around the model so it can actually do a job, unattended, more than once.

I keep coming back to five foundations. Every agent system that survives contact with real work is built on them; every one that collapses is missing one or two. Here they are on one slide, and then one at a time.

Slide: The 5 Pillars of Agentic AI (Memory, State, Orchestration, Governance, Evaluation), the foundations every agent system must be built on as the field moves from prompting models to engineering systems.

The throughline is the subtitle: from prompting models to engineering systems. A prompt produces a reply. A system remembers, knows where it is, hands work off, stays on a leash, and proves it worked. Those are the five pillars.

1. Memory — remembering across sessions, not just within them

A model has a context window. That’s not memory — that’s short-term attention that evaporates the moment the session ends. An agent with no memory re-meets you every morning, re-learns every lesson, re-makes every mistake. It’s a goldfish with an API key.

Real memory has tiers, and the distinction matters.

The failure mode of skipping this is expensive and quiet: agents that look capable in a demo and amnesiac in production. I went deep on giving a fleet memory in tiers — and the punchline is that where memory lives (in-context vs. a vault both machines can reach) is a design decision, not an afterthought. Get it wrong and your agents are articulate strangers forever.
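To make the in-context vs. durable split concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: `DurableMemory`, `Session`, `remember`, and `recall` are hypothetical names, and the "vault" is just a JSON file in a temp directory standing in for whatever shared store you actually use.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


class DurableMemory:
    """Hypothetical long-term tier: a JSON file that outlives any one session."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def _load(self) -> dict[str, str]:
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def remember(self, key: str, lesson: str) -> None:
        data = self._load()
        data[key] = lesson
        self.path.write_text(json.dumps(data, indent=2), encoding="utf-8")

    def recall(self, key: str) -> str | None:
        return self._load().get(key)


class Session:
    """Short-term tier: a context list that is gone once the session object is."""

    def __init__(self, memory: DurableMemory) -> None:
        self.memory = memory
        self.context: list[str] = []

    def observe(self, text: str) -> None:
        self.context.append(text)


def demo() -> tuple[list[str], str | None]:
    vault = DurableMemory(Path(tempfile.mkdtemp()) / "vault.json")

    first = Session(vault)
    first.observe("user prefers squash merges")      # lives only in this session
    first.memory.remember("merge_style", "squash")   # written to the durable tier
    del first                                        # the session ends

    second = Session(vault)                          # a fresh session, empty context
    return second.context, second.memory.recall("merge_style")
```

Calling `demo()` shows the point of the split: the second session starts with an empty context, yet it can still recall the lesson the first session wrote to the vault. Where that vault lives is the design decision; the code around it is the easy part.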

2. State — knowing where a workflow is, not just what to say next

Memory is about what an agent knows. State is about where the work is. They’re different, and conflating them is why so many agent “workflows” are really just one long, fragile turn that dies if anything interrupts it.

A system with real state can answer: what step are we on? what’s done, what’s blocked, what’s waiting on a human? And critically, it can pause, resume, hand off, recover, and audit — because the state lives outside any single agent’s head, in something durable.

This is exactly why I run my agents on an issue-backed board rather than a chat loop. Every unit of work is a tracked item with an explicit disposition — done, in-review, blocked, in-progress — and recurring work materialises into auditable tasks instead of vanishing into a transcript. When state is a first-class object, “the laptop closed mid-run” is a resumable event, not a lost afternoon. When it isn’t, every interruption is a restart.
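A rough sketch of state as a first-class object, in Python. The four dispositions come from the paragraph above; the rest (`Board`, `WorkItem`, the `step` checkpoint, the JSON file) is hypothetical and stands in for a real issue tracker. The `stop_after` parameter exists only to simulate an interruption.

```python
from __future__ import annotations

import json
import tempfile
from dataclasses import asdict, dataclass
from enum import Enum
from pathlib import Path


class Disposition(Enum):
    IN_PROGRESS = "in-progress"
    IN_REVIEW = "in-review"
    BLOCKED = "blocked"
    DONE = "done"


@dataclass
class WorkItem:
    id: int
    title: str
    disposition: Disposition
    step: int = 0  # index of the next step to run (the checkpoint)


class Board:
    """Hypothetical durable board: state lives on disk, not in the agent's head."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def _read(self) -> dict:
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def save(self, item: WorkItem) -> None:
        items = self._read()
        record = asdict(item)
        record["disposition"] = item.disposition.value  # store the plain string
        items[str(item.id)] = record
        self.path.write_text(json.dumps(items, indent=2), encoding="utf-8")

    def load(self, item_id: int) -> WorkItem:
        raw = self._read()[str(item_id)]
        return WorkItem(raw["id"], raw["title"], Disposition(raw["disposition"]), raw["step"])


def run(board: Board, item_id: int, steps: list[str], stop_after: int | None = None) -> list[str]:
    """Run the remaining steps, checkpointing after each one."""
    item = board.load(item_id)
    ran: list[str] = []
    for name in steps[item.step:]:
        if stop_after is not None and len(ran) == stop_after:
            return ran  # simulated interruption; progress is already on disk
        ran.append(name)  # a real system would do the work here
        item.step += 1
        board.save(item)
    item.disposition = Disposition.IN_REVIEW
    board.save(item)
    return ran


def demo() -> tuple[list[str], list[str], Disposition]:
    board = Board(Path(tempfile.mkdtemp()) / "board.json")
    steps = ["fetch", "transform", "publish"]
    board.save(WorkItem(id=1, title="nightly report", disposition=Disposition.IN_PROGRESS))
    first = run(board, 1, steps, stop_after=1)  # the laptop closes after one step
    second = run(board, 1, steps)               # a later run resumes at the checkpoint
    return first, second, board.load(1).disposition
```

The second call to `run` never repeats `fetch`, because the checkpoint was written before the interruption. That is the whole difference between a resumable event and a restart.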

3. Orchestration — agents that hand off without falling apart

One agent is a tool. Several agents that can route work to each other, delegate sub-tasks, and run multi-agent loops without losing context at every boundary — that’s a system. Orchestration is the pillar that turns a pile of capable agents into a workforce.

The hard part isn’t spawning agents; it’s the handoff. Context lost at the seam is where multi-agent systems quietly degrade — the second agent doesn’t know what the first decided, redoes it, or contradicts it. Good orchestration means routing with intent, delegating to child tasks instead of polling, and keeping the why flowing across every boundary.
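One way to picture a handoff that keeps the why intact is an explicit envelope that crosses every boundary. The Python sketch below is an assumption-laden illustration, not a real framework: `Handoff`, `researcher`, `writer`, and `orchestrate` are hypothetical names, and the `asyncio.sleep(0)` calls stand in for model or tool calls.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Handoff:
    """Hypothetical envelope passed across every agent boundary."""

    goal: str
    why: str                          # the intent behind the goal
    decisions: tuple[str, ...] = ()   # what earlier agents already settled

    def with_decision(self, decision: str) -> Handoff:
        # Return a new envelope so earlier handoffs stay unchanged and auditable.
        return Handoff(self.goal, self.why, self.decisions + (decision,))


async def researcher(handoff: Handoff) -> Handoff:
    await asyncio.sleep(0)  # stand-in for a model or tool call
    return handoff.with_decision("summarize only merged changes")


async def writer(handoff: Handoff) -> str:
    await asyncio.sleep(0)  # stand-in for a model or tool call
    settled = "; ".join(handoff.decisions)
    return f"{handoff.goal} | why: {handoff.why} | settled: {settled}"


async def orchestrate(goal: str, why: str) -> str:
    start = Handoff(goal=goal, why=why)
    # Delegate to a child task and await it: the parent is resumed when the
    # child finishes, instead of polling for a result in a loop.
    child = asyncio.create_task(researcher(start))
    researched = await child
    return await writer(researched)
```

Running `asyncio.run(orchestrate("draft release notes", "customers asked what changed"))` returns a string in which the writer still sees both the original intent and the decision the researcher made. The second agent has nothing to redo or contradict, because nothing was dropped at the seam.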

It also means a real decision about how much machinery sits between an agent and its model — a bare local runtime, a kernel that wraps it, or a gateway in front of the whole thing. I laid out that orchestration trade-off in detail: each layer buys coordination and costs fragility, and the right answer is the lowest layer that does what you actually need.

4. Governance — autonomy with a leash you can verify

This is the pillar everyone wants to skip and nobody can afford to. Autonomy is the whole point of an agent — and unbounded autonomy is how you wake up to a force-pushed main, a leaked secret, or a very expensive bill. Governance is autonomy with a leash you can verify — not a leash you hope is holding.

In practice it’s a stack of small, boring controls that compound.

I’ve argued the uncomfortable version of this before: your agents aren’t under-capable, they’re under-governed, and the fix is governing them at scale — even to the point of making the pipeline mandatory so discipline isn’t optional. A leash you can’t verify is just a story you tell yourself about a system you don’t actually control.
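As a hedged illustration of a verifiable leash, the Python sketch below gates commands against the three failure modes named earlier in this section: a force-pushed main, a leaked secret, and a runaway bill. The checks are my own illustrative picks and deliberately incomplete; `Gate`, `check`, and `is_force_push_to_main` are hypothetical names, and the secret pattern only covers two common key shapes.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Illustrative only: AWS-style access key IDs and PEM private key headers.
SECRET_PATTERN = re.compile(r"AKIA[0-9A-Z]{16}|-----BEGIN [A-Z ]*PRIVATE KEY-----")


def is_force_push_to_main(command: str) -> bool:
    # Naive whitespace split; does not catch every form (e.g. "+main" refspecs).
    args = command.split()
    forced = any(a == "-f" or a.startswith("--force") for a in args)
    return args[:2] == ["git", "push"] and forced and "main" in args


@dataclass
class Gate:
    """Hypothetical policy gate that every agent command must pass through."""

    budget_usd: float
    spent_usd: float = 0.0
    audit_log: list[tuple[str, str, str]] = field(default_factory=list)

    def check(self, command: str, est_cost_usd: float = 0.0) -> bool:
        if is_force_push_to_main(command):
            verdict, reason = "deny", "force-push to main"
        elif SECRET_PATTERN.search(command):
            verdict, reason = "deny", "possible secret in command"
        elif self.spent_usd + est_cost_usd > self.budget_usd:
            verdict, reason = "deny", "budget exceeded"
        else:
            verdict, reason = "allow", "ok"
            self.spent_usd += est_cost_usd
        # Every decision is recorded, with secrets redacted, so the leash
        # can be verified after the fact rather than assumed.
        redacted = SECRET_PATTERN.sub("[REDACTED]", command)
        self.audit_log.append((verdict, reason, redacted))
        return verdict == "allow"
```

With `gate = Gate(budget_usd=1.0)`, a call like `gate.check("git push --force origin main")` is denied and logged, while `gate.check("git status")` is allowed and logged. The audit log is the part that matters: it is what turns "we have rules" into something you can check.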

5. Evaluation — knowing if it actually worked, not just if it replied

The last pillar is the one that separates engineering from theatre. A reply is not a result. An agent that confidently produces the wrong answer, fast, is worse than no agent — and you will not know which you have without evaluation.

That means metrics on outcomes (not just “did it respond”), tracing so you can see why it did what it did, regression tests so tomorrow’s change doesn’t silently undo yesterday’s fix, and a continuous feedback loop that turns failures into learning. Evaluation is what makes an agent system improvable instead of merely impressive.
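Here is a small Python sketch of those pieces working together: outcome checks, a per-case trace, and a regression comparison against a baseline. It assumes a deliberately toy setup; `Case`, `Report`, `evaluate`, `regressions`, and `toy_agent` are all hypothetical names, and a real harness would call an actual agent and persist its baselines.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    name: str
    prompt: str
    check: Callable[[str], bool]  # judges the outcome, not whether a reply exists


@dataclass
class Report:
    passed: set[str]
    traces: dict[str, list[str]]
    total: int

    @property
    def pass_rate(self) -> float:
        return len(self.passed) / self.total if self.total else 0.0


def evaluate(agent: Callable[[str, list[str]], str], cases: list[Case]) -> Report:
    passed: set[str] = set()
    traces: dict[str, list[str]] = {}
    for case in cases:
        trace: list[str] = []  # the agent appends its steps here
        try:
            ok = case.check(agent(case.prompt, trace))
        except Exception as exc:  # a crash counts as a failed outcome
            trace.append(f"error: {exc!r}")
            ok = False
        traces[case.name] = trace
        if ok:
            passed.add(case.name)
    return Report(passed, traces, total=len(cases))


def regressions(baseline: set[str], current: Report) -> set[str]:
    """Cases that passed in the baseline run but fail now."""
    return baseline - current.passed


def toy_agent(prompt: str, trace: list[str]) -> str:
    # Always replies, confidently; only sometimes correct.
    trace.append(f"received: {prompt}")
    answer = "4" if prompt == "2+2" else "10"
    trace.append(f"answered: {answer}")
    return answer


CASES = [
    Case("add", "2+2", lambda out: out == "4"),
    Case("mul", "3*3", lambda out: out == "9"),
]
```

The toy agent replies to both cases, so a "did it respond" metric would score it perfectly. `evaluate(toy_agent, CASES)` passes only `add`, and `regressions({"add", "mul"}, report)` flags `mul` as something that used to work and no longer does, with the trace showing what the agent actually did.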

The most useful version of this I’ve found is teaching agents to learn from losing — to treat a failed run as a signal to refine, not an embarrassment to bury. Without that loop, you don’t have a system that gets better; you have a slot machine that occasionally pays out.

The pattern: each pillar is a thing you build, not a thing you prompt

Look at the five together and the shape is obvious. Memory so it remembers. State so it knows where the work is. Orchestration so the pieces cooperate. Governance so autonomy stays on a verifiable leash. Evaluation so you know it worked. None of those is a prompt. Every one of them is a system you design, build, and operate.

That’s the whole shift the slide is pointing at — from prompting models to engineering systems. The model is the easy part now; it’s a commodity you call. The durable advantage is in the five pillars around it. Pick any agent project that impressed you in a demo and quietly died in production, and I’ll show you which pillar it skipped.

Start with the one you’re weakest on. For most people, that’s evaluation — because it’s the only pillar that tells you the truth about the other four.