Why I made the pipeline mandatory — and the agents got better

Conventional wisdom says you give a capable agent room to work. I did the opposite: a fixed, non-negotiable workflow from brainstorm to finish. Constraint didn't slow the agents down. It's what made them trustworthy.

The first version of my agent setup was permissive. I’d describe a task, hand it over, and let the model decide how to approach it. The thinking was obvious enough: it’s capable, so don’t get in its way. Constraints would just slow it down.

That was wrong, and it took a while to admit why.

A permissive agent doesn’t use its freedom to be more thoughtful. It uses it to skip the steps that feel optional in the moment — which is most of the important ones. It would jump straight to code because planning felt slow. It would call a task finished because writing a test felt like overhead. Each shortcut was locally reasonable and globally disastrous. I was getting fast, confident, unreliable work. The worst kind.

The fix was a pipeline nobody can skip

So I inverted the design. Instead of freedom with optional good practices, I built a single mandatory workflow that every non-trivial task has to pass through, in order:

BRAINSTORM → WORKTREE → PLAN → EXECUTE → TEST → REVIEW → FINISH → LEARN

Eight steps. Each one has a specific job, and none of them are negotiable. Brainstorm refines a vague request into an approved design before any code exists. Worktree isolates the work on its own branch. Plan breaks it into small tasks. Execute does them one at a time, verifying each. Test proves every change. Review checks the work against the plan. Finish handles the merge decision and cleanup. Learn logs what the session taught.

The key word is mandatory. The agent doesn’t get to decide that this particular task is simple enough to skip the plan. That decision — “is this the kind of work that needs rigor?” — is exactly the judgment call an untrained agent gets wrong, every time, in the direction of less rigor. So I took the decision away.
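To make "mandatory" concrete, here is a minimal sketch of the ordering rule as code. It is an illustration, not the implementation in the repo: the `Stage`, `Pipeline`, and `SkippedStageError` names are hypothetical, and the only thing it models is that a stage cannot be completed until every stage before it has been.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The eight stages, numbered in the order they must run."""
    BRAINSTORM = 1
    WORKTREE = 2
    PLAN = 3
    EXECUTE = 4
    TEST = 5
    REVIEW = 6
    FINISH = 7
    LEARN = 8


class SkippedStageError(RuntimeError):
    """Raised when a stage is attempted out of order."""


class Pipeline:
    """Tracks one task's progress and refuses out-of-order stages."""

    def __init__(self) -> None:
        self.completed: list[Stage] = []

    @property
    def finished(self) -> bool:
        return len(self.completed) == len(Stage)

    def complete(self, stage: Stage) -> None:
        if self.finished:
            raise SkippedStageError("pipeline is already complete")
        # The next allowed stage is fixed by how many are already done.
        expected = Stage(len(self.completed) + 1)
        if stage != expected:
            raise SkippedStageError(
                f"cannot complete {stage.name}: {expected.name} comes next"
            )
        self.completed.append(stage)
```

In this sketch there is no "skip" parameter and no way to mark a task as too simple for a plan. Calling `complete(Stage.EXECUTE)` right after `Stage.WORKTREE` raises instead of proceeding, which is the whole design choice: the caller never gets to make the rigor decision.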

The counterintuitive part: constraint made it faster

Here’s what I didn’t expect. Removing the agent’s freedom to improvise didn’t make it slower overall. It made it faster, because it stopped doing the most expensive thing an agent can do: confidently going down the wrong path and forcing a costly unwind.

The permissive agent’s “fast” was an illusion. It produced output quickly and then I spent far longer discovering what was wrong with it, explaining the problem, and supervising the redo. The disciplined agent is slower for the first thirty seconds — it’s brainstorming and planning while the other one would already be typing — and then it’s dramatically faster for the rest of the task, because the work it produces is actually right.

This matches what I’d read in Anthropic’s guidance on working with their models, and what I saw confirmed over and over in practice: agents perform best when given clear constraints rather than open-ended freedom. Open-endedness sounds like a gift. To an agent, it’s an invitation to guess.

Constraints are how you scale trust

There’s a deeper reason the mandatory pipeline matters, and it’s the reason I care about this beyond a side project. A fixed workflow makes agent behavior predictable. And predictability is the only thing that lets you delegate at scale.

If every agent session might or might not include a plan, might or might not be tested, might or might not be isolated, then every session needs a human to check which corners got cut. That doesn’t scale past a handful of tasks. But if the pipeline is guaranteed — if I know with certainty that anything reaching the review stage was planned, isolated, and tested — then I can trust the output without re-deriving it. The constraint is what converts “I have to watch this” into “I can rely on this.”

You can’t delegate to something whose behavior you can’t predict. The pipeline is predictability, enforced.

To make that real rather than aspirational, the enforcement has to live in the system, not in good intentions. So the pipeline is backed by actual gates — a verification script that checks for a real plan and a clean tree before work can be called done, and a CI check that runs on every pull request. The agent can’t talk its way past a failing gate. That’s the point. Discipline that depends on the agent choosing to be disciplined isn’t discipline. It’s hope.
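A gate of that kind can be small. The sketch below is an assumption-laden illustration rather than the script from the repo: the plan file name `PLAN.md`, the rule that a "real" plan contains at least one checklist item, and all function names are hypothetical. The only external call is `git status --porcelain`, which prints one line per uncommitted change and nothing when the tree is clean.

```python
import subprocess
import sys
from pathlib import Path

PLAN_FILE = "PLAN.md"  # hypothetical name; the real repo may use another


def plan_problems(plan_text: str) -> list[str]:
    """Assumed rule: a real plan has at least one checklist task."""
    tasks = [
        line for line in plan_text.splitlines()
        if line.lstrip().startswith(("- [ ]", "- [x]"))
    ]
    return [] if tasks else ["plan contains no task checklist"]


def tree_problems(porcelain_output: str) -> list[str]:
    """Any non-empty line of porcelain output is an uncommitted change."""
    dirty = [line for line in porcelain_output.splitlines() if line.strip()]
    if dirty:
        return [f"{len(dirty)} uncommitted change(s) in the working tree"]
    return []


def check_done(repo: Path) -> list[str]:
    """Collect every reason the work cannot be called done yet."""
    problems: list[str] = []
    plan = repo / PLAN_FILE
    if plan.is_file():
        problems += plan_problems(plan.read_text(encoding="utf-8"))
    else:
        problems.append(f"{PLAN_FILE} is missing")
    status = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    problems += tree_problems(status.stdout)
    return problems


def main() -> int:
    """Exit code 0 means the gate passed; 1 means it failed."""
    problems = check_done(Path.cwd())
    for problem in problems:
        print(f"GATE FAILED: {problem}", file=sys.stderr)
    return 1 if problems else 0
```

Run as a script entry point with `sys.exit(main())`, a nonzero exit code is what lets a CI job or a pre-merge hook block the change. The agent’s own report of being finished never enters the check; only the state of the repository does.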

The same lesson, one floor up

I lead cloud and AI architecture for large enterprises, and this is the identical conversation in a different register. Executives don’t ask me whether an AI system is clever. They ask whether it’s governable — whether its behavior is predictable enough to put in front of a regulator, a board, a customer’s most sensitive workload.

The answer, at every scale, is the same: you make AI trustworthy by constraining it, not by hoping it behaves. A mandatory pipeline for a coding agent and a governance framework for an enterprise AI platform are the same idea wearing different suits. Define the path. Enforce it in the system. Then, and only then, hand over the keys.

The instinct to give a capable thing freedom is a human one — it’s how we treat people we respect. But an agent isn’t asking for respect. It’s asking for a path. Give it the constraints, and it’ll surprise you with how far it can run inside them.

See the pipeline in code

The full BRAINSTORM→LEARN workflow, the enforcement scripts, and the CI gate are all in the repo.

View on GitHub →