I Built a Framework So Disciplined I Couldn't Use It
I shipped a governance framework for AI agents, then failed its own adoption test — no uninstall, no way to list its skills, no way to know if it had drifted. Here's the sprint that fixed it, and the four patterns you can steal whether or not you ever touch my repo.
I built a framework to make AI coding agents disciplined. Then I tried to install it on a clean machine — and learned, in about ten minutes, that I’d made it disciplined, not usable.
There was no uninstall. No way to list the skills it shipped. No way to ask whether the copy in this repo had drifted from the one I published. I’d written an entire rulebook for how agents should behave and forgotten to put a door on the building.
Back in March I introduced the Copilot Agents Dojo — a behavioral governance framework you drop into a repo to turn Copilot from a fancy autocomplete into a disciplined teammate. That post was about the philosophy. This one is about what happened when I tried to make other people actually adopt it — and the four lessons that only show up when you stop building a thing and start shipping it.
The test I failed
Here’s the embarrassing part. I benchmarked the Dojo against a leading peer starter kit — the kind built for one-command install and a plugin marketplace — and the scoreboard was lopsided in a way I didn’t expect.
| | My Dojo | The peer kit |
| --- | --- | --- |
| Governance rigor | ██████████ | ███ |
| Actually adoptable | ██ | ██████████ |
I had the spec gate, the traceability, the supply-chain pinning, the control plane. They had something I didn’t: people could get it, run it, and get rid of it without reading the source. My framework was a beautifully organised dojo with the front door bricked over.
So I wrote the gaps down honestly, eighteen of them, each with a one-line “why it matters”, and spent a sprint closing the ones that mattered. Not by copying the peer. By working from a sharper premise:
A framework only compounds if it spreads. Rigor that nobody can adopt isn’t governance — it’s a museum.
What I actually shipped
Four moves changed the Dojo’s character, not just its feature count.
1. A door you can walk back out of. The installer now writes a checksummed manifest, and the Dojo ships a doctor that detects drift plus an uninstall that preserves any file you have modified. You can install it, ask “has anything here changed since I installed it?”, and cleanly remove it without nuking your own edits. Trust isn’t the install; it’s the uninstall.
2. The skills became visible. Auto-activation felt elegant and turned out to be invisible: nobody could see what the agent knew how to do. So every skill and persona is now a discoverable slash command, generated into prompt shims. A framework whose capabilities you can’t list is a framework you can’t trust.
3. One command runs the kata. The whole mandatory pipeline (BRAINSTORM → PLAN → EXECUTE → TEST → REVIEW → FINISH → LEARN) used to be a sequence you invoked by hand. Now the sprint command chains it behind one entry point, with a parallel swarm variant for work that splits cleanly.
4. It reads its own diary before planning. This is the one I’m proudest of. The Dojo always stored memory — a vault, an MCP memory server, a Postgres time machine. But storing isn’t remembering. Now, before it plans anything, it recalls: prior decisions, patterns, and past sessions, surfaced as Step 0 of planning.
Before: store memory → ... → forget to read it → repeat the mistake
After: store memory → recall at plan time → start ahead of last time
The gap was never the store. It was the loop.
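One way to picture that loop is a planner whose first step is retrieval. The sketch below is an illustration under stated assumptions, not the Dojo’s implementation: `MemoryStore`, `recall`, and `plan` are hypothetical names, and the naive word-overlap match stands in for whatever retrieval a real vault, memory server, or database would do.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Hypothetical in-memory stand-in for a vault or database of past notes."""
    notes: list[str] = field(default_factory=list)

    def store(self, note: str) -> None:
        self.notes.append(note)

    def recall(self, task: str, limit: int = 3) -> list[str]:
        # Naive retrieval: rank notes by how many words they share with the task.
        words = set(task.lower().split())
        scored = [(len(words & set(n.lower().split())), n) for n in self.notes]
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
        return [note for _, note in ranked[:limit]]


def plan(task: str, memory: MemoryStore) -> list[str]:
    """Build a plan whose Step 0 is whatever memory says about this task."""
    recalled = memory.recall(task)
    step0 = [f"Step 0 (recall): {note}" for note in recalled]
    if not step0:
        step0 = ["Step 0 (recall): no prior notes found"]
    return step0 + [f"Step 1: break down '{task}'"]


memory = MemoryStore()
memory.store("auth tokens must be rotated before deploy")
memory.store("prefer small pull requests")
steps = plan("rotate auth tokens", memory)
```

The design choice worth copying is that `plan` cannot run without calling `recall` first. Retrieval is wired into the planning step itself, so it does not depend on anyone remembering to do it.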
And underneath those four: the guardrails are skills now (block rm -rf, block force-push — invokable, not just prose in a config file), there’s a real security-audit skill (OWASP + STRIDE, with a fix mode), the installer detects your stack instead of asking you to choose blind, and first-run is a gamified quest instead of a wall of README.
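To make the “guardrails are skills” idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `GUARDS` registry, the `guard` decorator, and the two checks are illustrative names, not the Dojo’s actual skill format. It only shows that a rule written as a function can be both listed and invoked.

```python
import shlex
from typing import Callable, Optional

# Hypothetical registry: guard name -> check returning a reason to block, or None.
GUARDS: dict[str, Callable[[list[str]], Optional[str]]] = {}


def guard(name: str):
    """Register a check under a name so it can be listed and invoked."""
    def register(check):
        GUARDS[name] = check
        return check
    return register


@guard("no-recursive-force-delete")
def block_rm_rf(argv: list[str]) -> Optional[str]:
    if argv[:1] != ["rm"]:
        return None
    # Collect short flags such as -rf, -fr, or -r -f into one set of letters.
    flags = {c for a in argv[1:]
             if a.startswith("-") and not a.startswith("--")
             for c in a[1:]}
    if {"r", "f"} <= flags:
        return "rm with both -r and -f is blocked"
    return None


@guard("no-force-push")
def block_force_push(argv: list[str]) -> Optional[str]:
    if argv[:2] == ["git", "push"] and {"--force", "-f"} & set(argv[2:]):
        return "git push --force is blocked"
    return None


def check_command(command: str) -> list[str]:
    """Run every registered guard and return the reasons the command is blocked."""
    argv = shlex.split(command)
    reasons = (check(argv) for check in GUARDS.values())
    return [r for r in reasons if r is not None]


def list_guards() -> list[str]:
    """Answer 'which guards exist?' without reading any source."""
    return sorted(GUARDS)
```

Calling `check_command("git push --force origin main")` returns a non-empty list of reasons, while `list_guards()` returns the names of every registered guard. This sketch is deliberately narrow: it ignores variants such as `rm -R` or `--force-with-lease`, which a real guard would need to decide about.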
Why it’s better — and it’s not the feature list
The honest answer to “why is this better” isn’t more skills. It’s three changes in kind:
- It’s adoptable. There’s a door. You can try it in minutes and leave clean.
- It’s legible. You can see what it does before you trust it with your repo.
- It learns on its own terms. Planning now starts from what was learned last time, not a blank page.
And here’s the part I refused to trade away: I did all of this without weakening the moat. Every new capability still has to pass the same verification gate everything else does. The spec, the traceability, the supply-chain rules — all intact. I closed the adoption gap on the peer’s terms while keeping the governance lead on mine. I even contributed the anti-drift gate back upstream.
That’s the whole thesis in one line: you don’t have to choose between rigor and reach — but you do have to build the door on purpose.
How any of this helps you — even if you never touch my repo
You don’t need the Dojo. You need the patterns. Four things to steal:
- Put a door on your framework. If you ship internal tooling, conventions, or agent config, build the uninstall and the drift check before you build the next feature. Adoption dies on the exit, not the entrance. Can a teammate try your thing and back it out clean? If not, that’s your real backlog.
- Make your capabilities listable. Auto-magic is invisible, and invisible is untrustworthy. Whatever your agents can do, expose a way to list it. “What can this thing do?” should be a command, not an archaeology dig through prompts.
- Don’t just store memory; recall it at decision time. Most teams pour everything into a vault and never read it back into the moment a decision is made. The value isn’t the database; it’s wiring retrieval into the step before you act. A diary you never reread is just disk.
- Make your guardrails skills, not slogans. “Don’t force-push” in a README is a hope. The same rule as an invokable, discoverable guard is a control. Governance written into the surface beats governance bolted on beside it.
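The first pattern, the door, comes down to recording what you installed so you can later compare against it and remove it. Below is a minimal sketch in Python, assuming a JSON manifest of SHA-256 checksums. The function names (`install`, `doctor`, `uninstall`) and the `.manifest.json` filename are hypothetical choices for illustration, not the Dojo’s actual installer.

```python
import hashlib
import json
from pathlib import Path

MANIFEST = ".manifest.json"  # hypothetical filename


def _sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def install(target: Path, files: dict[str, str]) -> None:
    """Write files into target and record a checksum for each one."""
    manifest = {}
    for rel, content in files.items():
        dest = target / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_text(content, encoding="utf-8")
        manifest[rel] = _sha256(dest)
    (target / MANIFEST).write_text(json.dumps(manifest, indent=2), encoding="utf-8")


def doctor(target: Path) -> dict[str, str]:
    """Report drift: installed files that are now missing or modified."""
    manifest = json.loads((target / MANIFEST).read_text(encoding="utf-8"))
    drift = {}
    for rel, digest in manifest.items():
        path = target / rel
        if not path.exists():
            drift[rel] = "missing"
        elif _sha256(path) != digest:
            drift[rel] = "modified"
    return drift


def uninstall(target: Path) -> list[str]:
    """Remove unmodified installed files; return the modified ones left in place."""
    drift = doctor(target)
    manifest = json.loads((target / MANIFEST).read_text(encoding="utf-8"))
    for rel in manifest:
        if rel not in drift:
            (target / rel).unlink()
    (target / MANIFEST).unlink()
    return sorted(rel for rel, state in drift.items() if state == "modified")
```

Here `doctor` answers “has anything changed since I installed it?”, and `uninstall` deletes only files whose checksum still matches the manifest, so a teammate’s edits survive the exit.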
None of these are Copilot-specific. They’re how you make any opinionated system spread without losing its spine.
The bigger pattern
I keep coming back to the same idea across everything I build. In my AI operating system, the lesson was to make autonomy opt-in and the off-state a query you can read back. Here it’s the sibling rule: make discipline adoptable, and adoption a door you can walk through both ways. Same instinct: rigor only counts if you can verify it, and reach only counts if you can govern it.
The framework is sharper now. More importantly, it’s usable — which, it turns out, is its own kind of discipline.
If you’ve ever built something genuinely good that nobody adopted, I’d bet the problem wasn’t the quality — it was the door. What’s the best tool on your team that no one outside the team can install, list, or remove cleanly? That’s the gap worth closing this quarter.
The patterns above are all open in copilot-agents-dojo. Steal what’s useful, and tell me what your version looks like.
🥋 New here? Walk the Zen Quest.