Notes from the edge of enterprise AI.
Field notes on AI governance, secure intelligent systems, agentic patterns, and the operational discipline behind moving AI from pilot to production.
I Gave Hermes and OpenClaw the Same Job for 30 Days. Only One Got Better.
Two of 2026's strongest agent stacks, the same repetitive workload, thirty days of real runs. One of them quietly rewired itself and pulled ahead — and the reason it won is the reason most agent comparisons ask the wrong question.
Bokken to Shinken: The Week My AI Framework Became a Real Blade
For months my agent framework was a bokken — a wooden practice sword. You couldn't really draw it. This week I forged the live blade: published to the registry with cryptographic provenance, behind a release gate that fails closed. Here's the deep version — the npm bin sanitisation that breaks npx, SLSA provenance via OIDC, fail-closed gates, single-source-of-truth docs, and release automation — plus the four principles underneath them.
The Smartest Answer Didn't Come From the Smartest Model
A benchmark went around last week: one setup beating Opus by 8% and GPT by 11%, with no new model and no special access. I've been running the same trick in my own agent for a while. It isn't a smarter brain — it's three ordinary ones and someone to chair the room. Here's what it actually buys, the bill nobody mentions, and why it's the same lesson that made me build a harness for my team.
The Board Was Green, the Work Wasn't: An Hour on Agentic AI at UNSW
I had sixty minutes and fifty final-year students to answer one question: what actually separates an agent that does the job from a demo that falls apart the moment a tool call times out? Here's the talk — the definition, the loop, three real exemplars in healthcare, education and the public sector, and where I'd embed Responsible AI so it survives contact with production.
Four Folders Won't Hold: Configuring Obsidian as Memory for Your AI
PARA gets you started, then it breaks. Six months in, your weekend vault is a junk drawer and your agent can't find anything in it. Here's how I reshaped Tiago Forte's four folders into deterministic memory categories an AI can actually ground on — frontmatter contracts, progressive summarisation written for a machine, and the weekly loop that keeps a second brain honest.
Provenance or It Didn't Happen
An agent that can't cite its source is just a confident stranger with opinions. Here's how I ground my fleet in truth with provenance — a governed Nexus Brain on Cosmos DB for work, and an Obsidian second brain you can build this weekend for yourself. Plus why Satya called the next discipline 'loop engineering' at Build 2026.
The Model Isn't the Edge. The Harness Is.
Two teams. Same model. One ships an agent that runs your delivery practice; the other ships a chatbot that forgets your name. The gap isn't the model; it's the harness. Here's the discipline nobody has named yet.
Compaction: cutting agent context 62% with no accuracy loss
The night-shift agents were drowning in their own history. Here's the compaction pass that more than halved token cost on long runs — and the one summary it silently corrupted before I added pinned invariants.
You Can Sleep. Your Agents Don't Need To.
At 2:47am, my agent figured out that Customer X always says 'prod' but means 'staging.' At 9am, it caught the mistake before I shipped. An agent that's 1% better each night isn't 30% better in a month — it's 35%, compounding. I built the loop that makes it happen, and the four guardrails that stop it from going feral.
The 5 Pillars of Agentic AI, Part 2: Memory — Why Agents Need to Forget as Much as They Remember
Your agent just recommended a cheese plate to the customer who told it, last week, that they're lactose intolerant. It didn't lie; it forgot. Study how MemoryBear and Microsoft Foundry build real memory and the same uncomfortable truth shows up: the hard part isn't remembering. It's forgetting.
The 5 Pillars of Agentic AI: From Prompting Models to Engineering Systems
Every AI agent demo is flawless — and then it dies in production. The gap between the demo and the disaster is the five things around the model: memory, state, orchestration, governance, and evaluation. The prompt era is over. This is the engineering era.
The 5 Pillars of Agentic AI, Part 1: Governance — The Four Controls That Make Agent Autonomy Safe
You wake up to a force-pushed main, deleted tests, and a leaked key — courtesy of an agent you trusted. 'Be careful' isn't governance; it's a wish. Here are the four concrete controls that turn a hopeful leash into one you can actually inspect: opt-in execution, a verifiable leash, soul files, and live guardrails.
Local CLI, a Hermes Wrapper, or OpenClaw? The Paperclip Adapter Decision Nobody Helps You Make
The adapter is the most consequential Paperclip setting and the least discussed. It decides how much machinery sits between your agent and the model. I wired my fleet all three ways — a bare Copilot CLI, a Hermes kernel wrapping it, and an OpenClaw gateway — and one of them quietly broke and started leaning on another. Here's the honest trade-off, and how to choose.
How I Configured Paperclip to Run My AI Delivery Practice
The question I get most often isn't 'what is Paperclip' — it's 'how did you actually set it up?' Here is the real configuration behind my 27-agent company: the config.json that matters, the three-file instruction cascade, skills as a single source of truth, and the execution contract that stops issues from silently blocking.
Your AI Company Is Burning Tokens and Shipping Nothing. Here's the Config That Fixes It.
The discussions are full of the same horror story: a test hire, ten minutes, the whole token budget gone — and nothing shipped. It isn't the model. It's that you handed a 27-agent workforce no goals and no routines, so they wake up, read the entire world, find nothing crisp to do, and bill you for the privilege. Here's how I configure goals against shippable products, routines that actualize real work, and a GitHub Copilot CLI local adapter — and why the architect's job didn't disappear.
The done gate: catching agents that lie about finishing
The most expensive failure in agent systems isn't a crash — it's an agent that says 'done' because saying so is easier than being done. Here's the verifiable gate that took false completions from 17% to zero.
I Built a Framework So Disciplined I Couldn't Use It
I shipped a governance framework for AI agents, then failed its own adoption test — no uninstall, no way to list its skills, no way to know if it had drifted. Here's the sprint that fixed it, and the four patterns you can steal whether or not you ever touch my repo.
Inside My AI Operating System, Part II: The Console, the Leash, and the Memory It Keeps
My 3D AI office lied to me, and the afternoon I lost to it taught me more about governing agents than any amount of infrastructure did. Part II of the AI OS deep dive: telling a dashboard from a trigger, putting a leash on autonomy you can actually verify, and giving memory distinct tiers.
Inside My AI Operating System: The Architecture Running My Agents 24/7
A technical deep dive into the always-on agent stack that runs my work: a Hermes kernel on my Mac, a Paperclip workforce on a VPS, one Obsidian vault as shared memory, MCP as the syscall layer — and a Tailscale mesh holding two machines together with no open ports.
My New Operating System: Hermes + Paperclip + Obsidian + MCP
I stopped thinking of my AI tools as separate apps and started running them like an operating system. Hermes is the always-on kernel, Paperclip is the agent workforce, a Jarvis wake-word loop is the microphone, one Obsidian vault is shared memory for every runtime, and MCP is the syscall layer.
Killed at 2am, resumed at 2:01: externalising agent run state
A power blip took out an eight-hour fleet run four hours in. It should have cost four hours of work and a fortune in tokens. It cost 90 seconds — because the state lived outside the process. Here's the checkpoint model that made it boring.
The red thread problem: how skills, agents and governance rescue TOGAF traceability in agentic delivery
Agentic delivery generates plausible artifacts at every architecture layer with no enforced lineage. Here's how to keep the TOGAF red thread unbroken when agents are doing the work.
The latest evolution of skills.md isn't a better file — it's the runtime catching up to the prompt
Persona libraries, self-improving runtimes, and behavioural governance are three layers of the same stack. The frontier is making them work together.
Spec-Kit Best Practices Through a TOGAF Lens: An Architect's Playbook
Spec-Kit gives AI agents a disciplined workflow. TOGAF gives the enterprise a disciplined architecture. Map them together and you get governed, AI-native delivery.
Your AI agents are untrained. The bottleneck was never capability.
We keep waiting for smarter models. But the agents we already have fail for the same reasons junior engineers do — no plan, no proof, no memory. Capability isn't the constraint. Discipline is.
Why I made the pipeline mandatory — and the agents got better
Conventional wisdom says you give a capable agent room to work. I did the opposite: a fixed, non-negotiable workflow from brainstorm to finish. Constraint didn't slow the agents down. It's what made them trustworthy.
Teaching agents to learn from losing
Most agent setups make the same mistake twice — or twenty times. The most valuable thing I built into the dojo wasn't a skill. It was a loop that turns every correction into a rule the agent can't forget.
I Built a Full SaaS App in One Session with GitHub Copilot: Here's What Happened
How I transformed a Next.js landing page into a full serverless SaaS with Document Intelligence and Chat Your Data — in a single Copilot session.
Claude vs GPT in the Enterprise: An Honest Comparison from the Field
A practitioner's honest comparison of Claude and GPT models in enterprise settings — strengths, trade-offs, and when to use which.
Azure AI Foundry in Production: Patterns That Actually Work
Practical patterns for deploying AI models in production using Azure AI Foundry — from model selection to cost optimization.
AI-Native Delivery: Why Traditional Software Delivery Fails with AI Agents
Agile, Scrum, and waterfall weren't designed for AI-assisted development. We need an AI-native delivery methodology.
The Copilot Agents Dojo: A Behavioral Governance Framework for AI Coding Agents
Most organisations let AI agents loose with prompts and hope for the best. That's not an operating model — that's a risk. The Dojo changes that.
Stop Prompting, Start Architecting: Governing AI Agents at Scale
If your AI coding strategy still relies on prompts, you're leaving leverage on the table. Here's how top teams govern AI agent behavior at the repo level.