The latest evolution of skills.md isn't a better file — it's the runtime catching up to the prompt

Persona libraries, self-improving runtimes, and behavioural governance are three layers of the same stack. The frontier is making them work together.

For most of the last two years, “skills.md” meant one thing: a markdown file that told an AI agent who to be. A frontmatter block, a persona, a list of dos and don’ts, maybe a few code examples. It was static. You wrote it once, the agent read it on load, and that was the whole relationship. The file was a costume the model put on.

If you look at where the open-source ecosystem has actually moved, the interesting development isn’t a richer file format. It’s that the runtime has finally caught up to the prompt — and in doing so, it has split the old monolithic skills.md into three distinct layers. Three patterns from the wild make this unusually clear, and the trap is to read them as competitors. They aren’t. They sit at three different levels of the same stack.

Layer one: identity — who the agent is

The persona layer, at maximum breadth, looks like a library of specialists. A hundred-plus agent definitions across a dozen domains, each one a markdown file carrying personality, deliverables, and success metrics. The genius isn’t any single agent — it’s the distribution. The same persona definition gets compiled to run across Claude Code, Copilot, Cursor, Aider, Windsurf, Gemini CLI and more. One definition, many runtimes.
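To make "one definition, many runtimes" concrete, here is a minimal sketch of a persona compiled into two output formats. Everything in it is an assumption for illustration: the `Persona` fields, the target names `runtime-a` and `runtime-b`, and the output paths are placeholders, not the real conventions of Claude Code, Copilot, Cursor, or any other tool.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Persona:
    """One runtime-independent persona definition (hypothetical schema)."""
    name: str
    personality: str
    deliverables: tuple[str, ...] = ()
    success_metrics: tuple[str, ...] = ()


def slugify(name: str) -> str:
    """Turn 'Frontend Developer' into 'frontend-developer'."""
    return "-".join(name.lower().split())


def render_markdown(p: Persona) -> str:
    """Render the persona as markdown with a small frontmatter block."""
    lines = ["---", f"name: {p.name}", "---", "", p.personality, "", "## Deliverables"]
    lines += [f"- {d}" for d in p.deliverables]
    lines += ["", "## Success metrics"]
    lines += [f"- {m}" for m in p.success_metrics]
    return "\n".join(lines) + "\n"


def render_plain(p: Persona) -> str:
    """Render the same persona as a single plain-text instruction."""
    parts = [f"You are {p.name}. {p.personality}"]
    if p.deliverables:
        parts.append("Deliver: " + "; ".join(p.deliverables) + ".")
    if p.success_metrics:
        parts.append("Success means: " + "; ".join(p.success_metrics) + ".")
    return " ".join(parts) + "\n"


# Placeholder targets: (output path template, renderer). Not real tool layouts.
TARGETS: dict[str, tuple[str, Callable[[Persona], str]]] = {
    "runtime-a": ("agents/{slug}.md", render_markdown),
    "runtime-b": ("rules/{slug}.txt", render_plain),
}


def compile_persona(p: Persona, targets=TARGETS) -> dict[str, str]:
    """Return {output path: file content} for every target runtime."""
    slug = slugify(p.name)
    return {path.format(slug=slug): render(p) for path, render in targets.values()}


frontend = Persona(
    name="Frontend Developer",
    personality="Pragmatic and detail-oriented.",
    deliverables=("Accessible UI components",),
    success_metrics=("All component tests pass",),
)
files = compile_persona(frontend)
```

The point of the shape is that the persona is data and each runtime is just a renderer, so adding a runtime means adding one entry to `TARGETS` rather than rewriting the persona.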

But notice the verb these systems use: you activate an agent. “Activate Frontend Developer mode.” It’s still a costume — a beautifully tailored one, with an enormous wardrobe — but the model puts it on and takes it off. There’s no memory of yesterday’s outfit, and no improvement of the costume from having worn it. This is skills-as-identity. It tells the agent who to be.

It scales breadth. It does not, on its own, give an agent a spine. A Frontend Developer persona will still happily skip the tests if the prompt drifts that way.

Layer two: memory — what the agent learned

The runtime layer is the part that genuinely postdates the old skills.md mental model, and it’s the most important shift. The newest self-improving agents — Nous Research’s Hermes is the clearest example — invert the relationship entirely. The skill file is no longer an input the human authors. It’s an output the agent generates.

These runtimes create skills autonomously after complex tasks, improve them during use, nudge themselves to persist knowledge, and build a deepening model of the user across sessions. The whole thing lives on infrastructure that isn’t your laptop (a cheap VPS, a GPU cluster, or serverless backends that hibernate when idle), and you talk to it from Telegram or Discord. Crucially, the skills these runtimes grow align with emerging open standards, so they aren’t locked in.

This is skills-as-procedural-memory. It tells the agent what it learned to do.

It scales depth. But by design, those skills are emergent and self-curated — wonderful for capability, slightly terrifying for discipline. A self-improving loop with no gates is a very fast way to learn bad habits confidently.
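The inversion described in this section, where the skill is an output rather than an input, can be sketched in a few lines. This is a toy model under stated assumptions, not how Hermes or any real runtime works: `SkillMemory`, the `min_steps` threshold standing in for "complex task", and the revision counter are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Skill:
    """A procedure the agent wrote for itself (hypothetical schema)."""
    name: str
    steps: list[str]
    revision: int = 1


class SkillMemory:
    """Toy procedural memory: skills are created and refined by the agent."""

    def __init__(self, min_steps: int = 3) -> None:
        # Assumption: a task counts as "complex" if it took at least min_steps steps.
        self.min_steps = min_steps
        self._skills: dict[str, Skill] = {}

    def after_task(self, name: str, steps: list[str]) -> Optional[Skill]:
        """Called when a task finishes; persists or refines a skill."""
        if len(steps) < self.min_steps:
            return None  # too simple to be worth remembering
        existing = self._skills.get(name)
        if existing is None:
            skill = Skill(name=name, steps=list(steps))
            self._skills[name] = skill
            return skill
        if steps != existing.steps:
            # Improve during use: overwrite with the latest procedure.
            existing.steps = list(steps)
            existing.revision += 1
        return existing

    def recall(self, name: str) -> Optional[Skill]:
        """Look up a previously learned skill, if any."""
        return self._skills.get(name)
```

Notice what is missing: nothing in `after_task` asks whether the new steps are good ones. Whatever the agent did last becomes the skill, which is exactly the ungated loop the paragraph above warns about.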

Layer three: discipline — how the agent behaves

This is the layer that falls into the gap between a persona file and a self-improving runtime, and it’s the one I’ve spent the most time on with copilot-agents-dojo.

The identity layer gives an agent a voice but no spine. The memory layer gives an agent the ability to grow but no guarantee of what it grows toward. Neither answers the question that actually matters in a production environment: how should the agent behave regardless of who it’s being or what it’s learned?

That’s what behavioural governance is for: the skills.md and instructions that make Copilot agents reason like senior engineers, with mandatory gate pipelines instead of passive transcription. It also covers structural decisions, like decomposing a single “Business Analyst” role across separate TPM and Architect agents with a shared requirements-elicitation skill, so one agent never quietly does two jobs badly. Force the handoff, make the gate explicit, demand the artifact.

This is discipline as a first-class design constraint, not a personality trait you hope sticks. It tells the agent how to operate.
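As a rough illustration of "force the handoff, make the gate explicit, demand the artifact", here is a minimal gate pipeline. It is a sketch under my own assumptions, not the copilot-agents-dojo implementation: `Stage`, `GateFailed`, and `run_pipeline` are hypothetical names, and the agents are stand-in functions rather than real model calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class GateFailed(Exception):
    """Raised when a stage produces no artifact or fails its gate."""


@dataclass
class Stage:
    owner: str                      # agent role, e.g. "TPM" or "Architect"
    artifact: str                   # name of the artifact this stage must hand off
    run: Callable[[dict], str]      # builds the artifact from earlier artifacts
    gate: Callable[[str], bool]     # explicit pass/fail check on the artifact


def run_pipeline(stages: list[Stage], artifacts: Optional[dict] = None) -> dict:
    """Run stages in order; stop at the first stage that fails its gate."""
    produced = dict(artifacts or {})
    for stage in stages:
        output = stage.run(produced)
        if not output or not stage.gate(output):
            raise GateFailed(f"{stage.owner} did not pass the gate for '{stage.artifact}'")
        # The handoff: later stages only ever see gated artifacts.
        produced[stage.artifact] = output
    return produced


# Stand-in agents: each role owns exactly one artifact.
stages = [
    Stage(
        owner="TPM",
        artifact="requirements",
        run=lambda a: "Users can reset their password by email",
        gate=lambda out: len(out.split()) >= 3,
    ),
    Stage(
        owner="Architect",
        artifact="design",
        run=lambda a: f"Design for: {a['requirements']}",
        gate=lambda out: out.startswith("Design for:"),
    ),
]
result = run_pipeline(stages)
```

The Architect stage cannot run without the TPM's artifact, and neither can skip its gate, so the separation of roles is enforced by structure rather than by hoping the persona behaves.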

The synthesis: governed self-improvement

Put the three layers side by side and the “latest evolution” of skills.md names itself:

Identity answers who — persona, voice, and domain expertise, written once and portable across every runtime.
Memory answers what-I-learned — skills the agent authors and refines from its own experience, persisted across sessions on always-on infrastructure.
Discipline answers how — the non-negotiable behavioural governance, the gates, the senior-engineer posture that holds no matter which persona is loaded or which skill was learned.

The marriage worth building is specific. The runtime provides the loop. The persona libraries seed the breadth. The governance layer supplies the constitution that keeps the loop honest. Today a self-improving agent can grow a skill that’s clever but undisciplined. A fixed governance framework can’t grow at all. Put the gates inside the learning loop and you get the thing nobody has shipped cleanly yet: governed self-improvement.

Every emergent skill the agent writes is automatically put through the same three questions before it’s allowed to persist into procedural memory: Did this pass the gate? Did it produce the artifact? Did it reason like a senior engineer? Only a skill that clears all three counts.
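Putting the gates inside the learning loop might look something like the sketch below. It is a thought experiment, not a shipped design: `CandidateSkill`, `GovernedMemory`, and the check names are hypothetical, and the "senior engineer" check in particular is a deliberately crude placeholder (it only looks for a mention of tests) standing in for whatever real review a team would plug in.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CandidateSkill:
    """A skill the agent proposes after a task (hypothetical schema)."""
    name: str
    body: str
    gate_passed: bool            # did the task that produced it pass its gate?
    artifact: Optional[str]      # the artifact the task produced, if any


Check = Callable[[CandidateSkill], bool]

# The three questions from the text, as pluggable checks.
DEFAULT_CHECKS: dict[str, Check] = {
    "passed_gate": lambda s: s.gate_passed,
    "produced_artifact": lambda s: bool(s.artifact),
    # Placeholder heuristic only: a real review would be far richer than this.
    "senior_review": lambda s: "test" in s.body.lower(),
}


class GovernedMemory:
    """Procedural memory that only persists skills clearing every check."""

    def __init__(self, checks: Optional[dict[str, Check]] = None) -> None:
        self.checks = dict(checks or DEFAULT_CHECKS)
        self.skills: dict[str, CandidateSkill] = {}
        self.rejections: list[tuple[str, list[str]]] = []

    def propose(self, skill: CandidateSkill) -> bool:
        """Persist the skill if all checks pass; otherwise record why not."""
        failed = [name for name, check in self.checks.items() if not check(skill)]
        if failed:
            self.rejections.append((skill.name, failed))
            return False
        self.skills[skill.name] = skill
        return True
```

Keeping the rejections alongside the accepted skills matters as much as the gate itself: the record of what the agent tried to learn and was refused is the audit trail that makes the loop reviewable.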

That’s the frontier, and it’s an open lane. The first generation of skills.md told agents who to be. The current generation lets them learn what to do. The work that’s left — the part that turns impressive demos into something an enterprise can actually hand a production codebase — is making sure that what they learn is disciplined enough to trust.

Personas scale breadth. Runtimes scale depth. Governance is what makes either one safe to deploy.