#AI Agents · #MCP · #Obsidian · #Hermes · #Productivity

My New Operating System: Hermes + Paperclip + Obsidian + MCP

I stopped thinking of my AI tools as separate apps and started running them like an operating system. Hermes is the always-on kernel, Paperclip is the agent workforce, a Jarvis wake-word loop is the microphone, one Obsidian vault is shared memory for every runtime, and MCP is the syscall layer.

For years my AI tools were a pile of disconnected apps. A chatbot in one tab. A note-taker in another. A terminal agent that knew nothing about either. Each was good at its one job and completely ignorant of the others.

That’s not how an operating system works. An OS has a kernel that schedules work, processes that do the work, a filesystem that remembers things, and a syscall layer so every process can reach the same hardware. So I stopped collecting apps and built a system — one where the parts genuinely share state.

Four pieces make it run: Hermes as the always-on kernel, Paperclip as the agent workforce, one Obsidian vault as shared memory, and MCP as the syscall layer that gives every agent the same capabilities. And lately a fifth: Jarvis, a wake-word voice loop that turns my voice into the microphone for the whole thing.

Here’s the part that took me a while to get right: the magic isn’t any single tool. It’s that three different agent runtimes write to the same memory and speak the same protocol.

The Architecture

          INPUT MODALITIES
  Voice "Hey Jarvis"  ·  Telegram bridge
  Discord / Slack / Signal  ·  Copilot CLI


  ┌──── RUNTIMES ─────────────────────┐
  │ HERMES — always-on kernel (VPS)   │
  │   cron · self-improving loop      │
  │ PAPERCLIP — agent workforce       │
  │   Copilot CLI under ACP · :3100   │
  └───────────────────────────────────┘
                  │  every runtime speaks MCP

  ┌──── MCP — syscall layer ──────────┐
  │ playwright · ms-365 · blender ·   │
  │ memory · …  (same tools for all)  │
  └───────────────────────────────────┘
                  │  read / write

  ┌──── SHARED MEMORY ────────────────┐
  │ OBSIDIAN VAULT — "wasita brain"   │
  │ Daily/ · Inbox/ · Memory/         │
  │ <symlinks to every project>       │
  └───────────────────────────────────┘

Inputs at the top; the runtimes in the middle; one shared syscall layer and one shared memory beneath them. Read it top to bottom — that shape is the operating system.

Hermes: The Always-On Kernel

Hermes, from Nous Research, is the kernel. It’s the only piece that’s always running, and crucially it doesn’t run on my laptop — it lives on a cheap VPS and I talk to it from Telegram, Discord, Slack, WhatsApp, or Signal. Close the laptop; the kernel keeps working.

What makes it a kernel and not just a chatbot is the closed learning loop. Hermes creates skills from experience, improves them during use, curates its own memory with periodic nudges, and searches its own past conversations with full-text recall. A built-in cron scheduler runs unattended automations — every evening it writes a one-line summary to Daily/YYYY-MM-DD.md in the vault without me asking.
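The evening summary job is easy to picture as code. What follows is a minimal sketch, not Hermes' actual implementation: the function name, the bullet format, and the choice to append rather than overwrite are all assumptions; only the Daily/YYYY-MM-DD.md path comes from the setup described above.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def append_daily_summary(vault: Path, summary: str, day: Optional[date] = None) -> Path:
    """Append a one-line summary to Daily/YYYY-MM-DD.md, creating the note if needed."""
    day = day or date.today()
    note = vault / "Daily" / f"{day.isoformat()}.md"
    note.parent.mkdir(parents=True, exist_ok=True)
    # Collapse any newlines or runs of spaces so the entry really is one line.
    line = " ".join(summary.split())
    with note.open("a", encoding="utf-8") as fh:
        fh.write(f"- {line}\n")
    return note
```

Appending means a second run on the same day adds a bullet instead of clobbering whatever another runtime already wrote to that note.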

The discipline that ties it into the system: Hermes never keeps important state in its head. Anything worth remembering gets written to the vault. Which means another runtime can read it five minutes — or five months — later.

Paperclip: The Agent Workforce

This is the piece I always mis-explain, so let me be precise: Paperclip is not a capture tool. It’s an agent runtime. It runs GitHub Copilot CLI agents under ACP, exposed over a local HTTP API on :3100, organised around issues, projects, and goals — like a tiny autonomous engineering org living on my machine.

If Hermes is the scheduler, Paperclip is the pool of worker processes that actually grind through multi-step jobs.

The clever bit is how I reach it. A small bridge.py long-polls Telegram and routes messages to Paperclip agents:

Telegram message
   → comment on a persistent "📱 Telegram Inbox" issue (resume=true)
   → re-wakes the assigned Copilot CLI agent
   → agent runs the job, writes results to the vault
   → final summary posted back to the Telegram chat

So from my phone, a one-line message spins up a real coding agent, which does the work, records what it learned, and reports back — all while I’m doing something else.
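A rough sketch of what such a bridge loop could look like, using only the standard library. The Telegram side uses the Bot API's real getUpdates long-polling method. The Paperclip side is an assumption: the /api/issues/{id}/comments path and the payload shape (body, resume, chat_id) are hypothetical stand-ins, since only the :3100 port and the resume=true behaviour are described above.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

TELEGRAM_API = "https://api.telegram.org/bot{token}/{method}"
PAPERCLIP_URL = "http://localhost:3100"  # port from the post; the paths below are guesses


def next_offset(updates: list, current: int) -> int:
    """Telegram treats updates as confirmed once offset exceeds the highest update_id."""
    return max([current] + [u["update_id"] + 1 for u in updates])


def inbox_comment(update: dict) -> Optional[dict]:
    """Turn a Telegram update into a comment payload, or None if it carries no text."""
    message = update.get("message") or {}
    text = (message.get("text") or "").strip()
    if not text:
        return None
    # Hypothetical payload: resume=True asks Paperclip to re-wake the assigned agent.
    return {"body": text, "resume": True, "chat_id": message["chat"]["id"]}


def post_json(url: str, payload: dict) -> dict:
    """POST a JSON body and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def poll_forever(token: str, issue_id: str) -> None:
    """Long-poll Telegram and forward each text message to the Inbox issue."""
    offset = 0
    while True:
        query = urllib.parse.urlencode({"offset": offset, "timeout": 50})
        url = TELEGRAM_API.format(token=token, method="getUpdates") + "?" + query
        with urllib.request.urlopen(url, timeout=60) as response:
            updates = json.load(response)["result"]
        for update in updates:
            comment = inbox_comment(update)
            if comment:
                # Hypothetical endpoint, not a documented Paperclip route.
                post_json(f"{PAPERCLIP_URL}/api/issues/{issue_id}/comments", comment)
        offset = next_offset(updates, offset)
```

Posting the agent's final summary back to the chat would be a matching call to Telegram's sendMessage method; it is left out here to keep the sketch short.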

Jarvis: The Microphone

The newest piece, and the one that makes the whole thing feel like science fiction: a wake-word voice loop I call Jarvis. It runs as an always-on macOS LaunchAgent, so it’s listening the moment I log in.

The pipeline is deliberately humble and entirely local until the routing step:

mic
  → openWakeWord detects "hey jarvis"
  → RMS voice-activity detection captures the utterance
  → faster-whisper (base.en, int8 on CPU) transcribes it
  → Opus routes the transcript → { agent, message, intent }
  → intent decides: speak the roster · report an agent's status ·
    or send the message to that agent's Inbox issue (resume=true)
  → edge-tts speaks the reply back through the speakers

The routing brain is the clever bit. It isn’t keyword matching — a small Opus prompt turns “ask the site agent how the new blog post build went” into a structured {agent: "...", intent: "status"} and either reads back the agent’s latest run summary or wakes it with a new instruction. It reuses the exact same “Inbox issue + resume=true” mechanism as the Telegram bridge, which means voice is just another front-end to the same Paperclip workforce.
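Whatever the model returns, it pays to validate it before acting on it. Here is a minimal sketch of that step, assuming the router replies with a JSON object. The Route class, the parse_route and describe helpers, and the intent names "roster" and "send" are hypothetical; only "status" and the three field names appear above.

```python
import json
from dataclasses import dataclass

# "status" comes from the example above; "roster" and "send" are assumed names.
INTENTS = {"roster", "status", "send"}


@dataclass(frozen=True)
class Route:
    agent: str
    message: str
    intent: str


def parse_route(raw: str, known_agents: set) -> Route:
    """Validate the router's JSON reply before anything acts on it."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("router reply must be a JSON object")
    intent = data.get("intent")
    if intent not in INTENTS:
        raise ValueError(f"unknown intent: {intent!r}")
    agent = data.get("agent") or ""
    # Reading out the roster needs no target agent; every other intent does.
    if intent != "roster" and agent not in known_agents:
        raise ValueError(f"unknown agent: {agent!r}")
    return Route(agent=agent, message=data.get("message") or "", intent=intent)


def describe(route: Route) -> str:
    """Name the action the voice loop would take for a validated route."""
    if route.intent == "roster":
        return "speak the roster"
    if route.intent == "status":
        return f"read back the latest run summary for {route.agent}"
    return f"comment on {route.agent}'s Inbox issue with resume=true"
```

Rejecting unknown agents and intents up front means a mis-heard utterance fails loudly instead of waking the wrong agent.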

In OS terms, Jarvis is the microphone driver. It doesn’t do the work; it’s the input device that lets me dispatch jobs to the kernel and the workforce without touching a keyboard. “Hey Jarvis, ask the research agent to summarise today’s inbox”, and a Paperclip agent goes and does it while I keep cooking.

The engineering details I care about: the audio callback only ever enqueues frames (heavy work stays off the realtime thread), it drains and resets after speaking so it never wakes itself, and it self-heals when my AirPods drop. Boring reliability work — exactly what a good driver should be.
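Two of those details, the enqueue-only callback and the drain after speaking, fit in a few lines. This is a sketch under assumptions: 16-bit little-endian mono PCM, a made-up RMS threshold of 500, and hypothetical function names. The real loop's audio library and callback signature are not described above.

```python
import math
import queue
import struct

frames: "queue.Queue[bytes]" = queue.Queue()


def audio_callback(chunk: bytes) -> None:
    """Realtime thread: hand the frame off and return immediately."""
    frames.put_nowait(chunk)


def rms(chunk: bytes) -> float:
    """Root-mean-square level of 16-bit little-endian mono PCM."""
    count = len(chunk) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", chunk[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_speech(chunk: bytes, threshold: float = 500.0) -> bool:
    """Crude voice-activity check; the threshold is an assumed value to tune per mic."""
    return rms(chunk) >= threshold


def drain() -> int:
    """Discard queued frames after speaking so the reply cannot trigger the wake word."""
    dropped = 0
    while True:
        try:
            frames.get_nowait()
        except queue.Empty:
            return dropped
        dropped += 1
```

A worker thread would pull from frames and run wake-word detection and transcription there, which is what keeps the callback cheap enough for the audio thread.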

Obsidian: Shared Memory for Every Runtime

Here’s the keystone. There is exactly one Obsidian vault — wasita brain — and all three runtimes write to it: Hermes, Paperclip’s Copilot agents, and Copilot CLI when I drive it directly.

It’s plain Markdown on disk, which is the whole point: because it’s just files, every agent can read and write it with no integration glue. The vault is organised less like a notebook and more like a database the agents query, with Daily/, Inbox/, and Memory/ folders alongside symlinks to every project.

A root AGENTS.md governs the rules every agent must follow — wikilinks over markdown links, frontmatter on every note, and a tag taxonomy that keeps the graph useful across runtimes:

#agent/hermes   #agent/paperclip   #agent/copilot-cli
#pattern/<domain>   #lesson/<topic>   #project/<slug>

That #agent/* namespace is my favourite detail. I can look at a memory note and see which of my agents learned it. The cemetery of forgotten notes became a shared brain three different minds contribute to.
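Because the vault is plain files, "which agent learned this" is just a text scan. Below is a minimal sketch that groups notes by their #agent/<name> tag. It assumes tags appear inline in the note text as shown above; the regex and function names are illustrative, not part of Obsidian or of any runtime.

```python
import re
from pathlib import Path

# A tag is "#" followed by a letter, then word characters, "/" or "-".
# The lookbehind skips Markdown headings like "## Title" and URL fragments.
TAG = re.compile(r"(?<![\w/#])#([A-Za-z][\w/-]*)")


def tags_in(text: str) -> set:
    """Return every tag in the text, without the leading '#'."""
    return set(TAG.findall(text))


def notes_by_agent(vault: Path) -> dict:
    """Group note paths (relative to the vault) by their #agent/<name> tags."""
    found: dict = {}
    for note in sorted(vault.rglob("*.md")):
        text = note.read_text(encoding="utf-8", errors="replace")
        for tag in tags_in(text):
            if tag.startswith("agent/"):
                agent = tag[len("agent/"):]
                found.setdefault(agent, []).append(note.relative_to(vault).as_posix())
    return found
```

The same scan works for #lesson/ or #pattern/ prefixes by swapping the prefix check.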

MCP: The Syscall Layer

The last piece is what stops these from being three islands: the Model Context Protocol.

Every runtime speaks MCP, so every runtime gets the same capabilities — a browser via the Playwright server, my calendar and email via the Microsoft 365 server, a persistent memory server, even Blender. Add one new MCP server and all three agents can use it immediately, with zero per-integration glue.

That’s exactly what a syscall layer does for an OS: programs don’t ship their own disk drivers; they make a standard call and the kernel handles the hardware. My agents don’t each reinvent “use a browser”; they make the same MCP call.
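One way to keep the three runtimes in lockstep is to render every runtime's MCP config from a single registry. A sketch, with loud assumptions: the "mcpServers" key is a convention many MCP clients use but may not match each runtime's real config format, the memory entry's command is a placeholder, and the runtime names are just labels. Only the Playwright entry uses a launch command taken from that server's own documentation.

```python
import json

# Single source of truth for the servers every runtime should see.
SERVERS = {
    "playwright": {"command": "npx", "args": ["@playwright/mcp@latest"]},
    # Placeholder command: substitute whatever actually launches your memory server.
    "memory": {"command": "memory-server-placeholder", "args": []},
}

RUNTIMES = ["hermes", "paperclip", "copilot-cli"]


def render_config(servers: dict) -> str:
    """Serialise the registry in the common {"mcpServers": {...}} shape."""
    return json.dumps({"mcpServers": servers}, indent=2, sort_keys=True)


def configs_for(runtimes: list, servers: dict) -> dict:
    """Give every runtime the same rendered config text."""
    return {name: render_config(servers) for name in runtimes}


def add_server(servers: dict, name: str, command: str, args: list) -> dict:
    """Return a new registry with one more server; the original is left untouched."""
    updated = dict(servers)
    updated[name] = {"command": command, "args": list(args)}
    return updated
```

Adding a server then means one add_server call and one re-render, after which every runtime reads the same list.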

A Day in the System

  1. Morning. Hermes’ cron has already drafted today’s Daily/ note and pulled my calendar over the MS-365 MCP server. I skim it on my phone.
  2. Hands full. “Hey Jarvis, ask the site agent to publish the draft.” Whisper transcribes, Opus routes it, a Paperclip agent wakes and runs — I never touch the keyboard.
  3. On the move. I fire a task into Telegram. The bridge wakes a Paperclip Copilot agent, which works the issue and writes its output into the vault.
  4. At my desk. I open Copilot CLI directly in the vault. AGENTS.md auto-loads, so it already knows the conventions — and it can read what the voice and Telegram agents just produced.
  5. Stuck on something. I ask any of the three. It searches Memory/, finds a lesson/ note another runtime wrote weeks ago, and hands it back.
  6. Evening. Hermes sweeps the day, distils anything reusable into Memory/patterns/, and tags it #agent/hermes.

No app-switching marathon. No copy-paste tax. The processes hand work to each other through shared memory.

Why This Works When My Old Setup Didn’t

Every “second brain” I’d built before failed for the same reason: memory was separate from action. I filed beautiful notes no agent could act on, and ran agents that couldn’t see my notes.

Treating it as an operating system closes that seam.

The whole is greater than its parts precisely because the parts share a substrate — files — and a bus — MCP. That is what an operating system is: shared memory and shared syscalls.

The Caveats

The Takeaway

Stop thinking about AI tools as apps and start thinking about them as an operating system. Pick an always-on kernel (Hermes). Give it a workforce that actually executes (Paperclip). Add a microphone so you can dispatch jobs by voice (Jarvis). Point every runtime at one memory you own (Obsidian). Wire them with a real syscall layer (MCP).

The result isn’t a better app. It’s a better substrate — where three different agents capture, remember, and act through the same shared brain, without any of it ever leaving the system.