#AI Agents · #MCP · #Obsidian · #Hermes · #Tailscale

Inside My AI Operating System: The Architecture Running My Agents 24/7

A technical deep dive into the always-on agent stack that runs my work: a Hermes kernel on my Mac, a Paperclip workforce on a VPS, one Obsidian vault as shared memory, MCP as the syscall layer — and a Tailscale mesh holding two machines together with no open ports.

I wrote before about treating my AI tools as an operating system instead of a pile of apps: a kernel, a workforce, shared memory, a syscall layer. That post was the concept.

This one is the deep dive. I’m going to open the case and show you the wiring — the ports, the systemd units, the mesh network, the auth tokens, the exact folder layout — because the interesting part of any operating system isn’t the block diagram. It’s what happens when you make it run on real hardware, 24/7, and the abstractions have to survive contact with closed laptops, locked-down networks, and headless servers.

Here’s the whole thing in one breath: a Hermes kernel and a Jarvis voice loop live on my Mac; a Paperclip agent workforce runs always-on inside a VPS; both halves share one Obsidian vault and speak MCP; and a Tailscale mesh stitches the two machines into a single private network with zero open ports. Now let’s go under the hood.

The Two-Machine Topology

The first non-obvious decision: which pieces run where. The instinct is to put “the brain” in the cloud. The right answer was the opposite. The deciding question isn’t “where should the brain live?” but “what has to keep running while the laptop is shut, and what physically can’t leave the laptop?”

  ┌──────────── MAC (control plane) ──────────────┐
  │ HERMES gateway  — kernel, cron, Telegram      │
  │ PAPERCLIP :3100 — local board + 2 agents      │
  │ JARVIS          — wake-word mic loop          │
  │ Apple / vault MCP — bound to this machine     │
  └───────────────────────────────────────────────┘
                  ▲  Tailscale mesh (WireGuard, no open ports)

  ┌──────────── VPS (always-on executor) ─────────┐
  │ PAPERCLIP :3101 — systemd, 2 live agents      │
  │   PRReviewer · HarvestingEngineer             │
  │ headless Copilot CLI (token auth, no browser) │
  └───────────────────────────────────────────────┘

The kernel lives next to its memory; the workforce lives where uptime is free.

Paperclip, Precisely: Two Instances and a Split-Brain Lock

This is the piece people misread, so let me be exact. Paperclip is an agent runtime, not a note app. It runs GitHub Copilot CLI agents under ACP (Agent Client Protocol), fronted by a local HTTP API, organised around issues, projects, and goals — a tiny autonomous engineering org you can watch in a browser.

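I wanted agents executing 24/7 on the VPS without giving up the local board I use to supervise them. So there are two Paperclip instances: the local board on the Mac at :3100, and an always-on instance on the VPS at :3101, run under systemd.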

Two instances pointed at the same GitHub repo is a recipe for double-stamped PR reviews. The guard is deliberately simple and exploits how Paperclip schedules work: an agent only executes when it’s active, so I keep each agent active on exactly one machine and paused on the other. PRReviewer and HarvestingEngineer run on the VPS and are paused on the Mac; the rest are active on the Mac and paused on the VPS. A paused agent won’t pick up work even when its schedule fires — that’s the lock.
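The invariant behind that lock is easy to state and easy to check: every agent is active on exactly one instance. Here is a minimal sketch of such a check. It assumes you already have a snapshot of agent states per instance; the dictionary shape, the instance labels "mac" and "vps", and the agent name SiteAgent are illustrative, not Paperclip API output.

```python
def find_lock_violations(states):
    """Return agents that are NOT active on exactly one instance.

    `states` maps an instance label to {agent_name: "active" | "paused"}.
    The result maps each offending agent to the instances where it is
    active: an empty list means nobody runs it, two entries mean a race.
    """
    active_on = {}
    for instance, agents in states.items():
        for agent, status in agents.items():
            # Register the agent even if it is paused everywhere.
            active_on.setdefault(agent, [])
            if status == "active":
                active_on[agent].append(instance)
    return {a: where for a, where in active_on.items() if len(where) != 1}


# Hypothetical snapshot matching the split described above.
states = {
    "mac": {"PRReviewer": "paused", "HarvestingEngineer": "paused", "SiteAgent": "active"},
    "vps": {"PRReviewer": "active", "HarvestingEngineer": "active", "SiteAgent": "paused"},
}
print(find_lock_violations(states))
```

An empty result means the lock holds. Anything else is either a double-stamp risk or an agent nobody is running.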

The cutover order matters, and it’s the kind of thing you only catch by thinking it through before you flip switches: pause the active side first, let it drain, then resume the other side. Reverse that and there’s a window where both are live and racing for the same issues.
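That ordering can be written down as a small routine. This is a sketch, not Paperclip’s API: pause, resume, and in_flight are hypothetical callables you would wire to whatever controls each instance. The only thing the sketch fixes is the sequence.

```python
import time


def cut_over(agent, pause, resume, in_flight, poll_seconds=1.0, timeout=300.0):
    """Move `agent` between instances with no window where both are live.

    pause(agent)     -- pauses the agent on the currently active side
    in_flight(agent) -- number of runs still executing on that side
    resume(agent)    -- resumes the agent on the other side
    All three are caller-supplied (hypothetical) hooks.
    """
    pause(agent)  # 1. stop the active side from picking up new work
    deadline = time.monotonic() + timeout
    while in_flight(agent) > 0:  # 2. let running work drain
        if time.monotonic() > deadline:
            # Leave the agent paused on both sides: safe, just idle.
            raise TimeoutError(f"{agent} did not drain within {timeout}s")
        time.sleep(poll_seconds)
    resume(agent)  # 3. only now bring up the other side
```

If the drain times out, the agent stays paused everywhere. That failure mode costs some idle time but never produces two live reviewers.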

Tailscale: Networking That Respects the Landing Zone

The original plan was unglamorous — open SSH to the VPS, run a reverse tunnel so the cloud executor can reach the Mac’s board on :3100. It worked for roughly a day at a time, then died. Repeatedly.

The cause was instructive, and the policy was right. The VPS sits in an Azure Enterprise-Scale Landing Zone, where governance periodically reconciles the network security group back to its secure baseline and removes any public inbound rule — including the SSH rule I’d added. That baseline is deny-inbound by design: a public listening port on a server is exactly the kind of attack surface the landing zone exists to eliminate. My inbound rule was the anomaly. The platform was doing its job; I was the one working against it.

So I stopped fighting the baseline and adopted an architecture it’s happy with: zero public inbound. Tailscale builds an outbound-only WireGuard mesh in which both machines dial out to a coordination server and then establish a direct, end-to-end encrypted tunnel between themselves. Nothing listens on a public interface, so there’s nothing for the landing zone to revoke and nothing for the internet to scan. It satisfies the deny-inbound policy and buys a real reduction in attack surface, not just obscurity: the boards simply aren’t reachable from anywhere off the tailnet.

mac-laptop   100.x.x.x     macOS
cloud-vps    100.x.x.y     linux  (direct, ~9 ms)

Everything now rides the tailnet on stable 100.x addresses.

This was the single biggest reliability upgrade in the entire system — and it got there by aligning with the policy instead of resisting it. The baseline can reconcile the NSG all day; an outbound mesh exposes nothing for it to close.
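One way to keep it that way is to refuse to start a service on anything but loopback or a tailnet address. The sketch below relies on one fact about Tailscale, that it assigns node addresses from the CGNAT block 100.64.0.0/10. The function names are my own, and nothing here calls Tailscale itself.

```python
import ipaddress

# Tailscale hands out node addresses from the CGNAT range.
TAILNET_V4 = ipaddress.ip_network("100.64.0.0/10")


def is_tailnet_address(addr: str) -> bool:
    """True if `addr` falls inside the tailnet IPv4 range."""
    return ipaddress.ip_address(addr) in TAILNET_V4


def safe_to_bind(addr: str) -> bool:
    """Allow loopback or tailnet addresses; reject wildcards and public IPs."""
    ip = ipaddress.ip_address(addr)
    return ip.is_loopback or ip in TAILNET_V4


for candidate in ("127.0.0.1", "100.101.102.103", "0.0.0.0"):
    print(candidate, safe_to_bind(candidate))
```

Binding to 0.0.0.0 is the case worth catching: it listens on every interface, including any public one the machine might have.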

Headless Auth: The Real Tax on “Always-On”

Here’s the detail that separates a demo from a system that survives a reboot: GitHub Copilot CLI authenticates through a browser device-code flow, and a headless VPS has no browser.

The fix took some reading of the CLI’s own behaviour. It checks for an environment token, COPILOT_GITHUB_TOKEN, before it ever falls back to the device flow. So the durable wiring is:

  1. Mint a token on the Mac; write it to a 0600 env file on the VPS (~/.paperclip/copilot-token.env).
  2. Reference that file from the systemd unit with a drop-in:
# /etc/systemd/system/paperclip.service.d/copilot-token.conf
[Service]
EnvironmentFile=/home/<user>/.paperclip/copilot-token.env
  3. Every agent process the service spawns inherits the token. No browser, ever.

It survives reboots and restarts, and the agents authenticate silently. The same trick — injecting a secret through a systemd EnvironmentFile rather than a login flow — is how the whole VPS half stays hands-off.
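Because a bad env file fails silently (the agents just drop back to a device flow nobody can complete), a pre-flight check is cheap insurance. The following is a sketch under assumptions: it handles only simple KEY=VALUE lines rather than the full systemd EnvironmentFile syntax, and the function name is hypothetical.

```python
import os
import stat


def load_token_env(path, required="COPILOT_GITHUB_TOKEN"):
    """Parse a simple KEY=VALUE env file and insist it is owner-only.

    Raises PermissionError if group/other have any access bits set,
    and KeyError if the required variable is missing or empty.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        raise PermissionError(f"{path} has mode {mode:o}; expected 600")
    env = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            # Skip blanks, comments, and anything that is not KEY=VALUE.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip().strip('"')
    if not env.get(required):
        raise KeyError(f"{required} missing from {path}")
    return env
```

Run it against ~/.paperclip/copilot-token.env before restarting the service. It returns the parsed variables, so avoid printing the result.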

Obsidian: One Vault, Three Runtimes, a Real Taxonomy

The keystone is that there is exactly one Obsidian vault — wasita brain — and three different runtimes write to it: Hermes, Paperclip’s Copilot agents, and Copilot CLI when I drive it directly. It’s plain Markdown on disk, which is the entire point: no integration glue, just files every agent can read and write.

But “just files” undersells it. The vault is structured so each top-level folder is a memory category an agent can target deterministically:

wasita brain/
├─ Inbox/        capture-first, triaged weekly
├─ Daily/        one note per day (Hermes cron writes these)
├─ Sources/Repos/  ← symlinks to every project repo on disk
├─ Projects/     hub + notes per engagement
├─ Decisions/    ADR-light decision memory
├─ Knowledge/lessons/   one note per non-obvious bug + fix
├─ Playbooks/patterns/  reusable patterns
├─ MOCs/         maps of content
└─ Archive/
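“Target deterministically” can be made concrete with a lookup from memory category to folder. In this sketch the folder names come from the tree above, while the category keys, the slug rule, and the function names are assumptions of mine rather than anything the agents are known to use.

```python
import re
from pathlib import Path

# Folder names from the vault tree; the category keys are illustrative.
CATEGORY_DIRS = {
    "inbox": "Inbox",
    "decision": "Decisions",
    "lesson": "Knowledge/lessons",
    "pattern": "Playbooks/patterns",
}


def slugify(title: str) -> str:
    """Lowercase the title and collapse non-alphanumeric runs to hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def note_path(vault, category: str, title: str) -> Path:
    """Resolve where a note of the given category belongs in the vault."""
    return Path(vault) / CATEGORY_DIRS[category] / f"{slugify(title)}.md"


print(note_path("wasita brain", "lesson", "Copilot CLI: headless auth"))
```

An unknown category raises KeyError instead of guessing a folder, which is the behaviour you want from something writing into shared memory.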

Two mechanisms make this more than a folder tree:

#agent/hermes   #agent/paperclip   #agent/copilot-cli
#pattern/<domain>   #lesson/<topic>   #project/<slug>

That #agent/* namespace is my favourite detail: I can open a memory note and see which of my agents learned it. Three different runtimes contribute to one brain, and the tags tell you who taught it what.
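Since the vault is plain Markdown, answering “who taught it what” needs nothing more than a file scan. This is a minimal sketch that assumes tags appear inline as #agent/<name>. It ignores YAML frontmatter tags, and the function name is hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

# Match #agent/<name> unless it is glued to a preceding word or path.
AGENT_TAG = re.compile(r"(?<![\w/])#agent/([a-z0-9-]+)")


def notes_by_agent(vault):
    """Map each agent name to the vault-relative notes that carry its tag."""
    index = defaultdict(list)
    for note in sorted(Path(vault).rglob("*.md")):
        text = note.read_text(encoding="utf-8", errors="replace")
        for agent in sorted(set(AGENT_TAG.findall(text))):
            index[agent].append(note.relative_to(vault).as_posix())
    return dict(index)
```

Calling notes_by_agent on the vault root returns something like a per-agent reading list, one entry per note that agent has tagged.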

MCP: The Syscall Layer (and What Actually Runs)

The last layer is what stops these runtimes from being islands: the Model Context Protocol. Every runtime speaks MCP, so adding one server gives all of them the capability at once — no per-integration glue. That’s precisely a syscall layer: programs don’t ship their own disk drivers; they make a standard call.

In the interest of a real deep dive, here’s what’s actually loaded — a local, Apple-leaning stack rather than the generic one I’d sketched before:

Drop in one new server and Hermes, Paperclip’s agents, and Copilot CLI all inherit it simultaneously. Composability, uniformity, and locality — the three properties that make a syscall layer worth having.
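The “standard call” is literal. MCP messages are JSON-RPC 2.0, and invoking a tool uses the tools/call method with a tool name and an arguments object. The sketch below only builds that request. The tool name search_notes and its arguments are made up for illustration, and transport (stdio or HTTP) is left out.

```python
import itertools
import json

_request_ids = itertools.count(1)


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# `search_notes` is a hypothetical tool exposed by some vault server.
request = tool_call("search_notes", {"query": "headless auth"})
print(json.dumps(request, indent=2))
```

Every runtime emits this same shape, which is why a new server needs no per-runtime adapter.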

Jarvis: Voice as Just Another Front-End

The piece that makes it feel like science fiction is Jarvis, a wake-word loop running as an always-on macOS LaunchAgent. The pipeline is deliberately humble and entirely local until the routing step:

mic
  → openWakeWord detects "hey jarvis"
  → RMS voice-activity detection captures the utterance
  → faster-whisper (base.en, int8 on CPU) transcribes it
  → a small model routes the transcript → { agent, message, intent }
  → intent decides: speak the roster · report an agent's status ·
    or send the message to that agent's Inbox issue (resume=true)
  → edge-tts speaks the reply back

The clever bit isn’t the transcription — it’s that the router reuses the exact same “Inbox issue + resume=true” mechanism as the Telegram path. Voice isn’t a separate system; it’s another input device dispatching jobs to the same Paperclip workforce. “Hey Jarvis, ask the site agent how the build went” — and a coding agent on a VPS goes and checks while I keep cooking.
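The last two steps of that pipeline amount to a three-way branch on the routed intent. Here is a sketch of that branch. The "dispatch" intent appears later in this post, while the "roster" and "status" labels and the status_of and post_to_inbox callables are assumptions standing in for the real Paperclip calls.

```python
def handle(route, roster, status_of, post_to_inbox):
    """Turn a routed transcript into a spoken reply.

    route         -- dict with "intent" and, where relevant, "agent"/"message"
    roster        -- list of agent names
    status_of     -- hypothetical callable: agent name -> status string
    post_to_inbox -- hypothetical callable that comments on the agent's
                     Inbox issue; same path the Telegram front-end uses
    """
    intent = route.get("intent")
    if intent == "roster":
        return "Agents: " + ", ".join(roster)
    if intent == "status":
        return f"{route['agent']} is {status_of(route['agent'])}"
    if intent == "dispatch":
        post_to_inbox(route["agent"], route["message"], resume=True)
        return f"Sent to {route['agent']}"
    return "Sorry, I did not catch that."
```

Nothing in the dispatch branch knows the request came from a microphone, which is the point: the front-end is swappable.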

A Request’s Journey, End to End

To make the layering concrete, follow one instruction through the stack:

  1. I say “Hey Jarvis, ask the PR reviewer to look at the open pull requests.”
  2. Jarvis transcribes locally and routes it to a structured {agent: "PRReviewer", intent: "dispatch"}.
  3. The dispatch lands as a comment on PRReviewer’s persistent Inbox issue with resume=true — but on the Mac instance, where PRReviewer is paused. The split-brain lock means nothing executes here.
  4. The VPS instance, where PRReviewer is active, picks the work up. Its Copilot CLI agent authenticates with the injected COPILOT_GITHUB_TOKEN, clones/pulls over the tailnet, reviews the PRs, and posts back.
  5. Whatever it learned gets written to Knowledge/lessons/ or Playbooks/patterns/, tagged #agent/paperclip, into the one vault every runtime can read.
  6. I see the result on the Mac at localhost:3101, which is the VPS executor’s board forwarded home over Tailscale (the Mac’s own board stays on :3100).

One sentence of voice, and the work crossed a voice model, a chat protocol, a mesh network, a headless auth boundary, and a shared filesystem — then came home. That traversal is the operating system.

What I’d Tell You to Steal

The system still comes down to the same two ideas the first post made: one shared memory, one shared bus. Everything above is just the engineering required to make those two ideas survive two machines, a locked-down landing zone, and a server with no screen — and keep running while I’m asleep.