Inside My AI Operating System: The Architecture Running My Agents 24/7
A technical deep dive into the always-on agent stack that runs my work: a Hermes kernel on my Mac, a Paperclip workforce on a VPS, one Obsidian vault as shared memory, MCP as the syscall layer — and a Tailscale mesh holding two machines together with no open ports.
I wrote before about treating my AI tools as an operating system instead of a pile of apps: a kernel, a workforce, shared memory, a syscall layer. That post was the concept.
This one is the deep dive. I’m going to open the case and show you the wiring — the ports, the systemd units, the mesh network, the auth tokens, the exact folder layout — because the interesting part of any operating system isn’t the block diagram. It’s what happens when you make it run on real hardware, 24/7, and the abstractions have to survive contact with closed laptops, locked-down networks, and headless servers.
Here’s the whole thing in one breath: a Hermes kernel and a Jarvis voice loop live on my Mac; a Paperclip agent workforce runs always-on inside a VPS; both halves share one Obsidian vault and speak MCP; and a Tailscale mesh stitches the two machines into a single private network with zero open ports. Now let’s go under the hood.
The Two-Machine Topology
The first non-obvious decision: which pieces run where. The instinct is to put “the brain” in the cloud. The right answer was the opposite — the deciding question isn’t where should the brain live, it’s what has to keep running while the laptop is shut, and what physically can’t leave the laptop.
┌──────────── MAC (control plane) ──────────────┐
│ HERMES gateway — kernel, cron, Telegram │
│ PAPERCLIP :3100 — local board + 2 agents │
│ JARVIS — wake-word mic loop │
│ Apple / vault MCP — bound to this machine │
└───────────────────────────────────────────────┘
▲ Tailscale mesh (WireGuard, no open ports)
▼
┌──────────── VPS (always-on executor) ─────────┐
│ PAPERCLIP :3101 — systemd, 2 live agents │
│ PRReviewer · HarvestingEngineer │
│ headless Copilot CLI (token auth, no browser) │
└───────────────────────────────────────────────┘
- Hermes — the kernel — stays on the Mac. It’s light (a cron scheduler, a chat gateway, a memory curator), and it’s married to things that only exist locally: the Obsidian vault and the Apple integrations. Moving it would mean replicating both to a server for almost no benefit.
- Paperclip — the workforce — is what I actually want online around the clock. It runs autonomous GitHub Copilot CLI agents that review pull requests and harvest content against a public repo. That’s the work that should grind away while I sleep.
The kernel lives next to its memory; the workforce lives where uptime is free.
Paperclip, Precisely: Two Instances and a Split-Brain Lock
This is the piece people misread, so let me be exact. Paperclip is an agent runtime, not a note app. It runs GitHub Copilot CLI agents under ACP (Agent Client Protocol), fronted by a local HTTP API, organised around issues, projects, and goals — a tiny autonomous engineering org you can watch in a browser.
I wanted agents executing 24/7 on the VPS without giving up the local board I use to supervise them. So there are two Paperclip instances:
- Mac, on :3100: my control plane. It’s the board I open in a browser.
- VPS, on :3101: a systemd-managed executor running a cloned database with identical company and agent IDs.
Two instances pointed at the same GitHub repo is a recipe for double-stamped PR reviews. The guard is deliberately simple and exploits how Paperclip schedules work: an agent only executes when it’s active, so I keep each agent active on exactly one machine and paused on the other. PRReviewer and HarvestingEngineer run on the VPS and are paused on the Mac; the rest are active on the Mac and paused on the VPS. A paused agent won’t pick up work even when its schedule fires — that’s the lock.
The cutover order matters, and it’s the kind of thing you only catch by thinking it through before you flip switches: pause the active side first, let it drain, then resume the other side. Reverse that and there’s a window where both are live and racing for the same issues.
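The lock and the cutover order can be sketched in a few lines of Python. This is only an illustration of the invariant, not Paperclip's API: the `Placement` structure, `check_lock`, `cutover_steps`, and the agent name `ExampleMacAgent` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical model: for each agent, whether it is active on each machine.
# This is illustrative only and does not mirror Paperclip's real data model.
Placement = Dict[str, Dict[str, bool]]  # agent -> {machine: is_active}


def check_lock(placement: Placement) -> List[str]:
    """Return the agents that violate 'active on exactly one machine'."""
    violations = []
    for agent, machines in placement.items():
        active = [m for m, is_active in machines.items() if is_active]
        if len(active) != 1:
            violations.append(agent)
    return violations


def cutover_steps(agent: str, source: str, target: str) -> List[str]:
    """Order the cutover so the two sides are never active at the same time."""
    return [
        f"pause {agent} on {source}",
        f"wait for {agent} to drain on {source}",
        f"resume {agent} on {target}",
    ]


placement: Placement = {
    "PRReviewer": {"mac": False, "vps": True},
    "HarvestingEngineer": {"mac": False, "vps": True},
    "ExampleMacAgent": {"mac": True, "vps": False},  # placeholder name
}
```

Running `check_lock` before and after any change is a cheap way to catch an agent that ended up active on both sides, or on neither.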
Tailscale: Networking That Respects the Landing Zone
The original plan was unglamorous — open SSH to the VPS, run a reverse tunnel so the cloud executor can reach the Mac’s board on :3100. It worked for roughly a day at a time, then died. Repeatedly.
The cause was instructive, and the policy was right. The VPS sits in an Azure Enterprise-Scale Landing Zone, where governance periodically reconciles the network security group back to its secure baseline and removes any public inbound rule — including the SSH rule I’d added. That baseline is deny-inbound by design: a public listening port on a server is exactly the kind of attack surface the landing zone exists to eliminate. My inbound rule was the anomaly. The platform was doing its job; I was the one working against it.
So I stopped fighting the baseline and adopted an architecture it’s happy with: zero public inbound. Tailscale builds an outbound-only WireGuard mesh in which both machines dial out to a coordination server and then establish a direct, end-to-end encrypted tunnel between themselves. Nothing listens on a public interface, so there’s nothing for the landing zone to revoke and nothing for the internet to scan. It satisfies the deny-inbound policy and buys a real reduction in attack surface: the boards simply aren’t reachable from anywhere off the tailnet.
mac-laptop 100.x.x.x macOS
cloud-vps 100.x.x.y linux (direct, ~9 ms)
Everything now rides the tailnet on stable 100.x addresses:
- A reverse tunnel exposes the Mac’s Paperclip :3100 to the VPS, so cloud agents can reach the local board.
- A forward tunnel brings the VPS board back to localhost:3101 on the Mac, so I open both boards in a browser as if they were one machine.
This was the single biggest reliability upgrade in the entire system — and it got there by aligning with the policy instead of resisting it. The baseline can reconcile the NSG all day; an outbound mesh exposes nothing for it to close.
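One way to keep a board from drifting back onto a public interface is to check the bind address before starting a listener. The sketch below relies on the fact that Tailscale assigns IPv4 addresses from the CGNAT block 100.64.0.0/10, which is where the 100.x addresses above come from. The helper names `is_tailnet_address` and `safe_bind_host` are hypothetical, and nothing in the post says the stack performs this check.

```python
import ipaddress

# Tailscale hands out IPv4 addresses from the CGNAT block 100.64.0.0/10.
TAILNET_V4 = ipaddress.ip_network("100.64.0.0/10")


def is_tailnet_address(addr: str) -> bool:
    """True if addr is an IPv4 address inside the Tailscale CGNAT range."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return False  # not an IP literal at all
    return ip.version == 4 and ip in TAILNET_V4


def safe_bind_host(addr: str) -> str:
    """Allow loopback or tailnet binds; refuse anything else."""
    if addr in ("127.0.0.1", "localhost") or is_tailnet_address(addr):
        return addr
    raise ValueError(f"refusing to bind on non-tailnet address: {addr}")
```

Binding to `0.0.0.0` is rejected by design here, since that is the kind of listener a deny-inbound baseline exists to prevent.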
Headless Auth: The Real Tax on “Always-On”
Here’s the detail that separates a demo from a system that survives a reboot: GitHub Copilot CLI authenticates through a browser device-code flow, and a headless VPS has no browser.
The fix took some reading of the CLI’s own behaviour. It checks for an environment token — COPILOT_GITHUB_TOKEN — before it ever falls back to the device flow. So the durable wiring is:
- Mint a token on the Mac; write it to a 0600 env file on the VPS (~/.paperclip/copilot-token.env).
- Reference that file from the systemd unit with a drop-in:
# /etc/systemd/system/paperclip.service.d/copilot-token.conf
[Service]
EnvironmentFile=/home/<user>/.paperclip/copilot-token.env
- Every agent process the service spawns inherits the token. No browser, ever.
It survives reboots and restarts, and the agents authenticate silently. The same trick — injecting a secret through a systemd EnvironmentFile rather than a login flow — is how the whole VPS half stays hands-off.
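Writing the env file is the step where a secret is easiest to leak, so here is a minimal sketch of doing it with owner-only permissions from the first byte. The variable name `COPILOT_GITHUB_TOKEN` and the 0600 mode come from the steps above; the function names `write_token_env` and `is_owner_only` are hypothetical, and how the token gets from the Mac to the VPS is left out.

```python
import os
import stat
from pathlib import Path


def write_token_env(path: Path, token: str) -> None:
    """Write KEY=value in systemd EnvironmentFile format, readable by the owner only."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the file with mode 0600 so the secret is never world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(f"COPILOT_GITHUB_TOKEN={token}\n")
    # The mode argument above only applies on creation, so tighten existing files too.
    os.chmod(path, 0o600)


def is_owner_only(path: Path) -> bool:
    """True if group and other have no permissions on the file."""
    mode = stat.S_IMODE(path.stat().st_mode)
    return mode & 0o077 == 0
```

A check like `is_owner_only` could run at service start, so a file that was copied over with loose permissions is noticed before any agent reads it.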
Obsidian: One Vault, Three Runtimes, a Real Taxonomy
The keystone is that there is exactly one Obsidian vault — wasita brain — and three different runtimes write to it: Hermes, Paperclip’s Copilot agents, and Copilot CLI when I drive it directly. It’s plain Markdown on disk, which is the entire point: no integration glue, just files every agent can read and write.
But “just files” undersells it. The vault is structured so each top-level folder is a memory category an agent can target deterministically:
wasita brain/
├─ Inbox/ capture-first, triaged weekly
├─ Daily/ one note per day (Hermes cron writes these)
├─ Sources/Repos/ ← symlinks to every project repo on disk
├─ Projects/ hub + notes per engagement
├─ Decisions/ ADR-light decision memory
├─ Knowledge/lessons/ one note per non-obvious bug + fix
├─ Playbooks/patterns/ reusable patterns
├─ MOCs/ maps of content
└─ Archive/
Two mechanisms make this more than a folder tree:
- The project symlinks are real. Under Sources/Repos/, six symlinks point at actual repos on disk (AI-Lawyer, AI-Delivery-Methodology, the agents dojo, this very site, and more). Editing a file through the vault is editing the source: the vault and the codebase are the same bytes.
- The tag taxonomy is enforced, not decorative. A strict root AGENTS.md (≈130 lines) makes every runtime follow the same conventions (wikilinks over markdown links, frontmatter on every note, the right memory category for the right thought) and defines a hierarchical tag namespace that keeps the graph legible across minds:
#agent/hermes #agent/paperclip #agent/copilot-cli
#pattern/<domain> #lesson/<topic> #project/<slug>
That #agent/* namespace is my favourite detail: I can open a memory note and see which of my agents learned it. Three different runtimes contribute to one brain, and the tags tell you who taught it what.
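"Enforced" could also mean a linter that any runtime can run before it writes a note. The following is a minimal sketch under stated assumptions: it treats the namespaces listed above as the only ones allowed, which may be stricter than the real AGENTS.md, and the regex, `invalid_tags`, and `has_frontmatter` are hypothetical helpers.

```python
import re
from typing import List

# Namespaces taken from the taxonomy above. Treating them as the complete
# allowed set is an assumption made for this sketch.
AGENT_TAGS = {"#agent/hermes", "#agent/paperclip", "#agent/copilot-cli"}
FREE_NAMESPACES = ("pattern", "lesson", "project")

# A tag starts at a word boundary with '#', followed by slash-separated segments.
TAG_RE = re.compile(r"(?<!\S)#[A-Za-z0-9_-]+(?:/[A-Za-z0-9_-]+)*")


def invalid_tags(text: str) -> List[str]:
    """Return the tags in a note that fall outside the allowed namespaces."""
    bad = []
    for tag in TAG_RE.findall(text):
        namespace, _, rest = tag[1:].partition("/")
        if namespace == "agent":
            ok = tag in AGENT_TAGS
        else:
            ok = namespace in FREE_NAMESPACES and rest != ""
        if not ok:
            bad.append(tag)
    return bad


def has_frontmatter(text: str) -> bool:
    """True if the note opens with a '---' delimited frontmatter block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return False
    return any(line.strip() == "---" for line in lines[1:])
```

A note that fails either check can be bounced back to the agent that wrote it rather than landing in the vault.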
MCP: The Syscall Layer (and What Actually Runs)
The last layer is what stops these runtimes from being islands: the Model Context Protocol. Every runtime speaks MCP, so adding one server gives all of them the capability at once — no per-integration glue. That’s precisely a syscall layer: programs don’t ship their own disk drivers; they make a standard call.
In the interest of a real deep dive, here’s what’s actually loaded — a local, Apple-leaning stack rather than the generic one I’d sketched before:
- memory: a persistent key/value memory server
- fetch: HTTP fetching
- time: clock/timezone primitives
- Apple MCP + apple-bridge: calendar, mail, notes, and macOS automation
- vault-rag: semantic search over the Obsidian brain
- Playwright: a real browser for agents that need to click
Drop in one new server and Hermes, Paperclip’s agents, and Copilot CLI all inherit it simultaneously. Composability, uniformity, and locality — the three properties that make a syscall layer worth having.
Jarvis: Voice as Just Another Front-End
The piece that makes it feel like science fiction is Jarvis, a wake-word loop running as an always-on macOS LaunchAgent. The pipeline is deliberately humble and entirely local until the routing step:
mic
→ openWakeWord detects "hey jarvis"
→ RMS voice-activity detection captures the utterance
→ faster-whisper (base.en, int8 on CPU) transcribes it
→ a small model routes the transcript → { agent, message, intent }
→ intent decides: speak the roster · report an agent's status ·
or send the message to that agent's Inbox issue (resume=true)
→ edge-tts speaks the reply back
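The RMS voice-activity step is the least familiar stage in that pipeline, so here is a minimal sketch of the idea: measure the energy of each audio frame, start capturing at the first loud frame, and stop after a run of quiet ones. The threshold, the silence window, and the function names are assumptions for illustration, not the values or code Jarvis uses.

```python
import math
from typing import List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def capture_utterance(
    frames: Sequence[Sequence[float]],
    threshold: float = 0.02,  # assumed energy threshold, tune per microphone
    max_silence: int = 3,     # assumed: consecutive quiet frames that end capture
) -> List[Sequence[float]]:
    """Collect frames from the first loud frame until max_silence quiet frames in a row."""
    captured: List[Sequence[float]] = []
    silent_run = 0
    started = False
    for frame in frames:
        loud = rms(frame) >= threshold
        if not started:
            if loud:
                started = True
                captured.append(frame)
            continue  # ignore leading silence
        captured.append(frame)
        silent_run = 0 if loud else silent_run + 1
        if silent_run >= max_silence:
            break
    # Drop the trailing quiet frames so only speech and short pauses remain.
    if silent_run:
        captured = captured[:-silent_run]
    return captured
```

Short pauses inside a sentence survive because the quiet-frame counter resets whenever speech resumes.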
The clever bit isn’t the transcription — it’s that the router reuses the exact same “Inbox issue + resume=true” mechanism as the Telegram path. Voice isn’t a separate system; it’s another input device dispatching jobs to the same Paperclip workforce. “Hey Jarvis, ask the site agent how the build went” — and a coding agent on a VPS goes and checks while I keep cooking.
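The routing step can be pictured as a small dispatcher over the three intents. In this sketch only the `dispatch` intent and the `resume=true` flag come from the post; the intent labels `roster` and `status`, the `Route` and `InboxComment` types, and the spoken replies are hypothetical. The outbox list stands in for whatever call actually posts the comment to Paperclip.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Route:
    agent: str
    message: str
    intent: str  # "dispatch" is from the post; "roster" and "status" are assumed labels


@dataclass
class InboxComment:
    agent: str
    body: str
    resume: bool


def handle(route: Route, roster: Dict[str, str], outbox: List[InboxComment]) -> str:
    """Return the sentence to speak and queue an Inbox comment for dispatches."""
    if route.intent == "roster":
        return "Agents: " + ", ".join(sorted(roster))
    if route.intent == "status":
        return f"{route.agent} is {roster.get(route.agent, 'unknown')}"
    if route.intent == "dispatch":
        # Voice and Telegram share this path: comment on the Inbox issue with resume=true.
        outbox.append(InboxComment(route.agent, route.message, resume=True))
        return f"Sent to {route.agent}"
    return "Sorry, I did not understand that"
```

Because the dispatcher only appends a comment, a new front-end needs nothing more than a way to produce a `Route`.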
A Request’s Journey, End to End
To make the layering concrete, follow one instruction through the stack:
- I say “Hey Jarvis, ask the PR reviewer to look at the open pull requests.”
- Jarvis transcribes locally and routes it to a structured {agent: "PRReviewer", intent: "dispatch"}.
- The dispatch lands as a comment on PRReviewer’s persistent Inbox issue with resume=true, but on the Mac instance, where PRReviewer is paused. The split-brain lock means nothing executes here.
- The VPS instance, where PRReviewer is active, picks the work up. Its Copilot CLI agent authenticates with the injected COPILOT_GITHUB_TOKEN, clones/pulls over the tailnet, reviews the PRs, and posts back.
- Whatever it learned gets written to Knowledge/lessons/ or Playbooks/patterns/, tagged #agent/paperclip, into the one vault every runtime can read.
- I see the result on the Mac board at localhost:3101, which is the VPS executor forwarded home over Tailscale.
One sentence of voice, and the work crossed a voice model, a chat protocol, a mesh network, a headless auth boundary, and a shared filesystem — then came home. That traversal is the operating system.
What I’d Tell You to Steal
- Put the always-on workload in the cloud, not the brain. Keep the kernel next to its memory and hardware; ship the workforce to the server. The thing that must run at 3 a.m. is rarely the thing that’s hardest to host.
- Work with the landing zone, not against it. If governance enforces a deny-inbound baseline, don’t bolt a public port back on — adopt an outbound-only mesh that complies with the policy and shrinks your attack surface. That alignment turned my flakiest dependency into my most reliable one.
- Inject secrets, don’t log in. Anything that wants a browser flow will fight you on a server. Find the env-token escape hatch and wire it through systemd once.
- Make memory a database, not a diary. Deterministic folders plus an enforced tag taxonomy are what let three different agents share one brain without it turning to mush.
The system still comes down to the same two ideas the first post made: one shared memory, one shared bus. Everything above is just the engineering required to make those two ideas survive two machines, a locked-down landing zone, and a server with no screen — and keep running while I’m asleep.