The 5 Pillars of Agentic AI, Part 2: Memory — Why Agents Need to Forget as Much as They Remember

Your agent just recommended a cheese plate to the customer who told it, last week, that they're lactose intolerant. It didn't lie — it forgot. Studying how MemoryBear and Microsoft Foundry build real memory, the same uncomfortable truth shows up: the hard part isn't remembering. It's forgetting.

Your support agent just told a customer to enjoy the cheese plate. That customer is lactose intolerant — and told the agent so, last week.

It didn’t lie. It forgot. And an agent that forgets the one thing that mattered is worse than no agent at all, because you trusted this one.

Welcome to the pillar nobody demos and everybody needs. The demo always shows off reasoning, tools, a slick reply — nobody opens with “…and tomorrow it won’t remember you exist.” But that line is the whole distance between a party trick and a product. It’s the pillar we’re double-clicking today: memory.

(This is Part 2 of my five pillars of agentic AI deep dive. Part 1 was governance.)

The 5 Pillars of Agentic AI — Memory, State, Orchestration, Governance, Evaluation. This post zooms in on the first: memory.

“Just give it a bigger context window”

This is the wrong answer, and it’s the first one everyone reaches for.

A context window is not memory. It’s short-term attention — and it evaporates the second the session ends. Worse, it leaks during the session: self-attention weakens on long-range dependencies, so the model clings to the last thing you said and quietly drops the critical thing you said an hour ago. A bigger window is just a bigger goldfish bowl. The goldfish still forgets.

Real memory is a different machine entirely. So I read two serious attempts at building it — from opposite ends of the universe — to see what they’d agree on:

MemoryBear — 4,000+ stars, open-source, Neo4j-backed, and unapologetically biological. It models memory on the human brain: perceive → extract → associate → forget, hippocampus and synaptic pruning included.
Microsoft Foundry Agent Service Memory — the managed enterprise opposite: extraction → consolidation → retrieval, with retention policies, quotas, and a security model.

One is a neuroscience love letter. The other is a cloud SKU. They agree on five things — and those five things are the whole game.

1. Memory is a pipeline, not a bucket

Here’s the mistake nine in ten “agent memory” builds make: they treat memory as storage. Dump the transcript into a vector database, retrieve by similarity, call it memory, ship it.

Both systems reject this outright. Memory is a pipeline, and all the real work happens between store and retrieve:

MemoryBear: perceive → extract → associate → forget
Foundry: extraction → consolidation → retrieval

Same shape, different vocabulary. You don’t save conversations — you extract the durable facts out of them (“allergic to dairy,” “always books window seats”), reconcile them against what you already know, and then something becomes retrievable. The transcript is raw ore. Memory is the refined metal.

If your “memory” is your chat log with embeddings bolted on, you didn’t build a memory. You built a search index — and it will drown in its own noise inside a week.

2. Your agent needs three memories, not one

There is no single “memory.” Foundry is clearest on this: it extracts three distinct types — and the part people miss is that each one is retrieved at a different moment.

Memory type	What it holds	When you pull it
User profile	Durable preferences, personal context	Up front, once — to set the tone for the whole conversation
Chat summary	Distilled summaries of past threads	Every turn, against what’s being said right now
Procedural	How-to routines learned from past work	On demand, when the user asks for something you’ve done before

That maps almost one-to-one onto the memory tiers I argued for in Inside My AI OS, Part II: working memory for now, episodic for what happened, shared for what the whole fleet knows.

The lesson: “remember everything in one place” is a non-design. Profile facts you load at hello. Conversational continuity you fetch per turn. Procedural know-how you summon on demand. Get the triggers wrong and your agent does both unforgivable things at once — forgets the obvious, and buries you in recall you never asked for.

3. Forgetting is the feature. Remembering is the easy part.

This is the one that surprised me — and the one both systems agree on hardest, despite living in different worlds.

MemoryBear ships a literal Forgetting Engine modeled on synaptic pruning. Every memory carries a strength score that decays with time and climbs with use; when it drops below threshold it slides through dormancy → decay → clearance. The payoff is measured: redundant knowledge held under 8%, ~60% less waste than systems that hoard everything. Foundry attacks the identical problem in a suit: store-level TTL, item-level delete, and direct “remember / forget” commands the moment a user asks.

Sit with how counterintuitive that is. Two teams — one obsessed with brains, one with SLAs — both spent real engineering effort teaching their systems to throw memories away.

A memory system that only accumulates doesn’t grow wiser. It rots.

Skip forgetting and you get three failures, fast: cost (you pay to store and search an ever-growing landfill), conflict (last year’s facts contradict today’s), and noise (retrieval quality falls off a cliff as the haystack swallows the needle). Human memory forgets on purpose — it’s a feature evolved over millions of years, not a flaw we’re stuck with.

The unglamorous version for the rest of us: put a TTL on memories, decay what nobody uses, and let people say “forget that.” If you can’t delete from your agent’s memory, you don’t have a memory system. You have a liability with a growth chart.

4. Writing to memory is a merge, not an append

Both systems use an LLM to consolidate new memories against old ones — and both, independently, reach for the exact same example. When two teams pick the same example, it’s because it’s the canonical landmine: allergies.

A user says “I love cheese” in March. In June they say “I’ve gone lactose intolerant.” A naive store keeps both — and cheerfully recommends the cheese plate from the top of this post. Foundry explicitly resolves conflicting facts (“such as a new allergy”) at write time. MemoryBear runs a nightly Self-Reflection pass that hunts for contradictions and flags them.

New memory doesn’t get added. It gets reconciled — dedupe the overlaps, supersede the stale, flag the genuine conflicts.

That’s why both systems put an LLM — not just a database — on the write path. Skip it and your agent’s “memory” becomes a drawer of mutually contradictory sticky notes: each one technically true, the collection a liar.

5. A flat list of facts is a loaded gun

Here’s where these systems part ways — and the split is the lesson, because it’s the one real disagreement left in the field.

MemoryBear is graph-first. It extracts structured triples (entity → relation → entity) into Neo4j across 12 relationship types — hierarchical, causal, temporal, logical — then retrieves with a hybrid engine: Elasticsearch keyword matching fused with BERT semantic vectors, claiming 92% accuracy versus single-mode. On the other side sit the flat-store camps: Foundry keeps memory as items under a per-user scope, and AtomicMemory — an inspectable, benchmarked, pgvector-backed layer — deliberately bets against the graph entirely, wagering that explicit mutation and lineage on flat vectors beat the cost of maintaining relationships.

It’s a serious bet, made by serious engineers with benchmark numbers behind it. I still come down on the side of the graph — and here’s the hill I’ll die on:

“Aspirin treats headache” and “patient is allergic to aspirin” are each harmless. Stored flat and ungrouped, they’re a malpractice suit.

A pile of facts can’t see what a graph of relations catches. Flat-plus-semantic retrieval will cheerfully return both facts and let the model fail to connect them — because in a vector store, “related” means worded similarly, not causally linked. A graph encodes the relationship as a first-class edge, so the dangerous combination is retrievable, not just the individual facts. The flat camp is right that mutation and lineage matter enormously — that’s Lesson 4 — but mutation tells you when a fact changed, not which facts are lethal together. Those are different problems, and only the graph solves the second one.

The other half of good recall is hybrid retrieval: pure vector search hands back confident, plausible, wrong neighbours; pure keyword misses synonyms and intent. Widen with semantics, sharpen with keywords — that fusion is why the graph camp posts the accuracy numbers it does.

Match the structure to the stakes. A support bot survives on flat items and a TTL. A clinical, financial, or legal agent needs the graph — because there, a missed relationship isn’t a bad answer. It’s a casualty.

The dark side: memory is the perfect place to plant a lie

I can’t write about memory without dragging governance back in from Part 1 — because Foundry’s docs are bracingly honest here: persistent memory opens the door to prompt injection and memory corruption.

Think about what that means. Get one poisoned “fact” written into long-term memory and you haven’t compromised a session — you’ve compromised every future session. The agent will recall the lie faithfully, forever, with total confidence, never knowing it was tampered with. It’s the most patient attack in AI: write once, exploit indefinitely.

The more durable your memory, the bigger the blast radius of a single bad write.

A guardrail that screens the write path to memory isn’t a memory feature — it’s the verifiable-leash governance from Part 1, pointed at what your agent is allowed to learn. Persistent memory without a governed write path is just a confidently, permanently wrong agent waiting to happen.

Build, buy, or keep it boring?

So which do you actually reach for? Be honest about how much memory you need:

MemoryBear — the maximalist: Neo4j + Elasticsearch + Redis + PostgreSQL + Celery workers. Staggering capability, real operational weight. Right when memory is the product (an affective companion, a research assistant) and you’ll happily run the stack.
Foundry Memory — the managed path: three memory types, TTLs, item CRUD, a security model, regional rollout — in exchange for being a preview service in one cloud, with quotas and no VNet for memory stores yet. Right when you want personalization without operating a memory database.
The boring middle, which is what I run: a plain, durable store every runtime can reach — an Obsidian vault as shared memory, tiered into working / episodic / shared. No graph. No service. Just markdown in git that survives a restart and that I can grep and delete by hand. It clears the only bar that ultimately matters: it remembers across sessions, and I can make it forget on command.

You do not need a hippocampus simulator to stop your agent introducing itself to you every morning. You need extraction, a couple of types, a retention policy, and a delete button.

Steal this

Build a pipeline, not a bucket. Extract, consolidate, then retrieve. Your chat log is input, not memory.
Give it three memories, not one. Profile at hello, summaries per turn, procedures on demand.
Engineer forgetting on day one. TTLs, decay, an explicit “forget that.” A memory that only grows is a bug with a roadmap.
Make writes a merge. Reconcile new against old — dedupe, supersede, flag. Put intelligence on the write path.
Hybrid retrieval; structure where it counts. Semantics to widen, keywords to sharpen; a graph when facts are only dangerous together.
Govern what it learns. One poisoned fact outlives the session forever. Screen the write path.

Memory is the difference between a reply machine and a colleague — an agent that’s better next week because of what it learned this week. Nail it and every other pillar gets easier. Botch it and you’ve shipped the most expensive goldfish in the building.

Next: Part 3 — Orchestration, where the memory you just built has to survive being handed from one agent to the next without spilling. (For the runtime mechanics of that handoff, I already mapped the adapter trade-off.)