# Custom Claude Memory System
Persistent memory for Claude Code with SQLite + Chroma vector DB + hooks. Context continuity across sessions.
## Brief
Claude Code has per-session memory. Close the terminal window and the agent forgets everything: your preferences, your monorepo layout, why you rejected a specific library three days ago, which production incidents you debugged last month. For one-off tasks that's fine - open file, fix bug, close. For long-running development (agentic workflows, multi-week projects) it's a daily productivity bleed: I spend the first 10 minutes of every new session re-explaining where things live.
So I built my own persistent memory layer: SQLite for structured episodic memory (who / when / what / why), a Chroma vector DB for semantic retrieval, bash hooks at session start and end, and an MCP server for runtime tool-use queries. The goal: I land on a fresh `claude` prompt and within 5 seconds the agent knows where I left off yesterday, which conventions I expect it to follow, and where to find the relevant context.
## Architecture: SQLite + Chroma + hooks
Two storage layers, each doing what it does well:
- SQLite holds structured memories: `id`, `project`, `type` (`user_pref`, `feedback`, `project_fact`, `reference`), `body`, `embedding_id`, `created_at`. Fast filtering by project, type, and date; full-text fallback via FTS5.
- Chroma holds embeddings of the same `body`. Vector search gives semantic matches ("oauth implementation" matches memories about "Clerk integration via middleware"), which keyword search doesn't.
```
~/.claude-mem/
├── memories.db            ← SQLite (structured)
├── chroma/                ← Chroma collection (embeddings)
├── archives/              ← .gz JSON snapshots of old sessions
└── hooks/
    ├── session-start.sh   ← loads top 20 relevant memories
    └── session-end.sh     ← extracts new memories from session transcript
```
Hooks fire via the Claude Code hooks API: `SessionStart` injects a memory header into the system prompt; `Stop` (end of the agent loop) runs the archival job.
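For reference, the wiring in `~/.claude/settings.json` looks roughly like this (a minimal sketch; matchers, timeouts, and the transcript-path plumbing omitted):

```json
{
  "hooks": {
    "SessionStart": [
      {
        "hooks": [
          { "type": "command", "command": "~/.claude-mem/hooks/session-start.sh" }
        ]
      }
    ],
    "Stop": [
      {
        "hooks": [
          { "type": "command", "command": "~/.claude-mem/hooks/session-end.sh" }
        ]
      }
    ]
  }
}
```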
## Schema
```sql
-- ~/.claude-mem/memories.db
CREATE TABLE episodic_memory (
  id TEXT PRIMARY KEY,           -- nanoid + project hash salt
  project TEXT NOT NULL,         -- 'prace-web', 'dokladbot', ...
  type TEXT NOT NULL CHECK (type IN ('user_pref', 'feedback', 'project_fact', 'reference')),
  body TEXT NOT NULL,
  embedding_id TEXT NOT NULL,    -- foreign id in Chroma collection
  source_session TEXT NOT NULL,  -- claude session id
  importance INTEGER DEFAULT 1,  -- 1-5, drives retention
  created_at INTEGER NOT NULL,   -- ms epoch
  last_accessed_at INTEGER,      -- for LRU eviction
  access_count INTEGER DEFAULT 0
);

CREATE VIRTUAL TABLE episodic_memory_fts USING fts5(
  body, content='episodic_memory', content_rowid='rowid'
);

CREATE INDEX idx_project_type ON episodic_memory(project, type);
CREATE INDEX idx_importance_recent ON episodic_memory(importance DESC, created_at DESC);
```

The `type` taxonomy:
- `user_pref` - global preferences ("never ask, always pick option two", "prefer Bun over pnpm")
- `feedback` - explicit user reactions ("this was over-engineered", "lib X let me down on project Y")
- `project_fact` - structural project info ("monorepo with Turborepo, apps/web + apps/api", "Postgres on Neon")
- `reference` - long-lived knowledge base ("DataForSEO API uses basic auth, not bearer")
`importance` 1–5 controls retention: 1 never reaches permanent storage (single-session only); 5 is never deleted.
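A concrete row, to make the shape tangible (values are illustrative; in practice `claude-mem-store` generates the ids):

```sql
-- illustrative values only
INSERT INTO episodic_memory
  (id, project, type, body, embedding_id, source_session, importance, created_at)
VALUES
  ('mVq3x1', '_global', 'user_pref', 'prefer Bun over pnpm',
   'emb_mVq3x1', 'sess_01J8', 4, 1736160000000);
```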
## Hook lifecycle
`session-start.sh`:
```bash
#!/usr/bin/env bash
# ~/.claude-mem/hooks/session-start.sh
PROJECT=$(basename "$PWD")
DB=~/.claude-mem/memories.db

# top 20 memories for this project + global user_prefs
sqlite3 -json "$DB" <<SQL
SELECT id, type, body, importance, created_at
FROM episodic_memory
WHERE project IN ('$PROJECT', '_global')
  AND (type = 'user_pref' OR project = '$PROJECT')
ORDER BY importance DESC, last_accessed_at DESC
LIMIT 20;
SQL
```

The output is JSON, which Claude Code injects into the first system message as a `<memory-context>` block.
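Injected, the block looks something like this (shape and values are illustrative):

```
<memory-context>
[
  {"id": "mVq3x1", "type": "user_pref", "body": "prefer Bun over pnpm", "importance": 4, "created_at": 1736160000000},
  {"id": "mKd8p9", "type": "project_fact", "body": "monorepo with Turborepo, apps/web + apps/api", "importance": 3, "created_at": 1735900000000}
]
</memory-context>
```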
`session-end.sh` does the heavier lifting. It takes the session transcript, sends it through the Claude API with a strict prompt ("extract memories worth persisting, return a JSON array, max 5"), and writes the results to SQLite + Chroma:
```bash
#!/usr/bin/env bash
# ~/.claude-mem/hooks/session-end.sh
TRANSCRIPT=$1
PROJECT=$(basename "$PWD")

# 1. extract candidate memories
NEW_MEMORIES=$(claude-mem-extract --transcript "$TRANSCRIPT" --project "$PROJECT")

# 2. dedup against existing (semantic similarity > 0.85 = duplicate)
FILTERED=$(echo "$NEW_MEMORIES" | claude-mem-dedup)

# 3. write to SQLite + Chroma
echo "$FILTERED" | claude-mem-store
```
## MCP integration

A memory store is useless if Claude can't reach it at runtime. I built an MCP server (`claude-mem-mcp`) that exposes three tools:
| Tool | Purpose |
|---|---|
| `chroma_query_documents(queries: string[])` | semantic search, top-k = 5 |
| `chroma_get_documents(ids: string[])` | retrieve full body by id |
| `memory_store(type, body, importance)` | explicitly persist a memory mid-session |
The global `CLAUDE.md` has a rule: "Before starting a task, call `chroma_query_documents` with the project name + intent." The agent does this automatically, gets the 5 most relevant memories, and proceeds.
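Registering the server is one command, assuming `claude-mem-mcp` is an executable on `PATH` (flags per the current `claude mcp` CLI; check `claude mcp add --help` for your version):

```bash
# user scope, so the server follows me into every project
claude mcp add --scope user claude-mem -- claude-mem-mcp
```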
## Archival loop
Hot memories (accessed within the last 30 days, or importance ≥ 3) stay in SQLite + Chroma. A nightly cron handles the rest:
- Selects stale memories (importance ≤ 2, `last_accessed_at` older than 30 days) - see the query sketch below this list
- Compresses: sends a bulk Claude prompt ("summarize these 50 memories into 5") and stores the summarized memories with refs to the originals
- Archives the originals to `~/.claude-mem/archives/<year>-<month>.json.gz`
- Deletes the originals from hot storage
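Step 1 is a single query against the schema above (a sketch; the cron wraps it with the compress/archive/delete steps):

```sql
-- stale = low importance, untouched for 30+ days (timestamps are ms epoch)
SELECT id, body
FROM episodic_memory
WHERE importance <= 2
  AND last_accessed_at < (strftime('%s', 'now') - 30 * 86400) * 1000;
```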
Storage stays small (~50 MB even after a year of work) but history is searchable - when I need to know how I solved X 8 months ago, archive search finds it.
## Concrete metrics after 6 months
| Metric | Value |
|---|---|
| Persisted memories | 1,240 |
| Sessions with at least one memory hit | 89% |
| Avg memory recalls per session | 4.2 |
| Storage footprint | 47 MB |
| Re-explanation overhead reduction | ~90% (rough: stopwatch-timed on 5 random sessions) |
| AI cost/month (extraction + dedup + summarization) | $1.80 |
A ~90% reduction in re-explained context means the first 10 minutes of every session - where I used to say "we have a monorepo, `packages/db` has Prisma, `apps/web` is Next.js 15, ..." - becomes 30 seconds of "let's pick up where we left off yesterday."
## Lessons
- **When memory hurts.** Over-eager recall pulls in irrelevant memories ("oh, we solved this last year, reuse the pattern") even when the situation differs. The fix is an importance × recency × relevance score, not pure semantic similarity: a memory with importance 1 and 4 months of age gets near-zero weight even on a strong semantic match (see the sketch after this list).
- **Salt your hash IDs.** SHA-256 of raw content is treacherous: the same body across projects produces the same hash, and you get false dedups. Hashing `${project}:${userId}:${body}` fixes it.
- **Local SQLite > cloud DB.** A cloud DB was the first idea (sync between laptop and work machine), but query latency at session start exceeded 200 ms and ruined the UX. Local SQLite plus opt-in restic backups to encrypted S3 does the same job without an online dependency.
- **Memory taxonomy is load-bearing.** The first version had a single flat "memory" type. After 200 entries I lost track of what was a preference vs. a fact vs. feedback. The four types above keep things orderly even at scale.
- **The hooks API is underrated.** Claude Code hooks give you power without forking the ecosystem: behavior changes on my machine, nobody else sees it, no upstream PR needed.
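A sketch of the scoring from the first lesson, as I'd express it in SQLite (assumes a build with math functions, 3.35+; the Chroma relevance score gets multiplied in by the caller):

```sql
-- importance × recency: with a 30-day decay constant, a 4-month-old
-- importance-1 memory scores ~0.02 - effectively invisible
SELECT id, body,
       importance
         * exp(-((strftime('%s', 'now') * 1000 - created_at) / 86400000.0) / 30.0)
         AS decayed_weight
FROM episodic_memory
ORDER BY decayed_weight DESC;
```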