# Custom Claude Memory System
Persistent memory for Claude Code with SQLite + Chroma vector DB + hooks. Context continuity across sessions.
## Brief
Claude Code has per-session memory. Close the terminal window and the agent forgets everything: your preferences, your monorepo layout, why you rejected a specific library three days ago, which production incidents you debugged last month. For one-off tasks that's fine - open file, fix bug, close. For long-running development (agentic workflows, multi-week projects) it's a daily productivity bleed: I spend the first 10 minutes of every new session re-explaining where things live.
So I built my own persistent memory layer: SQLite for structured episodic memory (who / when / what / why), a Chroma vector DB for semantic retrieval, bash hooks at session start and end, and an MCP server for runtime tool-use queries. The goal: I land on a fresh `claude` prompt and within 5 seconds the agent knows where I left off yesterday, which conventions I expect it to follow, and where to find the relevant context.
## Architecture: SQLite + Chroma + hooks
Two storage layers, each doing what it does well:
- SQLite holds structured memories: `id`, `project`, `type` (`user_pref`, `feedback`, `project_fact`, `reference`), `body`, `embedding_id`, `created_at`. Fast filtering by project, type, and date; full-text fallback via FTS5.
- Chroma holds embeddings of the same `body`. Vector search gives semantic matches ("oauth implementation" matches memories about "Clerk integration via middleware"), which keyword search doesn't.
```
~/.claude-mem/
├── memories.db            ← SQLite (structured)
├── chroma/                ← Chroma collection (embeddings)
├── archives/              ← .gz JSON snapshots of old sessions
└── hooks/
    ├── session-start.sh   ← loads top 20 relevant memories
    └── session-end.sh     ← extracts new memories from session transcript
```
Hooks fire via the Claude Code hooks API: `SessionStart` injects a memory header into the system prompt; `Stop` (end of the agent loop) runs the archival job.
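For reference, the wiring in `~/.claude/settings.json` looks roughly like this (a minimal sketch; matchers, timeouts, and the transcript-path plumbing omitted):

```json
{
  "hooks": {
    "SessionStart": [
      {
        "hooks": [
          { "type": "command", "command": "~/.claude-mem/hooks/session-start.sh" }
        ]
      }
    ],
    "Stop": [
      {
        "hooks": [
          { "type": "command", "command": "~/.claude-mem/hooks/session-end.sh" }
        ]
      }
    ]
  }
}
```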
## Schema
```sql
-- ~/.claude-mem/memories.db
CREATE TABLE episodic_memory (
  id TEXT PRIMARY KEY,           -- nanoid + project hash salt
  project TEXT NOT NULL,         -- 'prace-web', 'dokladbot', ...
  type TEXT NOT NULL CHECK (type IN ('user_pref', 'feedback', 'project_fact', 'reference')),
  body TEXT NOT NULL,
  embedding_id TEXT NOT NULL,    -- foreign id in Chroma collection
  source_session TEXT NOT NULL,  -- claude session id
  importance INTEGER DEFAULT 1,  -- 1-5, drives retention
  created_at INTEGER NOT NULL,   -- ms epoch
  last_accessed_at INTEGER,      -- for LRU eviction
  access_count INTEGER DEFAULT 0
);

CREATE VIRTUAL TABLE episodic_memory_fts USING fts5(
  body, content='episodic_memory', content_rowid='rowid'
);

CREATE INDEX idx_project_type ON episodic_memory(project, type);
CREATE INDEX idx_importance_recent ON episodic_memory(importance DESC, created_at DESC);
```

The `type` taxonomy:
- `user_pref` - global preferences ("never ask, always pick option two", "prefer Bun over pnpm")
- `feedback` - explicit user reactions ("this was over-engineered", "lib X let me down on project Y")
- `project_fact` - structural project info ("monorepo with Turborepo, apps/web + apps/api", "Postgres on Neon")
- `reference` - long-lived knowledge base ("DataForSEO API uses basic auth, not bearer")
`importance` 1–5 controls retention: 1 never reaches permanent storage (single-session only); 5 is never deleted.
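A concrete row, to make the shape tangible (values are illustrative; in practice `claude-mem-store` generates the ids):

```sql
-- illustrative values only
INSERT INTO episodic_memory
  (id, project, type, body, embedding_id, source_session, importance, created_at)
VALUES
  ('mVq3x1', '_global', 'user_pref', 'prefer Bun over pnpm',
   'emb_mVq3x1', 'sess_01J8', 4, 1736160000000);
```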
## Hook lifecycle
`session-start.sh`:
```bash
#!/usr/bin/env bash
# ~/.claude-mem/hooks/session-start.sh
PROJECT=$(basename "$PWD")
DB=~/.claude-mem/memories.db

# top 20 memories for this project + global user_prefs
sqlite3 -json "$DB" <<SQL
SELECT id, type, body, importance, created_at
FROM episodic_memory
WHERE project IN ('$PROJECT', '_global')
  AND (type = 'user_pref' OR project = '$PROJECT')
ORDER BY importance DESC, last_accessed_at DESC
LIMIT 20;
SQL
```

The output is JSON, which Claude Code injects into the first system message as a `<memory-context>` block.
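Injected, the block looks something like this (shape and values are illustrative):

```
<memory-context>
[
  {"id": "mVq3x1", "type": "user_pref", "body": "prefer Bun over pnpm", "importance": 4, "created_at": 1736160000000},
  {"id": "mKd8p9", "type": "project_fact", "body": "monorepo with Turborepo, apps/web + apps/api", "importance": 3, "created_at": 1735900000000}
]
</memory-context>
```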
`session-end.sh` does the heavier lifting. It takes the session transcript, sends it through the Claude API with a strict prompt ("extract memories worth persisting, return a JSON array, max 5"), and writes the results to SQLite + Chroma:
```bash
#!/usr/bin/env bash
# ~/.claude-mem/hooks/session-end.sh
TRANSCRIPT=$1
PROJECT=$(basename "$PWD")

# 1. extract candidate memories
NEW_MEMORIES=$(claude-mem-extract --transcript "$TRANSCRIPT" --project "$PROJECT")

# 2. dedup against existing (semantic similarity > 0.85 = duplicate)
FILTERED=$(echo "$NEW_MEMORIES" | claude-mem-dedup)

# 3. write to SQLite + Chroma
echo "$FILTERED" | claude-mem-store
```
## MCP integration

A memory store is useless if Claude can't reach it at runtime. I built an MCP server (`claude-mem-mcp`) that exposes three tools:
| Tool | Purpose |
|---|---|
| `chroma_query_documents(queries: string[])` | semantic search, top-k = 5 |
| `chroma_get_documents(ids: string[])` | retrieve full body by id |
| `memory_store(type, body, importance)` | explicitly persist a memory mid-session |
The global `CLAUDE.md` has a rule: "Before starting a task, call `chroma_query_documents` with the project name + intent." The agent does this automatically, gets the 5 most relevant memories, and proceeds.
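Registering the server is one command, assuming `claude-mem-mcp` is an executable on `PATH` (flags per the current `claude mcp` CLI; check `claude mcp add --help` for your version):

```bash
# user scope, so the server follows me into every project
claude mcp add --scope user claude-mem -- claude-mem-mcp
```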
## Archival loop
Hot memories (accessed within the last 30 days, or importance ≥ 3) stay in SQLite + Chroma. A nightly cron handles the rest:
- Selects stale memories (importance ≤ 2, `last_accessed_at` older than 30 days) - see the query sketch below this list
- Compresses: sends a bulk Claude prompt ("summarize these 50 memories into 5") and stores the summarized memories with refs to the originals
- Archives the originals to `~/.claude-mem/archives/<year>-<month>.json.gz`
- Deletes the originals from hot storage
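Step 1 is a single query against the schema above (a sketch; the cron wraps it with the compress/archive/delete steps):

```sql
-- stale = low importance, untouched for 30+ days (timestamps are ms epoch)
SELECT id, body
FROM episodic_memory
WHERE importance <= 2
  AND last_accessed_at < (strftime('%s', 'now') - 30 * 86400) * 1000;
```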
Storage stays small (~50 MB even after a year of work) but history is searchable - when I need to know how I solved X 8 months ago, archive search finds it.
## Concrete metrics after 6 months
| Metric | Value |
|---|---|
| Persisted memories | 1,240 |
| Sessions with at least one memory hit | 89% |
| Avg memory recalls per session | 4.2 |
| Storage footprint | 47 MB |
| Re-explanation overhead reduction | ~90% (rough: stopwatch-timed on 5 random sessions) |
| AI cost/month (extraction + dedup + summarization) | $1.80 |
A ~90% reduction in re-explained context means the first 10 minutes of every session - where I used to say "we have a monorepo, `packages/db` has Prisma, `apps/web` is Next.js 15, ..." - becomes 30 seconds of "let's pick up where we left off yesterday."
## Lessons
- **When memory hurts.** Over-eager recall pulls in irrelevant memories ("oh, we solved this last year, reuse the pattern") even when the situation differs. The fix is an importance × recency × relevance score, not pure semantic similarity: a memory with importance 1 and 4 months of age gets near-zero weight even on a strong semantic match (see the sketch after this list).
- **Salt your hash IDs.** SHA-256 of raw content is treacherous: the same body across projects produces the same hash, and you get false dedups. Hashing `${project}:${userId}:${body}` fixes it.
- **Local SQLite > cloud DB.** A cloud DB was the first idea (sync between laptop and work machine), but query latency at session start exceeded 200 ms and ruined the UX. Local SQLite plus opt-in restic backups to encrypted S3 does the same job without an online dependency.
- **Memory taxonomy is load-bearing.** The first version had a single flat "memory" type. After 200 entries I lost track of what was a preference vs. a fact vs. feedback. The four types above keep things orderly even at scale.
- **The hooks API is underrated.** Claude Code hooks give you power without forking the ecosystem: behavior changes on my machine, nobody else sees it, no upstream PR needed.
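A sketch of the scoring from the first lesson, as I'd express it in SQLite (assumes a build with math functions, 3.35+; the Chroma relevance score gets multiplied in by the caller):

```sql
-- importance × recency: with a 30-day decay constant, a 4-month-old
-- importance-1 memory scores ~0.02 - effectively invisible
SELECT id, body,
       importance
         * exp(-((strftime('%s', 'now') * 1000 - created_at) / 86400000.0) / 30.0)
         AS decayed_weight
FROM episodic_memory
ORDER BY decayed_weight DESC;
```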