Shareuhack | AI Agent Memory Architecture Guide: From SQLite to Vector DBs — Pick the Right Memory Solution (2026)

April 19, 2026
Written by Luna · Researched by Mia · Reviewed by Eno · Continuously Updated · 11 min read

AI Agent Memory Architecture Guide: Three Paths to the Right Memory Solution

Does your AI agent forget everything on restart? Do you have to re-explain the entire project context every time you switch machines? That's not your fault — it's a memory architecture problem. In Q1 2026, OSS Insight reported that open-source projects related to agent memory had accumulated over 80,000 stars, showing the entire community is searching for answers. This guide helps you find the right agent memory solution, whether you're a solo dev, a startup, or deploying at enterprise scale.

TL;DR

  • Solo dev: Hmem or Engram — 5-minute setup, SQLite storage, $0/month, handles under 100K memories with ease
  • Startup: Mem0's layered memory architecture can save 90% on token costs (per Mem0's own arXiv paper vs the LOCOMO dataset — not an independent third-party test), with an SQLite + vector DB hybrid to handle user growth
  • Enterprise: agent-recall's scope-chain architecture enables project-level memory isolation, and Markdown-as-source-of-truth makes auditing possible
  • SQLite+FTS5 queries 4,300 memories in under 1ms; Pinecone p95 is ~25-50ms (independent developer community benchmarks, not controlled same-environment comparisons) — most indie projects don't need a vector database

Note: Mem0's claimed 90% token savings and 91% p95 latency reduction are self-reported paper results. Actual performance depends on your use case and memory volume.

You Don't Need a Vector Database: SQLite Wins on Both Speed and Cost for Most Indie Use Cases

"You need a vector database for agent memory" is the most common misconception of 2026.

According to benchmarks published by multiple developers on Dev.to and independent tech blogs, SQLite+FTS5 full-text search dramatically outperforms cloud vector DBs when memory stays in the tens of thousands of entries. SQLite+FTS5 recall on 4,300 memories is under 1 millisecond; at similar scale, Pinecone p95 latency is ~25-50ms, Weaviate ~8-35ms, and Chroma ~4-60ms (these figures come from different test environments — not a controlled comparison on the same machine with the same dataset; actual latency varies with vector volume).
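To see what FTS5 gives you out of the box, here's a minimal sketch using plain stdlib `sqlite3` (assuming your SQLite build ships the FTS5 extension, which most do); the table name and schema are illustrative, not Hmem's or Engram's actual internals:

```python
import sqlite3
import time

# Illustrative memory table; the schema is an assumption, not a real tool's format.
con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE memories USING fts5(title, body)")
con.executemany(
    "INSERT INTO memories(title, body) VALUES (?, ?)",
    [(f"note {i}", f"learned something about topic{i % 50}") for i in range(5000)],
)

t0 = time.perf_counter()
rows = con.execute(
    "SELECT title FROM memories WHERE memories MATCH ? ORDER BY rank LIMIT 5",
    ("topic7",),
).fetchall()
elapsed_ms = (time.perf_counter() - t0) * 1000
print(f"{len(rows)} hits in {elapsed_ms:.2f} ms")
```

On typical laptop hardware a keyword query like this over a few thousand rows completes in well under a millisecond, which is the effect the community benchmarks describe.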

The cost gap is even more striking: SQLite is a free local file, while Pinecone's paid plan starts at $50/month (with a minimum usage commitment, as of Q1 2026; pricing may change). For a side project, that price difference alone can determine your architecture choice.

That said, vector databases have their place. When your queries are primarily semantic similarity matches (e.g., "find memories most related to this description"), or your memory exceeds 100K entries and needs high-dimensional indexing, a vector database is genuinely the better choice. The key is: understand your query patterns first, then decide on architecture.

What Kind of Agent Developer Are You? Three Paths, Three Memory Architectures

Choosing a memory architecture isn't a technical decision — it's a business decision. Your user scale, privacy requirements, and budget constraints determine which path to take:

| Dimension | Solo Dev | Startup | Enterprise |
| --- | --- | --- | --- |
| User scale | 1 (yourself) | 10–1,000 users | Internal teams, multi-agent |
| Monthly budget | < $50 | $50–500 | Not the primary concern |
| Privacy requirements | Low | Medium (GDPR) | High (fully on-prem) |
| Recommended architecture | SQLite MCP | Hybrid (SQLite + vector DB) | SQLite + scope-chain + local embedding |
| Representative tools | Hmem, Engram | Mem0, LangGraph | agent-recall, Engram |

The next three sections dive into each path's concrete implementation.

Solo Path: Give Claude Code Cross-Session Memory in 5 Minutes with Hmem or Engram

If you're a solo developer working on a side project with Claude Code or Cursor, your number one need is simple: make the agent remember the last conversation. No Docker, no Python environment, no API keys required.

Hmem: Hierarchical Memory, Loads ~5k Tokens at Startup

Hmem is an MCP server that stores memory in a local SQLite file (.hmem) using a 5-level hierarchical structure. On startup, the agent loads only the L1 summary (300 entries consume ~5k tokens, roughly 17 tokens per entry) and drills down to full memories only when needed.

Setup steps (see Hmem GitHub for details):

  1. Download Hmem from the GitHub releases page and run the interactive installer
  2. The installer auto-detects your AI tools (Claude Code, Cursor, Windsurf, etc.)
  3. Choose system-level installation (memory stored in ~/.hmem/) or project-level (stored in the current directory)

The same .hmem file can be shared across Claude Code, Cursor, Windsurf, Gemini CLI, and OpenCode — switching tools won't lose your memory.

Engram: Single Go Binary, Sub-Millisecond Recall

Engram takes the minimalist route: one Go binary + one SQLite file, zero external dependencies. It uses FTS5 full-text search instead of vector matching, achieving sub-millisecond query speeds. See Engram GitHub for installation details — just download the binary for your platform from the releases page.

Engram supports four interfaces: CLI, HTTP API, MCP server, and TUI. All data lives in ~/.engram/engram.db. The agent saves memories via mem_save (including title, type, and What/Why/Where/Learned structure) and retrieves relevant context through search in the next session.

When Is Hmem Enough? When Should You Choose Engram?

  • Using Claude Code for a personal agent only: Hmem is the more straightforward choice — the interactive installer auto-configures your MCP setup
  • If you need to share memory across multiple AI tools, Hmem's cross-tool .hmem file is more convenient
  • If you prefer zero-dependency deployment and need an HTTP API or TUI, Engram's Go binary is the better fit
  • Neither requires an embedding model, both use local SQLite storage, and both cost $0/month

Memory Architecture 101: Four Memory Types and Their Storage Patterns

Before picking a tool, understand the four types of agent memory. This taxonomy comes from LangChain's official documentation and the LangMem SDK — it's the most widely adopted framework in the community:

| Memory Type | Description | Suitable Storage | Tool Examples |
| --- | --- | --- | --- |
| Working Memory | Current conversation's context window | LLM native context | No extra tools needed |
| Episodic Memory | Past conversation history, event logs | SQLite / checkpointer | Hmem, LangGraph |
| Semantic Memory | Knowledge base, facts, concepts | Vector search / FTS5 | Engram, Chroma, Pinecone |
| Procedural Memory | Operational patterns, SOPs, learned patterns | Markdown files / rule files | CLAUDE.md, agent-recall |

Most indie makers primarily need episodic memory (so the agent remembers "what we discussed last time") and procedural memory (so the agent remembers "what this project's coding style is"). If that's all you need, an SQLite MCP server is sufficient — no vector database required.

Only when you need semantic search across large volumes of unstructured knowledge (semantic memory) — for example, "find all memories related to React Server Components" — do vector embeddings become necessary.
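The semantic case is fundamentally different from keyword matching: it ranks memories by vector similarity rather than token overlap. A minimal cosine-similarity sketch (toy 3-dimensional vectors standing in for real embedding-model output):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors; a real system would get these from an embedding model.
memories = {
    "RSC notes": [0.9, 0.1, 0.0],
    "tax filing": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # e.g. an embedded query about React Server Components
best = max(memories, key=lambda name: cosine(query, memories[name]))
```

Here `best` resolves to the React-related memory even though the query shares no literal keywords with it — exactly the fuzzy-match capability FTS5 can't provide.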

Startup Path: Hybrid Architecture + Mem0 — Serve 1,000 Users While Controlling Token Costs

When your product needs to serve 10 to 1,000 users, each with their own conversation history and preferences, a pure SQLite approach hits two bottlenecks:

  1. Token cost explosion: The naive approach of stuffing all memories into the prompt means 1,000 users x an average of 500 memories = massive token consumption per request
  2. Cross-user semantic search: FTS5 keyword matching falls short of vector search in fuzzy query scenarios
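To make bottleneck 1 concrete, here's a back-of-envelope calculation. The per-memory token figure reuses Hmem's ~17 tokens/entry summary number from earlier in this article; the per-token price is an assumption, not any vendor's actual rate:

```python
# Illustrative numbers only; real prices and memory sizes will vary.
memories_per_user = 500
tokens_per_memory = 17             # ~17 tokens/entry, per Hmem's L1 summary figure
price_per_mtok_usd = 3.00          # assumed input price per 1M tokens

naive_tokens_per_request = memories_per_user * tokens_per_memory
naive_cost_per_request = naive_tokens_per_request / 1_000_000 * price_per_mtok_usd

# A layered/extractive approach at ~90% token savings (Mem0's self-reported figure):
layered_tokens_per_request = int(naive_tokens_per_request * 0.10)

print(naive_tokens_per_request, layered_tokens_per_request)
```

At 1,000 users making even a few requests a day each, that naive-injection overhead compounds into a meaningful line item, which is why extraction-based approaches exist.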

Mem0's Layered Memory Strategy

Mem0's arXiv paper (ECAI 2025) proposes a solution: dynamically extract, consolidate, and retrieve important information from conversations instead of injecting everything. The paper's self-reported benchmarks show that compared to naive full-memory injection, Mem0 reduces p95 latency by 91% and saves over 90% on token costs.

Important: These figures are self-reported by the Mem0 team, tested on the LOCOMO standardized dataset. Actual results depend on your conversation length, memory volume, and query patterns.

Practical Hybrid Architecture Recommendations

For startups, my recommended architecture is:

  • Episodic layer: Use SQLite (or PostgreSQL) to store precise conversation history and user preferences, supporting exact queries ("What was this user's last order?")
  • Semantic layer: Use a vector DB (self-hosted Chroma or managed Pinecone) for semantic search ("Find topics this user might be interested in")
  • Hierarchical loading: Adopt Hmem's layered strategy — load summaries at startup, drill down only when needed
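The three layers above can sit behind a single retrieval function. A minimal routing sketch (all names, the `prefs` schema, and the `pref:` query convention are assumptions for illustration; `vector_search` stands in for any Chroma or Pinecone client call):

```python
import sqlite3
from typing import Callable

def retrieve(
    query: str,
    user_id: str,
    con: sqlite3.Connection,
    vector_search: Callable[[str], list[str]],
) -> list[str]:
    """Route exact lookups to SQLite; fall through to the vector layer for fuzzy queries."""
    if query.startswith("pref:"):  # structured query, e.g. "pref:last_order"
        rows = con.execute(
            "SELECT value FROM prefs WHERE user_id = ? AND key = ?",
            (user_id, query.removeprefix("pref:")),
        ).fetchall()
        return [r[0] for r in rows]
    return vector_search(query)    # semantic fallback ("topics of interest")
```

The point of the split is that each query type hits the store that's cheap and precise for it, instead of forcing everything through embeddings.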

Mem0 offers a managed service option if you'd rather not build a hybrid architecture yourself. But keep in mind: managed service pricing scales with usage. It may be cost-effective early on, but you'll need to reassess costs as you grow.

When to Upgrade from SQLite to Hybrid

Based on community reports and tool documentation, here are the recommended upgrade triggers:

  • Total memory exceeds 100K entries
  • You need cross-user semantic similarity search
  • FTS5 query results aren't precise enough (recall drops)
  • You need to serve multiple concurrent users (SQLite's write lock becomes a bottleneck)

Enterprise Path: Fully On-Prem + agent-recall Scope-Chain — Fully Auditable Memory

Enterprise environments have three requirements solo devs typically don't worry about: security compliance (data can't leave the premises), data isolation (different projects' memories can't mix), and auditability (being able to answer "what's stored in the agent's memory?").

agent-recall's Scope-Chain Architecture

agent-recall is an SQLite-backed knowledge graph that manages memory through scoped entities, relations, and slots. Its MCP server provides 9 tools for agents to actively store facts. The core design is scope-chain with inheritance:

  • The same person can have different roles across different projects
  • Each agent reads and writes only within its own scope chain
  • The MCP server automatically enforces isolation — no application-layer logic needed

According to agent-recall's GitHub documentation, it's used daily in production environments with over 30 agents. All data lives in ~/.agent-recall/frames.db — a single SQLite file, fully offline.

Markdown-as-Source-of-Truth: Anti-Fragile Memory Design

The memweave and sqlite-memory projects embody an important design philosophy: Markdown is the human-readable, version-controllable, permanently portable source of truth; the SQLite index is merely a derived layer for faster queries.

What this means for enterprises:

  • Auditable: Open the Markdown file to see exactly what the agent remembers — no special tools needed
  • Rebuildable: If the SQLite file gets corrupted, rebuild the index from Markdown — no risk of permanent data loss
  • Zero vendor lock-in: No dependency on any cloud service, near-zero migration cost
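The "rebuildable" property is easy to sketch: treat the Markdown files as input and the FTS index as a throwaway derivative (directory layout and table schema here are assumptions, not memweave's or sqlite-memory's actual format):

```python
import sqlite3
from pathlib import Path

def rebuild_index(md_dir: Path, db_path: str) -> int:
    """Drop and rebuild the derived FTS5 index from Markdown source-of-truth files."""
    con = sqlite3.connect(db_path)
    con.execute("DROP TABLE IF EXISTS memories")
    con.execute("CREATE VIRTUAL TABLE memories USING fts5(path, body)")
    count = 0
    for md in sorted(md_dir.rglob("*.md")):
        con.execute(
            "INSERT INTO memories(path, body) VALUES (?, ?)",
            (str(md), md.read_text(encoding="utf-8")),
        )
        count += 1
    con.commit()
    con.close()
    return count
```

Because the index is fully regenerable, a corrupted or deleted `.db` file costs you a few seconds of reindexing, not your data.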

The Full Cost Picture: From $0 to Production

| Stage | Solution | Monthly Cost | Memory Capacity | User Scale |
| --- | --- | --- | --- | --- |
| Getting started | Hmem / Engram (SQLite MCP) | $0 | < Tens of thousands | 1 person |
| Growth | Self-hosted Chroma | $0 (infra costs separate) | < 1M entries | 10–100 |
| Scale | Pinecone Standard | $50+/month | Unlimited (usage-based) | 100–1,000+ |
| Managed | Mem0 Managed Service | Usage-based | Unlimited | Depends on plan |

The trigger to upgrade isn't "memory grew" — it's when these three signals appear simultaneously:

  1. FTS5 query precision can't meet your business requirements
  2. SQLite's single write lock is causing user experience delays
  3. You need cross-user semantic similarity search

Until these signals appear, every extra dollar spent is waste.

Privacy-First Design: Three Scenarios Where Local-First Memory Is Irreplaceable

Many developers see "local-first" as "a cheap alternative," but in these three scenarios, local-first isn't a compromise — it's the only correct choice:

Scenario 1: Personal Finance Assistant

Your agent needs to remember a user's income, expenses, and investment portfolio. Sending this data to a cloud vector DB means financial privacy risk and potential violations of local data protection laws. Local SQLite storage ensures data never leaves the user's device.

Scenario 2: Medical Records Organization

The agent processes health data and medical records. Even if cloud services claim encryption, the burden of proving regulatory compliance falls on you. A local-first architecture fundamentally eliminates the possibility of data leakage.

Scenario 3: Enterprise Code Review

The agent needs to remember codebase architectural decisions and technical debt. Source code can't be sent to Pinecone or any external service. agent-recall's scope-chain + SQLite keeps each project's memory fully isolated — IT can rest easy.

The common thread across all three: data stays on-device = GDPR compliance + offline capability + zero vendor dependency. Cloud solutions can't satisfy all three simultaneously.

Tool Selection Decision Matrix

| Use Case | Recommended Tool | Cost | Offline Capability | Scale Ceiling |
| --- | --- | --- | --- | --- |
| Personal coding agent | Hmem | $0 | Fully offline | Single user, tens of thousands |
| Personal productivity tool | Engram | $0 | Fully offline | Single user, tens of thousands |
| Multi-agent collaboration (enterprise) | agent-recall | $0 | Fully offline | 30+ agents (validated) |
| B2C chat product | Mem0 | Usage-based | Requires network | Thousands of users |
| Large-scale semantic search | Pinecone | $50+/month | Requires network | Unlimited |
| Self-hosted semantic search | Chroma (self-hosted) | $0 + infra | Can be offline | Depends on hardware |
| Conversation state management | LangGraph checkpointer | $0 | Can be offline | Depends on backend DB |
Note: The "Scale Ceiling" column represents conservative estimates based on tool documentation and community reports, not hard limits. Actual limits depend on your hardware, query patterns, and data structure.

Pre-Launch Checklist: 10 Questions to Confirm Your Agent Memory Architecture Is Ready

Before pushing memory features to production, walk through these 10 questions drawn from common pitfalls reported by the community:

  1. Backup strategy: Is your SQLite file backed up regularly? If using Markdown-as-source-of-truth, can you rebuild the SQLite index from Markdown?
  2. Memory ceiling: Do you expect memory to exceed 100K entries? If so, what's your upgrade path?
  3. Multi-agent conflicts: If multiple agents write to the same memory store simultaneously, do you have a conflict resolution mechanism? (agent-recall's scope-chain naturally solves this)
  4. Memory quality: Do you have a process to periodically clean out stale or incorrect memories? Agent memory degrades over time (memory decay)
  5. Privacy classification: Which memories can be sent to the cloud? Which must stay local? Do you have clear classification criteria?
  6. Query patterns: Does your agent primarily do exact queries ("User A's preferences") or semantic search ("memories related to React")? This determines FTS5 vs vector DB
  7. Cold start: When a new user's agent has zero memories, how much does the experience suffer? Do you have a default memory strategy?
  8. Cost monitoring: If using a cloud vector DB or managed service, have you set up usage alerts? Token costs and query costs can creep up without notice
  9. Embedding model choice: If you need vector search, which embedding model did you choose? OpenAI's embedding API is the simplest option, but enterprise deployments may need a self-hosted model
  10. Observability: Can you inspect which memories the agent stored and retrieved in each session? Debugging memory systems is more important than you'd expect
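For checklist item 1, Python's stdlib `sqlite3` exposes SQLite's online-backup API, which copies a live database safely even while the agent is writing to it. A minimal sketch (file paths are placeholders):

```python
import sqlite3

def backup_memory_db(src_path: str, dest_path: str) -> None:
    """Copy a (possibly in-use) SQLite memory file via the online backup API."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        with dest:
            src.backup(dest)   # transactionally consistent snapshot
    finally:
        src.close()
        dest.close()
```

Run this from cron or any scheduler; because the snapshot is consistent, you never need to stop the agent first. A plain `cp` of a database mid-write, by contrast, can produce a corrupt copy.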

Conclusion: Start with the Simplest Solution, Upgrade When You Need To

Agent memory architecture isn't a one-time decision — it's a process that evolves as your product grows. My recommendation is straightforward:

If you're a solo dev: Install Hmem or Engram today. In 5 minutes, your agent will stop forgetting. Wait until your memory actually exceeds 100K entries or you need semantic search before considering an upgrade.

If you're building a startup: Start with SQLite. When user scale and query demands genuinely grow, bring in Mem0 or Chroma for a hybrid architecture. Don't set up Pinecone when you only have 10 users.

If you're in an enterprise environment: agent-recall's scope-chain + Markdown-as-source-of-truth is currently the best combination for meeting security and audit requirements.

Remember one principle: premature optimization is the most common mistake in agent memory architecture. Solve the "agent keeps forgetting" problem first, then optimize your architecture gradually.

FAQ

What's the cheapest agent memory solution?

Hmem and Engram are both completely free, open-source tools that use local SQLite storage — $0/month. As long as your memory stays under tens of thousands of entries and you're a single user, either solution handles most indie maker needs.

What memory solution does LangGraph use by default?

LangGraph uses a checkpointer mechanism to store conversation state, supporting SQLite and PostgreSQL backends. This is episodic memory (conversation history), suited for tracking multi-turn dialogue state. For cross-conversation long-term semantic memory, you'll need to integrate the LangMem SDK separately.

Can Claude Code use an MCP memory server? How do I set it up?

Yes. Add the Hmem or Engram server configuration to your Claude Code MCP config file. Hmem provides an interactive installer that auto-detects your AI tools and completes the setup. Engram only requires downloading a Go binary and adding one server path line to your MCP config. Neither requires Docker or an API key.

What's the difference between agent memory and RAG?

RAG (Retrieval-Augmented Generation) pulls relevant information from an external knowledge base and injects it into the prompt — typically for static document queries. Agent memory emphasizes dynamic accumulation: the agent automatically stores important information during interactions and recalls conversation context, user preferences, and learned patterns in future sessions. The two can be used together.

Do I need to build my own embedding model?

No. If you use an SQLite+FTS5 solution (Hmem, Engram), you don't need embeddings at all — FTS5 uses full-text search instead of vector matching. If you choose a vector database, Chroma and Pinecone both have built-in embedding capabilities, or you can call OpenAI's embedding API. Only enterprise self-hosted deployments need to consider running your own embedding model.
