Real numbers, reproducible from the repo. The honest headline: a single local SQLite file retrieves the right memory ~99% of the time at k=10 — with zero network calls and zero per-recall cost.
Did the store surface a gold-evidence session in the top-k? Measured across all 500 questions, embeddings off (the air-gapped path).
| k | Session recall@k | Notes |
|---|---|---|
| @10 | 99.4% (497/500) | Only 3 misses across the whole set. |
| @20 | 99.6% (498/500) | Default reader window. |
| @50 | 100% (500/500) | Every gold session is in the candidate set. |
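The metric in the table can be sketched in a few lines. This is an illustrative computation, not the repo's benchmark code: the function name `session_recall_at_k`, the toy session ids, and the reading of "surfaced" as "at least one gold session appears in the top-k" are all assumptions.

```python
from typing import Iterable, Sequence


def session_recall_at_k(
    ranked: Sequence[Sequence[str]],
    gold: Sequence[Iterable[str]],
    k: int,
) -> float:
    """Fraction of questions with at least one gold session in the top-k."""
    if not ranked or len(ranked) != len(gold):
        raise ValueError("need one non-empty ranking and one gold set per question")
    hits = sum(
        1
        for retrieved, gold_ids in zip(ranked, gold)
        if set(retrieved[:k]) & set(gold_ids)  # any overlap counts as a hit
    )
    return hits / len(ranked)


# Toy data: three questions with hypothetical session ids.
ranked = [["s3", "s1", "s9"], ["s2", "s4", "s7"], ["s5", "s6", "s8"]]
gold = [{"s1"}, {"s7"}, {"s0"}]

recall_at_2 = session_recall_at_k(ranked, gold, k=2)  # only question 1 hits
recall_at_3 = session_recall_at_k(ranked, gold, k=3)  # questions 1 and 2 hit
```

On the real set the same ratio gives the table's figures, for example 497 hits over 500 questions is 99.4%.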

How the local path compares with a typical cloud memory service:

| Metric | Locamem (local) | Cloud memory (typical) |
|---|---|---|
| Recall latency | ~9 ms (SimHash band + FTS5; hybrid) | Network round trip: tens–hundreds of ms + tail latency |
| Write / ingest | ~3.8 ms per memory | API write + async indexing |
| API calls / recall | 0 | ≥1 (vector search), often + embedding call |
| Cost / 1,000 recalls | $0 | Metered API + embedding/model cost |
| Footprint | One SQLite file; runs on a laptop | Managed datastore + vector index, server-side |
LongMemEval: 500 long-horizon questions over ~48-session haystacks, a standard long-term-memory benchmark. We report session-level retrieval recall (the metric that decides whether the reader even sees the evidence).
64-bit SimHash (LSH) ∪ FTS5 full-text ∪ optional on-device embeddings, fused and scored with a per-facet breakdown. Embeddings off for the air-gapped numbers above.
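As a rough illustration of the candidate union described above, the sketch below combines a 64-bit SimHash nearness check with an FTS5 full-text match over an in-memory SQLite database. It is a minimal sketch under stated assumptions, not Locamem's implementation: the table name `mem_fts`, the helper names, the whitespace tokenizer, the BLAKE2b token hash, and the Hamming threshold of 16 are all hypothetical, and it assumes the local SQLite build has FTS5 enabled. Fusion, scoring, and the optional embedding channel are left out.

```python
import hashlib
import sqlite3


def simhash64(text: str) -> int:
    """64-bit SimHash over lowercase whitespace tokens (assumed tokenizer)."""
    counts = [0] * 64
    for token in text.lower().split():
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for bit in range(64):
            counts[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit in range(64) if counts[bit] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


# Hypothetical memories keyed by row id.
memories = {
    1: "deploy the staging server on friday",
    2: "deploy the staging server on monday",
    3: "grandma's lasagna recipe uses fresh basil",
}
signatures = {mid: simhash64(body) for mid, body in memories.items()}

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE mem_fts USING fts5(body)")
db.executemany("INSERT INTO mem_fts(rowid, body) VALUES (?, ?)", memories.items())


def recall_candidates(query: str, fts_terms: str, max_hamming: int = 16) -> set:
    """Union of SimHash-near ids and FTS5 matches (no scoring or fusion)."""
    q = simhash64(query)
    near = {mid for mid, sig in signatures.items() if hamming(q, sig) <= max_hamming}
    rows = db.execute("SELECT rowid FROM mem_fts WHERE mem_fts MATCH ?", (fts_terms,))
    lexical = {row[0] for row in rows}
    return near | lexical


candidates = recall_candidates("deploy the staging server on friday", "basil")
```

Here the SimHash channel returns memory 1 because the query text is identical to it (Hamming distance 0), and the FTS5 channel adds memory 3 for the term `basil`. Whether memory 2 also falls under the threshold depends on the token hashes, so the sketch makes no claim about it.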
The QA path never reads the answer or answer_session_ids — a runtime assert enforces it. Numbers come from the public set as a report, not a tuning signal.
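One way such a guard could look is a read-only wrapper that fails loudly when the QA path touches a gold field. This is a hypothetical sketch, not the repo's code: the class name `GuardedQuestion` and the sample record are invented for illustration, and only the field names `answer` and `answer_session_ids` come from the text above.

```python
from collections.abc import Mapping
from typing import Any, Iterator

FORBIDDEN_KEYS = frozenset({"answer", "answer_session_ids"})


class GuardedQuestion(Mapping):
    """Read-only view of a benchmark record that refuses gold-label access."""

    def __init__(self, record: dict) -> None:
        self._record = record

    def __getitem__(self, key: str) -> Any:
        # Raised explicitly so the check still runs under `python -O`,
        # which strips plain `assert` statements.
        if key in FORBIDDEN_KEYS:
            raise AssertionError(f"QA path must not read {key!r}")
        return self._record[key]

    def __iter__(self) -> Iterator[str]:
        # Gold fields are hidden from iteration as well.
        return (k for k in self._record if k not in FORBIDDEN_KEYS)

    def __len__(self) -> int:
        return sum(1 for _ in self)


# Hypothetical record shaped like a benchmark question.
record = {
    "question": "Where did I park?",
    "answer": "Level 3",
    "answer_session_ids": ["s7"],
}
q = GuardedQuestion(record)

visible_keys = list(q)
try:
    q["answer"]
    leaked = True
except AssertionError:
    leaked = False
```

Passing only the wrapped record into the retrieval and reader code means a stray `q["answer"]` aborts the run instead of silently inflating the score.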
# reproduce, end to end
git clone https://github.com/TeamWilcoe/locamem && cd locamem
python benchmarks/build_failure_dossier.py    # session recall@10/20/50, CPU-only
python benchmarks/bench_longmemeval_qa.py --use-solvers --model claude    # end-to-end
We separate retrieval from end-to-end answer accuracy on purpose, and we don't claim to beat anyone on answer accuracy.
The store reliably surfaces the right evidence: 99.4% session recall@10 with embeddings off. This is what Locamem owns, on-device, at $0.
End-to-end accuracy is bounded by the reader model, not by retrieval. Even an oracle GPT-4o handed perfect evidence tops out near 82%. We report ~58% with the current reader and treat the gap as a reader problem, not a retrieval one. It is not a claim of accuracy superiority over cloud products.
Roadmap: RRF rank-fusion + a local cross-encoder rerank are the next retrieval upgrades; reader-side anti-hedge + aggregation work targets the end-to-end gap. Both are tracked openly in the repo.
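For readers unfamiliar with RRF, here is a minimal sketch of reciprocal rank fusion as it is commonly defined: each item scores the sum of 1 / (k + rank) over the rankings it appears in. It illustrates the general technique only, since this is a roadmap item. The function name, the two input lists, and the conventional constant k = 60 are assumptions, not Locamem's planned parameters.

```python
from collections import defaultdict
from typing import Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> list:
    """Fuse ranked lists: score(item) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):  # ranks are 1-based
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical per-channel rankings of session ids.
simhash_ranked = ["s1", "s2", "s3"]
fts_ranked = ["s2", "s4", "s1"]

fused = reciprocal_rank_fusion([simhash_ranked, fts_ranked])
```

With these inputs, `s2` (ranks 2 and 1) edges out `s1` (ranks 1 and 3), and the single-list items `s4` and `s3` follow. RRF uses only ranks, so the channels do not need comparable raw scores.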
No account. No keys. One SQLite file and an MCP server, on your machine.
$ curl -fsSL https://locamem.com/install | bash