Context Backend for Modern Apps
A high-performance memory engine that sits alongside your database. Use it to reduce token usage for LLMs and inject associative context into your search results.
$ pip install cuemap
$ npm install cuemap
$ docker run -p 8080:8080 cuemap/engine:latest
Engine: AGPLv3 • SDKs: MIT • Commercial license available
The difference between fuzzy matching and precise truth.
Vector search retrieves fuzzy matches: for every relevant fact, you pay for chunks of noise.
CueMap intersects Context + Time + Habit. You pay only for grounded context. Precise. Deterministic.
Cut output token usage by up to 90% by injecting only high-signal memories.
Feed your LLM verifiable, deterministic context via the Grounded Recall API.
Provide the "needle in the haystack" without the haystack.
Turn vague queries into precise results.
CueMap's cue co-occurrence graph and self-learning lexicon expand user intent before it hits your database.
Your database returns only the Stripe integration logs with timeout errors. Millions of unrelated logs filtered out instantly.
Use CueMap's Context Expansion API to expand user intent before querying your database.
Track user interests in CueMap's graph to re-rank results from your main database based on individual context.
Bridge the gap between vague user queries and precise database records with explainable, auditable expansions.
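Put together, the expansion-then-re-rank flow might look like the sketch below; the client, `expand()`, and `rerank()` names are assumptions for illustration, not the documented API:

```python
# Illustrative sketch only: the cuemap client, expand(), and rerank()
# are assumptions for this example, not the documented API surface.
from cuemap import CueMap  # hypothetical Python client

def db_search(cues):
    """Stand-in for your existing database query layer."""
    return [f"log row mentioning {c}" for c in cues]

cm = CueMap("http://localhost:8080")

# 1. Expand vague user intent into explicit, auditable cues.
expansion = cm.expand(query="payment problems")

# 2. Query your own database with the expanded cue set.
rows = db_search(expansion.cues)

# 3. Re-rank results against this user's interest graph.
ranked = cm.rerank(rows, user_id="u_123")
```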
Biologically inspired. Mathematically continuous.
The Context Filter
Unlike vector search which finds "neighbors," CueMap intersects cues directly. More cues → smaller candidate set → higher precision.
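The principle is ordinary set intersection. A toy illustration (not the engine's actual data structures):

```python
# Toy illustration of cue intersection (not the engine's internals):
# each cue maps to the set of memories it appears in, and adding cues
# can only shrink the candidate set.
index = {
    "stripe":  {1, 2, 3, 7, 9},
    "timeout": {2, 3, 5, 8},
    "webhook": {3, 7, 8},
}

candidates = index["stripe"] & index["timeout"]  # {2, 3}
candidates = candidates & index["webhook"]       # {3}
# More cues -> smaller candidate set -> higher precision.
```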
The Freshness Gradient
Just as biological signals fade over distance, memory relevance decays over time as 1/(1+t). CueMap models this with a continuous gradient, so recent events naturally overpower old habits.
The Weight of Habit
"Cells that fire together, wire together." Every recall event physically strengthens the memory's signal. Critical knowledge gains massive "mass," preventing it from being displaced by trivial events.
Know exactly why every result was returned. No black boxes.
Vector search: great for similarity, hard to audit.
Good for fuzzy matching, but hard to explain why one chunk ranked above another. Time and policy versioning are usually bolted on.
Match Integrity: Proof of Work
Every score is decomposable. Every source is verifiable. Audit-ready.
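What a decomposable result could look like on the wire; the field names in this sketch are illustrative assumptions, not the actual response schema:

```python
# Illustrative only: field names are assumptions, not the actual schema.
# The point: every result can carry the terms that produced its score.
result = {
    "memory_id": "mem_8431",
    "score": 2.40,
    "breakdown": {
        "matched_cues": ["stripe", "timeout", "webhook"],  # why it matched
        "habit_weight": 6.0,   # reinforcement from prior recalls
        "freshness": 0.40,     # 1 / (1 + t) at query time
    },
    "source": "logs/2024-11-02/payments.log",  # verifiable origin
}
assert result["breakdown"]["matched_cues"]  # nothing ranks without a reason
```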
Bootstrap cues automatically. Let the engine learn your context.
Watches your codebase and files in real-time. Supports most popular file formats out of the box. Automatically extracts cues using high-precision parsers.
A self-learning map of your vocabulary. Because the Lexicon runs on its own internal CueMap engine, it uses biological reinforcement to automatically disambiguate words based on how you actually use them.
CueMap surfaces synonyms and related terms from your usage patterns automatically. No manual synonym mapping is required; the engine discovers relationships on its own.
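As a toy model of usage-driven disambiguation (a sketch of the idea, not the Lexicon's implementation):

```python
# Toy sketch of usage-driven disambiguation (not the Lexicon's internals).
from collections import Counter

# Count which cues co-occur with "python" across recalled memories.
cooccurrence = Counter()
usage = [
    ["python", "pip", "venv"],
    ["python", "asyncio", "pip"],
    ["python", "snake", "zoo"],
]
for memory in usage:
    for cue in memory:
        if cue != "python":
            cooccurrence[cue] += 1

# Reinforced associations win: here "python" leans programming, not reptiles.
print(cooccurrence.most_common(2))  # [('pip', 2), ('venv', 1)]
```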
Don't spend weeks defining synonyms. CueMap comes pre-loaded with WordNet for instant English associations.
Coming Soon: The Public Context API. Instantly inherit millions of global associations, from pop culture to deep tech, without training a single byte.
Real benchmarks on real Wikipedia data. No marketing fluff.
Ingestion latency: time to parse a raw sentence and extract its semantic cues.
| Dataset Scale (memories) | Avg Latency | P50 (Median) | Throughput | Scaling |
|---|---|---|---|---|
| 10,000 | 2.29 ms | 1.91 ms | ~436 ops/s | — |
| 100,000 | 2.06 ms | 2.08 ms | ~327 ops/s | 🟢 Flat |
| 1,000,000 | 2.34 ms | 2.00 ms | ~427 ops/s | 🟢 O(1) |
Observation: Ingestion latency is effectively O(1). Scaling from 10k → 1M memories adds no latency penalty (P50 holds at ~2.00 ms).
Recall latency: time to parse a query, perform pattern completion (context expansion), and intersect the semantic graph.
| Dataset Scale | Operation | Avg Latency | P50 (Median) | P99 (Tail) |
|---|---|---|---|---|
| 100,000 | Smart Recall (with PC) | 5.17 ms | 4.25 ms | 13.08 ms |
| 100,000 | Raw Recall (no PC) | 4.33 ms | 4.19 ms | 11.10 ms |
| 1,000,000 | Smart Recall (with PC) | 11.57 ms | 10.97 ms | 26.17 ms |
| 1,000,000 | Raw Recall (no PC) | 7.18 ms | 6.65 ms | 15.82 ms |
What is Smart Recall? Pattern Completion (PC) automatically expands your query with inferred cues from the co-occurrence graph. Even with PC enabled, a 1M-item search (10.97 ms median) finishes faster than a single 60 Hz screen refresh (~16.7 ms).
Observation: 10x more data = only 2x more latency. Sub-linear scaling means you can grow without fear.
Benchmark Setup: Machine: Apple M1 Max, 64GB RAM | Engine: single-node Rust server | Dataset: real Wikipedia articles, full NLP pipeline | Workload: 5–10 cues per memory. See full methodology →
CueMap excels at multi-dimensional, time-aware queries where explicit cue intersection matters more than semantic similarity.
Not a toy. Not a prototype. A production-grade memory engine.
Isolate & Correlate
Partition data into logical namespaces (e.g., news, stocks, logs) for strict isolation. Then, perform Cross-Namespace Unions instantly to find correlations between siloed datasets.
# Start the engine in multi-tenant mode for strict project isolation
./cuemap --multi-tenant
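Cross-namespace reads could then look like this sketch; the Python client and parameter names are illustrative assumptions, not the documented API:

```python
# Illustrative sketch: client and parameter names are assumptions,
# not the documented cuemap API surface.
from cuemap import CueMap  # hypothetical Python client

cm = CueMap("http://localhost:8080")

# Writes stay strictly isolated per namespace...
cm.remember("Fed raises rates 25bps", namespace="news")
cm.remember("NVDA drops 4% after hours", namespace="stocks")

# ...while reads can union namespaces to surface cross-silo correlations.
hits = cm.recall(
    cues=["rates", "nvda"],
    namespaces=["news", "stocks"],  # cross-namespace union
)
```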
Background Snapshots
State is automatically serialized to disk using Bincode, a compact binary format. Near-zero-latency snapshots happen in the background, with instant recovery on restart.