Shipped v0.1.0 yesterday, v0.2.0 today with cluster mode. Streaming support coming next.
Existing options lock you into one tier (LangChain's cache = LLM responses only, LangGraph's checkpointer = state only) or into one framework. This package avoids both constraints.
npm: https://www.npmjs.com/package/@betterdb/agent-cache Docs: https://docs.betterdb.com/packages/agent-cache.html Examples: https://valkeyforai.com/cookbooks/betterdb/ GitHub: https://github.com/BetterDB-inc/monitor/tree/master/packages...
Happy to answer questions.
Three tiers:
- LLM responses: if your agent calls gpt-4o with the same prompt twice, the second call returns from Valkey in under 1ms instead of hitting the API.
- Tool calls: if your agent calls get_weather("Sofia") twice with the same arguments, the cached result comes back instantly.
- Session state: what step the agent is on, user intent, LangGraph checkpoints - all persist across requests with per-field TTL.
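The three tiers above can be sketched in plain TypeScript. This is only an illustration of the caching idea, not the package's actual API: a Map stands in for Valkey, and the key scheme and TTL handling are made up for the example.

```typescript
// Illustrative sketch: a Map stands in for Valkey; the "tier:hash" key
// scheme is hypothetical, not the package's real key format.
import { createHash } from "crypto";

type Entry = { value: string; expiresAt: number };
const store = new Map<string, Entry>();

function cacheKey(tier: string, parts: unknown[]): string {
  // Deterministic key: tier prefix plus a hash of the serialized inputs.
  const digest = createHash("sha256").update(JSON.stringify(parts)).digest("hex");
  return `${tier}:${digest}`;
}

function set(key: string, value: string, ttlMs: number): void {
  store.set(key, { value, expiresAt: Date.now() + ttlMs });
}

function get(key: string): string | undefined {
  const entry = store.get(key);
  if (!entry || entry.expiresAt < Date.now()) return undefined;
  return entry.value;
}

// Tier 1: LLM responses, keyed on model + exact prompt.
const llmKey = cacheKey("llm", ["gpt-4o", "What is the capital of France?"]);
set(llmKey, "Paris", 60_000);

// Tier 2: tool calls, keyed on tool name + arguments.
const toolKey = cacheKey("tool", ["get_weather", { city: "Sofia" }]);
set(toolKey, '{"temp": 21}', 60_000);

// Tier 3: session state, one key per field so each field gets its own TTL.
set("session:abc:step", "plan", 3_600_000);
set("session:abc:intent", "book_flight", 300_000);

console.log(get(llmKey)); // a second identical call returns from cache
```

The point of hashing the full inputs is that a hit is only ever byte-for-byte exact, which is what makes the result guaranteed-correct.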
The main difference from existing options: LangChain's cache only handles LLM responses, LangGraph's checkpoint-redis only handles state (and requires Redis 8 plus modules), and neither ships OpenTelemetry or Prometheus instrumentation at the cache layer. This package puts all three tiers behind one Valkey connection with observability built in.
If you want the 'basically the same question' behavior, that's our other package - @betterdb/semantic-cache. It embeds the prompt as a vector and does similarity search, so 'What is the capital of France?' and 'Capital city of France?' both hit. The trade-off is it needs valkey-search for the vector index, while agent-cache works on completely vanilla Valkey with no modules.
In practice, agent-cache hits its cache less often than semantic-cache would, but when it does hit, you know the result is correct - there's no chance of returning a response for a question that was similar but not actually the same.
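The exact-match vs. similarity trade-off can be shown side by side. The embedding vectors and similarity threshold below are invented for the demo; a real semantic cache would call an embedding model and use a vector index (e.g. valkey-search) rather than hand-written vectors.

```typescript
// Illustrative contrast between exact-key and similarity-based lookup.
// The toy vectors and 0.95 threshold are made up for this example.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Exact cache (agent-cache style): keys must match byte-for-byte,
// so a paraphrase is a miss.
const exact = new Map<string, string>([
  ["What is the capital of France?", "Paris"],
]);
const exactHit = exact.get("Capital city of France?"); // undefined -> miss

// Semantic cache (semantic-cache style): paraphrases land near each
// other in embedding space, so the lookup hits above a threshold.
const stored = { vec: [0.9, 0.1, 0.3], answer: "Paris" }; // toy vector
const query = [0.88, 0.12, 0.31]; // toy vector for the paraphrase
const semanticHit =
  cosine(query, stored.vec) > 0.95 ? stored.answer : undefined;

console.log(exactHit);    // miss: strings differ
console.log(semanticHit); // hit: vectors are close enough
```

That threshold is exactly where the risk lives: set it loose and you can return an answer for a question that was merely similar, which is the failure mode exact matching rules out.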