Shipped v0.1.0 yesterday, v0.2.0 today with cluster mode. Streaming support coming next.
Existing options lock you into one tier (LangChain's cache = LLM responses only, LangGraph's checkpointer = state only) or into one framework. This package avoids both constraints.
npm: https://www.npmjs.com/package/@betterdb/agent-cache Docs: https://docs.betterdb.com/packages/agent-cache.html Examples: https://valkeyforai.com/cookbooks/betterdb/ GitHub: https://github.com/BetterDB-inc/monitor/tree/master/packages...
Happy to answer questions.
Three tiers:
- LLM responses: if your agent calls gpt-4o with the same prompt twice, the second call returns from Valkey in under 1ms instead of hitting the API.
- Tool calls: if your agent calls get_weather("Sofia") twice with the same arguments, the cached result comes back instantly.
- Session state: what step the agent is on, user intent, LangGraph checkpoints - all persist across requests with per-field TTL.
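The three tiers above can be sketched in plain TypeScript. This is only an illustration of the caching idea, not the package's actual API: a Map stands in for Valkey, and the key scheme and TTL handling are made up for the example.

```typescript
// Illustrative sketch: a Map stands in for Valkey; the "tier:hash" key
// scheme is hypothetical, not the package's real key format.
import { createHash } from "crypto";

type Entry = { value: string; expiresAt: number };
const store = new Map<string, Entry>();

function cacheKey(tier: string, parts: unknown[]): string {
  // Deterministic key: tier prefix plus a hash of the serialized inputs.
  const digest = createHash("sha256").update(JSON.stringify(parts)).digest("hex");
  return `${tier}:${digest}`;
}

function set(key: string, value: string, ttlMs: number): void {
  store.set(key, { value, expiresAt: Date.now() + ttlMs });
}

function get(key: string): string | undefined {
  const entry = store.get(key);
  if (!entry || entry.expiresAt < Date.now()) return undefined;
  return entry.value;
}

// Tier 1: LLM responses, keyed on model + exact prompt.
const llmKey = cacheKey("llm", ["gpt-4o", "What is the capital of France?"]);
set(llmKey, "Paris", 60_000);

// Tier 2: tool calls, keyed on tool name + arguments.
const toolKey = cacheKey("tool", ["get_weather", { city: "Sofia" }]);
set(toolKey, '{"temp": 21}', 60_000);

// Tier 3: session state, one key per field so each field gets its own TTL.
set("session:abc:step", "plan", 3_600_000);
set("session:abc:intent", "book_flight", 300_000);

console.log(get(llmKey)); // a second identical call returns from cache
```

The point of hashing the full inputs is that a hit is only ever byte-for-byte exact, which is what makes the result guaranteed-correct.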
The main difference from existing options: LangChain's cache only handles LLM responses, LangGraph's checkpoint-redis only handles state (and requires Redis 8 plus modules), and neither ships OpenTelemetry or Prometheus instrumentation at the cache layer. This package puts all three tiers behind one Valkey connection with observability built in.
If you want the 'basically the same question' behavior, that's our other package - @betterdb/semantic-cache. It embeds the prompt as a vector and does similarity search, so 'What is the capital of France?' and 'Capital city of France?' both hit. The trade-off is it needs valkey-search for the vector index, while agent-cache works on completely vanilla Valkey with no modules.
In practice, agent-cache hits its cache less often than semantic-cache would, but when it does hit, you know the result is correct - there's no chance of returning a response for a question that was similar but not actually the same.
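The exact-match vs. similarity trade-off can be shown side by side. The embedding vectors and similarity threshold below are invented for the demo; a real semantic cache would call an embedding model and use a vector index (e.g. valkey-search) rather than hand-written vectors.

```typescript
// Illustrative contrast between exact-key and similarity-based lookup.
// The toy vectors and 0.95 threshold are made up for this example.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Exact cache (agent-cache style): keys must match byte-for-byte,
// so a paraphrase is a miss.
const exact = new Map<string, string>([
  ["What is the capital of France?", "Paris"],
]);
const exactHit = exact.get("Capital city of France?"); // undefined -> miss

// Semantic cache (semantic-cache style): paraphrases land near each
// other in embedding space, so the lookup hits above a threshold.
const stored = { vec: [0.9, 0.1, 0.3], answer: "Paris" }; // toy vector
const query = [0.88, 0.12, 0.31]; // toy vector for the paraphrase
const semanticHit =
  cosine(query, stored.vec) > 0.95 ? stored.answer : undefined;

console.log(exactHit);    // miss: strings differ
console.log(semanticHit); // hit: vectors are close enough
```

That threshold is exactly where the risk lives: set it loose and you can return an answer for a question that was merely similar, which is the failure mode exact matching rules out.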