Semantic caching

Cache LLM responses and tool results using semantic similarity with Redis.

adk-redis provides semantic caching at two levels: LLM response caching and tool result caching, both backed by Redis. Caching uses ADK's callback system, so enabling it requires no changes to your agent's core logic.

How it works

Before each LLM call or tool execution, the cache checks Redis for a semantically similar earlier request. On a hit, the cached response is returned immediately and the call is skipped. On a miss, the call proceeds and its response is stored for future lookups.
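The check-then-store flow described above can be sketched in plain Python. This is an in-memory illustration only, not the adk-redis implementation: `ToySemanticCache`, `embed`, `cosine_distance`, and `cached_call` are hypothetical names, the word-count "embedding" stands in for a real vectorizer, and the use of cosine distance with a `<=` comparison is an assumption.

```python
import math
from collections import Counter
from typing import Callable, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real setup would use a vectorizer model."""
    return Counter(text.lower().split())


def cosine_distance(a: Counter, b: Counter) -> float:
    """Return 1 - cosine similarity, so 0.0 means the same direction."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0


class ToySemanticCache:
    """Hypothetical in-memory stand-in for a Redis-backed semantic cache."""

    def __init__(self, distance_threshold: float) -> None:
        self.distance_threshold = distance_threshold
        self.entries = []  # list of (embedding, response) pairs

    def lookup(self, prompt: str) -> Optional[str]:
        """Return the closest stored response if it is within the threshold."""
        query = embed(prompt)
        best_response, best_distance = None, float("inf")
        for vector, response in self.entries:
            distance = cosine_distance(query, vector)
            if distance < best_distance:
                best_response, best_distance = response, distance
        return best_response if best_distance <= self.distance_threshold else None

    def store(self, prompt: str, response: str) -> None:
        self.entries.append((embed(prompt), response))


def cached_call(
    cache: ToySemanticCache, prompt: str, call_model: Callable[[str], str]
) -> Tuple[str, bool]:
    """Return (response, was_cache_hit); call the model only on a miss."""
    hit = cache.lookup(prompt)
    if hit is not None:
        return hit, True
    response = call_model(prompt)
    cache.store(prompt, response)
    return response, False


model_calls = []


def fake_model(prompt: str) -> str:
    """Stand-in for an LLM call that records how often it runs."""
    model_calls.append(prompt)
    return f"reply to: {prompt}"


cache = ToySemanticCache(distance_threshold=0.1)
first = cached_call(cache, "what is the capital of france", fake_model)
second = cached_call(cache, "what is the capital of france today", fake_model)
third = cached_call(cache, "how do i bake sourdough bread", fake_model)
```

In this sketch the second prompt is close enough to the first to reuse its response, while the third is unrelated and triggers a fresh call, so the stand-in model runs twice for three requests.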

Cache providers

Two backends are available:

| Provider | Embeddings | Setup | Best for |
| --- | --- | --- | --- |
| RedisVLCacheProvider | Local (you provide vectorizer) | Self-managed Redis | Full control |
| LangCacheProvider | Server-side (managed) | API key from Redis Cloud | Zero embedding overhead |

RedisVL provider (local embeddings)

from redisvl.utils.vectorize import HFTextVectorizer
from adk_redis.cache import RedisVLCacheProvider, RedisVLCacheProviderConfig

provider = RedisVLCacheProvider(
    config=RedisVLCacheProviderConfig(
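        redis_url="redis://localhost:6379",  # self-managed Redis instance
        name="my_cache",  # name of the cache
        ttl=3600,  # entry lifetime (one hour, assuming the unit is seconds)
        distance_threshold=0.1,  # maximum distance that still counts as a hit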
    ),
    vectorizer=HFTextVectorizer(model="redis/langcache-embed-v1"),
)

LangCache provider (managed)

No local vectorizer needed. Embeddings are generated server-side.

from adk_redis.cache import LangCacheProvider, LangCacheProviderConfig

provider = LangCacheProvider(
    config=LangCacheProviderConfig(
        cache_id="your-cache-id",
        api_key="your-api-key",
        ttl=3600,
    )
)

LLM response cache

Intercepts model calls through ADK's before_model_callback and after_model_callback.

from google.adk.agents import Agent

from adk_redis.cache import (
    LLMResponseCache,
    LLMResponseCacheConfig,
    create_llm_cache_callbacks,
)

llm_cache = LLMResponseCache(
    provider=provider,
    config=LLMResponseCacheConfig(
        first_message_only=True,
        include_app_name=True,
        include_user_id=True,
    ),
)

before_cb, after_cb = create_llm_cache_callbacks(llm_cache)

agent = Agent(
    name="cached_agent",
    model="gemini-2.0-flash",
    instruction="You are a helpful assistant.",
    before_model_callback=before_cb,
    after_model_callback=after_cb,
)

Configuration notes

first_message_only=True caches only the first message in a session. Later messages depend on conversation context, making cache hits unreliable.
Function call responses and errors are automatically excluded from caching.
distance_threshold (set on the provider) controls how similar two prompts must be for a cache hit: 0.0 allows exact matches only, and 0.1 tolerates small phrasing variations. Higher values produce more hits but increasingly risk returning a response cached for a different question.
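To make the threshold trade-off concrete, here is a small worked example on hand-picked 2-D vectors. It assumes the distance is cosine distance and that a hit means distance at or below the threshold; `cosine_distance` and `is_cache_hit` are illustrative helpers, not part of adk-redis, and the vectors are made up rather than real embeddings.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def is_cache_hit(query_vec, cached_vec, distance_threshold):
    """Assumed rule: a hit when the distance does not exceed the threshold."""
    return cosine_distance(query_vec, cached_vec) <= distance_threshold


cached = [1.0, 0.0]       # embedding of a stored prompt
rephrased = [0.95, 0.2]   # small phrasing variation, distance about 0.02
different = [0.6, 0.8]    # a different question, distance about 0.4

# Threshold 0.0: only the identical vector matches.
exact_only = [is_cache_hit(v, cached, 0.0) for v in (cached, rephrased, different)]

# Threshold 0.1: the rephrased prompt matches, the different one does not.
tolerant = [is_cache_hit(v, cached, 0.1) for v in (cached, rephrased, different)]

# Threshold 0.5: the different question also matches, which is the failure mode
# a high threshold invites.
too_loose = [is_cache_hit(v, cached, 0.5) for v in (cached, rephrased, different)]
```

Raising the threshold from 0.1 to 0.5 turns the unrelated vector into a hit, so a sensible starting point is a low value that is loosened only after checking real prompt pairs.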

Tool result cache

Caches tool executions using before_tool_callback and after_tool_callback.

from adk_redis.cache import (
    ToolCache,
    ToolCacheConfig,
    create_tool_cache_callbacks,
)

tool_cache = ToolCache(
    provider=provider,
    config=ToolCacheConfig(
        tool_names={"web_search", "get_weather"},
    ),
)

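before_tool_cb, after_tool_cb = create_tool_cache_callbacks(tool_cache)

# Pass the callbacks to the agent, as with the LLM response cache:
# Agent(..., before_tool_callback=before_tool_cb, after_tool_callback=after_tool_cb)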

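The tool_names set lists the tools whose results are cached. Cache only tools that are safe to skip on a repeated call: get_weather is a reasonable candidate, while send_email has a side effect that must happen every time.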
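The allowlist idea can be illustrated with a minimal sketch. `ToyToolCache` is a hypothetical stand-in, not the adk-redis `ToolCache`: it matches on the exact tool name and arguments instead of semantic similarity, and it keeps results in a dictionary instead of Redis.

```python
import json


class ToyToolCache:
    """Hypothetical exact-match tool cache gated by an allowlist of tool names."""

    def __init__(self, tool_names):
        self.tool_names = set(tool_names)
        self._results = {}

    def _key(self, tool_name, args):
        # Sort keys so that argument order does not change the cache key.
        return tool_name + ":" + json.dumps(args, sort_keys=True)

    def run(self, tool_name, args, tool_fn):
        # Tools outside the allowlist always execute, so side effects still happen.
        if tool_name not in self.tool_names:
            return tool_fn(**args)
        key = self._key(tool_name, args)
        if key not in self._results:
            self._results[key] = tool_fn(**args)
        return self._results[key]


weather_runs = []
sent = []


def get_weather(city):
    """Read-only tool: safe to answer from cache."""
    weather_runs.append(city)
    return f"sunny in {city}"


def send_email(to):
    """Side-effecting tool: must run on every call."""
    sent.append(to)
    return "sent"


tool_cache = ToyToolCache(tool_names={"get_weather"})

for _ in range(2):
    weather = tool_cache.run("get_weather", {"city": "Paris"}, get_weather)
    status = tool_cache.run("send_email", {"to": "a@example.com"}, send_email)
```

After two rounds, the weather tool has executed once and the email tool twice, which is the behavior the allowlist is meant to guarantee.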

More info

semantic_cache example: Local caching with RedisVL
langcache_cache example: Managed caching with LangCache