Metrion can cache AI responses at the proxy level. When the same request is sent more than once, Metrion returns the stored response without calling the provider — no tokens consumed, no cost, response in ~80ms instead of 2–4 seconds.Documentation Index
Fetch the complete documentation index at: https://metrion.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
Activation
Caching is opt-in and never enabled automatically. You control it per-request via an HTTP header.| Header value | Behavior |
|---|---|
x-metrion-cache: true | Read + write. Returns cached response if available, otherwise calls the provider and stores the result. |
x-metrion-cache: refresh | Forces a provider call even if a cached entry exists, then updates the cache with the new response. |
| (absent) | Cache disabled — normal proxy behavior. |
Custom TTL
By default, cached responses are kept for 7 days. Override with theCache-Control header:
Code examples
Force refresh
Custom TTL (1 hour)
How cache matching works
Cache key
The cache key is a SHA-256 hash of:max_tokens, temperature, stream, or any other parameter.
Storage
Responses are stored in Redis (Upstash, EU West — Ireland). Streamed responses are stored chunk by chunk and replayed as a real stream on a cache hit — your code does not need to handle two different response modes.Use cases
What benefits from caching
FAQ bots and automated support When multiple users ask the same question, the provider is called only once. The second user (and every subsequent one) gets the cached response instantly.What caching doesn’t cover
Conversations with history In a chat, each request includes the full conversation history. Themessages[] array grows with every exchange — the hash is different each time, so cache hits are not possible.
Templates with variable content
If your prompt changes on every call (summarizing different documents, analyzing different products), the hash will differ each time.
Verifying cache hits
In the Logs page of your Metrion dashboard, requests served from cache display a green Cached badge. They appear with 0 input tokens, 0 output tokens, and $0.00 cost. The Dashboard overview tab shows a Savings card with the total amount saved through caching for the selected period.Limits
| Parameter | Value |
|---|---|
| Default TTL | 7 days |
| Maximum TTL | 365 days |
| Storage | Upstash Redis, EU West (Ireland) |
| Supported providers | All Metrion providers (Anthropic, OpenAI, Gemini, Mistral, Grok) |
| Cache type | Exact-match (model + messages) |