Response Cache

Metrion can cache AI responses at the proxy level. When an identical request is sent again, Metrion returns the stored response without calling the provider: no tokens are consumed, there is no cost, and the response arrives in about 80 ms instead of 2–4 seconds.

Activation

Caching is opt-in and never enabled automatically. You control it per-request via an HTTP header.

Header value	Behavior
`x-metrion-cache: true`	Read + write. Returns cached response if available, otherwise calls the provider and stores the result.
`x-metrion-cache: refresh`	Forces a provider call even if a cached entry exists, then updates the cache with the new response.
(absent)	Cache disabled — normal proxy behavior.
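
The three behaviors in the table can be modeled with a small in-memory sketch. This is an illustration only, not Metrion's implementation: `CacheModel`, `handle`, and the string-based "provider" are hypothetical names introduced here, and the request string stands in for whatever key the proxy actually uses.

```typescript
// Hypothetical in-memory model of the three header behaviors (not Metrion's code).
type CacheMode = "true" | "refresh" | undefined; // undefined = header absent

class CacheModel {
  private store = new Map<string, string>();
  providerCalls = 0;

  // Stand-in for the upstream provider call.
  private callProvider(request: string): string {
    this.providerCalls += 1;
    return `response #${this.providerCalls} to "${request}"`;
  }

  handle(request: string, mode: CacheMode): string {
    if (mode === undefined) {
      // Header absent: plain proxying, the cache is neither read nor written.
      return this.callProvider(request);
    }
    if (mode === "true") {
      const hit = this.store.get(request);
      if (hit !== undefined) return hit; // cache hit: no provider call
    }
    // Cache miss, or "refresh": call the provider and store the result.
    const fresh = this.callProvider(request);
    this.store.set(request, fresh);
    return fresh;
  }
}
```

Sending the same request twice with mode "true" leaves `providerCalls` at 1, while a following "refresh" raises it to 2 and replaces the stored entry.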

Custom TTL

By default, cached responses are kept for 7 days. To override this, send a Cache-Control header with max-age set to the desired lifetime in seconds:

Cache-Control: max-age=3600

Maximum TTL: 365 days.
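
As a rough sketch, the header pair for a custom TTL could be built by a small helper. `cacheHeaders` and `MAX_TTL_SECONDS` are hypothetical names, and clamping on the client side is an assumption made here; this page does not say how the proxy treats a max-age above 365 days.

```typescript
// 365 days expressed in seconds (the documented maximum TTL).
const MAX_TTL_SECONDS = 365 * 24 * 60 * 60;

// Hypothetical helper: headers for a cached request with an optional custom TTL.
function cacheHeaders(ttlSeconds?: number): Record<string, string> {
  const headers: Record<string, string> = { "x-metrion-cache": "true" };
  if (ttlSeconds === undefined) {
    return headers; // no Cache-Control header: the 7-day default applies
  }
  if (!Number.isInteger(ttlSeconds) || ttlSeconds <= 0) {
    throw new RangeError("ttlSeconds must be a positive integer");
  }
  // Assumption: clamp locally instead of relying on proxy behavior for larger values.
  const ttl = Math.min(ttlSeconds, MAX_TTL_SECONDS);
  headers["Cache-Control"] = `max-age=${ttl}`;
  return headers;
}
```

The returned object can be passed as `defaultHeaders` when constructing an SDK client.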

Code examples

// Anthropic SDK: route requests through the Metrion proxy with caching enabled on every call
import Anthropic from '@anthropic-ai/sdk'

const client = new Anthropic({
  apiKey: 'sk-metrion-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
  baseURL: 'https://www.metrion.dev/api/proxy',
  defaultHeaders: {
    'x-metrion-cache': 'true',
  },
})

const response = await client.messages.create({
  model: 'claude-haiku-4-5-20251001',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'What are your delivery times?' }],
})

// OpenAI SDK: same setup, using the OpenAI-compatible proxy path
import OpenAI from 'openai'

const client = new OpenAI({
  apiKey: 'sk-metrion-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
  baseURL: 'https://www.metrion.dev/api/proxy/openai/v1',
  defaultHeaders: {
    'x-metrion-cache': 'true',
  },
})

const response = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'What are your delivery times?' }],
})

# cURL: the cache header is sent alongside the usual Anthropic headers
curl https://www.metrion.dev/api/proxy/v1/messages \
  -H "x-api-key: sk-metrion-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
  -H "anthropic-version: 2023-06-01" \
  -H "x-metrion-cache: true" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-haiku-4-5-20251001",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "What are your delivery times?"}]
  }'

Force refresh

defaultHeaders: {
  'x-metrion-cache': 'refresh',
}

Custom TTL (1 hour)

defaultHeaders: {
  'x-metrion-cache': 'true',
  'Cache-Control': 'max-age=3600',
}
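
The fragments above set the header for every request through defaultHeaders. Because caching is controlled per request, the header can also be chosen call by call. The sketch below mirrors the curl example and builds the arguments for `fetch` without sending anything; `buildMessagesRequest` and `ProxyRequest` are hypothetical names introduced for illustration.

```typescript
interface ProxyRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Hypothetical helper mirroring the curl example; `cache` is decided per call.
function buildMessagesRequest(
  apiKey: string,
  prompt: string,
  cache?: "true" | "refresh",
): ProxyRequest {
  const headers: Record<string, string> = {
    "x-api-key": apiKey,
    "anthropic-version": "2023-06-01",
    "Content-Type": "application/json",
  };
  // Omitting the header leaves caching disabled for this request.
  if (cache !== undefined) headers["x-metrion-cache"] = cache;
  return {
    url: "https://www.metrion.dev/api/proxy/v1/messages",
    init: {
      method: "POST",
      headers,
      body: JSON.stringify({
        model: "claude-haiku-4-5-20251001",
        max_tokens: 1024,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}
```

A call would then look like `const { url, init } = buildMessagesRequest(key, "What are your delivery times?", "true")` followed by `await fetch(url, init)`.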

How cache matching works

Cache key

The cache key is a SHA-256 hash of:

model + messages[]

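Two requests are considered identical if they use the same model and the same messages array, regardless of max_tokens, temperature, stream, or any other parameter. A request that differs only in those other parameters is therefore served the entry cached for the earlier request.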
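
A minimal sketch of such a key, using Node's built-in crypto module. The exact serialization Metrion hashes is not described here, so hashing the JSON of `[model, messages]` is an assumption, and `cacheKey` and `ChatRequest` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

interface Message {
  role: string;
  content: string;
}

interface ChatRequest {
  model: string;
  messages: Message[];
  [param: string]: unknown; // max_tokens, temperature, stream, ...
}

// Hypothetical key derivation: only model and messages contribute to the hash.
function cacheKey(req: ChatRequest): string {
  // Assumed serialization; note that JSON.stringify is sensitive to property order.
  const material = JSON.stringify([req.model, req.messages]);
  return createHash("sha256").update(material).digest("hex");
}
```

Under this sketch, two requests that differ only in max_tokens or temperature produce the same key, while appending a message or changing the model produces a different one.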

Storage

Responses are stored in Redis (Upstash, EU West — Ireland). Streamed responses are stored chunk by chunk and replayed as a real stream on a cache hit — your code does not need to handle two different response modes.

Use cases

What benefits from caching

FAQ bots and automated support

When multiple users ask the same question, the provider is called only once. The second user (and every subsequent one) gets the cached response instantly.

User A: "What are your delivery times?" → provider call (~2s)
User B: "What are your delivery times?" → cache hit (0 tokens, ~80ms)
User C: "What are your delivery times?" → cache hit (0 tokens, ~80ms)

Development and testing

When you're iterating on a prompt in a loop, you only pay for the first call. Every subsequent identical request is free.

Automatic retries

If your app retries the same request after a network error, the second attempt will be a cache hit, as long as the first response was successfully stored.

Batch processing with duplicates

If your pipeline processes data that repeats (same text submitted multiple times), duplicates are free.
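
For a batch with duplicates, the expected split between provider calls and cache hits can be estimated before running it. This is a back-of-the-envelope sketch with a hypothetical `estimateBatch` helper; it assumes every prompt is sent with the same model and with caching enabled, that items are processed one after another, and that no entry expires during the run.

```typescript
// Hypothetical estimate: each distinct prompt costs one provider call,
// every repeat of an already-seen prompt is a cache hit.
function estimateBatch(prompts: string[]): { providerCalls: number; cacheHits: number } {
  const unique = new Set(prompts);
  return {
    providerCalls: unique.size,
    cacheHits: prompts.length - unique.size,
  };
}
```

Concurrent pipelines may see fewer hits than this estimate if identical requests are in flight at the same time, since neither has been stored yet.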

What caching doesn’t cover

Conversations with history

In a chat, each request includes the full conversation history. The messages[] array grows with every exchange, so the hash is different each time and cache hits are not possible.

Templates with variable content

If your prompt changes on every call (summarizing different documents, analyzing different products), the hash will differ each time.

Verifying cache hits

In the Logs page of your Metrion dashboard, requests served from cache display a green Cached badge. They appear with 0 input tokens, 0 output tokens, and $0.00 cost. The Dashboard overview tab shows a Savings card with the total amount saved through caching for the selected period.

Limits

Parameter	Value
Default TTL	7 days
Maximum TTL	365 days
Storage	Upstash Redis, EU West (Ireland)
Supported providers	All Metrion providers (Anthropic, OpenAI, Gemini, Mistral, Grok)
Cache type	Exact-match (model + messages)
