Metrion can cache AI responses at the proxy level. When the same request is sent more than once, Metrion returns the stored response without calling the provider — no tokens consumed, no cost, response in ~80ms instead of 2–4 seconds.

Activation

Caching is opt-in and never enabled automatically. You control it per-request via an HTTP header.
| Header value | Behavior |
| --- | --- |
| x-metrion-cache: true | Read + write. Returns the cached response if available, otherwise calls the provider and stores the result. |
| x-metrion-cache: refresh | Forces a provider call even if a cached entry exists, then updates the cache with the new response. |
| (absent) | Cache disabled: normal proxy behavior. |
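
The three header modes can be modeled as a small read-through cache. This is only a sketch of the documented behavior, not Metrion's implementation: `handleRequest`, `cacheStore`, and the synchronous `callProvider` callback are hypothetical names, and an in-memory `Map` stands in for the real storage.

```typescript
// Hypothetical in-memory model of the three header modes (not Metrion's code).
type CacheMode = 'true' | 'refresh' | undefined

const cacheStore = new Map<string, string>()

function handleRequest(
  mode: CacheMode,
  key: string,
  callProvider: () => string,
): { body: string; fromCache: boolean } {
  // Header absent: plain proxy behavior, the cache is neither read nor written.
  if (mode === undefined) {
    return { body: callProvider(), fromCache: false }
  }
  // 'true': serve the stored entry if one exists.
  if (mode === 'true') {
    const cached = cacheStore.get(key)
    if (cached !== undefined) {
      return { body: cached, fromCache: true }
    }
  }
  // Cache miss under 'true', or any 'refresh': call the provider and store the result.
  const body = callProvider()
  cacheStore.set(key, body)
  return { body, fromCache: false }
}
```

In this model, `refresh` never reads the cache but always overwrites the entry, so the next `true` request receives the refreshed response.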

Custom TTL

By default, cached responses are kept for 7 days. Override this with the Cache-Control header, where max-age is the TTL in seconds:
Cache-Control: max-age=3600
Maximum TTL: 365 days.
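
A small helper can keep the two headers consistent. The sketch below is illustrative: `buildCacheHeaders` and `MAX_TTL_SECONDS` are hypothetical names. Clamping to 365 days on the client and rejecting non-positive values are choices made for this example, since how Metrion treats out-of-range values is not specified here.

```typescript
// Documented maximum TTL: 365 days, expressed in seconds.
const MAX_TTL_SECONDS = 365 * 24 * 60 * 60

// Hypothetical helper: builds the cache headers for one request or for defaultHeaders.
function buildCacheHeaders(
  mode: 'true' | 'refresh',
  ttlSeconds?: number,
): Record<string, string> {
  const headers: Record<string, string> = { 'x-metrion-cache': mode }
  if (ttlSeconds !== undefined) {
    if (!Number.isInteger(ttlSeconds) || ttlSeconds <= 0) {
      throw new RangeError('ttlSeconds must be a positive integer')
    }
    // Client-side clamp to the documented maximum (an assumption of this sketch).
    const clamped = Math.min(ttlSeconds, MAX_TTL_SECONDS)
    headers['Cache-Control'] = `max-age=${clamped}`
  }
  // Without a TTL, no Cache-Control header is sent and the 7-day default applies.
  return headers
}

const oneHour = buildCacheHeaders('true', 3600)
// oneHour is { 'x-metrion-cache': 'true', 'Cache-Control': 'max-age=3600' }
```

The returned object has the same shape as the `defaultHeaders` option used in the code examples.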

Code examples

import Anthropic from '@anthropic-ai/sdk'

const client = new Anthropic({
  apiKey: 'sk-metrion-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
  baseURL: 'https://www.metrion.dev/api/proxy',
  defaultHeaders: {
    'x-metrion-cache': 'true',
  },
})

const response = await client.messages.create({
  model: 'claude-haiku-4-5-20251001',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'What are your delivery times?' }],
})

Force refresh

// Same client options as above; only the header value changes
defaultHeaders: {
  'x-metrion-cache': 'refresh',
}

Custom TTL (1 hour)

// max-age is in seconds: 3600 = 1 hour
defaultHeaders: {
  'x-metrion-cache': 'true',
  'Cache-Control': 'max-age=3600',
}

How cache matching works

Cache key

The cache key is a SHA-256 hash of:
model + messages[]
Two requests are considered identical if they use the same model and the same messages array — regardless of max_tokens, temperature, stream, or any other parameter.
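
To make the matching rule concrete, here is a minimal sketch of how such a key could be derived. The exact serialization Metrion hashes is not documented here, so the `JSON.stringify` step, the `cacheKey` function, and the `ChatRequest` type are assumptions for illustration only.

```typescript
import { createHash } from 'node:crypto'

interface ChatMessage { role: string; content: string }

// Hypothetical request shape; only model and messages feed the key.
interface ChatRequest {
  model: string
  messages: ChatMessage[]
  max_tokens?: number
  temperature?: number
  stream?: boolean
}

function cacheKey(req: ChatRequest): string {
  // Assumed serialization: JSON of { model, messages }. Every other parameter is ignored.
  // Note that JSON.stringify is sensitive to property order inside each message.
  const material = JSON.stringify({ model: req.model, messages: req.messages })
  return createHash('sha256').update(material).digest('hex')
}

const question: ChatMessage[] = [{ role: 'user', content: 'What are your delivery times?' }]
const short = cacheKey({ model: 'claude-haiku-4-5-20251001', messages: question, max_tokens: 256 })
const long = cacheKey({ model: 'claude-haiku-4-5-20251001', messages: question, max_tokens: 1024 })
// short === long: max_tokens does not take part in the key.
```

One practical consequence of this rule: a response generated with one `max_tokens` or `temperature` value can be served to a later request that asked for a different one.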

Storage

Responses are stored in Redis (Upstash, EU West — Ireland). Streamed responses are stored chunk by chunk and replayed as a real stream on a cache hit — your code does not need to handle two different response modes.
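
The record-and-replay idea can be sketched as follows. This is a simplified model, not Metrion's code: an in-memory `Map` stands in for Redis, synchronous generators stand in for real network streams, and `recordStream`, `replayStream`, and `streamStore` are hypothetical names. Committing the entry only after the stream completes is also an assumption of this sketch.

```typescript
// Hypothetical chunk store: cache key -> ordered list of stream chunks.
const streamStore = new Map<string, string[]>()

// Passes chunks through to the caller while buffering them for the cache.
function* recordStream(key: string, source: Iterable<string>): Generator<string> {
  const chunks: string[] = []
  for (const chunk of source) {
    chunks.push(chunk)
    yield chunk
  }
  // Commit only once the source has ended, so a partial stream is never replayed.
  streamStore.set(key, chunks)
}

// Replays a stored response chunk by chunk, in the original order.
function* replayStream(key: string): Generator<string> {
  const chunks = streamStore.get(key)
  if (chunks === undefined) {
    throw new Error(`no cached stream for key ${key}`)
  }
  yield* chunks
}

const live = Array.from(recordStream('k1', ['Del', 'ivery ', 'takes 2 days.']))
const replayed = Array.from(replayStream('k1'))
// live and replayed contain the same three chunks in the same order.
```

Because the replay is itself a stream, the consuming loop is the same for a live response and a cached one.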

Use cases

What benefits from caching

FAQ bots and automated support: When multiple users ask the same question, the provider is called only once. The second user (and every subsequent one) gets the cached response instantly.
User A: "What are your delivery times?" → provider call (~2s)
User B: "What are your delivery times?" → cache hit (0 tokens, ~80ms)
User C: "What are your delivery times?" → cache hit (0 tokens, ~80ms)
Development and testing: When you’re iterating on a prompt in a loop, you only pay for the first call. Every subsequent identical request is free.

Automatic retries: If your app retries the same request after a network error, the second attempt will be a cache hit, as long as the first response was successfully stored.

Batch processing with duplicates: If your pipeline processes data that repeats (same text submitted multiple times), duplicates are free.
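
For a batch, the expected split between provider calls and cache hits follows from counting distinct requests. The sketch below is a rough estimate under stated assumptions: every request is sent with `x-metrion-cache: true`, all of them fall within the TTL, and each first response is stored before its duplicates arrive. `estimateCacheHits` is a hypothetical helper, and each input string stands for one request's model plus messages.

```typescript
// Hypothetical estimator: one provider call per distinct request, a hit for every repeat.
function estimateCacheHits(requests: string[]): { providerCalls: number; cacheHits: number } {
  const seen = new Set<string>()
  let cacheHits = 0
  for (const request of requests) {
    if (seen.has(request)) {
      cacheHits += 1 // identical to an earlier request: served from cache
    } else {
      seen.add(request) // first occurrence: goes to the provider
    }
  }
  return { providerCalls: seen.size, cacheHits }
}

const faq = estimateCacheHits([
  'What are your delivery times?',
  'What are your delivery times?',
  'What are your delivery times?',
])
// faq is { providerCalls: 1, cacheHits: 2 }, matching the User A/B/C example.
```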

What caching doesn’t cover

Conversations with history: In a chat, each request includes the full conversation history. The messages[] array grows with every exchange, so the hash is different each time and cache hits are not possible.

Templates with variable content: If your prompt changes on every call (summarizing different documents, analyzing different products), the hash will differ each time.

Verifying cache hits

In the Logs page of your Metrion dashboard, requests served from cache display a green Cached badge. They appear with 0 input tokens, 0 output tokens, and $0.00 cost. The Dashboard overview tab shows a Savings card with the total amount saved through caching for the selected period.

Limits

| Parameter | Value |
| --- | --- |
| Default TTL | 7 days |
| Maximum TTL | 365 days |
| Storage | Upstash Redis, EU West (Ireland) |
| Supported providers | All Metrion providers (Anthropic, OpenAI, Gemini, Mistral, Grok) |
| Cache type | Exact-match (model + messages) |