Metrion supports four alert types, each monitoring a different aspect of your AI usage. When you create a rule, you choose one type and configure a threshold. Metrion evaluates the rule after every proxied request and sends an email when your usage reaches 90% (warning) or 100% (alert) of that threshold.
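The threshold check described above can be sketched in a few lines. This is a minimal illustration, not Metrion's implementation: the names AlertLevel and classify are hypothetical, and the sketch assumes that "reaches" means usage greater than or equal to each cut-off.

```python
from enum import Enum


class AlertLevel(Enum):
    """Hypothetical alert levels mirroring the 90% / 100% cut-offs."""
    NONE = "none"
    WARNING = "warning"  # usage has reached 90% of the threshold
    ALERT = "alert"      # usage has reached 100% of the threshold


def classify(usage: float, threshold: float) -> AlertLevel:
    """Return the alert level for the current usage (assumes >= comparisons)."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    ratio = usage / threshold
    if ratio >= 1.0:
        return AlertLevel.ALERT
    if ratio >= 0.9:
        return AlertLevel.WARNING
    return AlertLevel.NONE


# Example: a threshold of 50 checked after successive requests.
for spent in (30, 45, 50):
    print(spent, classify(spent, 50).value)
```

Because the check runs after every request, the same function would simply be called again each time usage changes.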
Budget
Monitors the total cost of your AI requests over a defined period.
The budget is measured in the currency you select. If you choose EUR or CHF, Metrion converts your USD costs into that currency at a fixed rate.
When to use it: Set a budget alert to catch unexpected spending early. For example, a rule such as "alert me when I've spent more than $50 on OpenAI this month" gives you time to scale back before the bill arrives.
| Field | Details |
|---|---|
| Threshold unit | usd, eur, or chf |
| Providers | All providers, or specific ones (Anthropic, OpenAI, Gemini, Mistral, Grok) |
| Period | From the start of the current month, the rule creation date, or a custom date, through today |
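As a rough sketch of how a budget total could be computed, the snippet below sums per-request USD costs inside the period, optionally filtered by provider, and then converts the result. The fixed rate Metrion uses for EUR and CHF is not stated on this page, so the sketch takes it as a usd_to_unit parameter; the Request record, the lowercase provider identifiers, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Iterable, Optional, Set


@dataclass
class Request:
    """Hypothetical record of one proxied request."""
    provider: str  # illustrative identifiers such as "openai" or "anthropic"
    day: date
    cost_usd: Decimal


def start_of_month(today: date) -> date:
    """One of the period options: the first day of the current month."""
    return today.replace(day=1)


def budget_spend(
    requests: Iterable[Request],
    start: date,
    end: date,
    providers: Optional[Set[str]] = None,
    usd_to_unit: Decimal = Decimal("1"),
) -> Decimal:
    """Sum USD costs in [start, end] for the selected providers, then convert.

    usd_to_unit is a placeholder for the fixed conversion rate (1 for USD).
    """
    total_usd = sum(
        (
            r.cost_usd
            for r in requests
            if start <= r.day <= end
            and (providers is None or r.provider in providers)
        ),
        Decimal("0"),
    )
    return total_usd * usd_to_unit
```

With $30 and $16 of OpenAI spend since the start of the month against a $50 budget, the total is $46, which is 92% of the threshold and would fall in the warning band. Decimal is used rather than float so that currency sums do not pick up binary rounding errors.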
Error Rate
Monitors the percentage (or absolute count) of requests that return a 4xx or 5xx HTTP status from the provider.
Use the percent unit to catch systemic issues, for example "alert me when my error rate exceeds 5%". Use the count unit if you have a hard tolerance for a specific number of failures, regardless of total request volume.
When to use it: Error rate alerts are useful for production workloads where reliability matters. A sudden spike in 4xx or 5xx responses often indicates a misconfiguration, a provider outage, or a rate-limit problem.
| Field | Details |
|---|---|
| Threshold unit | percent (0–100) or count (absolute number of errors) |
| Providers | All providers, or specific ones |
| Period | Start and end date for the measurement window |
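The difference between the two units can be illustrated with a small sketch that measures the same window both ways. The function names are hypothetical, and reporting 0% for an empty window is an assumption of this sketch rather than documented Metrion behavior.

```python
from typing import Sequence


def is_error(status: int) -> bool:
    """4xx and 5xx responses from the provider count as errors."""
    return 400 <= status <= 599


def error_metric(statuses: Sequence[int], unit: str) -> float:
    """Error measurement in the rule's unit: 'percent' (0-100) or 'count'."""
    errors = sum(1 for s in statuses if is_error(s))
    if unit == "count":
        return float(errors)
    if unit == "percent":
        # Assumption: an empty window reports 0% rather than dividing by zero.
        return 100.0 * errors / len(statuses) if statuses else 0.0
    raise ValueError(f"unknown unit: {unit}")


# 100 requests: 95 successes, three 429 rate-limit errors, two 503 outages.
window = [200] * 95 + [429] * 3 + [503] * 2
print(error_metric(window, "percent"), error_metric(window, "count"))
```

Here both measurements come out to 5, but only the percent figure would stay at 5 if traffic doubled with the same failure ratio; the count would rise to 10.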
Latency p95
Monitors the 95th-percentile response latency across your requests, measured in milliseconds.
The p95 value means that 95% of your requests complete in that time or less. It is a better indicator of real-world performance than the average: it reflects the tail of slow requests, yet it is not inflated by the size of a few extreme outliers the way an average is.
When to use it: Set a latency p95 alert when your application has response-time requirements. For example, "alert me when my p95 latency exceeds 3,000 ms" tells you when a meaningful share of your users is experiencing slow responses.
| Field | Details |
|---|---|
| Threshold unit | ms |
| Providers | All providers, or specific ones |
| Period | Start and end date for the measurement window |
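A minimal sketch of a p95 calculation follows. Metrion's exact percentile method (nearest rank or interpolation) is not documented here, so this uses the nearest-rank method as an assumption; the function name p95 is hypothetical.

```python
from typing import Sequence


def p95(latencies_ms: Sequence[float]) -> float:
    """95th-percentile latency using the nearest-rank method (an assumption)."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # Nearest rank = ceil(0.95 * n), computed with integers to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# 90 fast requests, 5 slower ones, and 5 extreme outliers.
samples = [800] * 90 + [2_500] * 5 + [9_000] * 5
mean = sum(samples) / len(samples)
print(f"p95={p95(samples)} ms, mean={mean:.0f} ms")
```

In this sample, 95% of requests finish in 2,500 ms or less. The five 9,000 ms outliers pull the mean up to 1,295 ms, even though 90 of the 100 requests took 800 ms, but they do not set the p95 value.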
Request Volume
Request Volume
Monitors the total number of requests made through the Metrion proxy over a defined period.
When to use it: Request volume alerts are useful for staying within quota limits. For example, a rule such as "alert me when I've made 8,000 requests this month" warns Free plan users well before they hit the 10,000-request monthly limit. They're also useful for cost forecasting and for detecting unexpected traffic spikes.
| Field | Details |
|---|---|
| Threshold unit | count |
| Providers | All providers, or specific ones |
| Period | Start and end date for the measurement window |
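The arithmetic behind the 8,000-request example can be checked with a short sketch. It assumes the 90% warning and 100% alert cut-offs described at the top of this page; count_requests, trigger_points, and the (provider, day) tuples are hypothetical and only illustrate the idea.

```python
from datetime import date
from typing import Iterable, Optional, Set, Tuple

FREE_PLAN_MONTHLY_LIMIT = 10_000  # Free plan cap stated above


def count_requests(
    requests: Iterable[Tuple[str, date]],
    start: date,
    end: date,
    providers: Optional[Set[str]] = None,
) -> int:
    """Count (provider, day) records inside [start, end] for the selected providers."""
    return sum(
        1
        for provider, day in requests
        if start <= day <= end and (providers is None or provider in providers)
    )


def trigger_points(threshold: int) -> Tuple[int, int]:
    """Smallest request counts that reach 90% (warning) and 100% (alert)."""
    warning_at = (threshold * 9 + 9) // 10  # integer ceil(0.9 * threshold)
    return warning_at, threshold


warning_at, alert_at = trigger_points(8_000)
print(warning_at, alert_at, FREE_PLAN_MONTHLY_LIMIT - alert_at)
```

With an 8,000-request threshold, the warning email would arrive at 7,200 requests and the alert at 8,000, leaving 2,000 requests of headroom under the Free plan cap.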