Cut your LLM bill. Change one line.
Optera is a drop-in proxy that sits between your app and your model provider. Point your base URL at it, keep everything else the same, and it trims cost on every request — then shows you exactly what it saved.
Under the hood Optera applies semantic caching, model routing, and request compression to each call, returning the provider's normal response plus a set of headers that report tokens and dollars saved. It speaks the OpenAI API format, so most SDKs work with a one-line change.
Quickstart
Point the base URL at Optera, add your proxy key header, and keep sending your provider key exactly as before.
Python — OpenAI SDK
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_OPENAI_KEY",
    base_url="https://proxy.optera.dev/v1",  # ← the only change
    default_headers={"x-proxy-key": "YOUR_PROXY_KEY"},
)

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
Node — OpenAI SDK
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://proxy.optera.dev/v1", // ← the only change
  defaultHeaders: { "x-proxy-key": process.env.OPTERA_PROXY_KEY },
});
curl
curl https://proxy.optera.dev/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "x-proxy-key: $OPTERA_PROXY_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}' -i
The -i flag prints the response headers — that's where you'll see what Optera saved on the call.
Authentication
Every request carries two credentials, and they do different jobs:
- Authorization: Bearer <provider key> is your own OpenAI (or Anthropic) key. Optera forwards this to the provider; it's never the thing that authenticates you to Optera.
- x-proxy-key: <proxy key> is your Optera key. This identifies your workspace, enforces your plan, and is how usage and savings are attributed.
Generate and rotate proxy keys in the dashboard under API keys. Treat them like secrets — anyone with a proxy key can spend against your plan.
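If you call the proxy without an SDK, it can help to build both credentials in one place so neither is forgotten. The sketch below is a minimal helper under that assumption; the function names are hypothetical, and the environment variable names simply follow the curl example above.

```python
import os


def optera_headers(provider_key: str, proxy_key: str) -> dict[str, str]:
    """Build the two credential headers, plus the JSON content type."""
    if not provider_key or not proxy_key:
        raise ValueError("both the provider key and the proxy key are required")
    return {
        # Your provider key: Optera forwards this to the provider.
        "Authorization": f"Bearer {provider_key}",
        # Your Optera key: identifies the workspace and plan.
        "x-proxy-key": proxy_key,
        "Content-Type": "application/json",
    }


def headers_from_env(env=os.environ) -> dict[str, str]:
    """Read both keys from the environment (variable names are an assumption)."""
    return optera_headers(
        env.get("OPENAI_API_KEY", ""),
        env.get("OPTERA_PROXY_KEY", ""),
    )
```

Failing fast on a missing key locally is cheaper than discovering it through a 401 from the proxy.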
Requests return 402 until your workspace has an active subscription. Start one on the Billing page.
Response headers
Optera returns the provider's normal response body untouched, and adds headers describing what it did. The ones you'll use most:
| Header | Meaning |
|---|---|
| X-Tokens-Saved | Tokens avoided on this request versus sending it raw. |
| X-Cost-Saved-USD | Estimated dollars saved on this request. |
| X-Cache-Hit | true if served from the exact-match cache. |
| X-Semantic-Cache-Hit | true if served from a semantically similar cached response. |
| X-Model-Routed | The model Optera actually used after routing. |
| X-Original-Model | The model you requested, before routing. |
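To log these values, you might fold the headers into a small record. This is a sketch, not an official client: the `Savings` class and `parse_savings` function are hypothetical names, and it assumes the numeric headers arrive as plain decimal strings and the cache flags as the string "true".

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Savings:
    tokens_saved: int
    cost_saved_usd: float
    cache_hit: bool
    semantic_cache_hit: bool
    model_routed: Optional[str]
    original_model: Optional[str]


def parse_savings(headers: Mapping[str, str]) -> Savings:
    """Summarize the savings headers from one response."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    h = {k.lower(): v for k, v in headers.items()}

    def flag(name: str) -> bool:
        # Assumption: a set flag is reported as the string "true".
        return h.get(name, "").strip().lower() == "true"

    return Savings(
        # Missing or empty headers are treated as zero savings.
        tokens_saved=int(h.get("x-tokens-saved") or 0),
        cost_saved_usd=float(h.get("x-cost-saved-usd") or 0.0),
        cache_hit=flag("x-cache-hit"),
        semantic_cache_hit=flag("x-semantic-cache-hit"),
        model_routed=h.get("x-model-routed"),
        original_model=h.get("x-original-model"),
    )
```

With the OpenAI Python SDK, one way to get at the headers is `client.chat.completions.with_raw_response.create(...)`, whose result exposes `.headers` alongside `.parse()` for the usual completion object.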
Optimization steps each set a flag when they fire, so you can see precisely what was applied:
| Header | Set when |
|---|---|
| X-Request-Compressed | The outgoing request payload was compressed. |
| X-Preamble-Stripped | Boilerplate preamble was removed from the prompt. |
| X-Repetition-Removed | Redundant repeated content was collapsed. |
| X-Conversation-Summarized | Long history was summarized to fit fewer tokens. |
| X-Tools-Compressed | Tool/function schemas were compacted. |
| X-Image-Optimized | Image payloads were optimized before sending. |
| X-Provider-Cache-Hit · X-Provider-Cache-Savings-USD | The provider's own prompt cache was hit, with its savings. |
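A quick way to see which steps fired on a call is to scan for the flag headers in the table. The sketch below assumes a fired step is reported with the value "true" (the table only says the flag is "set"); the labels and the `applied_steps` name are our own.

```python
from typing import Mapping

# Flag headers from the table above, mapped to short labels of our own choosing.
STEP_HEADERS = {
    "x-request-compressed": "request compression",
    "x-preamble-stripped": "preamble stripping",
    "x-repetition-removed": "repetition removal",
    "x-conversation-summarized": "conversation summarization",
    "x-tools-compressed": "tool-schema compaction",
    "x-image-optimized": "image optimization",
    "x-provider-cache-hit": "provider prompt cache",
}


def applied_steps(headers: Mapping[str, str]) -> list[str]:
    """Return labels for every optimization flag set on a response."""
    # Normalize names and values so matching is case-insensitive.
    h = {k.lower(): v.strip().lower() for k, v in headers.items()}
    # Assumption: a fired step carries the value "true".
    return [label for name, label in STEP_HEADERS.items() if h.get(name) == "true"]
```

Logging this list next to each request makes it easier to tell which optimizations account for a given saving.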
Request options
Optional request headers let you segment analytics and work with Anthropic:
| Header | Purpose |
|---|---|
| x-feature-tag | Attribute spend/savings to a product feature (e.g. chat-support). |
| x-team-tag | Attribute usage to a team (e.g. backend). |
| x-user-tag | Attribute usage to an end-user or tenant. |
| anthropic-version | Pass through when calling Anthropic models. |
Tags are free-form — pick a naming scheme and they'll show up as breakdowns in the dashboard's analytics.
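Since tags are just request headers, a tiny helper keeps the naming consistent across call sites. This is a sketch with a hypothetical `tag_headers` function; it only sends the tags you actually set.

```python
from typing import Optional


def tag_headers(
    feature: Optional[str] = None,
    team: Optional[str] = None,
    user: Optional[str] = None,
) -> dict[str, str]:
    """Build the optional analytics tag headers, skipping empty values."""
    candidates = {
        "x-feature-tag": feature,
        "x-team-tag": team,
        "x-user-tag": user,
    }
    return {
        name: value.strip()
        for name, value in candidates.items()
        if value and value.strip()
    }
```

With the OpenAI Python SDK, per-request headers can be passed through the `extra_headers` argument, for example `client.chat.completions.create(..., extra_headers=tag_headers(feature="chat-support", team="backend"))`.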
How it optimizes
- Semantic caching — near-duplicate prompts return a cached answer instead of paying for a fresh completion. Exact and semantic matches are reported separately.
- Model routing — requests that don't need a frontier model are routed to a cheaper one that's good enough for the task.
- Request compression — preamble stripping, repetition removal, history summarization, and tool-schema compaction shrink the tokens you send.
- Leak detection — flags wasteful patterns (runaway prompts, oversized context, retries) so you can fix them at the source.
- Analytics — every request's savings roll up by feature, team, and user in the dashboard.
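To make the difference between exact and semantic cache hits concrete, here is a toy illustration of the general idea. It is not Optera's implementation: real semantic caches typically compare embeddings, while this sketch swaps in word-set (Jaccard) overlap, and the class name and threshold are made up for the example.

```python
import re
from typing import Optional


def _tokens(text: str) -> set[str]:
    # Lowercase and split into alphanumeric words.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of word sets: a crude stand-in for embedding similarity."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


class ToyCache:
    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self.entries: dict[str, str] = {}

    def put(self, prompt: str, answer: str) -> None:
        self.entries[prompt] = answer

    def get(self, prompt: str) -> tuple[Optional[str], str]:
        """Return (answer, kind), where kind is 'exact', 'semantic', or 'miss'."""
        # Exact match: the prompt string was seen verbatim.
        if prompt in self.entries:
            return self.entries[prompt], "exact"
        # Semantic match: the closest stored prompt clears the threshold.
        best = max(self.entries, key=lambda p: similarity(p, prompt), default=None)
        if best is not None and similarity(best, prompt) >= self.threshold:
            return self.entries[best], "semantic"
        return None, "miss"
```

The threshold is the knob that matters: set it too low and unrelated prompts share answers, too high and only near-verbatim repeats hit.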
Errors & status codes
| Status | Meaning & fix |
|---|---|
| 200 | Success — check the X-* headers for what was saved. |
| 401 | Missing or invalid x-proxy-key. Check the key and that it's active. |
| 402 | No active subscription on the workspace. Subscribe on the Billing page. |
| 429 | You've exceeded your plan's monthly request quota, or the short-term burst limit. Wait or upgrade. |
| 5xx | Upstream/provider errors are passed back unchanged so you can handle them as usual. |
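The table above maps naturally onto a small dispatch function in client code. The sketch below uses hypothetical names and action labels; the exponential backoff schedule is an assumption of ours and only helps with the short-term burst limit, not with an exhausted monthly quota.

```python
def next_action(status: int) -> str:
    """Map a status code from the proxy to a suggested next step."""
    if 200 <= status < 300:
        return "ok"
    if status == 401:
        return "check-proxy-key"
    if status == 402:
        return "subscribe"
    if status == 429:
        return "backoff-or-upgrade"
    if 500 <= status < 600:
        # Passed through from the provider unchanged; handle as usual.
        return "handle-provider-error"
    return "unexpected"


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential backoff delays in seconds, capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```

If retries keep returning 429 after the delays run out, the monthly quota is the more likely cause and waiting longer will not help until the next cycle.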
Rate limits & quotas
Each plan includes a monthly request allowance. When you pass it, requests return 429 until the next cycle or an upgrade.
| Plan | Requests / month |
|---|---|
| Starter | 100,000 |
| Growth | 500,000 |
| Enterprise | 5,000,000 |
There's also a short-term burst limit per key to protect the service; traffic that stays well under it is unaffected. A "request" is one API call sent to the proxy. Cache hits still count as requests but cost you nothing in provider spend.
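For capacity planning, the plan table reduces to simple arithmetic. The sketch below uses the limits listed above; the helper names are hypothetical and the 30-day cycle length is an assumption, so check your actual billing cycle.

```python
from typing import Optional

# Monthly request allowances from the plan table, smallest to largest.
PLAN_LIMITS = {
    "Starter": 100_000,
    "Growth": 500_000,
    "Enterprise": 5_000_000,
}


def remaining_requests(plan: str, used: int) -> int:
    """Requests left in the current cycle (never negative)."""
    return max(0, PLAN_LIMITS[plan] - used)


def smallest_plan_for(requests_per_day: float, days_in_cycle: int = 30) -> Optional[str]:
    """Smallest plan whose allowance covers the projected monthly volume."""
    monthly = requests_per_day * days_in_cycle
    for plan, limit in PLAN_LIMITS.items():
        if monthly <= limit:
            return plan
    # Projected volume exceeds every listed plan.
    return None
```

Remember that cache hits count toward these totals, so project from total calls rather than from calls that reach the provider.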
FAQ
Does it work with Anthropic?
Yes — point your base URL at the proxy and send requests as you normally would, including the anthropic-version header. Your Anthropic key rides along in the usual auth header and is forwarded to Anthropic.
Will my responses change?
The response body is the provider's own output. Optera adds headers and may serve a cached answer for repeat prompts, but it doesn't rewrite the model's content.
What happens on a cache hit?
You get the cached completion immediately, X-Cache-Hit (or X-Semantic-Cache-Hit) is true, and no provider tokens are spent — the saving shows up in X-Cost-Saved-USD.
How is my data handled?
See the Privacy Policy and Terms for what's processed and stored.
Need something that isn't here? Email support@optera.dev.