/ 01 — Introduction

Cut your LLM bill. Change one line.

Optera is a drop-in proxy that sits between your app and your model provider. Point your base URL at it, add your proxy key header, and leave the rest of your code unchanged. Optera works to trim cost on every request, then shows you exactly what it saved.

Under the hood, Optera applies semantic caching, model routing, and request compression to each call. It returns the provider's normal response plus a set of headers that report tokens and dollars saved. It speaks the OpenAI API format, so most SDKs need only a new base URL and one extra header.

You'll need an Optera proxy key and an active plan. In the dashboard, create the key under API keys and start the plan under Billing.

/ 02 — Quickstart

Quickstart

Two changes: point the base URL at Optera and add your proxy key header. Keep sending your provider key exactly as before.

Python — OpenAI SDK

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_OPENAI_KEY",
    base_url="https://proxy.optera.dev/v1",   # ← point the SDK at Optera
    default_headers={"x-proxy-key": "YOUR_PROXY_KEY"},   # ← add your proxy key
)

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)

Node — OpenAI SDK

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
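  baseURL: "https://proxy.optera.dev/v1",        // ← point the SDK at Optera
  defaultHeaders: { "x-proxy-key": process.env.OPTERA_PROXY_KEY },  // ← add your proxy key
});

// Same request as the Python example above.
const resp = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello" }],
});
console.log(resp.choices[0].message.content);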

curl

curl https://proxy.optera.dev/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "x-proxy-key: $OPTERA_PROXY_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}' -i

The -i flag prints the response headers — that's where you'll see what Optera saved on the call.

/ 03 — Authentication

Authentication

Every request carries two credentials, and they do different jobs:

Authorization: Bearer <provider key> — your own OpenAI (or Anthropic) key. Optera forwards this to the provider; it's never the thing that authenticates you to Optera.
x-proxy-key: <proxy key> — your Optera key. This identifies your workspace, enforces your plan, and is how usage and savings are attributed.

Generate and rotate proxy keys in the dashboard under API keys. Treat them like secrets — anyone with a proxy key can spend against your plan.
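
One way to keep both credentials out of source code is to read them from the environment, as the Node and curl examples do. The sketch below is a minimal illustration, not an official client: `optera_headers` is a hypothetical helper name, and it assumes the keys live in `OPENAI_API_KEY` and `OPTERA_PROXY_KEY`.

```python
import os


def optera_headers(env=os.environ):
    """Build the two credential headers from environment variables.

    Hypothetical helper. The variable names follow the Node and curl
    examples; adjust them to match your own deployment.
    """
    required = ("OPENAI_API_KEY", "OPTERA_PROXY_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail early instead of sending a request that will come back 401.
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        # Provider key: Optera forwards this to the provider.
        "Authorization": f"Bearer {env['OPENAI_API_KEY']}",
        # Proxy key: identifies your workspace to Optera.
        "x-proxy-key": env["OPTERA_PROXY_KEY"],
        "Content-Type": "application/json",
    }
```

Passing a plain dict as `env` makes the helper easy to exercise in tests without touching real keys.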

Requests are rejected with 402 until your workspace has an active subscription. Start one on the Billing page.

/ 04 — Reference

Response headers

Optera returns the provider's normal response body untouched, and adds headers describing what it did. The ones you'll use most:

Header	Meaning
`X-Tokens-Saved`	Tokens avoided on this request versus sending it raw.
`X-Cost-Saved-USD`	Estimated dollars saved on this request.
`X-Cache-Hit`	`true` if served from the exact-match cache.
`X-Semantic-Cache-Hit`	`true` if served from a semantically similar cached response.
`X-Model-Routed`	The model Optera actually used after routing.
`X-Original-Model`	The model you requested, before routing.

Optimization steps each set a flag when they fire, so you can see precisely what was applied:

Header	Set when
`X-Request-Compressed`	The outgoing request payload was compressed.
`X-Preamble-Stripped`	Boilerplate preamble was removed from the prompt.
`X-Repetition-Removed`	Redundant repeated content was collapsed.
`X-Conversation-Summarized`	Long history was summarized to fit fewer tokens.
`X-Tools-Compressed`	Tool/function schemas were compacted.
`X-Image-Optimized`	Image payloads were optimized before sending.
`X-Provider-Cache-Hit` · `X-Provider-Cache-Savings-USD`	The provider's own prompt cache was hit, with its savings.
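
If you want these headers in your own logs, a small parser can turn them into a typed record. This is a sketch under stated assumptions: the docs above name the headers but not their exact value formats, so it assumes `X-Tokens-Saved` is an integer, `X-Cost-Saved-USD` is a plain decimal, and flags carry the string `true`. `Savings` and `parse_savings` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Mapping, Optional

# Step flags from the second table above.
STEP_FLAGS = (
    "X-Request-Compressed",
    "X-Preamble-Stripped",
    "X-Repetition-Removed",
    "X-Conversation-Summarized",
    "X-Tools-Compressed",
    "X-Image-Optimized",
)


@dataclass
class Savings:
    tokens_saved: int = 0
    cost_saved_usd: float = 0.0
    cache_hit: bool = False
    semantic_cache_hit: bool = False
    original_model: Optional[str] = None
    routed_model: Optional[str] = None
    steps: List[str] = field(default_factory=list)


def parse_savings(headers: Mapping[str, str]) -> Savings:
    """Parse Optera response headers into a Savings record.

    Value formats are assumptions (see the lead-in); missing headers
    fall back to zero, False, or None.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    h = {name.lower(): value for name, value in headers.items()}

    def is_true(name: str) -> bool:
        return h.get(name.lower(), "").strip().lower() == "true"

    return Savings(
        tokens_saved=int(h.get("x-tokens-saved", "0")),
        cost_saved_usd=float(h.get("x-cost-saved-usd", "0")),
        cache_hit=is_true("X-Cache-Hit"),
        semantic_cache_hit=is_true("X-Semantic-Cache-Hit"),
        original_model=h.get("x-original-model"),
        routed_model=h.get("x-model-routed"),
        steps=[name for name in STEP_FLAGS if is_true(name)],
    )
```

With the OpenAI Python SDK, the headers should be reachable through `client.chat.completions.with_raw_response.create(...)`, which returns an object exposing `.headers` and a `.parse()` method for the usual completion.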

Request options

Optional request headers let you segment analytics and work with Anthropic:

Header	Purpose
`x-feature-tag`	Attribute spend/savings to a product feature (e.g. `chat-support`).
`x-team-tag`	Attribute usage to a team (e.g. `backend`).
`x-user-tag`	Attribute usage to an end-user or tenant.
`anthropic-version`	Pass through when calling Anthropic models.

Tags are free-form — pick a naming scheme and they'll show up as breakdowns in the dashboard's analytics.
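
A small helper keeps tag names consistent across call sites. The sketch below only assembles the three tag headers listed above; `tag_headers` is a hypothetical name, and dropping empty values is a design choice here, not documented Optera behavior.

```python
def tag_headers(feature=None, team=None, user=None):
    """Return only the tag headers that were given a non-empty value."""
    candidates = {
        "x-feature-tag": feature,
        "x-team-tag": team,
        "x-user-tag": user,
    }
    # Skip unset tags so no empty header values are sent.
    return {name: str(value) for name, value in candidates.items() if value}
```

For example, `tag_headers(feature="chat-support", team="backend")` yields the two corresponding headers. With the OpenAI Python SDK you could merge the result into `default_headers`, or pass it per request through the `extra_headers` argument.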

How it optimizes

Semantic caching — near-duplicate prompts return a cached answer instead of paying for a fresh completion. Exact and semantic matches are reported separately.
Model routing — requests that don't need a frontier model are routed to a cheaper one that's good enough for the task.
Request compression — preamble stripping, repetition removal, history summarization, and tool-schema compaction shrink the tokens you send.
Leak detection — flags wasteful patterns (runaway prompts, oversized context, retries) so you can fix them at the source.
Analytics — every request's savings roll up by feature, team, and user in the dashboard.
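
To make the difference between an exact hit and a semantic hit concrete, here is a toy cache. It is not Optera's implementation: it swaps in bag-of-words cosine similarity where a production system would more likely use embeddings, and the `ToyCache` class and its `0.8` threshold are illustrative assumptions.

```python
import math
import re
from collections import Counter


def _vector(text):
    """Lowercased word counts; a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a, b):
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold  # illustrative value, not Optera's
        self.entries = {}           # prompt -> cached answer

    def put(self, prompt, answer):
        self.entries[prompt] = answer

    def get(self, prompt):
        """Return (answer, kind); kind is 'exact', 'semantic', or 'miss'."""
        if prompt in self.entries:
            return self.entries[prompt], "exact"
        query = _vector(prompt)
        best_answer, best_score = None, 0.0
        for cached_prompt, answer in self.entries.items():
            score = _cosine(query, _vector(cached_prompt))
            if score > best_score:
                best_answer, best_score = answer, score
        if best_score >= self.threshold:
            return best_answer, "semantic"
        return None, "miss"
```

An identical prompt comes back as `exact`, a reworded prompt with the same vocabulary as `semantic`, and an unrelated prompt as `miss`, which mirrors how the two cache headers are reported separately.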

/ 05 — Operations

Errors & status codes

Status	Meaning & fix
200	Success — check the `X-*` headers for what was saved.
401	Missing or invalid `x-proxy-key`. Check that the key is correct and active.
402	No active subscription on the workspace. Subscribe on the Billing page.
429	You've exceeded your plan's monthly request quota, or the short-term burst limit. Wait or upgrade.
5xx	Upstream/provider errors are passed back unchanged so you can handle them as usual.
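
The table above maps naturally onto a small dispatch function plus a retry loop. This is a sketch, not prescribed client behavior: `next_action` and `call_with_backoff` are hypothetical names, and the delays are arbitrary. Backoff only helps with the burst limit; the docs don't say how to tell a burst 429 from a monthly-quota 429, so the loop gives up after a few attempts rather than waiting indefinitely.

```python
import time


def next_action(status):
    """Map a status code from the table above to a coarse action."""
    if status == 200:
        return "ok"
    if status == 401:
        return "fix-proxy-key"
    if status == 402:
        return "subscribe"
    if status == 429:
        return "retry-later"
    if 500 <= status <= 599:
        return "provider-error"
    return "unexpected"


def call_with_backoff(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    send is any zero-argument callable returning (status, body).
    The sleep parameter is injectable so tests need not actually wait.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        last_attempt = attempt == max_attempts - 1
        if next_action(status) != "retry-later" or last_attempt:
            return status, body
        sleep(base_delay * 2 ** attempt)  # exponential: 1s, 2s, 4s, ...
```

Only 429 is retried here. A 401 or 402 will not fix itself, and 5xx responses are left to whatever handling you already use for provider errors.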

Rate limits & quotas

Each plan includes a monthly request allowance. Once you exceed it, requests return 429 until the next cycle or an upgrade.

Plan	Requests / month
Starter	`100,000`
Growth	`500,000`
Enterprise	`5,000,000`

There's also a short-term burst limit per key to protect the service; sustained traffic that stays well under it is unaffected. A "request" is one API call sent to the proxy. Cache hits still count as requests, but they cost you nothing in provider spend.
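
For capacity planning, the plan table can be turned into a quick headroom check. The limits below are copied from the table; the helper names are hypothetical, and `requests_used` is a count you track yourself (remember that cache hits count too).

```python
# Monthly request allowances from the plan table above.
PLAN_LIMITS = {"Starter": 100_000, "Growth": 500_000, "Enterprise": 5_000_000}


def quota_status(plan, requests_used):
    """Summarise how much of a plan's monthly allowance is left."""
    limit = PLAN_LIMITS[plan]
    return {
        "limit": limit,
        "remaining": max(limit - requests_used, 0),
        "used_fraction": min(requests_used / limit, 1.0),
    }


def smallest_plan_for(monthly_requests):
    """Return the smallest plan covering the volume, or None if none does."""
    for plan, limit in sorted(PLAN_LIMITS.items(), key=lambda item: item[1]):
        if monthly_requests <= limit:
            return plan
    return None
```

For instance, `quota_status("Starter", 75_000)` reports 25,000 requests remaining, and `smallest_plan_for(120_000)` returns `"Growth"`.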

/ 06 — FAQ

FAQ

Does it work with Anthropic?

Yes — point your base URL at the proxy and send requests as you normally would, including the anthropic-version header. Your Anthropic key rides along in the usual auth header and is forwarded to Anthropic.

Will my responses change?

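The response body is the provider's own output. Optera adds headers and may serve a cached answer for repeated or semantically similar prompts, and routing may send a request to a different model than the one you asked for (reported in X-Model-Routed), but it doesn't rewrite the model's content.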

What happens on a cache hit?

You get the cached completion immediately, X-Cache-Hit (or X-Semantic-Cache-Hit) is true, and no provider tokens are spent — the saving shows up in X-Cost-Saved-USD.

How is my data handled?

See the Privacy Policy and Terms for what's processed and stored.

Need something that isn't here? Email support@optera.dev.