Track OpenAI API costs per agent — without the runaway-bill anxiety.
PromptCost is a drop-in proxy for the OpenAI API. Swap api.openai.com for api.promptcost.io/openai, add two headers, and every call is tagged with cost, tokens, latency, and model, broken down by agent. Hard budget caps return 429 before a request is ever forwarded.
Free forever · No card · Indie plan $9/mo for first 50 users
Why OpenAI's own dashboard isn't enough
OpenAI gives you a usage dashboard and a Usage API. Both are useful, but both have real limits when you're running multiple agents:
- The dashboard lags. By the time a runaway loop shows up, the damage is done. Several developers have reported $30K–$72K incidents where the spike registered hours after the fact.
- "Hard limits" aren't hard. OpenAI removed real-time hard caps in 2024 in favor of alerts and delayed enforcement. Your "monthly limit" emails you. It does not block requests in real time.
- No native per-agent attribution. If twelve agents share one API key, the dashboard shows you one total, not who spent what.
Setup in 60 seconds
Get a PromptCost key
Sign up free at admin.promptcost.io, create a workspace, and generate an sk-pc- key.
Swap the endpoint
# Before
POST https://api.openai.com/v1/chat/completions

# After
POST https://api.promptcost.io/openai/v1/chat/completions
All paths under /v1/* are proxied as-is — chat completions, embeddings, audio, batch, etc.
Add the headers
Authorization: Bearer sk-••••••••••   # your OpenAI key
cg-key: sk-pc-••••3f9a                # your PromptCost key
cg-agent: lead-scorer                 # your agent name
Content-Type: application/json
The body is identical to OpenAI's API. Nothing else changes.
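If you prefer not to pull in an SDK, a minimal raw-HTTP sketch using only the Python standard library looks like this. The key values and the payload are placeholders; substitute your own.

```python
import json
import urllib.request

# Placeholder credentials -- substitute your real keys.
OPENAI_KEY = "sk-your-openai-key"
PROMPTCOST_KEY = "sk-pc-your-promptcost-key"


def build_request(agent: str, payload: dict) -> urllib.request.Request:
    """Build a chat-completions request routed through the PromptCost proxy."""
    return urllib.request.Request(
        "https://api.promptcost.io/openai/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {OPENAI_KEY}",
            "cg-key": PROMPTCOST_KEY,       # attributes the spend to your workspace
            "cg-agent": agent,              # per-agent cost breakdown
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request("lead-scorer", {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello"}],
})
# urllib.request.urlopen(req) would send it; omitted to keep the sketch offline.
```

The request body is exactly what you would send to api.openai.com; only the host, path prefix, and the two cg-* headers differ.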
Use the OpenAI SDK if you want
from openai import OpenAI
client = OpenAI(
base_url="https://api.promptcost.io/openai/v1",
api_key="sk-••••••••••",
default_headers={
"cg-key": "sk-pc-••••3f9a",
"cg-agent": "lead-scorer",
},
)
Works with the official Python and Node SDKs out of the box — both expose base_url and custom headers.
Supported models
All current OpenAI models on the Chat Completions, Responses, Embeddings, and Audio endpoints are supported. Pricing tables are updated automatically when OpenAI changes rates.
- Chat: GPT-4o, GPT-4o mini, GPT-4 Turbo, o1, o1-mini, o3, o3-mini
- Embeddings: text-embedding-3-large, text-embedding-3-small
- Audio: whisper-1, gpt-4o-mini-transcribe, gpt-4o-mini-tts
- Batch API: all of the above at half cost
What you get
- Per-agent cost breakdown — tag any request with a name and see it grouped on the dashboard.
- Hard budget cap — set a USD limit per agent. PromptCost returns 429 before forwarding. You don't pay for blocked calls.
- Streaming responses work natively — SSE passes through; usage logs after stream completion.
- Function/tool calls tracked — input + output tokens both counted; tool definitions included in cost.
- Zero key storage — your OpenAI key passes through as a header.
- ~5–15ms overhead — async logging never blocks the response path.
FAQ
Does this work with Assistants API / threads?
Yes. The proxy supports the Assistants and Responses APIs. Each run is logged as one cost-tracked event.
What about the Batch API?
Yes — batch requests are tracked at the discounted batch rate.
Function calls and tool use?
Tracked. The proxy logs input tokens (including the tool schemas) and output tokens (including tool call arguments).
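To make the accounting concrete, here is an illustrative sketch of how a per-call cost falls out of a response's usage block. The per-million-token rates below are placeholder numbers, not PromptCost's live pricing tables, and the half-price batch discount mirrors the Batch API note above.

```python
# Illustrative only: placeholder rates, not live OpenAI pricing.
# PromptCost computes this server-side from maintained pricing tables.
RATES_PER_TOKEN = {
    "example-model": {"input": 0.15 / 1_000_000, "output": 0.60 / 1_000_000},
}


def estimate_cost(model: str, usage: dict, batch: bool = False) -> float:
    """Estimate USD cost from a response's usage block.

    Tool schemas count toward prompt_tokens and tool-call arguments toward
    completion_tokens, so both sides of a function call are priced.
    """
    r = RATES_PER_TOKEN[model]
    cost = (usage["prompt_tokens"] * r["input"]
            + usage["completion_tokens"] * r["output"])
    return cost / 2 if batch else cost  # Batch API runs at half rate


cost = estimate_cost(
    "example-model", {"prompt_tokens": 1_000, "completion_tokens": 1_000}
)
```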
Will OpenAI see this as a different IP?
Yes — requests reach OpenAI from PromptCost's infrastructure. If you use IP-restricted keys, allowlist api.promptcost.io's outbound range (provided in the dashboard).