Track Claude API costs per agent — with hard budget caps Anthropic doesn't give you.
PromptCost is a drop-in proxy for the Anthropic API. Swap api.anthropic.com for api.promptcost.io/anthropic, add two headers, and every Claude call gets tagged: cost, input/output tokens, latency, model — broken down by agent. Plus the one feature Anthropic doesn't offer: hard caps that block at 429.
Free forever · No card · Indie plan $9/mo for first 50 users
The Anthropic-specific gap
Anthropic's billing UI gives you usage alerts and a Usage API. What it doesn't give you is the thing teams actually want: a hard cap that stops requests the moment the budget is hit. There's no equivalent of "fail at $X." If a Claude-powered agent goes into a loop, you get an alert email while the bill keeps growing.
PromptCost adds that layer. The proxy maintains a per-agent spend counter, and once an agent's monthly cap is hit, the request is rejected with a 429 — Anthropic never sees it, and you don't pay for it.
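The cap check can be pictured with a small sketch. This is illustrative only, not PromptCost's actual implementation — the `BudgetGuard` name and its methods are invented for this example:

```python
# Illustrative sketch of a per-agent spend counter with a hard cap.
# admit() returning False corresponds to the proxy answering 429
# without forwarding the request to Anthropic.
from dataclasses import dataclass, field

@dataclass
class BudgetGuard:
    caps_usd: dict                                  # agent name -> monthly cap in USD
    spent_usd: dict = field(default_factory=dict)   # running spend per agent

    def admit(self, agent: str) -> bool:
        """False once an agent's cap is exhausted (-> HTTP 429)."""
        cap = self.caps_usd.get(agent)
        if cap is None:
            return True                             # no cap configured
        return self.spent_usd.get(agent, 0.0) < cap

    def record(self, agent: str, cost_usd: float) -> None:
        self.spent_usd[agent] = self.spent_usd.get(agent, 0.0) + cost_usd

guard = BudgetGuard(caps_usd={"lead-scorer": 5.00})
guard.record("lead-scorer", 4.99)
print(guard.admit("lead-scorer"))   # True — still under the $5 cap
guard.record("lead-scorer", 0.02)
print(guard.admit("lead-scorer"))   # False — proxy would return 429
```

The key property is that the counter lives in the proxy, so a looping agent is cut off before the request ever reaches Anthropic.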
Setup in 60 seconds
Get a PromptCost key
Sign up free at admin.promptcost.io, create a workspace, and generate an sk-pc- key.
Swap the endpoint
# Before
POST https://api.anthropic.com/v1/messages

# After
POST https://api.promptcost.io/anthropic/v1/messages
Add the headers
x-api-key: sk-ant-••••••••••    # your Anthropic key
anthropic-version: 2023-06-01
cg-key: sk-pc-••••3f9a          # your PromptCost key
cg-agent: "lead-scorer"         # your agent name
Content-Type: application/json
The body is identical to Anthropic's API. Same JSON shape, same model names, same parameters.
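For the raw-HTTP path, a minimal Python sketch of the request (the model id and prompt are placeholders; only the URL and the two cg-* headers differ from a direct Anthropic call):

```python
import json

url = "https://api.promptcost.io/anthropic/v1/messages"
headers = {
    "x-api-key": "sk-ant-...",            # your Anthropic key
    "anthropic-version": "2023-06-01",
    "cg-key": "sk-pc-...",                # your PromptCost key
    "cg-agent": "lead-scorer",            # tag for the dashboard breakdown
    "content-type": "application/json",
}
body = {                                  # unchanged Anthropic schema
    "model": "claude-haiku-4-5",          # placeholder model id
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello, Claude"}],
}
payload = json.dumps(body)
# send with any HTTP client, e.g. the stdlib:
# urllib.request.Request(url, payload.encode(), headers)
```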
Use the Anthropic SDK if you want
from anthropic import Anthropic
client = Anthropic(
base_url="https://api.promptcost.io/anthropic",
api_key="sk-ant-••••••••••",
default_headers={
"cg-key": "sk-pc-••••3f9a",
"cg-agent": "lead-scorer",
},
)
Supported models
- Claude Haiku 4.5 — cheapest, fastest
- Claude Sonnet 4.6 — balanced (the workhorse for most agents)
- Claude Opus 4.7 — most capable
- Plus Claude 3.5 Haiku, Claude 3.5 Sonnet, Claude 3 Opus and other supported versions
Pricing tables are updated automatically when Anthropic changes rates.
What you get
- Per-agent cost breakdown — tag any request with a name and see it grouped on the dashboard.
- Hard budget cap — set a USD limit per agent. PromptCost returns 429 before forwarding to Anthropic. You don't pay for blocked calls.
- Tool use tracked — function calls and tool definitions are counted in token usage.
- Streaming responses work natively — the SSE stream passes through untouched; usage is logged once the stream completes.
- Prompt caching cost — cache reads and writes are priced separately, exactly as Anthropic bills them.
- Zero key storage — your Anthropic key passes through as a header.
- ~5–15ms overhead — async logging never blocks the response path.
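To illustrate the streaming point above: Anthropic's SSE format carries input_tokens in the opening message_start event and the final output_tokens in message_delta, which is why complete usage can only be logged after the stream ends. A simplified stdlib sketch (real streams carry more event types and fields):

```python
# Recover token usage from a completed Anthropic SSE stream.
# input_tokens arrive in message_start; output_tokens in message_delta.
import json

def usage_from_sse(events: list[str]) -> dict:
    usage = {}
    for raw in events:
        if not raw.startswith("data: "):
            continue
        event = json.loads(raw[len("data: "):])
        if event["type"] == "message_start":
            usage["input_tokens"] = event["message"]["usage"]["input_tokens"]
        elif event["type"] == "message_delta":
            usage["output_tokens"] = event["usage"]["output_tokens"]
    return usage

sample = [
    'data: {"type": "message_start", "message": {"usage": {"input_tokens": 21}}}',
    'data: {"type": "content_block_delta"}',
    'data: {"type": "message_delta", "usage": {"output_tokens": 84}}',
]
print(usage_from_sse(sample))
```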
FAQ
Is prompt caching tracked correctly?
Yes. Anthropic returns separate cache_read_input_tokens and cache_creation_input_tokens; PromptCost prices them at the correct discounted/premium rates.
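To make the arithmetic concrete, here is a sketch with illustrative rates. Anthropic's real prices vary by model; the 10% read and 25%-premium write multipliers reflect Anthropic's published cache pricing at the time of writing, but treat the numbers as assumptions, not PromptCost's live pricing table:

```python
# Illustrative per-million-token rates — NOT live pricing.
BASE_INPUT = 3.00                   # $/MTok, assumed base input rate
CACHE_READ = BASE_INPUT * 0.10      # discounted cache-read rate
CACHE_WRITE = BASE_INPUT * 1.25     # premium cache-write rate

def input_cost_usd(usage: dict) -> float:
    """Price the three input-token buckets Anthropic reports separately."""
    return (
        usage.get("input_tokens", 0) * BASE_INPUT
        + usage.get("cache_read_input_tokens", 0) * CACHE_READ
        + usage.get("cache_creation_input_tokens", 0) * CACHE_WRITE
    ) / 1_000_000

cost = input_cost_usd({
    "input_tokens": 400,
    "cache_read_input_tokens": 9_000,
    "cache_creation_input_tokens": 600,
})
print(cost)   # ≈ 0.00615
```

Note how a request that is mostly cache reads costs far less than its raw token count suggests — which is exactly why the three buckets must be priced separately.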
Does this work with the Messages Batches API?
Yes — batch requests are tracked at Anthropic's discounted batch rate.
Tool use / function calling?
Tracked. Both the tool schemas (input tokens) and the tool call arguments (output tokens) count toward usage.
What about Vision (image input)?
Image tokens are counted exactly as Anthropic bills them.
Does the proxy support extended thinking / reasoning models?
Yes. Reasoning tokens are billed at the model's output rate, same as Anthropic.