Python integration

Track OpenAI and Claude costs from Python — without installing another SDK.

PromptCost works with the official openai and anthropic Python SDKs natively. No extra package. No decorators. No instrumentation. Just set base_url and pass two headers — the same per-agent cost tracking and hard budget caps you'd get from any HTTP client.

Free forever · No card · Indie plan $9/mo for first 50 users

Why a proxy beats SDK instrumentation

Most LLM observability tools in Python ask you to install a package, wrap your client, or sprinkle decorators. They work, but they couple your code to the tracker. If you switch tools later, you're rewriting code.

PromptCost is just an HTTP endpoint. Your code stays vanilla — the official OpenAI and Anthropic SDKs you already use. Switch to or away from PromptCost by changing base_url, nothing else.
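Because the integration is pure configuration, the switch can even live behind an environment variable. A minimal sketch, assuming a hypothetical PROMPTCOST_KEY variable and client_config helper (neither is part of PromptCost or the SDKs):

```python
import os

def client_config(openai_key: str, agent: str) -> dict:
    """Build kwargs for OpenAI(...). With no PROMPTCOST_KEY set, the
    client talks to OpenAI directly; with it set, traffic is routed
    through the PromptCost proxy and tagged to the given agent."""
    pc_key = os.environ.get("PROMPTCOST_KEY")
    if not pc_key:
        return {"api_key": openai_key}  # vanilla client, no proxy
    return {
        "api_key": openai_key,
        "base_url": "https://api.promptcost.io/openai/v1",
        "default_headers": {"cg-key": pc_key, "cg-agent": agent},
    }

# client = OpenAI(**client_config(os.environ["OPENAI_API_KEY"], "lead-scorer"))
```

Unset the variable and the same code talks to OpenAI directly; nothing else changes.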

OpenAI (Python SDK)

01 —

Configure the client once

from openai import OpenAI

client = OpenAI(
    base_url="https://api.promptcost.io/openai/v1",
    api_key="sk-••••••••••",         # your OpenAI key
    default_headers={
        "cg-key":   "sk-pc-••••3f9a",
        "cg-agent": "lead-scorer",
    },
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
)

Every call from this client is automatically tagged to the lead-scorer agent. Switch agents per-call by passing extra_headers={"cg-agent": "..."} to create().

Anthropic (Python SDK)

02 —

Same pattern

from anthropic import Anthropic

client = Anthropic(
    base_url="https://api.promptcost.io/anthropic",
    api_key="sk-ant-••••••••••",
    default_headers={
        "cg-key":   "sk-pc-••••3f9a",
        "cg-agent": "email-drafter",
    },
)

response = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=1000,
    messages=[{"role": "user", "content": "Draft an email"}],
)

Per-call agent tagging

If you have multiple agents in one process, pass the agent name per request:

# OpenAI
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[...],
    extra_headers={"cg-agent": "summarizer"},
)

# Anthropic
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1000,
    messages=[...],
    extra_headers={"cg-agent": "summarizer"},
)

Async clients

Both AsyncOpenAI and AsyncAnthropic work identically:

from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://api.promptcost.io/openai/v1",
    api_key="sk-••••••••••",
    default_headers={"cg-key": "sk-pc-...", "cg-agent": "my-agent"},
)

async def run():
    return await client.chat.completions.create(...)
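Because tagging is just a header, concurrent tasks can each carry their own agent name. A runnable sketch of the fan-out pattern, with a hypothetical tagged_call coroutine standing in for the real create() call:

```python
import asyncio

async def tagged_call(agent: str, prompt: str) -> dict:
    # stand-in for:
    #   await client.chat.completions.create(
    #       model="gpt-4o-mini",
    #       messages=[{"role": "user", "content": prompt}],
    #       extra_headers={"cg-agent": agent},
    #   )
    await asyncio.sleep(0)
    return {"agent": agent, "prompt": prompt}

async def main() -> list[dict]:
    # fan out concurrent requests; each is attributed to its own agent
    return await asyncio.gather(
        tagged_call("lead-scorer", "Score lead A"),
        tagged_call("summarizer", "Summarize doc B"),
    )

results = asyncio.run(main())
```

gather preserves order, so each result lines up with the agent that produced it.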

Streaming

SSE streaming passes through unchanged. Token usage is logged when the stream completes:

with client.messages.stream(
    model="claude-haiku-4-5",
    max_tokens=1000,
    messages=[...],
) as stream:
    for chunk in stream.text_stream:
        print(chunk, end="")

Handling 429 (budget exceeded)

When an agent hits its monthly cap, PromptCost returns a 429 with a structured body. The SDKs surface this as RateLimitError — handle it like any rate limit, except this one means "you ran out of budget":

from openai import RateLimitError

try:
    response = client.chat.completions.create(...)
except RateLimitError as e:
    if isinstance(e.body, dict) and e.body.get("error") == "budget_exceeded":
        # your agent hit its monthly cap
        notify_slack(e.body["agent"], e.body["message"])
    else:
        raise


FAQ

Does this work with LangChain / LlamaIndex?

Yes — they use the underlying SDKs. Set base_url and headers on the client you pass in. For LangChain's ChatOpenAI, use openai_api_base and default_headers.

Does this work with Instructor / Pydantic-based clients?

Yes. Instructor wraps the OpenAI client; configure that client with the proxy base_url and headers, then pass it to Instructor.

What about Azure OpenAI?

Azure is on the roadmap but not yet supported. For now, PromptCost proxies OpenAI's and Anthropic's direct APIs only.