
How a single OpenAI retry loop burned $72,000 overnight — and how to stop yours

2026-05-02 · 8 min read · PromptCost team

Here's a story you'll find on the OpenAI developer forum, paraphrased only slightly:

"I went to bed with $200 in credits. I woke up at 7am to a $72,000 bill. A retry loop in our agent had been hammering GPT-4o all night."

This isn't an outlier. The OpenAI community forum is full of these threads — people losing hundreds or thousands of dollars to a single misconfigured loop, a forgotten while True:, or an agent that decides to recursively call itself when it doesn't know what to do.

And here's the kicker most teams don't realize: OpenAI removed hard budget caps in 2024. The "Monthly budget" field in your billing dashboard is now an alert, not a stop. It will email you. It will not cut you off.

Why this happens more often than people admit

According to Zylo's 2026 SaaS Management Index, 78% of IT leaders saw unexpected charges tied to AI consumption last year. The pattern is consistent across the incidents we've seen: a retry loop that never gives up, a forgotten while True:, or an agent that recursively calls itself, each one billing the API on every pass with nothing on the request path to stop it.

Why "set a budget alert" isn't enough

OpenAI's billing UI lets you set a monthly budget and email notification thresholds. Both are alerts: they tell you about spend after the fact, they don't stop it.

If a runaway loop spends $30,000 in three hours — which has happened — the alert email might arrive at hour 1, but the loop runs unchecked because nobody's awake to kill the process. The "hard limit" enforcement happens on OpenAI's billing cycle, not in real time on the API path. Several developers have reported continuing to be charged well past their stated cap.

And on Anthropic? There's no equivalent of a hard cap at all. You set up usage alerts and that's it.

The uncomfortable truth: the only way to actually stop a runaway agent before it bills you is to put something between your code and the provider that enforces a budget on the request path itself.

The proxy pattern that actually works

The fix is structural, not procedural. Instead of:

# Your code
POST https://api.openai.com/v1/chat/completions

You route through a thin proxy that maintains a real-time spend counter per agent or per project, and returns a 429 Budget Exceeded before the request ever reaches OpenAI:

# Same request, different host
POST https://api.promptcost.io/openai/v1/chat/completions
cg-key:    sk-pc-••••3f9a     # identifies your workspace
cg-agent:  "lead-scorer"        # tags this request
x-api-key: sk-••••••••••        # your provider key, untouched
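In Python, the switch is just a different host plus two extra headers. A minimal stdlib sketch, assuming the PromptCost host and cg-* header names from the snippet above (the key values are placeholders):

```python
# Route an OpenAI-style chat request through a budget proxy by swapping
# the host and adding attribution headers. The provider key passes
# through in x-api-key untouched; cg-key / cg-agent follow the header
# names shown above.
import json
import urllib.request

PROXY_BASE = "https://api.promptcost.io/openai"  # instead of api.openai.com

def build_request(provider_key, workspace_key, agent, payload):
    """Build the proxied request; only the host and cg-* headers differ."""
    return urllib.request.Request(
        f"{PROXY_BASE}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "x-api-key": provider_key,   # your OpenAI key, not stored
            "cg-key": workspace_key,     # identifies your workspace
            "cg-agent": agent,           # tags spend to this agent
        },
        method="POST",
    )
```

If you're on the official OpenAI Python SDK, the same routing is usually done with its base_url and default_headers client options rather than raw requests.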

What changes: the proxy maintains the spend counter, attributes every request to an agent, and returns 429 at the cap instead of forwarding. Your provider key passes through in the headers without being stored.

What "good" looks like

A safe LLM cost setup, in our opinion, has four properties:

  1. Per-agent attribution. If you run twelve agents on one provider key, you need to know which one is responsible for spend. "Total monthly bill" is too coarse to act on.
  2. Real-time enforcement on the request path. Polling the usage API every 15 minutes is not real time. Counters that increment on the request itself are.
  3. Hard caps that block. An alert is information; a 429 is a fix.
  4. Zero key storage. Whatever sits between you and OpenAI shouldn't persist your provider keys. Headers in, headers out.
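Properties 1–3 reduce to a surprisingly small core: a per-agent counter incremented on the request path, checked against a cap before forwarding. A sketch with illustrative prices and caps (not real model rates):

```python
# Per-agent spend attribution plus a hard cap that blocks instead of
# alerting. Prices and cap values are illustrative.
from collections import defaultdict

PRICE_PER_1K_TOKENS = {"gpt-4o": 0.005}   # illustrative, not a real rate
spent = defaultdict(float)                # property 1: per-agent counters
caps = {"lead-scorer": 50.0}              # hard cap in dollars

def charge(agent, model, tokens):
    """Property 2 and 3: increment on the request itself; False means 429."""
    cost = tokens / 1000 * PRICE_PER_1K_TOKENS[model]
    if spent[agent] + cost > caps.get(agent, float("inf")):
        return False                      # block before forwarding
    spent[agent] += cost                  # attribute spend to this agent
    return True
```

The point of the shape: the check and the increment happen on the same code path as the request, so there is no window where a loop can outrun a polling job.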

What about retry loops specifically?

The proxy pattern protects you from yourself in three layers:

  1. Per-agent caps bound the damage any single loop can do.
  2. Real-time counters catch the spend as it accrues, not at the next billing cycle.
  3. The hard 429 at the cap turns an infinite loop into a finite one: every retry past the budget is blocked before it reaches the provider.
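The third layer only works if your retry logic treats the proxy's 429 as terminal rather than one more thing to retry into. A sketch of a retry wrapper with that behavior (the backoff policy is illustrative):

```python
# Retry wrapper that backs off on transient errors but treats the
# proxy's 429 budget response as terminal instead of retrying into it.
import time
import urllib.error

def call_with_retry(do_request, max_attempts=5):
    for attempt in range(max_attempts):
        try:
            return do_request()
        except urllib.error.HTTPError as e:
            if e.code == 429:             # budget exceeded: do NOT retry
                raise RuntimeError("budget cap hit; stopping") from e
            time.sleep(2 ** attempt)      # transient error: back off
    raise RuntimeError("max retries exceeded")
```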

If you're rolling your own

You don't need a service to do this. The pattern is small: a reverse proxy in front of the provider, a pricing table mapping models to per-token rates, a per-agent counter incremented on each request, and a check that returns 429 once the counter crosses the cap.

The reason most teams don't roll their own is that maintaining the pricing table alone is a part-time job, and getting the budget enforcement right (especially with streaming responses) has nasty edge cases.
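The streaming edge case is worth spelling out: with a streamed response you don't know the final token count before you forward the request. One workable approach (our suggestion, not the only one) is to reserve the worst case up front using the request's max_tokens, then settle to actual usage afterwards. Prices and names below are illustrative:

```python
# Reserve-then-settle budgeting for streaming responses: pre-charge the
# worst case (max_tokens), refund the unused portion once actual usage
# is known. Prices and the agent name are illustrative.
PRICE_PER_1K = {"gpt-4o": 0.005}      # illustrative rate
budget_left = {"lead-scorer": 10.0}   # remaining dollars per agent

def reserve(agent, model, max_tokens):
    """Pre-charge the worst case; returns a settle() callback, or None (429)."""
    worst = max_tokens / 1000 * PRICE_PER_1K[model]
    if worst > budget_left[agent]:
        return None                           # respond 429, never forward
    budget_left[agent] -= worst               # hold the reservation
    def settle(actual_tokens):
        actual = actual_tokens / 1000 * PRICE_PER_1K[model]
        budget_left[agent] += worst - actual  # refund the unused hold
    return settle
```

The trade-off: a conservative max_tokens can temporarily pin budget a request never uses, but the cap is never breached mid-stream.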

Or just use PromptCost.

Drop-in proxy for OpenAI & Anthropic. Per-agent cost tracking. Hard budget caps that return 429 before forwarding. Free forever, no credit card.

Start free →
