OpenAI reportedly cut response costs for guest ChatGPT users by more than half

The Decoder · Published Jun 30, 2026 · Reviewed Jul 3, 2026 by citations.press editors

OpenAI engineers told colleagues earlier this month that they had cut inference costs (the expense of running existing AI models) by more than half, according to a person familiar with the discussions cited by The Information.

OpenAI applied the new optimizations to ChatGPT, specifically for visitors who don't have an account. The number of Nvidia GPUs needed to serve those users dropped to just a few hundred. It's not clear how many were required before or what techniques OpenAI used to pull it off. Guest users can only access a very limited set of ChatGPT features, so whether these gains would carry over to the full product is an open question.

DeepSeek also recently released a new open-source method that can speed up inference requests by 60 to 85 percent. Resources freed up by efficiency gains like these could go toward scaling services, better models, faster responses, or bigger margins. But since data center buildouts are moving slowly, such gains will probably give labs more breathing room rather than cut into chip demand.

This article was originally published by The Decoder.