6 min read · Updated May 19, 2026

GPU Cloud Pricing in 2026

From the $9/hr H100 of early 2023 to the $1.92/hr of today — what's next.

In Q2 2023 you could not get an H100 for less than $7.50/hr, and you usually paid $9–11. As of this month the median neocloud price is $1.92/hr, roughly 78% below even the low end of that typical range. That curve can't be extrapolated in a straight line: it bends. Here's the shape we expect.

The 2023–2026 curve

| Quarter | H100 on-demand median ($/hr) | Driver |
|---|---|---|
| Q1 2023 | $9.40 | Supply shock, single supplier (CoreWeave + a handful) |
| Q3 2023 | $6.20 | Lambda + RunPod ramp |
| Q1 2024 | $4.10 | Spot product launches |
| Q3 2024 | $2.75 | Hyperscalers cut to defend share |
| Q1 2025 | $2.18 | H200 launches; H100 price floor visible |
| Q4 2025 | $1.99 | Post-IPO CoreWeave dumps short-tenor capacity |
| Q2 2026 | $1.92 | Asymptote forming around fully-burdened cost |
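To see the bend rather than take it on faith, you can convert each step in the table into an annualised rate of decline. This is a minimal sketch in Python: the prices are the table's medians, while the choice to date each quarter at its start and the function names are our own illustrative assumptions.

```python
# Median H100 on-demand prices ($/GPU-hr) from the table above.
# Assumption: each quarter is dated at its start (Q1 = .00, Q2 = .25, ...).
CURVE = [
    (2023.00, 9.40),
    (2023.50, 6.20),
    (2024.00, 4.10),
    (2024.50, 2.75),
    (2025.00, 2.18),
    (2025.75, 1.99),
    (2026.25, 1.92),
]


def annualised_decline(p_start, p_end, years):
    """Constant yearly rate of decline that takes p_start to p_end."""
    return 1.0 - (p_end / p_start) ** (1.0 / years)


def interval_declines(curve):
    """Annualised decline for each consecutive pair of points."""
    out = []
    for (t0, p0), (t1, p1) in zip(curve, curve[1:]):
        out.append((t0, t1, annualised_decline(p0, p1, t1 - t0)))
    return out


# Overall drop from the first to the last point in the table.
total_drop = 1.0 - CURVE[-1][1] / CURVE[0][1]

if __name__ == "__main__":
    print(f"total drop: {total_drop:.1%}")
    for t0, t1, rate in interval_declines(CURVE):
        print(f"{t0:.2f} -> {t1:.2f}: {rate:.0%} per year")
```

On these figures the annualised decline runs above 50% through mid-2024 and is under 10% for the most recent interval, which is the flattening the last row of the table describes.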

Why it stops falling

An H100 SXM5 module costs NVIDIA about $3,200 to manufacture and sells through to neoclouds for ~$30,000. The all-in deployed cost per H100 (DC space, power, networking, financing, support staff) runs $42,000–$48,000 over its life. At 95% utilisation across a 4-year depreciation schedule, about 33,300 billable hours, that is a break-even of roughly $1.26–$1.44/hr; at utilisation nearer 80% it rises to about $1.50–$1.70/hr. Below break-even, the marginal neocloud loses money on every hour.
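The break-even arithmetic is just lifetime cost divided by billable hours, and it is worth running yourself because the answer is sensitive to utilisation. A minimal sketch, assuming a flat 8,760-hour year and that the $42,000–$48,000 figure already includes every cost; the function names are illustrative.

```python
HOURS_PER_YEAR = 8760  # assumption: flat calendar year, no leap days


def break_even_rate(lifetime_cost, years, utilisation):
    """$/GPU-hr needed to recover lifetime_cost over the billable hours."""
    billable_hours = years * HOURS_PER_YEAR * utilisation
    return lifetime_cost / billable_hours


def required_utilisation(lifetime_cost, years, hourly_rate):
    """Utilisation at which hourly_rate exactly recovers lifetime_cost."""
    return lifetime_cost / (years * HOURS_PER_YEAR * hourly_rate)


if __name__ == "__main__":
    for cost in (42_000, 48_000):
        rate = break_even_rate(cost, years=4, utilisation=0.95)
        print(f"${cost:,} at 95% utilisation: ${rate:.2f}/hr")
    print(f"{required_utilisation(42_000, 4, 1.50):.0%} utilisation for $1.50/hr")
    print(f"{required_utilisation(48_000, 4, 1.65):.0%} utilisation for $1.65/hr")
```

On these inputs, 95% utilisation gives about $1.26–$1.44/hr, and a break-even of $1.50–$1.65/hr corresponds to utilisation of roughly 80–83%.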

Some neoclouds will go below break-even temporarily to defend share, but they can't stay there. So expect H100 on-demand to settle in the $1.40–$1.80 band through end of life, with reserved 1-yr at $0.85–$1.05.

Where H200 lands

H200 launched in late 2024 at $5.80/hr and has been falling on a faster curve than H100 did, because the supply ramp was much steeper and the neocloud category was already mature. Current median is $3.20. We expect a $2.30–$2.60 asymptote by Q1 2027, then a long flat tail.

Where B200 / GB200 lands

B200 launched in volume in mid-2025 at $9.40/hr. GB200 NVL72 racks rent for the equivalent of $11/GPU-hr. Both are following an H100-shaped curve compressed into ~24 months. Best-guess bottoms: B200 around $4.10, GB200 effective around $5.20.
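A quick way to compare these forecasts is to express each expected floor as a fraction of the launch price. The sketch below uses only numbers quoted in this article; treating the Q1 2023 median of $9.40 as the H100 starting price is our assumption, and the names are illustrative.

```python
# (launch $/GPU-hr, expected floor low, expected floor high) from the text.
# Assumption: the Q1 2023 median stands in for the H100 launch-era price.
FORECASTS = {
    "H100": (9.40, 1.40, 1.80),
    "H200": (5.80, 2.30, 2.60),
    "B200": (9.40, 4.10, 4.10),
    "GB200": (11.00, 5.20, 5.20),
}


def expected_decline(launch, floor_low, floor_high):
    """Return (smallest, largest) fractional drop from launch to floor."""
    return (1 - floor_high / launch, 1 - floor_low / launch)


if __name__ == "__main__":
    for gpu, (launch, low, high) in FORECASTS.items():
        small, large = expected_decline(launch, low, high)
        print(f"{gpu}: {small:.0%} to {large:.0%} below launch")
```

On these figures the newer parts are forecast to bottom out roughly 53–60% below their launch prices, against about 81–85% for H100.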

What this means strategically

  • Training throughput per dollar is improving ~30% per year before any model-architecture gains. Plan accordingly.
  • Reserved capacity is the right product if you can predict 6–12 months out. Spot is the right product if you have checkpointing.
  • The marginal cost of training a 70B-parameter model is now well under $1M. The marginal cost of a 7B fine-tune is under $30K.
  • Inference cost per million tokens is falling faster than training cost per GPU-hr, because batching + quantisation gains compound on top.
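The 70B figure can be sanity-checked with the common rule of thumb that training takes about 6 FLOPs per parameter per token. Everything beyond the $1.92/hr price is an assumption on our part, not a number from this article: 1.4T training tokens (about 20 per parameter), an H100 dense BF16 peak of 989 TFLOPS, and 40% model FLOPs utilisation. Treat this as a rough sketch, not a quote.

```python
def training_cost_usd(params, tokens, price_per_gpu_hr,
                      peak_flops=989e12, mfu=0.40):
    """Rough training cost using the ~6 * params * tokens FLOPs rule.

    peak_flops and mfu are assumptions: H100 dense BF16 peak and a
    40% utilisation of that peak. Real runs vary widely.
    """
    total_flops = 6 * params * tokens
    gpu_seconds = total_flops / (peak_flops * mfu)
    gpu_hours = gpu_seconds / 3600
    return gpu_hours * price_per_gpu_hr


if __name__ == "__main__":
    # Assumed run: 70B parameters, 1.4T tokens, today's median H100 price.
    cost = training_cost_usd(params=70e9, tokens=1.4e12, price_per_gpu_hr=1.92)
    print(f"estimated cost: ${cost:,.0f}")
```

Under those assumptions the estimate comes out at about $0.8M. The result scales linearly with token count and inversely with utilisation, so a run with far more tokens or a lower MFU would land well above that.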

The two things that could break the trend

First: a true AGI demand shock that soaks up all spare capacity. Possible, but not the base case. Second: an energy constraint. If grid interconnect queues stretch beyond 4 years in the markets where neoclouds want to build, prices stop falling because supply stops growing. That's the realistic limit on this decline.
