THE $400 BILLION DATACENTER


What Big Tech is actually building, why it’s sitting idle, and what the numbers really mean

PART III

eyesonsuriname

Amsterdam, 3 Nov. 2025 – There’s a datacenter outside Des Moines, Iowa, that cost $10 billion to build. It covers 2.1 million square feet – roughly 36 football fields. Inside are 50,000 GPU servers, each containing 8 NVIDIA H100 chips. The electricity consumption equals that of 180,000 homes.

Data Center, Des Moines, Iowa

On an average Tuesday afternoon, about 62% of those GPUs are doing… something. Training models, running inference, processing workloads. The other 38% are idle, consuming power, depreciating, one day closer to obsolescence.

This is not an Amazon datacenter, or a Google one, or Meta’s. This is a composite – a simplified snapshot of what’s actually happening across dozens of facilities that Big Tech has built or is building right now.

Over the next three years, they’ll spend $400 billion on this infrastructure. That’s more than the annual GDP of Finland. More than the combined market value of Boeing and Intel.

Let’s understand what they’re buying, why so much of it sits unused, and what it means when hardware with a 1-3 year lifespan is built for demand that may arrive in 5-10 years.


WHAT IS A GPU AND WHY DOES AI NEED SO MANY?

Start with the basics, because the technology matters.

A GPU – Graphics Processing Unit – was originally designed to render video game graphics. The key insight: instead of doing one complex calculation at a time (like a CPU), a GPU does thousands of simple calculations simultaneously.

Turns out, that’s also perfect for AI.

Training an AI model like GPT-4 or Claude requires processing billions of examples, looking for patterns, adjusting billions of parameters. Each adjustment is relatively simple math. But you need to do it trillions of times.

A modern AI training run might involve:

  • 1 trillion parameters (variables to optimize)
  • 10 trillion tokens of training data (words/pieces of text)
  • Weeks or months of continuous computation
  • Thousands of GPUs working in parallel

The math is staggering:

Training GPT-4 required an estimated 25,000 NVIDIA A100 GPUs running for 90-100 days. At current cloud rates, that’s $50-100 million in compute costs alone.

Training the next generation of models (GPT-5, Gemini Ultra, Claude 4) could require 10x more: 250,000 GPUs and $500 million to $1 billion per training run.
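For the spreadsheet-minded, here is where those figures come from – a back-of-envelope sketch, not an invoice. The GPU counts and run lengths are the estimates above; the $1-2 per GPU-hour cloud rate is my assumption:

```python
# Back-of-envelope training cost from the estimates quoted above.
# GPU counts, run lengths and the $1-2 per GPU-hour rate are all
# rough assumptions, not vendor pricing.

def training_cost(gpus, days, usd_per_gpu_hour):
    """Total compute cost of a training run, in US dollars."""
    gpu_hours = gpus * days * 24
    return gpu_hours * usd_per_gpu_hour

# GPT-4-class run: ~25,000 A100s for 90-100 days
low = training_cost(25_000, 90, 1.0)       # ≈ $54 million
high = training_cost(25_000, 100, 2.0)     # ≈ $120 million
print(f"GPT-4-class run: ${low/1e6:.0f}M - ${high/1e6:.0f}M")

# Next-generation run: ~10x the GPU count
nxt_low = training_cost(250_000, 90, 1.0)    # ≈ $540 million
nxt_high = training_cost(250_000, 100, 2.0)  # ≈ $1.2 billion
print(f"Next-gen run: ${nxt_low/1e6:.0f}M - ${nxt_high/1e9:.1f}B")
```

That lands in the same ballpark as the $50-100 million and $500 million to $1 billion estimates.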

This is why Big Tech is buying GPUs by the hundreds of thousands.

Not because they’re wasteful. Because at this scale, you literally cannot train state-of-the-art AI models without them.


THE INVESTMENT BREAKDOWN

Let’s follow the money. Where is that $400 billion actually going?

Hardware: ~$200-250 billion

  • NVIDIA H100/H200 GPUs: $25,000-40,000 per chip
  • Servers to house them: networking, cooling, power systems
  • Each “GPU server” (8 chips + infrastructure): $300,000-500,000
  • At scale: 250,000-375,000 GPU servers (roughly 2-3 million GPUs) across the industry

Infrastructure: ~$100-150 billion

  • Datacenter construction and expansion
  • Power systems (these facilities need 50-300 megawatts each)
  • Cooling systems (GPUs generate enormous heat)
  • Networking infrastructure (moving data between GPUs)

Energy: ~$30-50 billion

  • Electricity to run the facilities
  • A single large AI datacenter: 200-300 megawatts continuous
  • Annual cost: $100-200 million in electricity alone

Other: ~$20-50 billion

  • Software, talent, acquisitions
  • Research and development
  • Geographic expansion

The Dashboard – AI Infrastructure 2025

  • Total investment: $300-400 billion
  • Number of GPU datacenters: 50-80 globally
  • Total GPUs deployed: 2-3 million (estimated)
  • Average cost per GPU (installed): $150,000-200,000
  • Average utilization rate: 60-70%
  • Annual energy consumption: equivalent to ~5 million homes
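A quick cross-check, using nothing but the dashboard’s own numbers, shows the figures roughly hang together – a sketch, not reported company data:

```python
# Cross-check of the dashboard figures above. Pure arithmetic on the
# article's own estimates; nothing here comes from company filings.

total_investment = (300e9, 400e9)   # total AI infrastructure spend, USD
gpus_deployed    = (2e6, 3e6)       # estimated GPUs in service

# Implied fully-loaded cost per installed GPU (chip + server + share of datacenter)
low  = total_investment[0] / gpus_deployed[1]   # $300B over 3M GPUs
high = total_investment[1] / gpus_deployed[0]   # $400B over 2M GPUs
print(f"Implied installed cost per GPU: ${low:,.0f} - ${high:,.0f}")
# ≈ $100,000 - $200,000, bracketing the $150,000-200,000 quoted above
```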

WHY 60-70% UTILIZATION?

Here’s where it gets interesting. Because “60-70% utilized” doesn’t mean what most people think it means.

I spoke with a datacenter architect who’s built facilities for two of the Big Four (anonymously, naturally). He walked me through the reality:

“People hear ‘idle GPUs’ and imagine computers sitting dark and unused. That’s not how this works.”

There are several types of “unutilized” capacity, and they’re not all waste:

1. Peak capacity reserves (15-20% of total) “You don’t build a highway for average traffic. You build for rush hour. Same with AI infrastructure. When someone starts a major training run, they need 10,000 GPUs immediately. If you don’t have them available, they go to your competitor.”

2. Redundancy and failover (10-15%) “At this scale, hardware fails constantly. GPUs die. Servers crash. Network connections drop. You need spare capacity to maintain uptime. A single failed GPU in a 10,000-GPU training run can corrupt the entire job.”

3. Development and testing (10-15%) “You can’t test experimental models on production systems. You need dedicated capacity for R&D, testing, debugging. That looks ‘idle’ from outside but it’s essential.”

4. Geographic distribution (5-10%) “AI inference (actually using the models) needs to be close to users for low latency. That means you need capacity in North America, Europe, Asia. Each region has its own utilization patterns.”

5. Actually idle (10-20%) “And yes, some capacity is genuinely underutilized. We built for projected 2026 demand. It’s 2025. We’re ahead of the curve.”

Add it up: taking the low end of each range, reserves, redundancy, development and regional headroom account for roughly 40 percentage points of capacity that looks underused but is there by design, on top of 10-20% that is genuinely idle. The arithmetic is rough, and the categories overlap with what gets reported as “utilized” – a sketch of one way to make the numbers add up follows – but the architect’s point stands: most of the “unused” capacity is deliberate.
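Here is a minimal sketch of one way to square those categories with the 60-70% utilization figure, taking the low end of each range. Which buckets count as “utilized” in reported numbers is my assumption, not the architect’s:

```python
# One way to make the architect's categories add up, using the LOW end
# of each range he gave. Treating the residual as revenue-generating work
# and counting dev/test as "utilized" are assumptions for illustration.

overhead = {
    "peak-demand reserves":  0.15,  # headroom for sudden large training runs
    "redundancy / failover": 0.10,  # spares for the ~9% annual GPU failure rate
    "development & testing": 0.10,  # R&D capacity: computing, but not revenue
    "regional headroom":     0.05,  # inference capacity spread across regions
    "genuinely idle":        0.10,  # built ahead of demand
}

revenue_work = 1.0 - sum(overhead.values())              # 0.50 of the fleet
busy = revenue_work + overhead["development & testing"]  # what looks "busy" from outside

purposeful_idle = (overhead["peak-demand reserves"]
                   + overhead["redundancy / failover"]
                   + overhead["regional headroom"])

print(f"Actively computing:  ~{busy:.0%}")             # ~60%, close to the 62% in the opener
print(f"Idle, but by design: ~{purposeful_idle:.0%}")  # ~30%
print(f"Genuinely idle:      ~{overhead['genuinely idle']:.0%}")  # ~10%
```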

Is that reasonable?

THE COMPARISON PROBLEM

To understand if 60-70% is good or bad, we need context.

Traditional cloud datacenters: 40-60% average utilization
Amazon’s AWS, Microsoft’s Azure and Google Cloud Platform all run at roughly 50% average utilization. That’s considered healthy and efficient.

Why so low?

  • Cloud needs to handle spikes (Black Friday, product launches, viral content)
  • Different time zones create uneven demand
  • Enterprise customers demand guaranteed availability
  • Redundancy for reliability

Manufacturing plants: 70-85% utilization
A car factory running at 75% is considered well-managed. Below 60% is concerning. Above 90% means you’re risking breakdowns and can’t handle disruptions.

Power plants: 50-70% utilization
Electrical grids maintain massive overcapacity for peaks. Average utilization of 50-60% is normal. It looks “wasteful” until there’s a heat wave and everyone turns on air conditioning simultaneously.

So where does AI infrastructure fit?

The datacenter architect’s take:

“For brand-new technology with explosive growth projections, 60-70% is actually pretty good. If we were at 90%, I’d be worried – that means we’re capacity-constrained and turning away business. If we were at 40%, I’d be concerned we overbuilt.

“60-70% means we have runway. We can handle growth. We can take on big customers. We have redundancy for reliability.

“Is it expensive? Yes. But is it wasteful? Not by datacenter standards.”

BUT THERE’S A CATCH

Here’s where Big Tech’s defense starts to crack.

Everything above assumes one critical thing: that demand will grow to fill the capacity.

For traditional cloud, that happened. AWS built capacity, and within 2-3 years, enterprises migrated. The “idle” capacity became fully utilized.

For AI, that hasn’t happened yet.

And there’s a more fundamental problem: GPU lifespan.

The depreciation timeline:

A traditional server: 5-10 year useful life
A GPU for AI: 1-3 years before it’s functionally obsolete

Why so short?

1. Rapid technological advancement

  • NVIDIA releases new GPU generations every 1.5-2 years
  • Each generation: 2-3x performance improvement
  • Your $40,000 H100 from 2023 is outclassed by the $35,000 H200 in 2025

2. Extreme operational stress

  • AI training runs GPUs at 100% capacity, 24/7
  • Heat cycling causes physical degradation
  • Failure rates: ~9% annually, 25%+ over three years

3. Software optimization

  • New models are optimized for new hardware
  • Running 2027 models on 2024 GPUs is like running Windows 11 on a 2015 laptop – technically possible, practically painful

The math gets ugly fast:

If you spend $200 billion on GPUs in 2024-2025, and they need replacement in 2027-2028, you’ve got a 3-year window to extract value.

If demand takes 5 years to materialize, you’re replacing hardware before it ever paid for itself.

This is the real risk.

Not that 60-70% utilization is too low. But that the remaining 30-40% never gets utilized before the hardware becomes obsolete.
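To see why timing dominates everything else, here is a toy model – my construction, not anything from the companies – of how much of a GPU cohort’s cost gets recovered before obsolescence under a fast and a slow demand ramp:

```python
# Toy model: value extracted from a GPU cohort before it is obsolete,
# under different demand ramps. The ramp shapes and the 3-year useful
# life are assumptions for illustration, not company projections.

def value_extracted(capex, useful_life_years, ramp):
    """Fraction of capex earned back if full demand would exactly pay
    for the hardware over its useful life, but demand only arrives
    according to `ramp` (utilization fraction per year)."""
    full_rate = capex / useful_life_years            # value per year at 100% demand
    return sum(full_rate * ramp[y] for y in range(useful_life_years))

capex = 200e9   # GPUs bought in 2024-2025, per the figure above

fast_ramp = [0.6, 0.8, 1.0]              # demand arrives within the hardware's life
slow_ramp = [0.3, 0.4, 0.5, 0.7, 1.0]    # demand takes ~5 years to materialize

print(f"Fast ramp: {value_extracted(capex, 3, fast_ramp)/capex:.0%} of capex recovered")  # ~80%
print(f"Slow ramp: {value_extracted(capex, 3, slow_ramp)/capex:.0%} of capex recovered")  # ~40%
# With the slow ramp, the hardware ages out before most of its value is extracted.
```

Same hardware, same spend; the only variable is when the demand shows up.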

THE UTILIZATION TREND

So is utilization increasing? Are they filling that capacity?

Mixed signals.

Microsoft (internal data, shared off-record):

  • Q1 2024: 45% average GPU utilization
  • Q4 2024: 58%
  • Q1 2025: 68%

Trending up. Good news.

But:

  • Much of that increase is internal R&D (training their own models)
  • Enterprise customer workloads (the real revenue) are growing slower
  • GitHub Copilot, Office AI features: incremental adoption, not explosive

Meta (public earnings calls + analysis):

  • Heavy utilization for internal products (feed algorithms, content moderation, ad targeting)
  • Llama model training
  • External customer usage: minimal (Meta doesn’t really sell AI services)

Amazon AWS (from customer conversations):

  • Strong growth in AI services (80%+ year-over-year)
  • But from a small base ($5-8 billion annual run rate)
  • Most of AWS’s $100 billion revenue is still traditional cloud

Google Cloud (public data + analyst estimates):

  • AI services growing fast in percentage terms
  • Absolute numbers still small relative to infrastructure investment
  • Most revenue still from traditional search and ads

The pattern:

Utilization is increasing, but slowly. Growth is real but not explosive. Internal usage (training proprietary models) is high. External customer usage (the actual business model) is lagging.

THE REAL NUMBERS

Let me put this in perspective with a thought experiment.

Scenario: Microsoft’s AI investment

  • 2025 AI capex: $80 billion
  • Expected useful life: 3 years
  • Annual depreciation: $26.7 billion/year
  • Additional operating costs (power, cooling, staff): ~$8 billion/year
  • Total annual cost: ~$35 billion/year

To break even on that investment, Microsoft needs to generate $35 billion in additional annual revenue from AI.

Their current AI revenue (GitHub Copilot, Office AI, Azure AI services): estimated $8-12 billion annually.

Gap to fill: $23-27 billion/year.

That’s not impossible. But it requires:

  • Tripling current AI revenue within 2-3 years
  • Maintaining growth as hardware ages
  • Beating competitors who are making identical bets

Now multiply across the industry:

  • Combined Big Tech AI investment: $400 billion
  • Depreciation + operating costs: ~$180-200 billion/year
  • Current combined AI revenue (estimated): $40-60 billion/year
  • Gap: $120-160 billion/year

To justify the investment, the industry needs to roughly triple AI revenue within 2-3 years.
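The same arithmetic, written out – a sketch that reproduces the figures above. The zero-margin simplification (revenue has to cover depreciation plus operating costs one-for-one) comes from the thought experiment itself, and the industry operating-cost range is my assumption to match the $180-200 billion figure:

```python
# Break-even arithmetic for the figures above. The zero-margin
# simplification mirrors the thought experiment in the text; the
# operating-cost inputs are assumptions, not reported numbers.

def annual_gap(capex, life_years, opex_per_year, current_ai_revenue):
    """Annual cost, and the extra AI revenue needed to cover it."""
    annual_cost = capex / life_years + opex_per_year
    return annual_cost, annual_cost - current_ai_revenue

# Microsoft scenario: $80B capex, 3-year life, ~$8B opex, $8-12B AI revenue today
cost, gap_hi = annual_gap(80e9, 3, 8e9, 8e9)
_,    gap_lo = annual_gap(80e9, 3, 8e9, 12e9)
print(f"Microsoft: ~${cost/1e9:.0f}B/yr cost, gap ${gap_lo/1e9:.0f}-{gap_hi/1e9:.0f}B/yr")
# -> ~$35B/yr cost, gap ~$23-27B/yr

# Industry: $400B capex, 3-year life, ~$50-65B opex, $40-60B AI revenue today
cost_lo, g_lo = annual_gap(400e9, 3, 50e9, 60e9)
cost_hi, g_hi = annual_gap(400e9, 3, 65e9, 40e9)
print(f"Industry: ~${cost_lo/1e9:.0f}-{cost_hi/1e9:.0f}B/yr cost, gap ${g_lo/1e9:.0f}-{g_hi/1e9:.0f}B/yr")
# -> ~$183-198B/yr cost, gap ~$123-158B/yr: roughly a tripling of today's AI revenue
```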

Is that realistic?

THREE SCENARIOS

Scenario A: The Bull Case AI adoption accelerates. The 95% pilot failure rate drops to 60% as tooling improves. Enterprise spending goes from $20 billion/year to $150 billion/year by 2027. The infrastructure becomes fully utilized. Hardware refresh cycles are funded by cash flow. Big Tech wins big.

Probability: 20-25%

Scenario B: The Muddle AI adoption happens, but slower than projected. Revenue grows from $50 billion to $100 billion by 2027 – real growth, but not enough to justify the investment. Utilization stays at 65-70%. Some hardware gets replaced, some gets written off early. Companies take moderate losses but survive. No one “wins” but no one catastrophically loses.

Probability: 50-60%

Scenario C: The Crash Enterprise adoption stalls. The 95% failure rate persists. Revenue grows to only $60-70 billion by 2027. Utilization drops as companies slow investment. Hardware ages out unused. Big Tech faces $100+ billion in write-downs. Stock prices crater. Layoffs accelerate. The bubble pops.

Probability: 20-25%

VOICES FROM THE FIELD

“I manage GPU purchasing for a major cloud provider. We’re buying tens of thousands per quarter. Do I think we’ll use all of them? Honestly? I don’t know. But I know if we don’t buy them and the competitor does, we lose. So we buy.”
– Infrastructure VP, Big Tech (anonymous)

“The utilization numbers we report publicly are… let’s say ‘optimistic.’ We count things as ‘utilized’ that most people would call idle. But everyone does it, so it’s fair, right?”
– Datacenter Operations Manager (anonymous)

“We’re building for a future we believe is coming. But belief and certainty are different things. This is the biggest bet in tech history. We’re either visionaries or idiots. We’ll know in three years.”
– CFO, Fortune 100 tech company (on background)

THE QUESTION THAT HAUNTS

Here’s what keeps me up at night about this story:

Big Tech is spending $400 billion on infrastructure with a 1-3 year useful life, betting on demand that may take 5-10 years to materialize.

If they’re wrong about timing – not about AI’s potential, just about the timeline – they’ll be replacing obsolete hardware before they ever extracted value from it.

That’s not a margin problem. That’s not a stock price correction.

That’s a write-off of historic proportions.

And they’re funding it by firing 176,000 people.

In the next essay: “The Customers Who Didn’t Come” – why 95% of enterprise AI pilots fail, and what it means for that $400 billion bet.
