
Three frontier models shipped this week. One costs $5/$30 per million tokens. Two are Chinese, open-source, free, and a problem.

The White House accused China of "industrial-scale" distillation of US frontier AI systems. The same week, SpaceX offered $60 billion for a coding platform whose model was distilled from a Chinese one. 

Distillation, it turns out, goes both ways.

But only one of those is a problem.

I'm Ben Baldieri, and every week I break down the moves shaping GPU compute, AI infrastructure, and the data centres that power it all.

Here's what's inside this week:

  • Anthropic commits $100 billion to AWS as revenue triples
  • SpaceX options Cursor for $60 billion
  • A NASDAQ-listed DePIN company signs a $260 million GPU contract
  • Three frontier models ship in one week: GPT-5.5, Kimi K2.6, DeepSeek V4
  • Google splits the TPU into training and inference chips
  • NVIDIA and Google Cloud announce 960,000 Vera Rubin GPUs
  • Meta breaks ground on data centre #32
  • The Rundown

Let's get into it.

Today's issue is brought to you by hosted·ai

The GPU lives on sponsorships. If you value independent, no-fluff analysis of the AI infrastructure market, sponsors like this one are what make it possible. I'd appreciate you checking them out.

Struggling to find B200s for your project?

For hosted·ai, GPU availability is not a supply chain problem.

It’s an orchestration problem, a utilisation problem, and a waste problem. Fix that, and everyone gets better access to more cost-effective GPUs.

The GPU Audio Companion Issue #103

Want the GPU breakdown without the reading? The Audio Companion does it for you, but only if you’re subscribed. If you can’t see it below, click here to fix that.

Anthropic's Revenue Tripled in Four Months. It Just Committed $100 Billion to Keep Up.

$30 billion run-rate. $9 billion four months ago.

Anthropic and Amazon are securing up to 5GW of capacity for Claude. Anthropic is committing $100 billion over ten years to AWS, covering Amazon's custom server chips (Graviton) and AI accelerators (Trainium2 through Trainium4). Amazon, per CEO Andy Jassy, is investing $5 billion today, with up to $20 billion more to come. Over 1 million Trainium2 chips are already in use. Nearly 1GW comes online by year end. Over 100,000 customers run Claude on Bedrock.

Why this matters:

  • The $100 billion is sized to demand that exists. Anthropic acknowledged that growth "has impacted reliability and performance" across all tiers during peak hours. 

  • The strain shows in pricing too: Anthropic tested removing Claude Code from its $20/month Pro plan this week before walking it back as a "test on ~2% of new prosumer signups." Pro subscribers consume tokens worth up to 10x their fee; the back-of-envelope after this list shows how that maths breaks. Current plans, per Anthropic head of growth Amol Avasare, "weren't built for this." The 5GW commitment is the infrastructure fix. A pricing restructure is the commercial fix. Both are coming.

  • Separately, Anthropic shipped Cowork on 3P, a deployment mode that routes Claude's desktop app through the customer's own cloud (Bedrock, Vertex AI, or Azure Foundry) rather than Anthropic's API. Zero conversation data leaves the customer's environment. Compliance is inherited from their existing cloud provider. For regulated enterprises in finance, healthcare, and defence that cannot send data to a third-party API, this removes the last barrier.
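On that pricing point, here's the back-of-envelope sketch. The per-token rates below are illustrative assumptions, not Anthropic's actual Claude pricing; the point is the shape of the problem, not the exact numbers.

```python
# Back-of-envelope: how a flat $20/month plan goes underwater.
# INPUT_RATE and OUTPUT_RATE are assumed figures, not Anthropic's
# published pricing.

MONTHLY_FEE = 20.00                # Pro plan, USD/month
INPUT_RATE = 3.00 / 1_000_000      # assumed USD per input token
OUTPUT_RATE = 15.00 / 1_000_000    # assumed USD per output token

def monthly_token_cost(input_tokens: int, output_tokens: int) -> float:
    """API-equivalent cost of one subscriber's monthly usage."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# A heavy Claude Code user can push tens of millions of tokens a month.
cost = monthly_token_cost(input_tokens=40_000_000, output_tokens=5_000_000)
print(f"API-equivalent cost: ${cost:,.2f}")                   # $195.00
print(f"Multiple of the $20 fee: {cost / MONTHLY_FEE:.0f}x")  # 10x
```

At those assumed rates, one heavy user burns roughly 10x their subscription fee, which is exactly the multiple Anthropic is describing.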

SpaceX Options Cursor for $60 Billion

xAI lost $6.4 billion in 2025. Grok trails every frontier coding model. Musk's answer: pay $60 billion for a product whose own model was distilled from a Chinese open-source one, using exactly the technique the White House spent this week calling theft when China points it at American models.

SpaceX has an option to acquire Anysphere, Cursor's parent company, for $60 billion this year. If it walks, it pays $10 billion. Cursor is an AI-powered code editor used by millions of developers. It was valued at $29 billion in November. Revenue surpassed $2 billion. Its own Composer model was distilled from Chinese AI lab Moonshot AI's open-source Kimi K2.5.

Why this matters:

  • The deal gives each side what it cannot build alone. Cursor gets access to Colossus, SpaceX's GPU supercomputer in Memphis, to train better coding models. xAI gets Cursor's dev traces: the prompts developers type, the code they accept or reject, the edits they make after accepting suggestions, the errors they hit, the context they provide. 

  • Millions of professional developers generating those signals daily is a dataset no lab can replicate synthetically or buy on the open market. Whether those traces are worth $60 billion depends on whether they can close the gap between Grok and Claude on coding benchmarks. So far, nothing else has.

  • The White House accused China this week of "industrial-scale" distillation of US frontier AI systems using tens of thousands of proxy accounts. SpaceX is offering $60 billion for a product whose model was distilled from Chinese open-source technology. Distillation is apparently theft when China does it to American models and a $60 billion acquisition target when an American company does it to Chinese ones. A minimal sketch of the mechanics follows below.
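Mechanically, distillation is just training one model to imitate another's output distribution. The sketch below shows the core loss in PyTorch; it illustrates the generic technique, not Cursor's or Moonshot's actual pipeline.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """Push the student towards the teacher's softened output distribution.

    Temperatures above 1 expose the teacher's relative preferences
    across the whole vocabulary, not just its top pick.
    """
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between the two distributions, scaled by T^2 so
    # gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_student, soft_teacher,
                    reduction="batchmean") * temperature**2

# When distilling through an API you usually only see sampled text, not
# logits, so the teacher signal is reconstructed from outputs. That is
# what makes proxy-account distillation so hard to police.
student = torch.randn(8, 32_000)   # (batch, vocab) dummy logits
teacher = torch.randn(8, 32_000)
print(distillation_loss(student, teacher).item())
```

The asymmetry in the accusations isn't in the maths, which is identical in both directions; it's in who owns the teacher.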

A NASDAQ-Listed DePIN Company Just Signed a $260 Million Enterprise GPU Contract

$260 million. 2,304 B300s. Delivered through a decentralised network.

Axe Compute (NASDAQ: AGPU), an enterprise GPU-as-a-Service provider, signed a multi-year contract deploying 2,304 NVIDIA B300 GPUs through Aethir, a DePIN (decentralised physical infrastructure network) that pools GPU capacity from distributed providers and coordinates it through token incentives. Aethir claims 430,000 GPU containers across 200+ locations in 94 countries.

Why this matters:

  • 2,304 B300s in a "unified cluster" is real hardware. The question: does "decentralised" mean geographically distributed nodes, or co-located GPUs marketed under a token structure? The first limits training workloads. The second is a neocloud with a blockchain wrapper. The press release does not clarify.

  • It's the second NASDAQ-listed GPUaaS play in three weeks, after Allbirds rebranded as NewBird AI with $50M and no GPUs. Axe Compute at least has a $260M contract and named hardware. The pattern is the same: public shell, GPU story, capital raised on the narrative. The substance varies.

  • The enterprise compliance question is the real test. SLA-backed uptime, data residency guarantees, and audit trails are table stakes for production AI workloads. Axe Compute claims enterprise-grade delivery. Whether a decentralised network can match the compliance posture of CoreWeave or Nebius at contract scale is unproven.

Three Frontier Models Shipped This Week. Two Were Chinese, Open-Source, and Free.

GPT-5.5, Kimi K2.6, and DeepSeek V4 all landed in the same seven days.

OpenAI released GPT-5.5, codenamed "Spud," to paid subscribers in ChatGPT and Codex. 88.7% SWE-bench Verified (vs Opus 4.7's 85.4%). 60% fewer hallucinations than 5.4. Pricing: $5/$30 standard, $30/$180 Pro. 900 million weekly active users. 50 million subscribers.

The same week, Chinese AI lab Moonshot AI shipped Kimi K2.6 open-source with agent swarms scaling to 300 sub-agents across 4,000 coordinated steps. K2.6 beats GPT-5.4 on SWE-Bench Pro (58.6 vs 57.7). Free. Moonshot is backed by Tencent, China's largest internet company, which also unveiled its own Hy3 model this week, led by Tencent chief AI scientist Yao Shunyu, recruited from OpenAI.

Then, DeepSeek released V4 preview. V4-Pro: 1.6 trillion total parameters, 49 billion active, with performance DeepSeek claims rivals the top closed-source models. V4-Flash: 284 billion total, 13 billion active, for fast inference. Both ship with 1 million token context as default, novel sparse attention that cuts compute and memory costs, and native integration with Claude Code, OpenClaw, and OpenCode.
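The total-versus-active split is mixture-of-experts routing: each token only activates a small slice of the network, so serving compute tracks active parameters while memory tracks total. Here's a rough sketch of what the headline numbers imply, assuming FP8 weights (1 byte per parameter) and ignoring KV cache and replication:

```python
# Rough MoE serving arithmetic from DeepSeek's headline figures.
# Assumes FP8 weights (1 byte/param); ignores KV cache, activations,
# and any replication overhead.

BYTES_PER_PARAM = 1  # FP8 assumption

models = {
    "V4-Pro":   {"total": 1.6e12, "active": 49e9},
    "V4-Flash": {"total": 284e9,  "active": 13e9},
}

for name, p in models.items():
    weight_tb = p["total"] * BYTES_PER_PARAM / 1e12
    active_frac = p["active"] / p["total"]
    # Dense forward-pass compute is roughly 2 FLOPs per active parameter.
    gflops_per_token = 2 * p["active"] / 1e9
    print(f"{name}: ~{weight_tb:.2f} TB of weights, "
          f"{active_frac:.1%} of the model active per token, "
          f"~{gflops_per_token:.0f} GFLOPs per token")
```

Run it and V4-Pro needs roughly 1.6 TB just to hold weights, but only about 3% of the model fires per token. That's how a 1.6-trillion-parameter model can be cheap to serve.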

Why this matters:

  • Three frontier-class models in one week. One costs $5/$30 per million tokens. Two are free.

  • GPT-5.5 leads SWE-bench at 88.7%, but the gap between the closed frontier and the open-source frontier is measured in single percentage points and shrinking by the month.

  • The pricing spread is widening as the capability gap closes. Closed-model revenues increasingly depend on enterprise customers paying for capability they could soon replicate with open-source models on their own infrastructure; the break-even sketch below shows how thin that moat could get.
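Here's that crude break-even sketch. The GPU-hour rate and serving throughput are assumptions picked for illustration; only the $30 output rate comes from the GPT-5.5 pricing above. Real comparisons hinge on utilisation, engineering cost, and quality gaps this ignores.

```python
# Crude break-even: self-hosting an open model vs paying API rates.
# GPU_HOUR_COST and TOKENS_PER_GPU_SECOND are assumed figures.

API_RATE = 30.00 / 1_000_000   # USD per output token (GPT-5.5 standard)
GPU_HOUR_COST = 3.50           # assumed all-in USD per GPU-hour
TOKENS_PER_GPU_SECOND = 150    # assumed sustained serving throughput

self_host_rate = GPU_HOUR_COST / (TOKENS_PER_GPU_SECOND * 3600)
print(f"Self-host: ${self_host_rate * 1e6:.2f} per million tokens")
print(f"API:       ${API_RATE * 1e6:.2f} per million tokens")
print(f"Spread:    {API_RATE / self_host_rate:.0f}x")
```

Under those assumptions the API costs about 5x the raw serving cost. That spread is the closed providers' margin, and it's exactly what free frontier-class models put under pressure.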

Google Splits the TPU Into Two Chips: One for Training, One for Inference

Fourth disaggregated architecture in two months.

At Google Cloud Next, Google introduced TPU 8t and TPU 8i, two purpose-built chips replacing the single TPU that previously handled both workloads. TPU 8t (training): 9,600 chips per superpod, 121 ExaFLOPS, near-linear scaling to 1 million chips via Google's new Virgo network fabric, 3x compute per pod over previous generation. TPU 8i (inference): 288 GB HBM, 3x more on-chip SRAM, a new Boardfly network topology that halves the distance between chips for faster inter-chip communication, 80% better performance-per-dollar. Both run on Google's own Axion server processors (custom Arm-based CPUs). Both liquid-cooled.
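A quick sanity check on the pod numbers, with the caveat that ExaFLOPS ratings depend on the precision being quoted, which isn't specified here:

```python
# Per-chip arithmetic implied by the TPU 8t superpod figures.
POD_EXAFLOPS = 121     # per superpod (precision unspecified)
CHIPS_PER_POD = 9_600

per_chip_pflops = POD_EXAFLOPS * 1e18 / CHIPS_PER_POD / 1e15
print(f"~{per_chip_pflops:.1f} PFLOPS per TPU 8t chip")   # ~12.6

# If the claimed near-linear scaling really holds to 1 million chips:
fleet_exaflops = per_chip_pflops * 1e15 * 1_000_000 / 1e18
print(f"~{fleet_exaflops:,.0f} ExaFLOPS across a 1M-chip fleet")
```

Roughly 12.6 PFLOPS per chip, and about 12,600 ExaFLOPS across a million-chip fleet if the scaling claim survives contact with reality.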

Why this matters:

  • TPU 8i was designed for agent swarms. Google's framing: "swarms of agents perfectly orchestrated." With GPT-5.5 and K2.6 both shipping agentic capabilities this week, and Google shipping the silicon to run them, the direction of travel is clear. There's a toy sketch of the pattern after this list.

  • AWS/Cerebras. NVIDIA/Groq LPX. SambaNova/Intel. Now Google at the chip level. 

  • Every major player has now concluded that the one-chip paradigm cannot serve both training and inference, and signalled to the market that while GPUs currently dominate, it likely won’t be that way forever.
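Since "agent swarms" is doing a lot of work this week (Kimi K2.6's 300 sub-agents, Google's "perfectly orchestrated" framing), here's a toy sketch of the fan-out/fan-in pattern those claims describe. It's a generic illustration, not anyone's actual runtime; the sub-agent call is a stand-in for an LLM request.

```python
import asyncio

async def sub_agent(task_id: int, prompt: str) -> str:
    """Stand-in for one sub-agent call (in practice, an LLM API request)."""
    await asyncio.sleep(0.01)  # simulate inference latency
    return f"task {task_id}: handled {prompt!r}"

async def orchestrate(prompts: list[str], max_concurrency: int = 300) -> list[str]:
    # Hundreds of short, parallel, latency-sensitive calls: the load
    # shape inference-first silicon like TPU 8i is being pitched at.
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(i: int, p: str) -> str:
        async with sem:
            return await sub_agent(i, p)

    return list(await asyncio.gather(*(bounded(i, p) for i, p in enumerate(prompts))))

results = asyncio.run(orchestrate([f"subtask {i}" for i in range(300)]))
print(f"{len(results)} sub-agent results merged")  # 300
```

Multiply that pattern by thousands of coordinated steps and the serving profile looks nothing like a single chat session, which is the argument for inference-specific chips.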

NVIDIA and Google Cloud Announce 960,000 Vera Rubin GPUs

Nearly 1 million GPUs as one logical system.

NVIDIA and Google Cloud announced A5X instances powered by Vera Rubin NVL72, NVIDIA's next-generation GPU platform succeeding Blackwell. Clusters scale to 80,000 GPUs per site and 960,000 across multiple sites. 10x lower inference cost per token. 10x higher throughput per megawatt.

Why this matters:

  • 960,000 Vera Rubin GPUs is the largest announced cluster from any provider. Google now offers its own custom silicon (TPU 8t/8i) and nearly 1 million NVIDIA GPUs. Dual silicon at a scale only AWS and Meta can match.

  • Google also launched the first confidential computing offering for Blackwell GPUs, where prompts and model weights stay encrypted and invisible to the infrastructure operator.

  • This comes the same week Anthropic shipped Cowork on 3P for sovereign deployments, signalling that sovereign AI keeps moving from marketing slogan to required reality.

Meta Breaks Ground on Data Centre #32. It's the Fourth Infrastructure Move This Month.

$1 billion. Liquid-cooled. Workforce pipeline included.

Meta is building its first Oklahoma data centre: 28th in the US, 32nd globally. Over $1 billion investment. Closed-loop liquid cooling designed for zero water consumption most of the year. 1,500MW of clean energy added to the Oklahoma grid. 200+ graduates annually from local partnerships in cooling, fibre optics, and cabling.

Why this matters:

  • CoreWeave extended to $21B. The Nebius deals. Broadcom MTIA to 1GW+. Muse Spark shipped. Now Tulsa. Three parallel strategies: own facilities, neocloud leases, custom silicon. All at gigawatt scale. All accelerating.

  • Liquid cooling and a workforce pipeline say Vera Rubin and Kyber, not today's hardware. A multi-year training programme for a single site means a decade-plus commitment.

  • 1,500MW of clean energy added to the Oklahoma grid is more than the entire Stargate UK project would have delivered. Meta is adding the equivalent of OpenAI's cancelled UK commitment to a single state's grid. The hyperscaler that moves fastest on power secures the sites that matter for the next generation of silicon.

The Rundown

The contradictions inherent to the entire industry are the theme of the week.

Distillation is both theft and a $60 billion asset. Subscription pricing is unsustainable and generating $30 billion in revenue. Export controls are preventing Chinese AI progress and Chinese open-source models are matching the frontier anyway. Every position contains its opposite, and the market is pricing both sides simultaneously.

And all without a hint of irony.

See you next week.
