In partnership with Rafay

Three people were charged this week over the alleged smuggling of $2.5 billion in AI servers to China.

One of them co-founded the company that built them. He was photographed at NVIDIA's GTC booth on Monday. He was charged on Thursday.

The export control regime we've been tracking for over a year just produced its first high-profile criminal case. 

And it didn't come from a shadowy intermediary.

It came from inside one of NVIDIA's largest hardware partners.

I'm Ben Baldieri, and every week I break down the moves shaping GPU compute, AI infrastructure, and the data centres that power it all.

Here's what's inside this week:

  • Supermicro co-founder charged with smuggling $2.5 billion in AI chips to China

  • Nscale acquires AIPCorp and secures an 8GW site in West Virginia

  • Meta signs a $27 billion deal with Nebius, which raises $4 billion to fund it

  • Basecamp Research's Trillion Gene Atlas with Anthropic, NVIDIA, and two sequencing giants

  • Inference disaggregation: AWS + Cerebras vs. NVIDIA + Groq

  • Microsoft ends data centre NDAs with local governments

  • GMI Cloud's $12 billion, 1GW sovereign AI factory in Japan

  • The Rundown

Let's get into it.

The GPU Audio Companion Issue #98

Want the GPU breakdown without the reading? The Audio Companion does it for you, but only if you’re subscribed. If you can’t see it below, click here to fix that.

Today's issue is brought to you by Rafay

The GPU lives on sponsorships. If you value independent, no-fluff analysis of the AI infrastructure market, the sponsors are what make it possible. I'd appreciate you checking them out.

How to Build an AI Factory with Rafay

Organisations are investing heavily in GPUs, but infrastructure alone doesn’t create an AI factory. That’s why Rafay put together this paper exploring the architectural and operational model required to transform raw GPU infrastructure into a self-service AI platform. Click the button below to learn how leading companies operationalise AI infrastructure with governance, developer self-service, and production-ready AI workflows.

Supermicro Co-Founder Charged with Smuggling $2.5 Billion in AI Chips to China

The export control regime just got its first criminal test case. And it starts with hair dryers.

According to Reuters, the US Department of Justice has charged three people associated with Super Micro Computer, including co-founder Yih-Shyan Liaw, with conspiring to smuggle at least $2.5 billion of US AI server technology to China in violation of export laws. Per the indictment, servers assembled in the United States were shipped to facilities in Taiwan, then forwarded to Southeast Asia, where they were reboxed into unmarked packaging and sent onward to China. Prosecutors allege workers used hair dryers to remove labels and serial numbers from real servers, placing them on thousands of non-working "dummy" replicas staged for compliance inspections. More than half a billion dollars' worth of servers was allegedly diverted between April and mid-May 2025 alone. Liaw and one other defendant were arrested on Thursday; a third remains a fugitive.

Why this matters:

  • $2.5 billion in alleged diversions through a publicly traded US server company is not a backroom operation. It's industrial-scale evasion running through one of NVIDIA's largest hardware partners.

  • The scheme's mechanics matter for the infrastructure market: servers assembled in the US, shipped to Taiwan (where Super Micro has facilities), rerouted through Southeast Asia with labels removed, then forwarded to China.

  • This is exactly the transhipment risk the draft export rules from Issue #95 were designed to address: tiered licensing, site visits, and end-use verification for large GPU deployments. Super Micro's co-founder allegedly running the scheme from inside the company while attending GTC makes the compliance gap visceral. Every neocloud and server OEM will face tighter scrutiny on supply chain documentation.

Nscale Acquires American Intelligence & Power Corporation, Secures 8GW Site in West Virginia

Nscale just bought its own power grid. One week after raising $2 billion.

Nscale has signed an agreement to acquire American Intelligence & Power Corporation (AIPCorp), sponsored by Fidelis New Energy and 8090 Industries. The deal secures the Monarch Compute Campus in Mason County, West Virginia: 2,250 acres, the United States' first state-certified AI microgrid, and a power runway scalable to over 8GW. Initial capacity of 2GW is expected online by the first half of 2028, with expansion to approximately 8GW planned for 2031.

Why this matters:

  • Last week: $2 billion Series C, Europe's largest ever (Issue #96). This week: vertical integration into energy through a US acquisition. The speed from fundraise to deployment is the signal. The Aker JV rollup, the $1.4 billion GPU-backed debt facility, and now an energy acquisition. Nscale is assembling the full stack from power generation to GPU deployment, and doing it in the US market where the largest contracts sit.

  • "State-certified AI microgrid" means behind-the-meter power purpose-built for compute, similar to the xAI gas turbine strategy (Issue #96) but at potentially four times the scale. The 2028 timeline for 2GW and 2031 for 8GW needs tracking against execution.

  • The Guardian investigation we covered last week (Issue #96) questioned whether Nscale's UK commitments match physical reality. The timing of this announcement is potentially a direct response: Nscale is shifting its centre of gravity to the US. The Loughton supercomputer site in Essex may still be a scaffolding yard. The Monarch campus in West Virginia has 2,250 acres and a state-certified power grid.

Meta Signs $27 Billion AI Infrastructure Deal with Nebius, Nebius Raises $4 Billion to Fund It

Last week, NVIDIA put $2 billion into Nebius. This week, Meta showed up with $27 billion more.

Meta will pay up to $27 billion over five years for AI infrastructure from Nebius. The deal splits into two parts: $12 billion of dedicated capacity starting early 2027, built on NVIDIA Vera Rubin GPUs, and up to $15 billion in additional capacity from infrastructure Nebius is building for third-party customers. The contract sits on top of a separate $3 billion deal signed last year and the $17.4 billion Microsoft contract (Issue #63). A Meta spokesperson described it as part of "building a more resilient and flexible infrastructure."

Two days later, Nebius priced an upsized $4 billion convertible senior notes offering: $2.25 billion in 2031 notes at 1.25% and $1.75 billion in 2033 notes at 2.625%. Net proceeds of approximately $3.96 billion. The proceeds are earmarked for data centre builds, GPU procurement, and AI cloud expansion.

Why this matters:

  • In roughly 18 months, Nebius has gone from an unknown European neocloud to holding anchor contracts from Microsoft, Meta, and NVIDIA simultaneously. No other independent infrastructure operator has that customer concentration at the top.

  • The $15 billion in "additional capacity" is not dedicated Meta infrastructure. That's Meta buying capacity from pools Nebius is building for third-party customers, locking in supply before competitors can access it. The dedicated $12 billion is the floor. The $15 billion is the land grab.

  • The $4 billion convertible raise exposes the capital intensity underneath the headline numbers. Nebius generated $385 million in operating cash flow in 2025 against $4.1 billion in capex. Revenue grew 547% to an annualised run rate of $1.25 billion, with a target of $7 billion to $9 billion by end of 2026. The contracts are enormous. The cash generation is not yet close.

Basecamp Research’s Trillion Gene Atlas with Anthropic, NVIDIA, and Two Sequencing Giants

The biological data wall just got a demolition order.

Basecamp Research has launched the Trillion Gene Atlas, a project to expand known evolutionary genetic diversity 100-fold by collecting genomic data from over 100 million species across thousands of sites globally. The initiative partners Anthropic (Claude for life sciences integration), Ultima Genomics and PacBio (industrial-scale sequencing), and NVIDIA (accelerated computing via Parabricks).

Why this matters:

  • Basecamp's EDEN foundation models, released in January, train on a proprietary genomic database 10x larger than all public repositories combined. The Atlas takes that 100x further. Processing at the petabase scale, a task that would have taken 20+ years, is targeted for completion in under two.

  • The EDEN models demonstrated zero-shot therapeutic activity in primary human T-cells and a 97% hit rate in designing antimicrobial peptides against priority pathogens.

  • This is what the GPU infrastructure buildout is actually for. Not chatbots, SlopTok, or novel and exciting ways to serve ads. Real scientific breakthroughs with a material impact on the well-being of humanity as a whole. If Basecamp Research succeeds, we could be on the cusp of a Golden Age of Biology.

Inference Disaggregation: AWS + Cerebras vs. NVIDIA + Groq

Cerebras and AWS announced disaggregated inference before Jensen Huang took the GTC stage.

Last week, AWS and Cerebras announced a collaboration to deliver disaggregated inference through Amazon Bedrock: Trainium for prefill, Cerebras CS-3 for decode, connected via the Elastic Fabric Adapter. AWS will also offer open-source LLMs and Amazon Nova on Cerebras hardware later this year. This is the first time a major hyperscaler has deployed Cerebras inside its own data centres.

Days later, NVIDIA unveiled the Groq 3 LPX at GTC: a rack-scale decode accelerator built around 256 interconnected LPU chips, designed to sit alongside Vera Rubin NVL72 systems. The architecture uses "attention-FFN disaggregation": GPUs handle prefill and decode attention, LPX handles feed-forward and mixture-of-experts layers. 315 PFLOPS of FP8 compute, 40 PB/s of SRAM bandwidth. NVIDIA claims 35x higher inference throughput per megawatt versus GB200 NVL72. This is the first product from the $20 billion Groq acquisition. It ships end of 2026.

Why this matters:

  • Cerebras CEO Andrew Feldman posted the hardware comparison on LinkedIn within hours of the GTC announcement, and the raw numbers are striking. CS-3: 44GB memory, 21 PB/s bandwidth, 23 wafers to run a 2-trillion-parameter model, shipping now. Groq 3 LPU: 0.5GB memory, 150 TB/s bandwidth, 2,000 chips for the same model, shipping end of 2026.

  • That's a 90x memory gap, a 140x bandwidth gap, and a 90x difference in chip count. The numbers are vendor-supplied and should be treated as such. But even so, the architectural difference is real: wafer scale avoids the interconnect tax that comes from wiring thousands of small chips together. NVIDIA's counter is ecosystem integration: LPX plugs into Vera Rubin, Dynamo orchestration, and the full NVIDIA software stack. Cerebras's counter is physics.

  • Both NVIDIA and AWS are signalling that homogeneous GPU clusters are suboptimal for production inference. That means the looming question for neocloud operators is no longer just "which NVIDIA GPU." It's which heterogeneous architecture to deploy, and whether NVIDIA's ecosystem lock-in outweighs Cerebras's raw hardware advantage on decode. (A quick sketch of the prefill/decode split follows below.)
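If the prefill/decode distinction feels abstract, here's a minimal, vendor-neutral sketch. It's illustrative only, a toy single-head attention loop in NumPy rather than anything AWS, Cerebras, or NVIDIA actually ships: prefill is one dense pass over the prompt that produces a KV cache, decode is a loop that re-reads and grows that cache for every generated token. That asymmetry is why the two phases reward different silicon.

```python
# Illustrative only: a vendor-neutral toy of prefill/decode disaggregation in NumPy.
# Not AWS's, Cerebras's, or NVIDIA's actual implementation or API.

import numpy as np

D = 64                                   # toy hidden dimension
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))

def attend(q, K, V):
    """Single-head attention of one query vector against a KV cache."""
    scores = K @ q / np.sqrt(D)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

def prefill(prompt_embeddings):
    """Phase 1 (prefill tier): one dense pass over the whole prompt.
    Compute-bound. Returns the KV cache plus the last hidden state,
    which is everything the decode tier needs to carry on."""
    K = prompt_embeddings @ Wk
    V = prompt_embeddings @ Wv
    last_hidden = attend(prompt_embeddings[-1] @ Wq, K, V)
    return K, V, last_hidden

def decode(K, V, hidden, steps=4):
    """Phase 2 (decode tier): token-by-token generation. Each step appends
    one K/V row and re-reads the whole cache, which is why this phase is
    dominated by memory bandwidth rather than raw FLOPS."""
    outputs = []
    for _ in range(steps):
        K = np.vstack([K, hidden @ Wk])
        V = np.vstack([V, hidden @ Wv])
        hidden = attend(hidden @ Wq, K, V)
        outputs.append(hidden)
    return outputs

prompt = rng.standard_normal((16, D))    # a 16-"token" prompt
K, V, h = prefill(prompt)                # runs on silicon A
tokens = decode(K, V, h, steps=4)        # KV cache handed to silicon B
print(f"decoded {len(tokens)} tokens from a {prompt.shape[0]}-token prompt")
```

In production, the hard part is moving that KV cache between tiers (over the Elastic Fabric Adapter in the AWS and Cerebras case); the sketch hand-waves the transfer as a Python function return.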

Microsoft Ends Data Centre NDAs with Local Governments

Microsoft just made a concession no other hyperscaler has matched.

Microsoft has announced it will end the use of non-disclosure agreements with local governments for data centre projects globally. The company is working to identify and terminate all active NDAs, allowing local authorities to disclose details of Microsoft's plans publicly. The shift follows Microsoft's Community-First AI Infrastructure Plan from January. Microsoft will still seek to protect specific trade secrets required for building permits "where allowed by law," but the blanket NDA requirement is gone.

Why this matters:

  • This lands one week after the Guardian investigation (Issue #96) reported that the UK government is "not playing an active role in auditing" AI infrastructure commitments. Microsoft is responding to the same political pressure from the opposite direction: offering transparency pre-emptively rather than waiting for governments to demand it.

  • AWS, Google, and Oracle all use NDAs as standard practice. Microsoft ending them creates competitive pressure: local governments can now compare Microsoft's transparency with the opacity of its rivals.

  • This matters particularly in Europe, where data sovereignty and community consent are increasingly political. The practical impact is limited, but the signal is not. Microsoft will still protect specific technical details, but it has removed the mechanism that allowed companies to announce multi-billion-dollar investments publicly while preventing local governments from discussing the specifics. That gap between press release and physical reality is exactly what the Guardian exposed. Microsoft is closing it.

GMI Cloud Announces $12 Billion, 1GW Sovereign AI Factory in Japan

Japan just got its largest sovereign AI infrastructure commitment.

GMI Cloud has announced a $12 billion AI factory in Kagoshima, Japan, targeting 1GW of power capacity. The development is initiated by Kai Shin Digital Infrastructure, a joint venture structured by CDIB Capital and Shinetsu Science Industry, in collaboration with the Kagoshima Prefectural Government and Satsumasendai City. Construction starts late 2026 with ramp to 1GW. GMI Cloud will be among the first to deploy the NVIDIA Vera Rubin NVL72 platform. The facility is purpose-built for physical AI: robotics, autonomous vehicles, and industrial automation.

Why this matters:

  • Japan has been absent from the sovereign AI infrastructure wave we've tracked through the Gulf (Issue #75: Brookfield/Qatar, Issue #86: MBZUAI), Europe (Issue #96: Nscale), and Australia (Issue #93: Sharon AI). A $12 billion, 1GW commitment positions Kagoshima alongside the largest sovereign projects globally. Japan's robotics and automotive industries provide a demand base most sovereign initiatives lack: existing industrial workloads that need on-shore compute.

  • The "physical AI" positioning distinguishes this from most neocloud announcements. GMI Cloud is building specifically for workloads that control systems in the physical world. NVIDIA's GTC keynote emphasised physical AI as the next demand driver. Kagoshima is the first gigawatt-scale facility designed for it.

  • The public-private structure is the sovereign AI model in practice. Prefectural government involvement, domestic JV structuring, local energy partnerships, and foreign technology access through GMI Cloud and NVIDIA. Japan's status as a US ally and semiconductor manufacturing partner (TSMC's Kumamoto fab is 200km away) makes Vera Rubin deployment here politically straightforward in ways that Gulf or Southeast Asian deployments are not.

The Rundown

Three themes this week, all of them structural.

First, the inference stack split. 

AWS and NVIDIA both shipped heterogeneous architectures that separate prefill from decode across different silicon. The homogeneous GPU cluster is no longer the default for production serving. 

That changes procurement, pricing, and power planning for every operator building out capacity right now.

Second, capital concentration continues to accelerate. 

$50 billion in commitments flowing through one neocloud. 8GW secured through a single site. $12 billion for a sovereign AI factory in a country that wasn't on the map six months ago.

The scale of individual bets is outpacing the industry's ability to audit them, which is exactly what the Guardian reported, what Microsoft responded to, and what the DOJ just prosecuted.

And finally, export controls went from rulemaking to handcuffs. 

The Super Micro indictment means every GPU transaction now carries criminal liability, not just regulatory risk. Compliance teams at every neocloud and OEM worldwide are likely recalculating, because after this week, every server that ships will carry a chain of custody and a weight of scrutiny that didn't exist before.

That's the lasting change.

Not the arrests, but the paperwork and invasive oversight that follow them.

See you next week.

Everything Else
