Neoclouds are an arbitrage.
The hyperscalers priced bare metal high to protect their margins. The neoclouds filled the gap, renting out the same compute for less, and in return handed the giants more chips than Nvidia would sell them directly. A symbiosis of sorts.
That holds right up until one of those tenants decides to turn the tables and play landlord.
I'm Ben Baldieri. Every week, I break down what's moving in GPU compute, AI infrastructure, and the data centres that power it all.
Let's get into it.
Today's issue is brought to you by Rafay
The GPU runs on sponsorships. If you value independent, no-fluff analysis of the AI infrastructure market, the sponsors are who make it possible. I'd appreciate you checking them out.
The World's Leading Neoclouds Run on Rafay
Rafay helps neoclouds, sovereign AI clouds, and enterprises turn GPU infrastructure into self-service, governed AI cloud services, from Token Factory and inferencing to Kubernetes, SLURM, bare metal, and VMs.
Meta Plans a Cloud Business to Sell Its Excess Compute
One of the largest buyers of AI compute on earth just signalled it might have bought too much.
Bloomberg reported on Wednesday that Meta is building a cloud business to sell spare AI capacity through an internal unit called Meta Compute, offering both a model-access service like AWS Bedrock (hosting Muse Spark) and raw capacity sold the way CoreWeave sells it. Meta already rents capacity from the neoclouds. Now it plans to sell some itself.
Why this matters:
The plan leaked right after a quarter of investor nerves about a 2026 capex bill of $125bn to $145bn, nearly double last year's. A leak at that moment does the work an earnings-call reassurance would, without Meta committing to anything.
This is the xAI playbook from The GPU #112: a fleet this size has to earn its keep, and Muse Spark, the frontier model Meta shipped in April, cannot fill it. A company renting out capacity its own model cannot absorb is telling you which business it would rather be in.
The deeper threat is the cost of capital. Neoclouds borrow dearer than the giants, so their floor sits higher. They can undercut what the giants charge. They cannot undercut what the giants pay. Watch whether Meta ever prices the capacity, or whether the plan fades once the capex pressure eases.
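The cost-of-capital point can be made concrete with a rough sketch. Every input below (capex per GPU, borrowing rates, utilisation, opex) is a hypothetical placeholder rather than a figure from any company, and the model assumes the hardware is fully debt-financed and repaid in level annual payments. It only shows the direction of the effect: the same chip carries a higher break-even price when the money behind it costs more.

```python
HOURS_PER_YEAR = 8760


def annual_payment(principal: float, rate: float, years: int) -> float:
    """Level annual payment that repays `principal` over `years` at `rate`."""
    if rate == 0:
        return principal / years
    return principal * rate / (1 - (1 + rate) ** -years)


def price_floor_per_gpu_hour(
    capex_per_gpu: float,
    borrowing_rate: float,
    years: int,
    utilisation: float,
    opex_per_hour: float,
) -> float:
    """Break-even price per billable GPU-hour under the assumptions above."""
    billable_hours = HOURS_PER_YEAR * utilisation
    financing_per_hour = annual_payment(capex_per_gpu, borrowing_rate, years) / billable_hours
    return financing_per_hour + opex_per_hour


# Hypothetical inputs, identical for both operators except the borrowing rate.
common = dict(capex_per_gpu=30_000, years=5, utilisation=0.8, opex_per_hour=0.50)
giant_floor = price_floor_per_gpu_hour(borrowing_rate=0.05, **common)
neocloud_floor = price_floor_per_gpu_hour(borrowing_rate=0.12, **common)

print(f"giant floor:    ${giant_floor:.2f} per GPU-hour")
print(f"neocloud floor: ${neocloud_floor:.2f} per GPU-hour")
```

The gap between the two floors is the room a giant has to price below a neocloud's break-even, should it ever choose to.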
Together AI Raises $800m Betting Open Source Is the Neocloud Moat
Together AI just raised $800m at $8.3bn on the pitch that open weights are the neocloud moat.
Together AI closed an $800m Series C at $8.3bn post-money, led by Aramco Ventures with Vista Equity, General Catalyst, Emergence Capital and Nvidia participating. It holds a mark while several private ML infrastructure names have been written down this year. Together AI runs open-weight models on its own clusters, sells API access and fine-tuning against them, and pitches itself as the alternative to big-three lock-in. The money goes to capacity, training runs, and an inference platform build.
Why this matters:
$8.3bn post-money against February 2025's $3.3bn, a 2.5x mark in eighteen months.
Aramco is back. Prosperity7 (Aramco's earlier venture arm) co-led the Series B, and Aramco Ventures leads the Series C. Nvidia in the round is the recurring pattern where the chip supplier holds equity in the buyer of its chips.
The pitch is that the moat sits in the software layer, not the metal. Open-weight serving, fine-tuning, and inference optimisation are what a hyperscaler will not build for its rivals' models and cannot commoditise as easily as raw capacity.
Firmus Signs NVIDIA for 170,000 GPUs, Using Sharon AI's Exact Template
The story is not 170,000 GPUs but that Nvidia has run this exact contract before.
Firmus, last seen raising $505m from Coatue with Nvidia participating (The GPU #101), announced a compute partnership with Nvidia through 2034: a 360MW DSX campus in Batam, Indonesia, up to 170,000 accelerators across Grace-Blackwell, Vera-Rubin and Vera, built with Singapore's DayOne. Firmus projects $25bn to $30bn in offtake revenue over six years, a company estimate rather than a booked figure. Nvidia takes three positions in one deal: it sells the chips, takes a cut of Firmus's cloud revenue, and backstops the offtake with credit support.
Why this matters:
The precedent is seventeen days old. Sharon AI, now Nasdaq-listed as SHAZ, filed an 8-K on a six-year Nvidia deal worth up to $4.88bn, same DSX design, near-identical language (The GPU #104).
Same template twice in a month is a channel programme, not a one-off.
DeepSeek Open-Sources DSpark and Makes the Same GPUs Give More
The most consequential AI release of the week was not a model but a way to need fewer chips.
On 27 June, DeepSeek and Peking University released DSpark, a speculative-decoding framework, alongside DeepSpec, the toolkit for training the draft models it runs on, both MIT-licensed on GitHub. On DeepSeek's own V4-Flash it reports 60 to 85% faster per-user generation against its prior baseline, and 57 to 78% on V4-Pro, with no retraining, no weight changes and no new hardware.
Why this matters:
The durable edge in 2026 is how you serve a model, not which one you pick. An MIT-licensed gain anyone can copy compresses the moat around raw capacity across the whole industry.
The efficiency wave has been running all year, from KV-cache tricks through to quantisation, and now it has a name and a repo. Each release shaves GPU-seconds per request, and it compounds across every hosted call.
If a lab under export pressure hands the world a 1.6x-plus speedup for free, the demand curve for frontier silicon gets harder to read.
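For readers who have not met speculative decoding, here is a toy sketch of the general technique in its greedy form. It is not DSpark's implementation, whose internals are not described here. The `target` and `draft` functions are made-up stand-ins for a large model and a small draft model, and the sketch omits the bonus token real systems take when every draft is accepted. A cheap draft proposes a few tokens, the expensive model checks them in one pass, and the output is identical to what the expensive model would have produced alone.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target-model call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target(out))
    return out[len(prompt):]


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy speculative decoding. Returns (new tokens, target passes)."""
    out = list(prompt)
    goal = len(prompt) + n_new
    passes = 0
    while len(out) < goal:
        # 1. The draft model proposes up to k tokens, one after another.
        ctx = list(out)
        proposal = []
        for _ in range(min(k, goal - len(out))):
            token = draft(ctx)
            proposal.append(token)
            ctx.append(token)
        # 2. The target checks the proposal. A real system scores all
        #    positions in one batched forward pass, so count it once.
        passes += 1
        ctx = list(out)
        for token in proposal:
            expected = target(ctx)
            if expected != token:
                out.append(expected)  # keep the target's token, drop the rest
                break
            out.append(token)
            ctx.append(token)
    return out[len(prompt):], passes


# Hypothetical toy models: the draft agrees with the target most of the time.
def target(ctx: List[int]) -> int:
    return (ctx[-1] * 3 + 1) % 11


def draft(ctx: List[int]) -> int:
    guess = target(ctx)
    return guess if len(ctx) % 5 != 0 else (guess + 1) % 11


baseline = greedy_decode(target, [1], 20)
tokens, passes = speculative_decode(target, draft, [1], 20)
print(tokens == baseline, passes)
```

The output never changes, only the number of expensive passes does. That is why a gain of this kind needs no retraining of the served model and no change to its weights.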
Etched Exits Stealth With Working Silicon and a Quietly Wider Pitch
Etched has working silicon and a different pitch than the one it spent two years selling.
Etched left stealth with first-pass (A0) silicon on TSMC's N4P, a rack-scale inference system in customer validation, a claimed $800m raised, and more than $1bn in signed contracts at a $5bn valuation. The December round was $500m at that $5bn, with TSMC's venture arm and Thiel, Karpathy and Hinton on the cap table. First-pass working silicon is rare, and most designs need a respin, so the chip is a genuine signal. The pitch has moved: the original Sohu thesis was a transformer-only ASIC, but the release now talks about prefill and decode, models of all shapes, and lists Mamba, which is not a transformer.
Why this matters:
Sohu was Etched's transformer-only ASIC, sold since 2024 on the bet that transformer architecture would sit still.
The release now lists Mamba, a non-transformer architecture, and talks about prefill and decode across models of all shapes.
Inference silicon is splitting on how much of the model you burn into the chip. Etched hard-codes the transformer architecture and lets weights load in. Taalas, a rival on more than $200m of funding, bakes both the architecture and the weights, one chip per model.
Google Rations Meta's Gemini Access as Its Own Compute Runs Short
Meta's biggest bottleneck last quarter was not chips but a rival's model it could not get enough of.
The Financial Times reported on 28 June that Google has capped Meta's use of Gemini after Meta asked for more than Google could supply. Meta had leaned on Gemini for coding, ads, customer service and moderation because Gemini outperformed its own Llama on those tasks, per FT sources. The cap was set around March, delaying internal Meta projects and pushing Meta staff to conserve tokens. Google itself is short: it is paying a reported $920m a month to SpaceX for around 110,000 Nvidia GPUs in xAI's data centres. Apple's reported $1bn a year for the same access kept flowing uncapped.
Why this matters:
Capacity is spoken for before it is poured, and the largest buyers are getting turned away.
That is the opposite of a glut, and the tell that supply, not demand, is the binding constraint in frontier inference.
The SpaceX bridge is the same Colossus capacity from The GPU #112, leased out from xAI first to Anthropic at a reported $1.25bn a month (The GPU #107), and now to Google at a reported $920m. Emergency compute is a monthly line item for the richest firms alive.
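A quick back-of-envelope on the reported figures, as a sketch only. It assumes the whole monthly fee is attributable to the roughly 110,000 GPUs, that they run every hour of the month, and that a month is 730 hours. None of those assumptions comes from the reporting, so treat the result as an implied upper-level rate, not a contract price.

```python
HOURS_PER_MONTH = 730  # assumption: 8,760 hours a year divided by 12


def annualised(monthly_usd: float) -> float:
    """Twelve months of a flat monthly fee."""
    return monthly_usd * 12


def implied_rate_per_gpu_hour(
    monthly_usd: float, gpus: int, hours_per_month: float = HOURS_PER_MONTH
) -> float:
    """Monthly fee spread evenly over every GPU and every hour."""
    return monthly_usd / gpus / hours_per_month


google_monthly = 920e6     # reported monthly payment
google_gpus = 110_000      # reported approximate GPU count
anthropic_monthly = 1.25e9 # reported monthly payment, GPU count not given

print(f"Google, annualised:    ${annualised(google_monthly) / 1e9:.2f}bn")
print(f"Anthropic, annualised: ${annualised(anthropic_monthly) / 1e9:.2f}bn")
print(f"Google, implied rate:  ${implied_rate_per_gpu_hour(google_monthly, google_gpus):.2f} per GPU-hour")
```

On those assumptions the Google lease works out at a little over $11 per GPU-hour.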
Aligned Turns $1.19bn of Data Centre Rent Into Bonds
Aligned's $1.19bn ABS matters less than the structure sitting underneath it.
Aligned Data Centers is raising about $1.19bn in asset-backed securities against four US data centres in Ashburn, Plano and Northlake, per S&P Global's pre-sale report. It is a master trust, series 2026-1, and the proceeds repay every outstanding note before it. Aligned is shrinking the pool: three of the previous seven data centres come out at close, taking $1.09bn of appraised value with them, while the four that stay rose $887m to a combined $3.16bn. Morgan Stanley is sole structuring advisor, the deal closes on 27 July, and it carries a cash trap and a class A amortisation trigger, set at debt-service cover of 1.35x and 1.20x respectively. Aligned itself is being bought by the AI Infrastructure Partnership consortium (BlackRock's GIP, MGX, Microsoft, Nvidia) in a roughly $40bn deal, the largest ever for a data centre operator. The same complex funding the buildout owns the landlord and packages the rent.
Why this matters:
Data centre cash flows are now bond collateral, tranched and sold, which front-loads the downside: if utilisation softens, the cash trap fires and equity takes the hit first.
The buyer is what makes this rare. The same consortium underwrites the $40bn buyout and then packages the tenant rent behind it, funding the buildout and skimming its rent on both sides of one balance sheet.
Vertical integration is now also happening to the financial stack, not just the compute stack.
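The trigger mechanics can be sketched in a few lines. This is a generic illustration of how debt-service-cover tests usually work in a securitisation, not the terms of the Aligned deal: it assumes the 1.35x level is the cash trap and the 1.20x level is the amortisation trigger, that each fires when cover falls below its threshold, and that cover is simply net cash flow divided by debt service. The function names and state labels are made up for the example.

```python
CASH_TRAP_DSCR = 1.35     # assumed: below this, excess cash is held back from equity
AMORTISATION_DSCR = 1.20  # assumed: below this, cash is diverted to repay class A


def dscr(net_cash_flow: float, debt_service: float) -> float:
    """Debt-service coverage ratio for one period."""
    if debt_service <= 0:
        raise ValueError("debt_service must be positive")
    return net_cash_flow / debt_service


def trigger_state(net_cash_flow: float, debt_service: float) -> str:
    """Return which trigger, if any, applies at this level of cover."""
    cover = dscr(net_cash_flow, debt_service)
    if cover < AMORTISATION_DSCR:
        return "amortisation"
    if cover < CASH_TRAP_DSCR:
        return "cash_trap"
    return "normal"


# Hypothetical periods: rent falls while debt service stays at 100.
for rent in (150, 130, 110):
    print(rent, trigger_state(rent, 100))
```

Read top to bottom, the states show who absorbs a softening rent roll: equity distributions stop first, and only deeper shortfalls change how the senior notes are repaid.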