The GPU’s grip on AI infrastructure is slipping.
The largest US tech IPO since Snowflake went to a chip company that doesn’t make GPUs. OpenAI spent $4 billion to put engineers inside the customer to drive outcomes and demand for that company’s chips. And a British chip startup raised $220 million two years before shipping silicon.
If this continues, I’m going to have to rebrand.
I’m Ben Baldieri. Every week I break down what’s moving in GPU compute, AI infrastructure, and the data centres that power it all.
Here’s what’s inside this week:

Cerebras runs the largest US tech IPO since Snowflake
OpenAI spends $4 billion to put McKinsey on its payroll
Nebius buys Clarifai’s brain eleven days after buying Eigen AI’s
Self-improving agents become the top inference consumer on OpenRouter
Fractile raises $220M two years before shipping a chip
AWS hands Anthropic operational control to keep the AWS bill
Fermi America loses $189M with 35 employees, no tenant, and no CEO
Let’s get into it.
Cerebras Runs the Largest US Tech IPO Since Snowflake
The company that has spent ten years arguing NVIDIA isn’t enough for inference just got the public market to agree.
Cerebras priced its IPO on Wednesday at $185 per share, selling 30 million Class A shares to raise $5.55 billion. The book was oversubscribed roughly 20 times. It is the largest US tech IPO since Snowflake’s 2020 debut. Cerebras packages a full silicon wafer as a single chip and has pivoted commercially from training to inference. OpenAI is the anchor: a binding Master Relationship Agreement for 750MW of inference capacity through 2028, expandable to 2GW by 2030, valued at $20 billion+ at full expansion.
Why this matters:
Artificial Analysis benchmarks 22 providers serving gpt-oss-120B (high). Cerebras runs at 2,230 tokens per second. Fireworks 836. SambaNova 704. Cerebras also holds the lowest time-to-first-token on the board at 1.37 seconds against SambaNova’s 4.61. The speed advantage is documented. The institutional bet is that it survives the move to frontier-model production load.
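A quick sanity check on the size of that gap, using only the Artificial Analysis figures quoted above:

```python
# Tokens-per-second figures for gpt-oss-120B (high), as cited above.
tps = {"Cerebras": 2230, "Fireworks": 836, "SambaNova": 704}

# Cerebras's lead over the next-fastest provider on the board.
lead = tps["Cerebras"] / tps["Fireworks"]
print(f"{lead:.1f}x")  # ≈2.7x faster than the runner-up
```

Even against the fastest non-Cerebras provider, the gap is not incremental. It is the kind of margin the public book was pricing.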
Per Cerebras’s S-1 filing, MBZUAI accounted for 62% of 2025 revenue and G42 for 24%, totalling 86% from two Abu Dhabi-linked entities. The OpenAI contract diversifies the book without dissolving the concentration. A single sovereign customer pull-back is the entire bear case.
Three non-GPU inference plays have priced through three capital channels in twelve months: Cerebras via IPO, Fractile via venture (see below), Groq absorbed into NVIDIA at Christmas. Public-market capital just signalled inference compute is a separable market from training, and the second source for it does not have to be a GPU.
OpenAI Just Spent $4 Billion to Put McKinsey on Its Payroll
The model isn’t selling itself, so OpenAI is buying the people who sell models for a living.
OpenAI launched the Deployment Company on Monday: majority-owned by OpenAI, $4 billion+ from 19 partners, structured as a $10 billion pre-money JV with a 17.5% annual return guaranteed to PE backers over five years. TPG leads. Advent, Bain Capital, and Brookfield co-lead. Goldman Sachs, SoftBank, and Warburg Pincus are founding partners. Bain & Company, Capgemini, and McKinsey are the consulting partners. The vehicle launches with the acquisition of Tomoro, a ~150-engineer outfit. The move is a follow-up: Anthropic launched a parallel partnership a week earlier with Blackstone, Hellman & Friedman, and Goldman Sachs as founding partners, plus General Atlantic, Apollo, Sequoia, GIC, and Leonard Green in the backing consortium.
Why this matters:
Bain, McKinsey, and Capgemini sell AI transformation engagements for a living. They have just signed up to sell OpenAI’s stack specifically. The conflict that exists when your "neutral consultant" is also a co-investor in the vendor’s distribution arm is the entire point of the structure.
A 17.5% guaranteed annual return over five years is not a venture term. It is a credit instrument with equity features. OpenAI has converted a slice of its growth optionality into something a private credit desk can underwrite.
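For scale, here is what that guarantee compounds to. The $4 billion principal is from the announcement; treating the 17.5% as annually compounding over the full five years is an assumption about terms that have not been disclosed:

```python
# Sketch: what a 17.5% guaranteed annual return turns $4B into over
# five years. Annual compounding is assumed; actual terms may differ.
principal = 4e9   # $4B+ committed by the 19 partners
rate = 0.175      # guaranteed annual return
years = 5

payout = principal * (1 + rate) ** years
print(f"${payout / 1e9:.2f}B")  # roughly $8.96B at maturity
```

Under those assumptions, OpenAI has promised to roughly $9 billion out of a $4 billion raise. That payoff profile is underwritable by a credit desk precisely because it does not depend on venture-style outcomes.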
TPG, Advent, Bain Capital, Brookfield, and Warburg Pincus collectively own thousands of portfolio companies. Each is now a Deployment Company prospect. Accenture took years to build a $3 billion AI practice. OpenAI committed more than that on day one.
The model is no longer the product. The integration is. Inference demand will increasingly flow through enterprise integration cycles, not consumer hype cycles. Harder to capture as a model vendor without engineers inside the customer.
Nebius Buys Clarifai’s Brain Eleven Days After Buying Eigen AI’s
Two acquihires in eleven days. Nebius is done just renting GPUs.
Nebius announced on Tuesday that Clarifai founder Matthew Zeiler joins as SVP Research, bringing his engineering team and a perpetual licence to Clarifai’s inference and compute orchestration patents. It follows the $643 million Eigen AI acquisition two weeks ago. Eigen optimises at the model layer. Clarifai at the system layer. In Wednesday’s Q1 results, Nebius confirmed it has secured up to 1.2GW of power and land for a new owned AI factory in Pennsylvania, a day after breaking ground on a 1.2GW campus in Independence, Missouri.
Why this matters:
Inference throughput gains come from software optimisation on top of the silicon, not from the silicon itself. Eigen AI optimises at the model layer. Clarifai optimises at the system layer. Nebius has now bought both. Token Factory is Nebius productising that optimisation as workload margin on top of the GPU rental box. Operators selling raw GPU hours compete on price for a workload that gets won on throughput per dollar.
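The throughput-per-dollar point can be made concrete. The t/s figures echo the benchmark range discussed elsewhere in this issue; the $3/hour rental price is a hypothetical placeholder, not a quoted rate:

```python
# Cost per million output tokens = hourly price / tokens served per hour.
# Hourly price is hypothetical; t/s figures mirror the GPU-cluster range
# cited in this issue.
def cost_per_million_tokens(price_per_hour: float, tokens_per_sec: float) -> float:
    tokens_per_hour = tokens_per_sec * 3600
    return price_per_hour / tokens_per_hour * 1_000_000

# Same hypothetical $3/hour box, different software stacks on top:
baseline = cost_per_million_tokens(3.0, 270)   # bottom of the GPU cluster
optimised = cost_per_million_tokens(3.0, 560)  # top of the GPU cluster

print(f"${baseline:.2f} vs ${optimised:.2f} per million tokens")
```

Two operators paying the same hourly rate for the same silicon land more than 2x apart on price per token. That spread is the workload margin Token Factory is built to capture.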
NVIDIA owns equity in Nebius and just watched its portfolio company buy two inference-software stacks in eleven days. Software that wrings more tokens out of each GPU drives utilisation up, which drives more GPU purchases. Vertical integration with extra steps.
1.2GW broken ground in Missouri. 1.2GW more secured in Pennsylvania. Capacity at this scale does not get permitted, financed, and announced on speculation. Nebius is sizing for the inference workload it expects to win, not the GPU rental contracts it already has.
Self-Improving Agents Just Became the Top Inference Consumer on OpenRouter
224 billion tokens a day. Each task fires reflection, skill creation, and sub-agent calls. Chat is no longer the unit.
Hermes Agent from Nous Research overtook OpenClaw on May 10 to claim the #1 daily-token position on OpenRouter’s app and agent rankings, processing 224 billion tokens against OpenClaw’s 186 billion. Hermes crossed 140,000 GitHub stars in the three months since its February launch. It runs persistently rather than per-prompt, refines its own skills after each task, and delegates sub-agents for multi-step work. Every completed task fires multiple inference calls: the primary work, the reflection pass that grades it, the skill-file creation that captures the pattern, and any sub-agent delegations spawned along the way. This is the first time the top OpenRouter slot has gone to anything other than OpenClaw since OpenClaw’s late-2025 rise.
Why this matters:
The unit of inference demand is shifting from chat turn to agent task. Chat fires one inference call per user message. A self-improving agent fires reflection, skill creation, and sub-agent calls on top of the primary task. Hermes hitting 224 billion daily tokens on OpenRouter alone is the public-facing evidence that the multiplier is live in production.
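The multiplier can be sketched directly. The call structure (primary work, reflection, skill capture, sub-agents) is the loop described above; the sub-agent count is a made-up round number for illustration:

```python
# One chat turn vs one self-improving agent task, counted in inference calls.
# Call types mirror the Hermes loop described above; the sub-agent count
# is hypothetical, not a measured value.
chat_turn = {"response": 1}

agent_task = {
    "primary_work": 1,
    "reflection_pass": 1,      # grades the primary output
    "skill_file_creation": 1,  # captures the reusable pattern
    "sub_agent_calls": 3,      # delegated multi-step work
}

calls_per_chat_turn = sum(chat_turn.values())
calls_per_agent_task = sum(agent_task.values())
print(calls_per_agent_task // calls_per_chat_turn)  # 6 calls per task vs 1 per turn
```

Even with conservative assumptions, the fan-out per unit of user intent is several times the chat baseline, and each of those calls is billable inference.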
OpenRouter is the visible slice. Claude Code, Cursor, ChatGPT’s agentic modes, and every enterprise agent stack that does not route through public model marketplaces are invisible on this leaderboard. The total agent-driven inference workload across the market is a multiple of what the public rankings show.
The demand curve compounds. Self-improving agents take on more ambitious tasks as they accumulate skills, and more ambitious tasks generate more tokens per task. Throughput becomes the binding constraint on what users attempt next, which becomes the demand signal pricing every inference contract on the market.
Every other story in this issue prices against an inference workload class that did not exist eighteen months ago. Hermes is the public-facing proof that the workload class exists, is growing fast, and is not the hyperscalers’ captive market.
Fractile Raises $220M Two Years Before Shipping a Chip
The venture market just pre-paid for non-GPU silicon that doesn’t ship until 2027.
Fractile raised $220 million in a Series B led by Accel, Factorial Funds, and Founders Fund, with participation from Conviction, Gigascale, O1A, Felicis, Buckley Ventures, 8VC, and existing backers. Former Intel CEO Pat Gelsinger participated personally, having previously backed the company as an angel in its 2024 seed round. Founded in 2022 by Walter Goodwin, Fractile builds inference-only silicon on an in-memory-compute architecture optimised for token throughput. First commercial silicon is not expected until 2027. The Information reported earlier this month that Anthropic was in discussions to buy the chips.
Why this matters:
Capital is pre-paying for non-GPU silicon two years before it ships. The venture market is acting as though the inference architecture question is settled. The risk is not that the chip is bad. The risk is that someone else’s chip ships first and locks the workload before Fractile gets to market.
Pat Gelsinger ran Intel for four years and walked out with full visibility into where every wafer in the industry is sitting. He has now backed Fractile at the seed and the Series B. That is the most personal endorsement available in inference silicon right now, and it is not a passive cheque.
Fractile’s headline claim is 1,200 tokens per second against ~40 t/s on current GPUs. The chip doesn’t ship until 2027, so there is nothing to benchmark yet. A 30x throughput claim, if it holds, means a workload that takes a month on current GPUs takes a day. The capital is acting as though the if is settled.
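The arithmetic behind that sentence, taking the claimed figure at face value (nothing ships until 2027 to verify it):

```python
# Fractile's claimed throughput vs the ~40 t/s GPU baseline cited above.
claimed_tps = 1200
gpu_baseline_tps = 40

speedup = claimed_tps / gpu_baseline_tps
print(speedup)  # 30.0

# A 30-day workload at GPU speed collapses to a single day, if the claim holds.
month_in_days = 30
print(month_in_days / speedup)  # 1.0
```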
AWS Hands Anthropic Operational Control to Keep the AWS Bill
AWS just admitted operating Claude isn’t the moat. Billing it is.
Claude Platform on AWS went GA on Monday. Anthropic operates the inference on its own infrastructure and its own capacity pool, with inference routable outside AWS to Anthropic’s primary cloud and inference_geo pinning geography per request. AWS provides SigV4 authentication, IAM access control, and Marketplace billing. Anthropic is the data processor for inference inputs and outputs; AWS processes billing and identity metadata. Anthropic also manages rate limits, tier advancement, customer support, and pricing. The full Anthropic platform ships at first-party-API parity: Messages API, Agent Skills, code execution, beta headers, batch, prompt caching, Files API, Claude Console. New models launch simultaneously with the first-party Anthropic API.
Why this matters:
AWS now sells three Claude tiers. Bedrock legacy, where AWS operates everything through its own Converse/InvokeModel API. Claude in Amazon Bedrock, where AWS still operates but with the Anthropic Messages API surface. Claude Platform on AWS, where Anthropic operates and AWS only bills. Each rung up gives the customer more Anthropic and less AWS. FedRAMP, IL4, IL5, and HIPAA-ready compliance only exist on the bottom two rungs.
Same-day feature shipping is a privilege AWS extends to exactly one model vendor. Meta, Mistral, Cohere, AI21, Stability: every other model on Bedrock stays on the previous release cadence. AWS just signalled which model vendor matters most, in writing, inside its own marketplace.
Anthropic manages rate limits, tier advancement, pricing, and customer support on the top rung. AWS keeps authentication, IAM, Marketplace billing, the compute commit drawdown, and the data centre colocation. AWS got the parts a hyperscaler monetises at scale. Anthropic got every part that touches the customer’s actual workload.
Azure and GCP now have a template. Third-party model vendors on hyperscaler marketplaces can ask for the same terms. The hyperscaler as inference operator narrows into a regulated-compliance niche. The hyperscaler as billing and identity layer is the new surface area.
Fermi America Lost $189M This Quarter With 35 Employees, No Tenant, and No CEO
17 gigawatts of design capacity. Zero contracted revenue. 35 employees.
Fermi reported Q1 2026 results on Thursday: $189 million net loss (roughly 70% non-cash, driven by $134 million in share-based comp and a $25 million Macquarie debt extinguishment); $243 million cash; $785 million in equipment financing secured in the quarter, anchored by a $500 million MUFG facility. 35 employees. No revenue. No contracted tenant. No permanent CEO since the board terminated Toby Neugebauer for cause on April 30, having removed him as CEO on April 17. Neugebauer has filed suit alleging wrongful termination. Project Matador in Carson County, Texas spans 7,500 acres designed for 17GW with a 6GW clean air permit, the second-largest in the United States.
Why this matters:
Multi-gigawatt sites get built when a hyperscaler signs the offtake. The hyperscaler pipeline this quarter is going to named geographies with named partners. None of those geographies is West Texas. None of those partners is Fermi.
The headline loss is mostly non-cash. The cash burn is still material. The $785 million in Q1 equipment financing buys time, not customers. Heidrick & Struggles has the CEO search mandate. New CEOs do not typically run toward an active litigation file.
17GW of design capacity without a tenant is not the start of an asset story. It is the start of a balance sheet story. Power-first speculation without an anchor tenant is over as a financing pitch. Buyers want signed offtake before they sign capital commitments.
The Rundown
Artificial Analysis posts benchmarks on gpt-oss-120B (high) performance.
Most of the GPU-based field clusters between roughly 270 and 560 t/s. Eigen AI runs at 532 tokens per second. Clarifai at 476. Nebius’s own "Fast" endpoint at 560. Groq was a challenger, but they’re now aboard the mothership. SambaNova is also competitive at 704, but they’ve had other issues. Fireworks AI tops the GPU field at 836 t/s. Cerebras runs the same model at 2,230 t/s. Software optimisation on top of GPUs has a ceiling.
The ceiling is the GPU.
That’s why this week the market stopped pretending NVIDIA’s monopoly was indivisible.
The largest US tech IPO in five years went to a non-GPU silicon company. A British inference startup raised $220 million for non-GPU silicon that doesn’t ship until 2027. OpenAI spent $4 billion to put engineers inside the customer, because the model is no longer the product, while signing on for more Cerebras capacity to satisfy the demand they are creating. Anthropic put its entire platform one IAM policy away from every AWS shop on earth. Nebius closed two acquihires in eleven days to graduate from GPU rental to inference cloud. And a self-improving open-source agent just became the top inference consumer on OpenRouter.
Each of these moves is either driven by or seeks to drive massive and accelerating inference demand.
Which chip that demand lands on is now up for grabs.
See you next week.
Everything Else
Boost Run (BRUN) begins trading on Nasdaq via de-SPAC merger with Willow Lane, $940M of long-term contracted customer revenue, FCF positive on day one
xAI deploys 19 portable gas turbines at Colossus 2 in Southaven, Mississippi, bringing the site to 46 turbines and 500MW+ amid an active NAACP/SELC lawsuit
Cisco reports record $15.8B Q3 revenue, raises FY26 AI infrastructure order forecast from $5B to $9B, announces restructuring plan with up to $1B in pretax charges
Meta and DESRI sign 850MW of solar + storage PPAs across Oklahoma (500MW), Texas (200MW), and Mississippi (150MW)
Google signs a 15-year 500MW solar PPA with Linea Energy for the Duffy Solar Project in Matagorda County, Texas
A $1.7B data centre in Lebanon County, Pennsylvania is withdrawn by Inch & Co. after community opposition in South Annville Township
Nscale secures $790 million in financing to support AI infrastructure buildout in Norway

