Issue #27: Jensen’s GTC Keynote Part 2 - The Rest of the Story
Because a 2h keynote is a 2-parter

GTC doesn’t stop, and neither do we.
After a week of jetlag, back-to-back meetings, and way too much caffeine, I’m still digging through everything coming out of San Jose for Panchaea.
Jensen’s keynote was so packed I had to split it in two.
Now, here’s the second half - lightly cleaned up, straight from the source.
More Blackwell and Rubin (and beyond). More AI factories. More of NVIDIA’s vision for the next wave of AI infrastructure.
Now, let’s get into it.
Table of Contents

- Nvidia Dynamo
- Tokens/s/MW
- Don’t Buy Hopper
- Omniverse DCs
- Vera Rubin
- Rubin Ultra
- Cisco and Spectrum-X
- Silicon Photonics
- DGX-Spark
- AI in Storage
- Enterprise AI
- Robots
- Physical AI with Blue
Nvidia Dynamo
Well, this dynamic operation is really complicated. So I've just now described pipeline parallel, tensor parallel, expert parallel, in-flight batching, disaggregated inferencing, workload management. And then I've got to take this thing called a KV cache, I've got to route it to the right GPU, I've got to manage it through all the memory hierarchies. That piece of software is insanely complicated. And so today we're announcing NVIDIA Dynamo. NVIDIA Dynamo does all that. It is essentially the operating system of an AI factory.
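To make the scheduling problem concrete, here is a toy sketch of disaggregated serving with KV-cache-aware routing, the kind of decision Dynamo is described as making. This is not the real Dynamo API; every class and function name below is hypothetical, and the routing policy is deliberately simplistic.

```python
# A toy sketch of the scheduling problem described above -- NOT the real
# Dynamo API. Hypothetical names throughout; it only illustrates the idea of
# disaggregated prefill/decode and KV-cache-aware routing.
from dataclasses import dataclass, field

@dataclass
class Worker:
    name: str
    role: str                                   # "prefill" or "decode"
    load: int = 0                               # outstanding tokens in flight
    kv_cache: set = field(default_factory=set)  # request ids whose KV lives here

class ToyDynamo:
    def __init__(self, workers):
        self.workers = workers

    def _least_loaded(self, role):
        return min((w for w in self.workers if w.role == role), key=lambda w: w.load)

    def submit(self, request_id, prompt_tokens):
        # 1) Prefill runs on the prefill pool (compute-bound).
        prefill = self._least_loaded("prefill")
        prefill.load += prompt_tokens
        prefill.kv_cache.add(request_id)
        # 2) The KV cache is then handed to a decode worker (bandwidth-bound).
        decode = self._least_loaded("decode")
        decode.kv_cache.add(request_id)
        decode.load += 1
        return prefill.name, decode.name

workers = [Worker("gpu0", "prefill"), Worker("gpu1", "prefill"),
           Worker("gpu2", "decode"), Worker("gpu3", "decode")]
router = ToyDynamo(workers)
print(router.submit("req-42", prompt_tokens=1000))   # e.g. ('gpu0', 'gpu2')
```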
Whereas in the past, in the way that we ran data centers, our operating system would be something like VMware. And we would orchestrate, and we still do, you know, we're a big user, we would orchestrate a whole bunch of different enterprise applications running on top of our enterprise IT. But in the future, the application is not enterprise IT, it's agents. And the operating system is not something like VMware, it's something like Dynamo. And this operating system is running on top of not a data center, but on top of an AI factory.
Now, we call it Dynamo for a good reason. As you know, the dynamo was the first instrument that started the last industrial revolution, the industrial revolution of energy. Water comes in, electricity comes out. It's pretty fantastic. Water comes in, you light it on fire, turn it into steam, and what comes out is this invisible thing that's incredibly valuable. It took another 80 years to get to alternating current, but the dynamo is where it all started. So we decided to call this operating system, this piece of software, this insanely complicated software, the NVIDIA Dynamo.
It's open source, open source. And we're so happy that so many of our partners are working with us on it. And one of my favorite framework partners, I just love him so much, because of the revolutionary work that they do, and also because Aravind is such a great guy. Perplexity is a great partner of ours in working through this. So anyhow, I really, really appreciate it.
Tokens/s/MW
OK, so now we're going to have to wait until we scale up all this infrastructure, but in the meantime, we've done a whole bunch of very in-depth simulation. We have supercomputers doing simulation of our supercomputers, which makes sense. And I'm now going to show you the benefit of everything that I just said. And remember the factory diagram. On the y-axis, excuse me, the y-axis is tokens per second throughput of the factory, and the x-axis is tokens per second of the user experience. You want super smart AIs, and you want to produce a whole bunch of them. This is Hopper.
OK, so this is Hopper. And it can produce, for each user, about 100 tokens per second. This is eight GPUs, and it's connected with InfiniBand. And I'm normalizing it to tokens per second per megawatt. So it's a 1 megawatt data center, which is not a very large AI factory, but anyhow, 1 megawatt. And so it can produce, for each user, 100 tokens per second. And at that level, whatever that happens to be, it can produce 100,000 tokens per second for that 1 megawatt data center. Or it can produce about 2 and a half million tokens per second.
2.5 million tokens per second for that AI factory if it was super batched up and the customers are willing to wait a very long time. OK? Does that make sense? All right. Because, you know, every GTC there's a price of entry. You guys know: you get tortured with math. This is the only keynote where you get tortured with math. All right. So Hopper, you get 2 and a half. Now, what's that 2 and a half million? How do you translate that? 2 and a half million. Remember, ChatGPT is like $10 per million tokens.
$10 per million tokens. Let's pretend for a second that, I think the $10 per million tokens is probably down here, OK? I'd probably say it's down here, but let me pretend it's up there, because two and a half million times 10, so $25 million per second. Does that make sense? That's how you think through it. Or, on the other hand, if it's way down here, then it's 100,000 tokens per second, so just divide that by 10. OK? $250,000 per factory per second. And then there are 31 million, 30 million seconds in a year, and that translates into revenues for that one-megawatt data center.
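Taking the quoted figures at face value (the on-stage arithmetic is loose), a clean back-of-envelope works out like this. These are rough keynote numbers, not official economics:

```python
# Back-of-envelope using only the figures quoted above (rough keynote numbers,
# not official NVIDIA economics): a 1 MW Hopper factory at ~2.5M tokens/s,
# priced at ~$10 per million tokens, over ~30 million seconds in a year.
tokens_per_second = 2.5e6          # factory throughput, fully batched
price_per_million_tokens = 10.0    # dollars
seconds_per_year = 30e6            # ~1 year, as quoted on stage

dollars_per_second = tokens_per_second / 1e6 * price_per_million_tokens
dollars_per_year = dollars_per_second * seconds_per_year
print(f"${dollars_per_second:,.0f}/s  ->  ${dollars_per_year:,.0f}/year per megawatt")
# -> $25/s  ->  $750,000,000/year per megawatt (at these assumed inputs)
```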
Now, here's the tension. On the one hand, you would like your token rate to be as fast as possible, so that you can make really smart AIs. And if you have smart AIs, people will pay you more money for them. On the other hand, the smarter the AI, the less volume you can make. A very sensible trade-off. And this is the curve we're trying to bend. Now, what I'm showing you right now is the fastest computer in the world, Hopper. It's the computer that revolutionized everything. And so how do we make that better?
So the first thing that we do is we come up with Blackwell with NVLink 8. Same Blackwell, the same compute node. And that one compute node, with NVLink 8, is using FP8. And so Blackwell is just faster. Faster, bigger, more transistors, more everything. But we'd like to do more than that. And so we introduce a new precision. It's not quite as simple as just 4-bit floating point, but using 4-bit floating point we can quantize the model and use less energy. Use less energy to do the same. And as a result, when you use less energy to do the same, you can do more. Because remember, one big idea is that every single data center in the future will be power limited. Your revenues are power limited. You can figure out what your revenues are going to be based on the power you have to work with.
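The new precision he's referring to is 4-bit floating point. As a rough illustration of what quantizing to FP4 means, here is a minimal round-to-nearest sketch over an E2M1-style value set; it is not NVIDIA's actual FP4/NVFP4 recipe, which relies on per-block scaling and hardware support.

```python
# Minimal sketch of 4-bit floating-point (FP4, E2M1-style) quantization --
# illustrative only, not NVIDIA's actual FP4 recipe.
FP4_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]   # E2M1 magnitudes

def quantize_fp4(x):
    """Quantize a list of floats: scale into the FP4 range, round to nearest code."""
    scale = max(abs(v) for v in x) / 6.0 or 1.0
    out = []
    for v in x:
        mag = min(FP4_VALUES, key=lambda q: abs(abs(v) / scale - q))
        out.append((mag if v >= 0 else -mag) * scale)
    return out, scale

weights = [0.82, -0.11, 0.40, -0.97, 0.03]
deq, s = quantize_fp4(weights)
print(deq)   # each weight snapped to one of 16 representable values * scale
```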
This is no different than, you know, many other industries. And so we are now a power-limited industry. Our revenues are associated with that. Well, based on that, you want to make sure you have the most energy-efficient compute architecture you can possibly get. Then we scale up with NVLink 72. Does that make sense? Look at the difference between that and NVLink 72 with FP4. And then, because our architecture is so tightly integrated, we now add Dynamo to it, and Dynamo can extend that even further. Are you following me? So Dynamo also helps Hopper, but Dynamo helps Blackwell incredibly. Now, yep.
Only at GTC do you get applause for that. So now notice those two shiny parts; that's kind of where your max Q is. That's likely where you run your factory operations. You're trying to find that balance between maximum throughput and maximum quality of AI: the smartest AI, at the highest volume. That intersection is really what you're optimizing for. And that's what it looks like when you look underneath those two squares. Blackwell is way, way better than Hopper. And remember, this is not iso-chips. This is iso-power.
This is the ultimate Moore's Law. This is what Moore's Law was always about in the past. And now here we are, 25x in one generation at iso-power. It's not iso-chips. It's not iso-transistors. It's not iso-anything. Iso-power, the ultimate limit. There's only so much energy we can get into a data center. And so within iso-power, Blackwell is 25 times better. Now, here's that rainbow. That's incredible. That's the fun part. Look at all the different configs, every one of them.
Underneath the Pareto frontier, we call it the Pareto frontier, under the Pareto frontier are millions of points we could have configured the data center to hit. We could have parallelized and split the work and shuffled the work in a whole lot of different ways.
And we found the most optimal answer, which is the Pareto frontier. OK? And each one of those points, because of the color, shows you it's a different configuration. Which is the reason why this image says very, very clearly, you want a programmable architecture that is as homogeneously fungible, as fungible as possible. Because the workload changes so dramatically across the entire frontier.
And look, at the top we've got Expert Parallel 8, batch of 3,000, disaggregation off, Dynamo off. In the middle, Expert Parallel 64, with 26% used for context, so Dynamo is turned on: 26% context, the other 74% is not. Batch is 64, and Expert Parallel is 64 on one side, Expert Parallel 4 on the other. And then down here, all the way at the bottom, you've got Tensor Parallel 16 with Expert Parallel 4, batch of 2, 1% context.
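The idea behind that rainbow of configurations can be sketched in a few lines: score each serving configuration on (tokens/s per user, factory tokens/s) and keep only the non-dominated points. The configurations and the toy performance model below are invented for illustration; the real numbers come from the large-scale simulations he mentions.

```python
# Sketch of the Pareto-frontier idea: each serving configuration (tensor/expert
# parallelism, batch size) lands at some (tokens/s per user, factory tokens/s)
# point; the frontier keeps only points not dominated by another configuration.
# All numbers here are made up for illustration.
from itertools import product

def simulate(tp, ep, batch):
    # Hypothetical stand-in for the "supercomputers simulating supercomputers"
    # step: bigger batches raise factory throughput but slow each user down.
    per_user = 400.0 / (batch ** 0.5) * (1 + 0.02 * tp)
    factory = per_user * batch * (1 + 0.05 * ep)
    return per_user, factory

points = []
for tp, ep, batch in product([1, 4, 16], [1, 8, 64], [1, 64, 3000]):
    points.append((simulate(tp, ep, batch), (tp, ep, batch)))

frontier = [(p, cfg) for p, cfg in points
            if not any(q[0] >= p[0] and q[1] >= p[1] and q != p for q, _ in points)]
for (per_user, factory), cfg in sorted(frontier):
    print(f"TP={cfg[0]:>2} EP={cfg[1]:>2} batch={cfg[2]:>4}  "
          f"{per_user:8.1f} tok/s/user  {factory:10.0f} tok/s factory")
```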
The configuration of the computer is changing across that entire spectrum. And then this is what happens. So this is with input sequence length. This is kind of a commodity test case, a test case you can benchmark relatively easily. The input is 1,000 tokens, the output is 2,000. Notice that earlier we just showed you a demo where the output was nearly 9,000, right? 8,000-plus.
And so obviously, this is not representative of just that one chat. This one is more representative. And the goal is to build these next-generation computers for next-generation workloads. And so here's an example on a reasoning model. And on a reasoning model, Blackwell is 40 times the performance of Hopper, straight up.
Don’t Buy Hopper
You know, I've said before, somebody actually asked me why I would say that, but I've said before that when Blackwell starts shipping in volume, you couldn't give Hoppers away. And this is what I mean. And this makes sense. If you're still looking to buy a Hopper, don't be afraid. It's OK. But I'm the chief revenue destroyer. My sales guys are going, oh, no. Don't say that.
There are circumstances where Hopper is fine. That's the best thing I can say about Hopper. There are circumstances where you're fine. Not many. But I had to take a swing. And so that's kind of my point. When the technology is moving this fast, and because the workload is so intense, and you're building these things that are factories, we'd really like you to invest in the right versions.
Just to put it in perspective, this is what a 100 megawatt factory looks like. This 100 megawatt factory, based on Hoppers, has 45,000 dies, 1,400 racks, and it produces 300 million tokens per second. And this is what it looks like with Blackwell. You have 86, yeah, I know.
That doesn't make any sense. OK, so we're not trying to sell you less. Our sales guys are going, Jensen, you're selling them less? This is better. OK? And so anyways, the more you buy, the more you save. It's even better than that: now, the more you buy, the more you make. So anyhow, remember, everything is in the context of AI factories. And although we talk about the chips, you always start from scale-up. We talk about the chips, but you always start from the full scale-up. What can you scale up to, to the maximum?
I want to show you now what an AI factory looks like. But AI factories are so complicated. I just gave you an example of one rack. It has 600,000 parts. It weighs 3,000 pounds. Now, you've got to take that and connect it with a whole bunch of others. And so we are starting to build what we call a digital twin of every data center. Before you build a data center, you have to build a digital twin. Let's take a look at this. This is just incredibly beautiful.
The world is racing to build state-of-the-art, large-scale AI factories. Bringing up an AI gigafactory is an extraordinary feat of engineering, requiring tens of thousands of workers, from suppliers, architects, contractors, and engineers, to build, ship, and assemble nearly 5 billion components, and over 200,000 miles of fiber, nearly the distance from the Earth to the Moon.
Omniverse DCs
The NVIDIA Omniverse Blueprint for AI Factory Digital Twins enables us to design and optimize these AI factories long before physical construction starts. Here, NVIDIA engineers use the Blueprint to plan a 1 GW AI factory, integrating 3D and layout data of the latest NVIDIA DGX SuperPODs, advanced power and cooling systems from Vertiv and Schneider Electric, and optimized topology from NVIDIA Air, a framework for simulating network logic, layout, and protocols.
This work is traditionally done in silos. The Omniverse Blueprint lets our engineering teams work in parallel and collaboratively, letting us explore various configurations to maximize TCO and power usage effectiveness. NVIDIA uses Cadence Reality Digital Twin, accelerated by CUDA and Omniverse libraries, to simulate air and liquid cooling systems, and Schneider Electric's ETAP application to simulate power block efficiency and reliability.
Real-time simulation lets us iterate and run large-scale what-if scenarios in seconds versus hours. We used a digital twin to communicate instructions to the large body of teams and suppliers, reducing execution errors and accelerating time to bring up. And when planning for retrofits or upgrades, we can easily test and simulate cost and downtime, ensuring a future-proof AI factory.
This is the first time anybody from Bill Stavisky said, oh, that's so beautiful. All right. I've got to race here, because it turns out I've got a lot to tell you. And so if I go a little too fast, it's not because I don't care about you. It's just I've got a lot of information to tell you.
All right. So first, our roadmap. We're now in full production of Blackwell. Computer companies all over the world are ramping these incredible machines at scale. And I'm just so, so pleased and so grateful that all of you worked hard on transitioning into this new architecture. And now, in the second half of this year, we'll easily transition into the upgrade. So we have Blackwell Ultra, NVLink 72. It's one and a half times more flops. It's got a new instruction for attention. It's one and a half times more memory. All that memory is useful for things like the KV cache. It's two times more networking bandwidth. And so now that we have the same architecture, we'll just kind of gracefully glide into that. And that's all Blackwell.
OK? That's coming in the second half of this year. Now, there's a reason why this is the only product announcement in any company where everybody's going, yeah, next. And in fact, that's exactly the response I was hoping to get. And here's why.
Look, we're building AI factories and AI infrastructure. It's going to take years of planning. This isn't like buying a laptop. This isn't discretionary spend. This is spend that we have to plan on. And so we have to plan on having, of course, the land and the power. And we have to get our capex ready and get engineering teams in place.
And we have to lay it out a couple, two, three years in advance, which is the reason why I show you our roadmap a couple, two, three years in advance, so that we don't surprise you in May. You know, hi, in another month we're going to go to this incredible new system. I'll show you an example in a second. And so we plan this out over multiple years. The next click, one year out, is named after an astronomer, and her grandkids are here. Her name is Vera Rubin, and she discovered dark matter.
Vera Rubin
Now, Vera Rubin is incredible, because the CPU, Vera, is twice the performance of Grace, with more memory and more bandwidth, and yet it's just a little tiny 50-watt CPU. It's really quite incredible. And Rubin, a brand-new GPU; CX9, a brand-new networking SmartNIC; NVLink 6, a brand-new NVLink; brand-new memories, HBM4. Basically, everything is brand new, except for the chassis. And this way, we can take a whole lot of risk in one direction,
and not risk a whole bunch of other things related to the infrastructure. And so Vera Rubin NVLink 144 is the second half of next year. Now, one of the things that I made a mistake on, and so I just need you to make this pivot, we're going to do this one time.
Blackwell has two GPU dies in one Blackwell chip. We called that one chip a GPU, and that was wrong. And the reason is that it screws up all the NVLink nomenclature and things like that. So going forward, without going back to Blackwell to fix it, when I say NVLink 144, it just means that it's connected to 144 GPUs, and each one of those GPUs is a GPU die. And they can be assembled
in some package. How they're assembled can change from time to time. And so each GPU die is a GPU, and each NVLink connects to GPU dies. And so, Vera Rubin NVLink 144. And then this sets the stage for the second half of the following year, which we call Rubin Ultra. Hang on.
Rubin Ultra
This one is where we're going. All right, so this is Vera Rubin, Rubin Ultra, second half of '27. It's NVLink 576, extreme scale-up. Each rack is 600 kilowatts, two and a half million parts.
And, obviously, a whole lot of GPUs. And everything is X-factored up. So 14 times more flops: 15 exaflops instead of 1 exaflop, as I mentioned earlier, it's now 15 exaflops of scale-up flops. And it's, what, 4.6 petabytes, so 4,600 terabytes per second,
of scale-up bandwidth. I don't mean aggregate, I mean scale-up bandwidth. And, of course, lots of brand-new memory and a brand-new NVLink switch. And so, notice, 16 sites, four GPU dies in one package, an extremely large package. Now just put that in perspective. This is what Grace Blackwell looks like, and this is what Rubin looks like. Iso-dimension.
And so this is another way of saying: before you scale out, you have to scale up. Does that make sense? You scale up first, and then after that, you scale out with amazing technology that I'll show you in just a second. So first, you scale up. And now, to give you a sense of the pace at which we're moving, this is the amount of scale-up flops. This is scale-up flops.
Hopper is 1x, Blackwell is 68x, Rubin is 900x scale-up flops. And then if I turn it into essentially your TCO, which is power on top, the power curve, and underneath is the area under the curve that I was talking to you about, it's basically flops times bandwidth. OK? So a very easy gut feel, gut check on whether your AI factories are making progress is watts divided by those numbers. And you can see that Rubin is going to drive the cost down tremendously. OK? So that's, very quickly, NVIDIA's roadmap.
Once a year. Like clock ticks. Once a year. Okay, how do we scale out? Well, we were preparing to scale out. That was scale-up, that was NVLink. Our scale-out network is InfiniBand and Spectrum-X. Most people were quite surprised that we came into the Ethernet world. And the reason we decided to do Ethernet is that if we could help Ethernet become like InfiniBand, have the qualities of InfiniBand, then the network itself would be a lot easier for everybody to use and manage.
So we decided to invest in Spectrum-X. We built Spectrum-X, and we brought to it the properties of congestion control, very low latency, and the amount of software that's part of our computing fabric. And as a result, we made Spectrum-X incredibly high-performance. We stood up the largest GPU cluster ever as one giant cluster with Spectrum-X. That was Colossus.
Cisco and Spectrum-X
And so there are many other examples of it. Spectrum-X is unquestionably a huge home run for us. One of the areas that I'm very excited about is that Spectrum-X is not just for AI clouds; Spectrum-X also makes it possible for us to help every enterprise become an AI company. And so, was it last week or the week before, Chuck Robbins of Cisco and NVIDIA announced a partnership for Cisco, the world's largest enterprise networking company, to take Spectrum-X and integrate it into their product line, so that they could help the world's enterprises become AI companies.
We're at 100,000 GPUs with CX-7. Now CX-8 is coming, CX-9 is coming, and during Rubin's timeframe we would like to scale the number of GPUs to many hundreds of thousands. Now, the challenge with scaling out to many hundreds of thousands of GPUs is the connection. The connection for scale-up is copper. We should use copper as far as we can, and that's, you know, call it a meter or two.
And that's incredibly good connectivity: very high reliability, very good energy efficiency, very low cost. And so we use copper as much as we can on scale-up. But on scale-out, where the data centers are now the size of a stadium, we're going to need something that runs much longer distances. And that's where silicon photonics comes in.
The challenge of silicon photonics has been that the transceivers consume a lot of energy. To go from electrical to optical, the signal has to go through SerDes, through a transceiver, SerDes after SerDes. And so each one of these, each one of these, each one of these...
Am I alone? What happened to my networking? Can I have this up here? Yeah, let's bring it up so I can show you what I'm talking about. OK, so first of all, we're announcing NVIDIA's first co-packaged optics silicon photonics system. It is the world's first 1.6 terabit per second CPO. It is based on a technology called micro ring resonator modulators (MRM). And it is completely built with this incredible process technology at TSMC that we've been working on with them for some time.
Silicon Photonics
And we partnered with just a giant ecosystem of technology providers to invent what I'm about to show you. This is really crazy technology. Crazy, crazy technology. Now, the reason we decided to invest in MRM is so that we could prepare ourselves for this. MRM has incredible density and power, better density and power compared to Mach-Zehnder, which is used for telecommunications, when you drive from one data center to another data center. Even in the transceivers that we use today, we use Mach-Zehnder, because the density requirement has not been very high, until now. And so if you look at these transceivers, this is an example of a transceiver.
They did a pretty good job tangling this up. Thank you. This is where you've got the turkeys and the... It's not as easy as you think. These are squirrelly little things. All right, so this, this right here, this is 30 watts. Just so people remember, this is 30 watts. And if you buy it in high volume, it's $1,000. This is the plug. On this side, on this side,
is electrical. On this side is optical. So optics come in through the yellow. You plug this into a switch; it's electrical on this side. There are transceivers, lasers, a technology called Mach-Zehnder. And it's incredible. And so we use this to go from the GPU to the switch, to the next switch.
And then the next switch down, and the next switch down to the GPU, for example. And so, if we had 100,000 GPUs, we would have 100,000 of these on this side, and then another 100,000 that connect switch to switch, and then on the other side, transceivers coming down to the GPU on the other end. If we had 250,000 GPUs, we'd add another layer of switches. And so every GPU, at 250,000 GPUs, every GPU would have six transceivers.
Every GPU would have six of these plugs. And these six plugs would add 180 watts per GPU, 180 watts per GPU, and $6,000 per GPU. And so the question is, how do we scale up now to millions of GPUs? Because if we have a million GPUs, multiplied by six, that would be six million transceivers, times 30 watts, 180 megawatts of transceivers. And they don't do any math; they just move signals around.
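The arithmetic he's walking through, using the figures quoted on stage (ballpark keynote numbers, not a product spec), looks like this:

```python
# The transceiver math above, with the figures quoted on stage.
watts_per_transceiver = 30
dollars_per_transceiver = 1_000
transceivers_per_gpu = 6           # GPU->switch, switch->switch, and so on

for gpus in (100_000, 250_000, 1_000_000):
    transceivers = gpus * transceivers_per_gpu
    print(f"{gpus:>9,} GPUs: {transceivers:>9,} transceivers, "
          f"{transceivers * watts_per_transceiver / 1e6:6.0f} MW, "
          f"${transceivers_per_gpu * dollars_per_transceiver:,} per GPU in optics")
# At 1,000,000 GPUs: 6,000,000 transceivers, 180 MW, $6,000 per GPU.
```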
And so the question is how do we, how could we afford, and as I mentioned earlier, energy is our most important commodity. Everything is related ultimately to energy, so this is gonna limit our revenues, our customers' revenues. by subtracting out 180 megawatts of power. And so this is the amazing thing that we did. We invented the world's first
And this is what it looks like. There's a little waveguide. You see that waveguide? It goes to a ring. That ring resonates, and it controls the amount of reflectivity of the waveguide as the light goes around. It limits and modulates the amount of light that goes through, and it shuts it off by absorbing it or passing it on. It turns the light, this direct, continuous laser beam, into ones and zeros.
And that's the miracle. And that photonic IC is then stacked with the electronic IC, which is then stacked with a bunch of micro lenses, which is stacked with this thing called a fiber array. These things are all manufactured using a technology at TSMC called COUPE, and it's using 3D stacking technology.
We worked with all of these technology providers, a whole bunch of different names I showed you earlier, and turned it into this incredible machine. So let's take a look at the video.
That's the technology behind it. And the first of these switches is an InfiniBand switch. The silicon is working fantastically. We will ship the silicon photonics InfiniBand switch in the second half of this year, and in the second half of next year we'll ship the Spectrum-X version.
Because of the MRM choice, because of the incredible technology risks we took over the last five years, and the hundreds of patents we filed and then licensed to our partners so that we can all build them, we're now in a position to put silicon photonics with co-packaged optics, no transceivers, direct fiber into our switches, with a radix of 512. That's 512 ports. This would simply not be possible any other way. So this now sets us up to be able to scale up to these multi-hundred-thousand-GPU and multi-million-GPU systems. And the benefit, just imagine, is incredible.
In a data center, we could save tens of megawatts. Let's say 6 megawatts, or let's say 60 megawatts. 6 megawatts is 10 Rubin Ultra racks. And 60, that's a lot: 100 Rubin Ultra racks' worth of power that we can now put into Rubins. So this is our roadmap: once a year.
Once a year. A new architecture every two years, a new product line every year, X-factors up every single year. And we try to take silicon risk, or networking risk, or system chassis risk in pieces, so that we can move the industry forward as we pursue these incredible technologies. Vera Rubin; and I really appreciate her grandkids being here. This is our opportunity to recognize her and to honor her for the incredible work that she did. Our next generation will be named after Feynman.
DGX-Spark
Okay, that's NVIDIA's roadmap. Let me talk to you about enterprise computing. This is really important. In order for us to bring AI to the world's enterprises, we first have to go to a different part of computing. OK, in order for us to take AI to enterprise, take a step back for a second and remind yourself of this.
Remember, AI and machine learning have reinvented the entire computing stack. The processor is different, the operating system is different, the applications on top are different. The applications are different, the way you orchestrate them is different, and the way you run them is different. Let me give you one example. The way you access data will be fundamentally different than in the past.
Instead of retrieving precisely the data that you want and reading it to try to understand it, in the future we will do what we do with Perplexity. Instead of doing retrieval that way, I just ask Perplexity what I want. Ask it a question, and it will tell you the answer. This is the way enterprise IT will work in the future as well. We'll have AI agents, which are part of our digital workforce. There are a billion knowledge workers in the world. There are probably going to be 10 billion digital workers working with us side by side.
100% of software engineers in the future, there are 30 million of them around the world, 100% of them are going to be AI-assisted. I'm certain of that. 100% of NVIDIA software engineers will be AI-assisted by the end of this year. And so AI agents will be everywhere. How they run, what enterprises run, and how we run it will be fundamentally different. And so we need a new line of computers.
And this is what started it all. This is the NVIDIA DGX-1. 20 CPU cores. 128 gigabytes of GPU memory. One petaflop of computation. $150,000. 3,500 watts. Let me now introduce you to the new DGX. This is NVIDIA's new DGX, and we call it DGX Spark.
DGX Spark. We had a little bit of a surprise here. 20 CPU cores. We partnered with MediaTek to build this for us, and they did a fantastic job. It's been a great joy working with Rick Tsai and the MediaTek team; I really appreciate their partnership. They built us a chip-to-chip NVLink from CPU to GPU. And now the GPU has 128 gigabytes. And this is the fun one: one petaflop. So this is like the original DGX-1, shrunk with Pym particles. You would have thought that's a joke that would land at GTC.
OK, well, there are 30 million software engineers in the world, and 10, 20 million data scientists. And this is clearly the gear of choice. Look at this. In every bag, this is what you should find. Right? This is the development platform of every software engineer in the world. If you have a family member, a spouse, somebody you care about who's a software engineer, or an AI researcher, or a data scientist, and you would like to give them, you know what, the perfect Christmas present...
Tell me, tell me this isn't what they want. Huh? And so, ladies and gentlemen, today you can reserve one. We will reserve the first DGX Sparks for the attendees of GTC, so go reserve it. You already have one of these; now you've just got to get one of these.
Alright, the next... So that's... Thank you, Jacob. The next one is also a brand-new computer, one that the world's never had before. So we're announcing a whole new line of computers. This is a new personal computer, a new personal workstation. I know, it's crazy. Check this out. Grace Blackwell. Look how cool. This is what a PC should look like. 20 petaflops. Unbelievable. 72 CPU cores.
A chip-to-chip interface, HBM memory, and, just in case, some PCI Express slots for your GPUs. So this is called DGX Station. DGX Spark and DGX Station are going to be available from all of the OEMs: HP, Dell, Lenovo, ASUS. It's going to be manufactured for data scientists and researchers all over the world. This is the computer of the age of AI.
AI in Storage
This is what computers should look like, and this is what computers will run in the future. We have a whole line of them, our enterprise line, from little tiny ones, to workstation ones, to server ones, to supercomputer ones, and these will be available from all of our partners.

We will also revolutionize the rest of the computing stack. Remember, computing has three pillars. There's computing; you're looking at it. There's networking, as I mentioned earlier: Spectrum-X going to the world's enterprises, an AI network. And the third is storage. Storage has to be completely reinvented. Rather than a retrieval-based storage system, it's going to be a semantics-based retrieval system, a semantics-based storage system. The storage system has to be continuously embedding information in the background, taking raw data and embedding it into knowledge, and then later, when you access it, you don't retrieve it. You just talk to it. You ask it questions. You give it problems.
And one of the examples, I wish we had a video of it, but Box, with Aaron Levie, even put one up in the cloud, working with us to put it up in the cloud. It's basically, you know, a super smart storage system. And in the future, you're going to have something like that in every single enterprise. That is the enterprise storage of the future. And we're working with the entire storage industry, really fantastic partners: DDN, and Dell, and HP Enterprise, and Hitachi, and IBM, and NetApp, and Nutanix, and Pure Storage, and VAST, and WEKA. Basically, the entire world storage industry will be offering this stack. For the very first time, your storage system will be GPU-accelerated.
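As a rough sketch of what "semantics-based storage" means in practice: documents get embedded as they are ingested, and queries are answered by meaning rather than by file path. Everything below is a toy stand-in (a bag-of-words embedding so it runs anywhere); a real system would use a GPU-accelerated embedding model and a vector index.

```python
# Toy sketch of semantics-based storage: embed on ingest, answer by similarity.
from collections import Counter
from math import sqrt

def embed(text):
    return Counter(text.lower().split())      # stand-in for a real embedding model

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    return dot / (sqrt(sum(v * v for v in a.values())) *
                  sqrt(sum(v * v for v in b.values())) or 1.0)

class SemanticStore:
    def __init__(self):
        self.docs = []                         # (text, embedding) pairs

    def ingest(self, text):                    # "continuously embedding in the background"
        self.docs.append((text, embed(text)))

    def ask(self, question, k=1):              # you don't retrieve, you ask
        q = embed(question)
        return sorted(self.docs, key=lambda d: cosine(q, d[1]), reverse=True)[:k]

store = SemanticStore()
store.ingest("Q3 revenue grew 12 percent driven by data center sales")
store.ingest("Employee handbook: vacation policy and holidays")
print(store.ask("how did the data center business do last quarter?")[0][0])
```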
Enterprise AI
And so somebody thought I didn't have enough slides. Michael thought I didn't have enough slides. He said, Jensen, just in case you don't have enough slides, can I just put this in there? And so this is Michael's slide. He sent this to me and said, just in case you don't have enough slides. And I said, I've got too many slides. But this is such a great slide, and let me tell you why. In one single slide, he's explaining that Dell is going to be offering a whole line of NVIDIA enterprise AI infrastructure systems, and all the software that runs on top of it. So you can see that we're in the process of revolutionizing the world's enterprise IT. We're also announcing today this incredible model that everybody can run. Earlier I showed you R1, a reasoning model, versus Llama 3, a non-reasoning model.
Obviously, R1 is much smarter. But we can do even better than that, and we can make it enterprise-ready for any company. It's completely open source. It's part of our system of NIMs. You can download it, and you can run it anywhere: on DGX Spark, on DGX Station, on any of the servers that the OEMs make, in the cloud. You can integrate it into any of your agentic AI frameworks. And we're working with companies all over the world. I'm going to go through these, so watch very carefully. I've got some great partners in the audience, and I want to recognize them.
Accenture: Julie Sweet and her team are building their AI factory and their AI framework. Amdocs, the world's largest telecommunications software company. AT&T: John Stankey and his team are building an AT&T AI system, an agentic system. Larry Fink and the BlackRock team are building theirs. Anirudh: in the future, not only will we hire ASIC designers, we're going to hire a whole bunch of digital ASIC designers from Anirudh and Cadence to help us design our chips. And so Cadence is building their AI framework, and as you can see, in every single one of these there are NVIDIA models, NVIDIA NIMs, NVIDIA libraries integrated throughout, so that you can run it on-prem or in any cloud.
Capital One, one of the most advanced financial services companies in using technology, has NVIDIA all over it. Deloitte: Jason and his team. EY: Janet and her team. Nasdaq: Adena and her team. All integrating NVIDIA technology into their AI frameworks. And then Christian and his team at SAP, and Bill McDermott and his team at ServiceNow. That was great.
This is one of those keynotes where the first slide took 30 minutes, and then all the other slides took 30 minutes. So next, let's go somewhere else. Let's talk about robotics, shall we? Let's talk about robots. Well, the time has come. The time has come for robots.
Robots
Robots have the benefit of being able to interact with the physical world and do things that otherwise digital information cannot. We know very clearly that the world has severe shortages of human laborers, human workers. By the end of this decade, the world is going to be at least 50 million workers short. We'd be more than delighted to pay them each $50,000 to go to work. We're probably going to have to pay robots $50,000 a year to go to work. And so this is going to be a very, very large initiative. There are all kinds of robotic systems. Your infrastructure will be robotic. Billions of cameras in warehouses and factories. 10, 20 million factories around the world. Every car is already a robot, as I mentioned earlier. And then now we're building general robots. Let me show you how we're doing it.
Everything that moves will be autonomous. Physical AI will embody robots of every kind in every industry. Three computers built by NVIDIA enable a continuous loop of robot AI simulation, training, testing, and real-world experience. Training robots requires huge volumes of data. Internet-scale data provides common sense and reasoning, but robots need action and control data, which is expensive to capture. With blueprints built on NVIDIA Omniverse and Cosmos, developers can generate massive amounts of diverse, synthetic data for training robot policies.
First, in Omniverse, developers aggregate real-world sensor or demonstration data according to their different domains, robots, and tasks. Then they use Omniverse to condition Cosmos, multiplying the original captures into large volumes of photoreal, diverse data. Developers use Isaac Lab to post-train the robot policies with the augmented dataset, and let the robots learn new skills by cloning behaviors through imitation learning, or through trial and error with reinforcement learning and AI feedback.
Practicing in a lab is different than the real world. New policies need to be field tested. Developers use Omniverse for software and hardware in the loop testing, simulating the policies in a digital twin with real-world environmental dynamics, with domain randomization, physics feedback, and high-fidelity sensor simulation. Real-world operations require multiple robots to work together. MEGA, an omniverse blueprint, lets developers test fleets of post-train policies at scale.
Here, Foxconn tests heterogeneous robots in a virtual NVIDIA Blackwell production facility. As the robot brains execute their missions, they perceive the results of their actions through sensor simulation, then plan their next action. Mega lets developers test many robot policies, enabling the robots to work as a system, whether for spatial reasoning, navigation, mobility, or dexterity.
Amazing things are born in simulation. Today, we're introducing NVIDIA Isaac GR00T N1. GR00T N1 is a generalist foundation model for humanoid robots. It's built on the foundations of synthetic data generation and learning in simulation. GR00T N1 features a dual-system architecture for thinking fast and slow, inspired by principles of human cognitive processing. The slow-thinking system lets the robot perceive and reason about its environment and instructions, and plan the right actions to take. The fast-thinking system translates the plan into precise and continuous robot actions.
GR00T N1's generalization lets robots manipulate common objects with ease and execute multi-step sequences collaboratively. And with this entire pipeline of synthetic data generation and robot learning, humanoid robot developers can post-train GR00T N1 across multiple embodiments and tasks, and across many environments. Around the world, in every industry, developers are using NVIDIA's three computers to build the next generation of embodied AI.
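The "thinking fast and slow" split described above can be sketched as two loops running at different rates. This is only a toy illustration of the dual-system idea, not the actual GR00T N1 model or API; every function and name below is hypothetical.

```python
# Toy sketch of a dual-system robot policy: a slow "System 2" planner at low
# frequency, a fast "System 1" controller at high frequency. Not GR00T N1.
import random

def system2_plan(instruction, observation):
    """Slow, deliberate: reason about the scene and emit a short plan (runs at ~Hz)."""
    return ["move_to:" + observation["object"], "grasp",
            "move_to:" + instruction["place"], "release"]

def system1_act(plan_step, joint_state):
    """Fast, reactive: map the current plan step to continuous actions (runs at ~100 Hz)."""
    return [j + random.uniform(-0.01, 0.01) for j in joint_state]   # placeholder controller

instruction = {"task": "pick and place", "place": "bin"}
observation = {"object": "red_cube"}
plan = system2_plan(instruction, observation)

joint_state = [0.0] * 7
for step in plan:                     # slow outer loop
    for _ in range(100):              # fast inner control loop per plan step
        joint_state = system1_act(step, joint_state)
    print(f"completed {step}")
```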
Physical AI and robotics are moving so fast. Everybody, pay attention to this space. This could very well be the largest industry of all. At its core, we have the same challenges. As I mentioned before, there are three that we focus on, and they are rather systematic. One, how do you solve the data problem? How and where do you create the data necessary to train the AI? Two, what's the model architecture? And three, what's the scaling law? How can we scale either the data, the compute, or both, so that we can make AI smarter and smarter? How do we scale?
And those fundamental problems exist in robotics as well. We created a system called Omniverse. It's our operating system for physical AI. You've heard me talk about Omniverse for a long time. We added two technologies to it. Today I'm going to show you two things. One of them is so that we can scale AI with generative capabilities, a generative model that understands the physical world. We call it Cosmos. Using Omniverse to condition Cosmos, and using Cosmos to generate an infinite number of environments, allows us to create data that is grounded, controlled by us, and yet systematically infinite at the same time. OK, so you see on the monitors, we use candy colors to give you an example
of us controlling the robot and the scenario perfectly. And yet, Cosmos can create all these virtual environments. The second thing: just as we were talking about earlier, one of the incredible scaling capabilities of language models today is reinforcement learning with verifiable rewards. The question is, what's the verifiable reward in robotics?
And as we know very well, it's the laws of physics. Verifiable physics rewards. And so we need an incredible physics engine. Well, most physics engines have been designed for a variety of reasons. They could be designed for large machinery, or maybe for virtual worlds, video games, and such. But we need a physics engine that is designed for very fine-grained rigid and soft bodies, designed for training tactile feedback, fine motor skills, and actuator controls. We need it to be GPU-accelerated, so that these virtual worlds can live in super-linear time, super real time, and train these AI models incredibly fast.
And we need it to be integrated harmoniously into a framework that is used by roboticists all over the world. And so today we're announcing something really, really special. It is a partnership of three companies, DeepMind, Disney Research, and NVIDIA, and we call it Newton.
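To make "verifiable rewards from physics" concrete, here is a minimal sketch: the reward is not a learned judgment but a check against the simulated physical state. The 1-D toy simulator and the random-search "training" loop below are placeholders for illustration, not the engine just announced nor any real training recipe.

```python
# Minimal sketch of reinforcement learning with a verifiable physics reward:
# the reward is a checkable condition on the simulated state, not a judgment.
import random

def simulate_push(force, steps=50, dt=0.02, mass=1.0, friction=0.5):
    """Toy physics rollout: push a block along one axis and return where it ends up."""
    x, v = 0.0, 0.0
    for _ in range(steps):
        a = (force - friction * v) / mass
        v += a * dt
        x += v * dt
    return x

def verifiable_reward(final_x, goal_x=0.30, tol=0.02):
    return 1.0 if abs(final_x - goal_x) < tol else 0.0     # verifiable, not judged

best_force, best_reward = None, -1.0
for episode in range(200):                                  # crude stand-in for policy search
    force = random.uniform(0.0, 5.0)
    r = verifiable_reward(simulate_push(force))
    if r > best_reward:
        best_force, best_reward = force, r
print(f"best force {best_force:.2f} N, reward {best_reward}")
```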
Physical AI with Blue
Let's give them a round of applause. Thank you.
All right, let's start that over, shall we? Let's not ruin it for them. Hang on a second. Somebody talk to me. I need feedback. What happened? I just need a human to talk to me. Come on, that's a great show. I need a human to talk to me. Janine, I know it's not your fault, but talk to me. We've just got two minutes left.
They're re-wrapping it? I don't know what that means. Okay. How are you doing? How do you like the new physics engine? You like it, huh? Yeah, I bet. Tactile feedback, rigid body, soft body simulation, super real time. Can you imagine? What you're looking at is complete real-time simulation. This is how we're going to train robots in the future.
Just so you know, Blue has two computers, two NVIDIA computers, inside. Look how smart you are. Yes, you're smart. OK. All right. Hey, Blue, listen. How about we take them home? Let's finish this keynote. It's lunchtime. Are you ready? Let's finish it up. We have another announcement. You're good. You're good. Just stand right here. Stand right here. All right, good. Right there.
Okay, we have another piece of amazing news. I told you our robotics effort has been making enormous progress. And today we're announcing that GR00T N1 is open source. I want to thank all of you for coming today.
Let's wrap up. I want to thank all of you for coming to GTC. We talked about several things. One, Blackwell is in full production, and the ramp is incredible. Customer demand is incredible, and for good reason, because there's an inflection point in AI: the amount of computation we have to do is so much greater as a result of reasoning AI and the training of reasoning AI systems and agentic systems.
Second, Blackwell NVLink 72 with Dynamo is 40 times the AI factory performance of Hopper. And inference is going to be one of the most important workloads in the next decade as we scale out AI. Third, we have an annual rhythm of roadmaps that has been laid out for you so that you can plan your AI infrastructure. And we have three AI infrastructures we're building: AI infrastructure for the cloud, AI infrastructure for enterprise, and AI infrastructure for robots. We have one more treat for you. Wait.
Very cool.