
Issue #29: The Open-Source AI Acceleration Cloud with Together AI

Open, transparent AI that you can take with you wherever you go.

The story of cloud is typically a story of vendor lock-in.

Enticing credit programs. “Transparent” pricing. Complex deployments.

Once you’re in, good luck getting out.

This company doesn’t play that game.

They’re going all-in on open source. No walled gardens, no sneaky pricing. Just open, transparent AI that you can take with you wherever you go. And they’re building the ultimate developer-friendly, purpose-built AI cloud.

One that’s research-driven, massively scalable, and remarkably cost-efficient.

And it’s not just marketing fluff.

Their inference engine outperforms the equivalent offerings from AWS, Azure, and GCP, and it’s 11x cheaper than GPT-4 and 4x faster than vLLM.

That’s not just good. It’s a challenge.

Who are they?

Welcome to Together AI.

The GPU Audio Companion Issue #29

Want the GPU breakdown without the reading? The Audio Companion does it for you, but only if you’re subscribed. Fix that here.

Company Background

Together AI isn’t your typical cloud platform.

It’s “The AI Acceleration Cloud”, built by a team of Stanford AI researchers and the group that scaled Apple’s Siri from the ground up.

The result? Full optimisation of every layer of the modern AI stack: hardware, software, and services.

They’re not just building tools either. They’re building the underlying infrastructure that will power the next wave of open-source innovation.

Their mission?

Make AI open, transparent, and community-driven.

Models. Data. Control. All in the hands of the user.

This is a serious deviation from the usual story of vendor lock-in, FinOps, and opaque end-of-month bills.

One that makes sense when you consider Together AI’s core principles:

  • Open Source First: No lock-in, full transparency, and community collaboration.

  • Developer Empowerment: Your models are yours. Train, fine-tune, and deploy without worrying about vendor lock-in.

  • Cost Efficiency: Together’s inference is 11x cheaper than GPT-4 while delivering the same level of performance.

  • Performance Matters: Optimised inference with FlashAttention-3, custom-built optimised kernels, and advanced speculative decoding.

Executive Team

Together AI is led by a team of AI visionaries and engineering experts.

The Edge

Together AI’s edge comes from one simple fact: they do it all themselves.

Recent Moves

  • Together GPU Clusters accelerated by NVIDIA Blackwell platform: Unveiled the deployment of NVIDIA Blackwell GPUs and launched Instant GPU Clusters, which deliver up to 64 NVIDIA GPUs per deployment, entirely self-service.

  • NVIDIA Cloud Partner Status: Together AI joined the NVIDIA Cloud Partner Network, unlocking early access to Blackwell, and enabling the deployment of a 36,000 GPU cluster featuring the GB200 NVL72s, backed by 200MW+ of data centre capacity.

  • $305 Million Series B Funding: Together AI raised $305 million in a Series B round led by General Catalyst and Prosperity7, with participation from Salesforce Ventures, NVIDIA, DAMAC Capital, Kleiner Perkins, and more.

  • DeepSeek-R1 and Reasoning Clusters: Ranked amongst the fastest serverless API providers for the DeepSeek-R1 model, as measured by Artificial Analysis, and recently launched Reasoning Clusters, dedicated infrastructure for reasoning model inference at scale.

Read more about Together AI’s recent moves on their blog.

What’s Next?

Together AI is a statement against closed, proprietary ecosystems.

And with plans to massively scale NVIDIA GB200 NVL72 and HGX B200 capacity and ship upcoming reasoning models, including the expected Llama 4, atop the NVIDIA Blackwell platform, they’re doubling down on their commitment to accelerating AI with open-source innovation.

Why?

Together AI wants to provide every business and developer with a fully optimised platform for building and running their own AI with complete, end-to-end control.

All with no strings attached, no lock-in.

Just pure, unadulterated AI training and inference power, at a fraction of the cost of the hyperscalers.

Because why settle for walled gardens when you can own the whole field?

Keep The GPU Sharp and Independent

Good analysis isn’t free. And bad analysis? That’s usually paid for.

I want The GPU to stay sharp, independent, and free from corporate fluff. That means digging deeper, asking harder questions, and breaking down the world of GPU compute without a filter.

If you’ve found value in The GPU, consider upgrading to a paid subscription or supporting it below:

Buy Me A Coffee: https://buymeacoffee.com/bbaldieri

It helps keep this newsletter unfiltered and worth reading.
