
Issue #29: The Open-Source AI Acceleration Cloud with Together AI

Open, transparent AI that you can take with you wherever you go.

The story of cloud is typically a story of vendor lock-in.

Enticing credit programs. “Transparent” pricing. Complex deployments.

Once you’re in, good luck getting out.

This company doesn’t play that game.

They’re going all-in on open source. No walled gardens, no sneaky pricing. Just open, transparent AI that you can take with you wherever you go. And they’re building the ultimate developer-friendly, purpose-built AI cloud.

One that’s research-driven, massively scalable, and remarkably cost-efficient.

And it’s not just marketing fluff.

Their inference engine outperforms the equivalent offerings from AWS, Azure, and GCP, and it’s 11x cheaper than GPT-4 and 4x faster than vLLM.

That’s not just good. It’s a challenge.

Who are they?

Welcome to Together AI.

The GPU Audio Companion Issue #29

Want the GPU breakdown without the reading? The Audio Companion does it for you, but only if you’re subscribed. Fix that here.

Company Background

Together AI isn’t your typical cloud platform.

It’s “The AI Acceleration Cloud”, built by a team of Stanford AI researchers and the group that scaled Apple’s Siri from the ground up.

The result? Full optimisation of every layer of the modern AI stack: hardware, software, and services.

They’re not just building tools either. They’re building the underlying infrastructure that will power the next wave of open-source innovation.

Their mission?

Make AI open, transparent, and community-driven.

Models. Data. Control. All in the hands of the user.

This is a serious deviation from the usual story of vendor lock-in, FinOps, and opaque end-of-month bills.

One that makes sense when you consider Together AI’s core principles:

  • Open Source First: No lock-in, full transparency, and community collaboration.

  • Developer Empowerment: Your models are yours. Train, fine-tune, and deploy without worrying about vendor lock-in.

  • Cost Efficiency: Together’s inference is 11x cheaper than GPT-4 while delivering the same level of performance.

  • Performance Matters: Optimised inference with FlashAttention-3, custom-built optimised kernels, and advanced speculative decoding.

Executive Team

Together AI is led by a team of AI visionaries and engineering experts.

The Edge

Together AI’s edge comes from one simple fact: they do it all themselves.

Recent Moves

  • Together GPU Clusters accelerated by NVIDIA Blackwell platform: Unveiled the deployment of NVIDIA Blackwell GPUs and launched Instant GPU Clusters, which deliver up to 64 NVIDIA GPUs per deployment, entirely self-service.

  • NVIDIA Cloud Partner Status: Together AI joined the NVIDIA Cloud Partner Network, unlocking early access to Blackwell, and enabling the deployment of a 36,000 GPU cluster featuring the GB200 NVL72s, backed by 200MW+ of data centre capacity.

  • $305 Million Series B Funding: Together AI raised $305 million in a Series B round led by General Catalyst and Prosperity7, with participation from Salesforce Ventures, NVIDIA, DAMAC Capital, Kleiner Perkins, and more.

  • DeepSeek-R1 and Reasoning Clusters: Ranked amongst the fastest serverless API providers for the DeepSeek-R1 model, as measured by Artificial Analysis, and recently launched Reasoning Clusters, dedicated infrastructure for reasoning model inference at scale.

Read more about Together AI’s recent moves on their blog.

What’s Next?

Together AI is a statement against closed, proprietary ecosystems.

And with plans to massively scale NVIDIA GB200 NVL72 and HGX B200 capacity and ship upcoming reasoning models, including the expected Llama 4, atop the NVIDIA Blackwell platform, they’re doubling down on their commitment to accelerating AI with open-source innovation.

Why?

Together AI wants to provide every business and developer with a fully optimised platform for building and running their own AI with complete, end-to-end control.

All with no strings attached, no lock-in.

Just pure, unadulterated AI training and inference power, at a fraction of the cost of the hyperscalers.

Because why settle for walled gardens when you can own the whole field?

Keep The GPU Sharp and Independent

Good analysis isn’t free. And bad analysis? That’s usually paid for.

I want The GPU to stay sharp, independent, and free from corporate fluff. That means digging deeper, asking harder questions, and breaking down the world of GPU compute without a filter.

If you’ve found value in The GPU, consider upgrading to a paid subscription or supporting it below:

Buy Me A Coffee: https://buymeacoffee.com/bbaldieri

It helps keep this newsletter unfiltered and worth reading.
