Nvidia: How CUDA and deep learning built a trillion-dollar platform

Executive overview

Nvidia entered 2006 as a dominant gaming GPU company with no obvious path to the next growth wave. Jensen Huang made a bet-the-company decision to build CUDA — a full parallel computing platform for general-purpose GPU workloads — targeting a scientific computing market too small to justify the investment by any rational measure.

The payoff was not a business plan; it was AlexNet. In 2012 a deep learning team at the University of Toronto trained their neural network on Nvidia GPUs using CUDA and won the ImageNet image-recognition competition by a wide margin. Deep learning ran on parallel matrix math, exactly the kind of problem Nvidia's architecture had been optimised for all along.

Giving away the full software stack for free, while tying it to proprietary hardware, created a switching-cost moat that compounds with every new CUDA developer.

From gaming to general-purpose computing

Six-month chip ship cycles left every competitor, including Intel, far behind
Nvidia wrote its own GPU drivers — unusual for a chip company — building deep low-level software talent early
Programmable shaders (GeForce 3) created the first direct Nvidia–developer relationship; all developers were game developers at this stage
CUDA development began in 2006: a full compiler, SDK, libraries, and developer evangelism stack — none of it charged to users
CUDA is closed-source and runs only on Nvidia hardware; competitors that rely on the open OpenCL standard cannot run CUDA workloads
1,100+ Nvidia employees carry "CUDA" in their job titles today; three million registered CUDA developers

AlexNet and the deep learning unlock

AlexNet (Krizhevsky, Sutskever, Hinton; University of Toronto) won the 2012 ImageNet competition with a top-5 error rate of about 15%, versus ~26% for the next-best approach
It implemented a convolutional deep neural network on Nvidia GPUs using CUDA — the first major real-world CUDA use case outside scientific computing
Deep learning algorithms had existed for decades but were impractically slow on serial CPU architectures; Nvidia's massively parallel cores made them practical
Neural network computation is close to "embarrassingly parallel": within a layer, each output value can be computed independently of the others, a natural fit for a modern GPU with 10,000+ cores
cuDNN (CUDA Deep Neural Networks library) followed, collapsing the barrier for data scientists to write high-performance neural nets without hardware expertise
Key researchers (Fei-Fei Li, Geoff Hinton, Bryan Catanzaro, Andrew Ng) were soon recruited by Google, Facebook and Baidu, validating the commercial importance of the platform
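
To make the "independent computations" point above concrete, here is a minimal Python sketch (not CUDA, and not AlexNet's actual code). It computes every cell of a small matrix product as a separate task and checks that the order in which cells are computed does not change the answer. The function names and the use of a thread pool are illustrative assumptions; on a GPU, each cell would instead be handled by a hardware thread.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def output_cell(a, b, row, col):
    """Compute one element of the product a @ b.

    Only reads the inputs and shares no state with other cells,
    which is what makes the work easy to parallelise.
    """
    return sum(a[row][k] * b[k][col] for k in range(len(b)))


def matmul_independent(a, b, shuffle_seed=None):
    """Multiply two matrices by treating every output cell as its own task."""
    rows, cols = len(a), len(b[0])
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    if shuffle_seed is not None:
        # Scramble the task order to show that ordering is irrelevant.
        random.Random(shuffle_seed).shuffle(cells)

    result = [[0] * cols for _ in range(rows)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        values = pool.map(lambda rc: output_cell(a, b, *rc), cells)
        for (r, c), value in zip(cells, values):
            result[r][c] = value
    return result


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

in_order = matmul_independent(A, B)
shuffled = matmul_independent(A, B, shuffle_seed=7)

print(in_order)              # [[19, 22], [43, 50]]
print(in_order == shuffled)  # True
```

Because no cell depends on another, adding more workers shortens the wall-clock time without changing the logic. The dependency that remains in a real network is between layers, not within them.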

The business model: free software, proprietary hardware

CUDA has never been charged for; enterprises pay for Nvidia hardware at very high gross margins
Consumer gaming cards: ~$500–3,000; enterprise data centre cards (A100/H100): $20,000–30,000+
Nvidia changed its terms of service in 2018 to prevent consumer cards from being racked in data centres — explicit market segmentation
Data centre revenue grew from ~$3B (2020) to over $10.5B (2022), matching the gaming segment for the first time
Gross margin expanded from ~30% in 1999 to 66% by 2022 — software-level economics on hardware
Operating margin of 37% — better than Apple's, comparable to the best software businesses
Capex runs at ~$1B/year vs Apple ($10B), Microsoft/Google ($25B) and TSMC ($30B); the fabless model keeps capital intensity near zero
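
The ratios implied by the figures above can be worked out in a few lines. This is a rough sketch using only the approximate numbers quoted in this section (in billions of US dollars); the variable names are illustrative, and treating 2020 to 2022 as exactly two years is a simplifying assumption.

```python
# Approximate figures as quoted in the text above, in billions of US dollars.
data_centre_revenue = {2020: 3.0, 2022: 10.5}
capex = {"Nvidia": 1.0, "Apple": 10.0, "Microsoft/Google": 25.0, "TSMC": 30.0}

# Assumption: the two revenue figures are exactly two years apart.
years = 2022 - 2020
growth_multiple = data_centre_revenue[2022] / data_centre_revenue[2020]
annualised_growth = growth_multiple ** (1 / years) - 1

# How many times Nvidia's annual capex each company spends.
capex_vs_nvidia = {name: spend / capex["Nvidia"] for name, spend in capex.items()}

print(f"Data centre revenue multiple: {growth_multiple:.1f}x")
print(f"Implied annualised growth: {annualised_growth:.0%}")
for name, ratio in capex_vs_nvidia.items():
    print(f"{name}: {ratio:.0f}x Nvidia's capex")
```

On these inputs, data centre revenue multiplies roughly 3.5 times in two years, while the capital-heavy peers spend 10 to 30 times what Nvidia does each year.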

Four near-death moments and Jensen's refusals to sell

2001–2004: Tech bubble crash; stock gutted. Jensen does not sell.
2008: AMD, having acquired ATI in 2006, is now a legitimate rival; Nvidia misses earnings; stock drops 80%. Jensen does not sell; doubles down on CUDA.
2011: Another earnings miss; stock falls 50%. Continued CUDA investment with no visible revenue.
2018: Crypto mining boom inflates GPU demand; crypto winter hits; revenue declines; stock drops 50%. Jensen stays the course.
Nvidia only broke back through its 2007 market cap peak of ~$20B in 2016, nearly a decade after that peak and four years after AlexNet proved the thesis
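
One reason recoveries like these take so long is simple arithmetic: the gain needed to get back to a peak is larger than the fall that preceded it. The sketch below applies that formula to the drawdowns quoted above; the function name is illustrative, and the percentages are the approximate ones from the text rather than exact market data.

```python
def gain_needed_to_recover(drawdown):
    """Fractional gain required to return to the prior peak after a fractional fall."""
    if not 0 <= drawdown < 1:
        raise ValueError("drawdown must be in [0, 1)")
    return 1 / (1 - drawdown) - 1


# Approximate drawdowns as quoted in the list above.
drawdowns = {2008: 0.80, 2011: 0.50, 2018: 0.50}
recovery = {year: gain_needed_to_recover(d) for year, d in drawdowns.items()}

for year, gain in recovery.items():
    print(f"{year}: a {drawdowns[year]:.0%} fall needs a {gain:.0%} gain to recover")
```

An 80% fall leaves one fifth of the value, so the stock has to rise fivefold (a 400% gain) to get back to where it started; a 50% fall requires a doubling.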

Misadventures: Tegra and mobile

2008: Nvidia launched Tegra, an ARM-based system-on-a-chip targeting smartphones — directly competing with Qualcomm
First Tegra product shipped in the Microsoft Zune HD
Nvidia acquired mobile baseband company Icera (2011) then shut it down; the founders went on to found Graphcore
Tegra found a home in the Nintendo Switch and the original Tesla Model S infotainment screen
Nvidia never achieved profitable scale in the Android value chain; the mobile GPU IP AMD acquired from ATI was eventually sold to Qualcomm (rebranded Adreno)

Data centre, Mellanox, and the DPU thesis

2020: Acquired Mellanox (Israeli data centre networking company) for ~$7B — high-bandwidth, low-latency intra-data-centre switching
Mellanox enabled a third compute tier: CPU (general purpose) → GPU (accelerated computing) → DPU / data processing unit (data movement and transformation within data centres)
Nvidia now frames the entire data centre as the unit of compute, not the individual card
Announced Grace (ARM-based data centre CPU) to pair with the Hopper GPU architecture
Attempted acquisition of Arm Holdings collapsed under regulatory pressure; the strategic logic (extending CUDA to Arm-designed silicon) was later partially realised through Grace

Gaming: still growing, now smarter

Ray tracing in real time at 60fps (RTX cards): physically accurate lighting computed for every frame
DLSS (Deep Learning Super Sampling): renders at lower resolution, uses a trained neural net to upscale to 4K/8K at output — high frame rates and high resolution without the brute-force cost
DLSS integrates game development directly with Nvidia's AI stack; game developers who don't support it are at a disadvantage
Add-in board partners (ASUS, MSI, Zotac etc.) still manufacture and brand most consumer cards; Nvidia's own Founders Edition reference designs are a minority of units
Crypto mining performance was deliberately limited on consumer cards via firmware; dedicated CMP (Cryptocurrency Mining Processor) cards were created to capture that segment separately
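
The saving behind the DLSS approach described above comes down to pixel counts. The sketch below assumes an internal render resolution of 1920x1080 upscaled to a 3840x2160 (4K) output. Both resolutions are illustrative assumptions, since the text only says "lower resolution", and the cost of running the upscaling network itself is ignored.

```python
def pixel_count(width, height):
    """Number of pixels in a frame of the given dimensions."""
    return width * height


# Assumed resolutions, chosen for illustration only.
internal = pixel_count(1920, 1080)  # frame the game actually renders
output = pixel_count(3840, 2160)    # 4K frame shown to the player

# Share of output pixels that had to be rendered the expensive way.
rendered_fraction = internal / output

print(internal)           # 2073600
print(output)             # 8294400
print(rendered_fraction)  # 0.25
```

Under these assumptions the GPU renders only a quarter of the pixels it displays, and the neural network fills in the rest.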

Competitive moat and bear cases

Scale economies: 1,100+ CUDA engineers amortised over 3M developers and the hardware they buy; no competitor can replicate this without a comparable market
Switching costs: codebases written in CUDA cannot run unmodified on AMD or any other vendor's hardware; porting is a years-long project for large teams
Bear case — custom silicon: Google TPUs, Tesla's in-house inference chips, Apple's M-series GPUs all chip away at specific workloads; none have displaced Nvidia in the data centre training market
Bear case — specialised startups: Cerebras (wafer-scale chip, $2M per unit, 60x power draw) and Graphcore target AI training specifically; neither has achieved scale to threaten Nvidia's enterprise position
Counter: recreating 15 years of CUDA libraries, developer tooling, and enterprise relationships is an enormous undertaking even for a trillion-dollar company

Omniverse and the physical world bet

Omniverse is Nvidia's enterprise simulation platform — a "digital twin" layer where physical assets (warehouses, robots, vehicles, climate systems) can be modelled before real-world deployment
Not a consumer metaverse; designed to run autonomously, primarily without human interaction
Amazon warehouse robots, climate modelling, autonomous vehicle training are showcased use cases
The bull case for the stock at 2022 valuations requires believing that autonomous vehicles, industrial robotics, and Omniverse represent real near-term markets, not just optionality
