Nvidia: How CUDA and deep learning built a trillion-dollar platform
Executive overview
Nvidia entered 2006 as a dominant gaming GPU company with no obvious path to the next growth wave. Jensen Huang made a bet-the-company decision to build CUDA — a full parallel computing platform for general-purpose GPU workloads — targeting a scientific computing market too small to justify the investment by any rational measure.
The payoff was not a business plan; it was AlexNet. In 2012 a deep learning team at the University of Toronto ran their neural network on Nvidia GPUs using CUDA and won the ImageNet image-recognition competition by a wide margin. Deep learning runs on parallel matrix math, the exact problem Nvidia's architecture had been optimised for all along.
Owning the full software stack on proprietary hardware, given away free, created a switching-cost moat that compounds with every new CUDA developer.
From gaming to general-purpose computing
- Six-month chip ship cycles left every competitor, including Intel, far behind
- Nvidia wrote its own GPU drivers — unusual for a chip company — building deep low-level software talent early
- Programmable shaders (GeForce 3) created the first direct Nvidia–developer relationship; all developers were game developers at this stage
- CUDA development began in 2006: a full compiler, SDK, libraries, and developer evangelism stack — none of it charged to users
- CUDA is closed-source and runs only on Nvidia hardware; rival GPUs, which rely on open standards such as OpenCL, cannot run CUDA workloads
- 1,100+ Nvidia employees carry "CUDA" in their job titles today; three million registered CUDA developers
AlexNet and the deep learning unlock
- AlexNet (Krizhevsky, Sutskever, Hinton; University of Toronto) won the 2012 ImageNet competition with a top-5 error rate of about 15% vs ~26% for prior state of the art
- It implemented a convolutional deep neural network on Nvidia GPUs using CUDA — the first major real-world CUDA use case outside scientific computing
- Deep learning algorithms had existed for decades but were computationally impossible on CPU serial architectures; Nvidia's massively parallel cores made them practical
- Neural network layers are close to "embarrassingly parallel": within a layer, each output value can be computed independently of the others, a good fit for a GPU with 10,000+ cores
- cuDNN (CUDA Deep Neural Networks library) followed, collapsing the barrier for data scientists to write high-performance neural nets without hardware expertise
- Key researchers (Fei-Fei Li, Geoff Hinton, Bryan Catanzaro, Andrew Ng) were immediately recruited by Google, Facebook, and Baidu, validating the commercial importance of the platform
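The "parallel matrix math" point can be made concrete with a small sketch. It is plain Python, not CUDA, and the function names are hypothetical; it only illustrates that each output of a single dense layer depends on one weight row and the shared input, so the outputs can be computed in any order (or, on a GPU, at the same time). Successive layers still depend on one another, so the independence shown here holds within a layer only.

```python
import random

def output_element(weights, inputs, row):
    # One output value: the dot product of a single weight row with the input.
    # It never reads any other output element.
    return sum(w * x for w, x in zip(weights[row], inputs))

def dense_layer(weights, inputs):
    """Compute y = W x one output element at a time, in row order."""
    return [output_element(weights, inputs, row) for row in range(len(weights))]

def dense_layer_shuffled(weights, inputs, seed=0):
    """Same computation, but the rows are evaluated in a shuffled order."""
    order = list(range(len(weights)))
    random.Random(seed).shuffle(order)
    result = [0.0] * len(weights)
    for row in order:
        result[row] = output_element(weights, inputs, row)
    return result

# Illustrative values only, not taken from AlexNet.
weights = [
    [1.0, 2.0, 3.0],
    [0.5, -1.0, 0.0],
    [2.0, 0.0, -2.0],
    [1.0, 1.0, 1.0],
]
inputs = [1.0, 2.0, 3.0]

in_order = dense_layer(weights, inputs)
shuffled = dense_layer_shuffled(weights, inputs)
print(in_order)              # [14.0, -1.5, -4.0, 6.0]
print(shuffled == in_order)  # True
```

Because evaluation order does not change the result, a GPU can hand each output element to a different thread, which is roughly what a CUDA matrix-multiply kernel does at far larger scale.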
The business model: free software, proprietary hardware
- CUDA has never been charged for; enterprises pay for Nvidia hardware at very high gross margins
- Consumer gaming cards: ~$500–3,000; enterprise data centre cards (A100/H100): $20,000–30,000+
- Nvidia changed its terms of service in 2018 to prevent consumer cards from being racked in data centres — explicit market segmentation
- Data centre revenue grew from ~$3B (2020) to over $10.5B (2022), matching the gaming segment for the first time
- Gross margin expanded from ~30% in 1999 to 66% by 2022 — software-level economics on hardware
- Operating margin of 37% — better than Apple's, comparable to the best software businesses
- Capex runs at ~$1B/year vs Apple ($10B), Microsoft/Google ($25B) and TSMC ($30B); the fabless model keeps capital intensity near zero
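The growth and capital-intensity figures above can be checked with a few lines of arithmetic. This is a minimal sketch using only the rounded numbers quoted in the list; the helper names are hypothetical, and treating 2020 to 2022 as exactly two years of compounding is an assumption.

```python
def growth_multiple(start, end):
    """How many times larger the end value is than the start value."""
    return end / start

def cagr(start, end, years):
    """Compound annual growth rate over the given number of years."""
    return (end / start) ** (1 / years) - 1

# Data centre revenue in $B, as quoted above (2020 and 2022).
dc_start, dc_end = 3.0, 10.5
multiple = growth_multiple(dc_start, dc_end)
annual = cagr(dc_start, dc_end, years=2)
print(f"{multiple:.1f}x overall, about {annual:.0%} per year")
# 3.5x overall, about 87% per year

# Annual capex in $B, as quoted above, expressed as multiples of Nvidia's.
capex = {"Nvidia": 1, "Apple": 10, "Microsoft/Google": 25, "TSMC": 30}
capex_vs_nvidia = {name: value / capex["Nvidia"] for name, value in capex.items()}
print(capex_vs_nvidia["TSMC"])  # 30.0
```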
Four near-death moments and Jensen's refusals to sell
- 2001–2004: Tech bubble crash; stock gutted. Jensen does not sell.
- 2008: AMD, having acquired ATI in 2006, is now a legitimate rival; Nvidia misses earnings; stock drops 80%. Jensen does not sell; doubles down on CUDA.
- 2011: Another earnings miss; stock falls 50%. Continued CUDA investment with no visible revenue.
- 2018: Crypto mining boom inflates GPU demand; crypto winter hits; revenue declines; stock drops 50%. Jensen stays the course.
- Nvidia only regained its 2007 market cap peak of ~$20B in 2016, nearly a decade after that peak and four years after AlexNet proved the thesis
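The size of these drawdowns helps explain the long recovery: a percentage fall needs a larger percentage rise to undo it. A short sketch of that arithmetic follows, using the drop sizes quoted above; the function name is hypothetical.

```python
def required_recovery_gain(drawdown):
    """Fractional gain needed to return to the prior peak after a fractional drop."""
    if not 0 <= drawdown < 1:
        raise ValueError("drawdown must be in [0, 1)")
    # After the drop the price is (1 - drawdown) of the peak, so it must be
    # multiplied by 1 / (1 - drawdown) to get back.
    return 1 / (1 - drawdown) - 1

# Drop sizes as quoted in the list above.
drops = {"2008": 0.80, "2011": 0.50, "2018": 0.50}
recoveries = {year: required_recovery_gain(d) for year, d in drops.items()}

for year, gain in recoveries.items():
    print(f"{year}: a {drops[year]:.0%} drop needs a {gain:.0%} gain to recover")
# 2008: a 80% drop needs a 400% gain to recover
# 2011: a 50% drop needs a 100% gain to recover
# 2018: a 50% drop needs a 100% gain to recover
```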
Misadventures: Tegra and mobile
- 2008: Nvidia launched Tegra, an ARM-based system-on-a-chip targeting smartphones — directly competing with Qualcomm
- First Tegra product shipped in the Microsoft Zune HD
- Nvidia acquired mobile baseband company Icera (2011) then shut it down; the founders went on to found Graphcore
- Tegra found a home in the Nintendo Switch and the original Tesla Model S infotainment screen
- Nvidia never achieved profitable scale in the Android value chain; Qualcomm, its main rival there, had bought the mobile GPU IP that AMD acquired with ATI and rebranded it Adreno
Data centre, Mellanox, and the DPU thesis
- 2020: Acquired Mellanox (Israeli data centre networking company) for ~$7B — high-bandwidth, low-latency intra-data-centre switching
- Mellanox enabled a third compute tier: CPU (general purpose) → GPU (accelerated computing) → DPU / data processing unit (data movement and transformation within data centres)
- Nvidia now frames the entire data centre as the unit of compute, not the individual card
- Announced Grace (ARM-based data centre CPU) to pair with the Hopper GPU architecture
- Attempted acquisition of Arm Holdings collapsed under regulatory pressure; the strategic logic (extending CUDA to Arm-designed silicon) was later partially realised through Grace
Gaming: still growing, now smarter
- Ray tracing in real time at 60fps (RTX cards) — physics-accurate lighting rendered per frame
- DLSS (Deep Learning Super Sampling): renders at lower resolution, uses a trained neural net to upscale to 4K/8K at output — high frame rates and high resolution without the brute-force cost
- DLSS integrates game development directly with Nvidia's AI stack; game developers who don't support it are at a disadvantage
- Add-in board partners (ASUS, MSI, Zotac etc.) still manufacture and brand most consumer cards; Nvidia's own Founders Edition cards, built to its reference design, are a minority
- Crypto mining performance artificially throttled on consumer cards via firmware; dedicated CMP (Crypto Mining Processor) cards created to capture that segment separately
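The saving DLSS aims for is easiest to see as a pixel count. The sketch below assumes, purely for illustration, that the game renders internally at 1080p; the source does not state which internal resolutions DLSS uses, and the cost of running the upscaling network itself is ignored. All names are hypothetical.

```python
# Standard display resolutions as (width, height) in pixels.
RESOLUTIONS = {
    "1080p": (1920, 1080),
    "4K": (3840, 2160),
    "8K": (7680, 4320),
}

def pixel_count(name):
    """Total pixels in one frame at the named resolution."""
    width, height = RESOLUTIONS[name]
    return width * height

def shading_reduction(render, output):
    """How many times fewer pixels are rendered than are finally displayed."""
    return pixel_count(output) / pixel_count(render)

# Assumed internal render resolution of 1080p (illustrative, not from the source).
to_4k = shading_reduction("1080p", "4K")
to_8k = shading_reduction("1080p", "8K")
print(pixel_count("4K"))  # 8294400
print(to_4k, to_8k)       # 4.0 16.0
```

Under that assumption the GPU shades a quarter of the pixels for a 4K frame and a sixteenth for an 8K frame, leaving the neural network to fill in the rest.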
Competitive moat and bear cases
- Scale economies: 1,100+ CUDA engineers amortised over 3M developers and the hardware they buy; no competitor can replicate this without a comparable market
- Switching costs: codebases written in CUDA cannot run on AMD or any other hardware — rewriting is a years-long project for large teams
- Bear case — custom silicon: Google TPUs, Tesla's in-house inference chips, Apple's M-series GPUs all chip away at specific workloads; none have displaced Nvidia in the data centre training market
- Bear case — specialised startups: Cerebras (wafer-scale chip, $2M per unit, 60x power draw) and Graphcore target AI training specifically; neither has achieved scale to threaten Nvidia's enterprise position
- Counter: recreating 15 years of CUDA libraries, developer tooling, and enterprise relationships is an enormous undertaking even for a trillion-dollar company
Omniverse and the physical world bet
- Omniverse is Nvidia's enterprise simulation platform — a "digital twin" layer where physical assets (warehouses, robots, vehicles, climate systems) can be modelled before real-world deployment
- Not a consumer metaverse; designed to run autonomously, primarily without human interaction
- Amazon warehouse robots, climate modelling, autonomous vehicle training are showcased use cases
- The bull case for the stock at 2022 valuations requires believing autonomous vehicles, industrial robotics, and the omniverse represent real near-term markets — not just optionality