Nvidia: How CUDA and deep learning built a trillion-dollar platform

Executive overview

Nvidia entered 2006 as a dominant gaming GPU company with no obvious path to the next growth wave. Jensen Huang made a bet-the-company decision to build CUDA — a full parallel computing platform for general-purpose GPU workloads — targeting a scientific computing market too small to justify the investment by any rational measure.

The payoff did not come from a business plan; it came from AlexNet. In 2012 a deep learning team at the University of Toronto ran their neural network on Nvidia GPUs using CUDA and beat the prior state of the art on the ImageNet image-recognition benchmark by a wide margin. Machine learning ran on parallel matrix math, the exact problem Nvidia's architecture had been optimised for all along.

Owning the full software stack, given away free but tied to proprietary hardware, created a switching-cost moat that compounds with every new CUDA developer.

From gaming to general-purpose computing

  • Six-month chip ship cycles left every competitor, including Intel, far behind
  • Nvidia wrote its own GPU drivers — unusual for a chip company — building deep low-level software talent early
  • Programmable shaders (GeForce 3) created the first direct Nvidia–developer relationship; all developers were game developers at this stage
  • CUDA development began in 2006: a full compiler, SDK, libraries, and developer evangelism stack — none of it charged to users
  • CUDA is closed-source and runs only on Nvidia hardware; competing hardware that relies on the open OpenCL standard cannot run CUDA workloads
  • 1,100+ Nvidia employees carry "CUDA" in their job titles today, and there are three million registered CUDA developers

AlexNet and the deep learning unlock

  • AlexNet (Krizhevsky, Sutskever and Hinton, University of Toronto) won the 2012 ImageNet competition with a top-5 error rate of about 15% vs ~26% for the prior state of the art
  • It implemented a convolutional deep neural network on Nvidia GPUs using CUDA — the first major real-world CUDA use case outside scientific computing
  • Deep learning algorithms had existed for decades but were computationally impractical on serial CPU architectures; Nvidia's massively parallel cores made them practical
  • Neural networks are close to "embarrassingly parallel": within a layer, each output value can be computed independently of the others, a natural fit for a GPU with 10,000+ cores
  • cuDNN (CUDA Deep Neural Networks library) followed, collapsing the barrier for data scientists to write high-performance neural nets without hardware expertise
  • Key researchers (Fei-Fei Li, Geoffrey Hinton, Bryan Catanzaro, Andrew Ng) were immediately recruited by Google, Facebook and Baidu, validating the commercial importance of the platform
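
The "embarrassingly parallel" point above can be made concrete with a small sketch. It is not how CUDA or cuDNN is implemented; it only assumes that the core of a neural-network layer is a matrix multiply, and shows that each output cell depends on one row and one column and nothing else, so the cells can be computed in any order or all at once. The function names (`output_cell`, `matmul_serial`, `matmul_parallel`) are hypothetical, and the thread pool stands in for GPU cores purely to show independence, not speed.

```python
from concurrent.futures import ThreadPoolExecutor


def output_cell(a, b, row, col):
    """Dot product of one row of `a` with one column of `b`.

    Reads only its own row and column, so it never waits on another cell.
    """
    return sum(a[row][k] * b[k][col] for k in range(len(b)))


def matmul_serial(a, b):
    """Compute every output cell one after another, CPU style."""
    rows, cols = len(a), len(b[0])
    return [[output_cell(a, b, r, c) for c in range(cols)] for r in range(rows)]


def matmul_parallel(a, b, workers=4):
    """Hand each output cell to a worker; a GPU does this across thousands of cores."""
    rows, cols = len(a), len(b[0])
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so the flat list is row-major.
        flat = list(pool.map(lambda rc: output_cell(a, b, rc[0], rc[1]), cells))
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


a = [[1, 2], [3, 4], [5, 6]]        # 3 x 2
b = [[7, 8, 9], [10, 11, 12]]       # 2 x 3
serial = matmul_serial(a, b)
parallel = matmul_parallel(a, b)
print(serial == parallel)
```

Both routes produce the same 3 x 3 result because no cell depends on another. Successive layers of a network still run one after the other; the parallelism is inside each layer.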

The business model: free software, proprietary hardware

  • CUDA has never been charged for; enterprises pay for Nvidia hardware at very high gross margins
  • Consumer gaming cards: ~$500–3,000; enterprise data centre cards (A100/H100): $20,000–30,000+
  • Nvidia changed its terms of service in 2018 to prevent consumer cards from being racked in data centres — explicit market segmentation
  • Data centre revenue grew from ~$3B (2020) to over $10.5B (2022), matching the gaming segment for the first time
  • Gross margin expanded from ~30% in 1999 to 66% by 2022 — software-level economics on hardware
  • Operating margin of 37% — better than Apple's, comparable to the best software businesses
  • Capex runs at ~$1B/year vs Apple ($10B), Microsoft/Google ($25B) and TSMC ($30B); the fabless model keeps capital intensity near zero
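
As a rough check on the data centre growth figure above, the approximate revenue numbers from the text can be turned into a growth multiple and an implied annual rate. This is a back-of-envelope sketch: the inputs are the rounded figures quoted here, and the helper names (`growth_multiple`, `cagr`) are hypothetical.

```python
def growth_multiple(start, end):
    """How many times larger `end` is than `start`."""
    return end / start


def cagr(start, end, years):
    """Compound annual growth rate implied by growing from `start` to `end`."""
    return (end / start) ** (1 / years) - 1


dc_2020 = 3.0    # data centre revenue in $B, approximate, as quoted in the text
dc_2022 = 10.5   # data centre revenue in $B, approximate, as quoted in the text

multiple = growth_multiple(dc_2020, dc_2022)
rate = cagr(dc_2020, dc_2022, years=2)
print(f"{multiple:.1f}x over two years, about {rate:.0%} per year")
# prints "3.5x over two years, about 87% per year"
```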

Four near-death moments and Jensen's refusals to sell

  • 2001–2004: Tech bubble crash; stock gutted. Jensen does not sell.
  • 2008: AMD, having acquired ATI in 2006, is now a legitimate rival; Nvidia whiffs on earnings; stock drops 80%. Jensen does not sell; doubles down on CUDA.
  • 2011: Another earnings miss; stock falls 50%. Continued CUDA investment with no visible revenue.
  • 2018: Crypto mining boom inflates GPU demand; crypto winter hits; revenue declines; stock drops 50%. Jensen stays the course.
  • Nvidia only broke back through its 2007 market cap peak of ~$20B in 2016: nearly a decade after the peak, and four years after AlexNet proved the thesis

Misadventures: Tegra and mobile

  • 2008: Nvidia launched Tegra, an ARM-based system-on-a-chip targeting smartphones — directly competing with Qualcomm
  • First Tegra product shipped in the Microsoft Zune HD
  • Nvidia acquired mobile baseband company Icera (2011) then shut it down; the founders went on to found Graphcore
  • Tegra found a home in the Nintendo Switch and the original Tesla Model S infotainment screen
  • Nvidia never achieved profitable scale in the Android value chain; the mobile GPU IP AMD acquired from ATI was eventually sold to Qualcomm (rebranded Adreno)

Data centre, Mellanox, and the DPU thesis

  • 2020: Acquired Mellanox (Israeli data centre networking company) for ~$7B — high-bandwidth, low-latency intra-data-centre switching
  • Mellanox enabled a third compute tier: CPU (general purpose) → GPU (accelerated computing) → DPU / data processing unit (data movement and transformation within data centres)
  • Nvidia now frames the entire data centre as the unit of compute, not the individual card
  • Announced Grace (ARM-based data centre CPU) to pair with the Hopper GPU architecture
  • Attempted acquisition of Arm Holdings collapsed under regulatory pressure; the strategic logic (extending CUDA to Arm-designed silicon) was later partially realised through Grace

Gaming: still growing, now smarter

  • Ray tracing in real time at 60fps (RTX cards): physically accurate lighting rendered per frame
  • DLSS (Deep Learning Super Sampling): renders at lower resolution, uses a trained neural net to upscale to 4K/8K at output — high frame rates and high resolution without the brute-force cost
  • DLSS integrates game development directly with Nvidia's AI stack; game developers who don't support it are at a disadvantage
  • Add-in board partners (ASUS, MSI, Zotac etc.) still manufacture and brand most consumer cards; Nvidia's own Founders Edition reference-design cards are a minority
  • Crypto mining performance deliberately throttled on consumer cards via firmware; dedicated CMP (Crypto Mining Processor) cards created to capture that segment separately
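
The saving behind DLSS-style upscaling comes down to pixel arithmetic, sketched below. The internal render resolutions are assumptions chosen for illustration (the text only says "lower resolution"), the helper names are hypothetical, and the sketch ignores the cost of running the upscaling network itself.

```python
# Standard 16:9 resolutions as (width, height) in pixels.
RESOLUTIONS = {
    "1080p": (1920, 1080),
    "1440p": (2560, 1440),
    "4K": (3840, 2160),
    "8K": (7680, 4320),
}


def pixels(name):
    """Total pixels per frame at the named resolution."""
    width, height = RESOLUTIONS[name]
    return width * height


def shading_saving(render, output):
    """How many times fewer pixels are rendered than are finally displayed."""
    return pixels(output) / pixels(render)


# Assumed (render resolution, output resolution) pairs, for illustration only.
for render, output in [("1080p", "4K"), ("1440p", "4K"), ("1440p", "8K")]:
    saving = shading_saving(render, output)
    print(f"{render} -> {output}: {saving:.2f}x fewer pixels rendered")
```

Rendering at 1080p and upscaling to 4K, for example, means the expensive ray-traced shading runs on a quarter of the pixels that reach the screen.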

Competitive moat and bear cases

  • Scale economies: 1,100+ CUDA engineers amortised over 3M developers and the hardware they buy; no competitor can replicate this without a comparable market
  • Switching costs: codebases written in CUDA cannot run on AMD or any other hardware — rewriting is a years-long project for large teams
  • Bear case (custom silicon): Google TPUs, Tesla's in-house inference chips and Apple's M-series GPUs all chip away at specific workloads; none has displaced Nvidia in the data centre training market
  • Bear case — specialised startups: Cerebras (wafer-scale chip, $2M per unit, 60x power draw) and Graphcore target AI training specifically; neither has achieved scale to threaten Nvidia's enterprise position
  • Counter: recreating 15 years of CUDA libraries, developer tooling, and enterprise relationships is an enormous undertaking even for a trillion-dollar company

Omniverse and the physical world bet

  • Omniverse is Nvidia's enterprise simulation platform — a "digital twin" layer where physical assets (warehouses, robots, vehicles, climate systems) can be modelled before real-world deployment
  • Not a consumer metaverse; designed to run autonomously, primarily without human interaction
  • Amazon warehouse robots, climate modelling, autonomous vehicle training are showcased use cases
  • The bull case for the stock at 2022 valuations requires believing that autonomous vehicles, industrial robotics and Omniverse represent real near-term markets, not just optionality
