Nvidia and the AI era: how GPU dominance became inevitable

Executive overview

By late 2022, large language models running on transformers burst into mainstream use, turning Nvidia's decade-long bet on GPU-accelerated data centres into the most profitable position in modern tech history. The company had quietly assembled every layer of the stack — chips, networking, software — while competitors watched.

The preparation was already done: CUDA, Mellanox, and the Hopper architecture were built years before the demand arrived.

The research chain that produced the AI moment

AlexNet (2012) ran convolutional neural networks on two consumer GeForce GPUs using CUDA, proving parallel compute could unlock AI
The Toronto team (Hinton, Krizhevsky, Ilya Sutskever) was scooped up by Google; Sutskever later co-founded OpenAI
Google's 2017 transformer paper ("Attention Is All You Need") made sequence models trainable in parallel, at scale
Transformer self-attention costs O(n²) compute in sequence length n, but the pairwise comparisons are independent, so GPUs can run them in parallel; the bottleneck became memory, not speed
GPT parameter counts scaled from roughly 117M (GPT-1) to 175B (GPT-3) to a reported but unconfirmed ~1.7T (GPT-4); model quality improved discontinuously with scale
OpenAI created a capped-profit arm in 2019 and took $1B from Microsoft to afford the compute required
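The quadratic cost of attention noted above can be made concrete with a minimal sketch. It uses plain Python rather than a GPU library, and the function names and the three-token input are illustrative assumptions, not anything from the original paper's code. The point is only that every token is scored against every other token, and that each score is independent of the rest, which is what lets a GPU compute them all at once.

```python
import math

def attention_scores(queries, keys):
    """Scaled dot-product scores: one entry per (query, key) pair, so n x n."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    return [
        [scale * sum(q_i * k_i for q_i, k_i in zip(q, k)) for k in keys]
        for q in queries
    ]

def softmax(row):
    """Normalise one row of scores into attention weights that sum to 1."""
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def pairwise_comparisons(n):
    """Number of query-key comparisons for a sequence of n tokens."""
    return n * n

# Hypothetical 3-token sequence with 2-dimensional embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [softmax(row) for row in attention_scores(tokens, tokens)]
```

Doubling the sequence length quadruples the work: pairwise_comparisons(2048) is four times pairwise_comparisons(1024). The score matrix also has to be held in memory, which is one reason memory rather than raw speed becomes the constraint.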

Why the data centre is the computer

Von Neumann CPUs execute one instruction at a time; GPUs run tens of thousands in parallel — a "giant Archimedes lever" on Moore's Law
Training large models requires hundreds of gigabytes of GPU memory, more than any single chip carries, forcing multiple GPUs to be networked as one logical computer
The H100 (SXM5) has 16,896 CUDA cores, 528 tensor cores, and 132 streaming multiprocessors; Nvidia claims up to 9× faster AI training than the A100
CoWoS (chip-on-wafer-on-substrate) 2.5D packaging stacks high-bandwidth memory close to the logic die — currently 10–15% of TSMC's total capacity
TSMC capacity for CoWoS is the binding constraint on H100 supply, not Nvidia's willingness to sell
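The memory argument above reduces to simple arithmetic. The sketch below is a back-of-envelope estimate under stated assumptions: 2 bytes per parameter for 16-bit weights, 80 GB of memory per H100 (a published spec, not a figure from this summary), and a commonly cited rule of thumb of about 16 bytes per parameter once gradients and optimizer state are included. It ignores activations and communication overhead, so real requirements are higher.

```python
import math

GPT3_PARAMS = 175_000_000_000   # 175B, from the text above
H100_MEMORY_GB = 80             # assumption: per-GPU memory, not stated in the text
BYTES_FP16_WEIGHTS = 2          # assumption: 16-bit weights only
BYTES_TRAINING_STATE = 16       # assumption: rule of thumb incl. gradients and optimizer

def footprint_gb(num_params, bytes_per_param):
    """Memory needed to hold the model, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9

def min_gpus(num_params, bytes_per_param, gpu_memory_gb=H100_MEMORY_GB):
    """Smallest number of GPUs whose combined memory fits the footprint."""
    return math.ceil(footprint_gb(num_params, bytes_per_param) / gpu_memory_gb)

weights_only_gb = footprint_gb(GPT3_PARAMS, BYTES_FP16_WEIGHTS)
gpus_for_weights = min_gpus(GPT3_PARAMS, BYTES_FP16_WEIGHTS)
gpus_for_training = min_gpus(GPT3_PARAMS, BYTES_TRAINING_STATE)
```

Under these assumptions the weights of a 175B-parameter model alone take 350 GB, which already needs at least five 80 GB GPUs, and the training-state estimate needs at least 35. That is the sense in which the cluster, not the chip, is the unit of compute.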

Nvidia's three-part data centre platform

Mellanox / InfiniBand (acquired 2020, $7B): the only high-bandwidth rack-to-rack networking stack that can treat a full data centre as one computer
Grace CPU (announced April 2021): an Arm-based CPU designed from scratch to orchestrate massive GPU clusters, not for laptops
Hopper GPU architecture (H100): split from the gaming Lovelace line, enabling Nvidia to monopolise CoWoS capacity at TSMC for AI chips
Together these three form the DGX system, a fully integrated AI supercomputer; a single DGX H100 starts at $500K, while the DGX GH200 SuperPod (256 Grace Hopper superchips) runs to hundreds of millions
DGX Cloud launched through Azure, Oracle and Google — a virtualised DGX rented via Nvidia's own interface at $37K/month for A100 access

CUDA: the software moat

Released 2006; today a compiler, runtime, debugger, profiler, native language (CUDA C++), and industry-specific libraries
Backwards-compatible across every Nvidia GPU shipped since 2006 — 500M CUDA-capable GPUs in the wild
Developer count: 100K (2010) → 1M (2016) → 2M (2018) → 4M (2023); roughly 10,000 person-years of cumulative investment
~1,600 Nvidia employees have "CUDA" in their LinkedIn title; competitors' open-source equivalents (ROCm, OpenCL) are years behind
The Apple-vs-Android analogy: Nvidia controls the tightly coupled hardware-software stack; PyTorch is the open ecosystem that rivals are trying to route through
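The developer counts above imply a compound growth rate, which is easier to compare across periods than the raw milestones. The sketch below uses only the figures quoted in this section; the helper name cagr is an illustrative choice.

```python
# CUDA developer counts by year, as quoted above.
developers = {2010: 100_000, 2016: 1_000_000, 2018: 2_000_000, 2023: 4_000_000}

def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.10 means 10% a year)."""
    return (end_value / start_value) ** (1 / years) - 1

full_period = cagr(developers[2010], developers[2023], 2023 - 2010)
early_period = cagr(developers[2010], developers[2016], 2016 - 2010)
recent_period = cagr(developers[2018], developers[2023], 2023 - 2018)
```

On these numbers the developer base compounded at roughly a third per year over 2010 to 2023. Growth was fastest in the early years, at close to 47% a year from 2010 to 2016, against about 15% a year from 2018 to 2023 on a much larger base.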

Financial results and competitive position

Q1 FY24 (reported May 2023): revenue $7.2B, up 19% QoQ
Q2 FY24 guidance: $11B, up 53% QoQ and 65% YoY; stock rose 25% in after-hours trading
Q2 FY24 actual: total revenue $13.5B (+88% QoQ, +100% YoY); data centre alone $10.3B (+141% QoQ, +171% YoY)
Gross margin: ~70%, forecast 72% — vs 24% pre-CUDA era
~50% of data centre revenue comes from cloud service providers (AWS, Azure, Google, Meta); CSPs buy bare chips and integrate themselves
China revenue was 25% of total before export controls; Nvidia created the A800/H800 (capped NVLink bandwidth) to comply — still selling at volume
Jensen's revised TAM framing: $1 trillion in installed data centre hardware, $250B annual refresh spend — a grounded claim vs the earlier "1% of everything" slide
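The growth percentages above can be checked against each other with a few lines of arithmetic. The sketch below uses only the rounded figures quoted in this section (in $B), so results match the stated percentages only to within rounding. The helper names are illustrative.

```python
def pct_change(previous, current):
    """Percentage change from previous to current."""
    return (current - previous) / previous * 100

def implied_base(current, pct_growth):
    """Back out the earlier-period value from a current value and its growth rate."""
    return current / (1 + pct_growth / 100)

Q1_REVENUE = 7.2       # Q1 FY24 total revenue, $B
Q2_GUIDANCE = 11.0     # Q2 FY24 guidance, $B
Q2_ACTUAL = 13.5       # Q2 FY24 actual total revenue, $B
Q2_DATA_CENTRE = 10.3  # Q2 FY24 data centre revenue, $B

guidance_qoq = pct_change(Q1_REVENUE, Q2_GUIDANCE)        # about 53%
actual_qoq = pct_change(Q1_REVENUE, Q2_ACTUAL)            # about 88%
prior_quarter_dc = implied_base(Q2_DATA_CENTRE, 141)      # about $4.3B
prior_year_dc = implied_base(Q2_DATA_CENTRE, 171)         # about $3.8B

# TAM framing: $1T installed base refreshed at $250B a year.
implied_refresh_years = 1000 / 250
```

Two things fall out. The actual Q2 result beat Nvidia's own guidance by about 23%, and the TAM framing implies the installed base turns over roughly every four years.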

Bull and bear cases

Bull: accelerated computing is still a fraction of total workloads; Jensen is correct that every application will gain a generative AI layer; Nvidia moves at a six-month ship cycle competitors cannot match; data centre capex lock-in is decade-long
Bear: every large tech company (Google TPU, Amazon Trainium, Microsoft/AMD rumours) is incentivised to break the moat; PyTorch aggregates developers in a way that could eventually disintermediate hardware; a confidence crisis in AI could slow enterprise capex; inference workloads are less differentiated than training
Nvidia is not Cisco or Intel — it controls the software stack and has direct developer relationships; the closer analogy is Microsoft, or old-school IBM in its mainframe era
