Google's AI origins: how the transformer creator became the AI underdog

Executive overview

Google laid much of the groundwork for the modern AI era (PageRank, language models, deep learning at scale, the transformer), yet was caught flat-footed by ChatGPT in 2022. The company that employed many of the world's leading AI researchers, built the TPU, acquired DeepMind, and published "Attention Is All You Need" spent years treating AI as a sustaining innovation inside existing products rather than as a new platform.

The deepest irony in tech: Google gave OpenAI the blueprint to disrupt Google.

From PageRank to language models: Google's AI prehistory

Larry Page's father held a PhD in computer science and was an early artificial intelligence researcher; PageRank was framed as an AI problem from the start
Noam Shazeer and Georges Harik built PHIL (Probabilistic Hierarchical Inferential Learner) at Google in 2001; it created "Did You Mean" and powered the early AdSense ad-matching system
By the mid-2000s, PHIL consumed 15% of Google's entire data center infrastructure
Franz Och built a massive n-gram language model that won the DARPA machine translation challenge; Jeff Dean re-architected it, cutting translation time from 12 hours per sentence to about 100 ms, and it shipped as Google Translate
Sebastian Thrun joined from Stanford in 2007 and formalised the practice of bringing AI academics into Google as part-time contractors — a program that directly produced Google X and Google Brain
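
The n-gram idea behind Och's translation model can be illustrated with a toy bigram model: the probability of a sentence is the product of the probabilities of each word given the word before it, estimated from counts. This is a minimal sketch, assuming a tiny hypothetical corpus and add-one smoothing; Google's production model used far longer n-grams, web-scale counts, and its own smoothing, none of which is shown here.

```python
from collections import Counter

def train_bigram(sentences):
    """Count contexts and bigrams, padding each sentence with <s> and </s>."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        contexts.update(tokens[:-1])                  # every word that precedes another
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # adjacent word pairs
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}  # predictable tokens
    return contexts, bigrams, vocab

def sentence_prob(sentence, contexts, bigrams, vocab):
    """P(sentence) as a product of add-one-smoothed bigram probabilities."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        prob *= (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))
    return prob

corpus = ["the cat sat", "the dog sat", "a cat ran"]   # hypothetical training data
model = train_bigram(corpus)
print(sentence_prob("the cat sat", *model))   # fluent word order
print(sentence_prob("sat cat the", *model))   # scrambled word order
```

With this toy corpus the fluent sentence scores 36 times higher than the scrambled one, which is the kind of signal a statistical translation system uses to prefer natural word order among candidate outputs.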

Google Brain and the cat paper

Andrew Ng, Jeff Dean, and Greg Corrado launched Google Brain in 2011 as the second project inside Google X
Their distributed training system, DistBelief, applied asynchronous updates across thousands of CPU cores; the approach was considered theoretically unsound, but it worked anyway
The "cat paper" (2012): a 9-layer neural network trained on 10 million unlabeled YouTube frames spontaneously developed a "cat neuron" with no supervision
Sundar Pichai and other Google employees cite the demo of this result at TGIF, Google's all-hands meeting, as the moment everything changed
The same technology enabled YouTube's recommender system, which Ben Gilbert and David Rosenthal, hosts of the Acquired podcast, argue kicked off the true AI era in 2012, a decade before ChatGPT
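
What made DistBelief's asynchronous training look unsound is that workers compute gradients from parameter copies that are already out of date by the time their updates land. The sketch below is a deterministic toy model of that staleness, not DistBelief's implementation (the DistBelief paper calls its scheme Downpour SGD and shards parameters across many servers, neither of which is modeled). It minimises a one-dimensional quadratic; the loss, learning rate, and staleness are hypothetical choices.

```python
def async_sgd(target=3.0, lr=0.1, steps=300, staleness=4):
    """Minimise f(w) = (w - target)^2 where every gradient is computed from a
    parameter snapshot that is `staleness` updates old, mimicking workers that
    read from a parameter server, compute, and push their update late."""
    history = [0.0]                                # history[t] = parameter after t updates
    for t in range(steps):
        stale_w = history[max(0, t - staleness)]   # the worker's out-of-date copy
        grad = 2.0 * (stale_w - target)            # gradient of the quadratic at the stale point
        history.append(history[-1] - lr * grad)    # server applies it to the current parameter
    return history[-1]

print(async_sgd())           # converges close to 3.0 despite stale gradients
print(async_sgd(lr=0.25))    # same staleness, larger step: oscillates and diverges
```

In this toy, small steps relative to the staleness keep the iteration stable, which mirrors the empirical finding that asynchronous training worked in practice; the second call shows the failure mode theorists worried about.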

AlexNet and the GPU revelation

Geoff Hinton, Alex Krizhevsky, and Ilya Sutskever entered the 2012 ImageNet competition using two off-the-shelf Nvidia GTX 580 gaming cards and CUDA
AlexNet achieved a 15.3% top-5 error rate against 26.2% for the next best entry, roughly a 40% relative reduction and a larger jump than the competition had ever seen
This proved GPUs were the right substrate for deep learning and put Nvidia on the path from $10B gaming company to AI infrastructure leader
Hinton auctioned his company DNN Research from his hotel room at a casino in Lake Tahoe; Google won at $44M over bids from Baidu, Microsoft, and DeepMind
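
The "40% relative improvement" follows from the published ILSVRC-2012 top-5 error rates: 15.3% for AlexNet against 26.2% for the runner-up. A quick check of the arithmetic:

```python
# Published ILSVRC-2012 top-5 error rates, in percent
alexnet_error = 15.3
runner_up_error = 26.2

absolute_drop = runner_up_error - alexnet_error    # percentage points
relative_drop = absolute_drop / runner_up_error    # share of the runner-up's errors removed

print(f"{absolute_drop:.1f} points, {relative_drop:.0%} relative")  # 10.9 points, 42% relative
```

Relative improvement measures how many of the remaining errors were eliminated, which is why a drop of about 11 percentage points reads as roughly 40%.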

The DeepMind acquisition

DeepMind was founded in 2010 by chess prodigy and neuroscientist Demis Hassabis, postdoc Shane Legg (who popularised the term AGI), and Mustafa Suleyman
A pitch to Peter Thiel at the Singularity Summit secured a ~$2M seed round; Elon Musk became an investor after Demis told him Mars wouldn't save humanity from misaligned AI
Larry Page learned about DeepMind from a video Elon was watching on a private jet
Facebook offered up to $800M; Elon offered Tesla stock; Google closed at $550M with a promise that DeepMind could stay in London working on foundational research
By 2016, about two years after the acquisition, DeepMind had reduced the energy Google's data centers used for cooling by 40%
AlphaGo (2016) beat world Go champion Lee Sedol with novel moves no human had played; Go has more legal board positions than there are atoms in the observable universe, which makes brute-force search impossible
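
The "more positions than atoms" claim can be checked with a back-of-envelope count. Each of the 361 points on a 19×19 board is empty, black, or white, so 3^361 configurations is an upper bound; the exact count of legal positions, published by John Tromp in 2016, is about 2.08 × 10^170. The 10^80 figure for atoms is a conventional order-of-magnitude estimate.

```python
import math

points = 19 * 19                 # 361 intersections on a full board
upper_bound = 3 ** points        # empty, black, or white at every point
legal_positions = 2.08e170       # exact legal count, rounded
atoms = 1e80                     # order-of-magnitude estimate for the observable universe

print(len(str(upper_bound)))                   # 173 digits, i.e. about 10^172
print(math.log10(legal_positions / atoms))     # about 90: positions outnumber atoms by ~10^90
```

No search procedure can enumerate a 171-digit number of positions, which is why exhaustive search is off the table for Go.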

The TPU

Jeff Dean calculated that if voice search rolled out to all Android phones at typical usage, Google would need to double its entire data center footprint just for that one feature
Jonathan Ross designed the Tensor Processing Unit on 20% time, fitting it in a hard drive form factor so it could slot into existing server racks without physical rearchitecture
Built and deployed in 15 months; the AlphaGo match ran on a single machine with four TPUs
Google now operates an estimated 2–3 million TPUs — nearly matching Nvidia's annual GPU shipments
Over half the cost of running an AI data center is chip depreciation; Google pays Broadcom ~50% gross margin vs. the ~80% Nvidia charges everyone else — a structural unit-economics advantage
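
The margin gap translates directly into a price gap. Gross margin is (price - cost) / price, so price = cost / (1 - margin): a 50% margin doubles the build cost, while an 80% margin multiplies it by five. A minimal sketch, assuming (hypothetically) that a TPU and a comparable Nvidia GPU cost the same to build and do the same work, and taking chips as exactly half of data center cost:

```python
def price_from_margin(unit_cost: float, gross_margin: float) -> float:
    """Selling price implied by a gross margin, where margin = (price - cost) / price."""
    return unit_cost / (1.0 - gross_margin)

def relative_dc_cost(chip_share: float, chip_price_ratio: float) -> float:
    """Total data center cost versus a baseline when only the chip line item changes."""
    return (1.0 - chip_share) + chip_share * chip_price_ratio

BUILD_COST = 1.0   # hypothetical: identical build cost for both chips
CHIP_SHARE = 0.5   # hypothetical: chips are half of data center cost ("over half" in the text)

tpu_price = price_from_margin(BUILD_COST, 0.50)   # 2x build cost via Broadcom
gpu_price = price_from_margin(BUILD_COST, 0.80)   # 5x build cost via Nvidia
ratio = tpu_price / gpu_price                      # about 0.4

print(f"chips: {ratio:.0%} of the Nvidia price")
print(f"data center: {relative_dc_cost(CHIP_SHARE, ratio):.0%} of baseline cost")
```

Under these assumptions chips cost 60% less and the whole data center about 30% less; in this model a larger chip share widens the gap, before accounting for any performance differences between the chips.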

The transformer and the five lost years

In 2017, eight Google researchers published "Attention Is All You Need"; as of 2025, it is the seventh most-cited paper of the 21st century, with 173,000+ citations
The key insight: instead of processing tokens one at a time, as recurrent networks such as LSTMs do, attend to the entire input context at once, an approach that proved both more accurate and highly parallelisable
Noam Shazeer rewrote the implementation from scratch; the result crushed existing Google Translate models and improved predictably with more data and compute
Google built BERT and integrated transformer models into search quality, but did not treat the transformer as a platform-level shift
All eight authors eventually left Google to start or join AI companies; Noam co-founded Character AI, and Google later brought him back through a $2.7B licensing deal with the company
Elon Musk's exit from OpenAI's board in early 2018 left the organisation short of funding and pushed it to create a for-profit entity; Reid Hoffman introduced Sam Altman to Satya Nadella at Sun Valley, leading to Microsoft's $1B investment and the OpenAI partnership
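
The core operation in the paper is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V: every query is scored against every key at once, rather than information flowing token by token through a recurrent state. A minimal single-head sketch in plain Python, omitting the learned projections, masking, multiple heads, and batching that a real transformer uses:

```python
import math

def softmax(xs):
    m = max(xs)                              # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of row vectors (one row per token)."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:  # no iteration depends on another, so rows can run in parallel
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)            # how much this token attends to each token
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy embeddings for three tokens
print(attention(tokens, tokens, tokens))        # self-attention: Q = K = V
```

Because no query's output depends on another query's output, all rows can be computed simultaneously, whereas an LSTM must finish token t before starting token t + 1.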

ChatGPT, Code Red, and the Bard disaster

ChatGPT launched on November 30, 2022 as a hastily built chat interface to GPT-3.5; its servers overloaded, a Stripe paywall went up over a weekend, and it reached 100 million users in two months
Google issued a Code Red: AI had shifted overnight from sustaining to disruptive innovation
February 2023: Google rushed Bard to market; the launch video contained a factual error, and Alphabet's stock dropped 8% that day
The underlying LaMDA model lacked RLHF (reinforcement learning from human feedback), and Bard was visibly inferior to ChatGPT
Sundar Pichai made two decisive moves in mid-2023: (1) merge Google Brain and DeepMind into Google DeepMind under Demis Hassabis; (2) standardise on a single model, Gemini, across all products
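
RLHF typically begins by training a reward model on human comparisons between pairs of responses; the language model is then fine-tuned against that reward, often with PPO. The sketch below covers only the first step, the pairwise Bradley-Terry loss, with hypothetical reward values; it is an illustration of the technique, not a description of how Google or OpenAI trained their models.

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log(sigmoid(r_chosen - r_rejected)).

    Small when the reward model already scores the human-preferred response
    higher, large when it prefers the rejected one."""
    margin = reward_chosen - reward_rejected
    # Numerically stable evaluation of log(1 + exp(-margin))
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

print(preference_loss(2.0, -1.0))   # reward model agrees with the human label: low loss
print(preference_loss(-1.0, 2.0))   # reward model disagrees: high loss
```

Minimising this loss over many labeled pairs teaches the reward model to rank responses the way human raters do; that learned ranking is what the later reinforcement learning step optimises.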

Gemini and the current AI posture

Jeff Dean and Noam Shazeer (who returned via the Character AI deal) became co-technical leads for Gemini; Sergey Brin returned as an active employee
Gemini 1.5 (February 2024): one million token context window, the largest on the market at launch
Gemini 2.5 Pro shipped in March 2025; AI Mode launched in Google Search the same month
Google is now processing nearly one quadrillion inference tokens per month across all services — a 50x increase in one year
450 million monthly Gemini users (a figure that includes usage through Nano Banana and other embedded AI features); 150 million Google One subscribers, growing ~50% year-over-year
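
The token figures imply some useful rates. Taking "nearly one quadrillion" as 10^15 tokens per month and assuming a 30-day month:

```python
TOKENS_PER_MONTH = 1e15              # "nearly one quadrillion", rounded up
GROWTH_MULTIPLE = 50                 # "a 50x increase in one year"
SECONDS_PER_MONTH = 30 * 24 * 3600   # assumption: 30-day month

tokens_year_earlier = TOKENS_PER_MONTH / GROWTH_MULTIPLE   # monthly volume a year ago
tokens_per_second = TOKENS_PER_MONTH / SECONDS_PER_MONTH   # sustained average rate
monthly_growth = GROWTH_MULTIPLE ** (1 / 12) - 1           # implied compound monthly growth

print(f"{tokens_year_earlier:.1e} tokens/month a year earlier")
print(f"{tokens_per_second:,.0f} tokens/second today")
print(f"{monthly_growth:.0%} growth per month")
```

That works out to roughly 20 trillion tokens per month a year earlier, about 386 million tokens every second today, and compound growth of about 39% per month.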

Waymo: 20-year overnight success

Project Chauffeur launched inside Google X in 2009; a tiny team completed the "Larry 1000" (ten hand-picked, difficult 100-mile California routes) in 18 months
Did not use deep learning for its first five years; added convolutional neural nets in 2013 after Google adopted GPUs
Spun out as an Alphabet subsidiary in 2016; first fully driverless commercial rides in Phoenix, October 2020
Now operating in Phoenix, San Francisco, LA, Austin, and Atlanta; hundreds of thousands of paid rides per week; over 100 million driverless miles
A Waymo study found 91% fewer serious-injury crashes versus average human drivers
Total investment to date: $10–15B — roughly one month of Google's profits, against a potential Google-sized market opportunity from crash-cost reduction alone

Google Cloud

Started as an opinionated platform-as-a-service (App Engine, 2008); pivoted to infrastructure-as-a-service in 2012
Struggled with enterprise go-to-market until Thomas Kurian (former Oracle president) joined in late 2018; headcount in enterprise sales grew from ~150 to thousands
Revenue: $4B (2017) → $13B (2020) → $26B (2022) → $50B+ annualised run rate today; turned profitable in 2023; fastest-growing major cloud at ~30% year-over-year
TPUs are available to external customers on Google Cloud, making them one of the few non-Nvidia AI accelerators offered at hyperscaler scale (AWS's Trainium is another)
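
The revenue milestones imply compound annual growth rates, computed with the standard formula (end / start)^(1 / years) - 1. The run-rate figure is left out because it is not a completed calendar year.

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1

revenue_bn = {2017: 4, 2020: 13, 2022: 26}   # Google Cloud revenue in $B, from the figures above

years = sorted(revenue_bn)
for y0, y1 in zip(years, years[1:]):
    rate = cagr(revenue_bn[y0], revenue_bn[y1], y1 - y0)
    print(f"{y0}-{y1}: {rate:.0%} per year")
print(f"2017-2022: {cagr(revenue_bn[2017], revenue_bn[2022], 5):.0%} per year")
```

By this arithmetic, growth ran at about 48% a year from 2017 to 2020 and 41% from 2020 to 2022, or about 45% across the whole period, so today's ~30% rate applies to a much larger base.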

Bull and bear cases

Bull:

Google controls the front door to the internet and can route intent into AI products without losing users
The only company with all four AI pillars: foundational model, proprietary chips, hyperscale cloud, and mass-market applications
Structural cost advantage: lowest token-production cost among major model providers due to TPU economics
AI queries carry more intent signal than keyword search — ad rates should eventually be higher, not lower
YouTube training data and video distribution moat; personalisation data across Gmail, Maps, Docs, Chrome, Android
Waymo could become a Google-scale business independent of search

Bear:

AI has not yet yielded a viable ad model; value capture lags value creation by a wide margin
Google owned 90% of search; it will not own 90% of AI — the market has multiple credible players
High-value search verticals (travel, health, legal) are migrating to AI chat, eroding the ad inventory that funds everything
The incumbent's dilemma: protecting search revenue while aggressively cannibalising it is a needle that may prove impossible to thread
Public and ecosystem sentiment no longer favours Google the way it did in the startup or mobile eras

Powers analysis (Hamilton Helmer framework, scoped to AI)

Scale economies (strong): amortising training costs over ~1 quadrillion monthly tokens; structural chip cost advantage; biggest fixed-cost base spread over the most inference
Cornered resource (present): Google Search distribution; YouTube training data; private global fiber backhaul
Branding (present, net positive): trust advantage over newer AI labs for most users
Switching costs (emerging): low today for consumer chat; growing in enterprise; will increase as personal AI integrates with Gmail/Calendar/Drive
Network economies, counter positioning, process power: not present in current AI products
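
The scale-economies argument rests on amortisation: a fixed training cost divided by the tokens served shrinks as volume grows, while the marginal serving cost per token does not. A minimal sketch, in which the $1B training cost, 12-month model life, and $0.10-per-million-token serving cost are all hypothetical placeholders rather than Google figures:

```python
def cost_per_million_tokens(fixed_cost: float, monthly_tokens: float,
                            months: float, marginal_per_million: float) -> float:
    """Average cost per million tokens: fixed training cost spread over all
    tokens served during the model's life, plus the marginal serving cost."""
    lifetime_millions = monthly_tokens * months / 1e6
    return fixed_cost / lifetime_millions + marginal_per_million

TRAINING_COST = 1e9       # hypothetical: $1B to train one model
MODEL_LIFE_MONTHS = 12    # hypothetical useful life
SERVING_COST = 0.10       # hypothetical: $ per million tokens at the margin

for volume in (1e13, 1e14, 1e15):   # tokens per month; 1e15 is roughly the figure in the text
    avg = cost_per_million_tokens(TRAINING_COST, volume, MODEL_LIFE_MONTHS, SERVING_COST)
    print(f"{volume:.0e} tokens/month -> ${avg:.2f} per million tokens")
```

At 10^15 tokens per month the hypothetical training cost adds under ten cents per million tokens; at a hundredth of that volume it dominates the average cost.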
