Werner Vogels on scaling Amazon, building AWS, and engineering culture

Executive overview

Most engineering organisations slow down as they grow because shared infrastructure becomes a bottleneck, not because the engineers get worse. Amazon solved this by decomposing into fully independent teams, each owning its own data and roadmap. The same insight that fixed Amazon's internal scaling problem became the founding logic of AWS.

The real bottleneck is always shared resources — remove them and teams regain speed.

Career path and joining Amazon

  • Spent 10 years as a research scientist at Cornell building large-scale distributed systems.
  • Co-founded two startups on the side; one sold successfully, one failed.
  • Invited to give a talk at Amazon; expected a simple retailer, found a massive technology operation.
  • Joined in September 2004; became CTO in January 2005.

Inserting engineering rigour at Amazon

  • Early Amazon reached scale through pragmatism, not design — no textbook existed for what they were doing.
  • First focus area: performance measurement. Median latency hides the slowest requests; the 99th or 99.9th percentile is what matters for engineering decisions.
  • Second focus area: reliability. "Game days", in which a data centre is deliberately isolated, showed that rules which looked solid on paper failed in practice when failover was manual.
  • A key surprise: bringing a data centre back online and syncing it was harder than the failure itself.
  • Third focus area: efficiency, and associating costs with the services that incur them.
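
The point about percentiles can be illustrated with a small calculation. This is a minimal sketch, assuming a nearest-rank percentile definition and an invented set of latencies; the percentile helper and the sample data are hypothetical and do not come from the interview.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: 980 fast requests and 20 slow ones.
latencies_ms = [20] * 980 + [2000] * 20

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
p999 = percentile(latencies_ms, 99.9)
print(f"p50={p50} ms  p99={p99} ms  p99.9={p999} ms")
```

With this invented data the median reports 20 ms while the 99th percentile reports 2000 ms, so a dashboard that shows only the median would miss the requests that two in every hundred customers actually experience.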

From monolith to microservices

  • Amazon's original architecture was a monolith with a shared database managed by a DBA team — every schema change required their approval, killing engineering velocity.
  • Moved to service-oriented architecture: carve off pieces of the monolith, add an API, own your data.
  • Mistake: three services (customers, items, orders) were too coarse — within two years each was as large as the original monolith.
  • Decomposed further into microservices: each with its own scaling, reliability, and security profile (e.g. a login service scales differently than an address book service).
  • Second problem: each team now had to replicate its own database across three data centres — teams were spending engineering time on infrastructure, not innovation.
  • Solution: move databases, storage, networking, and compute into a shared services platform exposed through an API; this became the internal precursor to AWS.
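
The "add an API, own your data" rule can be sketched in a few lines. This is an illustration only: AddressBookService and its methods are hypothetical names, and an in-memory dictionary stands in for the database a real team would own.

```python
class AddressBookService:
    """Hypothetical service that is the only code allowed to touch its own storage."""

    def __init__(self):
        # Private store; stands in for the team's own database.
        self._addresses = {}

    def add_address(self, customer_id, address):
        """Public API: record an address for a customer."""
        self._addresses.setdefault(customer_id, []).append(address)

    def list_addresses(self, customer_id):
        """Public API: return a copy so callers cannot mutate internal state."""
        return list(self._addresses.get(customer_id, []))


service = AddressBookService()
service.add_address("c-1", "1 Example Street")

# Callers go through the API; changing what they get back does not change the store.
snapshot = service.list_addresses("c-1")
snapshot.append("tampered")
print(service.list_addresses("c-1"))
```

Because other teams see only the two public methods, the owning team can change its storage layout without asking anyone for approval, which is the velocity problem the shared database created.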

How AWS was born

  • Companies building on Amazon's public catalogue API all eventually failed — not due to bad ideas, but because they couldn't acquire servers fast enough or fund the hardware.
  • Amazon had solved this internally; the insight was to rebuild those capabilities as external services.
  • Amazon S3 launched in March 2006 ("storage for the internet"); EC2 followed in August 2006 as a limited beta.
  • Growth was faster than expected: an internal storage capacity target was blown through in the first three months.
  • Early assumption that AWS would be self-service with no salespeople was wrong — solution architects, technical account managers, and customer support are all essential.
  • At the time of the interview: ~130 services; ~95% of features came directly from customer requests.

Four types of CTO role

  1. Infrastructure manager — reports to the CIO, manages large infrastructure in enterprises.
  2. Technical co-founder — holds the technical vision in early-stage companies; risky because it conflates engineering leadership with people management.
  3. Big thinker — drives next-generation innovation and experimentation (e.g. Bell Labs model).
  4. External-facing technologist — deep technical engagement with customers; finds patterns across them; feeds insights back into the product roadmap.

Werner's current role at Amazon is primarily the fourth type.

VP of engineering vs. CTO

  • VP of engineering: wakes up asking "do I have the best team in the best position to deliver?" — a people discipline.
  • CTO: asks "are we building the right technologies with the right tools?" — a technology discipline.
  • These are distinct jobs; conflating them in early-stage companies is a common failure mode.

Amazon's engineering culture

  • Teams of 10–12 people: small enough that everyone knows what everyone else is doing and that standups stay effective.
  • Hire owners, not followers: people who want to control their own product destiny.
  • Amazon's 14 Leadership Principles (customer obsession, ownership, dive deep, etc.) drive hiring; culture-fit interviews matter as much as technical interviews.
  • A bad culture hire disrupts a small team far more than a bad technical hire.
  • Hierarchy is largely unnatural; self-organising teams that operate like independent startups scale better.

Working Backwards product process

  • Risk in technology-heavy companies: engineers take charge and build technology rather than products.
  • The Working Backwards process starts from the customer, not the technology:
    1. Write a press release describing exactly what you are building, in clear simple terms.
    2. Write the 20 most frequently asked questions and answer them clearly.
    3. Write the customer interaction document — how will customers actually use this?
    4. Write the user manual and glossary.
  • Iterate on documents 10–15 times until the scope is precise — then build exactly that and nothing more.
  • Engineers' instinct to add version-two features into version one is explicitly blocked.

Six-page narratives instead of slides

  • No PowerPoint or Keynote in meetings at Amazon.
  • Every meeting begins with 30 minutes of silent reading of a six-page narrative.
  • Hard to write a clear document without clarity of thought — bad thinking is exposed before it wastes meeting time.
  • After reading, everyone is on the same page; discussion quality is much higher.
  • The PR/FAQ from Working Backwards is often attached as an addendum.

Launching new services

  • AWS launches with a minimum feature set — rock solid, but minimal.
  • Reason: customers will use the product in every way except the one you intended; observe before building more.
  • Example: Lambda launched as an event-driven environment; enterprises immediately adopted it because they pay only for execution, not idle compute — a use case not initially anticipated.
  • Customer behaviour then reorders the roadmap (e.g. item-level access management was more important to DynamoDB customers than secondary indices).
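
The pay-for-execution argument behind the Lambda example comes down to simple arithmetic. The sketch below compares an always-on server billed by the hour with a function billed only while it runs. All rates and workload figures are hypothetical placeholders chosen for illustration, not AWS prices.

```python
HOURS_PER_MONTH = 730  # approximate hours in an average month


def always_on_cost(hourly_rate, hours=HOURS_PER_MONTH):
    """Cost of a server that is billed whether or not it is doing work."""
    return hourly_rate * hours


def per_execution_cost(invocations, seconds_each, rate_per_second):
    """Cost when billing covers only the time spent executing requests."""
    return invocations * seconds_each * rate_per_second


def break_even_invocations(hourly_rate, seconds_each, rate_per_second, hours=HOURS_PER_MONTH):
    """Monthly invocation count at which both billing models cost the same."""
    return always_on_cost(hourly_rate, hours) / (seconds_each * rate_per_second)


# Hypothetical figures, not real prices.
HOURLY_RATE = 0.10         # currency units per server-hour
RATE_PER_SECOND = 0.00002  # currency units per second of execution
SECONDS_EACH = 0.2         # execution time of one invocation

server = always_on_cost(HOURLY_RATE)
functions = per_execution_cost(100_000, SECONDS_EACH, RATE_PER_SECOND)
break_even = break_even_invocations(HOURLY_RATE, SECONDS_EACH, RATE_PER_SECOND)
print(f"always-on: {server:.2f}  per-execution: {functions:.2f}  "
      f"break-even: {break_even:,.0f} invocations per month")
```

Under these assumed numbers a workload of 100,000 short invocations a month costs a small fraction of the idle server, and the always-on model only becomes cheaper once monthly volume passes the break-even point.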

Security as a default engineering concern

  • A data breach occurs somewhere in the industry every week; Werner argues technologists should be embarrassed by this, yet are not.
  • Security retrofitted after the fact is a nightmare — it must be designed in from day one.
  • Security belongs in the CI/CD pipeline: every new open-source library addition should trigger an inspection; automated tools should continuously test for regulatory compliance (e.g. HIPAA, financial regulations).
  • Continuous deployment is actually better for security than large batch releases — five lines of code can be verified; 50,000 cannot.
  • Collecting customer data creates a grave responsibility; combining two or three innocuous datasets can produce sensitive profiles.
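
One way to picture "every new library addition should trigger an inspection" is a pipeline step that compares a dependency manifest with a list of already reviewed packages. This is a minimal sketch under stated assumptions: the manifest uses a simple requirements-style format, the package names are invented, and parse_requirements and unreviewed_dependencies are hypothetical helpers rather than part of any real tool.

```python
def parse_requirements(text):
    """Return lower-cased package names from requirements-style text.

    Comments, blank lines, and option lines (starting with "-") are skipped.
    """
    names = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        # Cut the line at the first version specifier, extras bracket, or marker.
        for sep in ("==", ">=", "<=", "~=", "!=", ">", "<", "[", ";", " "):
            line = line.split(sep, 1)[0]
        name = line.strip().lower()
        if name:
            names.add(name)
    return names


def unreviewed_dependencies(requirements_text, reviewed):
    """Return the sorted names that appear in the manifest but not in the reviewed set."""
    approved = {name.lower() for name in reviewed}
    return sorted(parse_requirements(requirements_text) - approved)


# Hypothetical reviewed list and manifest.
REVIEWED = {"examplelib", "otherlib"}
REQUIREMENTS = """\
# hypothetical manifest
examplelib==1.4.2
otherlib>=2.0
brandnewlib~=0.3  # added in this change
"""

flagged = unreviewed_dependencies(REQUIREMENTS, REVIEWED)
print("needs security review:", flagged)
```

In a pipeline, a non-empty result would typically fail the build until someone has inspected the new package. Real manifests have more syntax than this parser handles, so treat it as an illustration of the gate rather than a usable checker.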

Common mistakes when adopting AWS

  • Treating AWS as a traditional data centre (just virtual machines and storage) misses the major productivity gains from higher-level managed services.
  • Failing to decide what kind of company you are building:
    • High-growth / get-big-fast: prioritise speed, customer acquisition, and investor capital; cost control is secondary.
    • Sustainable / long-term: requires tighter cost architecture with a clear link between cost and customer acquisition (e.g. the Basecamp / Signal v. Noise model).
  • Jeff Bezos frames this as mercenaries (in it for the money) vs. missionaries (in it for the product) — both valid, but they require fundamentally different technical architectures.

Future direction of software development

  • More companies will skip containers entirely and start directly with serverless environments.
  • Containers bring infrastructure management back — virtual machines still need to be managed underneath them.
  • AWS Fargate (serverless compute for containers) removes that layer; the trend is toward zero infrastructure management.
  • The bigger five-year shift: security must become every engineer's and every executive's primary concern, not a specialist team's afterthought.
