Software is changing again: Karpathy on LLMs, agents, and building for the AI era

Executive overview

Software is undergoing its second fundamental shift in a few years. Three distinct paradigms now coexist: hand-written code (1.0), neural network weights (2.0), and natural-language prompts (3.0). LLMs share some traits with utilities and chip fabs, but the closest analogy is an operating system: a complex software ecosystem that is still in its 1960s era and is currently accessed via time-sharing.

The core opportunity is building partial autonomy apps: products where humans stay in the loop, GUIs make AI output auditable, and an autonomy slider lets teams dial up automation incrementally. Agents are a decade-long project, not a 2025 event.

The generation–verification loop is the bottleneck; everything should be built to make it faster.

The three software paradigms

  • Software 1.0: explicit code written by humans to program computers
  • Software 2.0: neural network weights shaped by data and optimisers; Hugging Face is its GitHub
  • Software 3.0: natural-language prompts that program LLMs; English is now a programming language
  • Each paradigm eats into the one before it: at Tesla Autopilot, C++ code was deleted as the neural nets took over more of the stack
  • Fluency across all three is increasingly necessary; the right paradigm depends on the task
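
To make the contrast between paradigms concrete, here is a minimal sketch of one task, sentiment classification, written as Software 1.0 and as Software 3.0. The word lists, function names, and few-shot examples are illustrative assumptions, not taken from the talk summary. A Software 2.0 version would be a set of trained weights and is omitted, and no LLM is actually called here; the second function only builds the prompt.

```python
# Software 1.0: a human writes explicit rules.
POSITIVE = {"great", "love", "excellent"}   # illustrative word lists
NEGATIVE = {"bad", "hate", "terrible"}


def classify_1_0(text: str) -> str:
    """Rule-based sentiment: count positive and negative words."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Software 3.0: the "program" is an English prompt that an LLM would execute.
def build_prompt_3_0(text: str) -> str:
    """Build a few-shot prompt; sending it to a model is left out of this sketch."""
    return (
        "Classify the sentiment of the review as positive, negative, or neutral.\n"
        "Review: I love this phone.\nSentiment: positive\n"
        "Review: The battery is terrible.\nSentiment: negative\n"
        f"Review: {text}\nSentiment:"
    )
```

The 1.0 version is cheap and predictable but brittle; the 3.0 version moves the logic into text that anyone can edit, at the cost of depending on a model to run it.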

LLMs as operating systems

  • LLM labs spend heavily on capex to train models (building the grid) and then sell metered access as opex, which is utility-like behaviour
  • But LLMs are not simple commodities; they are complex, expanding software ecosystems
  • Closer analogy: operating systems — a few closed-source providers (like Windows/macOS) and an open-source alternative (Llama ≈ Linux)
  • Context windows are working memory; the LLM orchestrates memory and compute, like an OS orchestrating a CPU
  • We are in the 1960s of this new computing era: centralised, time-shared, no personal computing revolution yet
  • LLM outages cause intelligence brownouts — already critical infrastructure
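
The "context window as working memory" analogy can be sketched as a fixed token budget from which the oldest material is evicted first. This is a toy model under stated assumptions: the `ContextWindow` class is hypothetical, whitespace word counts stand in for a real tokenizer, and real applications often summarise or retrieve rather than simply drop old messages.

```python
from collections import deque


class ContextWindow:
    """Toy working memory: a token budget with oldest-first eviction."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.messages = deque()
        self.used = 0

    @staticmethod
    def count_tokens(text: str) -> int:
        # Crude stand-in for a tokenizer: one token per whitespace-separated word.
        return len(text.split())

    def add(self, text: str) -> None:
        self.messages.append(text)
        self.used += self.count_tokens(text)
        # Evict from the front until the budget holds, always keeping the newest message.
        while self.used > self.max_tokens and len(self.messages) > 1:
            evicted = self.messages.popleft()
            self.used -= self.count_tokens(evicted)
```

Whatever falls out of the window is gone unless the application stores it somewhere else, which is the part the surrounding software has to orchestrate.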

The psychology of LLMs

  • LLMs are stochastic simulations of people — autoregressive transformers trained on human text
  • Superpowers: encyclopedic memory, cross-domain knowledge retrieval
  • Jagged intelligence: superhuman in some domains, yet a model may insist that 9.11 > 9.9
  • Anterograde amnesia: the context window is wiped between sessions, so a model does not accumulate organisational knowledge over time the way a new colleague would
  • Gullible: susceptible to prompt injection, data leakage, manipulation
  • Program around the deficits; don't pretend they don't exist
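
One way to program around a known deficit is to move the fragile step out of the model and into ordinary code. The sketch below handles the 9.11 versus 9.9 comparison with exact decimal arithmetic; the `larger` helper is a hypothetical name, and the idea that an application would route such comparisons to code is an assumption about one possible design, not something the talk prescribes.

```python
from decimal import Decimal


def larger(a: str, b: str) -> str:
    """Return whichever decimal string is numerically larger (b on ties)."""
    # Decimal parses the strings exactly, so "9.9" is treated as 9.90, not as "version 9".
    return a if Decimal(a) > Decimal(b) else b


print(larger("9.11", "9.9"))  # prints 9.9
```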

Partial autonomy apps

  • Cursor is the template: custom GUI, orchestrated LLM calls, context management, autonomy slider (autocomplete → full-repo agent)
  • Perplexity follows the same pattern: sources are cited so they can be audited, and the autonomy slider runs from quick search to deep research
  • The generation–verification loop must be made as fast as possible
  • GUIs speed up verification because people audit visual output much faster than they read text
  • Keep the AI on a short leash: small, concrete prompts; incremental diffs; avoid 1,000-line changes
  • Self-driving shows the gap between demo and product: 12 years after a perfect demo drive, autonomy is still being worked on, and agents are likely to take a similarly long time
  • Build Iron Man suits (augmentation with optional autonomy), not autonomous robots
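
The autonomy slider and the "short leash" can be sketched together as an enum of levels plus a guard that rejects changes too large to verify quickly. Only the two end points (autocomplete and full-repo agent) come from the summary above; the intermediate levels, the per-level line caps, and the function names are hypothetical choices for illustration.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    AUTOCOMPLETE = 1     # from the summary: the low end of the slider
    EDIT_SELECTION = 2   # hypothetical intermediate level
    EDIT_FILE = 3        # hypothetical intermediate level
    AGENT_REPO = 4       # from the summary: full-repo agent


# Hypothetical caps on changed lines before the tool asks for a smaller change.
MAX_CHANGED_LINES = {
    Autonomy.AUTOCOMPLETE: 5,
    Autonomy.EDIT_SELECTION: 40,
    Autonomy.EDIT_FILE: 150,
    Autonomy.AGENT_REPO: 300,
}


def changed_lines(diff: str) -> int:
    """Roughly count added and removed lines in a unified diff, skipping file headers."""
    return sum(
        1
        for line in diff.splitlines()
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )


def needs_smaller_change(diff: str, level: Autonomy) -> bool:
    """True when the diff is larger than a human can comfortably verify at this level."""
    return changed_lines(diff) > MAX_CHANGED_LINES[level]
```

Even at the highest level the cap stays well below 1,000 lines, so every change remains something a person can review in one sitting.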

Building for agents as a new class of user

  • Agents are a new consumer of digital infrastructure — human-like but not human, not traditional APIs
  • llms.txt: a simple markdown file on your domain that tells LLMs what the site is about, which is far more reliable than having them parse HTML
  • Stripe and Vercel are early movers: publishing docs in markdown, replacing every "click" instruction with the equivalent curl command
  • Agents cannot click; every instruction requiring a click is a dead end for LLMs today
  • Tools like Gitingest concatenate a repo into LLM-readable text; DeepWiki adds AI-generated analysis on top
  • Meeting LLMs halfway is still worth doing even as browser-use agents improve — it is cheaper and more reliable
  • Model Context Protocol (Anthropic) provides a standard interface for agents to interact with services
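
Two of the ideas above lend themselves to a small sketch: flagging documentation steps that tell the reader to click, and emitting a short markdown summary of a site for LLMs. Both helpers are hypothetical. The regular expression is a rough heuristic, and the output layout (title, one-line summary, list of links) only loosely follows the llms.txt proposal; the URLs use the reserved example.com domain.

```python
import re

# Heuristic: any form of the word "click" marks a step an agent cannot follow.
CLICK_PATTERN = re.compile(r"\bclick(s|ed|ing)?\b", re.IGNORECASE)


def find_click_steps(doc: str) -> list[tuple[int, str]]:
    """Return (line number, text) for each documentation line that requires a click."""
    return [
        (number, line.strip())
        for number, line in enumerate(doc.splitlines(), start=1)
        if CLICK_PATTERN.search(line)
    ]


def render_llms_txt(name: str, summary: str, links: list[tuple[str, str]]) -> str:
    """Render a minimal markdown overview of a site for LLM consumption."""
    lines = [f"# {name}", "", f"> {summary}", "", "## Docs", ""]
    lines += [f"- [{title}]({url})" for title, url in links]
    return "\n".join(lines) + "\n"
```

Each flagged line is a candidate for rewriting as an equivalent command, such as a curl call, that an agent can run directly.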

Vibe coding and democratising software

  • Natural language as the programming interface means everyone is now a potential programmer
  • Karpathy built a working iOS app in Swift without knowing Swift, and a functional menu-image generator in a matter of hours
  • The hard part is not the code — it is DevOps, auth, payments, deployment: still click-heavy and human-oriented
  • Vibe coding is a gateway drug to software development, not a replacement for engineering
  • Making documentation LLM-readable (e.g. feeding the full Manim docs to an LLM) unlocks out-of-the-box use
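
Feeding a full documentation set to an LLM usually starts by flattening it into one block of text. The sketch below shows one simple way to do that, in the spirit of the repo-concatenation tools mentioned earlier; the `concat_docs` name, the separator format, and the choice of file suffixes are assumptions, and it does not reproduce how any particular tool works.

```python
from pathlib import Path


def concat_docs(root: str, suffixes: tuple[str, ...] = (".md", ".txt")) -> str:
    """Join all matching text files under root into one string, each with a path header."""
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            rel = path.relative_to(root).as_posix()
            body = path.read_text(encoding="utf-8")
            parts.append(f"===== {rel} =====\n{body}")
    return "\n\n".join(parts)
```

The result can be pasted into a prompt as context, subject to the model's context window limit.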
