How one founder built 400x more code using AI agents and token maxing

Executive overview

After 13 years away from coding, Gary Tan rebuilt a full-featured blog platform in 5 days for $200 — work that previously took 18 months and $4M. The leverage came from treating AI agents not as autocomplete but as a fleet of workers directed by human taste and judgment.

The core insight: token maxing — spending aggressively on compute to exhaust every useful input — is the new rent. Economising on it is the mistake.

Rebuilding in days what once took years

  • Gary's List: rebuilt Posterous (a top-200 website) the third time in ~5 days for $200 in Claude Code credits
  • First build: ~$4M, 6–7 people, 18 months. Second: ~$100K, 2 people, 3 months. Third: $200, 1 person, 5 days
  • The platform does more than publish — it acts as an autonomous investigative journalist, ingesting dozens of sources, cross-referencing them, and producing fully cited long-form articles
  • For ~$5–10 of Opus API calls, it replicates work a human researcher would need weeks to complete

Token maxing as a philosophy

  • Token maxing: deliberately spending more compute to get a more complete, higher-quality output — not optimising for cheapness
  • When building agentic software, don't settle for one source when you can cross-reference 20; don't accept 80% completeness when you can afford 100%
  • Analogy: token spend is like San Francisco rent — expensive not to pay it; the serendipity (or throughput) it unlocks justifies the cost
  • Applies beyond code: research, writing, QA, any knowledge work can be token maxed
  • The human still supplies agency — the care about what gets built, the taste, the judgment

The G-Stack workflow

  • G-Stack: Gary's open-source skill/prompt library for Claude Code, born from noticing he kept typing the same things
  • Core skills: Office Hours (product validation), CEO Plan (10-star / 10x ambition check), Plan-Eng-Review (architecture + test coverage), Designer, DX Review, and End-to-End (QA)
  • Typical flow: Office Hours → CEO review → Design → Developer review → E2E → Codex pass
  • ASCII diagrams first: asking Claude to diagram all data flows, state machines, and dependencies before writing code dramatically reduces bugs and context loss
  • G-Stack relies heavily on ask_user — the human operator must supply understanding of what is being built; no substitute exists for that

Thin harnesses, fat skills

  • Harness: the core agentic loop (take input → send to LLM → execute tool calls → loop). Don't rebuild this; use existing ones
  • Skills/markdown: where all the intelligence lives — the plain-English instructions that tell the agent what to do, handle edge cases, encode judgment
  • The hard problem in agentic engineering: deciding what belongs in LLM-land (flexible, handles ambiguity) vs. code-land (deterministic, brittle)
  • Markdown is code — it compiles differently but directs the machine with the same force

Testing discipline

  • Vibe coding without tests produces slop: works for 80% of cases, collapses under real users
  • Target 80–90% test coverage — not 100% (diminishing returns), but enough to catch integration failures
  • Claude Code will write the tests; the machine doesn't mind the tedium
  • QA via Playwright: Gary built a long-lived browser daemon (browse) with 70 CLI commands; qa skill tells it to check whatever changed on the branch

Claude Code vs. Codex vs. OpenClaw

  • Claude Code: ideal for the "ADHD CEO" — fast, energetic, great for product velocity; occasionally confident and wrong
  • Codex ("the 200 IQ nearly non-verbal CTO"): better for hard algorithmic problems; use it to find bugs Claude Code missed
  • OpenClaw (open-source Claude): Gary now spends ~40–50% of build time there; enables personal AI with your own data, prompts, and integrations
  • OpenClaw + G-Brain (RAG layer on his markdown corpus) = a personal knowledge system that understands context across all his projects
  • Current state: like a Ferrari — exhilarating but requires you to be your own mechanic

The personal AI moment

  • The personal computer revolution gave individuals control over compute; personal AI is the same shift happening now
  • If you rely on a hosted product, a PM you'll never meet wrote the algorithm and it serves their business model, not yours
  • Writing your own prompts puts you above the API line — you define what the agent optimises for
  • The defining question: will you control your tools, or will your tools control you?
  • This capability requires the latest models and real token spend — free tiers and Sonnet-level budgets don't unlock it

On lines of code and the 400x claim

  • Professional software engineers write ~30–50 production lines of code per day on average (per published literature); Gary was writing ~14 (part-time)
  • After stripping logical lines of code, Gary's current rate was 400x his 2013 baseline — but the 2013 baseline also turned out to be ~70% lower than he thought
  • The meaningful point: AI-directed code doesn't pad LOC the way human incentives do; it builds the wrong thing if misdirected, but it doesn't optimise for line count
  • Critics of the LOC metric are often the engineers who would benefit most from adopting this workflow

More like this — when you're ready for early access.

Join the waitlist for a personal account and content recommendations based on what you're working on.

No spam. Unsubscribe at any time.

You're on the list. We'll be in touch before launch.

Get early access to the full library.

Join the waitlist for a personal account and content recommendations based on what you're working on.

No spam. Unsubscribe at any time.

You're on the list. We'll be in touch before launch.

Be among the first to get personalised recommendations tailored to your stage in business.

No spam.

You're on the list. We'll be in touch before launch.

Be among the first to get personalised recommendations tailored to your stage in business.

No spam.

You're on the list. We'll be in touch before launch.