The original is one click away. Open original ↗
How one founder built 400x more code using AI agents and token maxing
Executive overview
After 13 years away from coding, Gary Tan rebuilt a full-featured blog platform in 5 days for $200 — work that previously took 18 months and $4M. The leverage came from treating AI agents not as autocomplete but as a fleet of workers directed by human taste and judgment.
The core insight: token maxing — spending aggressively on compute to exhaust every useful input — is the new rent. Economising on it is the mistake.
Rebuilding in days what once took years
- Gary's List: rebuilt Posterous (a top-200 website) the third time in ~5 days for $200 in Claude Code credits
- First build: ~$4M, 6–7 people, 18 months. Second: ~$100K, 2 people, 3 months. Third: $200, 1 person, 5 days
- The platform does more than publish — it acts as an autonomous investigative journalist, ingesting dozens of sources, cross-referencing them, and producing fully cited long-form articles
- For ~$5–10 of Opus API calls, it replicates work a human researcher would need weeks to complete
Token maxing as a philosophy
- Token maxing: deliberately spending more compute to get a more complete, higher-quality output — not optimising for cheapness
- When building agentic software, don't settle for one source when you can cross-reference 20; don't accept 80% completeness when you can afford 100%
- Analogy: token spend is like San Francisco rent — expensive not to pay it; the serendipity (or throughput) it unlocks justifies the cost
- Applies beyond code: research, writing, QA, any knowledge work can be token maxed
- The human still supplies agency — the care about what gets built, the taste, the judgment
The G-Stack workflow
- G-Stack: Gary's open-source skill/prompt library for Claude Code, born from noticing he kept typing the same things
- Core skills: Office Hours (product validation), CEO Plan (10-star / 10x ambition check), Plan-Eng-Review (architecture + test coverage), Designer, DX Review, and End-to-End (QA)
- Typical flow: Office Hours → CEO review → Design → Developer review → E2E → Codex pass
- ASCII diagrams first: asking Claude to diagram all data flows, state machines, and dependencies before writing code dramatically reduces bugs and context loss
- G-Stack relies heavily on
ask_user— the human operator must supply understanding of what is being built; no substitute exists for that
Thin harnesses, fat skills
- Harness: the core agentic loop (take input → send to LLM → execute tool calls → loop). Don't rebuild this; use existing ones
- Skills/markdown: where all the intelligence lives — the plain-English instructions that tell the agent what to do, handle edge cases, encode judgment
- The hard problem in agentic engineering: deciding what belongs in LLM-land (flexible, handles ambiguity) vs. code-land (deterministic, brittle)
- Markdown is code — it compiles differently but directs the machine with the same force
Testing discipline
- Vibe coding without tests produces slop: works for 80% of cases, collapses under real users
- Target 80–90% test coverage — not 100% (diminishing returns), but enough to catch integration failures
- Claude Code will write the tests; the machine doesn't mind the tedium
- QA via Playwright: Gary built a long-lived browser daemon (
browse) with 70 CLI commands;qaskill tells it to check whatever changed on the branch
Claude Code vs. Codex vs. OpenClaw
- Claude Code: ideal for the "ADHD CEO" — fast, energetic, great for product velocity; occasionally confident and wrong
- Codex ("the 200 IQ nearly non-verbal CTO"): better for hard algorithmic problems; use it to find bugs Claude Code missed
- OpenClaw (open-source Claude): Gary now spends ~40–50% of build time there; enables personal AI with your own data, prompts, and integrations
- OpenClaw + G-Brain (RAG layer on his markdown corpus) = a personal knowledge system that understands context across all his projects
- Current state: like a Ferrari — exhilarating but requires you to be your own mechanic
The personal AI moment
- The personal computer revolution gave individuals control over compute; personal AI is the same shift happening now
- If you rely on a hosted product, a PM you'll never meet wrote the algorithm and it serves their business model, not yours
- Writing your own prompts puts you above the API line — you define what the agent optimises for
- The defining question: will you control your tools, or will your tools control you?
- This capability requires the latest models and real token spend — free tiers and Sonnet-level budgets don't unlock it
On lines of code and the 400x claim
- Professional software engineers write ~30–50 production lines of code per day on average (per published literature); Gary was writing ~14 (part-time)
- After stripping logical lines of code, Gary's current rate was 400x his 2013 baseline — but the 2013 baseline also turned out to be ~70% lower than he thought
- The meaningful point: AI-directed code doesn't pad LOC the way human incentives do; it builds the wrong thing if misdirected, but it doesn't optimise for line count
- Critics of the LOC metric are often the engineers who would benefit most from adopting this workflow
More like this — when you're ready for early access.
Join the waitlist for a personal account and content recommendations based on what you're working on.
No spam. Unsubscribe at any time.
You're on the list. We'll be in touch before launch.