How AI is changing product skills, startup opportunities, and the future of work

Executive overview

AI models improve so fast that today's capabilities are the worst you will ever use. Building on shifting foundations forces a fundamentally different product approach — short planning cycles, bottoms-up teams, and tolerance for fuzzy outputs.

The core skill shift is learning to evaluate model performance for specific use cases rather than assuming a model either works or doesn't. Fine-tuned, ensemble model architectures will become standard across all product teams, not just foundation model companies.

The AI model you use today is the worst AI model you will ever use for the rest of your life.

What's different about building at OpenAI

  • Every two months, computers can do something they've never done before — product thinking must reset constantly.
  • Traditional software gives defined inputs → defined outputs. LLMs give fuzzy inputs → probabilistically correct outputs.
  • Whether a model is right 60%, 95%, or 99.5% of the time determines the entire product architecture.
  • The planning value is in the act of aligning on direction and checking dependencies — not in the written roadmap, which will be wrong.
  • Model maximalism: don't build scaffolding around current limitations; a better model will arrive in two months.
  • Ship early, iterate in public — "iterative deployment" as a philosophy, co-evolving with users.

Writing evals as a core product skill

  • An eval is a test suite for a model — like unit tests, but measuring model capability on a specific use case.
  • Evals tell you whether the model is 60%, 95%, or 99.95% right on your use case — each threshold demands a different product.
  • Build evals alongside product design, not after; use hero use cases to define what "great" looks like.
  • Evals enable continuous improvement: fine-tune the model against them and hill-climb toward a working product.
  • Models are capped in usefulness by the quality of evals — most enterprise data is behind company walls and not in training sets.
  • Custom evals + fine-tuning on company-specific data is how you close the gap between a general model and a great product.

Where startups can build without OpenAI competing

  • No matter how large OpenAI grows, there are far more smart people outside its walls than inside.
  • OpenAI will not have the domain knowledge, headcount, or industry-specific data to build most vertical AI products.
  • Every industry has opportunities to build AI products that improve on the state of the art.
  • Three million developers use the OpenAI API; the goal is to power, not replace, the builder ecosystem.

How OpenAI ships quickly

  • Bottoms-up teams with strong product-minded engineers who feel empowered to make decisions.
  • No blocking review gates — teams shouldn't wait for Kevin or Sam to approve a launch.
  • Quarterly planning is a moment to align and check dependencies, not a binding commitment.
  • Mistakes are expected and acceptable; roll back, learn, move on.
  • Even bad naming (o3 mini high) doesn't matter if the product is useful — don't optimise for low-priority polish.

Ensemble models and fine-tuning in practice

  • OpenAI uses ensembles of models internally: different model sizes for different latency and cost requirements, different fine-tuned models for different sub-tasks.
  • Break complex problems into specific sub-tasks; use a specialised model for each rather than one broad prompt.
  • Customer support at 400M+ weekly active users runs largely automated with a small team — powered by a fine-tuned model on internal knowledge bases.
  • Fine-tuning is under-adopted externally; it will become standard as teams get comfortable customising models for specific use cases.
  • A useful mental model: a company is an ensemble of humans, each fine-tuned by education and experience — same logic applies to model ensembles.

Reasoning models and UI design

  • Reasoning models (o-series) introduced a new UX problem: the model needs to "think" for 10–25 seconds — too long to stare at, too short to leave.
  • Solution: show condensed summaries of thinking (1–2 sentences), not raw chain-of-thought — enough to learn from, not overwhelming.
  • Rule of thumb: ask how a human would behave in the equivalent situation, then design accordingly.
  • Multi-model consensus (several models attack the same problem, one integrates) mirrors how group brainstorming improves individual thinking.

Chat as the right interface for LLMs

  • Chat is the most universal, unconstrained communication medium humans use — it's exactly fit to the flexibility of LLMs.
  • Past chat interfaces failed because no model was good enough to handle the full complexity of human language; that constraint is now gone.
  • Prescribed, narrower interfaces are better for high-volume, specific tasks — but chat remains the baseline catch-all.

How product teams will change

  • Researcher/ML engineer roles will embed into most product teams as fine-tuning becomes a standard workflow.
  • PM count should stay lean — too many PMs fills the world with decks instead of shipped product.
  • Key PM traits at OpenAI: high agency, comfort with ambiguity, leading through influence, decisiveness when no one else will make a call.
  • Vibe coding (tab-accepting AI code suggestions, iterating with the model) should replace Figma mockups for rapid prototyping.
  • The workflows of product teams should be largely unrecognisable in one year — most teams are not moving fast enough toward AI-native work.

Skills that will matter

  • Curiosity, independent thinking, and self-confidence will matter more than specific technical skills — they generalise across any future configuration.
  • Writing evals is a near-term skill with compounding value for anyone building AI products.
  • Teaching models using in-prompt examples (few-shot prompting) is a lightweight version of fine-tuning — underused today.
  • The need for expert prompt engineering should decline as models improve; the goal is AI that works for everyone without special knowledge.

AI and personalised tutoring

  • Personalised AI tutoring may be the highest-impact near-term application of LLMs.
  • Every study shows personalised tutoring produces multi-standard-deviation improvements in learning speed.
  • ChatGPT is free, models are capable enough, and Android devices are widespread — the missing piece is a compelling product.

On Libra (Facebook's crypto project)

  • The goal was sound: enable instant, free money transfers inside WhatsApp for the billions of people paying 20% remittance fees.
  • Execution mistake: too many new things at once — new blockchain, basket of currencies, WhatsApp integration — combined with Facebook's reputation being at its lowest.
  • Regret: the world would be better if it existed. The underlying tech lives on in Aptos and Sui (Mysten Labs), which were built on the open-sourced codebase.
  • The current regulatory environment and Meta's improved reputation may make it viable to build now.

Model trajectory

  • Models are simultaneously getting smarter, faster, cheaper, and more reliable (fewer hallucinations) with every generation.
  • Cost has dropped ~100x over roughly two years for comparable or better capability.
  • Iteration cycles have compressed from 6–9 months (GPT-3 era) to roughly 3–4 months (o-series).
  • If Moore's Law was a 2x every 18 months, AI capability improvement is running at roughly 10x per year — a far steeper exponential.

More like this — when you're ready for early access.

Join the waitlist for a personal account and content recommendations based on what you're working on.

No spam. Unsubscribe at any time.

You're on the list. We'll be in touch before launch.

Get early access to the full library.

Join the waitlist for a personal account and content recommendations based on what you're working on.

No spam. Unsubscribe at any time.

You're on the list. We'll be in touch before launch.

Be among the first to get personalised recommendations tailored to your stage in business.

No spam.

You're on the list. We'll be in touch before launch.

Be among the first to get personalised recommendations tailored to your stage in business.

No spam.

You're on the list. We'll be in touch before launch.