How Speak built an AI English tutor and became Korea's top education app

Executive overview

Learning to speak a second language is nearly impossible without expensive human tutors or immersion. Speak replaced the tutor with AI-powered speech recognition, making fluency practice accessible and affordable at scale.

The founders spent a year doing deep AI research before building, then iterated relentlessly toward product-market fit by going to Korea to watch real users in person. More than 50% of subscribers are still active on day 30, which suggests the model works.

The core insight: pick one market, go there in person, and watch people use the product — then fix what you see.

Founders and origin

  • Connor sold his first app, Flashcards Plus, at 21; the exit gave him freedom to pursue passion over money
  • Andrew earned three degrees by age 12 and spent 3.5 years in a Stanford neuroscience PhD program before dropping out
  • Both joined the Thiel Fellowship, became roommates, and spent a year doing deep ML research before building Speak
  • Early focus: speech recognition that understood not just words but accents — achieved state-of-the-art results using unlabelled YouTube data

Finding product-market fit

  • Initially launched in every market worldwide, but nothing converted well enough; users liked the product but didn't love it
  • Decided to pick one market and commit: flew to Korea, Japan, and Europe to interview users in person
  • Korea was chosen — high English-learning spend (at one point 1% of GDP), dense competition, and very opinionated users
  • Early user sessions revealed a simple gap: users wanted to speak more, and the existing product had too little speaking practice
  • First day in the Korean App Store: $18 in revenue — and they celebrated

The commute insight that unlocked growth

  • Biggest retention blocker: users wanted to practise on buses and subways but couldn't speak aloud in public
  • Counter-intuitive fix: built a silent reading/listening mode for commutes to create daily habit formation
  • Once the habit was established, users returned to active speaking practice at home
  • Result: conversion, retention, and every other metric spiked immediately after launching the commute mode

Product, ML, and content flywheel

  • Early versions used off-the-shelf speech recognition — good enough to launch and collect training data
  • That data fine-tuned their custom models, improving accuracy and enabling better product features over time
  • Now have an internal ML team building conversational AI features powered by modern language models
  • Content is treated as a product: A/B tested and iterated continuously, not built once and left alone
  • Marketing: "AI tutor" messaging confused users; "speak 100 sentences in 20 minutes" was concrete and converted
  • Key retention metric: over 50% of subscribers are still active on day 30 after starting
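
As a rough illustration of how a day-30 retention figure like the one above might be computed, the sketch below assumes a hypothetical record of each subscriber's start date and last active date. The function name, data layout, and sample values are invented for illustration and are not Speak's actual data or method.

```python
from datetime import date

def day30_retention(subscribers, window_days=30):
    """Share of subscribers still active at least `window_days` days after starting.

    `subscribers` is a list of (start_date, last_active_date) pairs
    (a hypothetical layout chosen for this sketch).
    """
    if not subscribers:
        return 0.0
    retained = sum(
        1
        for start, last_active in subscribers
        if (last_active - start).days >= window_days
    )
    return retained / len(subscribers)

# Hypothetical sample data: (start date, last active date)
sample = [
    (date(2024, 1, 1), date(2024, 2, 15)),  # active 45 days after starting
    (date(2024, 1, 1), date(2024, 1, 10)),  # churned after 9 days
    (date(2024, 1, 5), date(2024, 2, 4)),   # active exactly 30 days after starting
    (date(2024, 1, 8), date(2024, 1, 20)),  # churned after 12 days
]

rate = day30_retention(sample)
print(f"Day-30 retention: {rate:.0%}")  # Day-30 retention: 50%
```

Real products define "active" in different ways (an open session, a completed lesson, a live subscription), so the threshold used here is only one possible reading of the metric.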

Expansion and long-term vision

  • After proving the model in Korea, Speak is expanding to Japan and the US, applying the same local-first, in-person research playbook
  • Ultimate goal: make fluency practice as accessible as any app, at any price point, in any language
  • Technology vision extends beyond language: the voice-AI infrastructure they are building applies to any domain where humans communicate with machines
