How Speak built an AI English tutor and became Korea's top education app
Executive overview
Speaking a second language is nearly impossible to learn without expensive human tutors or immersion. Speak replaced the tutor with AI-powered speech recognition, making fluency practice accessible and affordable at scale.
The founders spent a year doing deep AI research before building, then iterated relentlessly on product-market fit by going to Korea in person to watch real users. Retaining over 50% of subscribers on day 30 is strong evidence that the model works.
The core insight: pick one market, go there in person, and watch people use the product — then fix what you see.
Founders and origin
- Connor sold his first app, Flashcards Plus, at 21; the exit gave him freedom to pursue passion over money
- Andrew earned three degrees by age 12 and spent 3.5 years in a Stanford neuroscience PhD program before dropping out
- Both joined the Thiel Fellowship, became roommates, and spent a year doing deep ML research before building Speak
- Early focus: speech recognition that understood not just words but accents — achieved state-of-the-art results using unlabelled YouTube data
Finding product-market fit
- Launched in every global market initially; nothing converted well enough — users liked the product but didn't love it
- Decided to pick one market and commit: flew to Korea, Japan, and Europe to interview users in person
- Korea was chosen — high English-learning spend (at one point 1% of GDP), dense competition, and very opinionated users
- Early user sessions revealed a simple gap: users wanted to speak more, and the existing product offered too little speaking practice
- First day in the Korean App Store: $18 in revenue — and they celebrated
The commute insight that unlocked growth
- Biggest retention blocker: users wanted to practise on buses and subways but couldn't speak aloud in public
- Counter-intuitive fix: built a silent reading/listening mode for commutes to create daily habit formation
- Once the habit was established, users returned to active speaking practice at home
- Result: conversion, retention, and every other metric spiked immediately after launching the commute mode
Product, ML, and content flywheel
- Early versions used off-the-shelf speech recognition — good enough to launch and collect training data
- That data fine-tuned their custom models, improving accuracy and enabling better product features over time
- Now have an internal ML team building conversational AI features powered by modern language models
- Content is treated as a product: A/B tested and iterated continuously, not built once and left alone
- Marketing: "AI tutor" messaging confused users; "speak 100 sentences in 20 minutes" was concrete and converted
- Key retention metric: over 50% of subscribers are still active on day 30 after starting
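The day-30 retention figure above can be made concrete with a small calculation. The summary does not say how Speak defines "still active", so this is a minimal sketch under one assumed definition: a subscriber counts as retained if they have not cancelled within 30 days of starting. The `Subscriber` type, the `day_n_retention` function, and the sample data are all hypothetical and for illustration only.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, Optional


@dataclass
class Subscriber:
    started: date
    cancelled: Optional[date] = None  # None means the subscription is still running


def day_n_retention(subscribers: Iterable[Subscriber], as_of: date, n: int = 30) -> float:
    """Share of subscribers still subscribed n days after starting.

    Assumed definition (not from the source): retained means no
    cancellation on or before day n. Subscribers who started fewer
    than n days before `as_of` are excluded, because their outcome
    is not yet known.
    """
    window = timedelta(days=n)
    eligible = [s for s in subscribers if s.started + window <= as_of]
    if not eligible:
        raise ValueError("no subscriber has reached day n yet")
    retained = sum(
        1 for s in eligible
        if s.cancelled is None or s.cancelled > s.started + window
    )
    return retained / len(eligible)


if __name__ == "__main__":
    # Made-up cohort, purely illustrative.
    cohort = [
        Subscriber(date(2024, 1, 1)),                    # still subscribed
        Subscriber(date(2024, 1, 1), date(2024, 1, 10)), # cancelled on day 9
        Subscriber(date(2024, 1, 1), date(2024, 3, 1)),  # cancelled after day 30
        Subscriber(date(2024, 2, 20)),                   # too recent to count
    ]
    rate = day_n_retention(cohort, as_of=date(2024, 3, 1))
    print(f"Day-30 retention: {rate:.0%}")
```

Excluding subscribers who have not yet reached day 30 keeps the metric from being inflated by recent sign-ups who have simply not had time to cancel.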
Expansion and long-term vision
- After proving the model in Korea, expanding to Japan and the US using the same local-first, in-person research playbook
- Ultimate goal: make fluency practice as accessible as any app, at any price point, in any language
- Technology vision extends beyond language: the voice-AI infrastructure they are building applies to any domain where humans communicate with machines