The original is one click away. Open original ↗
How large language models work and what they mean for developers
Executive overview
Pre-trained models like GPT are powerful but hallucinate confidently — a fundamental flaw of next-word prediction training. Fine-tuning and human feedback dramatically improve reliability and usefulness for specific applications. The range of viable use cases is now limited by imagination, not technology.
Building a differentiated LLM product requires three things: better prompting, rigorous evaluation, and ongoing fine-tuning from real user data.
What large language models are and why they matter now
- Language models predict the next word from prior context — a concept decades old
- Scale in parameters and training data drove a qualitative leap in capability
- GPT-3 was the inflection point where reasoning and world knowledge became apparent
- Models don't "understand" language — but they behave as if they do to succeed at prediction
The hallucination problem
- Models are trained to complete text, not to be honest — they confidently produce wrong answers
- Nat Friedman: output alternates between "spooky and kooky"
- Best current mitigation: inject factual context directly into the prompt so the model uses it rather than improvising
- Tone and personality problems (e.g. obsequiousness) are also addressable via fine-tuning
Fine-tuning and reinforcement learning from human feedback
- Fine-tuning = extra training on input/output pairs that specialise the base model for a task
- ChatGPT's viral success over the base DaVinci model came down to a fine-tuning exercise
- RLHF (reinforcement learning from human feedback): humans rank outputs; that preference signal trains a further improvement
- A 1–2B parameter model with instruction tuning and RLHF outperformed full GPT-3 in user preference
- Anthropic showed RLHF-equivalent results using a second model as evaluator — removing the human bottleneck
Building LLM products: the three core challenges
- Prototyping: prompts are highly iterative; version management matters early
- Evaluation: LLM app quality is subjective — traditional accuracy metrics don't apply
- Customisation: everyone uses the same base models; differentiation comes from fine-tuning on your own data and user feedback
The data flywheel in practice
- In-production usage generates the best fine-tuning data: edits, send/no-send decisions, response rates
- Capture implicit feedback (did the user send the email?) not just explicit thumbs up/down
- This loop compounds: better model → better product → more usage → better training data
How developers' roles will change
- GitHub Copilot is the standout application: significant fractions of code now written by LLMs
- Senior developers benefit more than juniors — they're faster at editing and reading completions
- Near term: same work done faster
- Longer term: developers shift toward product management — writing specs, not boilerplate
- Developers may be among the first professions to see large fractions of their role automated
Upcoming breakthroughs
- Context window expansion: current token limits are a hard ceiling on capability; larger windows unlock much more
- Agents and actions: LLMs that can call tools, search the web, and iterate on results — treating the model as an agent, not just a text generator
Network effects and competitive dynamics
- Barriers to training frontier models are capital and talent, not secret sauce — methods are largely published
- Feedback data gives a flywheel advantage for narrow applications, but general models can't over-specialise
- Multiple competitive models are likely to persist; no single model will dominate all use cases
AGI timelines and ethical stakes
- Expert median estimate: AGI by ~2040; some credible researchers say 2030 is plausible
- Even pre-AGI models will cause significant societal and economic disruption
- Models bake in biases from training data and the teams that built them
- Short-term risks (social disruption, misplaced trust) are as pressing as long-term existential concerns
- The benefits are large — but require deliberate, careful navigation
More like this — when you're ready for early access.
Join the waitlist for a personal account and content recommendations based on what you're working on.
No spam. Unsubscribe at any time.
You're on the list. We'll be in touch before launch.