How large language models work and what they mean for developers

Executive overview

Pre-trained models like GPT are powerful but hallucinate confidently — a fundamental flaw of next-word prediction training. Fine-tuning and human feedback dramatically improve reliability and usefulness for specific applications. The range of viable use cases is now limited by imagination, not technology.

Building a differentiated LLM product requires three things: better prompting, rigorous evaluation, and ongoing fine-tuning from real user data.

What large language models are and why they matter now

  • Language models predict the next word from prior context — a concept decades old
  • Scale in parameters and training data drove a qualitative leap in capability
  • GPT-3 was the inflection point where reasoning and world knowledge became apparent
  • Models may not "understand" language, but to succeed at prediction they must behave as if they do
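
To make "predict the next word from prior context" concrete, here is a toy sketch that counts which word follows which in a tiny corpus and predicts the most frequent follower. It is an illustration only: real LLMs use neural networks over sub-word tokens and far longer contexts, and the corpus and the function names (train_bigram, predict_next) are invented for this example.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: str) -> dict:
    """Count, for each word, which words follow it in the corpus."""
    words = corpus.lower().split()
    following = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1
    return following


def predict_next(model: dict, word: str):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical training text, chosen only to keep the counts easy to check.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" once
```

Scaling this idea up, with more parameters and more training data instead of raw counts, is what the bullets above describe.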

The hallucination problem

  • Models are trained to complete text, not to be honest — they confidently produce wrong answers
  • Nat Friedman: output alternates between "spooky and kooky"
  • Best current mitigation: inject factual context directly into the prompt so the model uses it rather than improvising
  • Tone and personality problems (e.g. obsequiousness) are also addressable via fine-tuning
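
The context-injection mitigation above can be sketched as a small prompt builder. This is a minimal illustration under assumptions: the function name build_grounded_prompt, the instruction wording, and the example facts are all hypothetical, and in practice the facts would come from a search or retrieval step over your own data.

```python
def build_grounded_prompt(question: str, facts: list) -> str:
    """Assemble a prompt that asks the model to answer only from supplied facts."""
    numbered = "\n".join(f"[{i}] {fact}" for i, fact in enumerate(facts, start=1))
    return (
        "Answer the question using only the facts below. "
        "If the facts do not contain the answer, reply \"I don't know\".\n\n"
        f"Facts:\n{numbered}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical facts standing in for retrieved documents.
prompt = build_grounded_prompt(
    "When does the return window close?",
    [
        "Orders can be returned within 30 days of delivery.",
        "Refunds are issued to the original payment method.",
    ],
)
print(prompt)
```

Giving the model an explicit way to decline ("I don't know") is one common design choice for discouraging improvised answers; it reduces hallucination but does not eliminate it.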

Fine-tuning and reinforcement learning from human feedback

  • Fine-tuning = extra training on input/output pairs that specialise the base model for a task
  • ChatGPT's viral success over the base davinci model came down largely to fine-tuning
  • RLHF (reinforcement learning from human feedback): humans rank outputs; that preference signal trains a further improvement
  • A 1.3B-parameter model with instruction tuning and RLHF (InstructGPT) was preferred by human raters over the full 175B-parameter GPT-3
  • Anthropic showed comparable results using a second model as the evaluator in place of human rankers, easing the human bottleneck
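
The "humans rank outputs" step can be sketched in two parts: turning a ranking into preferred/rejected pairs, and scoring a reward model on each pair. The summary does not say which loss is used, so the pairwise logistic (Bradley-Terry style) loss below is an assumption, chosen because it is a common formulation for reward models. The output labels and reward values are invented for illustration.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_outputs: list) -> list:
    """Turn a best-to-worst ranking into (preferred, rejected) pairs."""
    # combinations() keeps input order, so the first item of each pair ranks higher.
    return list(combinations(ranked_outputs, 2))


def preference_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Pairwise logistic loss: -log(sigmoid(r_preferred - r_rejected))."""
    margin = reward_preferred - reward_rejected
    return math.log1p(math.exp(-margin))


# Hypothetical ranking of three model outputs by a human labeller, best first.
pairs = ranking_to_pairs(["answer_a", "answer_b", "answer_c"])
print(pairs)

# The loss is small when the reward model agrees with the human, large when it disagrees.
print(preference_loss(2.0, 0.0))
print(preference_loss(0.0, 2.0))
```

A reward model trained this way then supplies the signal for the reinforcement learning step; that step is not shown here.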

Building LLM products: the three core challenges

  • Prototyping: prompts are highly iterative; version management matters early
  • Evaluation: LLM app quality is subjective — traditional accuracy metrics don't apply
  • Customisation: everyone uses the same base models; differentiation comes from fine-tuning on your own data and user feedback
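
One way to approach the first two challenges is to give every prompt template a content-derived version id and to compare versions by side-by-side preference instead of an accuracy score. The sketch below assumes this approach; the PromptVersion class, the win_rate helper, the templates, and the reviewer judgments are all hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptVersion:
    name: str
    template: str

    @property
    def version_id(self) -> str:
        """Short content hash, so any edit to the template yields a new id."""
        return hashlib.sha256(self.template.encode("utf-8")).hexdigest()[:8]


def win_rate(judgments: list, candidate: str) -> float:
    """Fraction of side-by-side comparisons won by `candidate`."""
    if not judgments:
        return 0.0
    return judgments.count(candidate) / len(judgments)


v1 = PromptVersion("summarise", "Summarise this email:\n{email}")
v2 = PromptVersion("summarise", "Summarise this email in two sentences:\n{email}")

# Hypothetical reviewer picks, one per test input, recorded by version id.
judgments = [v2.version_id, v2.version_id, v1.version_id, v2.version_id]
print(v1.version_id, v2.version_id)
print(win_rate(judgments, v2.version_id))  # v2 won 3 of 4 comparisons
```

Storing the version id alongside every logged output makes it possible to trace a good or bad result back to the exact prompt that produced it.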

The data flywheel in practice

  • In-production usage generates the best fine-tuning data: edits, send/no-send decisions, response rates
  • Capture implicit feedback (did the user send the email?) not just explicit thumbs up/down
  • This loop compounds: better model → better product → more usage → better training data
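
The flywheel can be sketched as a conversion from logged usage events to fine-tuning examples. This is a minimal illustration for an email-drafting product: the DraftEvent fields, the rule of keeping only sent emails, and the sample events are assumptions, not a prescribed schema.

```python
import json
from dataclasses import dataclass


@dataclass
class DraftEvent:
    prompt: str        # what the user asked for
    model_draft: str   # what the model produced
    final_text: str    # what the user ended up with after editing
    sent: bool         # implicit feedback: did the user send it?


def to_training_examples(events: list) -> list:
    """Keep sent emails and use the user's final text as the target output."""
    examples = []
    for event in events:
        if not event.sent:
            continue  # unsent drafts are skipped in this sketch
        examples.append({
            "input": event.prompt,
            "output": event.final_text,
            "edited": event.final_text != event.model_draft,
        })
    return examples


# Hypothetical production log.
events = [
    DraftEvent("Decline the meeting politely", "No thanks.", "Thanks, but I can't make it.", True),
    DraftEvent("Confirm Friday's call", "Confirming Friday at 10am.", "Confirming Friday at 10am.", True),
    DraftEvent("Ask for a discount", "Give me a discount.", "Give me a discount.", False),
]

examples = to_training_examples(events)
for example in examples:
    print(json.dumps(example))
```

The "edited" flag separates drafts the user accepted as-is from drafts they had to fix, which is useful both as training signal and as a quality metric.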

How developers' roles will change

  • GitHub Copilot is the standout application: significant fractions of code now written by LLMs
  • Senior developers benefit more than juniors — they're faster at editing and reading completions
  • Near term: same work done faster
  • Longer term: developers shift toward product management — writing specs, not boilerplate
  • Developers may be among the first professions to see large fractions of their role automated

Upcoming breakthroughs

  • Context window expansion: current token limits cap how much text a model can consider at once; larger windows let longer documents and histories fit in a single prompt
  • Agents and actions: LLMs that can call tools, search the web, and iterate on results — treating the model as an agent, not just a text generator
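
The agent pattern (call a tool, read the result, iterate) can be sketched as a short loop. Everything here is a stand-in: scripted_model replaces a real LLM call, the "CALL" and "FINAL" text protocol is invented for this example, and the calculator is a toy tool that only adds integers.

```python
def calculator(expression: str) -> str:
    """Toy tool: add whitespace-separated integers, e.g. '2 3 4' gives '9'."""
    return str(sum(int(part) for part in expression.split()))


TOOLS = {"calculator": calculator}


def scripted_model(history: list) -> str:
    """Stand-in for an LLM call: requests a tool once, then answers."""
    observations = [line for line in history if line.startswith("OBSERVATION:")]
    if not observations:
        return "CALL calculator 19 23"
    return "FINAL " + observations[-1][len("OBSERVATION:"):].strip()


def run_agent(task: str, model, max_steps: int = 5) -> str:
    """Loop: ask the model, run any requested tool, feed the result back."""
    history = [f"TASK: {task}"]
    for _ in range(max_steps):
        reply = model(history)
        if reply.startswith("FINAL "):
            return reply[len("FINAL "):]
        _, tool_name, argument = reply.split(" ", 2)
        result = TOOLS[tool_name](argument)
        history.append(f"OBSERVATION: {result}")
    return "gave up: step limit reached"


print(run_agent("What is 19 + 23?", scripted_model))
```

The step limit matters in practice: without it, a model that keeps requesting tools would loop forever. The growing history is also where the context window limit bites, since every observation has to fit in the next prompt.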

Network effects and competitive dynamics

  • Barriers to training frontier models are capital and talent, not secret sauce — methods are largely published
  • Feedback data gives a flywheel advantage for narrow applications, but general-purpose models can't afford to over-specialise
  • Multiple competitive models are likely to persist; no single model will dominate all use cases

AGI timelines and ethical stakes

  • Expert median estimate: AGI by ~2040; some credible researchers say 2030 is plausible
  • Even pre-AGI models will cause significant societal and economic disruption
  • Models bake in biases from training data and the teams that built them
  • Short-term risks (social disruption, misplaced trust) are as pressing as long-term existential concerns
  • The benefits are large — but require deliberate, careful navigation
