How large language models work and what they mean for developers

Executive overview

Pre-trained models like GPT are powerful but hallucinate confidently — a fundamental flaw of next-word prediction training. Fine-tuning and human feedback dramatically improve reliability and usefulness for specific applications. The range of viable use cases is now limited by imagination, not technology.

Building a differentiated LLM product requires three things: better prompting, rigorous evaluation, and ongoing fine-tuning from real user data.

What large language models are and why they matter now

Language models predict the next word from prior context — a concept decades old
Scale in parameters and training data drove a qualitative leap in capability
GPT-3 was the inflection point where reasoning and world knowledge became apparent
Models don't "understand" language — but they behave as if they do to succeed at prediction

The hallucination problem

Models are trained to complete text, not to be honest — they confidently produce wrong answers
Nat Friedman: output alternates between "spooky and kooky"
Best current mitigation: inject factual context directly into the prompt so the model uses it rather than improvising
Tone and personality problems (e.g. obsequiousness) are also addressable via fine-tuning

Fine-tuning and reinforcement learning from human feedback

Fine-tuning = extra training on input/output pairs that specialise the base model for a task
ChatGPT's viral success over the base DaVinci model came down to a fine-tuning exercise
RLHF (reinforcement learning from human feedback): humans rank outputs; that preference signal trains a further improvement
A 1–2B parameter model with instruction tuning and RLHF outperformed full GPT-3 in user preference
Anthropic showed RLHF-equivalent results using a second model as evaluator — removing the human bottleneck

Building LLM products: the three core challenges

Prototyping: prompts are highly iterative; version management matters early
Evaluation: LLM app quality is subjective — traditional accuracy metrics don't apply
Customisation: everyone uses the same base models; differentiation comes from fine-tuning on your own data and user feedback

The data flywheel in practice

In-production usage generates the best fine-tuning data: edits, send/no-send decisions, response rates
Capture implicit feedback (did the user send the email?) not just explicit thumbs up/down
This loop compounds: better model → better product → more usage → better training data

How developers' roles will change

GitHub Copilot is the standout application: significant fractions of code now written by LLMs
Senior developers benefit more than juniors — they're faster at editing and reading completions
Near term: same work done faster
Longer term: developers shift toward product management — writing specs, not boilerplate
Developers may be among the first professions to see large fractions of their role automated

Upcoming breakthroughs

Context window expansion: current token limits are a hard ceiling on capability; larger windows unlock much more
Agents and actions: LLMs that can call tools, search the web, and iterate on results — treating the model as an agent, not just a text generator

Network effects and competitive dynamics

Barriers to training frontier models are capital and talent, not secret sauce — methods are largely published
Feedback data gives a flywheel advantage for narrow applications, but general models can't over-specialise
Multiple competitive models are likely to persist; no single model will dominate all use cases

AGI timelines and ethical stakes

Expert median estimate: AGI by ~2040; some credible researchers say 2030 is plausible
Even pre-AGI models will cause significant societal and economic disruption
Models bake in biases from training data and the teams that built them
Short-term risks (social disruption, misplaced trust) are as pressing as long-term existential concerns
The benefits are large — but require deliberate, careful navigation

Building $10,000 software MVPs with AI in under an hour

Brett Malinowski May 14, 2026

AI tools & automation 9

MVP & prototyping 8

Automation & tools 6

One person with Claude Code can replace a three-person agency team
Partner with niche creators who already have audience and distribution
Use pre-built components for payments and chat — don't build infrastructure from scratch

AI strategy & adoption

YouTube

How to actually make money with AI: five brutal truths

Dan Martell May 14, 2026

AI strategy & adoption 9

Business models 8

Automation & tools 5

AI is a hammer — you still need to find the nail
Validate with manual "Wizard of Oz" delivery before automating anything
Future orgs are workflow-based; humans own outcomes, agents own tasks