AI engineering fundamentals: what actually improves AI products
Executive overview
Most companies chase the wrong things when building AI products — new models, frameworks, vector databases. The real gains come from talking to users, preparing better data, and writing better prompts.
The gap between what teams obsess over and what actually moves the needle is the core problem in AI product development today.
What people think improves AI apps vs. what actually does
- Staying current with AI news, adopting new frameworks, agonizing over vector databases — minimal impact
- Talking to users, building reliable platforms, preparing better data, optimizing end-to-end workflows — highest impact
- Before adopting any new technology, ask: how much improvement does it add, and how hard is it to switch out?
Pre-training, post-training, and fine-tuning explained
- Pre-training encodes statistical patterns across massive text datasets; the model learns to predict the next token
- Language modeling traces back to Shannon's 1951 paper on the entropy of printed English; Sherlock Holmes used the same statistical logic (letter frequencies) to decode a cipher
- Post-training (including fine-tuning) adjusts model weights for specific use cases using supervised learning or reinforcement learning
- Pre-training is increasingly limited by the supply of available data; most frontier lab investment now goes into post-training
- Supervised fine-tuning uses human-labeled demonstrations; distillation trains smaller models to emulate larger ones
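
To make "predict the next token" concrete, here is a minimal sketch of the idea at toy scale: a bigram model that counts which word follows which and turns the counts into probabilities. Real pre-training uses neural networks over subword tokens and far larger datasets; the corpus and the function names `train_bigram` and `next_token_probs` are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token_probs(counts, prev):
    """Turn the raw follow-counts for `prev` into a probability distribution."""
    followers = counts.get(prev)
    if not followers:
        return {}  # `prev` was never seen with a following token
    total = sum(followers.values())
    return {tok: n / total for tok, n in followers.items()}


# Toy corpus, invented for illustration only.
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)
probs = next_token_probs(model, "the")
most_likely = max(probs, key=probs.get)
```

In this toy corpus "the" is followed by "cat" twice and "mat" once, so the model assigns "cat" the higher probability. The statistical pattern is the whole model; nothing about cats or mats is understood.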
Reinforcement learning and RLHF
- Reinforcement learning trains models by rewarding better outputs — signal can come from humans, AI judges, or verifiable answers
- Humans are better at comparisons than absolute scores — RLHF exploits this by ranking response pairs
- A reward model is trained on human preferences, then used to score and guide the base model
- Verifiable rewards (e.g., math problems with known answers) are increasingly used to reduce reliance on human labelers
- Data labeling companies face structural risk: few customers, many suppliers, limited pricing leverage
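
The two reward signals described above can be sketched in a few lines. The first function is a common formulation of the pairwise preference loss used to train reward models (a Bradley-Terry style objective); the source does not say which loss any particular lab uses, so treat it as an assumption. The second is a hypothetical exact-match check standing in for a verifiable reward.

```python
import math


def pairwise_preference_loss(score_chosen, score_rejected):
    """-log(sigmoid(chosen - rejected)): small when the preferred response scores higher."""
    margin = score_chosen - score_rejected
    # log1p(exp(-margin)) is algebraically equal to -log(sigmoid(margin)).
    return math.log1p(math.exp(-margin))


def verifiable_reward(model_answer, known_answer):
    """Return 1.0 when the answer matches the known answer exactly, else 0.0."""
    return 1.0 if model_answer.strip() == known_answer.strip() else 0.0


# The reward model agrees with the human ranking: low loss.
agree = pairwise_preference_loss(score_chosen=2.0, score_rejected=0.0)
# The reward model disagrees with the human ranking: high loss.
disagree = pairwise_preference_loss(score_chosen=0.0, score_rejected=2.0)
```

The loss only depends on the difference between the two scores, which is why comparison data is enough: labelers never have to assign an absolute score.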
RAG (retrieval-augmented generation)
- RAG provides the model with relevant context at query time so it can answer questions it wasn't trained on
- Retrieval quality depends almost entirely on data preparation, not on which vector database you choose
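- Chunk size matters: chunks that are too large leave room for fewer retrieved passages, which reduces diversity; chunks that are too small lose surrounding context. Tune the size against your own data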
- Add metadata, summaries, and hypothetical questions to chunks to improve retrieval relevance
- Rewriting source content into Q&A format can yield large performance gains
- Documentation written for humans often lacks context AI needs — annotate for AI explicitly
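
The data-preparation steps above (chunking, then attaching metadata, summaries, and hypothetical questions) can be sketched as follows. This is a minimal illustration, assuming word-based chunks with a fixed overlap; the function names, the record fields, and the sample values are hypothetical, and in practice the summary and questions would typically be written by a person or generated by a model.

```python
def chunk_words(text, chunk_size, overlap):
    """Split text into word-based chunks that share `overlap` words with the next chunk."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def enrich_chunk(chunk, source, summary, questions):
    """Attach metadata so retrieval can match on more than the raw chunk text."""
    return {
        "text": chunk,
        "source": source,
        "summary": summary,
        "hypothetical_questions": list(questions),
    }


chunks = chunk_words("a b c d e f g", chunk_size=3, overlap=1)
record = enrich_chunk(
    chunks[0],
    source="handbook.md",  # hypothetical file name
    summary="First three letters of the sample text.",
    questions=["What does the sample text start with?"],
)
```

The overlap is one way to soften the chunk-size trade-off: each chunk stays small, but a sentence that straddles a boundary still appears whole in at least one chunk.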
Evals: when to invest and when to skip
- Evals guide product development by surfacing which user segments or features underperform
- Not every feature needs evals — weigh the engineering cost against the expected performance gain
- At scale or where failures are catastrophic, evals are non-negotiable
- The number of evals should match coverage needs, not a fixed target — some complex products need hundreds
- For agentic pipelines, evaluate each step (e.g., search query diversity, result relevance, answer completeness)
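
Step-level evaluation of an agentic pipeline can be sketched with one small metric per step. The three metrics below are deliberately simple, hypothetical definitions chosen for illustration (distinct-query ratio, precision against a labeled relevant set, and substring coverage of required points); the source names the steps but not how to score them.

```python
def query_diversity(queries):
    """Fraction of distinct queries after lowercasing and collapsing whitespace."""
    if not queries:
        return 0.0
    normalized = {" ".join(q.lower().split()) for q in queries}
    return len(normalized) / len(queries)


def result_relevance(retrieved_ids, relevant_ids):
    """Fraction of retrieved results that appear in a labeled relevant set."""
    if not retrieved_ids:
        return 0.0
    relevant = set(relevant_ids)
    return sum(1 for r in retrieved_ids if r in relevant) / len(retrieved_ids)


def answer_completeness(answer, required_points):
    """Fraction of required points found in the answer (case-insensitive substring match)."""
    if not required_points:
        return 1.0
    text = answer.lower()
    return sum(1 for p in required_points if p.lower() in text) / len(required_points)


# Invented example data for a single pipeline run.
report = {
    "query_diversity": query_diversity(["Pricing plans", "pricing  plans", "refund policy"]),
    "result_relevance": result_relevance(["a", "b", "c", "d"], ["a", "c"]),
    "answer_completeness": answer_completeness(
        "Refunds take 5 days and require a receipt.",
        ["5 days", "receipt", "store credit"],
    ),
}
```

Scoring each step separately shows where a bad final answer came from: duplicate queries, irrelevant results, or an answer that dropped a required point.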
AI adoption inside companies
- Internal AI tooling (coding agents, internal chatbots, knowledge bases) shows mixed results on productivity
- Managers prefer headcount over AI tools; executives prefer AI tools. The two groups face different incentives
- Productivity gains are real but hard to measure — number of PRs merged is not a valid proxy
- In one randomized trial (roughly 30 to 40 engineers), the highest performers gained the most from AI tools and the lowest performers improved the least
- Some senior engineers resist AI tools due to high standards — AI-generated code doesn't meet their bar
- Engineering orgs are restructuring: senior engineers shifting to PR review and process design; junior engineers producing more code
ML engineer vs. AI engineer
- ML engineers build models from scratch
- AI engineers use existing models as a service to build products — much lower barrier to entry
- Knowing ML internals still helps, but is no longer required to ship AI products
System thinking as the durable skill
- AI is good at well-defined, isolated tasks; it struggles with debugging across interconnected components
- The value of CS education is system thinking — understanding how components interact — not syntax
- AI tools give engineers the confidence to try unfamiliar tools, but they cannot yet diagnose root causes that span multiple systems
- Problem-solving will not be automated; as AI handles more tasks, the problems themselves get bigger
Where things are heading in the next few years
- Organizational silos between engineering, product, and marketing will blur — evals require all three
- Team restructuring underway: fewer junior engineers producing code, more senior engineers reviewing
- Base model capability gains are slowing; most improvement will come from post-training and the application layer
- Test-time compute (spending more inference compute on reasoning, sampling multiple answers) boosts perceived performance without changing the base model
- Multimodal AI (audio, video) is exciting but harder than text: voice involves latency, interruption detection, and regulatory questions
- Idea generation is the new bottleneck — specialization has left many people unable to think in big-picture use cases
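
The test-time compute point in the list above (sampling multiple answers) can be sketched as a majority vote over several generations. This is one simple form of the idea, not the only one; `generate` is a placeholder for whatever model call an application uses, and `fake_generate` is a hypothetical stub that replays canned answers so the sketch runs without a model.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most common answer among the samples and its share of the votes."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(a.strip() for a in answers)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(answers)


def answer_with_sampling(generate, prompt, n_samples):
    """Call the generation function n_samples times and vote over the results."""
    samples = [generate(prompt) for _ in range(n_samples)]
    return majority_vote(samples)


# Hypothetical stand-in for a real model call: it replays canned answers.
canned = iter(["12", "15", "12", "12", "9"])


def fake_generate(prompt):
    return next(canned)


winner, share = answer_with_sampling(fake_generate, "What is 3 * 4?", n_samples=5)
```

The base model is untouched; the extra cost is five inference calls instead of one, which is the trade-off test-time compute makes.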
Finding ideas worth building
- For a week, track everything that frustrates you at work
- Ask: could this be done differently so it stops being frustrating?
- Build a micro-tool around that friction — AI makes this faster than ever
- Bottom-up hackathons surface ideas, but people often freeze without a framework for generating them