AI engineering fundamentals: what actually improves AI products

Executive overview

Most companies chase the wrong things when building AI products — new models, frameworks, vector databases. The real gains come from talking to users, preparing better data, and writing better prompts.

The gap between what teams obsess over and what actually moves the needle is the core problem in AI product development today.

What people think improves AI apps vs. what actually does

Minimal impact: staying current with AI news, adopting new frameworks, agonizing over vector databases
Highest impact: talking to users, building reliable platforms, preparing better data, optimizing end-to-end workflows
Before adopting any new technology, ask: how much improvement does it add, and how hard is it to switch out?

Pre-training, post-training, and fine-tuning explained

Pre-training encodes statistical patterns across massive text datasets; the model learns to predict the next token
Language modeling traces back to Shannon's 1951 paper "Prediction and Entropy of Printed English"; Sherlock Holmes applies the same statistical logic (letter frequencies) to crack a cipher in "The Adventure of the Dancing Men"
Post-training (including fine-tuning) adjusts model weights for specific use cases using supervised learning or reinforcement learning
Pre-training is increasingly constrained by the supply of new training data; most frontier lab investment now goes into post-training
Supervised fine-tuning uses human-labeled demonstrations; distillation trains smaller models to emulate larger ones
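
To make "the model learns to predict the next token" concrete, here is a minimal sketch of a toy bigram model that only counts which word follows which. It is an illustration under heavy assumptions, not how production models are built: real pre-training uses neural networks over subword tokens, and the corpus and the function names (`train_bigram`, `next_token_probs`, `predict_next`) are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token_probs(counts, prev):
    """Turn raw follow-up counts into a probability distribution."""
    followers = counts.get(prev)
    if not followers:
        return {}
    total = sum(followers.values())
    return {tok: n / total for tok, n in followers.items()}


def predict_next(counts, prev):
    """Return the most likely next token, or None if prev was never seen."""
    probs = next_token_probs(counts, prev)
    return max(probs, key=probs.get) if probs else None


# Toy corpus (an assumption for illustration only).
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)

print(next_token_probs(model, "the"))  # "cat" follows "the" twice, "mat" once
print(predict_next(model, "the"))      # prints: cat
```

The statistical pattern here is a table of counts; a pre-trained language model encodes far richer patterns in its weights, but the training objective is the same kind of next-token guess.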

Reinforcement learning and RLHF

Reinforcement learning trains models by rewarding better outputs — signal can come from humans, AI judges, or verifiable answers
Humans are better at comparisons than absolute scores — RLHF exploits this by ranking response pairs
A reward model is trained on human preferences, then used to score and guide the base model
Verifiable rewards (e.g., math problems with known answers) are increasingly used to reduce reliance on human labelers
Data labeling companies face structural risk: few customers, many suppliers, limited pricing leverage
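
The two reward signals described in this section can be sketched in a few lines. The first function is the pairwise (Bradley-Terry style) loss commonly used to train a reward model from ranked response pairs; the second is a verifiable reward that simply checks an answer against a known one. Both are simplified sketches: the function names are hypothetical, the rewards are plain floats rather than outputs of a neural network, and exact string matching is an assumed stand-in for a real answer checker.

```python
import math


def pairwise_preference_loss(reward_chosen, reward_rejected):
    """Loss -log(sigmoid(r_chosen - r_rejected)).

    Small when the human-preferred response gets the higher reward,
    large when the reward model ranks the pair the wrong way round.
    """
    margin = reward_chosen - reward_rejected
    # log(1 + exp(-margin)), split into two cases to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def verifiable_reward(model_answer, known_answer):
    """Return 1.0 if the answer matches the known answer, else 0.0."""
    return 1.0 if model_answer.strip() == known_answer.strip() else 0.0


# Reward model agrees with the human ranking: low loss.
print(pairwise_preference_loss(2.0, 0.0))
# Reward model disagrees with the human ranking: high loss.
print(pairwise_preference_loss(0.0, 2.0))
# Math problem with a known answer: no human labeler needed.
print(verifiable_reward(" 42 ", "42"))  # prints: 1.0
```

Note that the loss only needs to know which response was preferred, not an absolute score for either one, which is why comparisons are enough to train the reward model.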

RAG (retrieval augmented generation)

RAG provides the model with relevant context at query time so it can answer questions it wasn't trained on
Retrieval quality depends almost entirely on data preparation, not on which vector database you choose
Chunk size matters: too large reduces the diversity of retrieved passages; too small loses context. Tune the size empirically against your own retrieval results rather than assuming a default
Add metadata, summaries, and hypothetical questions to chunks to improve retrieval relevance
Rewriting source content into Q&A format can yield large performance gains
Documentation written for humans often lacks context AI needs — annotate for AI explicitly
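
The data-preparation points above (chunk size, overlap, metadata, hypothetical questions) can be sketched without any vector database. This is a minimal illustration, assuming word-based chunks and a keyword-overlap score in place of embeddings; the helper names, the sample text, and the default sizes are all hypothetical.

```python
import re


def tokenize(text):
    """Lowercase word tokens; a stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_words(text, chunk_size, overlap):
    """Split text into windows of chunk_size words that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def build_index(title, text, chunk_size=8, overlap=2, questions=None):
    """Attach metadata (title, position, hypothetical questions) to each chunk."""
    questions = questions or {}
    return [
        {"title": title, "position": i, "text": chunk,
         "questions": questions.get(i, [])}
        for i, chunk in enumerate(chunk_words(text, chunk_size, overlap))
    ]


def retrieve(index, query, top_k=1):
    """Rank chunks by how many query words appear in the text or its metadata."""
    query_words = set(tokenize(query))

    def score(entry):
        searchable = " ".join([entry["title"], entry["text"], *entry["questions"]])
        return len(query_words & set(tokenize(searchable)))

    return sorted(index, key=score, reverse=True)[:top_k]


text = ("Refunds are issued within five business days. Contact support to "
        "start a return. Shipping is free on orders over fifty dollars.")
# Hypothetical question attached to the chunk at position 2 (the shipping text).
index = build_index("Store policy", text,
                    questions={2: ["What is the delivery cost?"]})
best = retrieve(index, "delivery cost")[0]
print(best["position"], best["text"])
```

The query "delivery cost" shares no words with the source text, which only says "shipping"; the chunk is found because of the hypothetical question attached to it. That is the sense in which preparation, not the storage engine, drives retrieval quality in this sketch.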

Evals: when to invest and when to skip

Evals guide product development by surfacing which user segments or features underperform
Not every feature needs evals — weigh the engineering cost against the expected performance gain
At scale or where failures are catastrophic, evals are non-negotiable
The number of evals should match coverage needs, not a fixed target — some complex products need hundreds
For agentic pipelines, evaluate each step (e.g., search query diversity, result relevance, answer completeness)
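
The per-step evaluation of an agentic pipeline mentioned above can be sketched as three small metrics, one per step. These are deliberately crude assumptions for illustration: diversity is measured as word-level Jaccard distance, relevance as precision against hand-labeled document ids, and completeness as a substring check. The function names and sample data are hypothetical, and a real product might use human or AI judges instead.

```python
def query_diversity(queries):
    """Average pairwise Jaccard distance between queries (0 = identical, 1 = disjoint)."""
    word_sets = [set(q.lower().split()) for q in queries]
    pairs = [(a, b) for i, a in enumerate(word_sets) for b in word_sets[i + 1:]]
    distances = [1 - len(a & b) / len(a | b) for a, b in pairs if a | b]
    return sum(distances) / len(distances) if distances else 0.0


def result_relevance(retrieved_ids, relevant_ids):
    """Fraction of retrieved results that were labeled relevant (precision)."""
    if not retrieved_ids:
        return 0.0
    return sum(1 for r in retrieved_ids if r in relevant_ids) / len(retrieved_ids)


def answer_completeness(answer, required_points):
    """Fraction of required points mentioned in the answer (substring check)."""
    if not required_points:
        return 1.0
    text = answer.lower()
    return sum(1 for p in required_points if p.lower() in text) / len(required_points)


# One hypothetical trace through a search-then-answer agent.
print(query_diversity(["refund policy", "shipping cost"]))           # prints: 1.0
print(result_relevance(["d1", "d2", "d3", "d4"], {"d1", "d3"}))      # prints: 0.5
print(answer_completeness("Refunds take five days; shipping is free.",
                          ["five days", "free", "support"]))
```

Scoring each step separately shows where a bad final answer came from: repetitive queries, irrelevant results, or an answer that ignored what was retrieved.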

AI adoption inside companies

Internal AI tooling (coding agents, internal chatbots, knowledge bases) shows mixed results on productivity
Managers tend to prefer more headcount, while executives tend to prefer AI tools; the two groups face different incentives
Productivity gains are real but hard to measure — number of PRs merged is not a valid proxy
In one randomized trial (n=30–40 engineers), highest performers gained most from AI tools; lowest performers showed least improvement
Some senior engineers resist AI tools due to high standards — AI-generated code doesn't meet their bar
Engineering orgs are restructuring: senior engineers shifting to PR review and process design; junior engineers producing more code

ML engineer vs. AI engineer

ML engineers build models from scratch
AI engineers use existing models as a service to build products — much lower barrier to entry
Knowing ML internals still helps, but is no longer required to ship AI products

System thinking as the durable skill

AI is good at well-defined, isolated tasks; struggles with debugging across interconnected components
The value of CS education is system thinking — understanding how components interact — not syntax
AI assistants give engineers the confidence to try unfamiliar tools, but cannot yet diagnose root causes that span multiple systems
Problem-solving will not be automated; as AI handles more tasks, the problems themselves get bigger

Where things are heading in the next few years

Organizational silos between engineering, product, and marketing will blur — evals require all three
Team restructuring underway: fewer junior engineers producing code, more senior engineers reviewing
Base model capability gains are slowing; most improvement will come from post-training and the application layer
Test-time compute (spending more inference compute on reasoning, sampling multiple answers) boosts perceived performance without changing the base model
Multimodal (audio, video) is exciting but harder than text — voice involves latency, interruption detection, and regulatory questions
Idea generation is the new bottleneck — specialization has left many people unable to think in big-picture use cases
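
The test-time compute point in this section (sampling multiple answers) can be illustrated with majority voting over samples. This is a minimal sketch, not any vendor's method: `noisy_model` is a hypothetical stand-in for a real model call, assumed to be right 60% of the time, and the question and the answer "42" are placeholders.

```python
import random
from collections import Counter


def majority_vote(answers):
    """Return the most common answer among the samples."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    return Counter(answers).most_common(1)[0][0]


def noisy_model(question, rng):
    """Hypothetical stand-in for a model call: correct 60% of the time."""
    return "42" if rng.random() < 0.6 else rng.choice(["41", "43", "44"])


def answer_with_samples(question, n_samples, rng):
    """Spend more inference compute: sample n answers, then vote."""
    return majority_vote([noisy_model(question, rng) for _ in range(n_samples)])


def accuracy(n_samples, trials=500, seed=0):
    """Share of trials in which the voted answer is the correct one."""
    rng = random.Random(seed)
    hits = sum(answer_with_samples("q", n_samples, rng) == "42"
               for _ in range(trials))
    return hits / trials


print("1 sample per question:  ", accuracy(1))
print("15 samples per question:", accuracy(15))
```

The simulated model never changes between the two runs; only the number of samples per question does. Accuracy rises because the wrong answers are spread across several values while the right one repeats, at the price of more inference calls per question.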

Finding ideas worth building

For a week, track everything that frustrates you at work
Ask: could this be done differently so it stops being frustrating?
Build a micro-tool around that friction — AI makes this faster than ever
Bottom-up hackathons surface ideas, but people often freeze without a framework for generating them
