Soft skills, synthetic data, and building AI at the frontier

Executive overview

AI models are rapidly automating hard technical skills — coding, writing, data analysis — while the distinctly human capabilities of creativity, listening, prioritisation, and people management become more valuable. Karina Nguyen, a researcher who helped build Claude 3 at Anthropic and Canvas and Tasks at OpenAI, argues the bottleneck in both AI progress and product development is no longer raw intelligence but the soft skills required to direct it.

Model training is more art than science — data quality, behavioural design, and robust evals matter more than raw scale.

How model training actually works

The pre-training "data wall" is a myth; the real scaling frontier is post-training, via reinforcement learning over an effectively unlimited supply of tasks.
Models learn to compress and model the world, not just memorise text — pre-trained models are surprisingly diverse because they absorb the full range of human expression.
Benchmarks are now saturating (e.g. scores on GPQA, a set of PhD-level science questions, are approaching 60–70%), so the bottleneck has shifted to better evaluations, not more data.
Training is debugged like software: conflicting data signals cause model confusion (e.g. a model told it has no body but also taught how to set alarms).
Small distilled models are now outperforming older large models — cost of intelligence is dropping fast, broadening access.
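
The alarm example is a contradiction in the training data, and it can be hunted like a bug: look for a capability that the training set teaches both ways. This is a minimal sketch, assuming a hypothetical format in which each example carries a `capability` tag and an intended `behaviour` (`comply` or `refuse`); producing those tags for a real dataset would itself take a classifier or human review.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Example:
    """One supervised example in a hypothetical tagged format."""
    prompt: str
    capability: str  # hypothetical tag for the skill being taught, e.g. "set_alarm"
    behaviour: str   # what the target response does: "comply" or "refuse"

def find_conflicts(examples: list[Example]) -> list[str]:
    """Return capabilities that the data teaches in contradictory ways."""
    behaviours_by_capability = defaultdict(set)
    for ex in examples:
        behaviours_by_capability[ex.capability].add(ex.behaviour)
    # Conflicted = the same capability is taught as both "comply" and "refuse".
    return sorted(cap for cap, seen in behaviours_by_capability.items() if len(seen) > 1)

# Illustrative data mirroring the alarm example.
data = [
    Example("Can you set an alarm for 7 a.m.?", "set_alarm", "refuse"),  # persona data: "I have no body"
    Example("Wake me up at 7 a.m. tomorrow.", "set_alarm", "comply"),    # tool-use training
    Example("What's the capital of France?", "answer_fact", "comply"),
]
print(find_conflicts(data))  # ['set_alarm']
```

The check only flags where to look; a person still has to decide which behaviour is the intended one.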

Synthetic data and rapid model iteration

Synthetic data means using capable models (e.g. o1) to generate training tasks, then using those tasks to train new behaviours. This enables fast, cheap iteration without human labellers for every case.
Expert human data still matters for highly specialised knowledge (chemistry, biology) that models can't self-generate reliably.
Canvas was built around three synthetically trained behaviours: when to trigger the canvas view, how to make targeted document edits, and how to add comments to specific sections.
Tasks was built similarly — the core design problem was specifying a JSON schema: what information must the model extract from a reminder prompt, and in what format.
Both products went from zero to one in two to five months using this approach.
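
To make the Tasks design problem concrete, here is a minimal sketch of a reminder schema and a validator for model output. The field names (`title`, `due_at`, `recurrence`) and the helper `validate_task` are hypothetical illustrations, not OpenAI's actual Tasks schema; the point is that once the schema is fixed, every synthetic training example and every model output can be checked against it mechanically.

```python
import json
from datetime import datetime

# Hypothetical schema: field name -> accepted Python type(s) after json.loads.
TASK_SCHEMA = {
    "title": str,                     # what to remind the user about
    "due_at": str,                    # ISO 8601 timestamp, e.g. "2025-01-20T19:00:00"
    "recurrence": (str, type(None)),  # e.g. "daily", or null for a one-off task
}

def validate_task(raw: str) -> dict:
    """Parse a model's JSON output and check it against TASK_SCHEMA.

    Raises ValueError (json.JSONDecodeError is a subclass) on malformed JSON,
    missing fields, wrong types, or an unparseable timestamp.
    """
    task = json.loads(raw)
    if not isinstance(task, dict):
        raise ValueError("expected a JSON object")
    for name, expected in TASK_SCHEMA.items():
        if name not in task:
            raise ValueError(f"missing field: {name}")
        if not isinstance(task[name], expected):
            raise ValueError(f"wrong type for {name}: {type(task[name]).__name__}")
    datetime.fromisoformat(task["due_at"])  # raises ValueError if the time is malformed
    return task

model_output = '{"title": "Call mum", "due_at": "2025-01-20T19:00:00", "recurrence": null}'
print(validate_task(model_output)["title"])  # Call mum
```

Because the validator is deterministic, the same function can filter synthetic training examples before training and score model outputs afterwards.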

Evals: the new product spec

Product development is shifting from "write a PRD, review the build" to "define what correct looks like, then measure against it continuously."
Evals range from deterministic pass/fail checks (did the model schedule a task at 7 p.m. when asked?) to human preference win-rates across model versions.
A robust eval is one where a prompted baseline scores lowest — if fine-tuned models can't beat a plain prompt, the eval isn't discriminating enough.
Product managers now create labelled spreadsheets of current vs ideal model behaviour; these double as training data when fed to a capable model.
The risk in optimising for one behaviour is "brain damage" elsewhere — every training decision involves trade-offs across the full capability profile.
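
Each kind of eval described above reduces to a few lines of code. This is a minimal sketch, assuming model outputs have already been parsed into dicts with a hypothetical `due_at` field, and that human raters record each pairwise comparison as `candidate`, `baseline`, or `tie` (counting a tie as half a win is one common convention, not the only one). `eval_discriminates` encodes the robustness test from the list: every fine-tuned model must strictly beat the prompted baseline. All outputs and scores in the usage lines are made up for illustration.

```python
from datetime import datetime

def passes_7pm_check(task: dict) -> bool:
    """Deterministic pass/fail: was the task scheduled for exactly 7 p.m.?"""
    due = datetime.fromisoformat(task["due_at"])
    return (due.hour, due.minute) == (19, 0)

def pass_rate(outputs: list[dict]) -> float:
    """Fraction of model outputs that pass the deterministic check."""
    return sum(passes_7pm_check(o) for o in outputs) / len(outputs)

def win_rate(judgements: list[str]) -> float:
    """Share of pairwise human comparisons won by the candidate model.

    Each judgement is "candidate", "baseline", or "tie"; ties count as half a win.
    """
    points = {"candidate": 1.0, "tie": 0.5, "baseline": 0.0}
    return sum(points[j] for j in judgements) / len(judgements)

def eval_discriminates(scores: dict[str, float], baseline: str = "prompted_baseline") -> bool:
    """True only if every other model strictly beats the prompted baseline."""
    return all(score > scores[baseline] for name, score in scores.items() if name != baseline)

# Illustrative, made-up outputs and scores.
outputs = [{"due_at": "2025-01-20T19:00:00"}, {"due_at": "2025-01-20T07:00:00"}]
print(pass_rate(outputs))                                                   # 0.5
print(win_rate(["candidate", "candidate", "baseline", "tie"]))              # 0.625
print(eval_discriminates({"prompted_baseline": 0.41, "finetune_a": 0.58}))  # True
print(eval_discriminates({"prompted_baseline": 0.60, "finetune_a": 0.58}))  # False
```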

What researchers do vs. what applied engineers do

Applied research is product-oriented: synthetic data generation, eval design, rapid model iteration tied to a specific feature.
Longer-horizon research focuses on new methods, e.g. how to inject diversity into synthetic datasets, or new reasoning capabilities like the o1 chain-of-thought paradigm.
Signs of life in a research method trigger a fork: generalise it as a method, or productise it quickly.
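
One simple, widely used way to push diversity into a synthetic dataset is to condition generation on sampled combinations of attributes, then measure how varied the result is. The sketch below is an assumption-laden illustration, not the method discussed in the conversation: the attribute lists are hypothetical, the seed instructions would be sent to a generator model that is not shown, and diversity is scored with distinct-n (unique n-grams divided by total n-grams), a standard but crude metric.

```python
import itertools
import random

# Hypothetical attribute axes; a real pipeline would use many more values.
PERSONAS = ["student", "nurse", "startup founder"]
GOALS = ["summarise a report", "plan a trip", "debug a script"]
TONES = ["terse", "chatty"]

def seed_prompts(k: int, rng: random.Random) -> list[str]:
    """Sample k distinct persona/goal/tone combinations as generator instructions."""
    combos = list(itertools.product(PERSONAS, GOALS, TONES))
    return [
        f"Write a {tone} request from a {persona} who wants to {goal}."
        for persona, goal, tone in rng.sample(combos, k)
    ]

def distinct_n(texts: list[str], n: int = 2) -> float:
    """Distinct-n: unique n-grams / total n-grams across all texts (higher = more varied)."""
    ngrams = []
    for text in texts:
        tokens = text.lower().split()
        ngrams.extend(zip(*(tokens[i:] for i in range(n))))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0

rng = random.Random(0)
seeds = seed_prompts(4, rng)
print(seeds)
print(distinct_n(seeds) > distinct_n([seeds[0]] * 4))  # True
```

Sampling without replacement guarantees no two seeds repeat an attribute combination; distinct-n then gives a quick sanity check that the generated set is more varied than repeating a single prompt.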

Skills that will matter more

Creative thinking — generating and filtering ideas; models struggle to discriminate between good and bad aesthetics or genuinely novel framings.
Listening and user empathy — the moat is increasingly in rapid iteration driven by user feedback, not in the technical build itself.
Prioritisation — allocating limited compute (or team attention) to the highest-conviction research path is a human judgment call with large leverage.
People management and collaboration — Canvas succeeded because of the quality of cross-functional collaboration; emotional intelligence is hard to train into models.
Strategy (connecting dots across large information sets) will increasingly be AI-assisted, but the human judgment about which dots matter stays important.
Hard skills — coding, design, structured writing — are the most exposed to substitution as models reach top-percentile performance in those areas.

Where AI is heading in three years

Cost of intelligence continues to fall; small models will deliver capabilities previously requiring large ones.
Healthcare and education are the highest-leverage near-term applications — diagnostic triage, personalised learning.
Computer-use agents (like Operator) are the next paradigm: models operating a virtual browser to complete tasks end-to-end on your behalf.
The hard problem in agents isn't technical perception alone; it's inferring user intent precisely enough to know when to ask a follow-up question and when to proceed.
The interface paradigm is shifting from synchronous chat to asynchronous agents that work in the background and build trust over time through a running model of your preferences.
Scientific self-improvement — models suggesting their own next experiments based on empirical results — is a near-term research frontier.
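
The ask-versus-proceed decision can be framed as a simple policy. This is a sketch under loud assumptions: an upstream component (not shown) has already classified the intent with a confidence score and extracted details from the request, `REQUIRED_SLOTS`, `Intent`, and `next_action` are hypothetical names, and the 0.8 threshold is arbitrary. Filling gaps from a stored preference dict is a toy stand-in for the running model of preferences described above.

```python
from dataclasses import dataclass, field

# Hypothetical: information an agent must have before acting on each intent.
REQUIRED_SLOTS = {
    "book_restaurant": ["date", "time", "party_size"],
    "buy_item": ["item", "max_price"],
}

@dataclass
class Intent:
    name: str          # e.g. "book_restaurant"
    confidence: float  # upstream classifier's confidence in `name`, 0 to 1
    slots: dict = field(default_factory=dict)  # details extracted from the request

def next_action(intent: Intent, preferences: dict, threshold: float = 0.8) -> str:
    """Return "proceed", or a follow-up question to ask the user first."""
    if intent.confidence < threshold:
        # Unsure what the user wants at all: confirm the goal before doing anything.
        return f"Just to check: do you want me to {intent.name.replace('_', ' ')}?"
    # Fill gaps from remembered preferences; explicit request details take priority.
    known = {**preferences, **intent.slots}
    missing = [s for s in REQUIRED_SLOTS.get(intent.name, []) if s not in known]
    if missing:
        fields = " and ".join(s.replace("_", " ") for s in missing)
        return f"Before I continue, what {fields} should I use?"
    return "proceed"

prefs = {"party_size": 2}  # illustrative: learned from earlier bookings
print(next_action(Intent("book_restaurant", 0.95, {"date": "Friday"}), prefs))
# Before I continue, what time should I use?
print(next_action(Intent("book_restaurant", 0.95, {"date": "Friday", "time": "19:00"}), prefs))
# proceed
```

Checking remembered preferences before asking is what lets the agent interrupt less as trust builds; the hard part in practice is the upstream intent classification this sketch takes as given.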

Anthropic vs. OpenAI: cultural differences

More similar than different; both communities share people and goals.
Anthropic: intense craft around model personality, behaviour, and ethics; very hard prioritisation discipline, especially at smaller scale.
OpenAI: more bottom-up, risk-tolerant product culture; greater research freedom and surface area for experimentation.
Claude's distinct personality is a direct reflection of the careful, detail-oriented character of the people who built it.

Form factors and future interfaces

Form factor decisions drive adoption as much as model capability — file uploads, Canvas, and Tasks all succeed by mapping powerful model capabilities onto familiar UI patterns (documents, reminders).
The 100K-token context window unlocked file uploads as a form factor; the leap wasn't the context window itself but the realisation that uploading a book or financial report was a natural, familiar action.
Future content transformation: generating a sci-fi story in Canvas, then rendering it as audio or interactive 3D narrative in real time.
Agents that learn browsing patterns and preferences could simulate a personalised version of any expert — the "Lenny bot" model applied universally.