How o3 uses tools inside reasoning to solve harder problems

Executive overview

Most AI models use tools reactively: a keyword or an explicit request triggers the tool, and only after reasoning has already concluded. o3 integrates tool use directly into the reasoning process, letting it search, verify, and adjust mid-thought before producing an answer.

This makes o3 more accurate and capable on complex tasks, not just smarter on benchmarks. Three scaling levers now exist: training compute, inference compute, and in-reasoning tool use.

The core insight: giving a model access to tools during reasoning — not after — is the next major capability jump, equivalent in impact to adding chain-of-thought.

The three scaling levers

Training compute — data volume, compute, and time used to build the base model
Inference compute: active compute allocated while the model responds; more compute generally yields better answers
In-reasoning tool use — tools called mid-thought, not reactively at the end
Earlier models treated tool use as reactive: keyword-triggered or user-prompted
o3 reasons about when and why to use a tool, then uses it, then continues reasoning
This reduces hallucinations and increases accuracy on tasks requiring multiple steps
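To make the difference between reactive and in-reasoning tool use concrete, here is a toy Python sketch. It is not a description of o3's internals: the "model" is a hard-coded stub policy, and `FACTS`, `TOOLS`, `stub_policy`, `in_reasoning_answer`, and `reactive_answer` are hypothetical names invented for illustration. The point is the control flow. In the in-reasoning loop, each tool result is appended to the trace and feeds the next reasoning step. The reactive version commits to a draft first and only calls a tool if a keyword appears in the prompt.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical knowledge store and tool registry (illustrative only).
FACTS = {"water_boiling_point_c": "100"}
TOOLS: dict[str, Callable[[str], str]] = {
    "lookup": lambda key: FACTS.get(key, "unknown"),
}


@dataclass
class Step:
    kind: str      # "think", "call", "observe", or "answer"
    content: str


def stub_policy(trace: list[Step]) -> Step:
    """Hypothetical stand-in for the model choosing its next reasoning step."""
    if not trace:
        return Step("think", "I need a fact I should not guess; look it up first.")
    last = trace[-1]
    if last.kind == "think" and not any(s.kind == "call" for s in trace):
        return Step("call", "lookup:water_boiling_point_c")
    if last.kind == "observe":
        return Step("think", f"Tool says {last.content}; consistent with Celsius.")
    observed = next(s.content for s in trace if s.kind == "observe")
    return Step("answer", f"Water boils at {observed} °C at sea level.")


def in_reasoning_answer(
    policy: Callable[[list[Step]], Step], max_steps: int = 10
) -> tuple[str, list[Step]]:
    """Interleave reasoning steps and tool calls until the policy answers."""
    trace: list[Step] = []
    for _ in range(max_steps):
        step = policy(trace)
        trace.append(step)
        if step.kind == "answer":
            return step.content, trace
        if step.kind == "call":
            name, arg = step.content.split(":", 1)
            # The tool result goes back into the trace so later steps can use it.
            trace.append(Step("observe", TOOLS[name](arg)))
    raise RuntimeError("no answer within the step budget")


def reactive_answer(prompt: str) -> str:
    """Reactive pattern: reasoning ends first; a tool runs only on a keyword."""
    draft = "I believe water boils at about 90 °C."  # deliberately wrong, unverified guess
    if "search" in prompt.lower():
        return f"{draft} (lookup: {TOOLS['lookup']('water_boiling_point_c')})"
    return draft


if __name__ == "__main__":
    answer, trace = in_reasoning_answer(stub_policy)
    for step in trace:
        print(f"{step.kind:>7}: {step.content}")
    print(reactive_answer("At what temperature does water boil?"))
```

In a real agent, `stub_policy` would be a model call; the `max_steps` budget guards against a loop that never reaches an answer.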

Where o3 outperforms other models

Sticky-note to-do list: o3 rotates, crops, and extracts text from an image as part of its reasoning before producing the final list, where other models would typically just describe the image
Thumbnail feedback: given draft thumbnails, o3 reasons through YouTube best practices and returns specific colour and layout recommendations in a comparison table
Market research (broad questions): o3 beats deep-research tools on high-level, analytically driven queries; deep research wins on targeted questions with lots of provided context
Gnarly bugs: when GPT-4.1, Claude 3.7 Sonnet (thinking), and Gemini 2.5 Pro have each failed in turn, o3 often fixes the bug in one or two attempts by reasoning through the logic of the whole codebase

Choosing the right model for the task

High-level research → o3 (broad, analytical, fast)
Targeted research with long output → deep research tools (Perplexity, Gemini, Claude, Grok)
Writing → Claude 3.7 / 3.5 Sonnet; GPT-4.1 with precise instructions
UI design → Claude 3.7 thinking
Large codebase or file → Gemini 2.5 Pro (1M context window)
Complex bugs, targeted feedback, business/product fit analysis → o3

Expert outsourcing with AI projects

Convert expert knowledge (YouTube videos, reports, podcasts) into a Claude or GPT project
Pair a high-quality knowledge base with a precise system prompt to create an always-available specialist
Use cases: thumbnail strategy, cold outreach, sales — any domain with an identifiable expert whose thinking can be captured
Custom GPTs work less well; Claude projects or GPT projects with structured instructions are the better option
