How to evaluate AI products before spending money

Executive overview

Most businesses waste money on AI tools they could have ruled out in a week. The problem is evaluating AI by gut feel or vendor-provided metrics rather than custom, task-specific criteria.

A three-step framework fixes this: define what good looks like, build binary evaluations for your use case, then test during the free trial before committing.

You don't need AI to be perfect — you need a meaningful improvement over your current baseline.

The three-step evaluation framework

Define your primary metric — the specific ROI you expect (time saved, revenue generated, volume handled).
Set secondary sub-goals that contribute to the primary metric.
Create binary evals — yes/no questions that measure whether the AI achieved a specific outcome.
Avoid off-the-shelf vendor evals; they are generic and rarely match your actual needs.
Build your own evals — the process sharpens your understanding of what matters.
Run evaluations during the free trial period to avoid spending money on tools that don't fit.
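
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the names `BinaryEval`, `run_evals` and `worth_buying`, the record fields, the thresholds and the sample data are all hypothetical, and the decision rule (every primary eval must pass) is an assumption you may want to change.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class BinaryEval:
    """A yes/no question asked of the whole set of trial records."""
    name: str
    primary: bool  # True if the eval measures the primary metric
    check: Callable[[list[dict]], bool]

def run_evals(evals: list[BinaryEval], records: list[dict]) -> dict[str, bool]:
    """Answer every eval with a plain True (yes) or False (no)."""
    return {e.name: e.check(records) for e in evals}

def worth_buying(evals: list[BinaryEval], results: dict[str, bool]) -> bool:
    """Assumed decision rule: every primary eval must pass."""
    return all(results[e.name] for e in evals if e.primary)

# Hypothetical observations, one per task run during the free trial.
trial_records = [
    {"minutes_saved": 30, "needed_rework": False},
    {"minutes_saved": 45, "needed_rework": False},
    {"minutes_saved": 10, "needed_rework": True},
    {"minutes_saved": 40, "needed_rework": False},
]

evals = [
    BinaryEval(
        name="Saves at least 25 minutes per task on average",
        primary=True,
        check=lambda rs: sum(r["minutes_saved"] for r in rs) / len(rs) >= 25,
    ),
    BinaryEval(
        name="At most 1 in 4 outputs needs rework",
        primary=False,
        check=lambda rs: sum(r["needed_rework"] for r in rs) / len(rs) <= 0.25,
    ),
]

results = run_evals(evals, trial_records)
decision = worth_buying(evals, results)

for name, passed in results.items():
    print(f"{name}: {'yes' if passed else 'no'}")
print("Commit after trial:", "yes" if decision else "no")
```

Writing each check as a function forces you to state the threshold explicitly, which is where most of the value of building your own evals comes from.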

Why binary evaluations matter

Spectrum-based evals (1–10 scores) generate subjective debate and slow decisions.
Binary evals are unambiguous: the AI either achieved the outcome or it didn't.
Every use case can support binary evals if you define the criteria precisely enough.

Example: contract review assistant

Primary goal — AI catches at least 95% of contract issues faster than lawyers.

Does the AI identify the riskiest clauses in at least 19 of 20 contracts? Yes/no.
Does it flag missing standard terms (termination, liability) consistently? Yes/no.

Secondary goal — reduce contract review time below 15 minutes (from 90 minutes).

Does a lawyer reviewing the AI summary finish in under 15 minutes? Yes/no.
Does the AI summary surface all key sections on the first page? Yes/no.
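
As a rough illustration, two of the checks above could be scored as follows. The sketch assumes a lawyer has labelled the riskiest clauses in each of 20 trial contracts and timed their review of the AI summary. The `ContractTrial` fields, the rule that every timed review must come in under the limit, and the sample data are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ContractTrial:
    risky_clauses: set[str]  # clauses a lawyer marked as riskiest (ground truth)
    ai_flagged: set[str]     # clauses the AI flagged
    review_minutes: float    # lawyer's time spent reviewing the AI summary

def found_all_risky(trial: ContractTrial) -> bool:
    """Per contract: did the AI flag every clause the lawyer marked?"""
    return trial.risky_clauses <= trial.ai_flagged

def identifies_riskiest_clauses(trials: list[ContractTrial], required: int = 19) -> bool:
    """Primary eval: the AI succeeds on at least `required` contracts."""
    return sum(found_all_risky(t) for t in trials) >= required

def review_under_limit(trials: list[ContractTrial], limit: float = 15.0) -> bool:
    """Secondary eval (assumed rule): every review finishes under the limit."""
    return all(t.review_minutes < limit for t in trials)

# Invented sample: 19 contracts handled well, 1 where the AI missed a clause.
trials = [
    ContractTrial(
        risky_clauses={"indemnity", "termination"},
        ai_flagged={"indemnity", "termination", "governing law"},
        review_minutes=12.0,
    )
    for _ in range(19)
]
trials.append(
    ContractTrial(
        risky_clauses={"liability cap"},
        ai_flagged={"termination"},
        review_minutes=14.5,
    )
)

print("Riskiest clauses found in 19 of 20:", identifies_riskiest_clauses(trials))
print("Every review under 15 minutes:", review_under_limit(trials))
```

The remaining checks (missing standard terms, key sections on the first page) need a human yes/no per contract, which can be recorded as extra boolean fields and counted the same way.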

Example: customer service chatbot

Primary goal — reduce support ticket volume by 40%.

Does the AI resolve at least 4 of 10 issues without human intervention? Yes/no.
Does it answer at least 8 of 20 common questions autonomously? Yes/no.

Secondary goal — keep customer satisfaction above 80%.

Do at least 8 of 10 customers rate the AI conversation as good or better? Yes/no.
Do fewer than 2 of 10 customers immediately try to bypass the AI? Yes/no.
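
A possible way to score three of these checks from a trial export is sketched below. It assumes each conversation was logged with three fields; the `Conversation` type, the 1-5 rating scale with 4 or above counting as "good or better", and the sample data are hypothetical.

```python
from typing import NamedTuple

class Conversation(NamedTuple):
    resolved_without_human: bool
    rating: int            # assumed 1-5 survey score; 4 or above is "good or better"
    tried_to_bypass: bool  # customer immediately asked for a human

def share(flags) -> float:
    """Fraction of True values in an iterable of booleans."""
    flags = list(flags)
    return sum(flags) / len(flags)

def chatbot_evals(convos: list[Conversation]) -> dict[str, bool]:
    """Return a yes/no answer for each eval."""
    return {
        "resolves at least 4 of 10 issues": share(c.resolved_without_human for c in convos) >= 0.4,
        "at least 8 of 10 rate it good or better": share(c.rating >= 4 for c in convos) >= 0.8,
        "fewer than 2 of 10 try to bypass": share(c.tried_to_bypass for c in convos) < 0.2,
    }

# Invented sample of ten trial conversations.
convos = [
    Conversation(True, 5, False),
    Conversation(True, 4, False),
    Conversation(True, 4, False),
    Conversation(True, 5, False),
    Conversation(True, 4, False),
    Conversation(False, 4, False),
    Conversation(False, 4, False),
    Conversation(False, 5, False),
    Conversation(False, 2, True),
    Conversation(False, 3, False),
]

for name, passed in chatbot_evals(convos).items():
    print(f"{name}: {'yes' if passed else 'no'}")
```

Ten conversations is only enough to show the mechanics; a real trial would need a larger sample before the yes/no answers mean much.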

Example: sales email personaliser

Primary goal — double cold email reply rate from 2% to 4%.

Do 100 AI-personalised emails generate 4 or more replies? Yes/no.
Does every email include a personalisation element from the AI's research? Yes/no.

Secondary goal — ensure opening lines are unique per prospect.

Across 50 emails to similar personas, are all openers distinct? Yes/no.
Does each opener reference something specific from the prospect's LinkedIn or website? Yes/no.
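
Two of these checks are mechanical enough to automate; a minimal sketch follows. The function names and sample openers are hypothetical, and the duplicate test is deliberately crude: it only catches openers that match after lowercasing and stripping punctuation, so paraphrased duplicates still need a human read.

```python
import re

def normalise(opener: str) -> str:
    """Lowercase and collapse punctuation and whitespace into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", opener.lower()).strip()

def all_openers_distinct(openers: list[str]) -> bool:
    """Secondary eval: no two openers are the same after normalisation."""
    normalised = [normalise(o) for o in openers]
    return len(set(normalised)) == len(normalised)

def reply_target_met(replies: int, sent: int, target_per_hundred: int = 4) -> bool:
    """Primary eval: reply rate is at least the target, using integer maths."""
    return sent > 0 and replies * 100 >= target_per_hundred * sent

# Invented sample openers; the first and third differ only in punctuation.
openers = [
    "Loved your post on pricing pages, Ana.",
    "Saw Acme just opened a Berlin office.",
    "Loved your post on pricing pages, Ana!",
]

print("All openers distinct:", all_openers_distinct(openers))
print("First two distinct:", all_openers_distinct(openers[:2]))
print("4 replies from 100 emails meets target:", reply_target_met(4, 100))
print("3 replies from 100 emails meets target:", reply_target_met(3, 100))
```

Whether each opener references something specific from the prospect's LinkedIn or website is harder to automate reliably and is better answered by a reviewer marking yes or no per email.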

