Technical SEO for AI: robots.txt, crawlers, and llms.txt

Executive overview

Millions of sites are invisible to AI because they accidentally block AI crawlers in their robots.txt. Six technical checks determine whether AI can find, render, and cite your content. Fast-loading, clean HTML pages get crawled; slow or JavaScript-dependent ones get dropped.

Fixing crawler access is the highest-leverage technical move for AI visibility.

Robots.txt and AI crawlers

  • Check yourdomain.com/robots.txt for Disallow rules on GPTBot, OAI-SearchBot, ClaudeBot, or Google-Extended.
  • Blocks are often inherited from templates or added by platforms — Cloudflare enables AI-blocking by default.
  • Ahrefs Site Audit flags robots.txt rules that block AI crawlers.
  • llms.txt is a proposed standard for describing your site to AI systems, but no major LLM provider officially supports it yet, so treat it as low priority.
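
The robots.txt check above can be scripted. The following is a minimal sketch using Python's standard-library robots.txt parser; the sample file contents and the example.com URL are made up for illustration, and the agent list simply mirrors the tokens named in the checklist.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in the checklist above.
AI_AGENTS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "Google-Extended"]


def blocked_ai_agents(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the AI user agents that this robots.txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_AGENTS if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt of the kind a site template might ship with.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

if __name__ == "__main__":
    for agent in blocked_ai_agents(SAMPLE):
        print(f"{agent} is blocked from the homepage")
```

To check a live site, paste the contents of its robots.txt into the function, or point a RobotFileParser at the file with set_url() and read(). This only tests robots.txt rules; blocks applied at the CDN or firewall level will not show up here.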

JavaScript rendering

  • ChatGPT's crawler cannot render JavaScript; Gemini and Copilot can.
  • Single-page apps built with React or Angular may serve an empty shell to ChatGPT.
  • Server-side rendering sends fully built HTML to crawlers, solving the problem.
  • Quick test: disable JavaScript in your browser — if content disappears, AI crawlers miss it too.
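
The "disable JavaScript" test can be roughly approximated in code by counting the words present in the raw HTML, before any script runs. This is a sketch under stated assumptions: the 50-word threshold and the two sample pages are hypothetical, and a real audit would want a more careful text extractor.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect the text a non-rendering crawler would see in raw HTML."""

    # Content inside these tags is not page copy, so it is ignored.
    SKIP = {"script", "style", "template", "title"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.words.extend(data.split())


def visible_word_count(html: str) -> int:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return len(parser.words)


def looks_like_empty_shell(html: str, min_words: int = 50) -> bool:
    """Flag pages with little text before JavaScript runs (threshold is an assumption)."""
    return visible_word_count(html) < min_words


# Hypothetical single-page-app shell: all content arrives via the script bundle.
SPA_SHELL = (
    "<html><head><title>App</title></head>"
    '<body><div id="root"></div><script src="/bundle.js"></script></body></html>'
)

# Hypothetical server-rendered page: the copy is already in the HTML.
SSR_PAGE = "<html><body><h1>Pricing</h1><p>Plans start at ten dollars.</p></body></html>"

if __name__ == "__main__":
    print(visible_word_count(SPA_SHELL), visible_word_count(SSR_PAGE))
```

To run it against a live page, pass in the response body fetched with a plain HTTP client such as urllib.request.urlopen, which does not execute JavaScript.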

Page speed and HTML structure

  • AI systems fetch, parse, and chunk pages in real time; slow pages get dropped before scoring.
  • Core Web Vitals optimisation covers most of what AI retrieval needs.
  • Use a logical heading hierarchy: H1 for the title, H2 for main sections, H3 for subsections.
  • Each section should stand alone — AI may chunk content at any heading boundary.
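
The heading hierarchy described above can be checked mechanically. The sketch below flags skipped levels (for example an H1 followed directly by an H3) and pages without exactly one H1; the single-H1 rule is an assumption drawn from "H1 for the title", not a confirmed requirement of any AI system.

```python
from html.parser import HTMLParser


class HeadingOutline(HTMLParser):
    """Record the level of each h1-h6 tag in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_problems(html: str) -> list[str]:
    parser = HeadingOutline()
    parser.feed(html)
    parser.close()
    levels = parser.levels

    problems = []
    h1_count = levels.count(1)
    if h1_count != 1:
        problems.append(f"expected exactly one h1, found {h1_count}")
    # Going deeper by more than one level at a time skips part of the hierarchy.
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:
            problems.append(f"h{prev} is followed by h{cur} (skipped a level)")
    return problems


if __name__ == "__main__":
    page = "<h1>Guide</h1><h3>Setup</h3><h2>Usage</h2>"
    for problem in heading_problems(page):
        print(problem)
```

A clean outline returns an empty list. The check says nothing about whether each section stands alone when chunked; that still needs a human read.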

Schema markup

  • The evidence that schema (structured data) helps answer engine optimisation (AEO) is mixed; no lift in AI citation rates has been confirmed.
  • Worth adding on new pages as a good habit; not worth prioritising specifically for AEO.

Hallucinated URLs and 404s

  • AI assistants send visitors to 404 pages 2.87× more often than Google search.
  • ChatGPT is the biggest offender: ~1% of clicked URLs return a 404.
  • Check analytics for AI referrer traffic hitting 404 status pages.
  • Set up redirects from consistently hallucinated URLs to the most relevant real page.
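
Finding the hallucinated URLs worth redirecting comes down to counting 404s that arrive from AI referrers. The sketch below assumes your analytics or server logs can be exported as (referrer, path, status) tuples; that record shape, the referrer host list, and the three-hit threshold are all illustrative assumptions rather than a standard.

```python
from collections import Counter
from urllib.parse import urlparse

# Assumed, incomplete list of referrer hosts used by AI assistants.
AI_REFERRER_HOSTS = {
    "chatgpt.com",
    "chat.openai.com",
    "perplexity.ai",
    "www.perplexity.ai",
    "claude.ai",
    "gemini.google.com",
    "copilot.microsoft.com",
}


def repeated_ai_404s(hits, min_hits=3):
    """Return (path, count) pairs for 404s from AI referrers seen at least min_hits times."""
    counts = Counter()
    for referrer, path, status in hits:
        host = (urlparse(referrer).hostname or "").lower()
        if status == 404 and host in AI_REFERRER_HOSTS:
            counts[path] += 1
    return [(path, n) for path, n in counts.most_common() if n >= min_hits]


if __name__ == "__main__":
    # Hypothetical log export.
    hits = [
        ("https://chatgpt.com/", "/pricing-guide", 404),
        ("https://chatgpt.com/", "/pricing-guide", 404),
        ("https://chatgpt.com/", "/pricing-guide", 404),
        ("https://www.google.com/", "/pricing-guide", 404),
        ("https://claude.ai/", "/old-post", 404),
    ]
    for path, n in repeated_ai_404s(hits):
        print(f"{path}: {n} AI-referred 404s")
```

Each path this returns is a candidate for a manual redirect to the closest real page. One-off misses below the threshold are usually not worth a rule.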
