# static-html-for-ai: full content

> A reference site demonstrating how to optimize static HTML pages for AI
> crawlers. The full content of the site's primary pages is concatenated
> below for direct ingestion into an LLM context window.

---

## Page: How to Optimize Static HTML for AI Crawlers (2026 Guide)

URL: https://ai.michaelmcgrory.org/
Author: Michael McGrory, Solutions Engineer (Partnerships) at Cloudflare.
Last updated: 2026-05-02.

### Key takeaways

- Most AI crawlers do not execute JavaScript. Static HTML or server-side rendering is the baseline requirement.
- The pages most likely to be cited are original research, definitions, comparisons, and how-tos, in that order.
- GPTBot raw requests grew +305% YoY May 2024 to May 2025; ChatGPT-User grew +2,825%; PerplexityBot grew +157,490% (Cloudflare, 2025).
- Add `llms.txt`, a markdown mirror per page, JSON-LD, and Content Signals in `robots.txt`.
- Recency matters: AI assistants shift cited publication dates forward by up to 4.78 years when reranking, per a 2025 study.

### Summary

If you want a static HTML page to be crawled and cited by AI assistants like ChatGPT, Claude, Perplexity, and Google's AI Overviews, you need three things in place:

1. The page content has to live in the initial HTML response, not behind JavaScript.
2. The page has to be structured so that a language model can extract a complete answer to a specific question from a single chunk.
3. The site has to declare its preferences and signals to crawlers via `robots.txt`, `sitemap.xml`, and (increasingly) `llms.txt`.

### Crawler taxonomy

Training crawlers (GPTBot, ClaudeBot, CCBot, Bytespider, Meta-ExternalAgent, Google-Extended) bulk-crawl for model corpora and honor `robots.txt`.
Index/search crawlers (OAI-SearchBot, PerplexityBot, Claude-SearchBot, Googlebot) build retrievable indexes for RAG and honor `robots.txt`.
User-triggered fetchers (ChatGPT-User, Claude-User, Perplexity-User, Meta-ExternalFetcher) fire on demand when a user asks a question and largely ignore `robots.txt` by design.

### Page types most likely to be cited

1. Original research and proprietary data
2. Definitional / glossary pages
3. Comparison pages
4. How-to and step-by-step guides
5. Pricing or cost pages with concrete numbers in plain text
6. FAQ and Q&A pages
7. Reference and API documentation
8. Programmatic pages with consistent schemas
9. Recent news and time-stamped analysis (especially YMYL)
10. Free tools and calculators

### Page-level signals

- Server-side rendered HTML
- One `<h1>` phrased as the user's question
- Direct answer in the lead paragraph plus a Key takeaways block
- Short paragraphs (2–4 sentences) and lists
- Stats and quotes in plain text with units, dates, and inline source
- Visible Last updated date and `