SEO

robots.txt

Short definition

robots.txt is a text file at the root of a website that tells search-engine and AI crawlers which URLs they can or cannot fetch.

In depth

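robots.txt is the oldest standard for crawler control: the Robots Exclusion Protocol dates to 1994 and was formalized as RFC 9309 in 2022. Compliance is voluntary (well-behaved bots obey it, malicious ones ignore it), but the major search engines respect it. For GEO it is critical because AI companies use their own user-agent tokens, separate from classic search crawlers: GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, Google-Extended, Applebot-Extended, and others. Some of these are crawlers in their own right, while Google-Extended and Applebot-Extended are control tokens that govern AI use of content already fetched by Googlebot and Applebot. Blocking an AI crawler keeps it from fetching the site's pages, which makes citation by that engine unlikely; allowing it and listing a sitemap makes the content easier to discover. The file lives at /robots.txt, and compliant crawlers fetch it before requesting other URLs on the host, typically caching the result.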
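How a compliant crawler applies per-bot rules can be sketched with Python's standard urllib.robotparser module. The file contents, the example.com URLs, and the is_allowed helper are assumptions made for illustration; they are not any site's real policy or any vendor's actual crawler logic, and real crawlers may differ in matching details.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block GPTBot entirely, and keep every
# other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # GPTBot has its own group, so the catch-all "*" group does not apply to it.
    print(is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/blog/post"))        # False
    # ClaudeBot has no group of its own and falls back to "*".
    print(is_allowed(ROBOTS_TXT, "ClaudeBot", "https://example.com/blog/post"))     # True
    print(is_allowed(ROBOTS_TXT, "ClaudeBot", "https://example.com/private/data"))  # False
```

A bot named in its own group follows only that group, which is why a blanket rule under "*" does not override a bot-specific one.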

Example

A site's robots.txt explicitly allows GPTBot, ClaudeBot, and PerplexityBot and points to /sitemap.xml. Within months, its pages appear as cited sources in answers from all three engines.
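A file like the one in this example could be generated as follows. This is a minimal sketch: build_robots_txt is a hypothetical helper, and https://example.com/sitemap.xml is an assumed absolute URL standing in for the site's /sitemap.xml.

```python
from typing import Sequence


def build_robots_txt(allowed_bots: Sequence[str], sitemap_url: str) -> str:
    """Return robots.txt text with one explicit allow group per bot."""
    # One group per bot: a User-agent line followed by a rule that allows everything.
    groups = [f"User-agent: {bot}\nAllow: /" for bot in allowed_bots]
    # The Sitemap line stands outside the groups and is not tied to any one bot.
    return "\n\n".join(groups + [f"Sitemap: {sitemap_url}"]) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(
        ["GPTBot", "ClaudeBot", "PerplexityBot"],
        "https://example.com/sitemap.xml",  # assumed domain
    ))
```

Crawlers are allowed by default when no rule matches, so the explicit Allow lines mainly document intent and guard against a broader Disallow being added later under "*".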

Related glossary

Keep going from here.

llms.txt

llms.txt is a plain-text file at the root of a website that gives large language models a curated, machine-readable map of the site's most important content.

Sitemap (XML Sitemap)

A sitemap is a machine-readable list of every important URL on a website, submitted to search engines to ensure complete and up-to-date indexing.

AI Crawlers

AI crawlers are the bots run by AI companies (such as OpenAI's GPTBot, Anthropic's ClaudeBot, and Perplexity's PerplexityBot) that fetch web content for training and live retrieval; Google-Extended is a related robots.txt control token rather than a separate bot.

GEO (Generative Engine Optimization)

GEO (Generative Engine Optimization) is the practice of structuring a website so that AI engines like ChatGPT, Perplexity, Claude, and Google AI Overviews cite it as a source.
