Block AI training bots. Allow AI search bots. Generate a clean robots.txt in seconds — tell GPTBot, CCBot, and Google-Extended to stay out while keeping Perplexity and ChatGPT browsing in. No signup. Preview updates live.
Free tool. No signup. Copy or download instantly.
Why your robots.txt needs an AI update
10+ AI crawlers you can control with one file
2 categories: training bots vs. search bots
< 2 min to generate, download and deploy
100% free: no account, no limits
What is robots.txt — and why do AI bots change everything?
robots.txt is a text file at the root of your domain (yourdomain.com/robots.txt) that tells crawlers which pages they may and may not access. Well-behaved bots respect it. AI has created two distinct crawler categories: training bots (scrape your content to build AI models) and search bots (index your content so AI can cite you in live answers). This generator helps you manage both — block one, allow the other.
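To make the "block one, allow the other" idea concrete, here is a minimal sketch that checks a two-group robots.txt with Python's standard-library parser. The file contents, the helper name is_allowed, and the example URL are illustrative assumptions, not output from this generator.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one training bot blocked, one search bot allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "GPTBot", "https://yourdomain.com/blog/"))         # False
print(is_allowed(ROBOTS_TXT, "PerplexityBot", "https://yourdomain.com/blog/"))  # True
```

Python's parser is only one implementation of the rules; each crawler does its own matching, so treat this as a sanity check rather than a guarantee of how a given bot behaves.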
What robots.txt does
• Blocks AI training bots from using your content to build commercial models.
• Allows AI search bots to index your site so you appear in AI-generated answers.
• Restricts private areas (admin, login, checkout) from all crawlers.
What robots.txt does not do
• It does not block bad-faith scrapers — malicious actors ignore it.
• It does not guarantee AI citations or prevent use of already-scraped content.
• It does not replace llms.txt — use both for full GEO control.
Basics
Enter your domain — it will appear as a reference comment in the generated file.
Policy preset
Choose a preset or configure each bot manually below.
AI Training Bots
These bots scrape your content to build AI training datasets. Block them to protect your content from commercial model training.
GPTBot (OpenAI)
Scrapes content to train ChatGPT and OpenAI models.
Google-Extended (Google)
Controls whether Google may use your content to train Gemini and other Google AI models. It is a robots.txt token rather than a separate crawler, and it does not affect Googlebot.
CCBot (Common Crawl)
Builds open datasets used to train many LLMs including GPT.
anthropic-ai (Anthropic)
Used to train Claude AI models.
Bytespider (ByteDance)
ByteDance crawler for AI training data collection.
Diffbot (Diffbot)
AI-powered data extraction and training dataset builder.
AI Search & Answer Bots
These bots index your content so AI engines can cite you in live answers. Allow them to improve your GEO visibility.
PerplexityBot (Perplexity)
Powers AI-generated answers in Perplexity search.
ChatGPT-User (OpenAI)
ChatGPT browsing mode: real-time web access for answers.
OAI-SearchBot (OpenAI)
OpenAI search indexing for AI-powered answer engines.
Meta-ExternalAgent (Meta)
Meta AI search, discovery, and answer generation.
Amazonbot (Amazon)
Powers Amazon AI features, Alexa, and product answers.
YouBot (You.com)
You.com AI search engine crawler and indexer.
Classic Search Bots
Traditional search engine crawlers. Keeping them allowed is essential for SEO — only block with good reason.
Googlebot (Google)
Google Search crawler, essential for SEO rankings.
Bingbot (Microsoft)
Bing Search and Microsoft AI search indexer.
Applebot (Apple)
Powers Apple Search, Siri suggestions, and Spotlight.
DuckDuckBot (DuckDuckGo)
DuckDuckGo privacy-focused search indexing.
Blocked Paths
Paths blocked from all crawlers via User-agent: *. Use for private, admin, and non-indexable areas.
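One detail worth checking in any generated file: under the Robots Exclusion Protocol (RFC 9309), a crawler that finds a group naming it follows that group, and falls back to User-agent: * only when no named group matches. The sketch below illustrates this with Python's standard-library parser. Both sample files and the helper name allowed are assumptions for illustration; they are not taken from this generator's output.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if robots_txt lets user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# Hypothetical file A: /admin/ is listed only in the catch-all group.
SHARED_ONLY = """\
User-agent: PerplexityBot
Allow: /

User-agent: *
Disallow: /admin/
"""

# Hypothetical file B: /admin/ is repeated inside the named group as well.
REPEATED = """\
User-agent: PerplexityBot
Disallow: /admin/
Allow: /

User-agent: *
Disallow: /admin/
"""

# The named group wins, so the catch-all Disallow is never consulted.
print(allowed(SHARED_ONLY, "PerplexityBot", "/admin/"))  # True
# A bot without its own group does fall back to the catch-all rules.
print(allowed(SHARED_ONLY, "SomeOtherBot", "/admin/"))   # False
# Repeating the path in the named group closes the gap.
print(allowed(REPEATED, "PerplexityBot", "/admin/"))     # False
print(allowed(REPEATED, "PerplexityBot", "/blog/"))      # True
```

If a bot has its own group in your file, confirm that the blocked paths appear in that group too.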
Live preview
# Enter your website URL to start generating robots.txt
robots.txt (enforcement): Controls crawler access. Blocks or allows specific bots and paths. Honoured by well-behaved crawlers.
vs
llms.txt (advisory): Guides AI prioritization. Suggests which pages are most authoritative. No enforcement mechanism.
robots.txt is enforcement: it controls crawler access. llms.txt is advisory: it guides AI prioritization. Use robots.txt to block training bots and restrict private pages. Use llms.txt to tell AI search bots which pages represent you best. For full GEO control you need both.
1. Choose a policy preset or configure bots manually. Add any private paths to block.
2. Copy or download the generated robots.txt file.
3. Place the file at the root of your domain: https://yourdomain.com/robots.txt. In Astro and Next.js projects it goes in the public/ folder; on static hosts such as Netlify and Cloudflare Pages it must sit at the top level of the directory you publish.
Where to place the file
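The file must be reachable at https://yourdomain.com/robots.txt, never inside a subfolder. In Astro and Next.js projects it goes into the public/ directory; on Netlify and Cloudflare Pages it must sit at the top level of the directory you publish. On WordPress, place it in the site's web root, which is the WordPress installation root when WordPress is installed at the top level of the domain.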
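As a small illustration of the "domain root" rule, the sketch below derives the robots.txt location that applies to any page URL. The function name robots_txt_url and the sample URLs are assumptions made for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the root-level robots.txt URL that governs page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host; replace path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://yourdomain.com/blog/post?ref=home"))
# https://yourdomain.com/robots.txt
print(robots_txt_url("https://shop.yourdomain.com/cart"))
# https://shop.yourdomain.com/robots.txt
```

The second call shows that a subdomain is a separate host with its own robots.txt, so a file on yourdomain.com does not cover shop.yourdomain.com.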
Want AI engines to actually recommend you?
robots.txt is one piece of the puzzle. EchoDestiny monitors how ChatGPT, Perplexity, Gemini and Claude talk about your brand — and turns the findings into prioritised actions.
Frequently asked questions about robots.txt and AI crawlers
Is this robots.txt generator really free?
Yes — completely free. No signup, no email, no account required. Everything is generated inside your browser. Nothing you type is transmitted to any server.
What is the difference between AI training bots and AI search bots?
AI training bots (GPTBot, Google-Extended, CCBot, anthropic-ai) scrape your content to build training datasets — they make the AI model smarter, but do not help you get cited in live answers. AI search bots (PerplexityBot, ChatGPT-User, OAI-SearchBot) index your content so AI engines can reference you in real-time answers. You generally want to block training bots and allow search bots.
Does blocking GPTBot prevent ChatGPT from citing my site?
No — and this is an important distinction. GPTBot is the training crawler. Blocking it prevents your content from entering OpenAI's training datasets. ChatGPT-User and OAI-SearchBot are separate crawlers used for live browsing and search indexing. Blocking GPTBot does not block ChatGPT from citing you in real-time answers.
What does Google-Extended do? Is it the same as Googlebot?
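No. Googlebot crawls content for Google Search rankings. Google-Extended is a robots.txt product token rather than a separate crawler: it makes no requests of its own, and pages are still fetched by Google's existing user agents. The token controls whether that content may be used to train Gemini and other Google AI products. Blocking Google-Extended has no effect on your Google Search rankings.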
What is the difference between robots.txt and llms.txt?
robots.txt is enforcement — it tells crawlers what they may and may not access. llms.txt is advisory — it suggests which pages are most authoritative for AI context. robots.txt controls access; llms.txt guides prioritization. You need both for full GEO visibility management.
Will blocking AI training bots affect my Google rankings?
No. Blocking AI training bots (GPTBot, Google-Extended, CCBot, anthropic-ai) does not affect Google Search rankings. These are separate robots.txt tokens from Googlebot, which handles Search, so rules written for them leave Googlebot untouched. Classic SEO is unaffected.
Should I block all AI bots to protect my content?
Blocking all AI bots protects your content from training datasets, but also prevents AI search engines (Perplexity, ChatGPT browsing, Meta AI) from indexing and citing you. If GEO visibility matters to your business, block only training bots and keep search bots allowed. The "Maximum GEO" preset does exactly this.
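As a rough sketch of what a "block training bots, keep search bots" policy could look like in code, the function below assembles such a file. The function name, the output layout, and the choice to repeat blocked paths in every named group are assumptions; the bot lists come from the FAQ above, and the actual output of the "Maximum GEO" preset may differ.

```python
# Bot names taken from the FAQ; the grouping into two lists is illustrative.
TRAINING_BOTS = ["GPTBot", "Google-Extended", "CCBot", "anthropic-ai"]
SEARCH_BOTS = ["PerplexityBot", "ChatGPT-User", "OAI-SearchBot"]


def build_robots_txt(blocked_bots, allowed_bots, blocked_paths=()):
    """Build robots.txt text: block some bots fully, allow others except blocked_paths."""
    lines = []
    for bot in blocked_bots:
        lines += [f"User-agent: {bot}", "Disallow: /", ""]
    for bot in allowed_bots:
        lines.append(f"User-agent: {bot}")
        # Repeat private paths here: a bot with its own group ignores the * group.
        lines += [f"Disallow: {path}" for path in blocked_paths]
        lines += ["Allow: /", ""]
    # Catch-all group for every crawler not named above.
    lines.append("User-agent: *")
    if blocked_paths:
        lines += [f"Disallow: {path}" for path in blocked_paths]
    else:
        lines.append("Allow: /")
    return "\n".join(lines) + "\n"


print(build_robots_txt(TRAINING_BOTS, SEARCH_BOTS, ["/admin/", "/checkout/"]))
```

Keeping the bot lists as plain data makes it easy to move a crawler from one category to the other without touching the generation logic.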
What paths should I add to the Blocked Paths section?
Add paths for content that no crawler should fetch: /admin/, /login/, /checkout/, /api/ (if it exposes sensitive data), /wp-admin/ for WordPress sites, and any staging or private areas. robots.txt is publicly readable and only a request, so protect sensitive areas with authentication as well. Never block your main content pages, because that will hurt both SEO and GEO visibility.
Where do I place the robots.txt file?
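At https://yourdomain.com/robots.txt: always at the domain root, never in a subfolder. In Astro and Next.js projects, place it in the public/ folder; on Netlify and Cloudflare Pages, make sure it sits at the top level of the directory you publish. On WordPress, place it in the site's web root, which is the installation root when WordPress is installed at the top level of the domain.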
What is Crawl-delay and should I use it?
Crawl-delay is a non-standard directive that asks bots to wait a given number of seconds between requests. It is useful if crawlers are causing server load. Googlebot ignores it (Google manages its crawl rate on its own), but Bingbot, Yandex, and several other crawlers respect it.
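For readers who want to see how the directive is read per bot, here is a minimal sketch using Python's standard-library parser, which exposes the value through crawl_delay(). The sample file, the 10-second value, and the helper name crawl_delay_for are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: only Bingbot is given a delay.
ROBOTS_TXT = """\
User-agent: Bingbot
Crawl-delay: 10

User-agent: *
Disallow: /admin/
"""


def crawl_delay_for(robots_txt: str, user_agent: str):
    """Return the Crawl-delay in seconds for user_agent, or None if none is set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


print(crawl_delay_for(ROBOTS_TXT, "Bingbot"))    # 10
print(crawl_delay_for(ROBOTS_TXT, "Googlebot"))  # None
```

The value only has an effect if the crawler chooses to read it, which is why setting it for Googlebot changes nothing.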