robots.txt Generator
Generate a robots.txt file with visual controls for search engines and AI crawlers
What Is a robots.txt File?
A robots.txt file is a plain text file placed at the root of your website (e.g., https://example.com/robots.txt) that tells web crawlers which pages or sections of your site they may access and which are off-limits. It's part of the Robots Exclusion Protocol, a standard that virtually all major search engines and crawlers respect.
In 2024 and beyond, robots.txt has become even more important because of AI crawlers. Companies like OpenAI (GPTBot), Anthropic (ClaudeBot), Google (Google-Extended), and ByteDance (Bytespider) all send web crawlers to scrape content for training large language models. A well-crafted robots.txt lets you allow search engine indexing while blocking AI training crawlers — giving you control over how your content is used.
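As an illustrative sketch, a robots.txt that blocks the AI training crawlers named above while leaving everything open to other bots could look like this (the user-agent tokens are the ones each company documents; adjust the list to your needs):

```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
```

Each blank-line-separated group applies to the named crawler; the final `User-agent: *` group covers every bot not matched by an earlier group, so search engine crawlers remain unrestricted.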
This robots.txt generator provides visual controls for both traditional search engine bots and modern AI crawlers. Toggle individual bots on or off, set custom disallow paths, configure sitemaps and crawl delays, and copy the generated file — all from a single interface. Everything runs in your browser with no data sent to any server.
How to Generate a robots.txt File
- Choose a quick action — Start with "Allow All" (permissive), "Block All AI Crawlers" (blocks AI training bots while keeping search engines), or "Block All Crawlers" (blocks everything).
- Toggle individual bots — Fine-tune which crawlers to block. Each bot is labeled with its company and purpose (e.g., GPTBot = OpenAI training data, Googlebot = Google Search).
- Set disallow paths — Enter paths like /admin/, /private/, or /api/ that should be off-limits to crawlers.
- Add your sitemap URL — Enter your sitemap URL (e.g., https://example.com/sitemap.xml) to help search engines discover your pages.
- Copy and deploy — Copy the generated robots.txt content and save it as robots.txt in your website's root directory.
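Following the steps above with the example values mentioned (example paths and sitemap URL; substitute your own), the generated file would look roughly like:

```
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /api/

Sitemap: https://example.com/sitemap.xml
```

Crawlers that respect the protocol will skip the three listed path prefixes and use the Sitemap line to discover the rest of your pages.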
Key Features
- AI crawler controls — Block or allow specific AI training bots including GPTBot, ChatGPT-User, ClaudeBot, Google-Extended, Amazonbot, Bytespider, and CCBot with individual toggles.
- One-click presets — Quickly set common configurations: allow all, block all AI, block all crawlers, or block specific paths.
- Search engine bot support — Manage access for Googlebot, Bingbot, Applebot, and FacebookBot alongside AI crawlers.
- Sitemap and crawl-delay — Add your sitemap URL and set crawl delay to manage how aggressively bots crawl your site.
- AI crawler reference table — Built-in reference showing each AI bot's company and purpose, so you know exactly what you're blocking.
- 100% client-side — Your configuration never leaves your browser. No server, no tracking.
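For reference, the sitemap and crawl-delay options produce directives like the following (values are examples; note that Googlebot ignores the Crawl-delay directive, while Bingbot honors it as a minimum number of seconds between requests):

```
User-agent: Bingbot
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml
```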
Common Use Cases
- Blocking AI training crawlers — Prevent OpenAI, Anthropic, Google, and other AI companies from scraping your content for model training while keeping search engine access.
- Protecting sensitive directories — Block crawlers from accessing admin panels, API endpoints, staging environments, and other private paths.
- SEO management — Control which sections of your site search engines can index, preventing duplicate content issues or indexing of low-value pages.
- New site launches — Generate a restrictive robots.txt during development, then switch to a permissive one when ready to go live.
- Compliance and content protection — Meet organizational policies around content licensing and data protection by controlling crawler access.
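All of these use cases come down to how crawlers interpret your directives, so it can be worth sanity-checking a generated file before deploying it. One way is Python's standard urllib.robotparser, which simulates a crawler's allow/deny decision; the file content and URLs below are illustrative:

```python
from urllib import robotparser

# Hypothetical generator output: block GPTBot entirely,
# and keep /admin/ off-limits to all other crawlers.
GENERATED = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

rp = robotparser.RobotFileParser()
rp.parse(GENERATED.splitlines())

# GPTBot is blocked everywhere; other bots only under /admin/.
print(rp.can_fetch("GPTBot", "https://example.com/blog/post"))      # False
print(rp.can_fetch("Googlebot", "https://example.com/blog/post"))   # True
print(rp.can_fetch("Googlebot", "https://example.com/admin/users")) # False
```

Real crawlers may interpret edge cases (wildcards, Allow/Disallow precedence) slightly differently than robotparser does, but this catches obvious mistakes like a stray `Disallow: /` in the wrong group.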