robots.txt Generator
Generate a robots.txt file with visual controls for search engines and AI crawlers
What Is a robots.txt File?
A robots.txt file is a plain text file placed at the root of your website (e.g., https://example.com/robots.txt) that tells web crawlers which pages or sections of your site they may access and which are off-limits. It's part of the Robots Exclusion Protocol, a standard that virtually all major search engines and crawlers respect.
In 2024 and beyond, robots.txt has become even more important because of AI crawlers. Companies like OpenAI (GPTBot), Anthropic (ClaudeBot), Google (Google-Extended), and ByteDance (Bytespider) all send web crawlers to scrape content for training large language models. A well-crafted robots.txt lets you allow search engine indexing while blocking AI training crawlers — giving you control over how your content is used.
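As an illustrative sketch, a robots.txt that blocks the AI training crawlers named above while leaving everything open to other bots could look like this (the user-agent tokens are the ones each company documents; adjust the list to your needs):

```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
```

Each blank-line-separated group applies to the named crawler; the final `User-agent: *` group covers every bot not matched by an earlier group, so search engine crawlers remain unrestricted.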
This robots.txt generator provides visual controls for both traditional search engine bots and modern AI crawlers. Toggle individual bots on or off, set custom disallow paths, configure sitemaps and crawl delays, and copy the generated file — all from a single interface. Everything runs in your browser with no data sent to any server.
How to Generate a robots.txt File
- Choose a quick action — Start with "Allow All" (permissive), "Block All AI Crawlers" (blocks AI training bots while keeping search engines), or "Block All Crawlers" (blocks everything).
- Toggle individual bots — Fine-tune which crawlers to block. Each bot is labeled with its company and purpose (e.g., GPTBot = OpenAI training data, Googlebot = Google Search).
- Set disallow paths — Enter paths like /admin/, /private/, or /api/ that should be off-limits to crawlers.
- Add your sitemap URL — Enter your sitemap URL (e.g., https://example.com/sitemap.xml) to help search engines discover your pages.
- Copy and deploy — Copy the generated robots.txt content and save it as robots.txt in your website's root directory.
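Following the steps above with the example values mentioned (example paths and sitemap URL; substitute your own), the generated file would look roughly like:

```
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /api/

Sitemap: https://example.com/sitemap.xml
```

Crawlers that respect the protocol will skip the three listed path prefixes and use the Sitemap line to discover the rest of your pages.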
Key Features
- AI crawler controls — Block or allow specific AI training bots including GPTBot, ChatGPT-User, ClaudeBot, Google-Extended, Amazonbot, Bytespider, and CCBot with individual toggles.
- One-click presets — Quickly set common configurations: allow all, block all AI, block all crawlers, or block specific paths.
- Search engine bot support — Manage access for Googlebot, Bingbot, Applebot, and FacebookBot alongside AI crawlers.
- Sitemap and crawl-delay — Add your sitemap URL and set crawl delay to manage how aggressively bots crawl your site.
- AI crawler reference table — Built-in reference showing each AI bot's company and purpose, so you know exactly what you're blocking.
- 100% client-side — Your configuration never leaves your browser. No server, no tracking.
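For reference, the sitemap and crawl-delay options produce directives like the following (values are examples; note that Googlebot ignores the Crawl-delay directive, while Bingbot honors it as a minimum number of seconds between requests):

```
User-agent: Bingbot
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml
```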
Common Use Cases
- Blocking AI training crawlers — Prevent OpenAI, Anthropic, Google, and other AI companies from scraping your content for model training while keeping search engine access.
- Protecting sensitive directories — Block crawlers from accessing admin panels, API endpoints, staging environments, and other private paths.
- SEO management — Control which sections of your site search engines can index, preventing duplicate content issues or indexing of low-value pages.
- New site launches — Generate a restrictive robots.txt during development, then switch to a permissive one when ready to go live.
- Compliance and content protection — Meet organizational policies around content licensing and data protection by controlling crawler access.
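All of these use cases come down to how crawlers interpret your directives, so it can be worth sanity-checking a generated file before deploying it. One way is Python's standard urllib.robotparser, which simulates a crawler's allow/deny decision; the file content and URLs below are illustrative:

```python
from urllib import robotparser

# Hypothetical generator output: block GPTBot entirely,
# and keep /admin/ off-limits to all other crawlers.
GENERATED = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

rp = robotparser.RobotFileParser()
rp.parse(GENERATED.splitlines())

# GPTBot is blocked everywhere; other bots only under /admin/.
print(rp.can_fetch("GPTBot", "https://example.com/blog/post"))      # False
print(rp.can_fetch("Googlebot", "https://example.com/blog/post"))   # True
print(rp.can_fetch("Googlebot", "https://example.com/admin/users")) # False
```

Real crawlers may interpret edge cases (wildcards, Allow/Disallow precedence) slightly differently than robotparser does, but this catches obvious mistakes like a stray `Disallow: /` in the wrong group.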