Question 1

What does a robots.txt file do?

Accepted Answer

A robots.txt file tells crawlers which parts of your site they can and can't request. It sits at the exact root of your domain at /robots.txt and is the first thing polite bots read before crawling anything else on the site. The syntax is small. You write one or more User-agent blocks, each followed by Allow and Disallow lines. A Sitemap line, which can sit anywhere in the file, points bots at your XML index so they can discover pages directly instead of guessing from internal links. In 2026, the file does more than manage Googlebot. It also sets policy for AI training crawlers like GPTBot (OpenAI), ClaudeBot (Anthropic), CCBot (Common Crawl), PerplexityBot, and Google-Extended, each with a separate User-agent name. Our robots.txt generator writes a production file from a CMS preset and checkbox toggles. Verify it with our checker against a real path before you ship the change to production.
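To make the shape of the file concrete, here is a minimal sketch that parses a small robots.txt with Python's standard-library urllib.robotparser. The file contents, the example.com host, and the allowed() helper are illustrative assumptions, not output from our generator.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file for example.com; paths and bot choice are illustrative.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(bot: str, path: str) -> bool:
    """Return True if the parsed rules let `bot` request `path`."""
    return parser.can_fetch(bot, "https://example.com" + path)


if __name__ == "__main__":
    for bot, path in [("Googlebot", "/blog/"), ("Googlebot", "/admin/users"), ("GPTBot", "/blog/")]:
        print(bot, path, allowed(bot, path))
    print(parser.site_maps())
```

Googlebot has no named block here, so it falls back to the wildcard block, while GPTBot is governed only by its own block.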

Question 2

How do I generate a robots.txt file?

Accepted Answer

Pick your CMS preset: WordPress, Shopify, Next.js, Astro, Strapi, or Custom for an empty starter. The preset drops in sensible defaults for that platform, such as blocking /wp-admin on WordPress or /checkout on Shopify so crawl budget doesn't get burned. Paste your Sitemap URL so crawlers discover pages fast. Use the AI-crawler checkboxes to block GPTBot, ClaudeBot, CCBot, PerplexityBot, anthropic-ai, and Google-Extended individually or all at once. Add any extra paths you want disallowed, one per line. Hit generate, copy the output, and save it as robots.txt in your public root. On WordPress, paste it into your SEO plugin's robots.txt editor. On Next.js or Astro, drop it in /public. Once live, paste your URL into our robots.txt checker and test a few paths per bot to confirm the rules resolve the way you expect. If something is off, adjust the inputs and regenerate rather than hand-editing the output.
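The steps above can be sketched in a few lines of Python. This is not the generator's actual code: the PRESETS table, the build_robots_txt name, and its parameters are hypothetical, and the preset paths are limited to the two defaults mentioned in this answer.

```python
# Hypothetical preset table; only the defaults named in the answer are included.
PRESETS = {
    "wordpress": ["/wp-admin/"],
    "shopify": ["/checkout"],
    "custom": [],
}

# AI-crawler names listed in the answer above.
AI_BOTS = ["GPTBot", "ClaudeBot", "CCBot", "PerplexityBot", "anthropic-ai", "Google-Extended"]


def build_robots_txt(preset, sitemap_url, blocked_bots=(), extra_paths=()):
    """Assemble robots.txt text from a preset, AI-bot toggles, and extra paths."""
    lines = ["User-agent: *"]
    paths = PRESETS[preset] + list(extra_paths)
    if paths:
        lines += [f"Disallow: {path}" for path in paths]
    else:
        lines.append("Disallow:")  # an empty Disallow permits everything
    for bot in blocked_bots:
        if bot not in AI_BOTS:
            raise ValueError(f"unknown AI crawler: {bot}")
        lines += ["", f"User-agent: {bot}", "Disallow: /"]
    lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(
        "wordpress",
        "https://example.com/sitemap.xml",
        blocked_bots=["GPTBot", "CCBot"],
        extra_paths=["/search"],
    ))
```

Regenerating from inputs like these, rather than editing the text by hand, keeps the file reproducible when a preset or toggle changes.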

Question 3

What should I put in my robots.txt file?

Accepted Answer

Four things at minimum. First, a Sitemap line pointing to your XML index (https://example.com/sitemap.xml). This is the single fastest discovery boost for new pages and orphan URLs not linked from the homepage. Second, a User-agent: * block with any Disallow rules that apply to every bot: admin paths, internal search results, cart and checkout on ecommerce. Third, per-bot rules if you want different treatment for AI crawlers versus search engines. Fourth, nothing else. Most broken robots.txt files are bloated with stale rules copied from a ten-year-old tutorial. Keep it short, remember that paths are case-sensitive, and declare one sitemap (or one sitemap index) per domain. Our generator scaffolds a clean version for your CMS with the 2026 AI-crawler toggles included by default. Once it's live, test a handful of real URLs per bot with our robots.txt checker to confirm every rule resolves correctly and nothing important is accidentally blocked.

Question 4

How does a robots.txt file work?

Accepted Answer

When a crawler visits your site, it first requests /robots.txt. If the file returns a 200, the bot parses the rules, finds the User-agent block that matches its own name, and follows the Allow and Disallow lines in that block. If no named block matches, it uses the wildcard block (User-agent: *). The longer, more specific path match wins on overlap, so Allow: /blog/ beats Disallow: /bl when both are present. The bot caches the file for roughly 24 hours before re-fetching, so new rules don't apply instantly across the web. Robots.txt is advisory, not enforced. Compliant bots (Googlebot, Bingbot, GPTBot, ClaudeBot) respect it as policy. Rogue scrapers ignore it entirely and go straight for the content. See exactly which rule wins for any bot on any path with our robots.txt checker, and audit the rendered result with our crawler simulator after changes go live.
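The block-selection step is easy to get wrong, so here is a small sketch of it. The GROUPS data and the select_group helper are hypothetical, and the matching is simplified to a case-insensitive comparison of the bot name. Real crawlers match on their product token and may differ in detail.

```python
# Hypothetical parsed file: User-agent name -> list of (directive, path) rules.
GROUPS = {
    "*": [("Disallow", "/admin/")],
    "GPTBot": [("Disallow", "/")],
}


def select_group(groups, bot_name):
    """Return the rules from the block that applies to `bot_name`.

    A named block is used on its own; the wildcard block is only a
    fallback and is not merged into the named block.
    """
    wanted = bot_name.lower()
    for agent, rules in groups.items():
        if agent != "*" and agent.lower() == wanted:
            return rules
    return groups.get("*", [])


if __name__ == "__main__":
    print(select_group(GROUPS, "gptbot"))   # the GPTBot block
    print(select_group(GROUPS, "Bingbot"))  # falls back to the wildcard block
```

A practical consequence: if you give a bot its own block, repeat any wildcard rules you still want that bot to follow.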

Question 5

How do I block AI crawlers in robots.txt?

Accepted Answer

Add a User-agent block for each bot you want to block, with Disallow: / underneath to block the whole site. The 2026 names worth including explicitly: GPTBot (OpenAI), ClaudeBot and anthropic-ai (Anthropic), CCBot (Common Crawl, trains many open models), PerplexityBot (Perplexity's answer engine), and Google-Extended (blocks training usage of Googlebot-crawled pages without hurting your Google search rankings or feature eligibility). Our robots.txt generator has one checkbox per crawler so you can decide per bot rather than blocking or allowing everything at once. Publishers usually block all of them to protect paid archives. Marketers often leave them allowed to get cited in ChatGPT and Claude answers where citations drive referral traffic. After you generate and deploy, test each blocked bot with our robots.txt checker on a specific path to confirm the Disallow resolves correctly for that User-agent name and block order.
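As a rough illustration, the sketch below writes one full-site Disallow block per bot and then checks the result with Python's standard-library urllib.robotparser. The ai_block_rules and can_fetch helpers and the example.com host are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Names taken from the answer above; trim the list to the bots you want to block.
AI_BOTS = ["GPTBot", "ClaudeBot", "anthropic-ai", "CCBot", "PerplexityBot", "Google-Extended"]


def ai_block_rules(bots):
    """Return robots.txt text with one full-site Disallow block per bot."""
    blocks = [f"User-agent: {bot}\nDisallow: /" for bot in bots]
    return "\n\n".join(blocks) + "\n"


robots_txt = ai_block_rules(AI_BOTS)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())


def can_fetch(bot, path="/"):
    """Check `path` for `bot` against the generated rules (placeholder host)."""
    return parser.can_fetch(bot, "https://example.com" + path)


if __name__ == "__main__":
    print(robots_txt)
    for bot in AI_BOTS + ["Googlebot"]:
        print(bot, can_fetch(bot, "/pricing"))
```

Googlebot is included in the loop as a control: it has no block in this file, so it should remain allowed.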

Question 6

Do I need a robots.txt file?

Accepted Answer

Not strictly. If you don't have one, crawlers assume everything is allowed and crawl your site by following links. For a small static site with nothing to hide and no crawl-budget concerns, that's fine and you can skip it entirely. You'll probably still want one for three reasons. One, it's the standard place to declare your sitemap, which speeds up discovery of new pages across Google, Bing, and AI crawlers. Two, it gives you a switch for AI training crawlers (block GPTBot and CCBot, for instance) that you don't get by default without a file. Three, it lets you block high-traffic wasteful paths like internal search results or faceted filter URLs so Googlebot spends its budget on pages that matter. Generate a baseline with our robots.txt generator and the right CMS preset. Verify the rules resolve correctly per bot with our robots.txt checker before you move on.

Question 7

What's the difference between Allow and Disallow?

Accepted Answer

Disallow blocks a path for the named User-agent. Allow explicitly permits one. You rarely need Allow because anything not Disallowed is allowed by default. The one time Allow matters is when you want to block a folder but keep one file inside it crawlable. For example, Disallow: /wp-admin/ with Allow: /wp-admin/admin-ajax.php blocks the admin interface but lets Googlebot reach the AJAX endpoint WordPress needs for core functions. The longer, more specific match wins. Allow: /blog/ beats Disallow: /bl, and Disallow: /blog/drafts/ beats Allow: /blog/ when both apply to the same bot. Rule order inside a single User-agent block doesn't matter to modern bots; specificity does. Test the outcome with our robots.txt checker by pasting the exact path you care about and seeing which rule the bot lands on. For clean starter rules matched to your CMS, use our generator and the right preset.
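The longest-match rule can be sketched as follows. This is a simplified model under stated assumptions: it handles plain path prefixes only (no * or $ wildcards), and it lets Allow win an exact-length tie. The is_allowed helper and the rule lists are hypothetical.

```python
def is_allowed(rules, path):
    """Resolve `path` against (directive, prefix) rules by longest match.

    Rule order is ignored. The longest matching prefix decides, and a
    path that matches no rule is allowed.
    """
    best_len = -1
    allowed = True
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty or non-matching rule has no effect
        is_allow = directive.lower() == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


# The two examples from the answer above.
WP_RULES = [("Disallow", "/wp-admin/"), ("Allow", "/wp-admin/admin-ajax.php")]
BLOG_RULES = [("Allow", "/blog/"), ("Disallow", "/bl"), ("Disallow", "/blog/drafts/")]

if __name__ == "__main__":
    print(is_allowed(WP_RULES, "/wp-admin/admin-ajax.php"))
    print(is_allowed(WP_RULES, "/wp-admin/options.php"))
    print(is_allowed(BLOG_RULES, "/blog/drafts/post-1"))
```

Because only prefix length matters in this model, reordering the rule lists does not change any result.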

Question 8

Where do I put the robots.txt file?

Accepted Answer

At the exact root of your domain. The URL must resolve at https://yourdomain.com/robots.txt with no subdirectory in front of it at all. Subdirectory paths like /blog/robots.txt are ignored entirely by every crawler, no matter how many rules the file contains. Subdomains count as separate domains, so blog.example.com needs its own robots.txt at its own root; the one on www.example.com does not apply. Platform-specific placement: WordPress users paste the content into their SEO plugin's robots.txt editor (Yoast, Rank Math, or All in One SEO) and the plugin serves it virtually. Shopify generates one automatically and locks most of it; you customize via robots.txt.liquid. Next.js and Astro projects drop robots.txt into /public. Static sites put it in the web root directly. Generate the file itself with our robots.txt generator, then confirm it's live and parses correctly for each named bot with our robots.txt checker.
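One way to see the root-and-subdomain rule in action is to compute where a crawler would look for the file given any page URL. The robots_url_for helper below is a hypothetical name; it uses only urllib.parse from the standard library.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url):
    """Return the robots.txt URL that governs `page_url`.

    Scheme and host are kept; the path, query, and fragment are replaced
    so the result always points at the root of that exact host.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url_for("https://www.example.com/blog/post-1?ref=home"))
    print(robots_url_for("https://blog.example.com/2026/launch/"))
```

The two calls return different URLs because www.example.com and blog.example.com are treated as separate hosts.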

Question 9

Should robots.txt include the sitemap URL?

Accepted Answer

Yes. It's one line and it's the single most useful non-rule entry in the file. A Sitemap: https://example.com/sitemap.xml declaration tells every crawler where your XML index lives, which speeds up discovery of new pages and helps bots find orphan URLs that aren't linked from the homepage or main navigation. Google and Bing both read this line and treat the URLs inside the sitemap as candidates for crawling; a listing is a hint, not a guarantee that every URL gets fetched. The sitemap URL should be absolute (full https://), point to an index that returns a 200, and list only canonical URLs that match your preferred domain version. Multiple Sitemap lines are allowed, but one declaration per domain is the tidiest setup: if you have multiple sitemaps, put them in a sitemap index file and declare that index once. Our robots.txt generator adds the Sitemap line automatically from the Sitemap URL field. Validate the sitemap itself (status codes, duplicates, lastmod freshness across every URL) with our sitemap checker.
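A quick way to check the "absolute URL" requirement is to pull the Sitemap lines out of the file and inspect them. The sitemap_urls and is_absolute helpers below are hypothetical names for a minimal sketch. It does not fetch the sitemap or check its status code.

```python
from urllib.parse import urlsplit


def sitemap_urls(robots_txt):
    """Return the value of every Sitemap line, in file order."""
    urls = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def is_absolute(url):
    """True when the URL carries an http(s) scheme and a host."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


if __name__ == "__main__":
    sample = "User-agent: *\nDisallow:\n\nSitemap: https://example.com/sitemap.xml\n"
    for url in sitemap_urls(sample):
        print(url, is_absolute(url))
```

A relative value such as /sitemap.xml fails the is_absolute check, which is the mistake this sketch is meant to catch.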

Question 10

How do I test my robots.txt after generating it?

Accepted Answer

Three-step verification. First, confirm the file is live by loading https://yourdomain.com/robots.txt in a browser; you should see your rules as plain text, not your homepage or a 404 error. Second, paste the URL into our robots.txt checker and run one check per bot you named in the file (Googlebot, GPTBot, ClaudeBot, PerplexityBot, and so on) against a specific Test path that should be blocked. The checker tells you which rule matched and which User-agent block it came from so you can fix the source. Third, for rendered-page validation, run the same URL through our crawler simulator to see whether the page is actually fetched and rendered the way you intend. If Search Console flags any URL as blocked after deployment, Google's cache can lag by up to 24 hours before the new rules take effect across the index, so re-check after a day before filing a bug.
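The first step can be automated with a small check on the HTTP response. The sketch below assumes you already have the status code, Content-Type header, and body from whatever HTTP client you use. The looks_like_robots_txt name and the text/plain expectation are assumptions for illustration.

```python
DIRECTIVES = ("user-agent:", "allow:", "disallow:", "sitemap:")


def looks_like_robots_txt(status, content_type, body):
    """Heuristic: did the server return a real robots.txt, not a 404 or HTML page?"""
    if status != 200:
        return False
    if not content_type.lower().startswith("text/plain"):
        return False  # an HTML response usually means the homepage or an error page
    # At least one line should start with a known directive.
    return any(
        line.strip().lower().startswith(DIRECTIVES)
        for line in body.splitlines()
    )


if __name__ == "__main__":
    print(looks_like_robots_txt(200, "text/plain; charset=utf-8", "User-agent: *\nDisallow: /admin/\n"))
    print(looks_like_robots_txt(200, "text/html", "<html><body>Home</body></html>"))
    print(looks_like_robots_txt(404, "text/plain", "Not found"))
```

This only confirms the file is being served. Which rule wins for a given bot and path still needs the per-bot checks in steps two and three.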

Robots Txt Generator

Generate the whole content, not just check it.

What a robots.txt generator actually does

How to use this robots.txt generator

Why AI crawler toggles matter in 2026

CMS presets and why they differ

Sitemap declaration and crawl-delay

Common mistakes

Advanced tips

Frequently Asked Questions

Related free tools