What is robots.txt?
A robots.txt file is a plain text file that lives at the root of your website and instructs web crawlers which URLs they can and can’t access. It follows the Robots Exclusion Protocol (REP), a convention created in 1994 and later formalized as RFC 9309, which virtually all legitimate search engine bots respect.
When a search engine crawler like Googlebot arrives at your website, the first file it requests is robots.txt. Based on the directives it finds, the crawler decides which URLs to fetch and which to skip. This gives webmasters granular control over how their site is crawled.
How robots.txt Works
The file consists of one or more rule sets, each targeting a specific user-agent (crawler). Each rule set contains Allow and Disallow directives that specify URL paths the crawler may or may not visit.
```
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/
Sitemap: https://example.com/sitemap.xml
```
The * wildcard matches all crawlers. You can also write rules for specific bots like Googlebot, Bingbot, or Twitterbot.
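These rules can be checked programmatically. As a sketch, Python’s standard-library urllib.robotparser can parse the example above and answer per-URL questions (the paths tested here are illustrative):

```python
import urllib.robotparser

# The example rules from above (normally fetched from /robots.txt).
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot", "/admin/settings"))  # False: under /admin/
print(rp.can_fetch("Googlebot", "/public/page"))     # True: explicitly allowed
print(rp.can_fetch("Googlebot", "/blog/post"))       # True: no rule matches
```

Because no group targets Googlebot specifically, the parser falls back to the `User-agent: *` group, just as real crawlers do.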
How to Use This Tool
- Select a user-agent from the dropdown or type a custom one
- Add paths you want to allow or disallow
- Enter your sitemap URL (optional but recommended)
- Click “Generate” to build your robots.txt
- Copy the result and upload it to your website root
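The steps above amount to assembling a few lines of text. A minimal sketch of that assembly, assuming a single user-agent group (build_robots_txt and its parameters are hypothetical, not the tool’s actual code):

```python
def build_robots_txt(user_agent="*", disallow=(), allow=(), sitemap=None):
    """Assemble a single-group robots.txt file (illustrative helper)."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallow]
    lines += [f"Allow: {path}" for path in allow]
    if sitemap:
        lines.append(f"Sitemap: {sitemap}")
    return "\n".join(lines) + "\n"

print(build_robots_txt(disallow=["/admin/", "/private/"],
                       allow=["/public/"],
                       sitemap="https://example.com/sitemap.xml"))
```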
Common robots.txt Directives
- User-agent: Specifies which crawler the rules apply to
- Disallow: Tells crawlers not to access a specific path
- Allow: Explicitly permits access to a path (useful inside a broader Disallow)
- Sitemap: Points crawlers to your XML sitemap
- Crawl-delay: Requests a delay (in seconds) between successive requests (not supported by Google)
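Putting these directives together, a file with separate groups for specific crawlers might look like this (the paths and delay value are illustrative):

```
User-agent: Googlebot
Disallow: /search/

User-agent: Bingbot
Crawl-delay: 10
Disallow: /search/

User-agent: *
Disallow: /tmp/

Sitemap: https://example.com/sitemap.xml
```

A crawler uses the group that matches its name most specifically and ignores the others; the Sitemap line applies file-wide.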
Best Practices
- Always place robots.txt at your domain root (/robots.txt)
- Include a Sitemap directive to help crawlers find all your pages
- Don’t use robots.txt to hide sensitive content; use authentication or noindex meta tags instead
- Test your robots.txt using Google Search Console’s robots.txt Tester
- Keep rules simple and well-organized by user-agent
- Remember that robots.txt is publicly accessible: anyone can read it
Common Mistakes
- Blocking CSS/JS files: Search engines need access to render your pages. Avoid blocking assets referenced by public pages.
- Using robots.txt for security: The file is advisory and public. Never rely on it to protect sensitive data.
- Forgetting trailing slashes: /admin matches any URL path starting with “/admin” (including “/administration”). Use /admin/ to match the directory only.
- Conflicting rules: If both an Allow and a Disallow rule match the same URL, most major crawlers (including Google) apply the most specific rule, i.e. the one with the longest matching path.
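The trailing-slash pitfall comes down to prefix matching. A minimal sketch of the behavior (is_blocked is an illustrative helper, not a real parser, and ignores wildcards):

```python
def is_blocked(path: str, disallow_rule: str) -> bool:
    # robots.txt Disallow rules (without wildcards) are plain
    # prefix matches against the URL path.
    return path.startswith(disallow_rule)

print(is_blocked("/administration/page", "/admin"))   # True: broader than intended
print(is_blocked("/administration/page", "/admin/"))  # False
print(is_blocked("/admin/users", "/admin/"))          # True: directory only
```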
robots.txt vs. Meta Robots Tag
While robots.txt controls crawling (whether a bot visits a URL), the <meta name="robots"> tag controls indexing (whether a page appears in search results). For complete control, use both: robots.txt to manage crawl budget and meta tags to manage indexing. One caveat: a crawler can only see a noindex tag if it is allowed to fetch the page, so don’t block a URL in robots.txt if you rely on noindex for it.
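For reference, the meta robots tag sits in the page’s <head> (the content values shown are one common combination):

```html
<head>
  <!-- Keep this page out of search results, but let crawlers follow its links -->
  <meta name="robots" content="noindex, follow">
</head>
```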