What is robots.txt?
A robots.txt file is a plain text file that lives at the root of your website and instructs web crawlers which URLs they can and can’t access. It follows the Robots Exclusion Protocol (REP), a convention created in 1994 and later formalized as RFC 9309, which virtually all legitimate search engine bots respect.
When a search engine crawler like Googlebot arrives at your website, the first file it requests is robots.txt. Based on the directives it finds, the crawler decides which URLs to fetch and which to skip. This gives webmasters granular control over how their site is crawled.
How robots.txt Works
The file consists of one or more rule sets, each targeting a specific user-agent (crawler). Each rule set contains Allow and Disallow directives that specify URL paths the crawler may or may not visit.
```
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/
Sitemap: https://example.com/sitemap.xml
```
The * wildcard matches all crawlers. You can also write rules for specific bots like Googlebot, Bingbot, or Twitterbot.
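These rules can be checked programmatically. As a sketch, Python’s standard-library urllib.robotparser can parse the example above and answer per-URL questions (the paths tested here are illustrative):

```python
import urllib.robotparser

# The example rules from above (normally fetched from /robots.txt).
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot", "/admin/settings"))  # False: under /admin/
print(rp.can_fetch("Googlebot", "/public/page"))     # True: explicitly allowed
print(rp.can_fetch("Googlebot", "/blog/post"))       # True: no rule matches
```

Because no group targets Googlebot specifically, the parser falls back to the `User-agent: *` group, just as real crawlers do.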
How to Use This Tool
- Select a user-agent from the dropdown or type a custom one
- Add paths you want to allow or disallow
- Enter your sitemap URL (optional but recommended)
- Click “Generate” to build your robots.txt
- Copy the result and upload it to your website root
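The steps above amount to assembling a few lines of text. A minimal sketch of that assembly, assuming a single user-agent group (build_robots_txt and its parameters are hypothetical, not the tool’s actual code):

```python
def build_robots_txt(user_agent="*", disallow=(), allow=(), sitemap=None):
    """Assemble a single-group robots.txt file (illustrative helper)."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallow]
    lines += [f"Allow: {path}" for path in allow]
    if sitemap:
        lines.append(f"Sitemap: {sitemap}")
    return "\n".join(lines) + "\n"

print(build_robots_txt(disallow=["/admin/", "/private/"],
                       allow=["/public/"],
                       sitemap="https://example.com/sitemap.xml"))
```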
Common robots.txt Directives
- User-agent: Specifies which crawler the rules apply to
- Disallow: Tells crawlers not to access a specific path
- Allow: Explicitly permits access to a path (useful inside a broader Disallow)
- Sitemap: Points crawlers to your XML sitemap
- Crawl-delay: Requests a delay (in seconds) between successive requests (not supported by Google)
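Putting these directives together, a file with separate groups for specific crawlers might look like this (the paths and delay value are illustrative):

```
User-agent: Googlebot
Disallow: /search/

User-agent: Bingbot
Crawl-delay: 10
Disallow: /search/

User-agent: *
Disallow: /tmp/

Sitemap: https://example.com/sitemap.xml
```

A crawler uses the group that matches its name most specifically and ignores the others; the Sitemap line applies file-wide.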
Best Practices
- Always place robots.txt at your domain root (/robots.txt)
- Include a Sitemap directive to help crawlers find all your pages
- Don’t use robots.txt to hide sensitive content; use authentication or noindex meta tags instead
- Test your robots.txt using Google Search Console’s robots.txt Tester
- Keep rules simple and well-organized by user-agent
- Remember that robots.txt is publicly accessible: anyone can read it
Common Mistakes
- Blocking CSS/JS files: Search engines need access to render your pages. Avoid blocking assets referenced by public pages.
- Using robots.txt for security: The file is advisory and public. Never rely on it to protect sensitive data.
- Forgetting trailing slashes: /admin matches any URL path starting with “/admin” (including “/administration”). Use /admin/ to match the directory only.
- Conflicting rules: If both an Allow and a Disallow rule match the same URL, most major crawlers (including Google) apply the most specific rule, i.e. the one with the longest matching path.
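The trailing-slash pitfall comes down to prefix matching. A minimal sketch of the behavior (is_blocked is an illustrative helper, not a real parser, and ignores wildcards):

```python
def is_blocked(path: str, disallow_rule: str) -> bool:
    # robots.txt Disallow rules (without wildcards) are plain
    # prefix matches against the URL path.
    return path.startswith(disallow_rule)

print(is_blocked("/administration/page", "/admin"))   # True: broader than intended
print(is_blocked("/administration/page", "/admin/"))  # False
print(is_blocked("/admin/users", "/admin/"))          # True: directory only
```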
robots.txt vs. Meta Robots Tag
While robots.txt controls crawling (whether a bot visits a URL), the <meta name="robots"> tag controls indexing (whether a page appears in search results). For complete control, use both: robots.txt to manage crawl budget and meta tags to manage indexing. One caveat: a crawler can only see a noindex tag if it is allowed to fetch the page, so don’t block a URL in robots.txt if you rely on noindex for it.
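For reference, the meta robots tag sits in the page’s <head> (the content values shown are one common combination):

```html
<head>
  <!-- Keep this page out of search results, but let crawlers follow its links -->
  <meta name="robots" content="noindex, follow">
</head>
```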