Check and validate your robots.txt instantly
Paste your robots.txt content or enter a URL to detect syntax errors, missing directives, and crawl-blocking issues before they hurt your rankings.
Why your robots.txt file matters
Your robots.txt file is the first thing search engine crawlers read when they visit your site. A misconfigured file can block critical pages from indexing, waste crawl budget on irrelevant URLs, or accidentally hide your entire site from Google.
Many websites have robots.txt files that block critical resources like CSS, JS, or images
A properly configured robots.txt file leads to faster crawling and indexing
Crawl budget can be wasted when robots.txt allows crawling of pages that were blocked and later unblocked
Understanding robots.txt directives
User-agent — Target specific crawlers
The User-agent directive specifies which crawler the rules apply to. Use '*' for all bots, or target specific ones like Googlebot, Bingbot, or GPTBot. Each block of rules must start with a User-agent line.
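For illustration, a file with a default group and a stricter group for one named bot (the paths here are placeholders):

```
# Default rules for all crawlers
User-agent: *
Disallow: /admin/

# Stricter rules for one named crawler
User-agent: GPTBot
Disallow: /
```

A crawler that matches a named group (GPTBot here) follows only that group and ignores the 'User-agent: *' rules, so repeat any shared rules inside each group that needs them.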
Disallow — Block paths from crawling
Disallow tells crawlers which URL paths they should not access. 'Disallow: /admin/' blocks the admin directory. 'Disallow: /' blocks the entire site. An empty Disallow means everything is allowed.
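A small sketch of the prefix-matching behavior (the paths are hypothetical):

```
User-agent: *
Disallow: /admin/       # blocks /admin/ and everything under it
Disallow: /tmp/report   # blocks any URL path starting with /tmp/report
```

A bare 'Disallow:' line with no value matches nothing, which is how a group explicitly allows the whole site.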
Allow — Override Disallow rules
Allow lets you permit access to specific paths within a disallowed directory. For example, 'Disallow: /private/' with 'Allow: /private/public-page' grants access to that one page.
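The example above written out as a complete group:

```
User-agent: *
Allow: /private/public-page
Disallow: /private/
```

Google resolves Allow/Disallow conflicts by the most specific (longest) matching rule, so the Allow wins here regardless of order; listing it first also keeps simpler first-match parsers from misreading the file.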
Sitemap — Point crawlers to your sitemap
The Sitemap directive tells search engines where to find your XML sitemap. This helps them discover all your pages efficiently. Always use the full absolute URL (https://example.com/sitemap.xml).
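For example (the URLs are placeholders):

```
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-news.xml
```

Sitemap lines are independent of User-agent groups and can appear anywhere in the file, and multiple Sitemap lines are allowed.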
Crawl-delay — Control crawl speed
Crawl-delay sets the number of seconds a crawler should wait between requests. Useful for servers with limited resources, but high values slow down indexing. Google ignores this directive — use Search Console instead.
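For example, asking one crawler to pause between requests:

```
# Ask Bingbot to wait 10 seconds between fetches
User-agent: Bingbot
Crawl-delay: 10
```

Bing documents support for Crawl-delay; as noted above, Googlebot ignores the line entirely.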
Common mistakes to avoid
The most common errors are: blocking CSS/JS/images (prevents rendering), missing User-agent: * (no default rules), using relative sitemap URLs, blocking the entire site accidentally, and having duplicate conflicting rules.
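One way to catch several of these mistakes is to run a draft file through a parser and test a few representative URLs before deploying. A minimal sketch using Python's standard-library urllib.robotparser (the rules and URLs are hypothetical):

```python
from urllib import robotparser

# Draft rules to sanity-check; Allow is listed before Disallow so that
# both longest-match (Google) and first-match parsers agree.
rules = """\
User-agent: *
Allow: /private/public-page
Disallow: /private/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# The specific Allow rule wins for this URL
print(rp.can_fetch("*", "https://example.com/private/public-page"))  # True
# Everything else under /private/ stays blocked
print(rp.can_fetch("*", "https://example.com/private/secret"))       # False
```

Note that robotparser checks rules in source order (first match wins), which is stricter than Google's longest-match resolution, so a file that passes under both orderings is the safer bet.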
Stop worrying about technical SEO — automate it
UnlimitedVisitors handles robots.txt, sitemaps, schema markup, and content optimization automatically. Focus on growing your business.
Frequently asked questions
What is a robots.txt file?
A robots.txt file is a plain text file placed at the root of your website (e.g. example.com/robots.txt) that tells search engine crawlers which pages or sections they can and cannot access. It follows the Robots Exclusion Protocol and is the first file crawlers check before indexing your site.
Does robots.txt block pages from appearing in Google?
Not exactly. Robots.txt prevents crawling, not indexing. If other sites link to a blocked page, Google may still index the URL (showing it without a snippet). To truly prevent indexing, use a 'noindex' meta tag or X-Robots-Tag HTTP header instead.
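For example, either of the following keeps a page out of the index while leaving it crawlable:

```
# Meta tag in the page's <head>:
<meta name="robots" content="noindex">

# Or the equivalent HTTP response header:
X-Robots-Tag: noindex
```

Either way, the page must not be blocked in robots.txt, because crawlers can only obey a noindex directive they are allowed to fetch.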
What happens if I don't have a robots.txt file?
If no robots.txt file exists, search engines assume they can crawl every page on your site. This is fine for most small sites, but larger sites benefit from robots.txt to manage crawl budget and prevent indexing of duplicate, admin, or staging pages.
Should I block AI crawlers like GPTBot in robots.txt?
It depends on your goals. Blocking GPTBot (OpenAI), Google-Extended (Gemini), or CCBot (Common Crawl) prevents your content from being used in AI training. However, blocking these crawlers may also reduce your visibility in AI-powered search results. Consider the trade-offs carefully.
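If you do decide to opt out, each AI crawler gets its own group (new crawlers appear regularly, so expect to revisit this list):

```
# Opt out of AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
```

Google-Extended is a control token rather than a separate crawler, and blocking it does not affect how your pages rank in Google Search.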
What are the most common robots.txt mistakes?
The most common mistakes are: blocking CSS, JS, or image files (which prevents Google from rendering your pages), using 'Disallow: /' accidentally (blocking the entire site), missing a Sitemap directive, using relative URLs for sitemaps, and having conflicting Allow/Disallow rules that confuse crawlers.
How often should I review my robots.txt?
Review your robots.txt whenever you restructure your site, add new sections, change CMS platforms, or modify your URL structure. At minimum, audit it quarterly. A misconfigured robots.txt can silently deindex pages for weeks before you notice the traffic drop.