Robots.txt file

Guide the spiders through your site

The robots.txt file is uploaded to your website’s root folder. This file guides search engine spiders by allowing or disallowing the crawling of specific files and folders. It’s a URL blocking method and should be handled with care.

Example:

User-agent: Googlebot 
Disallow: /folder1/ 
Allow: /folder1/myfile.html
Sitemap: http://www.yoursite.com/sitemap.xml

The user-agent can be the wildcard * so that all spiders/bots are affected: User-agent: *

In the example above we disallow crawling of ‘folder1’ except for one file in that particular folder: ‘myfile.html’.
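You can check how such rules behave before uploading the file. Here is a small sketch using Python’s standard-library parser (note: urllib.robotparser applies rules in file order, while Google picks the most specific rule, so the Allow line is placed first here; the URLs are just the example paths from above):

```python
from urllib import robotparser

# The example rules from above, with Allow first so the stdlib
# first-match-wins parser agrees with Google's longest-match logic.
rules = """\
User-agent: Googlebot
Allow: /folder1/myfile.html
Disallow: /folder1/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot", "http://www.yoursite.com/folder1/other.html"))   # -> False (blocked)
print(rp.can_fetch("Googlebot", "http://www.yoursite.com/folder1/myfile.html"))  # -> True (allowed)
print(rp.can_fetch("Googlebot", "http://www.yoursite.com/about.html"))           # -> True (no rule applies)
```

This is only a local sanity check; the authoritative test is done against the live file with Google’s tool, as described below.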

A good robots.txt for a site running on WordPress would be this:

User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /category/*/*
Disallow: */trackback

The WordPress core files are protected from crawling, and category and trackback pages won’t be listed either. A lot more can be added to the robots.txt file, but this covers the most important points. Note: do not block your /feed/ URL, as it can be used as a sitemap.

You can also try the robots.txt file generator.

Robots.txt Checklist

  • Add your sitemap URL to the robots.txt file: Sitemap: http://www.yoursite.com/sitemap.xml
  • If you’re using WordPress, disallow the core folders
  • Is it named properly (case sensitive!) and placed in your root folder?
  • Disallow 301/302 redirections and cloaked URLs (e.g. yoursite.com/outgoing/affiliate-offer)  >> Disallow: /outgoing/*
  • If you are using subdomains, each subdomain needs its own robots.txt file
  • One rule per line
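Putting the checklist together, a combined robots.txt for a WordPress site might look like this (the /outgoing/ folder is just an example path; replace it with your own redirect folder):

```
User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /category/*/*
Disallow: */trackback
Disallow: /outgoing/*

Sitemap: http://www.yoursite.com/sitemap.xml
```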

The file is ready – what’s next?

  • Once you’ve uploaded the file to your website’s root folder, you can test it with Google’s robots.txt testing tool

Robots.txt vs. Meta Robots

It’s recommended to exclude specific pages via <meta name="robots" content="noindex"> instead of blocking them with robots.txt. If the URL in question gets backlinks from other pages, the link juice is lost, because robots.txt keeps the spiders from crawling the page at all. With the meta tag the page is still crawled, its links are still followed, and your pages are still rewarded.
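A minimal example of such a tag, placed in the page’s head section (the "follow" part is optional, since following links is the default behavior):

```html
<head>
  <!-- keep this page out of the index but still let spiders follow its links -->
  <meta name="robots" content="noindex, follow">
</head>
```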

If you want to exclude complete folders, e.g. /tmp/, /private/ or similar, it makes sense to add them to robots.txt.
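For example, to keep all spiders out of those folders:

```
User-agent: *
Disallow: /tmp/
Disallow: /private/
```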