A robots.txt file is a crucial element of any website, and you should have one on yours. It tells search engines like Google how you would like them to handle your website. Specifically, it can tell them to avoid certain parts of your site, such as "/plugins" or "/cgi-bin".
There are a couple of things to remember:
· If you don't want Google to crawl parts of your website, you have to use the Disallow directive. There is usually no need for an Allow directive, as the default is that Google will crawl and index everything unless you disallow it.
· A robots.txt file is important for SEO, but it should not replace other SEO tools. For example, if you don't want Google to index a certain page or follow the links on it, use the noindex and nofollow values of the robots meta tag on the page itself, rather than a rule in the robots.txt file. A Disallow rule only stops Google from crawling the page; Google can still index its URL, for instance when other pages link to it.
Creating and Optimising a Robots.txt File
1. Open Notepad.
Your robots.txt file is a simple text file so you can create it using Notepad.
2. Decide on what to block.
Look at the directories of your website to determine which ones you want to block. These could be directories with temporary files, certain scripts, or files that include customer data.
The first line of your robots.txt file states which bots the instructions apply to. You can use an * to address all bots, or you can set specific rules for different bots. For example, "User-agent: *" applies to all bots, while "User-agent: Googlebot" applies only to Google's crawler.
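As a rough sketch, the two forms look like this in the file. The Disallow lines are placeholders added only to complete each group, and lines starting with # are comments.

```text
# Rules for every bot
User-agent: *
Disallow: /cgi-bin/

# Rules for Google's main crawler only
User-agent: Googlebot
Disallow: /plugins/
```

Keep in mind that a crawler which finds a group addressed to it by name typically follows that group and ignores the * group.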
Next, list the sections and pages of your website that you want search engines to ignore, each on its own Disallow line. For example, "Disallow: /cgi-bin/" disallows everything in the cgi-bin folder on the website's server.
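A minimal sketch of such a rule, assuming the folder is the /cgi-bin/ directory mentioned at the start of this article:

```text
User-agent: *
# Block everything under the cgi-bin folder
Disallow: /cgi-bin/
```

The trailing slash limits the rule to that folder. Rules match by prefix, so "Disallow: /cgi-bin" without the slash would also match a URL such as /cgi-binary.html.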
You must use a separate line for each URL or directory, but you can use wildcards so that you don't have to write as many lines. For example, "Disallow: /*.asp$" disallows all URLs on a website that end with .asp.
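A sketch of a wildcard rule of this kind follows. It assumes the crawler understands the * and $ pattern characters, which Google does but some older bots may not.

```text
User-agent: *
# * matches any run of characters, $ marks the end of the URL
Disallow: /*.asp$
```

Without the closing $, the rule would also match longer URLs such as /page.aspx or /page.asp?id=1.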
Make sure that you don't inadvertently block Google and other search engines from parts of your website that you do want found in search.
Finally, it is a good idea to disallow your internal search results pages, as Google can otherwise crawl and index them. This can cause a number of problems, including duplicate content issues.
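For illustration, suppose your search results live under /search/ or use a query parameter named s. Both are assumptions about your site, so check your own search URLs before copying these rules.

```text
User-agent: *
# Hypothetical search results directory
Disallow: /search/
# Hypothetical search query parameter, e.g. /?s=shoes
Disallow: /*?s=
```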
Save your file as robots.txt. Note that the file name is case sensitive, so don't use Robots.txt or any other variation.
Upload your robots.txt file to your website. You must put it in the root directory: YourWebsite.com/robots.txt.
One final thing to mention: don't list all your files in your robots.txt. Anyone can read the file, so naming individual files advertises where they are. It is much better for security if you just list directories.
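To illustrate the difference, here is a sketch using a hypothetical /private/ directory. The file names in the comments are made up for the example.

```text
User-agent: *
# Avoid: one line per file reveals each file's name and location
# Disallow: /private/customer-list.csv
# Disallow: /private/invoices.pdf

# Prefer: a single rule for the whole directory
Disallow: /private/
```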