You have to start a new batch of code for each engine, but if you want to list multiply disallow files you can one under another. For example –
User-Agent: Slurp (Inktomi's spider)
Disallow: xyz-gg.html
Disallow: xyz-al.html
Disallow: xxyyzz-gg.html
Disallow: xxyyzz-al.html
The above code disallows Inktomi to spider two pages optimized for Google (gg) and two pages optimized for AltaVista (al). If Inktomi were allowed to spider these pages as well as the pages specifically made for Inktomi, you may run the risk of being banned or penalized. Hence, it's always a good idea to use a robots.txt file.
The robots.txt file resides on your webspace, but where on your webspace? The root directory! If you upload your file to sub-directories it will not work. If you wanted to disallow all engines from indexing a file, you simply use the * character where the engines name would usually be. However beware that the * character won't work on the Disallow line.
Here are the names of a few of the big engines:
Excite - ArchitextSpider
AltaVista - Scooter
Lycos - Lycos_Spider_(T-Rex)
Google - Googlebot
Alltheweb - FAST-WebCrawler
Be sure to check over the file before uploading it, as you may have made a simple mistake, which could mean your pages are indexed by engines you don't want to index them, or even worse none of your pages might be indexed.
Another advantage of the Robots.txt file is that by examining it, you can get information on what spiders, or agents have accessed your web pages. This will give you a list of all the host names as well as agent names of the spiders. Moreover, information of very small search engines also gets recorded in the text file. Thus, you know what Search Engines are likely to list your website.
5.10 The Right Text Usage
5.10.1 Full Body Text
Most Search Engines scan and index all of the text in a web page. However, some Search Engines ignore certain text known as Stop Words, which is explained below. Apart from this, almost all Search Engines ignore spam.
5.10.2 Stop Words
Stop words are common words that are ignored by search engines at the time of searching a key phrase. This is done in order to save space on their server, and also to accelerate the search process.
|