Without getting too technical, I want to share an important tip involving a file called ‘robots.txt’. This is a plain text file that you upload to the root of your website (e.g. www.forty-first.co.uk/robots.txt) and that gives instructions to the search engines. When a search engine visits a website, it checks whether a robots.txt file is present and, if so, reads its contents.

Usually this file lets the search engines crawl and index every page on the website, but what if you don’t want all of your pages indexed? Some areas may still be under development, or some pages may be meant only for certain people. In that case, use robots.txt to tell the search engines to keep away from those pages. I can’t guarantee that every search engine will take notice, but the main players (e.g. Google, Yahoo, Live) should do so.
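To picture where a search engine looks, here is a small Python sketch of my own. It is not code from any real search engine, and the robots_url name is made up for the example. Given the address of any page, it works out where that site’s robots.txt should be. Crawlers only look for the file at the root of the site:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt address a crawler would check for the site hosting page_url.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(page_url)
    # Keep only the scheme and host (plus any port); drop the page path, query and fragment,
    # because robots.txt always lives at the root of the site.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.forty-first.co.uk/some/page.html"))
# prints http://www.forty-first.co.uk/robots.txt
```

This is also why a robots.txt tucked away in a subfolder (say, /files/robots.txt) will simply be ignored.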
An example robots.txt would look something like this:
# /robots.txt for http://www.forty-first.co.uk/
User-agent: *
Disallow:
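“User-agent: *” means the rules apply to every search engine, and an empty Disallow line, as above, means nothing is blocked. To keep the search engines out of part of your site, add a Disallow line for each file or folder you don’t want indexed, for example: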
Disallow: /file1
Disallow: /file2
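If you want to check what your rules actually block, Python’s standard library includes a robots.txt reader, urllib.robotparser. The sketch below is only an illustration. The check helper, the choice of “Googlebot” as the visiting robot, and the /file1.html and /about.html paths are all made up for the example. Real search engines use their own parsers, which may differ in the finer details:

```python
from urllib.robotparser import RobotFileParser

SITE = "http://www.forty-first.co.uk"


def check(robots_txt: str, paths: list[str], agent: str = "Googlebot") -> dict[str, bool]:
    """Hypothetical helper: report, for each path, whether a well-behaved robot may fetch it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # read the rules from text instead of downloading them
    return {p: parser.can_fetch(agent, SITE + p) for p in paths}


# The "let everyone in" file: an empty Disallow blocks nothing.
allow_all = "User-agent: *\nDisallow:\n"
# The same file with two paths blocked for every robot.
blocking = "User-agent: *\nDisallow: /file1\nDisallow: /file2\n"

paths = ["/", "/file1", "/file2", "/file1.html", "/about.html"]
print(check(allow_all, paths))
# {'/': True, '/file1': True, '/file2': True, '/file1.html': True, '/about.html': True}
print(check(blocking, paths))
# {'/': True, '/file1': False, '/file2': False, '/file1.html': False, '/about.html': True}
```

Note that a Disallow rule matches any path that begins with it, so “Disallow: /file1” also keeps robots away from /file1.html.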
If you want to know more, visit the Web Robots Pages website (www.robotstxt.org).





