The “Robots Meta Tag” is used to control whether a spider or bot may index an HTML page. You can give permission to index your whole site, and the spider will crawl all your pages.
This is a great way to control bots and spiders if you don’t have access to the root directory and robots.txt file.
Some search engines (not all) fully obey the “Robots Meta Tag”.
What is the format of “Robots Meta Tag” and where do I put it on my site?
The “Robots Meta Tag” is placed in the HEAD section of your HTML document, and it's not case sensitive. The format is simple and easy to understand.
Here is an example of how to set the statement up (case does not matter):
<HTML>
<HEAD>
<META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">
<TITLE>...</TITLE>
</HEAD>
<BODY>
...
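The previous statement lets spiders and bots know that this web page is off limits. If the statement is in your index.html file, a spider that enters through that page will not follow its links, so the rest of the site is effectively off limits too, unless other sites link to your inner pages directly.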
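To make the placement rule concrete, here is a minimal sketch of how a spider might look for the tag, using Python's standard html.parser module. The class name RobotsMetaFinder and the function find_robots_meta are made up for this illustration, and real spiders may well behave differently (for example, some may also honor a tag found outside HEAD).

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the CONTENT value of each <META NAME="ROBOTS"> tag inside HEAD."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.contents = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META> and <meta> match alike.
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attr = dict(attrs)
            # Attribute values keep their case, so compare the name case-insensitively.
            if (attr.get("name") or "").lower() == "robots":
                self.contents.append(attr.get("content") or "")

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_robots_meta(html):
    """Return the list of robots CONTENT strings found in the HEAD of `html`."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.contents


page = '<HTML><HEAD><META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW"><TITLE>Demo</TITLE></HEAD><BODY>text</BODY></HTML>'
print(find_robots_meta(page))
```

This sketch deliberately ignores a tag that appears in BODY, since the tag is meant to live in HEAD.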
The four meta tag options are set in the content part of the statement. They are:
- index
- noindex
- follow
- nofollow
What do the options mean and how do we use them?
First is index.
This tells the spider/bot that it’s OK to index this page.
Second is noindex.
The spider/bot sees this and doesn’t index any of the content on this page.
Third is follow.
This lets the spider/bot know that it’s OK to travel down links found on this page.
Last is nofollow.
It tells the spider/bot not to follow any of the links on this page.
So the combination of the statements tells the spider what it can do. If you use this to control the spider/bot, make sure you don’t ask it to follow links when the only page you link to carries a noindex statement.
What are the combinations and what do they control?
<META NAME="ROBOTS" CONTENT="INDEX,FOLLOW">
This statement tells the spider/bot that it’s OK to index the page and follow all links.
<META NAME="ROBOTS" CONTENT="NOINDEX,FOLLOW">
Here you tell the spider/bot that it’s not OK to index but OK to follow any links on the page.
<META NAME="ROBOTS" CONTENT="INDEX,NOFOLLOW">
Here it’s OK to index this page, but the spider/bot is NOT permitted to follow the links.
<META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">
Here you tell the spider/bot to stay away from all content and all links.
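The four combinations above can be sketched as a small Python function that turns the content part into two yes/no answers. The function name parse_robots_content is made up for this example. Two things here are my assumptions and not anything a search engine has promised: a missing option is treated as permitted, and if options conflict the last one wins.

```python
def parse_robots_content(content):
    """Return (may_index, may_follow) for the content part of a robots meta tag."""
    # Assumption: anything not mentioned is permitted.
    may_index, may_follow = True, True
    for token in content.split(","):
        token = token.strip().lower()  # the tag is not case sensitive
        if token == "index":
            may_index = True
        elif token == "noindex":
            may_index = False
        elif token == "follow":
            may_follow = True
        elif token == "nofollow":
            may_follow = False
        # Unknown tokens are ignored in this sketch.
    return may_index, may_follow


for content in ("INDEX,FOLLOW", "NOINDEX,FOLLOW", "INDEX,NOFOLLOW", "NOINDEX,NOFOLLOW"):
    print(content, parse_robots_content(content))
```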
There are 2 global statements, ALL and NONE. They go in the content part of the statement like the other options. (Please check how the search engines you care about treat them before you use them, because I have never used this setup myself.)
<META NAME="ROBOTS" CONTENT="ALL">
is shorthand for INDEX,FOLLOW, and
<META NAME="ROBOTS" CONTENT="NONE">
is shorthand for NOINDEX,NOFOLLOW.
So depending on how you use the statements in each page, you can control spiders and bots and the access they have. If a spider or bot hits your HTML page and there is no robots.txt or “Robots Meta Tag”, what are the defaults? There are articles that say the predefined default is INDEX and FOLLOW. One search engine that fully obeys the “Robots Meta Tag” is Inktomi, and Inktomi’s default is INDEX and NOFOLLOW. So it’s important to set up the “Robots Meta Tag” correctly to get your web site indexed by all the spiders/bots that you want to crawl your web site.
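Here is a rough Python sketch of why the defaults matter. The defaults table only repeats what is reported above, so treat the values as unverified. The names CRAWLER_DEFAULTS and effective_permissions, and the "generic" and "inktomi" keys, are made up for this illustration.

```python
# Defaults as reported in this article (unverified): (may_index, may_follow)
CRAWLER_DEFAULTS = {
    "generic": (True, True),   # INDEX, FOLLOW
    "inktomi": (True, False),  # INDEX, NOFOLLOW
}


def effective_permissions(content, crawler="generic"):
    """Combine a crawler's defaults with the tag content (None if the page has no tag)."""
    may_index, may_follow = CRAWLER_DEFAULTS[crawler]
    for token in (content or "").split(","):
        token = token.strip().lower()
        if token in ("index", "noindex"):
            may_index = token == "index"
        elif token in ("follow", "nofollow"):
            may_follow = token == "follow"
    return may_index, may_follow


# A page with only "index" leaves the follow decision to each crawler's default.
print(effective_permissions("index", "generic"))
print(effective_permissions("index", "inktomi"))
# Spelling out both options removes the difference.
print(effective_permissions("INDEX,FOLLOW", "inktomi"))
```

If the reported defaults are right, a tag that names only one option leaves the other to the crawler, which is why spelling out both is the safer setup.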
This is all for now,
If anyone has more information on the global statements, please add it so this will be complete.
Nils

