By Stephani Richardson
The "robots" meta tag, when used properly, will tell the search engine
spiders whether or not to index and follow a particular page. Some
examples of usage are as follows:
<meta name="robots" content="index,follow">
<meta name="robots" content="noindex,follow">
<meta name="robots" content="index,nofollow">
<meta name="robots" content="noindex,nofollow">
Let us first examine what these terms mean before we explain the usage for
each one:
"index"- This directive tells the search engine robots (or spiders) that
it is okay to index the page. In other words, you are allowing the search
engine to include your page within their search directory.
"noindex"- Using this tag, you are letting the robots know that this page
should not be indexed. Simply put, this page will not appear in their
search directory.
"follow"- When you use this tag, you are telling the search engines that
you want their robot to follow any links that are found on that page.
"nofollow"- The opposite of the above definition, this directive will tell
the robots not to follow any links on your page.
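To make the four directives concrete, here is a minimal Python sketch of how a robot might read the "content" value. The function name parse_robots_content is made up for illustration, and falling back to index and follow when a directive is missing is an assumption, not a documented rule of any particular search engine.

```python
def parse_robots_content(content):
    """Turn a robots meta "content" value into index/follow flags."""
    # Assumption: a directive that is not mentioned falls back to
    # "index" and "follow".
    flags = {"index": True, "follow": True}
    for token in content.split(","):
        token = token.strip().lower()
        if token in ("index", "follow"):
            flags[token] = True
        elif token in ("noindex", "nofollow"):
            # Strip the leading "no" to get the flag name.
            flags[token[2:]] = False
    return flags
```

For example, parse_robots_content("noindex,follow") reports that the page should stay out of the index while its links may still be followed.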
Putting it all together:
With the robots tags explained, let's examine the usage for each one.
1. <meta name="robots" content="index,follow">
This tag will be used when you want the search engine spiders to index the
page and follow the links to other pages. Most search engines use this
setting as a "default" setting. It is possible that you may not even need
to use this tag if you want the search engines to follow and index the
page. However, an article at Search Engine World (searchengineworld.com/metatag/robots.htm) suggests that Inktomi does not
use this as their default setting. Instead, they use the "index, nofollow"
tag.
Better safe than sorry!
There has been much debate over whether or not it is necessary to use this
tag. If there is even a slight possibility that some search engines do not
use this as the default setting, then it would only make sense to include
this tag if you want your page included in their search directory AND your
links to be followed. Do the research and decide for yourself.
2. <meta name="robots" content="noindex,follow">
This tag can be used to tell the search engines that you do not want the
page included in their directory, but you DO want them to follow the links
that lead to other pages. A good example of its usage would be your
disclaimer or privacy policy pages. You may not want these pages to show
up in the search engines if they are only important to your actual
visitors. However, if the links on these pages point to other pages that
you want the search engines to find, then you would still want the spiders
to "follow" those links.
3. <meta name="robots" content="index,nofollow">
This tag will allow your page to be indexed in the search engines, but any
links on that page will not be followed.
4. <meta name="robots" content="noindex,nofollow">
When using this tag, the search engine spiders will not include this page
in their directory and will not follow any links on the page either.
Where does the "robots" tag belong?
The "robots" meta tag should be used within the <head> and </head> tags of
your page. These tags are located at the top of the HTML code. It will
look something like this:
<html>
<head>
<title>Title of your page goes here</title>
<meta name="keywords" content="word1,word2,word3,word4">
<meta name="description" content="A brief description of the content of this page.">
<meta name="robots" content="index,follow">
</head>
<body>
Your webpage information here.
</body>
</html>
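If you want to check that the tag really sits inside the head section, a short script can do it for you. The following is only a sketch that uses Python's standard html.parser module. The names RobotsMetaFinder and find_robots_meta are hypothetical, and ignoring any "robots" tag found outside <head> is this sketch's own choice, not a statement about how any search engine behaves.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the content of "robots" meta tags found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.contents = []  # one entry per robots meta tag, in page order

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attr_map = dict(attrs)
            name = (attr_map.get("name") or "").lower()
            if name == "robots":
                self.contents.append(attr_map.get("content") or "")

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_robots_meta(html_text):
    """Return the list of robots "content" values found in <head>."""
    finder = RobotsMetaFinder()
    finder.feed(html_text)
    return finder.contents
```

Feeding it the example page above should give back a list with the single value "index,follow". An empty list means no "robots" tag was found in the head.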
More Robots Tags
Google automatically archives a page as it crawls it. This is called a
"cached" version of the page. Visitors can retrieve the archived version
of the page by clicking on the "cached" link within Google's search
results. If you do not want your content to be archived, you can use the
following tag:
<meta name="robots" content="noarchive">
*This will only prevent your page from being "cached". If you do not want
your page to be indexed at all, you will still need to include the "noindex"
tag.
An alternative to the above tag is one that addresses Google specifically.
If you want other search engine robots to archive your site, but you would
like to prevent Google from doing so, then you can use the following tag:
<meta name="googlebot" content="noarchive">
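The difference between the general tag and the Google-only tag can be sketched in a few lines of Python. The function may_archive and the crawler name "otherbot" are invented for this example. The sketch assumes that a robot looks at both the generic "robots" tag and the tag carrying its own name, and that "noarchive" in either one is enough to block caching.

```python
def may_archive(meta_tags, crawler="googlebot"):
    """Return False if a tag that applies to this crawler says "noarchive".

    meta_tags maps lowercased meta names to their content values,
    for example {"googlebot": "noarchive"}.
    """
    # Assumption: the generic "robots" tag and the crawler-specific
    # tag both apply, and the restrictive directive wins.
    for name in ("robots", crawler):
        content = meta_tags.get(name, "")
        directives = [d.strip().lower() for d in content.split(",")]
        if "noarchive" in directives:
            return False
    return True
```

With {"googlebot": "noarchive"}, the sketch blocks archiving for "googlebot" but still allows it for the made-up "otherbot", which mirrors the behavior described above.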
The Misuse of Robots Tags
Something that has been popping up on websites everywhere is the Google
indexing tag. This is a silly little tag that is not necessary. Some
people think this tag helps Google to spider your site, but this simply
isn't true.
The tag looks like this: <meta name="googlebot" content="index,follow">.
Some website owners believe that by specifying "googlebot" their site
has the advantage of being spidered faster and listed by Google. According
to Google's web crawler information at http://www.google.com/bot.html, you
only need to use the noindex, nofollow, or noarchive tags when you don't
want Google to index, follow, or cache that page. Google's default setting
is to index the page and follow its links, so this so-called
googlebot index/follow tag is completely unnecessary.
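A quick way to spot such unnecessary tags is to test whether a tag asks for anything other than the default. This is only a sketch: the function name is_redundant is made up, and it assumes the robot's default really is to index and follow, as Google states for its own crawler.

```python
def is_redundant(content):
    """Return True if the content only restates the assumed default."""
    # Assumption: the robot defaults to "index" and "follow", so a tag
    # listing nothing but those two directives changes nothing.
    directives = {d.strip().lower() for d in content.split(",") if d.strip()}
    return bool(directives) and directives <= {"index", "follow"}
```

Under that assumption, is_redundant("index,follow") is true, while any content that includes noindex, nofollow, or noarchive is not redundant.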
Another silly little tag: The "Revisit-After" Tag
<meta name="revisit-after" content="90 days">
<meta name="revisit-after" content="15 days">
I'm not sure where this myth started. Today, you will find this tag
all over the Internet. Webmasters have even promoted it, claiming that it
actually works. Are we so naive as to believe the search engine spiders need
to be told when to come back? I have never used this tag, and my site has no
problem with being crawled on a regular basis. Even some SEO (search
engine optimization) sites are claiming its value. This comes back to the
importance of always doing your own research, for example: http://www.webmasterworld.com/forum5/4924.htm
It is important to examine the correct usage of the "robots" tag before
applying it to your website. Incorrect usage of tags could result in
errors on your page that cause robots to ignore your page altogether.
You can find more information about web robots here: http://www.robotstxt.org/wc/robots.html
This article was added on: January 18, 2005.