Sunday, June 5, 2011
XML: build a sitemap for search engines
A Sitemap is an XML file that contains the information search engines need to crawl a web site, and possibly index it, more efficiently. The file mainly contains URLs, plus optional details that search engine crawlers can use. If there is no Sitemap in the root of a web site, crawlers usually follow the links inside each page and try to "discover" all the pages of the site. A Sitemap makes that process faster and more efficient, and lets the webmaster tell search engine spiders about every relevant page, including pages that are hard to reach through links. Sitemaps do not guarantee that a web site is indexed and ranked; however, they help the process a lot.
There are many Sitemap generators on the web, but I would like to explain how to build one yourself. As a side note, RSS feeds can be used as Sitemaps in tools like Google Webmaster Tools; however, it is better to use the Sitemap protocol so that other web spiders also understand the structure of your site.
The file starts with the standard XML declaration:
<?xml version="1.0" encoding="UTF-8"?>
Then we declare the protocol standard. The urlset tag encloses all the URLs in our Sitemap:
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
The namespace used here is the generic one from sitemaps.org, which all search engines supporting the protocol recognize. You can use Google's schema as well:
<urlset xmlns="http://www.google.com/schemas/sitemap/0.9">
With the urlset tag open, we place the actual URLs inside it, following this pattern:
<url>
<loc>http://www.example.com/</loc>
<lastmod>2011-01-01</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
Then we repeat this block for every page we want to include in the Sitemap, and after the last one we close the urlset tag:
</urlset>
<loc> is the URL of your page, and it is the only required tag inside <url>. The first URL is usually the address of your web site's home page, followed by inner pages (something like http://www.example.com/products.html or http://www.example.com/products.asp?code=12). You can point to any specific page that is relevant to your web site. Note that if your web server requires it, the URL should end with a trailing slash (/). The URL has to start with the appropriate protocol (such as http://).
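If you have more than a handful of pages, writing each <url> block by hand gets tedious. The following is a minimal sketch in Python (my choice of language; the post itself builds the file by hand) that produces the same structure with the standard library's xml.etree.ElementTree. The function name build_sitemap and the dict-based input format are hypothetical, chosen for illustration. A side benefit is that ElementTree escapes special characters such as & for you, which the protocol requires for data values like URLs.

```python
import xml.etree.ElementTree as ET

# Namespace of the sitemaps.org protocol, as used in the <urlset> tag
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return a Sitemap as an XML string.

    Each entry is a dict (a hypothetical input format) with a required
    "loc" key and optional "lastmod", "changefreq" and "priority" keys.
    Optional tags are written only when the key is present.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # <loc> is the only required child of <url>
        ET.SubElement(url, "loc").text = entry["loc"]
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    # tostring() escapes characters such as & in the text content
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Usage: the second URL is a hypothetical page with a two-parameter query string
sitemap_xml = build_sitemap([
    {"loc": "http://www.example.com/", "lastmod": "2011-01-01",
     "changefreq": "monthly", "priority": 0.8},
    {"loc": "http://www.example.com/products.asp?code=12&lang=en"},
])
print(sitemap_xml)
```

The output is written on one line without indentation; search engines do not care about whitespace between tags, so this is only a matter of readability.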
<lastmod> stands for "last modified" and is optional. It is the date when the page was last modified, and it must be in W3C Datetime format. I usually use the short YYYY-MM-DD form, which is valid W3C Datetime.
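If you generate the Sitemap from a script, the <lastmod> value can be derived from a Python date or datetime. This is a sketch under my own assumptions (the helper name w3c_datetime is hypothetical): a plain date produces the YYYY-MM-DD form, and a datetime produces the full form with seconds and a time zone offset, since W3C Datetime requires a time zone designator whenever a time is included.

```python
from datetime import date, datetime, timezone


def w3c_datetime(value):
    """Format a date or datetime as a W3C Datetime string for <lastmod>.

    A date gives YYYY-MM-DD. A datetime gives YYYY-MM-DDThh:mm:ss+hh:mm;
    it must be timezone-aware, because W3C Datetime requires a time zone
    designator whenever a time is included.
    """
    # Check datetime first: datetime is a subclass of date
    if isinstance(value, datetime):
        if value.tzinfo is None:
            raise ValueError("datetime must be timezone-aware")
        return value.isoformat(timespec="seconds")
    if isinstance(value, date):
        return value.isoformat()
    raise TypeError("expected a date or datetime")


print(w3c_datetime(date(2011, 1, 1)))  # 2011-01-01
print(w3c_datetime(datetime(2011, 6, 5, 14, 30, tzinfo=timezone.utc)))  # 2011-06-05T14:30:00+00:00
```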
<changefreq> states how frequently the page is likely to change. It is an optional tag, and crawlers treat it as a hint rather than a command. Valid values for the tag are:
- always
- hourly
- daily
- weekly
- monthly
- yearly
- never
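A generator script can catch typos in <changefreq> before they end up in the file. A minimal sketch, assuming you keep the allowed values in a set (the names CHANGEFREQ_VALUES and check_changefreq are hypothetical):

```python
# The values allowed for <changefreq>, as listed above
CHANGEFREQ_VALUES = frozenset(
    {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}
)


def check_changefreq(value):
    """Return value unchanged if it is a valid <changefreq>, else raise ValueError."""
    if value not in CHANGEFREQ_VALUES:
        raise ValueError(
            f"invalid changefreq {value!r}; expected one of {sorted(CHANGEFREQ_VALUES)}"
        )
    return value


print(check_changefreq("monthly"))  # monthly
```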
<priority> is again an optional tag, but it's quite interesting. It gives your URL a priority relative to the other URLs in your own site. The default value is 0.5, and it can be set to any value from 0.0 to 1.0. With it, the Sitemap indicates which pages are the most important in the web site. Again, this tag will not give your most important pages a higher rank; however, it helps web crawlers better understand the structure of your web site. Do not set every page to 1.0 hoping to obtain a higher ranking. That definitely does not work!
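The same kind of check works for <priority>. In this sketch (format_priority is a hypothetical helper), the value must fall between 0.0 and 1.0, and omitting it gives the 0.5 default. Rounding to one decimal place is my own choice for readability, not something the protocol asks for.

```python
def format_priority(value=0.5):
    """Return the text for a <priority> tag, checking the 0.0-1.0 range.

    The default of 0.5 matches the protocol default. Rounding to one
    decimal place is an illustrative choice, not a protocol rule.
    """
    value = float(value)
    # NaN fails both comparisons, so it is rejected here as well
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"priority must be between 0.0 and 1.0, got {value}")
    return f"{value:.1f}"


print(format_priority(0.8))  # 0.8
print(format_priority())     # 0.5
```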
Now your Sitemap is ready to be published. Upload it to the root of your web site and submit it to your favourite search engine. As said before, you can use Google Webmaster Tools to do it.
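Before uploading, it is worth checking that the file is well-formed and lists the URLs you expect. Here is a minimal sketch with Python's xml.etree.ElementTree (list_sitemap_urls is a hypothetical name). It assumes the sitemaps.org namespace, so a file using Google's namespace would return an empty list. A parse error usually points to a typo such as a missing closing tag.

```python
import xml.etree.ElementTree as ET

# Prefix mapping for the sitemaps.org namespace used in <urlset>
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_sitemap_urls(xml_text):
    """Parse Sitemap XML and return its <loc> values in document order.

    ET.fromstring raises ET.ParseError if the file is not well-formed,
    for example when the closing </urlset> tag is missing.
    """
    root = ET.fromstring(xml_text)
    return [(loc.text or "").strip() for loc in root.findall("sm:url/sm:loc", NS)]


sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/products.html</loc></url>
</urlset>"""
print(list_sitemap_urls(sample))
```

To check a real file, read it first, for example with open("sitemap.xml", encoding="utf-8").read(), and pass the text to the function.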
I hope that this post will help you. You can visit sitemaps.org for more information.
This post was written by: Franklin Manuel
Franklin Manuel is a professional blogger, web designer and front end web developer.