The ABC of an XML Sitemap

Anatomy of an XML Sitemap

XML sitemaps are URL lists that give search engine spiders a leg up: they make it easy to find the URLs available for crawling and to update the index with the latest website information. Unlike the usual HTML sitemap, which is meant as a navigation tool for human users, an XML sitemap serves as fodder for the search engines.

In simple terms, XML sitemaps give search engine spiders an authoritative list of URLs for indexing, along with additional metadata such as the last modified date, the frequency of change (updates) and the page priority (hierarchy of importance vis-a-vis other pages).

Once your sitemap is in place and submitted to the search engines, it serves as a guide that helps the spiders crawl your webpages in an optimal way. There was once a school of thought that submitting an XML sitemap interferes with a search engine spider’s natural way of indexing through discovered links, but the common consensus now is that XML sitemaps have a very important role to play in the crawling, indexing and consequent traffic stakes.

Let’s take a close look at the anatomy of an XML sitemap:

Line 1 <?xml version="1.0" encoding="UTF-8"?>
Line 2 <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
Line 3 <url>
Line 4 <loc>http://www.yoursite.com/</loc>
Line 5 <lastmod>2008-08-17</lastmod>
Line 6 <changefreq>weekly</changefreq>
Line 7 <priority>0.5</priority>
Line 8 </url>
Line 9 </urlset>

  • Line 1 declares the file as XML. The file should be UTF-8 encoded
  • Line 2 – <urlset> opens the list of URL entries and references the current protocol standard
  • Line 3 – <url> is the parent tag for each URL entry; the tags that follow are its children
  • Line 4 – <loc> holds the URL of the page. The full URL, starting with http, should be enclosed in this tag
  • Line 5 – <lastmod> specifies the date on which the page was last modified, in W3C Datetime format (for example YYYY-MM-DD)
  • Line 6 – <changefreq> specifies approximately how often the page is updated. Valid values are always, hourly, daily, weekly, monthly, yearly and never
  • Line 7 – <priority> specifies the importance of this page vis-a-vis other pages on your site, on a scale from 0.0 to 1.0, as a hint to the search engines about which pages to crawl first

Close the <urlset> tag once you have created your URL entries. It is important to note that while <lastmod>, <changefreq> and <priority> are optional, the other tags are mandatory to complete the schema of an XML sitemap.
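If you would rather build the file with a script than type it by hand, the structure above can be sketched in Python with the standard library. This is only a minimal sketch: the function name build_sitemap, the dictionary layout and the about.html URL are made up for illustration, and it assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML bytes for a list of entry dicts.

    Each entry needs a "loc" key; "lastmod", "changefreq" and "priority"
    are optional, mirroring the schema described above.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]  # mandatory tag
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in entry:  # optional tags are written only when given
                ET.SubElement(url, tag).text = str(entry[tag])
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


# Hypothetical entries: the second one carries only the mandatory <loc>.
sitemap_xml = build_sitemap([
    {"loc": "http://www.yoursite.com/", "lastmod": "2008-08-17",
     "changefreq": "weekly", "priority": 0.5},
    {"loc": "http://www.yoursite.com/about.html"},
])
print(sitemap_xml.decode("utf-8"))
```

ElementTree escapes characters such as & inside <loc> for you, which is one reason to prefer it over plain string concatenation.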

Each sitemap file is limited to 50,000 URLs. If your website has more pages, you need multiple sitemaps tied together with a sitemap index file.
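To make the 50,000 URL limit concrete, here is a rough Python sketch that splits a long URL list into chunks and writes a sitemap index pointing at one file per chunk. The page URLs, the sitemap1.xml style file names and the helper names chunk_urls and build_index are assumptions for illustration; the <sitemapindex>, <sitemap> and <loc> tags come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50000  # per-file limit mentioned above


def chunk_urls(urls, size=MAX_URLS):
    """Split a list of URLs into consecutive lists of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(sitemap_urls):
    """Return sitemap index XML bytes pointing at each child sitemap file."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = sitemap_url
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


# 120,000 made-up page URLs do not fit in one file, so they are chunked.
pages = [f"http://www.yoursite.com/page{n}.html" for n in range(120000)]
chunks = chunk_urls(pages)

# One hypothetical child sitemap file name per chunk.
child_files = [f"http://www.yoursite.com/sitemap{n}.xml"
               for n in range(1, len(chunks) + 1)]
index_xml = build_index(child_files)
print(len(chunks), "sitemap files listed in the index")
```

Each chunk would then be written out as its own sitemap file, and only the index file needs to be submitted.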

Now that you have your sitemap ready, you can validate it using this validator tool.
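For a quick local sanity check before you reach for a validator, something along these lines can catch the most common slips. It is a sketch only, not a full validation against the official schema: the function name check_sitemap and the sample data are made up, and it checks just the rules described in this post (root tag, mandatory <loc>, allowed <changefreq> values, <priority> range and the 50,000 entry limit).

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
CHANGEFREQS = {"always", "hourly", "daily", "weekly",
               "monthly", "yearly", "never"}


def check_sitemap(xml_data):
    """Return a list of problems found; an empty list means none were found."""
    try:
        root = ET.fromstring(xml_data)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    if root.tag != NS + "urlset":
        return ["root element is not <urlset> in the sitemap namespace"]
    problems = []
    urls = root.findall(NS + "url")
    if len(urls) > 50000:
        problems.append("more than 50,000 <url> entries")
    for number, url in enumerate(urls, start=1):
        loc = url.findtext(NS + "loc", default="").strip()
        if not loc.startswith(("http://", "https://")):
            problems.append(f"entry {number}: missing or relative <loc>")
        freq = url.findtext(NS + "changefreq")
        if freq is not None and freq.strip() not in CHANGEFREQS:
            problems.append(f"entry {number}: unknown <changefreq> {freq!r}")
        priority = url.findtext(NS + "priority")
        if priority is not None:
            try:
                in_range = 0.0 <= float(priority) <= 1.0
            except ValueError:
                in_range = False
            if not in_range:
                problems.append(f"entry {number}: <priority> outside 0.0-1.0")
    return problems


# Made-up sample: the second entry breaks three of the rules on purpose.
sample = b"""<?xml version='1.0' encoding='UTF-8'?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.yoursite.com/</loc>
    <changefreq>weekly</changefreq>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>/contact.html</loc>
    <changefreq>sometimes</changefreq>
    <priority>2</priority>
  </url>
</urlset>"""

for problem in check_sitemap(sample):
    print(problem)
```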

If you want to generate your sitemap instead of working on it manually, check out this online sitemap generator.

The next step is to upload it to your root directory and submit the sitemap URL using Google Webmaster Central so that Googlebot can act on it proactively. You can also specify the sitemap URL in your robots.txt so that spiders find it quickly.
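In robots.txt the sitemap is announced with a single Sitemap: line holding the full URL. As a small illustration, Python's standard urllib.robotparser module can read that line back, which is a handy way to confirm it is picked up. The robots.txt content below is a made-up example, and site_maps() needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content; the Sitemap line may sit anywhere in the file.
robots_txt = """\
User-agent: *
Disallow: /private/

Sitemap: http://www.yoursite.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# site_maps() returns the listed sitemap URLs, or None if there are none.
print(parser.site_maps())
```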

Ideally, use the XML sitemap in conjunction with your robots.txt file: the XML sitemap points spiders to the files you want indexed, while robots.txt specifies the files that you don’t want crawled. You are now all set for better indexing by the search engines and greater control over the files that get spidered.
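Since the two files work together, it is worth checking that they do not contradict each other, for example a sitemap that lists a page which robots.txt blocks. A minimal sketch of such a check, again with urllib.robotparser and entirely made-up rules and URLs:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules and sitemap URLs for illustration.
robots_txt = """\
User-agent: *
Disallow: /private/
"""
sitemap_urls = [
    "http://www.yoursite.com/",
    "http://www.yoursite.com/products.html",
    "http://www.yoursite.com/private/report.html",
]

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Collect sitemap URLs that a generic spider ("*") is not allowed to fetch.
blocked = [url for url in sitemap_urls if not parser.can_fetch("*", url)]
for url in blocked:
    print("listed in sitemap but disallowed in robots.txt:", url)
```

Any URL this reports should either be dropped from the sitemap or unblocked in robots.txt, depending on which of the two you actually meant.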

