
How to Manage Sitemaps for a Large (E-Commerce-Type) Website with Thousands of URLs

Last updated on 25 Jan, 2023

You have probably worked with an ordinary sitemap, but have you ever worked with a large website that has hundreds of thousands of URLs? If not, and you are looking for a way to manage the sitemap of a large website such as Amazon.com, this blog post is for you.

Our client's website, Bidnfix.com, is a service portal that offers different services in different cities across multiple countries worldwide. After optimizing the website's performance with Node, React, and React SSR, we were ready to work on SEO.

Imagine the sitemap for a website that offers N services in N cities in each of N countries. With one URL for every service-city combination, the total number of URLs will be at least N × N × N = N³.
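To make the N × N × N growth concrete, here is a minimal TypeScript sketch that enumerates one URL per country, city and service. The `Country` shape, the `buildServiceUrls` helper and the `https://www.example.com/{country}/{city}/{service}` pattern are hypothetical illustrations, not Bidnfix.com's actual URL scheme.

```typescript
// Hypothetical data model: each country has its own list of cities,
// and every service is offered in every city.
interface Country {
  code: string;     // e.g. "us"
  cities: string[]; // city slugs
}

// Build one URL per (country, city, service) combination.
// The URL pattern is an assumption for illustration only.
function buildServiceUrls(baseUrl: string, services: string[], countries: Country[]): string[] {
  const urls: string[] = [];
  for (const country of countries) {
    for (const city of country.cities) {
      for (const service of services) {
        urls.push(`${baseUrl}/${country.code}/${city}/${service}`);
      }
    }
  }
  return urls;
}

// N services, N cities per country, N countries => N * N * N URLs.
const n = 3;
const services = Array.from({ length: n }, (_, i) => `service-${i}`);
const countries: Country[] = Array.from({ length: n }, (_, c) => ({
  code: `c${c}`,
  cities: Array.from({ length: n }, (_, i) => `city-${c}-${i}`),
}));
const urls = buildServiceUrls("https://www.example.com", services, countries);
console.log(urls.length); // 27
```

With 100 services, 100 cities per country and 100 countries, the same loops would produce 1,000,000 URLs, twenty times what a single sitemap file may hold.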

Problem #1: A single sitemap file cannot contain more than 50,000 URLs.
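One generic way to respect this limit, regardless of how URLs are grouped, is to split the URL list into consecutive chunks of at most 50,000. The TypeScript sketch below shows that generic fallback, not the per-country split this post settles on; `chunkUrls` and the example URLs are hypothetical. The sitemaps.org protocol also caps each uncompressed file at 50 MB, which this sketch does not check.

```typescript
// Limit from the sitemaps.org protocol: at most 50,000 URLs per sitemap file.
const MAX_URLS_PER_SITEMAP = 50_000;

// Split a flat URL list into consecutive chunks that each fit in one sitemap file.
function chunkUrls(urls: string[], maxPerFile: number = MAX_URLS_PER_SITEMAP): string[][] {
  if (maxPerFile <= 0) throw new RangeError("maxPerFile must be positive");
  const chunks: string[][] = [];
  for (let i = 0; i < urls.length; i += maxPerFile) {
    chunks.push(urls.slice(i, i + maxPerFile));
  }
  return chunks;
}

// Example: 120,000 made-up URLs need three sitemap files.
const urls = Array.from({ length: 120_000 }, (_, i) => `https://www.example.com/page/${i}`);
const chunks = chunkUrls(urls);
console.log(chunks.length, chunks.map((c) => c.length)); // 3 chunks: 50000, 50000, 20000
```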

Problem #2: All of the service category, city and country data is managed in a backend admin panel, so whenever an admin changes any of it, the related sitemap needs to be updated.

Here is how we solved both problems.

The solution to problem #1 is simple: the sitemap protocol lets you add a sitemap index file that lists all the other sitemaps as its children. In our case, we were sure that no single country would exceed 50,000 URLs, so we decided to create one sitemap per country, plus a sitemap index that lists all of these country sitemaps.
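A minimal TypeScript sketch of the per-country split, assuming each page record carries its country code (the `PageEntry` shape and `buildCountrySitemaps` name are hypothetical). It groups URLs by country, renders one `<urlset>` document per country in the sitemaps.org format, entity-escapes each `<loc>` as the protocol requires, and fails loudly if a country ever exceeds 50,000 URLs, since the design depends on that never happening.

```typescript
// Hypothetical record: one entry per service-city page, tagged with its country.
interface PageEntry {
  countryCode: string; // e.g. "in", "us"
  loc: string;         // absolute URL of the page
}

// sitemaps.org limit per sitemap file.
const MAX_URLS_PER_SITEMAP = 50_000;

// Entity-escape the characters the sitemap protocol requires escaping in URLs.
// "&" must be replaced first so the other entities are not double-escaped.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/'/g, "&apos;")
    .replace(/"/g, "&quot;")
    .replace(/>/g, "&gt;")
    .replace(/</g, "&lt;");
}

// Group pages by country and render one <urlset> document per country.
// Throws if a country exceeds the per-file limit, because this design
// relies on every country fitting into a single sitemap.
function buildCountrySitemaps(pages: PageEntry[]): Map<string, string> {
  const byCountry = new Map<string, string[]>();
  for (const page of pages) {
    const locs = byCountry.get(page.countryCode) ?? [];
    locs.push(page.loc);
    byCountry.set(page.countryCode, locs);
  }

  const sitemaps = new Map<string, string>();
  for (const [code, locs] of byCountry) {
    if (locs.length > MAX_URLS_PER_SITEMAP) {
      throw new Error(`sitemap-${code}.xml would hold ${locs.length} URLs (limit ${MAX_URLS_PER_SITEMAP})`);
    }
    const entries = locs.map((loc) => `  <url><loc>${escapeXml(loc)}</loc></url>`).join("\n");
    sitemaps.set(
      code,
      '<?xml version="1.0" encoding="UTF-8"?>\n' +
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
        `${entries}\n</urlset>\n`,
    );
  }
  return sitemaps;
}

// Example with made-up pages for two countries.
const countrySitemaps = buildCountrySitemaps([
  { countryCode: "us", loc: "https://www.example.com/us/austin/plumbing" },
  { countryCode: "us", loc: "https://www.example.com/us/dallas/painting?ref=a&b" },
  { countryCode: "in", loc: "https://www.example.com/in/pune/plumbing" },
]);
console.log(countrySitemaps.get("us"));
```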

In short, we have one "sitemap.xml" index file and 51 associated "sitemap-{countrycode}.xml" sitemap files.
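The index file itself could be rendered roughly as follows. This is a sketch under assumptions: the base URL, lowercase slug-safe country codes, and a per-country `lastmod` timestamp (so that only sitemaps that were actually regenerated get a new date) are illustrative choices, and `buildSitemapIndex` is a hypothetical helper.

```typescript
// Hypothetical input: one entry per country sitemap file.
interface CountrySitemapRef {
  countryCode: string; // lowercase code used in the file name, e.g. "us"
  lastmod: Date;       // when that country's sitemap was last regenerated
}

// Render "sitemap.xml" as a <sitemapindex> listing every
// "sitemap-{countrycode}.xml" file, following the sitemaps.org schema.
// toISOString() yields a W3C Datetime value such as 2023-01-25T00:00:00.000Z.
function buildSitemapIndex(baseUrl: string, refs: CountrySitemapRef[]): string {
  const entries = refs
    .map(
      (ref) =>
        "  <sitemap>\n" +
        `    <loc>${baseUrl}/sitemap-${ref.countryCode}.xml</loc>\n` +
        `    <lastmod>${ref.lastmod.toISOString()}</lastmod>\n` +
        "  </sitemap>",
    )
    .join("\n");
  return (
    '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    `${entries}\n` +
    "</sitemapindex>\n"
  );
}

// Example: three of the country files.
const indexXml = buildSitemapIndex("https://www.example.com", [
  { countryCode: "in", lastmod: new Date(Date.UTC(2023, 0, 25)) },
  { countryCode: "us", lastmod: new Date(Date.UTC(2023, 0, 25)) },
  { countryCode: "gb", lastmod: new Date(Date.UTC(2023, 0, 24)) },
]);
console.log(indexXml);
```

Search engines only need to be told about "sitemap.xml" (for example in robots.txt or Search Console); they discover the country files through the index.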

The solution to problem #2 is trickier, because we cannot regenerate the sitemaps every time an admin makes a change in the database. Instead, we created a Node.js-based cron job that does this every 10 minutes, so the sitemaps lag behind the data by at most about 10 minutes. We think this is acceptable, because the data does not change that frequently.
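Here is a minimal TypeScript sketch of such a periodic job. The post used the cron Node module (a 10-minute schedule is written `*/10 * * * *` in cron syntax); to stay dependency-free, this sketch swaps in a plain `setInterval`, which counts 10 minutes from process start rather than firing on wall-clock minutes. `startSitemapScheduler` and `createNonOverlappingJob` are hypothetical names, and the guard that skips a tick while the previous run is still working is an added safeguard, not something the post describes.

```typescript
// Ten minutes, the refresh period described in the post.
const TEN_MINUTES_MS = 10 * 60 * 1000;

// Wrap a job so that a tick arriving while a previous run is still in
// progress is skipped instead of starting a second, overlapping run.
// The wrapper resolves to true if the job ran and false if it was skipped.
function createNonOverlappingJob(job: () => Promise<void>): () => Promise<boolean> {
  let running = false;
  return async () => {
    if (running) return false;
    running = true;
    try {
      await job();
      return true;
    } finally {
      running = false;
    }
  };
}

// Start the periodic sitemap job (setInterval stands in for the cron module here).
// Returns a function that stops the schedule.
function startSitemapScheduler(
  job: () => Promise<void>,
  intervalMs: number = TEN_MINUTES_MS,
): () => void {
  const tick = createNonOverlappingJob(job);
  const handle = setInterval(() => {
    tick().catch((err: unknown) => console.error("sitemap job failed:", err));
  }, intervalMs);
  return () => clearInterval(handle);
}
```

Calling `startSitemapScheduler(regenerateSitemaps)` once at server start-up (with `regenerateSitemaps` standing in for your own hypothetical job function) is enough; the returned function stops the timer, for example during a graceful shutdown.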

To create this job, we used the cron Node module. We also made another optimization: we added a flag field, "sitemap" (true/false), to each collection/table to track whether the sitemap needs to be updated. When the sitemap field is true, the sitemap job regenerates the sitemap for the affected country (or countries); otherwise, it leaves the existing sitemap as is. This way we avoid unnecessary sitemap regeneration and reduce the load on the server.
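The flag-based optimization could look roughly like the TypeScript sketch below. The `TrackedRecord` shape (including a `countryCodes` list saying which country sitemaps a record affects), the `runIncrementalSitemapJob` name and the injected `regenerateCountry` callback are assumptions for illustration. The post does not say when the flag is reset, so clearing it after a successful run is also an assumption.

```typescript
// Hypothetical record shape shared by the admin-managed collections/tables.
interface TrackedRecord {
  id: string;
  countryCodes: string[]; // countries whose sitemaps this record appears in
  sitemap: boolean;       // true = changed since the last sitemap run
}

// One pass of the sitemap job: regenerate only the countries touched by
// flagged records, then clear the flags of the records that were processed.
// Returns the country codes that were regenerated (empty if nothing changed).
async function runIncrementalSitemapJob(
  collections: TrackedRecord[][],
  regenerateCountry: (countryCode: string) => Promise<void>,
): Promise<string[]> {
  const dirty = collections.flat().filter((record) => record.sitemap);
  if (dirty.length === 0) {
    return []; // no flagged records: every sitemap is left as is
  }

  // A record can affect several countries, and several records can share one.
  const countries = Array.from(new Set(dirty.flatMap((record) => record.countryCodes))).sort();
  for (const code of countries) {
    await regenerateCountry(code);
  }

  // Clear flags only after every regeneration succeeded; if one throws,
  // the flags stay set and the next run retries.
  for (const record of dirty) {
    record.sitemap = false;
  }
  return countries;
}
```

One caveat with this in-memory sketch: if an admin edits a record while the job is running, its flag is cleared anyway, so that change may be missed until the record is edited again. In a real database, you would clear the flag conditionally, for instance only when the record's update timestamp has not changed since it was read.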

I hope you enjoyed reading this post.

Reach us at hello@3braintechnologies.com or call us on +91 8866 133 870.

About the author

Hitesh Agja

I am Hitesh Agja, and I have 12+ years of industry experience. I am always excited and passionate about learning new things, technical or not. Life is all about learning new things and making the world more progressive.
