Webmasters have various definitions for ‘crawl budget’, a term that relates to how Googlebot crawls websites so their pages can be indexed for the search rankings (crawling is the first step in getting a website into Google’s search results). Efficient crawling of a website helps with its indexing in Google Search.
Google recently clarified the meaning of ‘crawl budget’, which covers a range of issues. It also emphasised that crawl budget isn’t something the majority of webmasters need to worry about: sites with fewer than a few thousand URLs will, most of the time, be crawled efficiently.
Crawl rate limit
Prioritising what to crawl, when, and how much resource the server hosting the site can allocate to crawling matters more for bigger sites, or those that auto-generate pages based on URL parameters, for example. Crawling is Googlebot’s main priority, but it must also make sure it doesn’t degrade the experience of users visiting the site. To balance the two, Googlebot applies a ‘crawl rate limit’, which caps the maximum page-fetching rate for a given site.
Crawl health
If the site responds quickly for a while, the limit goes up and more connections can be used to crawl. If the site slows down or responds with server errors, the limit goes down and Googlebot crawls less. Website owners can also reduce Googlebot’s crawling of their site by setting a limit in Search Console (note that setting a higher limit doesn’t automatically increase crawling).
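To make that behaviour concrete, here is a minimal Python sketch of how an adaptive crawl rate limit could work in principle. It is purely illustrative: the class name, thresholds and adjustment steps are invented for the example and do not reflect Googlebot’s actual implementation.

```python
# Illustrative only: a toy model of an adaptive crawl rate limit.
# Thresholds and step sizes are assumptions made for this sketch.

class CrawlRateLimiter:
    def __init__(self, max_connections=4, min_connections=1, cap=16):
        self.max_connections = max_connections  # current crawl rate limit
        self.min_connections = min_connections
        self.cap = cap

    def record_fetch(self, response_time_ms, status_code):
        """Raise the limit while the site stays fast and healthy,
        lower it when responses slow down or return server errors."""
        if status_code >= 500 or response_time_ms > 2000:
            # Site is struggling: back off so visitors aren't affected.
            self.max_connections = max(self.min_connections,
                                       self.max_connections - 1)
        elif response_time_ms < 300:
            # Site is responding quickly: allow more parallel fetches.
            self.max_connections = min(self.cap, self.max_connections + 1)


limiter = CrawlRateLimiter()
limiter.record_fetch(response_time_ms=150, status_code=200)   # limit rises
limiter.record_fetch(response_time_ms=2500, status_code=503)  # limit falls
print(limiter.max_connections)
```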
Crawl demand
Even if the crawl rate limit isn’t reached, Googlebot activity will be low when there’s no demand from indexing. The factors that play a significant role in determining crawl demand are:
- Popularity: URLs that are more popular on the Internet tend to be crawled more often to keep them fresher in Google’s index.
- Staleness: Google’s systems attempt to prevent URLs from becoming stale in the index.
- Additionally, site-wide events, like site moves, may trigger an increase in crawl demand in order to reindex the content under the new URLs.
Taking crawl rate and crawl demand together, Google defines ‘crawl budget’ as the number of URLs Googlebot can and wants to crawl from a website. According to Google’s analysis, having many low-value-add URLs, such as on-site duplicate content, soft error pages, hacked pages, or low-quality and spam content, can negatively affect a site’s crawling and indexing.
These sorts of issues make it important to develop quality content throughout a website, and also to keep monitoring Google Search Console reports to ensure that a site is being indexed regularly and efficiently, and that there are no potential issues that may prevent pages from being added to Google’s index.
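As a rough illustration of that kind of monitoring, the sketch below counts Googlebot requests per URL path from a server access log, which can show whether crawl activity is being spent on parameterised or otherwise low-value URLs. The log path, the log format and the simple “parameterised” test are assumptions for the example, not a prescribed method.

```python
# Illustrative sketch: tally Googlebot requests per path from an access log
# to see where crawl activity is going. The log path, combined log format
# and the simple "parameterised URL" flag are assumptions for this example.

import re
from collections import Counter
from urllib.parse import urlsplit

LOG_FILE = "access.log"  # hypothetical path to a combined-format access log
LINE_RE = re.compile(r'"[A-Z]+ (?P<url>\S+) HTTP/[^"]*".*"(?P<agent>[^"]*)"$')

hits = Counter()
with open(LOG_FILE) as log:
    for line in log:
        match = LINE_RE.search(line.strip())
        if not match or "Googlebot" not in match.group("agent"):
            continue
        parts = urlsplit(match.group("url"))
        # Flag parameterised URLs, which are often duplicates of canonical pages.
        label = parts.path + (" (parameterised)" if parts.query else "")
        hits[label] += 1

for path, count in hits.most_common(20):
    print(f"{count:6d}  {path}")
```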
You can read more on how to optimise the crawling of your site here; although the article dates from 2009, it is still applicable. If you would also like to know more about how we can check whether your site is being correctly indexed, so that it is eligible to appear in Google’s search results, please contact us now.
Also of interest:
How Google’s Recent Ranking Changes Benefit High Quality Websites
Understanding Google’s Data Centers