Google today announced the completion of its new web indexing system, called “Caffeine”. Google says the new index is the largest collection of web content it has ever offered and that it provides up to fifty percent fresher web search results than the previous index. News stories, forum posts and blog posts now appear in search results much sooner after they are published.
Put simply, when a person searches for something on Google, they are not searching the live web but Google's index of the web. Like the index at the back of a book, it helps pinpoint exactly the information that person needs.
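To make the idea of searching an index rather than the live web concrete, here is a minimal sketch of an inverted index. It is an illustration only, not Google's implementation; the function names `build_index` and `search` and the example URLs are made up for this example.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start from the pages matching the first word, then narrow down.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical pages standing in for crawled web content.
pages = {
    "example.com/news": "Google launches new search index",
    "example.com/blog": "How a search engine builds an index",
    "example.com/forum": "Coffee and caffeine discussion",
}
index = build_index(pages)
matches = search(index, "search index")
```

A query is answered by looking words up in the prebuilt `index`, so no page has to be fetched or read at search time.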
Google explained that it built the new index because web content is growing quickly, not only in size and number of pages but also in richness: videos, images, news and real-time updates make the average webpage much more complex. Expectations for search results are also higher than they used to be. People who search on Google want to find the latest relevant content fast, and publishers expect to be found as soon as they publish. Google built Caffeine to keep up with the growth and evolution of the web and to meet those expectations.
This picture, taken from the official Google blog, shows how the new indexing system works compared to the old one.
Google’s old index had several layers. Some of those layers were refreshed faster than others, but the main layer took a couple of weeks to update, because refreshing a layer of the old index required Google to analyze the entire web. This meant there was a significant delay between the moment Google found a page and the moment it became available to searchers.
With Caffeine, however, Google analyzes the web in small portions and updates its search index globally on a continuous basis. Searchers can find fresher news articles and posts, no matter when or where they were published, because when Google finds a new page, or new information on an existing page, it adds it straight to the index.
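The difference between the two approaches can be sketched in a few lines of Python. This is a toy model under simple assumptions (pages are plain text and the index maps words to URLs); the names `rebuild_all` and `IncrementalIndex` are hypothetical and do not describe how Caffeine is actually built.

```python
from collections import defaultdict


def rebuild_all(pages):
    """Batch style: re-read every page to produce a fresh index."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in set(text.lower().split()):
            index[word].add(url)
    return index


class IncrementalIndex:
    """Continuous style: update only the entries of the page that changed."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> URLs containing it
        self.words_by_url = {}            # URL -> words last seen on it

    def update(self, url, text):
        new_words = set(text.lower().split())
        old_words = self.words_by_url.get(url, set())
        # Drop words that disappeared from the page.
        for word in old_words - new_words:
            self.postings[word].discard(url)
            if not self.postings[word]:
                del self.postings[word]
        # Add words that are new on the page.
        for word in new_words - old_words:
            self.postings[word].add(url)
        self.words_by_url[url] = new_words

    def search(self, word):
        return set(self.postings.get(word.lower(), set()))


live = IncrementalIndex()
live.update("example.com/news", "old story")
# The page changes; only its own entries are touched.
live.update("example.com/news", "breaking story")
```

In the batch version a single changed page becomes searchable only after every page has been processed again, while `update` makes it searchable immediately.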
Every second, Caffeine processes hundreds of thousands of pages in parallel. It takes up nearly 100 million gigabytes of storage in a single database and adds new information at a rate of hundreds of thousands of gigabytes per day.
Check out this video for a simpler explanation of how the Caffeine indexing system works:
http://www.youtube.com/watch?v=BNHR6IQJGZs
According to the official Google blog, Caffeine was built with the future in mind. Google describes it as a robust foundation that makes it possible to build an even faster search engine, one that scales with the growth of information and content online and delivers results to you, the searcher.