Google Reveals Its Data-Center Inner Workings

As part of its pledge to be more open about its business activities, Google has revealed details of the inner workings of its data centres.

Google’s Jeff Dean spoke to a crowd at the Google I/O conference on Wednesday, describing how the company’s data centres are assembled.

Google apparently uses more-or-less ordinary servers, stacking 40 in each rack. The company has not revealed how many servers it has, but with 150 racks per data centre and 36 data centres around the world, a rough estimate comes to about 216,000 (40 × 150 × 36), and the number is growing.
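The server estimate above is simple multiplication. The short Python sketch below reproduces it using only the figures quoted in this article; it assumes every data centre is fully populated with 150 racks of 40 servers, which Google has not confirmed.

```python
# Back-of-the-envelope estimate using the figures quoted in the article.
# Assumption: every data centre is full (150 racks x 40 servers each).
SERVERS_PER_RACK = 40
RACKS_PER_DATA_CENTRE = 150
DATA_CENTRES = 36


def estimate_servers(per_rack: int, racks: int, centres: int) -> int:
    """Servers per rack times racks per site times number of sites."""
    return per_rack * racks * centres


if __name__ == "__main__":
    total = estimate_servers(SERVERS_PER_RACK, RACKS_PER_DATA_CENTRE, DATA_CENTRES)
    print(f"Estimated servers: {total:,}")  # Estimated servers: 216,000
```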

Each Google search query involves 700 to 1,000 servers, yet Google still returns a response in under half a second.
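Answering one query with hundreds of machines is commonly described as a scatter-gather pattern: the query goes to many index shards in parallel and their partial results are merged. The Python sketch below illustrates only that general pattern; the shard contents, scores and the `scatter_gather` function are hypothetical, and nothing here reflects Google’s actual serving code. The 0.5-second timeout simply mirrors the response time quoted above.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical index shards: each maps a document ID to a relevance score.
SHARDS = [
    {"doc-a": 0.91, "doc-b": 0.40},
    {"doc-c": 0.75, "doc-d": 0.88},
    {"doc-e": 0.12, "doc-f": 0.67},
]


def search_shard(shard: dict[str, float], k: int) -> list[tuple[float, str]]:
    """Return this shard's top-k (score, doc_id) pairs."""
    return heapq.nlargest(k, ((score, doc) for doc, score in shard.items()))


def scatter_gather(shards: list[dict[str, float]], k: int,
                   timeout_s: float = 0.5) -> list[str]:
    """Query every shard in parallel, then merge the partial top-k lists.

    If a shard misses the deadline, Executor.map raises TimeoutError while
    results are being collected; a production system would more likely
    return whatever partial results it already had.
    """
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda s: search_shard(s, k), shards, timeout=timeout_s)
        merged = heapq.nlargest(k, (hit for part in partials for hit in part))
    return [doc for _, doc in merged]


if __name__ == "__main__":
    print(scatter_gather(SHARDS, k=3))  # ['doc-a', 'doc-d', 'doc-c']
```

Each shard only needs to return its own top k, because the global top k must be drawn from the union of those local lists.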

Google largely builds its own technology rather than relying on mainstream servers, and it treats each machine as expendable. It prefers to invest in fault-tolerant software rather than top-end hardware.

“Our view is it’s better to have twice as much hardware that’s not as reliable than half as much that’s more reliable,” Dean said. “You have to provide reliability on a software level. If you’re running 10,000 machines, something is going to die every day.”
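Providing reliability at the software level often means keeping copies of data on several machines and routing around whichever one has died. A minimal Python sketch of that idea follows; the replicas are modelled as plain functions, and the `MachineDown` error and `read_with_failover` helper are hypothetical names used for illustration only.

```python
from typing import Callable, Sequence


class MachineDown(Exception):
    """Hypothetical error raised by a (simulated) replica that has failed."""


def read_with_failover(replicas: Sequence[Callable[[], bytes]]) -> bytes:
    """Try each replica in turn so that one dead machine does not fail the request."""
    errors = []
    for replica in replicas:
        try:
            return replica()
        except MachineDown as exc:
            errors.append(exc)  # note the failure and move on to the next copy
    raise RuntimeError(f"all {len(replicas)} replicas failed: {errors}")


# Simulated replicas for the example.
def dead() -> bytes:
    raise MachineDown("disk failure")


def healthy() -> bytes:
    return b"page data"


if __name__ == "__main__":
    print(read_with_failover([dead, healthy]))  # b'page data'
```

The request only fails when every copy is gone, which is why cheap, less reliable machines can still add up to a reliable service.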

Bringing a new cluster online shows how fallible hardware can be, Dean said.

In a cluster’s first year, Dean said, it is typical for thousands of hard drives to fail, for 1,000 individual machines to fail and for 5 racks to “go wonky”, losing half their network packets. The cluster will also need to be rewired once, which affects 5 percent of the machines at any given moment over about 2 days, and one power distribution unit will fail, bringing down 500 to 1,000 machines for about 6 hours.

There is also a 50 percent chance that the cluster will overheat, bringing down most of the servers in less than 5 minutes and taking 1 to 2 days to recover from.

Google’s software architecture has three core elements, all of them proprietary: MapReduce, BigTable and GFS, the Google File System.

Dean said GFS runs on almost all of Google’s machines and stores data across many of them. Some GFS installations are “many petabytes (a million gigabytes) in size.”
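The GFS paper Google published in 2003 describes splitting files into fixed-size 64 MB chunks and storing each chunk on several chunkservers, three by default, so that losing a machine does not lose data. The sketch below shows only that chunking and replication arithmetic; the round-robin placement in the hypothetical `place_chunks` function is a simplification, not GFS’s real placement policy.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size described in the GFS paper
REPLICAS = 3                   # default replication factor described in the GFS paper


def place_chunks(file_size: int, servers: list[str]) -> list[list[str]]:
    """Split a file into fixed-size chunks and give each REPLICAS distinct servers.

    Round-robin placement is a hypothetical stand-in; real GFS also weighs
    disk usage and spreads replicas across racks.
    """
    if len(servers) < REPLICAS:
        raise ValueError("need at least as many servers as replicas")
    num_chunks = -(-file_size // CHUNK_SIZE)  # ceiling division
    return [
        [servers[(i + r) % len(servers)] for r in range(REPLICAS)]
        for i in range(num_chunks)
    ]


if __name__ == "__main__":
    layout = place_chunks(200 * 1024 * 1024, ["cs1", "cs2", "cs3", "cs4"])
    for index, holders in enumerate(layout):
        print(index, holders)
    # 0 ['cs1', 'cs2', 'cs3']
    # 1 ['cs2', 'cs3', 'cs4']
    # 2 ['cs3', 'cs4', 'cs1']
    # 3 ['cs4', 'cs1', 'cs2']
```

A 200 MB file needs four chunks (the last one partly filled), and any single chunkserver can die without making a chunk unreadable.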

BigTable, Google’s database software, provides structure on top of this. High-profile commercial database management software from companies such as Oracle or IBM cannot operate at the scale Google requires, and the licences would cost the company far too much.
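Google’s 2006 BigTable paper describes the system as a sparse, sorted map indexed by row key, column key and timestamp, where each value is an uninterpreted string of bytes. The toy class below illustrates just that data model in memory; `ToyTable` and its method names are hypothetical and are not BigTable’s API. The reversed-URL row key follows the example used in the paper.

```python
from collections import defaultdict
from typing import Iterator, Optional


class ToyTable:
    """Hypothetical in-memory toy of BigTable's data model.

    Maps (row, column, timestamp) -> bytes. Distribution, persistence,
    tablets and column-family settings are all omitted.
    """

    def __init__(self) -> None:
        # row key -> column key -> {timestamp: value}
        self._rows: dict = defaultdict(lambda: defaultdict(dict))

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        self._rows[row][column][timestamp] = value

    def get_latest(self, row: str, column: str) -> Optional[bytes]:
        # .get() avoids creating empty entries for missing rows or columns
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]  # the newest timestamp wins

    def scan(self, start_row: str, end_row: str) -> Iterator[str]:
        """Yield row keys in sorted order within [start_row, end_row).

        Sorts on every call for simplicity; BigTable itself keeps rows
        stored in lexicographic order.
        """
        for row in sorted(self._rows):
            if start_row <= row < end_row:
                yield row


if __name__ == "__main__":
    table = ToyTable()
    table.put("com.cnn.www", "contents:", 3, b"<html>v3")
    table.put("com.cnn.www", "contents:", 5, b"<html>v5")
    table.put("com.cnn.www", "anchor:cnnsi.com", 9, b"CNN")
    table.put("org.wikipedia.www", "contents:", 4, b"<html>wiki")
    print(table.get_latest("com.cnn.www", "contents:"))  # b'<html>v5'
    print(list(table.scan("com.", "com.zzz")))          # ['com.cnn.www']
```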

Google began creating BigTable in 2004, and it is now used in more than 70 projects, including Blogger, Google Earth, Google Maps, Google Print and Orkut, as well as Google’s massive search index.

MapReduce, created in 2003, lets Google process its enormous data sets across many machines. It can, for example, count how many times a given word appears in Google’s index, or build a list of all the websites that link to a given website.
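Both jobs mentioned above are classic MapReduce examples: word counting and building a reverse web-link graph. A MapReduce job is written as a map function, which emits key/value pairs, and a reduce function, which combines all the values that share a key. The single-process Python sketch below shows that programming model; the `map_reduce` helper is a hypothetical stand-in for the distributed framework and leaves out partitioning, distribution and fault tolerance entirely.

```python
from collections import defaultdict


def map_reduce(inputs, mapper, reducer):
    """Hypothetical single-process stand-in for a MapReduce framework."""
    groups = defaultdict(list)
    for key, value in inputs:
        # Map phase: each input record may emit any number of (key, value) pairs.
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)  # "shuffle": group values by key
    # Reduce phase: combine all values that share a key.
    return {k: reducer(k, vs) for k, vs in sorted(groups.items())}


# Job 1: how many times does each word appear?
def count_map(doc_id, text):
    for word in text.lower().split():
        yield word, 1


def count_reduce(word, counts):
    return sum(counts)


# Job 2: which pages link to each page? (a reverse web-link graph)
def links_map(source_url, outlinks):
    for target in outlinks:
        yield target, source_url


def links_reduce(target, sources):
    return sorted(sources)


if __name__ == "__main__":
    docs = [("d1", "the cat sat"), ("d2", "The dog sat")]
    print(map_reduce(docs, count_map, count_reduce))
    # {'cat': 1, 'dog': 1, 'sat': 2, 'the': 2}
    pages = [("a.com", ["b.com", "c.com"]), ("b.com", ["c.com"])]
    print(map_reduce(pages, links_map, links_reduce))
    # {'b.com': ['a.com'], 'c.com': ['a.com', 'b.com']}
```

Because the framework owns the grouping step, the same `map_reduce` call handles both jobs; only the two small user functions change.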

Like GFS, MapReduce is designed to work around server failures. Dean said that in a 2004 presentation, one system withstood the failure of 1,600 servers out of a cluster of 1,800.
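The MapReduce paper explains that the framework tolerates failures mainly through re-execution: a master notices that a worker has died and hands its tasks to a surviving worker. The Python simulation below sketches that idea. Its failure model (a worker crashes the first time it is given work) and the `run_tasks` helper are hypothetical, and the 1,600-of-1,800 numbers simply echo the figure above.

```python
from collections import deque


def run_tasks(tasks, workers, crashing):
    """Hypothetical master loop that re-executes tasks lost to dead workers.

    `crashing` is a made-up failure model: those workers die the first time
    they are handed a task. Returns ({task: worker}, number_of_requeues).
    """
    pending = deque(tasks)
    alive = list(workers)
    results = {}
    requeued = 0
    turn = 0
    while pending:
        if not alive:
            raise RuntimeError("no surviving workers")
        worker = alive[turn % len(alive)]
        task = pending.popleft()
        if worker in crashing:
            alive.remove(worker)   # master marks the worker as dead...
            pending.append(task)   # ...and schedules its task again elsewhere
            requeued += 1
            continue
        results[task] = worker     # stand-in for doing the real map work
        turn += 1
    return results, requeued


if __name__ == "__main__":
    workers = [f"w{i}" for i in range(1800)]
    crashing = set(workers[:1600])
    tasks = [f"split-{i}" for i in range(5000)]
    results, requeued = run_tasks(tasks, workers, crashing)
    print(len(results), requeued)  # 5000 1600
```

Every task still completes; the surviving 200 workers simply take on more of the load.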

As always, Google, a company that never sits still, has many projects in the works. Hopefully it will keep to its new policy of openness and continue to share information about its fascinating work and achievements.

CNet News


Copyright © 2008 - 2018 Mark's Technology News - All Rights Reserved