Tuesday, January 13, 2009

Talk About Google With No Sense of Scale

A haphazard physics analysis, mischaracterized in the press, then echoed uncritically on television and online (at BBC, CNBC, and elsewhere) shows how badly wrong you can go when you approach a problem with no sense of scale.

Times Online, which appears to have invented most of the story, talks about “more than [200 million] internet searches . . . daily.” They do not say where they got this measure, but it is so obviously wrong that it does not belong in a serious discussion of the subject. If there are 1.5 billion Internet users, an estimate from last year, then 200 million web searches per day would mean the average web user conducts less than one web search per week. Having seen Internet users conduct 10 Google searches in less than a minute, and knowing that for some users, nearly every Internet interaction begins with a search, I would guess that the number of searches is closer to 100 per Internet user per day. Most Internet users do fewer searches than this, I believe, but there are also many who exceed 1,000 searches per day, and they bring the average up. If my guess is correct, the number of Internet searches is 150 billion per day — about a thousand times as many as the Times Online suggests.
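For anyone who wants to check the arithmetic in the paragraph above, here is a rough sketch in Python. The inputs are only the figures quoted in this post (the 1.5 billion user estimate, the Times Online number, and my own guess of 100 searches per user per day); none of them is a measurement, and the variable names are made up for illustration.

```python
# Back-of-envelope check of the search-volume figures quoted in the post.
# All inputs are estimates or guesses from the text, not measured data.
internet_users = 1.5e9            # Internet user estimate cited in the post
times_searches_per_day = 200e6    # figure attributed to Times Online
guess_per_user_per_day = 100      # the author's own guess

# What the Times Online figure implies for an average user
per_user_per_week = times_searches_per_day / internet_users * 7

# What the author's guess implies for the whole Internet
guess_total_per_day = guess_per_user_per_day * internet_users

# How far apart the two totals are
ratio = guess_total_per_day / times_searches_per_day

print(f"Times figure: {per_user_per_week:.2f} searches per user per week")
print(f"Author's guess: {guess_total_per_day:.3g} searches per day")
print(f"Guess is {ratio:.0f} times the Times figure")
```

The ratio works out to 750, which the text rounds up to "about a thousand." Given how rough the guess is, the order of magnitude is the only part that matters.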

If you are so removed from a subject that you cannot catch such obvious errors in scale, your chances of reaching a meaningful conclusion from your analysis are virtually nonexistent. That approach only leads to other nonsensical statements, such as this non sequitur just a few sentences later in Times Online: “Banks of servers storing billions of web pages require power.” This picture might sound plausible, but let’s take a look at it using a sense of scale.

The data size of a typical web page, not counting video, music, and slideshows, is about 100 kilobytes. A billion of these web pages would occupy 100 terabytes. A terabyte is a garden-variety hard disk drive these days, selling for $100, occupying a space about the size of a video cassette, and using, when active, perhaps 5 to 10 watts, not nearly enough heat to burn your fingers if you touch its case while it is operating. If we’re talking about a hundred of these, to store one billion web pages, we are not talking about “banks” of computers, but one rack of equipment with a size and power consumption similar to that of a large guitar amplifier.
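The same storage estimate can be written out as a few lines of Python. This is only a sketch under the assumptions stated in the paragraph above: decimal units (1 terabyte taken as 10^12 bytes), one 1-terabyte drive per terabyte of pages, and no allowance for redundancy, indexes, or the computers the drives plug into.

```python
# Rough storage estimate for one billion web pages, using the post's figures.
# ASSUMPTIONS: decimal units, no redundancy, drives counted on their own.
page_size_bytes = 100e3        # about 100 kilobytes per page
pages = 1e9                    # one billion pages
drive_capacity_bytes = 1e12    # a 1-terabyte drive
drive_price_usd = 100          # price per drive quoted in the post
drive_watts_low = 5            # active power range quoted in the post
drive_watts_high = 10

total_bytes = page_size_bytes * pages
total_terabytes = total_bytes / 1e12
drives_needed = total_bytes / drive_capacity_bytes

power_low_watts = drives_needed * drive_watts_low
power_high_watts = drives_needed * drive_watts_high
hardware_cost_usd = drives_needed * drive_price_usd

print(f"{total_terabytes:.0f} TB on {drives_needed:.0f} drives")
print(f"{power_low_watts:.0f} to {power_high_watts:.0f} watts")
print(f"about ${hardware_cost_usd:,.0f} in drives")
```

On these numbers the drives come to 100 terabytes, somewhere between 500 and 1,000 watts, and about $10,000 in hardware.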

But wait. Many web pages, including blog pages, classified advertisements, and news stories, are stored as raw data, typically database records, and assembled into pages only when someone requests them. The stored size of such a page is much less than the size of the web page you see. So we can take that pile of equipment and cut it in half, at least.

Yes, there are banks of computer equipment in Internet data centers — but the need to store web pages is not why they are there.

The Times Online story’s widely cited conclusion says that an average Google search “generates about 7g of CO2.” This figure too is so far off that anyone with a sense of scale can reject it offhand. Seven grams or 3.5 liters of carbon dioxide is much too big in material terms to result from something so small and ephemeral as a web search. Looking at it more closely only confirms how far off it is. In the worst case, 7 grams of carbon dioxide are generated when 7 watt-hours of electricity are generated. One watt-hour is enough electricity to operate a computer, excluding the monitor, for at least a minute. It is fair to guess that Google (or any other search engine) might use 10 or 20 computers to respond to your search request, but it uses them for a very short time, on the order of one thousandth of a second. Just as important, these computers are capable of doing hundreds of other things during the same time period. No matter how you look at it, the energy cost of the video screen on which you see the search result is far greater than the energy cost of the search itself — all the more so if you take a minute or two to study the search result.
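To put rough numbers on the comparison above, here is a small Python sketch. The machine count, the one-millisecond duration, and the worst case of one gram of carbon dioxide per watt-hour come from the paragraph above. The 50 watts per machine is my own assumption for illustration, and the 100-watt display is a typical flat-panel figure; swap in your own values if you think they are off.

```python
# Energy of one search versus the 7 g CO2 claim, using the post's guesses.
GRAMS_CO2_PER_WH = 1.0       # worst case used in the post (1 g per watt-hour)
claimed_grams = 7.0          # figure attributed to Times Online

machines = 20                # upper end of the post's "10 or 20 computers"
seconds_per_machine = 0.001  # "on the order of one thousandth of a second"
watts_per_machine = 50.0     # ASSUMPTION: illustrative power draw per machine

display_watts = 100.0        # ASSUMPTION: typical flat-panel display
display_seconds = 60         # one minute spent reading the results

# Electricity the 7 g claim would require in the worst case
claimed_wh = claimed_grams / GRAMS_CO2_PER_WH

# Electricity the search itself uses under the guesses above
search_joules = machines * watts_per_machine * seconds_per_machine
search_wh = search_joules / 3600.0   # 1 watt-hour = 3600 joules

# Electricity the display uses while you read the results
display_wh = display_watts * display_seconds / 3600.0

print(f"Claim implies at least {claimed_wh:.1f} Wh per search")
print(f"Estimated search energy: {search_wh:.6f} Wh")
print(f"Display for one minute: {display_wh:.2f} Wh")
print(f"Claim exceeds estimate by a factor of {claimed_wh / search_wh:,.0f}")
```

Under these assumptions the search itself uses about one joule, or roughly 0.0003 watt-hours, while the display uses about 1.7 watt-hours in the minute you spend reading. Even if the per-machine power were ten times higher, the claim would still be thousands of times too large.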

There is a large energy cost in computing, but it is not particularly found in the data centers. Rather, most of the energy we use when we use computers goes to drive the video display we are looking at. We saved a fortune in electricity by switching from CRTs (around 300 watts) to flat-panel displays (around 100 watts), but the power consumption of other computer components fell at the same time. Today’s 1-terabyte drive uses half the power of the 1-gigabyte drive of a decade ago. A desktop computer, excluding the display, may use about 15 to 50 watts; portable computers use considerably less. A computer that is asleep uses a fraction of a watt. This means that the video displays still take up the lion’s share of the power, when it comes to computers. When you look at it this way, the whole concept of the energy cost of a web site is almost meaningless. You are using nearly the same energy no matter what web site you look at, or even if you are just writing a letter, simply by virtue of having the computer display on. And if you have a ceiling light on too, that could double your energy consumption.

Google exaggerated its own energy use in its response to this story. It says a search uses “0.0003 kWh,” but it must have thrown in lots of unrelated energy uses to get a number that high: not just a share of the energy used to create the search database, but probably the lights and air conditioning for the offices at the Googleplex too.
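Even taking Google's figure at face value, a quick unit conversion shows how far it sits from the 7-gram claim. The sketch below assumes the same worst case used earlier in this post, one gram of carbon dioxide per watt-hour of electricity; a cleaner power source would make the gap larger.

```python
# Convert Google's quoted per-search energy into worst-case CO2.
GRAMS_CO2_PER_WH = 1.0     # ASSUMPTION: worst case from earlier in the post
google_kwh = 0.0003        # per-search figure quoted from Google's response
claimed_grams = 7.0        # figure attributed to Times Online

google_wh = google_kwh * 1000.0            # kilowatt-hours to watt-hours
google_grams = google_wh * GRAMS_CO2_PER_WH
times_over = claimed_grams / google_grams

print(f"Google's figure: {google_wh:.1f} Wh, at most {google_grams:.1f} g CO2")
print(f"The 7 g claim is about {times_over:.0f} times larger")
```

That comes to 0.3 watt-hours and at most 0.3 grams, so the 7-gram figure is more than 20 times larger than even Google's generous accounting.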

The engineers who design and operate the larger data centers have a powerful incentive to reduce their energy use — an electric bill that comes out of their monthly budget, in direct proportion to the electricity they use. For the rest of us, the power used in data centers is small enough that we don’t need to worry about it.