Saturday, April 23, 2011

Cloud Computing: Smooth Operation, Spectacular Collapses

With two high-profile cloud services offline this week, adding to a string of cloud computing failures, it is evident that cloud computing is not the highly efficient solution its supporters had expected. Because it dynamically balances its workload, a cloud can run surprisingly smoothly, with fewer failures overall. But the complexity that balancing requires makes a cloud vulnerable to total failure, the kind that leaves engineers scratching their heads for days on end, working around the clock and missing holidays as they try to patch together something that works.

This pattern of smooth operation and spectacular failure arises not because cloud computing is an immature technology, but because of the way it shifts risk: it trades frequent small failures for rare, total ones. Auto-balancing works . . . until it doesn't. A system that depends on auto-balancing will typically run fine right up to the moment it fails, which can create an illusion of reliability that isn't really there.
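To make the "works until it doesn't" pattern concrete, here is a toy Python model. It is not drawn from any real cloud provider, and the function name `cascade` and all of the numbers are invented for illustration. The assumption is that a balancer splits a fixed load evenly across live nodes, any node pushed past its capacity crashes, and the balancer then spreads the load over whoever is left.

```python
def cascade(capacities: list[float], total_load: float) -> list[float]:
    """Return the capacities of nodes still running after rebalancing settles.

    Toy model (an assumption, not any real provider's behavior): the balancer
    splits total_load evenly across live nodes; any node whose share exceeds
    its capacity crashes, and the balancer tries again with the survivors.
    """
    alive = list(capacities)
    while alive:
        share = total_load / len(alive)
        survivors = [c for c in alive if c >= share]
        if len(survivors) == len(alive):
            return alive  # stable: every live node can carry its share
        alive = survivors  # overloaded nodes drop out; their load moves on
    return []  # nothing left: total failure


# Ten identical nodes (capacity 10 each) carrying a combined load of 80.
for failed in range(4):
    running = cascade([10.0] * (10 - failed), total_load=80.0)
    print(f"lost {failed}: {len(running)} running")
# lost 0: 10 running
# lost 1: 9 running
# lost 2: 8 running
# lost 3: 0 running
```

In this sketch, losing one or two nodes changes nothing visible, because the survivors quietly absorb the extra work. Losing a third pushes every remaining node's share to about 11.4, past its capacity of 10, and the whole cluster goes down at once.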

This principle isn't limited to cloud computing; it applies to many kinds of systems. Tunisia's government, for example, weathered crises one by one until its ruler fell ill, and then it fell apart quickly. Similarly, if we adopt the "smart grid" approach to electricity, we have to accept the risk of the weeks-long nationwide blackouts that a smart grid would occasionally produce.