Monday, August 6, 2012

More Binary Transaction Thinking, This Time at Southwest Airlines

In the last two weeks I have written about two transaction-processing glitches that turned what were supposed to be single transactions into multiple transactions. This happened first in banking, at Nationwide building society. It accidentally debited an entire day’s worth of debit card transactions again the next morning. Then something similar, if more damaging, happened in stock trading, as Knight Capital fired off a rapid series of mistaken stock orders that left it broke in just 45 minutes. And over the weekend, it happened again. This time the subject was airline reservations and it was Southwest Airlines that reported the problem.

It was supposed to be a short-term promotion on Friday, 50 percent off on airplane tickets with a long list of limitations and restrictions. But with heavy volume on the web servers, requests often didn’t go through on the first try, so the order-processing software tried again. Worse, requests that appeared to have been dropped had in many cases actually been filled. The result was that customers requesting one reservation often got two, or as many as 20, and were charged that many times.
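To see how a retry can turn one request into several, here is a minimal Python sketch. It is not Southwest's code. The `ReservationServer` class, the `naive_submit` function, the customer name, and the flight label are all made up for illustration. The sketch assumes a server that commits each booking but loses its first few replies under load.

```python
class ReservationServer:
    """Hypothetical backend: it fills every request, but loses some replies."""

    def __init__(self, lost_replies):
        self.bookings = []                # every booking actually committed
        self.lost_replies = lost_replies  # how many initial replies get dropped

    def book(self, customer, flight):
        self.bookings.append((customer, flight))  # the booking is filled
        if self.lost_replies > 0:
            self.lost_replies -= 1
            raise TimeoutError("no reply received")  # caller never hears back
        return "confirmed"


def naive_submit(server, customer, flight, max_attempts=20):
    """Treat every timeout as a failure and send the same request again."""
    for _ in range(max_attempts):
        try:
            return server.book(customer, flight)
        except TimeoutError:
            continue  # assumes nothing happened, which is the mistake
    return "gave up"


server = ReservationServer(lost_replies=2)
result = naive_submit(server, "alice", "FLIGHT-1")
duplicate_count = len(server.bookings)
print(result, duplicate_count)
```

The customer asked for one seat and sees one confirmation, but the server holds one booking for every attempt, including the ones whose replies were lost.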

Southwest recognized the failure almost right away. There was no simple automated fix, because customers sometimes really do enter multiple reservations. But with only thousands of suspect transactions, it was feasible to review them one by one, a process Southwest said could be mostly completed by this morning.

Southwest’s quick response, which included the expense of extra people working through the weekend, helped limit the cascade of problems that followed, such as flights reported as full that actually had dozens of empty seats. The root cause of the problem, though, was the same software failure as in the previous two incidents: binary logic in transaction-forwarding routines. It is natural enough to code these routines with two cases, one for a request that succeeded and another for a request that failed. That leaves out a third case, a request whose outcome is not known. These three recent stories illustrate the high cost of this kind of design. If the outcome of a request is unknown and the software just guesses that it is a failure and sends the request again, the results can be quite different from what is intended.
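One way to get away from the two-case design is to give the unknown outcome a case of its own. The Python sketch below is only an illustration under stated assumptions, not a description of any of these companies' systems. The names `Outcome`, `Backend`, `forward`, and `resolve` are hypothetical. It assumes the backend stores each order under a client-supplied request ID and can be asked whether that ID has already been filled.

```python
from enum import Enum
import uuid


class Outcome(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    UNKNOWN = "unknown"  # the third case that binary logic leaves out


class Backend:
    """Hypothetical backend that records orders by request ID."""

    def __init__(self, lost_replies):
        self.orders = {}                  # request_id -> order
        self.lost_replies = lost_replies  # how many initial replies get dropped

    def submit(self, request_id, order):
        self.orders[request_id] = order   # same ID can never create a second order
        if self.lost_replies > 0:
            self.lost_replies -= 1
            raise TimeoutError("no reply received")
        return True

    def lookup(self, request_id):
        return request_id in self.orders


def forward(backend, order, request_id):
    """Send once and report what is actually known about the result."""
    try:
        accepted = backend.submit(request_id, order)
    except TimeoutError:
        return Outcome.UNKNOWN            # no reply is not the same as failure
    return Outcome.SUCCESS if accepted else Outcome.FAILURE


def resolve(backend, order, request_id, max_attempts=5):
    """Before resending, ask whether the earlier attempt went through."""
    for _ in range(max_attempts):
        outcome = forward(backend, order, request_id)
        if outcome is not Outcome.UNKNOWN:
            return outcome
        if backend.lookup(request_id):
            return Outcome.SUCCESS        # it was filled; do not send again
    return Outcome.UNKNOWN                # still unknown; leave it for review


backend = Backend(lost_replies=1)
request_id = str(uuid.uuid4())
final = resolve(backend, {"seats": 1}, request_id)
order_count = len(backend.orders)
print(final, order_count)
```

The point of the design is that a lost reply leads to a question ("did my request go through?") rather than a guess, and a request that is still unresolved is reported as unknown instead of being quietly counted as a failure.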

When the weekend started, Knight Capital had to line up a minimum of $50 million in working capital just to open for business today. The latest reports are that it was successful in a preferred stock issue. It got the funding it needed, but at a high price; its previous shareholders may own just 30 percent of the company after the announcement that may come later this morning.