August 5, 2015

The Perils of Complexity: Antidotes to IT Complexity

CIO Strategy Exchange, New York, 2007

“If the system ever does reach an equilibrium, it isn’t just stable. It’s dead.”
– John Holland, University of Michigan

“Trade is the natural enemy of all violent passions. Trade loves moderation, delights in compromise and is most careful to avoid anger. It is patient, supple and insinuating, only resorting to extreme measures in cases of absolute necessity.”
– Alexis de Tocqueville, "Democracy in America," 1840

Opening Chorus

Complexity theory is a recent discipline that deals broadly with the interrelated – but not predictable – behavior of networks of humans, organizations, machines, or natural phenomena. In some sense it complements chaos theory, which suggests that a butterfly's wing in China somehow triggers a hurricane in the Gulf of Mexico. Chaos theory assumes the linkages can't be traced; complexity theory tries to find a plausible relationship, although the complexity of some phenomena, by definition, renders them unpredictable.

Admittedly, this amateurish explanation is merely a teaser for a much larger topic than this position paper is prepared to engage. Our topic is limited to IT complexity – the often unpredictable consequences spawned by the interaction of underlying technological, organizational and business factors. But first a welcome caveat: the phenomena investigated by the academic researchers developing this theory are more complex than even the largest IT structures – closer in scope to the entire Internet than to any corporate network. So we'll focus on the more practical issues.

In Six Degrees: The Science of a Connected Age, author Duncan Watts opens with a disconcerting illustration (heavily excerpted here) of a minor incident provoking a massive power failure. “A single transmission line in Western Oregon sagged a little too far and struck a tree that had been trimmed a little too long, causing it to flash over… What happened next was frighteningly swift and totally unexpected… the coping mechanism was to transfer the load to the other lines in the set. Unfortunately, these other lines were also carrying loads close to their limits, and the extra burden proved too much. One by one the dominos began to fall…forced out of service because of a transformer outage… tripped out due to a relay malfunction… When power lines are heavily loaded, they heat up and stretch…sagging in the hot sun. The hopelessly overtaxed line hit one of the ubiquitous trees. Once generated, power has to go somewhere… Cut off from California it surged east from Washington and then south, sweeping like a tidal wave through Idaho, Utah, Colorado, Arizona, New Mexico, Nevada and Southern California, tripping hundreds of lines and generators, fracturing the western system into four isolated islands, interrupting service to 7.5 million people.” (pp. 21–22)

Isn’t this meltdown exactly the CIO’s nightmare for the company’s mission-critical infrastructure? And though our position paper can’t do justice to complexity theory in all its dimensions and depth, this cascading disaster can serve as our cover photo: one component flutters… and everything collapses. When networks in their largest sense grow too complex, their failures can become unpredictable in timing, duration, and impact.

If that’s the prognosis, what’s the solution? In our crowd, the usual method for containing complexity is infrastructure standardization, which is where we begin. We then address the accelerating dependencies in global businesses and the layers of additional complexity that follow. Finally, we cover several exogenous factors that have already begun to weaken traditional IT vendors (more problems!) but may also provide a long-term means of easing customers’ stress.

First Order Solutions

Standardizing and consolidating servers and networks are widely accepted as the essential first steps in attacking complexity. So is adopting the basket of best practices collected in the Information Technology Infrastructure Library (ITIL). The management lessons learned here resonate across most business sectors:

“Philosophically, I don’t know at what point having an additional platform increases complexity beyond the breaking point, but I know that point occurs,” begins our [REDACTED] member. “The only way to manage complexity is to contain it. At the infrastructure level, containment requires strictly limiting the hardware, operating system, and middleware platforms. So we toughed it out with Solaris and Linux even when Sun had UltraSPARC quality problems and other customers switched to HP, adding another operating system. On failover software and procedures, there is one approach for all systems; it worked on 9/11, so why add others?”

As for applications, the [REDACTED] business units can choose their own, provided their choices: a) don’t duplicate the functions of applications already implemented elsewhere in the investment bank; b) operate on standard platforms; and c) use the firm’s standard interface format, which contains every imaginable field – inapplicable fields are simply left blank.
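
To make the "every imaginable field" idea concrete, here is a minimal sketch of how such a superset interface record might look. The field names and the CanonicalTradeRecord class are hypothetical illustrations under our own assumptions, not the firm's actual format.

```python
from dataclasses import dataclass, asdict
from typing import Optional

# Hypothetical superset record: every field any application might need
# exists in the format; senders leave inapplicable fields blank (None).
@dataclass
class CanonicalTradeRecord:
    trade_id: str                          # always populated
    instrument: str                        # always populated
    quantity: Optional[float] = None       # blank for non-quantity events
    price: Optional[float] = None          # blank for unpriced instructions
    settlement_date: Optional[str] = None  # blank when not yet settled
    counterparty: Optional[str] = None     # blank for internal transfers

    def to_message(self) -> dict:
        """Emit the full record, blanks included, so every consumer
        sees the same shape regardless of which application sent it."""
        return asdict(self)

# Two very different applications can share the one format:
equity_fill = CanonicalTradeRecord("T-1001", "XYZ Corp", quantity=500, price=42.10)
wire_instruction = CanonicalTradeRecord("T-1002", "USD wire", counterparty="Acme Bank")

print(equity_fill.to_message())
print(wire_instruction.to_message())
```

The trade-off behind such a design is straightforward: one wide, shared format spares the firm pairwise translations between every pair of applications, at the modest cost of carrying many blank fields in each message.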