August 4, 2015

Breakpoint: Big Data and Big Applications

CIO Strategy Exchange, New York, 2014

This report blends two big topics into a heady stew. Big Data means anything au courant and/or already discredited – according to the trade press. A conundrum. Big Applications are the fresh batch, cooked to new tastes, heavily salted by consumerization and ecommerce. The two overlap in their primary ingredients: IT budgeting, organization, staffing, and even architecture. So bibs on, spoons out! To begin: Here are the most productive questions discussed during CIOSE member interviews. The questions framed this research and will, hopefully, whet your appetite:


  1. Is there a Big Data activity in your enterprise?
  2. Is there a Big Data organization within IT – or elsewhere in the company? How is it differentiated from existing Business Intelligence (BI) units?
  3. How are the Big Data projects funded? How are benefits projected and/or assessed? How will funding growth be handled in the future?
  4. Does the activity include “data scientists”? How are their skills defined? Can existing staff be retrained? How are data scientists recruited?
  5. What new sources are integrated into Big Data analyses that were not previously available? How is data quality validated?
  6. What are the most important tools? Hadoop, HANA, Exadata, other?


  1. Has the IT organization written a major new system, replaced a legacy system or vintage package in the past three years? If so, what was the motivation?
  2. How was approval of the additional budget (if any) obtained?
  3. How have appropriate staff resources been acquired?
  4. What changes in approach, process or tools are especially helpful?
  5. What factors, if any, retarded progress?

Big Data is triggering a storm of debate in headier academic circles. Among the most frequent complaints: despite the hype, data science is far less transformative than the introduction of the automobile or electricity was, and it cannot replace scientific experimentation. No argument from us or from the CIOSE membership. Those cerebral complaints are straw men – at least in the business community.

Many critiques fixate on the incidence of spurious correlations that lack causation. “If you look 100 times for correlations between two variables, you risk finding about five bogus correlations that appear significant, purely by chance, although no actual meaningful connection exists between the variables.” (NYTimes, April 6, 2014) There’s more: An IBMer proceeds with a barrage against “bogus correlations that are flukes” or appear by chance (as suggested above), “ephemeral” in that they disappear with regression-to-the-mean; “uncorroborated” by scientific observation; “wrong headed” or just silly; and “hyped” or essentially political. (James Kobielus Blog) Of course, charts depicting obviously bogus correlations run rampant: e.g. autism and organic food or Windows market share and murder rates. Nonetheless, some correlations are intriguing, like “deals closed during a new moon are 43% bigger on average than when the moon is full.” (WSJournal, March 23, 2014)
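The multiple-comparisons arithmetic behind that “five bogus correlations per 100 tests” warning is easy to demonstrate. A minimal simulation sketch (our illustration, not from the cited articles): correlate pairs of completely unrelated random variables at the conventional 5 percent significance level and count how many clear the bar by luck alone.

```python
import random
import statistics

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(42)
n, trials = 30, 1000
# Two-tailed 5% critical value of |r| for n=30 (df=28), from standard tables.
critical_r = 0.361

false_positives = 0
for _ in range(trials):
    xs = [random.gauss(0, 1) for _ in range(n)]
    ys = [random.gauss(0, 1) for _ in range(n)]  # independent of xs by construction
    if abs(pearson_r(xs, ys)) > critical_r:
        false_positives += 1

print(f"{false_positives} 'significant' correlations found in {trials} tests "
      f"between unrelated variables")  # expect roughly 50, i.e. about 5 percent
```

Run a large screen over enough variable pairs and “significant” flukes are guaranteed, which is exactly why the moon-phase deal-size finding deserves skepticism before celebration.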

The many critiques that decry data quality have greater validity. Our CIOSE members readily acknowledge imperfections in their own non-financial structured data. Gathering, cleaning and organizing disparate data (especially unstructured data) can be a messy process. Data vacuumed up from Facebook and Twitter is often nasty: It is easily gamed or otherwise distorted, as in “Google Bombing.”

Another question about unstructured data is: Does it really represent 80 percent of all Big Data fodder? (That’s the popular wisdom.) One legitimate study of electronically stored data by the Data Warehousing Institute indicated that structured data represents the largest share of the storage soup at 47 percent, followed by unstructured at 31 percent and then semi-structured (e.g. XML) at 22 percent. We wonder: is all this a frivolous schema?

Industry attitudes towards legacy applications have shifted markedly in the past three years. Reasons given for disgorging the legacy snarls include the operational risks imposed by old hardware and operating systems, the cost of maintenance and support, and the inability of older technology to export information to end users and customers in an acceptably friendly fashion. Page speed latency, or the time websites take to fill consumer or employee screens, is also a compelling reason in some environments.

Latency targets that meet the evolving expectations of consumers as well as employees (who share consumer expectations) have been plummeting for years. A two-second target was a 2009 Forrester recommendation; indeed, a Tagman study two years later showed 47 percent of consumers “expect a page to load in two seconds or less.” After two seconds, the conversion rate at which users click through banner ads to a product description declines noticeably. For every additional second, click-throughs fall by seven percent and revenues drop by a small but discernible one percent.
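Those per-second declines compound quickly. A back-of-the-envelope sketch, assuming the seven-percent click-through and one-percent revenue drops apply multiplicatively per extra second beyond the two-second baseline (the compounding assumption is ours, not the studies’):

```python
# Illustrative decay of click-through rate and revenue per extra second of
# page latency beyond the 2-second baseline. Rates come from the Tagman-era
# figures cited above; multiplicative compounding is our own assumption.
CTR_DECAY = 0.07      # 7% fewer click-throughs per additional second
REVENUE_DECAY = 0.01  # 1% less revenue per additional second

def remaining_fraction(extra_seconds: int, decay: float) -> float:
    """Fraction of the baseline metric left after extra_seconds of delay."""
    return (1 - decay) ** extra_seconds

for delay in (1, 3, 5):
    ctr = remaining_fraction(delay, CTR_DECAY)
    rev = remaining_fraction(delay, REVENUE_DECAY)
    print(f"+{delay}s latency: {ctr:.0%} of baseline click-throughs, "
          f"{rev:.1%} of baseline revenue")
```

Even under this simple model, a five-second-slower page surrenders roughly a third of its click-throughs – which is why page speed now shows up in legacy-replacement business cases.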

Clearly, today’s shoppers have very twitchy attention spans. An Aberdeen study showed “A one second delay in page load time equals 11 percent fewer page views, a 16 percent decrease in customer satisfaction, and a 7 percent loss in conversions.” By 2014, Google engineers found users frustrated after just 400 milliseconds; they’ll likely decamp to a competitor whose website is just 250 milliseconds faster. The times … they are accelerating.

Paradoxically, actual latency is lengthening as pages are enriched by multiple hosts, film and deeper/jazzier graphics. In summer 2013, Radware’s annual survey of retail websites showed that latency had worsened by 14 percent over the prior year. The next Radware survey showed “the median webpage takes 9.3 seconds to load – a 21 percent slowdown in the last twelve months” from 7.7 seconds. Ominously, “the majority of online shoppers abandon a page after waiting three seconds for it to load.” There’s a clear tradeoff between delivering screen richness and reducing page latency – especially since 79 percent of consumers who abandon a slow site are unlikely to return – ever!