A few days ago I read an interesting article in Information Management by Navin Sivanandam, "From big data to big decisions" (source: http://www.information-management.com/news/Big-Data-Decisions-Experiments-10026788-1.html). I then contrasted Sivanandam's world with my own, and specifically with an ongoing G2M campaign for a client in the B2B space. The contrast was stark: Sivanandam's world is about randomized trials, statistical significance, and sample sizes in the thousands or more. My world is about various natural experiments, learning, and sample sizes of 1-10, but big decisions nonetheless.
I decided to explore the apparent contrast between Sivanandam's big data / big decision narrative and my own experience in this blog post, and specifically to do some myth-busting regarding big data.
Myth 1: Big data is about big decisions. Big decisions are arguably about major investment projects, like entering a new market, launching a new product line, or acquiring a competitor. The decision material for such Board-level decisions tends to consist of NPV estimates based on 10-100 parametric assumptions, complemented with a strong narrative. Indeed, it is hard to imagine a CEO justifying a key decision to the Board with the output of a statistical algorithm run on large amounts of data, rather than with the NPV calculation and the narrative. It is similarly hard to see what role big data could have in this picture.
Myth 2: Big data is about decisions. No, in my opinion big data is about the monetization of information rather than about decisions, and to the extent it is about decisions, typically operational ones. Let us look at some typical and frequently cited examples: Rolls-Royce with its real-time system for monitoring jet-engine turbines, Google's auctioning algorithms for AdWords real estate, and the many credit-scoring algorithms used by credit card companies and banks. One could of course say that this is about deciding to dispatch a repair crew with spare parts, deciding where to put which ad when, or deciding to accept or reject a credit card application. However, the whole point is the effective monetization of large amounts of data.
Myth 3: Big data is like a well-designed experiment in physics, based on a linear sequence of observations, hypotheses, experimental design, experiment, analysis, and conclusions. No, in my opinion it is an iterative process, in which one explores hypotheses, looks for patterns, draws preliminary conclusions, does further analyses, and goes back to the exploration of hypotheses.
Myth 4: Since statistical significance is a reasonable standard for any analysis based on big data, say an A/B test of a new web page (and the data often support claims of statistical significance), this requirement extends to other realms of business decision making. A related myth is that you cannot make meaningful inferences in business without large sample sizes. Let me shed some light on this issue based on some work for a former employer: around 2001-2002, we executed one G2M strategy in India and succeeded, and another in Russia with less success. For me, it followed that the approach in India (a single distributor with a sales force, plus a direct channel) was good, and the one in Russia (a catalogue-based distributor / reseller) less good. Was the conclusion correct and generalizable (in 2002)? Probably. Was it based on statistically significant evidence? Probably not, as it rested on sample sizes of one.
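To make the contrast concrete, here is a minimal sketch (in Python, standard library only; the function name and figures are my own illustration, not from Sivanandam's article) of the two-proportion z-test commonly behind a web-page A/B test. It shows why statistical significance is a standard that effectively presupposes big data: a modest lift is significant at n = 10,000 per variant, while the same kind of difference at a business-experiment scale of n = 10 is nowhere near significant.

```python
import math

def ab_test_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test for an A/B test.

    conv_a / conv_b: conversions observed in each variant,
    n_a / n_b: visitors per variant. Returns the p-value."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)           # pooled conversion rate
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # two-sided p-value from the standard normal distribution
    return math.erfc(abs(z) / math.sqrt(2))

# A 5.2% vs 4.5% conversion rate at n = 10,000 per variant:
print(ab_test_p_value(520, 10000, 450, 10000))   # p below 0.05: significant
# A comparable-looking difference at n = 10 per variant:
print(ab_test_p_value(1, 10, 0, 10))             # p well above 0.05: not significant
```

In other words, with sample sizes of one or ten, as in my India/Russia example, the significance machinery simply has nothing to say, which is exactly why other standards of evidence are needed there.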
I did indeed make an effort to gather information on what constitutes a valid basis for a conclusion in a number of domains (specifically, the venture community, business, law, engineering, science, hard science, and the creative professions). The standards differ widely: from 5 sigma in particle physics, to 2-3 sigma in medical science, to beyond reasonable doubt in criminal law, to > 50% in civil law, to "in accordance with standards" in engineering, to a roughly 60-90% a priori probability of a correct NPV in business, to 1-2 successes out of 10 investments in the VC community. One is led to conclude that statistical significance is just one of multiple standards.
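To put the sigma-based standards in the list above on a common scale, here is a small sketch (standard library Python; the function name is mine) converting an n-sigma threshold into the two-sided confidence level it implies under a normal distribution. It illustrates just how far apart the standards sit: 2 sigma corresponds to roughly 95% confidence, while particle physics' 5 sigma corresponds to better than 99.9999%.

```python
import math

def sigma_to_confidence(sigma):
    """Two-sided confidence level implied by an n-sigma threshold
    under a standard normal distribution."""
    return math.erf(sigma / math.sqrt(2))

for sigma in (2, 3, 5):
    print(f"{sigma} sigma -> {sigma_to_confidence(sigma):.7f}")
```

Compare these with "> 50%" in civil law or 1-2 successes in 10 in venture capital, and the span of accepted evidentiary standards across domains becomes vivid.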
Myth 5: Number crunching, as in big data, is somehow a superior approach to gathering evidence for business decisions. I would argue otherwise, and that you may want to invest as much in small-scale business experiments as you do in your massive Hadoop cluster with an R stats package. For more information, I recommend Thomas H. Davenport: "How to design smart business experiments" (see hbr.org/2009/02/how-to-design-smart-business-experiments). His conclusions are, for reference: i) understand when testing makes sense; ii) establish a process of testing (create hypotheses; design tests; execute tests; iterate); iii) build a test capability; and iv) create a testing mind-set.
So far, the perspective has been primarily descriptive. One could try to extend the analysis with a normative perspective, but I am not sure it would add additional insight.
In fact, since we started out with Sivanandam's article "From big data to big decisions", I would argue that if you want to explore the nature of big decisions, you should rather read the classics, for example Ghemawat's "Commitment: The Dynamic of Strategy". Ghemawat provides a clear perspective on what constitutes a big decision (one that requires irreversible investments in products, markets, or production facilities), how to get it right (by investing in durable, specialized, and untraded factors that are scarce and whose scarcity value can be appropriated), and the value of flexibility (or reversibility and optionality). Think of Apple and its launches of the iMac (1998), iPod (2001), iTunes (2003), iPhone (2007), and so on. It is hard to see what role big data could have had in Apple's big decisions and its subsequent and persistent success.
And to my readers: I would welcome any examples you have of big data having contributed to big decisions. (There quite probably have been some, I am just not aware of them.)