I love the “big data” trend: the possibilities of being able to fully collect and analyze user behavior are tremendous.
However, we must not take the human factor out of the equation. Big data is not all intelligent algorithms tirelessly seeking connections. Humans are very much the key part: they interpret the results and translate them into real-world meaning. What's more, the dark charm of statistics can very easily cause us to forget the “real world” meanings and arrive very quickly at very wrong conclusions.
A well-known example is “Simpson’s Paradox”, aptly named after its discoverer Udny Yule. The phenomenon basically states that it is very easy to reach the wrong conclusion if you group different samples without looking for causality at the same time: a trend that holds inside each group can weaken, or even reverse, once the groups are pooled together.
A few easy examples are in the Wikipedia entry, and in Product Management we actually encounter similar effects all the time, when performing A/B testing or when observing user behavior.
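To make the effect concrete, here is a minimal sketch of how it could show up in an A/B test. The counts, the variant names and the desktop/mobile split are all invented for illustration; they do not come from any real test.

```python
# Hypothetical (conversions, visitors) counts, invented for illustration only.
results = {
    "A": {"desktop": (90, 100), "mobile": (300, 1000)},
    "B": {"desktop": (800, 1000), "mobile": (20, 100)},
}


def segment_rates(results):
    """Conversion rate of each variant inside each segment."""
    return {
        variant: {seg: conv / visits for seg, (conv, visits) in segments.items()}
        for variant, segments in results.items()
    }


def pooled_rates(results):
    """Conversion rate of each variant with all segments lumped together."""
    pooled = {}
    for variant, segments in results.items():
        conversions = sum(conv for conv, _ in segments.values())
        visitors = sum(visits for _, visits in segments.values())
        pooled[variant] = conversions / visitors
    return pooled


by_segment = segment_rates(results)
pooled = pooled_rates(results)

for variant in results:
    print(variant, by_segment[variant], "pooled:", round(pooled[variant], 3))
```

With these made-up numbers, variant A converts better on desktop and on mobile, yet variant B looks better once the segments are pooled, simply because B received most of its traffic from the high-converting desktop segment.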
I came across a recent example when I was alerted to strange behavior at a customer site, where the admins could not figure out why certain media content was far more popular than other content. We spent hours digging through web site analytics, building funnels for in and out flows, and racking our brains.
Finally, we discovered that under certain conditions, the video player loads slowly. So slowly, in fact, that users simply abandoned the page before the video even started playing. This caused a dent in the data, totally unrelated to the actual media content, the page styling, et cetera.
Being able to track all possible conditions, and all possible analytics, creates a huge pile of data. Machines will help sort through it, and will present possible correlations, but it is up to the human at the helm to use common sense, rule out the irrelevant correlations, and dig for more inputs (e.g. video player load time) when the results are not satisfactory.
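As a rough sketch of what "digging for more inputs" can look like, the snippet below compares playback start rates per clip, first on their own and then split by player load time. Everything here is an assumption made for illustration: the clip names, the view records, and the 5-second cutoff for a "slow" load are hypothetical and not taken from the customer site.

```python
from collections import defaultdict

SLOW_LOAD_SECONDS = 5.0  # assumed cutoff for a "slow" player load


def make_views(content, load_seconds, total, started):
    """Build `total` fake view records, of which the first `started` began playback."""
    return [(content, load_seconds, i < started) for i in range(total)]


# Hypothetical records: (content id, player load time in seconds, playback started?)
views = (
    make_views("clip_a", 1.0, total=8, started=6)
    + make_views("clip_a", 9.0, total=2, started=0)
    + make_views("clip_b", 1.0, total=4, started=3)
    + make_views("clip_b", 9.0, total=6, started=0)
)


def speed(view):
    """Label a view as a fast or slow load using the assumed cutoff."""
    return "slow" if view[1] >= SLOW_LOAD_SECONDS else "fast"


def start_rates(views, key):
    """Share of views that started playback, grouped by key(view)."""
    counts = defaultdict(lambda: [0, 0])  # group -> [started, total]
    for view in views:
        group = counts[key(view)]
        group[0] += int(view[2])
        group[1] += 1
    return {k: started / total for k, (started, total) in counts.items()}


by_content = start_rates(views, key=lambda v: v[0])
by_content_and_speed = start_rates(views, key=lambda v: (v[0], speed(v)))

print(by_content)
print(by_content_and_speed)
```

In this toy data set clip_a looks twice as popular as clip_b, but once load time is taken into account the two clips behave identically: the whole gap comes from clip_b being served more often under the slow-loading condition.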
Be aware of the “lurking variables” and use intuition and imagination to flush them out.
Finally, an example on a related topic: how being presented with numbers can drive perception, even when the whole picture is not revealed. From the Washington Post: gas prices are indeed soaring… until you take into account the (not-so-lurking) variable of inflation.
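The adjustment itself is simple arithmetic: scale the old price by the ratio of the price index now to the price index then. Below is a small sketch; the prices and index values are placeholders chosen to keep the arithmetic easy, not actual gas prices or published index figures.

```python
def in_todays_money(nominal_price, index_then, index_now):
    """Convert a past price into today's money using a price index ratio."""
    return nominal_price * index_now / index_then


# Placeholder values, invented for illustration only.
old_price, old_index = 1.00, 50.0
new_price, new_index = 3.00, 200.0

old_price_adjusted = in_todays_money(old_price, old_index, new_index)

print("nominal change:", new_price / old_price)
print("old price in today's money:", old_price_adjusted)
print("real change:", new_price / old_price_adjusted)
```

With these placeholder numbers the nominal price triples, yet the old price is worth more in today's money than the new one, so in real terms the price has actually fallen.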