In every aspect of business right now, companies collect data until they see a pattern that appears statistically significant, and then they use that selectively gathered data to drive decisions. The problem is that we assume the data has merit, that it is objective, and that it holds the answers that will change the way business is done. But data is anything but objective, because there are always humans involved. Critics have come to call this problem p-hacking: a quiver of small methodological tricks that can inflate the statistical significance of a finding:
- Conducting interim analyses midway through an experiment to decide whether to keep collecting data
- Recording many response variables and deciding which to report after the analysis
- Deciding whether to include or drop outliers after the analysis
- Excluding, combining, or splitting treatment groups after the analysis
- Including or excluding covariates after the analysis
- Stopping data exploration as soon as an analysis yields a significant p-value
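The first trick on that list, peeking at the results mid-experiment, is easy to demonstrate. The sketch below is a hypothetical simulation (not from the original text): it draws data where the null hypothesis is true by construction, then compares an analyst who tests only once at the planned sample size against one who runs a test after every batch and stops the moment p dips below 0.05. The peeking analyst "finds" a significant effect far more often than the nominal 5% rate.

```python
import math
import random

def p_value(samples):
    # Two-sided one-sample z-test of mean = 0, assuming known sigma = 1
    # (fine here, since we generate the data from a standard normal).
    n = len(samples)
    z = (sum(samples) / n) * math.sqrt(n)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def false_positive_rate(trials=2000, batch=10, max_n=100,
                        alpha=0.05, peek=True, seed=0):
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(trials):
        data = []
        rejected = False
        while len(data) < max_n:
            data.extend(rng.gauss(0, 1) for _ in range(batch))
            # The peeking analyst tests after every batch and stops
            # as soon as the result looks significant.
            if peek and p_value(data) < alpha:
                rejected = True
                break
        if not peek:
            # The honest analyst tests exactly once, at the planned n.
            rejected = p_value(data) < alpha
        false_positives += rejected
    return false_positives / trials

fixed_rate = false_positive_rate(peek=False)
peek_rate = false_positive_rate(peek=True)
print(f"fixed-n false-positive rate:  {fixed_rate:.3f}")  # close to alpha
print(f"peeking false-positive rate: {peek_rate:.3f}")    # well above alpha
```

Nothing in the data changed between the two analysts; only the stopping rule did. That is the whole trick: each individual test is valid, but taking ten shots at the same 5% threshold makes a spurious "discovery" far more likely.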
Add it all up, and you have a significant problem in the way our society produces knowledge. Increasingly, we desperately try to reduce the vast complexity of the world into a series of statistics that we can use to try to comprehend what’s happening. As if staring at the numbers long enough will give us the secrets of the universe. We divest brands of meaning, devalue the art of marketing, and fixate on sample size. But the world is a bit more complex than that. And when we get it wrong, it can be a disaster.
In the second half of the 18th century, Prussian rulers wanted to know how much natural wealth lay in the country's forests. So they started counting. They produced huge tables that let them calculate how many board-feet of wood they could pull from a given plot of forest. All the rest of the forest, everything it did for the people, the animals, and the general ecology of the place, was discarded from the analysis.
But the world proved too unruly. Their data wasn’t perfect. So they started creating new forests, the Normalbaum, planting all the trees at the same time, and monoculturing them so that there were no trees in the forest that couldn’t be monetized for wood. Based on the data at hand they began to transform the real, diverse, and chaotic old-growth forest into a new, more uniform forest that could be controlled.
And for the first hundred years or so, the scheme worked. But then the forests started dying. The complex ecosystem that had underpinned the growth of these trees for generations was torn apart by the rigor of the Normalbaum. The nutrient cycles were broken. Resilience was lost. The hidden underpinnings of the world were revealed only when they were gone.
Now, take the ad-supported digital media ecosystem. The idea is brilliant: capture data on people all over the web and then use what you know to show them relevant ads, ads they want to see. Not only that, but because it’s all tracked, an advertiser can measure what they’re getting more precisely. And the spreadsheet makes an awful lot of sense at first. Unfortunately, looking at data alone overlooks the peculiarities and complexities of the human experience. Because data is very good at answering how and what, we assume it can also answer why. This is in fact rarely the case.
Advertisers and ad-tech firms want to capture user data to show people relevant ads. They want to measure their ads more effectively. But placed into the real world, the system that grew up around these desires has reshaped the media landscape in unpredictable ways.
We’ve deceived ourselves into thinking data is a camera, but in fact, it is an engine. Capturing data about something changes the way that something works. Even the mere collection of stats is not a neutral act, but a way of reshaping the thing itself.
There are numerous quotes about how important data is, and how decisions should always be backed by data. But data is one perspective. What your users are saying is another. What you internally want to do is another. What makes financial sense is another. To make a decision, you gather the perspectives that matter to you, weight them according to your judgment, and then make your call. Data is a false god. You can tag every link, generate every metric, and run split tests for every decision, but no matter how deep you go, no matter how many hours you invest, you're only looking at one piece of the puzzle.