A dozen times I’ve heard my mom say that every few years the scientists change their minds on whether eggs are healthy. In the last few years, ketchup bottles have shed claims that lycopene reduces your risk of prostate cancer. I want to explain why these U-turns happen so often. It is the skeleton in the closet the public doesn’t know about.
A Pictorial Representation
This cartoon hit the nerd-o-sphere (aka econo-blogosphere) lately. It hits home with virtually every data-head out there. Unfortunately, I am afraid most lay people and the media don’t get the very serious point it makes.
Here is the cartoon:
If you look closely, you will see that only for green jelly beans is P < 0.05. We say that the “p-value” is less than 0.05. When you look for correlations in a sample between two variables (e.g., jelly beans and acne), you essentially end up with two pieces of information. First, the “estimate,” which tells you the magnitude of the link between the two variables. For example, eating jelly beans might increase the number of pimples by 43%. Second, the “p-value,” which tells us how sure we are the estimate is not just a statistical anomaly. The p-value tells you how likely it is that you would find a link this strong even if no true relationship exists. The reason you might find such a link is that randomness sometimes looks like a pattern. (For example, sometimes you do flip 10 heads in a row.)
To decide if two variables are likely related, it is important to use both pieces of information – the estimate and the p-value. The usual rule of thumb for calling an estimated relationship true is P < 0.05. If P < 0.05, the result is “statistically significant.” That is, there is less than a 5% chance we would find evidence for an effect that doesn’t actually exist. Of course, this means that if you estimate the effect of 20 different colors of jelly beans on acne, then simply by chance you should expect to find at least one “effect” at the 5% significance level that is not real.
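You can watch this happen with a small simulation. In the sketch below (all numbers are hypothetical: a 15% base acne rate, 500 people per group, and a simple two-proportion z-test), none of the 20 jelly bean colors has any real effect, yet some will typically come out “significant” anyway:

```python
import math
import random

random.seed(1)

def two_prop_pvalue(x1, n1, x2, n2):
    """Two-sided z-test for a difference in two proportions."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (x1 / n1 - x2 / n2) / se
    # two-sided p-value from the standard normal CDF
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# 20 colors of jelly beans; NONE of them actually affects acne --
# eaters and non-eaters both have the same true 15% acne rate.
significant = []
for color in range(20):
    acne_eaters = sum(random.random() < 0.15 for _ in range(500))
    acne_control = sum(random.random() < 0.15 for _ in range(500))
    if two_prop_pvalue(acne_eaters, 500, acne_control, 500) < 0.05:
        significant.append(color)

print(f"{len(significant)} of 20 null effects came out 'significant'")
```

On average, about one of the 20 tests will clear the 5% bar even though every true effect is zero – which is exactly the cartoon’s joke.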
For the one reader still with me after the statistics lesson, I will now explain how we might end up with a bunch of flip-flopping on scientific (and economic) studies.
Suppose you wanted to know if lycopene – which is in tomatoes – is good for your health. One study you could do is to look at how much lycopene people in different countries eat because of their diets. Let’s say you have a bunch of data on broad health outcomes like cancer rates or life spans. You could then estimate the effect of having more lycopene in your diet on these broad health outcomes. My gut feeling is that you would find no effect. Because so many things impact life span, the effect of lycopene is probably small.
Of course, it’s no fun to find that something doesn’t matter. So, what you would then do is look at different types of diseases. Maybe you have data on 20 diseases from heart failure to diabetes to brain cancer to prostate cancer. You then estimate the effect of lycopene on each disease individually. Amazingly, you find that “lycopene at statistically significant levels reduces the chance of prostate cancer.” The news reports this important result, Heinz Ketchup prints it on every bottle, and husbands who hate tomatoes have them shoved down their throats!
My point is that it may not be true at all. All the time, follow-up studies with different data overturn “statistically significant” results because the first finding was due to randomness. If you have enough variables, you can always find significant results. You can find stock prices that correlate merely by chance, or maybe unemployment and skirt length. You might find Viagra also helps with arthritis. In short, you get lots of fun things to talk about at cocktail parties.
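The stock price example is easy to demonstrate. Here is a sketch (hypothetical setup: 20 “price series” that are nothing but random walks, 100 periods each) showing that if you search over all pairs, you can always find a strikingly strong “correlation” that is pure chance:

```python
import random

random.seed(0)

def random_walk(steps):
    """A 'price series' that is pure cumulative noise."""
    level, path = 0.0, []
    for _ in range(steps):
        level += random.gauss(0, 1)
        path.append(level)
    return path

def corr(a, b):
    """Pearson correlation, computed by hand."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

# Search all 190 pairs for the strongest relationship -- classic data mining.
walks = [random_walk(100) for _ in range(20)]
strongest = max(abs(corr(walks[i], walks[j]))
                for i in range(20) for j in range(i + 1, 20))
print(f"strongest pairwise 'correlation' among 20 random walks: {strongest:.2f}")
```

The more pairs you search, the more impressive the best “finding” looks – even though, by construction, nothing here is related to anything.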
The danger of data mining is you cease to be able to know what is “truth” and what is randomness.*
The second factor, which exacerbates the danger of data mining, is “publication bias.” Five different researchers might run studies to determine the effect of lycopene on health. Four of the five might find no effect on prostate cancer. If you knew this, you would say there is probably no effect.
The problem is everyone likes sexy results. Saying A doesn’t affect B is kind of boring. Furthermore, journal publications tend to reward results rather than non-results. And if you know anything about academia you know you “publish or perish.” The one guy who found a statistically significant effect will get published in a peer-reviewed journal. The other four who found no effect might not even waste their time sending their results in since the deck is stacked against them. Zero results are boring results.
Most news outlets will report on “peer reviewed” work. The net effect is that what gets reported is biased toward finding effects. In fact, if you wanted to do your own search of the literature, you still may never find out about the other studies because they were never published. They are crammed in the dark reaches of someone’s desk.
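The filtering described above can also be simulated. In this sketch (hypothetical numbers: 1,000 studies of 50 people each, true effect exactly zero, a simple z-test on the sample mean), we “publish” only the significant results and then look at what the published record says:

```python
import math
import random
import statistics

random.seed(42)

def run_study(n=50):
    """One study of a treatment whose TRUE effect is exactly zero."""
    data = [random.gauss(0, 1) for _ in range(n)]
    mean = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(n)
    z = mean / se
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return mean, p

results = [run_study() for _ in range(1000)]
published = [m for m, p in results if p < 0.05]  # only "sexy" results survive

print(f"{len(published)} of 1000 true-null studies were 'significant'")
print(f"average published effect size (true value is 0): "
      f"{statistics.fmean(abs(m) for m in published):.2f}")
```

Roughly 5% of the null studies clear the bar, and the ones that do necessarily show a sizable effect – so a reader of the journals sees a handful of confident, inflated estimates of an effect that does not exist.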
So, the next time you hear “researchers have found…”, keep in mind that data mining plus publication bias means you should be very skeptical of new findings.
*I am not saying truth is determined by empirical evidence, which is why I put it in quotes.
If you missed my other post from this morning it is here.