As regular readers will know, I do like a health-and-fitness-based scientific study with some good, solid statistics to cite. However, alongside the rigorously tested experiments, conducted on large numbers of subjects with results collected over extended periods, there are plenty of more questionable papers floating around. The latter are frequently founded on limited evidence and published largely to grab headlines, and unfortunately it is often such papers that gain sway, for all the wrong reasons.
I wrote about one such study back in February, which purported to show that running was bad for you.
More recently, another paper has come onto my radar, which posits that ‘eating chocolate makes you thinner’. This proposition hit headlines across Europe, making news in 20 different countries and in 12 different languages. It made it onto TV news shows and into a multitude of papers and magazines. And it was all based on bad data.
In fact, this study was actually part of an experiment to test how easy it is to make headlines with some questionable ‘science’ and data manipulation. And the answer? Pretty easy, it would seem.
This is a really interesting case: it shows just how easily data can be engineered, and how a lack of peer review or editorial discrimination around big, headline-grabbing ‘results’ can lead to unfounded health claims. As such, I wanted to share it here.
Test subjects in Germany were invited to take part in a clinical trial, each being randomly assigned to different diet regimes.
One group followed a low-carbohydrate diet; a second followed the same low-carb diet plus a daily 1.5 oz. bar of dark chocolate; and the rest, the control group, were instructed to make no changes to their current diet. Subjects weighed themselves each morning for 21 days, and the study finished with a final round of questionnaires and blood tests.
Both of the treatment groups lost around 5 pounds over the course of the study, while the control group’s average body weight fluctuated up and down around zero. The people on the low-carb diet plus chocolate also lost weight 10 percent faster than the non-chocolate-eating group. Not only was that difference statistically significant, but the chocolate group had better cholesterol readings and higher scores on the well-being survey.
So, I know what you’re thinking: the study did show accelerated weight loss in the chocolate group, right? So surely we can trust it; isn’t that how science works?
Sadly for the chocolate fans amongst you, not really, no. You see, if you measure a large number of things about a small number of people, you are almost guaranteed to get a ‘statistically significant’ result.
The ‘chocolate weight-loss’ study included 18 different measurements (weight, cholesterol, sodium, blood protein levels, sleep quality, well-being, and so on) from just 15 people, and that is a recipe for false positives.
As John Bohannon wrote of his pseudo-study:
Think of the measurements as lottery tickets. Each one has a small chance of paying off in the form of a ‘significant’ result that we can spin a story around and sell to the media. The more tickets you buy, the more likely you are to win. We didn’t know exactly what would pan out—the headline could have been that chocolate improves sleep or lowers blood pressure—but we knew our chances of getting at least one ‘statistically significant’ result were pretty good.
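The lottery analogy is easy to check with a quick simulation. Here is a minimal Python sketch, assuming 18 independent measurements that each have a 5 percent chance of ‘paying off’ as a false positive (the conventional significance cutoff):

```python
import random

# Bohannon's lottery: each of n_measurements 'tickets' has a p = 0.05
# chance of paying off as a false positive. How often does a study with
# nothing real to find still 'win' at least one significant result?
random.seed(42)
p, n_measurements, trials = 0.05, 18, 50_000

wins = sum(
    any(random.random() < p for _ in range(n_measurements))
    for _ in range(trials)
)
win_rate = wins / trials
print(f"empirical: {win_rate:.3f}  analytic: {1 - (1 - p) ** n_measurements:.3f}")
```

With these numbers, roughly 60 percent of the simulated ‘studies’ produce at least one significant-looking result from pure noise.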
And what does ‘statistically significant’ mean?
Basically, that some result has a small p value. The letter p seems to have totemic power, but it’s just a way to gauge the signal-to-noise ratio in the data. The conventional cutoff for being ‘significant’ is 0.05, which means that if there were no real effect at all, a result this extreme would turn up by chance only 5 percent of the time. The more lottery tickets, the better your chances of getting a false positive. So how many tickets do you need to buy?
P(winning) = 1 − (1 − p)^n
With our 18 measurements, we had a 60% chance of getting some “significant” result with p < 0.05. (The measurements weren’t independent, so it could be even higher.) The game was stacked in our favor.
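Plugging the study’s numbers into the formula above is a one-liner. A quick Python sketch, with p = 0.05 and n = 18 as in the article:

```python
# Chance of at least one 'significant' result among n independent
# measurements, when each has false-positive rate p: 1 - (1 - p)^n.
def p_winning(p: float, n: int) -> float:
    return 1 - (1 - p) ** n

# The study's 18 measurements at the conventional 0.05 cutoff:
print(f"{p_winning(0.05, 18):.0%}")  # roughly the 60% Bohannon quotes
```

With a single measurement the chance of a false positive stays at 5 percent; it is the 18 tickets that push it past 60.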
This is called ‘p-hacking’: tweaking your experiment, measurements, or analysis until p slips under 0.05.
But p-hacking aside, this study was also doomed to fail (or ‘succeed’, as it happened) by the tiny number of subjects used, which amplifies the effects of uncontrolled factors.
Just to take one example: A woman’s weight can fluctuate as much as 5 pounds over the course of her menstrual cycle, far greater than the weight difference between our chocolate and low-carb groups. Which is why you need to use a large number of people, and balance age and gender across treatment groups.
Something that Bohannon explicitly states did not happen in this study.
Luckily, scientists are getting wise to these issues. Some journals are trying to phase out p value significance testing altogether, and almost no one takes studies with fewer than 30 subjects seriously; editors will reject them out of hand before sending them to peer reviewers. But as this case has shown, those safeguards are not always applied: there are still plenty of journals that care more about money than reputation and, sadly, still plenty of people who want to believe that running is bad for you and that eating chocolate makes you thinner.