Obsession with Regression: March 2016

I have a dream that one day every child will take a class which will teach them to recognize statistical crimes. It would replace another high school math or science class, like calculus, trigonometry, geometry, or Newtonian physics, because these are totally useless for 90% of the population. (I was a physics major. I’m allowed to say these things.) Statistics is not like that. Send a child into the world unable to recognize statistical crimes and you are preparing them to be perpetually lied to -- by politicians pushing agendas, journalists facing tight deadlines, and scientists trying to get published.

This class would not be a math class. I don’t care if kids understand how to do a chi-squared test. I just want to make them very paranoid. It would be like the scene in Harry Potter where the students are taught “constant vigilance” against the Dark Arts.

“[The teacher] gave a harsh laugh, and then clapped his gnarled hands together. ‘The sooner you know what you’re up against, the better. How are you supposed to defend yourself against something you’ve never seen?’ ”

And then instead of torturing a spider (seriously, who hired that guy? Don’t wizards have any teaching standards?) you could enumerate a bunch of statistical crimes. Which, to reinforce the fact that this class is necessary, I’m now going to do. I spent a month annotating every single article I read that discussed data for a popular audience (sample titles: “White Female Republicans are the Angriest Republicans”, “Study: More Useless Liberal Arts Majors Could Destroy ISIS”, “The Reproductive Rights Rollback of 2015”). In total I annotated 49 articles; you can see my annotations here and a note on my methodology here [1].

These are my overall impressions. They are not statistical; they’re a qualitative summary. Throughout I use “article” to refer to the general-interest publication and “study” to refer to the original scientific work it describes.

Sites which specialized in statistical writing, like the NYT’s Upshot and FiveThirtyEight, wrote about data more reliably.
Almost all the articles had something I could push back on. Most frequently, I had questions the original article didn’t answer or caveats it didn’t mention. This isn’t necessarily the journalist’s fault: most general-interest articles are shorter than the studies they describe, and so details get lost. But I also found a third of the articles were substantially misleading. (I’m not labeling those articles in the spreadsheet since I don’t want to be mean and the cutoff is somewhat arbitrary: maybe you could argue I’m an overly anal statistician and the actual fraction is a fourth or a fifth.) So if you want to know what a study says, reading a general-interest article about the study is not a reliable way to figure it out unless you really trust the journalist or outlet -- you have to at least glance through the study. General-interest articles often misdescribe studies, presenting correlational studies as causal, or presenting theoretical models as though they actually analyzed data. You don’t always have time to skim the original study, but I think you should before you repost it on Facebook or Twitter.
Article titles are particularly likely to mislead. Outlets have incentives to use clickbait titles, the title is often not written by the author of the article, and it’s hard to summarize a complex topic in a dozen words. Please do not repost something after only reading the title.
Be particularly suspicious of results which are politically charged or published in politically biased outlets (Jezebel, Breitbart), especially if the article substantiates the outlet’s worldview. (Also be suspicious of results which substantiate your worldview -- if you’re like me, you’re less inclined to question them.)

Here are some questions to ask. If an article says, “A new study shows that X” your first question should be: how? Was it an experiment? A survey? A meta-analysis? A theoretical model? Sometimes this will be pretty obvious. If an article says, “Study shows that ⅔ of Americans prefer chocolate to vanilla”, the scientists probably ran a survey. But if an article says, “Study shows that increasing the minimum wage increases unemployment” -- it makes a huge difference whether the authors found a new natural experiment or did a meta-analysis of the past literature or are a bunch of undergrads who wrote up a theoretical model after passing Econ 101.

Once you understand how the study was conducted, push back on the study itself. If they claim to have “controlled for other factors” -- controlling for other factors is really hard. If they ran a survey -- was the population actually representative? Could non-response bias explain their results? In general: are the effects large enough to actually matter? Are their results statistically significant? Did they look at a hundred different things and only report the one which they liked? Are the numbers they are reporting the ones we care about, and are they properly contextualized? Try to think of other explanations for their data besides the one they favor. Be creative and obnoxious. You can find examples of how I think about articles in my annotations.
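
To make the “looked at a hundred different things” worry concrete, here is a minimal simulation sketch. It is not taken from any of the annotated studies; the function names, sample sizes, and the 0.05 threshold are all assumptions chosen for illustration. It compares two groups on 100 outcomes where there is no real difference at all, and counts how many comparisons still come out “statistically significant”.

```python
import math
import random
import statistics


def two_sample_p_value(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.mean(a) - statistics.mean(b)) / se
    # Two-sided tail probability under the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


def count_false_positives(n_outcomes=100, n_per_group=200, alpha=0.05, seed=0):
    """Count 'significant' results when both groups come from the same distribution.

    All parameter values are illustrative assumptions, not figures from any study.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_outcomes):
        # Both groups are pure noise: any "effect" found here is spurious.
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if two_sample_p_value(a, b) < alpha:
            hits += 1
    return hits


def chance_of_at_least_one(n_outcomes=100, alpha=0.05):
    """Probability of at least one false positive, assuming independent tests."""
    return 1 - (1 - alpha) ** n_outcomes


if __name__ == "__main__":
    print("Spurious 'significant' results out of 100:", count_false_positives())
    print("Chance of at least one:", round(chance_of_at_least_one(), 3))
```

With a 0.05 threshold you should expect around five spurious “findings” out of a hundred on average, and (if the tests are independent) a roughly 99% chance of at least one. So a single significant result reported from a study that measured many things deserves suspicion unless the authors say how many comparisons they ran.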

I close on a gentler note. The fact that you can make statistical arguments against an article does not mean that the author is incompetent or ill-intentioned or that the article is bad. All work has caveats -- certainly you can argue with all my blog posts -- and that’s fine as long as they’re clear. But some caveats are subtle and not clearly acknowledged (or are deliberately hidden), which is why we need to teach our children to defend themselves. Avada Kedavra!

While I was working on this project, the New York Times and the Wall Street Journal both published op-eds arguing we should teach statistics. I dream of a world where statistical literacy is so common that statistical errors, like spelling errors, make it impossible to be taken seriously; where publications that use only anecdotes get demands for data. It would be a world where we paid attention to gun violence not because of mass shootings, but because of the far larger numbers of people who are shot and go unnoticed every day; where terrorists could no longer sow fear by killing a far smaller fraction of the population than die annually from heart disease; where we donated to charities that saved lives as opposed to making us feel good; where we conducted randomized controlled trials to test which government programs worked best. I truly believe that millions of people would lead better lives if everyone understood and applied basic statistical reasoning. That’s just not true of trigonometry. Let’s teach statistics instead.

Notes:

[1] When I first wrote this piece, it was 6:10 AM and I just couldn’t take it anymore and I ranted about three articles which I thought were bad. After I calmed down I decided that was both mean and unpersuasive, so I did a more systematic annotation. My reading material skews towards the New York Times, so to get a more representative sample I annotated not just the articles I would read naturally: I also went back and read statistical articles in other widely read publications like Gawker, Buzzfeed, and Breitbart (I Googled “new study” + publication name). (I was doing this quickly, so if you think I’ve been unfair or misunderstood an article, my apologies -- let me know and I’ll fix the spreadsheet.)

Monday, March 28, 2016

Protecting Yourself Against Statistical Crimes