Today 23andMe released some statistical analysis I've done on their Parkinson's data, although I'm just the stats nerd on the project -- much more credit goes to the unbelievable organizational effort by many people at 23andMe as well as other organizations like the Michael J. Fox Foundation, and the 10,000(!) Parkinson's patients who provided their data.
On the one hand, I like helping out with this research because, in contrast to my research on sex or Shakespeare, it has the potential to save lives. On the other hand, it's the most high-pressure work I've ever done, because if I mess it up, people may actually die. So I literally did all my analysis twice -- completely rewrote the code -- because I was scared. At least I'm confident it's correct now.
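That "write it all twice" safeguard can be sketched in miniature like so (a toy example, not the actual Parkinson's analysis: compute the same statistic through two independent implementations and insist they agree before trusting either):

```python
import statistics

def sample_variance_v1(xs):
    # Hand-rolled sample variance (n - 1 in the denominator).
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def sample_variance_v2(xs):
    # Independent implementation via the standard library.
    return statistics.variance(xs)

data = [2.0, 3.5, 5.0, 4.5]
# If the two disagree, at least one implementation has a bug.
assert abs(sample_variance_v1(data) - sample_variance_v2(data)) < 1e-12
```

The point of rewriting from scratch rather than re-reading the old code is that a second reading tends to repeat the same mistake, while a second implementation usually doesn't.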
But, in general, I am frightened by my fallibility. Each analysis I do relies on hundreds or thousands of lines of code, and if I do one analysis a month, it seems arrogant to the point of self-delusion to think that I will never in my career write a line that contains a serious error. Conceptual errors are even harder to spot than coding ones. I see mistakes in published work in fields from economics to computational biology, and it's hard to believe those mistakes don't contribute to the low reproducibility of results even in fields, like cancer research, where people really do die if you get stuff wrong.
There is a more positive way to put this. Over the summer I was at a Coursera recruiting event where Andrew Ng, one of the founders, addressed a crowd of potential employees:
"100,000 people might do a Coursera assignment. If it takes each of them 4 hours on average, and you do a bad job, you've wasted 91 years of human life, so you've basically killed someone--"
"What Andrew's trying to say," Daphne Koller, the other co-founder, burst in, "is that working here gives you huge power to affect people's lives for the better."
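The person-hours-to-years arithmetic behind that quip is easy to check (the quoted figures are recalled from memory; at 4 hours apiece, 100,000 people works out to roughly 46 years, while about 8 hours per assignment would give the ~91-year figure):

```python
HOURS_PER_YEAR = 24 * 365  # 8,760

def human_years(people, hours_each):
    """Total person-hours expressed as years of human life."""
    return people * hours_each / HOURS_PER_YEAR

print(round(human_years(100_000, 4), 1))  # ≈ 45.7 years
```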
If people die when you get stuff wrong, it means they live when you get stuff right. Refusing to wield this power because you fear the responsibility is not really an option. And some fields -- commercial aviation, say -- really have mastered the art of (pretty much) never getting things wrong. But scientists and statisticians clearly haven't, so I'd welcome any tricks you have for making work and code reliable and reproducible. Write me a comment below (you don't have to be a scientist or statistician!) or shoot me an email at emmap1 at alumni dot stanford dot edu.
And apologies for the doom and gloom post -- we'll get back to love and sex next week, I promise.