Saturday, May 23, 2015

Three Hundred Thousand Stories

Content warning: sexual assault, mass shooting, profanity.

One year ago today, a 22-year-old named Elliot Rodger killed six people in an effort to “destroy all women because I can never have them. I will make them suffer for rejecting me. I will arm myself with deadly weapons and wage a war against all women and the men they are attracted to. And I will slaughter them like the animals they are”.

I do not want to write about Rodger. He deserves to be forgotten. I want to write about what happened afterwards, which is that Twitter exploded with the hashtag #YesAllWomen: thousands of people protesting sexual violence and sexism more broadly. Over the next month, I collected more than 1.13 million posts, more than 300,000 of them unique. I have studied many Twitter discussions, but I think this was the most extraordinary, an event unprecedented in history. Rape so often ends in silence: from Alaska to South Africa, college campuses to military barracks, the majority of sexual assaults go unreported, and the vast majority of assaulters walk free.

In my dataset, 407 people told stories of being raped. As I counted their stories and marveled at their courage I found myself wanting to reach out, like Darth Vader, and choke the life out of those who had hurt them. Together, their tweets painted a picture that transcended any single tweet: who cared enough about sexual violence to speak up about it, how men and women had spoken differently, how the movement had evolved, who had tried to attack it. Because statistics can never replace individual stories, I also spoke to the most prolific tweeters, who ranged from scientists to sexual assault survivors, dominatrices to grandmothers. This is what they told me.


Three quarters of the tweeters whose gender I could identify from first name were women [1]; they were 10 times as likely as the average tweeter to describe themselves as feminists, and also more likely to describe themselves using words like “gender”, “equality”, “atheist”, “literature”, “violence”, “queer”, and “vegetarian”. To understand how their backgrounds influenced what they cared about, I made a visualization connecting tweets to the words tweeters tended to use to describe themselves. Here’s a screenshot; tweets are circles:
Screen Shot 2015-05-23 at 1.53.01 PM.png
You can see the interactive version and read the tweets here. It shows atheists retweeting "#YesAllWomen are treated like property in the Bible... but extremely valuable property! Like, worth as much as dozens of cows!" It shows men saying that they can do better. It shows gamers and video fans discussing gender inequality in online videos: "When a woman makes a video, most comments are about tearing apart her looks ... with a man, almost none."

But to me the most moving thing about this picture is its lack of pattern. Maybe you can see systematic differences in what yogis and daughters and actresses tweet, but as a statistician trained to find differences in data, I see instead a universal rage: all women face violence and misogyny. Perhaps their only common element is their courage: as their Twitter profiles attest, they are female, fabulous, and fierce.

Of course, there’s a lot more to someone than their Twitter profile reveals. So I wrote to the most prolific tweeters to learn more. Some of them had tweeted thousands of times, and had faced so much harassment for doing so that they were reluctant to talk to me. One, a former dominatrix, told me that need for money had driven her down dangerous paths, teaching her that “a group of men in the world and even some women expect there to be an underclass of people whose lives and bodily integrity just matter less than people's desire for entertainment.”

A second tweeter was a grandmother (never let it be said that they don’t use Twitter), who told me she got “granny-angry”. A survivor of sexual assault and domestic abuse who worked for two decades as a victims’ advocate, she spent a summer posting obsessively -- “I was determined the damned misogynists weren't going to hijack the tag” -- building support with the other frequent tweeters even as they were threatened and, occasionally, publicly identified online.

A third woman, a scientist, got angry at the “flood of trolls” saying “Rape claims are inflated” or “Be grateful you don’t have REAL problems.”

“I was so sick of seeing ‘cite your sources’ from men insulting women who had been raped...that I decided I’d cite ALL the sources,” she said. So she began tweeting out the conclusions of dozens of academic papers on assault.

“A man told me ‘shut up, no one cares,’ ” she said.  “I said ‘Obviously you do, or you wouldn’t be trolling this tag.’ In response, someone else told me to die and threatened to rape me. I reported it as a violent threat, only to have Twitter respond—two days later—that it wasn’t violating their terms of service.”

Interestingly, in spite of these attacks, the vast majority of tweets containing #YesAllWomen supported the hashtag, and most posters saw their number of Twitter followers increase. Still, I wanted to understand who, exactly, attacks a woman discussing her own rape, so I identified widely shared anti-#YesAllWomen tweets and studied the people who shared them. Like the pro-#YesAllWomen tweeters, most of these anti-#YesAllWomen tweeters had more followers a month later, probably indicating their different social circles. I contacted several of the attackers to determine their motivation, but they were not terribly interested in having a serious conversation with a woman studying a feminist tweet, and several preferred to harass me instead. So I resorted instead to my other strategy for getting to know people: I downloaded 1.5 million of their tweets. While I was tempted to homogenize the attackers, I found diverse clusters. Some were high profile, including a mixed-martial arts champion, a conservative commentator and comedian, and a Peabody Award-winning actor; the less high-profile tweeters included political conservatives and wrestling fans. One of the most prolific, who tweeted more than 1,700 times(!) in my data, highlights their diversity: she is, to quote her Twitter profile, “into Wrestling, Catfighting And Sexfighting”, “A Domme Who Conquors Her Slaves” and “Gay But...Do Not Hate Men!”. Ironically, she was upset about the same thing many pro-#YesAllWomen tweeters were: what she saw as double standards. For example, an attractive criminal, like Jeremy Meeks, got women asking to be raped, while a less attractive criminal would not; women who beat up men were powerful, but men who beat up women were abusers.

Because men and women often have different perspectives on sexual assault, I also spoke to men who tweeted prolifically. Two said they were motivated in part by the experiences of women they cared about.

“I've had close friends come home crying from frat parties after a ‘rodeo’ gang rape,” one said. “I've worked jobs where I've seen women slapped on the ass, pinched, whistled at, preyed upon and cowed by management.  I have experienced fear reactions and trust barriers from women...once we're better friends they tell me about the times that they've been attacked.” A second male tweeter said he had been influenced growing up by his mother’s stories of being sexually abused. Both said that other men had questioned their masculinity for defending #YesAllWomen -- “calling me a ‘pussy’, asking me if I ‘had a vagina’ ” -- though one noted he got nowhere near as much abuse as the women.

Then I looked for broader gender differences. Looking at things retweeted mostly by men or women doesn’t reflect too well on the men. The most male-dominated tweets include charming observations like “#YesAllWomen should hit the weight room so they can avoid getting raped wit they weak asses” or “How many Women does it take to change a light bulb? 11, 10 to form a support group, and one to get her boyfriend to do it” [2]. But these tweets aren’t a fair summary of what most men think; rather, they reveal the things that women don’t think. In fact, four of the five most common tweets were the same for men and women.

It’s fairer to compare all tweets from men and women. For example, women use more words expressing fear and use more first-person language. Men use more profanity, and they also use it differently. When men say “dick”, it’s most frequently to retweet, “#YesAllWomen should suck dick on the first date”; for women, it’s “I'm not a tease for letting you buy me a drink and not going home with you. You offered me a drink, not your dick on a plate”. We can also find differences in how people treated those of the same versus the opposite gender. Men used more words expressing emotion, especially negative emotion, when talking to women. Women fired back: more than two thirds of tweets from women containing “part of the problem” were at men.

As I worked on this analysis, the laptop on which I stored the data became a grim weight; walking back to my room late each night, I shied away from the men I passed. But the weight also reminded me I did not walk alone: #YesAllWomen had revealed a vast network of people who shared common fears [3].

One of the tweeters said it best. A survivor of sexual assault who suffers from a severe anxiety disorder, she said she considered not speaking up because of backlash. I asked her why she did anyway.

“Things like this help remind everyone who's had to face these types of oppression... that they are not alone,” she told me. “If my putting my neck on the line means that even one other person who's gone through these things feel just a little more supported, a little more safe, a little more like they can get up in the morning and feel okay with existing and that their life might just work out -- so be it.”

Thanks to the members of the #YesAllWomen movement for trusting me with their experiences, and to everyone who speaks out about sexism and sexual violence.


[1] Throughout, when analyzing gender differences, I use tweeter first name to identify gender, a method that works about half the time. It is important to remember that the other half may exhibit different statistical properties. Shoot me an email if you have thoughts on this.
[2] And oh, the many #YesAllWomen lightbulb jokes. A sampling:
“How does a feminist change a lightbulb? She just holds on and the world revolves around her.”
“How many feminists does it take to change a light bulb? None, they can't change anything.”
“How many feminists does it take to screw in a light bulb? Nevermind that, the word screw promotes rape culture.”
The feminists countered with some of their own:
“How many men does it take to change a lightbulb? None because they get mad after it won't screw”
“How many ‘nice guys’ does it take to change a light bulb? No one knows, they keep blaming it for not getting turned on.”
“How many men's rights activists does it take to change a lightbulb? Well, not all of them." #NotAllMen #Butprobablyalotofthem”
[3] Indeed, I found some evidence that the movement gained strength as tweeters combined multiple ideas to make more powerful tweets. In several cases, there was an early tweet about how women had to behave and then a later, more widely shared tweet contrasting women’s experience with men’s.
Early tweet: Because you get a "rape whistle" when you start college #YesAllWomen
Later tweet: Because when girls go to college they're buying pepper spray and rape whistles while guys are buying condoms #YesAllWomen

Early tweet: The constant worry your skirt or dress is too short or maybe your bra strap is showing and you'll attract the "wrong attention" #YesAllWomen
Later tweet: a ‘cool story babe, now make me a sandwich’ shirt doesn't break the school dress code. a girl's bra strap does #YesAllWomen

The marketplace of ideas has become, perhaps, the fusion reactor of ideas, where memes combine with explosive power and supporters of a social movement draw strength from each other through digital synapses. But it is important to note that establishing causality is difficult -- how do we know if the second tweeter saw the first tweet? Still, the pattern is repeated and the explanation is plausible.

Monday, May 18, 2015

Five Tips on Getting Non-Fiction Pieces Published As A Young Writer

This piece benefited from contributions from Julian Baird Gewirtz, a Rhodes Scholar and doctoral candidate at Oxford whose work has appeared in The New Yorker, the Economist, the Wall Street Journal, Foreign Policy, and the Atlantic, among others.

I often chat with friends who are interested in publishing an op-ed or non-fiction essay. I have made many mistakes trying to do this which hopefully you can avoid; here are some tips. Take this with a grain of salt; it’s just one approach, and others may work better.

1. Maximize the odds that someone actually reads your piece. Here is a list of people ranked by how likely they are to respond to you:

  1. an editor with whom you’ve previously published.
  2. an editor with whom you have some other connection (someone you’ve met, a third-party intro, be creative).
  3. a writer / editor you’ve found online. This is often how I pitch something (do not feel like you need a personal connection to an editor to send them writing!). Here’s how I find an editor: after identifying a publication, I look through its articles to find one that’s similar to what I’m trying to publish (if you can’t find an article like that, you might want to find another publication). Hopefully the author is an editor with an email address I can find; writers are okay too. If this doesn’t work, you could try checking the publication’s masthead. I try not to email anyone with a job title that sounds too important.
  4. the general submission email for the publication. I have never had success emailing “opinions@publicationX” or “tips@publicationX”. Greg Smith, who published this widely read article, sent it to the NYT general op-ed address, had no luck, and emailed it to four editors, who responded. I’m not saying don’t ever email a general submissions email, but I haven’t had luck doing this, and I would try a real person first if you can.

Do not submit a piece to multiple editors simultaneously. I know, it’s tempting to save time, but it’s considered a faux pas by many editors and if they find out that you’re doing it, you risk pissing them off.

Okay, so you’ve found an email address. Keep your initial email short. Here is a recent representative pitch I sent to an editor.

Dear X,

I am a computer scientist who has written previously for the New York Times, FiveThirtyEight, and Quartz. I have a piece which I hoped might be of interest to the Atlantic: it's about what analyzing email data can tell you about love. The piece is attached and also pasted below.

I hope your weekend has gone well, and thanks very much for your time.
Emma Pierson

I modify how I describe myself depending on who I’m writing to (this email was to a computer scientist). Shiny credentials are probably helpful but don’t go crazy with them. I often use an email address that does not reveal my gender. I always attach the full piece (and also paste it into the email).

Some people also pitch short descriptions of a potential piece without having written it; I don’t have much experience with that, but Julian does. He adds: “If you have a specific idea for a piece but are worried that it will require more research or time than you can realistically spare without knowing that an editor is interested, it’s very common to send a (slightly more detailed) pitch to an editor before actually writing the piece. I’ve never had a piece accepted only from a pitch, but I have had editors express interest, which motivated me to write something that they subsequently accepted... and, in other cases, something they subsequently rejected. Of course, you have to be sure that the piece is viable, and that you’ll be able to get the access you need. (Two exceptions to this are book reviews, where the publication will get the advance copy of the book on your behalf, and interviews, where the publication’s brand will often be essential to getting the interview.) I’d recommend only doing this for pieces that you are pitching for print, to weeklies, monthlies, and the like. In my limited experience, online editors usually prefer just to read the piece and accept or reject it based on that; they’re not working on lengthy timelines.”

2. Ways to break in. Okay, but how do you get started if you’ve never published anything before? Mostly, aim high and pitch a lot. Getting rejected costs you nothing; getting published gains you a lot. So, ignore your ego. In many academic fields, people first send their papers to the most competitive journals, expecting them to get rejected; then they move down. I would view publishing articles this way. Caveat: do not spam editors (I don’t think I’ve ever sent an editor who rejected me a second piece within a month). Also, obviously, don’t send out shoddy work. I don’t send anything out for publication unless at least one person whose opinion I respect has read it.

One way to ignore your ego: do not take a rejection as a reflection on the quality of the piece. There is so much bad stuff that is so widely read! Nothing you write can be worse than that. If I have something I think is good, I send it to at least two or three editors before giving up. Also, if I don’t hear back from an editor, I send a follow-up a week later; I try to make it sound like it was my fault they didn’t respond. All of my most widely read pieces have gotten rejected several times before they were published. Learn to love rejection (this also describes my dating strategy).

Also, consider starting a blog. This offers a few benefits:

  1. Gives you a compiled collection of work which you can send to publications that are looking for freelancers (this is how I got started at FiveThirtyEight)
  2. Posts can get picked up by other publications. When you write a blog post, you can post it in other places (I use Reddit, Twitter, Hacker News, and Facebook).
  3. Gives you a place to put pieces that no one else will publish (which makes me more willing to write risky or personal pieces).

3. Be sensitive to time-sensitivity. Some pieces are “evergreen” -- they are about perennially interesting topics and will not get less publishable with time. But no one wants to read about a Twitter hashtag that spiked a month ago unless you can somehow make it fresh. Unfortunately, the timescale required for careful writing or analysis does not fit well with the news cycle, and I think you want to avoid turning out something quick and shoddy. Even if you could write a piece in zero time, if you’re trying to get it published with editors you’ve never made contact with, it might take weeks to get someone to pay attention, which is too long for news events. A few potential solutions:

  1. write evergreen pieces.
  2. if you have something inherently time-sensitive, send it to places that are more likely to publish it quickly -- editors you’ve previously made contact with, less competitive publications. I have never managed to get a time-sensitive piece published with an editor I’ve never contacted before.
  3. anticipate events. For example, I wrote a piece about the Ferguson grand jury decision about two days after it was announced. Because I knew the decision was coming, I did most of the analysis beforehand and sent the piece to an editor I already knew before the news broke.

Trying to publish about breaking news events is high-risk (you might not get published at all) but high-reward (because it might get widely shared).

A final note: regardless of whether your piece is evergreen, you can often make it more appealing by adding a hook that references a recent event.

4. Don’t be a prima donna. Keep in mind that editors are older than you, have all the power, and may be talking about you and rejecting your work for the next several decades; don’t be arrogant with your credentials or opinion of your writing. Example of what not to do: I was once working with an editor on a piece right after I had gotten my wisdom teeth taken out. Still a little sleepy from anesthesia, I read the editor’s proposed changes and shot her an email saying, “no, my friend at the New York Times thinks we should do it this way…” The editor did not work at the New York Times, and this did not go over well. Don’t do stuff like that. Also probably don’t email editors while high on surgical drugs.

In general, I am especially deferential when dealing with editors. I happily accept whatever money they do (or usually don’t) offer. I do not complain when they choose a title I dislike. When they are unquestionably misstating statistical results, I say “I think this might be slightly misleading”. When they reject my pieces, I thank them for their time, and when they publish them, I thank them about eight times. When I am actively working with an editor on a piece, I drop almost everything else to work on it. Etc.

5. Write something only you can write. My way to do this is to use data that no one else has. Find the way that works for you. If what you’re writing could be written by any smart person with access to Google, it’s less likely to be published. (Take this piece on income inequality by Nick Kristof, for example. I like this piece and I am glad someone wrote it. But there’s no need for me to attempt to write it because Nick Kristof can do a better job of it than I can and more people want to listen to him anyway.) Other people I know who have gotten published while in school have often leveraged their own special knowledge: for example, Sam Sussman has used his Israel-Palestine expertise, Julian has used his knowledge of China, and Seth Stephens-Davidowitz has used Google data.

If you have further questions, disagreements, or tips of your own, shoot me an email so I can update this post if necessary!

Thursday, May 14, 2015

We Wrote a Paper!

This paper describes joint work with Alexis Battle, Sara Mostafavi, and Daphne Koller. Thanks as well to the GTEx Consortium for data.

In spite of what this blog might imply, I do not spend all my time studying sex and stalking people on Twitter. From nine to five I apply statistics to biological datasets because a) there is an enormous amount of new biological data with exciting implications for how we treat disease and b) there is no way to make sense of it without using a computer. This week we published a paper on which I began work as an undergrad, and on the theory that one should never do work that cannot be explained to the intelligent (and attractive) non-specialist which I imagine you are, I am going to describe it here. This post is a little longer than most of my posts, mostly because it has lots of pictures, but I have faith in you.

We begin with a mystery: cells throughout your body have (essentially) the same DNA, and yet do very different things. How? One sentence recap of ninth grade bio:

Genes (blueprints for proteins) are used to produce RNA (an intermediate molecule) which is used to produce proteins (tiny molecular machines that do most of the work in a cell).

So by altering how much RNA you make from a gene, you can control the amount of protein, and thus the functions of a cell. We call this RNA data “gene expression data”, and recently a large group of scientists known as the GTEx Consortium produced gene expression data for more than a thousand samples from dozens of different human tissues. This data allows you to ask cool questions like “what genes are particularly highly expressed in the liver?” which gives you a hint about which genes are important to the liver’s functions.   

Another thing we might expect to differ between tissues is how gene expression levels are correlated with each other. (Genes A and B are positively correlated if, when A is expressed at a high level, so is B.) We say that correlated genes are “co-expressed”, and we care because genes that are co-expressed often work together. So looking at the specific co-expression network for each tissue can tell us something about how each tissue accomplishes its function. Below is an example of a co-expression network (source): genes are circles, co-expressed genes are linked by blue lines, and you can see that genes with common functions often cluster together.

Screen Shot 2015-05-13 at 11.21.56 AM.png
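
To make the idea of co-expression a little more concrete, here is a minimal Python sketch that links two genes when their expression levels are strongly correlated across samples. This is only an illustration, not the method from the paper: the gene names, the expression values, the 0.8 cutoff, and the choice to use the absolute value of the correlation are all assumptions made up for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def coexpression_links(expression, threshold=0.8):
    """Return gene pairs whose absolute correlation is at least `threshold`.

    `threshold` is a hypothetical cutoff chosen only for illustration.
    """
    links = []
    for gene_a, gene_b in combinations(sorted(expression), 2):
        r = pearson(expression[gene_a], expression[gene_b])
        if abs(r) >= threshold:
            links.append((gene_a, gene_b, round(r, 3)))
    return links


# Made-up expression levels for three hypothetical genes across five samples.
toy_expression = {
    "gene_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene_B": [2.1, 3.9, 6.2, 8.0, 9.8],  # rises and falls with gene_A
    "gene_C": [5.0, 1.0, 4.0, 2.0, 3.0],  # no clear relationship to gene_A
}

links = coexpression_links(toy_expression)
for gene_a, gene_b, r in links:
    print(f"{gene_a} -- {gene_b} (r = {r})")
```

On this toy data only gene_A and gene_B end up linked. A real network would have tens of thousands of genes, which is where the scale problems described later come in.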

In our paper, we study co-expression networks in 35 tissues throughout the human body, and we do three things:

  1. We come up with a new method for inferring the networks more accurately. I named this method “GNAT” (Gene Network Analysis Tool), for my boyfriend Nat, because I am a computer scientist and this is the sort of thing we think is romantic.
  2. We statistically study the networks to find biological principles.
  3. We create a web tool so other people can study our networks as well.

Inferring the networks is hard because you have tens of thousands of genes and you are considering links between all possible pairs of genes. This means that with ten thousand genes you have about fifty million possible links. This raises two issues.

1. The sheer size of the mathematical objects involved can crash your computer or take way too long to deal with. GNAT uses various mathematical tricks to deal with this.
2. You don’t have enough data. For some tissues, you might have samples from only a dozen people, which is not enough to give you very good estimates of fifty million links. GNAT improves the estimates by sharing information across tissues: while the networks in your liver and brain are different, they probably also have some similarities, and so we can use the brain’s network to learn the liver’s more accurately. We make a tree that groups similar tissues together, and we encourage tissues close together in the tree to have similar networks. We show that this increases accuracy. While we use our method on human tissues, you could also use it on any group of datasets related by a tree: different species, evolving cancer cells, or even non-biological correlation datasets. Here is the tree we used:
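
To put rough numbers on the two issues above, here is a small Python sketch of the arithmetic. The gene count and the "dozen people" come from the text; the 8 bytes per matrix entry (a standard double-precision float) and the idea of dividing links by samples are my own assumptions, meant as a back-of-the-envelope feel for scale rather than anything GNAT actually computes.

```python
def count_possible_links(n_genes):
    """Number of unordered gene pairs: n * (n - 1) / 2."""
    return n_genes * (n_genes - 1) // 2


n_genes = 10_000       # "ten thousand genes" from the text
n_samples = 12         # "a dozen people" for a sparsely sampled tissue
bytes_per_entry = 8    # assumption: one double-precision float per entry

n_links = count_possible_links(n_genes)

# Issue 1: size of a dense gene-by-gene matrix for a single tissue.
dense_matrix_bytes = n_genes * n_genes * bytes_per_entry

# Issue 2: how many links there are for every sample you have.
links_per_sample = n_links / n_samples

print(f"{n_links:,} possible links")
print(f"{dense_matrix_bytes / 1e6:,.0f} MB for one dense matrix")
print(f"{links_per_sample:,.0f} links per sample")
```

The link count comes out a little under fifty million, and a single dense matrix at this size is already several hundred megabytes before you multiply by the number of tissues.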

Having learned the networks, we analyze them to try to find principles that guide how the body works. We look at two types of genes: transcription factors, which are master regulators that control lots of other genes, and genes which are known to have tissue-specific functions (for example, immune-related genes in the blood). If your genes were the Mafia, transcription factors would be the kingpins and tissue-specific genes would be the street-level enforcers (that’s how I thought about it, anyway). We find that the kingpins tend to be connected to the enforcers, and enforcers connected to kingpins are expressed at higher levels. Kingpins tend to lie at the centers of networks, while enforcers tend to lie at the peripheries. All this paints a coherent statistical picture of how tissues accomplish their specific functions: tissue-specific kingpins control and upregulate tissue-specific enforcers.

We also find lots of groups of interconnected genes that may work together. Groups that are highly expressed in a particular tissue often have tissue-specific functions: for example, neural firing in the brain and muscle function in the heart. Groups that persist across tissues tend to have functions common to many types of cells, like cell division.

On the one hand, it is deeply cool to find mathematical evidence of deep biological principles. While working on this paper I went for a walk around Stanford’s Dish and on one particularly steep ascent (photo cred to Shengwu Li)

Screen Shot 2015-05-13 at 11.35.48 AM.png

I could feel my lungs and muscles working and I thought: I have seen the clusters of genes that let my lungs bring me this air and my blood fight its pathogens and my muscles use its oxygen. I’m not a religious person, so moments like this are about as close as I get to spirituality. At the same time, it is hard to verify the links in our networks: because our analysis is based on correlations, we need more targeted biological techniques to firmly establish causality. We are also peering into a deeply alien world about which we have relatively little data. I enjoy working with social science data because I have an intuition for what the confounds and interesting questions are; high-dimensional biological data remains huge and unintuitive to me. Still, it’s pretty cool that we’re kept alive every second by processes so far beyond our understanding -- stare in reverence at your palms, because that skin conceals a million microscopic mysteries.

The last thing we did was make a web tool so other people could use our networks for their own discoveries. The story of making this tool illustrates how research often works for me, so I will tell it because I think it’s useful to not just present final products. On the left is the original version, which I wrote; on the right is the pretty and much faster version, which my coauthors wisely hired a professional web programmer to revamp. (Definitely play with the latter if you’re actually curious.)

Screen Shot 2015-05-14 at 8.35.34 AM.png

I liked my coauthors’ idea to build a website but had never made one. So I didn’t pursue it for a while, since the website would have to A) process very large genetic networks quickly and B) make complex graphics from them:

and I didn’t know how to do either. But that summer I was working at Coursera, and one day my boss showed me Flask, which is a tool that allows you to accomplish A. Then I went out to dinner with Sophia Westwood and Sarah Sterman, two computer science master's students, and Sophia showed us a network visualization she had built using a tool called d3. She was visualizing computer programs, not genes, but the similarity was there, and that showed me how to do part B.

The moral of this story, which I have learned repeatedly, is that hanging out with smart people is always useful because it gives you new ideas in unexpected directions. Relatedly, I learned a ton from working on this paper, mostly from my coauthors (two of whom became professors while we were writing it). The contrast in scale with my social science projects is striking: those usually take weeks, and this took years. I am still trying to decide what combination of blog posts, general audience pieces, and academic publications most efficiently gets new information to the people who will benefit from it.