Obsession with Regression

Monday, May 18, 2015

Five Tips on Getting Non-Fiction Pieces Published As A Young Writer

This piece benefited from contributions from Julian Baird Gewirtz, a Rhodes Scholar and doctoral candidate at Oxford whose work has appeared in The New Yorker, the Economist, the Wall Street Journal, Foreign Policy, and the Atlantic, among others.

I often chat with friends who are interested in publishing an op-ed or non-fiction essay. I have made many mistakes trying to do this, which hopefully you can avoid; here are some tips. Take this with a grain of salt; it’s just one approach, and others may work better.

1. Maximize the odds that someone actually reads your piece. Here is a list of people ranked by how likely they are to respond to you:

an editor with whom you’ve previously published.
an editor with whom you have some other connection (someone you’ve met, a third-party intro, be creative).
a writer / editor you’ve found online. This is often how I pitch something (do not feel like you need a personal connection to an editor to send them writing!) Here’s how I find an editor: after identifying a publication, I look through its articles to find one that’s similar to what I’m trying to publish (if you can’t find an article like that, you might want to find another publication). Hopefully the author is an editor with an email address I can find; writers are okay too. If this doesn’t work, you could try checking the publication’s masthead. I try not to email anyone with a job title that sounds too important.
the general submission email for the publication. I have never had success emailing “opinions@publicationX” or “tips@publicationX”. Greg Smith, who published this widely read article, sent it to the NYT general op-ed address, had no luck, and emailed it to four editors, who responded. I’m not saying don’t ever email a general submissions email, but I haven’t had luck doing this, and I would try a real person first if you can.

Do not submit a piece to multiple editors simultaneously. I know, it’s tempting to save time, but it’s considered a faux pas by many editors and if they find out that you’re doing it, you risk pissing them off.

Okay, so you’ve found an email address. Keep your initial email short. Here is a recent representative pitch I sent to an editor.

“Dear X,

I am a computer scientist who has written previously for the New York Times, FiveThirtyEight, and Quartz. I have a piece which I hoped might be of interest to the Atlantic: it's about what analyzing email data can tell you about love. The piece is attached and also pasted below.

I hope your weekend has gone well, and thanks very much for your time.

Emma Pierson

http://cs.stanford.edu/~emmap1/”

I modify how I describe myself depending on who I’m writing to (this email was to a computer scientist). Shiny credentials are probably helpful but don’t go crazy with them. I often use an email address that does not reveal my gender. I always attach the full piece (and also paste it into the email).

Some people also pitch short descriptions of a potential piece without having written it; I don’t have much experience with that, but Julian does. He adds: “If you have a specific idea for a piece but are worried that it will require more research or time than you can realistically spare without knowing that an editor is interested, it’s very common to send a (slightly more detailed) pitch to an editor before actually writing the piece. I’ve never had a piece accepted only from a pitch, but I have had editors express interest, which motivated me to write something that they subsequently accepted... and, in other cases, something they subsequently rejected. Of course, you have to be sure that the piece is viable, and that you’ll be able to get the access you need. (Two exceptions to this are book reviews, where the publication will get the advance copy of the book on your behalf, and interviews, where the publication’s brand will often be essential to getting the interview.) I’d recommend only doing this for pieces that you are pitching for print, to weeklies, monthlies, and the like. In my limited experience, online editors usually prefer just to read the piece and accept or reject it based on that; they’re not working on lengthy timelines.”

2. Ways to break in. Okay, but how do you get started if you’ve never published anything before? Mostly, aim high and pitch a lot. Getting rejected costs you nothing; getting published gains you a lot. So, ignore your ego. In many academic fields, people first send their papers to the most competitive journals, expecting them to get rejected; then they move down. I would view publishing articles this way. Caveat: do not spam editors. (I don’t think I’ve ever sent an editor who rejected me a second piece within a month.) Also, obviously, don’t send out shoddy work. I don’t send anything out for publication unless at least one person whose opinion I respect has read it.

One way to ignore your ego: do not take a rejection as a reflection on the quality of the piece. There is so much bad stuff that is so widely read! Nothing you write can be worse than that. If I have something I think is good, I send it to at least two or three editors before giving up. Also, if I don’t hear back from an editor, I send a follow-up a week later; I try to make it sound like it was my fault they didn’t respond. All of my most widely read pieces have gotten rejected several times before they were published. Learn to love rejection (this also describes my dating strategy).

Also, consider starting a blog. This offers a few benefits:

Gives you a compiled collection of work which you can send to publications that are looking for freelancers (this is how I got started at FiveThirtyEight)
Posts can get picked up by other publications. When you write a blog post, you can post it other places (I use Reddit, Twitter, Hacker News, and Facebook).
Gives you a place to put pieces that no one else will publish (which makes me more willing to write risky or personal pieces).

3. Be sensitive to time-sensitivity. Some pieces are “evergreen” -- they are about perennially interesting topics and will not get less publishable with time. But no one wants to read about a Twitter hashtag that spiked a month ago unless you can somehow make it fresh. Unfortunately, the timescale required for careful writing or analysis does not fit well with the news cycle, and I think you want to avoid turning out something quick and shoddy. Even if you could write a piece in zero time, if you’re trying to get it published with editors you’ve never made contact with, it might take weeks to get someone to pay attention, which is too long for news events. A few potential solutions:

write evergreen pieces.
if you have something inherently time-sensitive, send it to places that are more likely to publish it quickly -- editors you’ve previously made contact with, less competitive publications. I have never managed to get a time-sensitive piece published with an editor I’ve never contacted before.
anticipate events. For example, I wrote a piece about the Ferguson grand jury decision about two days after it was announced. Because I knew the decision was coming, I did most of the analysis beforehand and sent the piece to an editor I already knew before the news broke.

Trying to publish about breaking news events is high-risk (you might not get published at all) but high-reward (because it might get widely shared).

A final note: regardless of whether your piece is evergreen, you can often make it more appealing by adding a hook that references a recent event.

4. Don’t be a prima donna. Keep in mind that editors are older than you, have all the power, and may be talking about you and rejecting your work for the next several decades; don’t be arrogant with your credentials or opinion of your writing. Example of what not to do: I was once working with an editor on a piece right after I had gotten my wisdom teeth taken out. Still a little sleepy from anesthesia, I read the editor’s proposed changes and shot her an email saying, “no, my friend at the New York Times thinks we should do it this way…” The editor did not work at the New York Times, and this did not go over well. Don’t do stuff like that. Also probably don’t email editors while high on surgical drugs.

In general, I am especially deferential when dealing with editors. I happily accept whatever money they do (or usually don’t) offer. I do not complain when they choose a title I dislike. When they are unquestionably misstating statistical results, I say “I think this might be slightly misleading”. When they reject my pieces, I thank them for their time, and when they publish them, I thank them about eight times. When I am actively working with an editor on a piece, I drop almost everything else to work on it. Etc.

5. Write something only you can write. My way to do this is to use data that no one else has. Find the way that works for you. If what you’re writing could be written by any smart person with access to Google, it’s less likely to be published. (Take this piece on income inequality by Nick Kristof, for example. I like this piece and I am glad someone wrote it. But there’s no need for me to attempt to write it because Nick Kristof can do a better job of it than I can and more people want to listen to him anyway.) Other people I know who have gotten published while in school have often leveraged their own special knowledge: for example, Sam Sussman has used his Israel-Palestine expertise, Julian has used his knowledge of China, and Seth Stephens-Davidowitz has used Google data.

If you have further questions, disagreements, or tips of your own, shoot me an email so I can update this post if necessary!

Thursday, May 14, 2015

We Wrote a Paper!

This paper describes joint work with Alexis Battle, Sara Mostafavi, and Daphne Koller. Thanks as well to the GTEx Consortium for data.

In spite of what this blog might imply, I do not spend all my time studying sex and stalking people on Twitter. From nine to five I apply statistics to biological datasets because a) there is an enormous amount of new biological data with exciting implications for how we treat disease and b) there is no way to make sense of it without using a computer. This week we published a paper on which I began work as an undergrad, and on the theory that one should never do work that cannot be explained to the intelligent (and attractive) non-specialist which I imagine you are, I am going to describe it here. This post is a little longer than most of my posts, mostly because it has lots of pictures, but I have faith in you.

We begin with a mystery: cells throughout your body have (essentially) the same DNA, and yet do very different things. How? One sentence recap of ninth grade bio:

Genes (blueprints for proteins) are used to produce RNA (an intermediate molecule) which is used to produce proteins (tiny molecular machines that do most of the work in a cell)

So by altering how much RNA you make from a gene, you can control the amount of protein, and thus the functions of a cell. We call this RNA data “gene expression data”, and recently a large group of scientists known as the GTEx Consortium produced gene expression data for more than a thousand samples from dozens of different human tissues. This data allows you to ask cool questions like “what genes are particularly highly expressed in the liver?” which gives you a hint about which genes are important to the liver’s functions.

Another thing we might expect to differ between tissues is how gene expression levels are correlated with each other. (Genes A and B are positively correlated if, when A is expressed at a high level, so is B.) We say that correlated genes are “co-expressed”, and we care because genes that are co-expressed often work together. So looking at the specific co-expression network for each tissue can tell us something about how each tissue accomplishes its function. Below is an example of a co-expression network (source): genes are circles, co-expressed genes are linked by blue lines, and you can see that genes with common functions often cluster together.

[Figure: example co-expression network]
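To make the idea of co-expression concrete, here is a minimal sketch of how a network like this could be built from expression data: compute the correlation for every pair of genes and draw a link when it is high enough. The gene names, the expression values, and the 0.8 cutoff are all invented for illustration, and this is plain thresholded correlation, not the method from our paper.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Made-up expression levels for four hypothetical genes across six samples.
expression = {
    "gene_A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "gene_B": [2.1, 3.9, 6.2, 8.1, 9.8, 12.0],  # rises with gene_A
    "gene_C": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],   # falls as gene_A rises
    "gene_D": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],   # no clear relationship
}

THRESHOLD = 0.8  # arbitrary cutoff chosen for this illustration

def coexpression_edges(data, threshold=THRESHOLD):
    """Link every pair of genes whose correlation is at least `threshold`."""
    edges = []
    for g1, g2 in combinations(sorted(data), 2):
        r = pearson(data[g1], data[g2])
        if r >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges

edges = coexpression_edges(expression)
print(edges)
```

In this toy data only gene_A and gene_B end up linked. gene_C is strongly anti-correlated with both, and this sketch deliberately links only positive correlations, matching the definition above.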

In our paper, we study co-expression networks in 35 tissues throughout the human body, and we do three things:

We come up with a new method for inferring the networks more accurately. I named this method “GNAT” (Gene Network Analysis Tool), for my boyfriend Nat, because I am a computer scientist and this is the sort of thing we think is romantic.
We statistically study the networks to find biological principles.
We create a web tool so other people can study our networks as well.

Inferring the networks is hard because you have tens of thousands of genes and you are considering links between all possible pairs of genes. This means that with ten thousand genes you have about fifty million possible links. This raises two issues.

1. The sheer size of the mathematical objects involved can crash your computer or take way too long to deal with. GNAT uses various mathematical tricks to deal with this.

2. You don’t have enough data. For some tissues, you might have samples from only a dozen people, which is not enough to give you very good estimates of fifty million links. GNAT improves the estimates by sharing information across tissues: while the networks in your liver and brain are different, they probably also have some similarities, and so we can use the brain’s network to learn the liver’s more accurately. We make a tree that groups similar tissues together, and we encourage tissues close together in the tree to have similar networks. We show that this increases accuracy. While we use our method on human tissues, you could also use it on any group of datasets related by a tree: different species, evolving cancer cells, or even non-biological correlation datasets. Here is the tree we used:
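For a sense of scale, the numbers in this section can be checked with a few lines of Python. The pair count follows directly from the ten thousand genes mentioned above; the dense matrix of 8-byte floats is an assumed storage format used only to illustrate the size problem, not a description of how GNAT stores anything.

```python
from math import comb

n_genes = 10_000

# Number of unordered gene pairs, i.e. possible links in the network.
n_pairs = comb(n_genes, 2)
print(f"{n_pairs:,} possible links")  # 49,995,000 possible links

# Size of one dense genes-by-genes matrix of 8-byte floats (assumed format).
dense_bytes = n_genes * n_genes * 8
print(f"{dense_bytes / 1e6:,.0f} MB for one dense matrix")  # 800 MB

# With only a dozen samples, the data is spread very thin across the links.
n_samples = 12
print(f"{n_pairs / n_samples:,.0f} possible links per sample")
```

Multiply the matrix size by 35 tissues and both issues become clear: the objects are large, and the samples are few relative to what is being estimated.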

Having learned the networks, we analyze them to try to find principles that guide how the body works. We look at two types of genes: transcription factors, which are master regulators that control lots of other genes, and genes which are known to have tissue-specific functions (for example, immune-related genes in the blood). If your genes were the Mafia, transcription factors would be the kingpins and tissue-specific genes would be the street-level enforcers (that’s how I thought about it, anyway). We find that the kingpins tend to be connected to the enforcers, and enforcers connected to kingpins are expressed at higher levels. Kingpins tend to lie at the centers of networks, while enforcers tend to lie at the peripheries. All this paints a coherent statistical picture of how tissues accomplish their specific functions: tissue-specific kingpins control and upregulate tissue-specific enforcers.

We also find lots of groups of interconnected genes that may work together. Groups that are highly expressed in a particular tissue often have tissue-specific functions: for example, neural firing in the brain and muscle function in the heart. Groups that persist across tissues tend to have functions common to many types of cells, like cell division.

On the one hand, it is deeply cool to find mathematical evidence of deep biological principles. While working on this paper I went for a walk around Stanford’s Dish and on one particularly steep ascent (photo cred to Shengwu Li)

[Photo: the steep ascent on the Dish walk]

I could feel my lungs and muscles working and I thought: I have seen the clusters of genes that let my lungs bring me this air and my blood fight its pathogens and my muscles use its oxygen. I’m not a religious person, so moments like this are about as close as I get to spirituality. At the same time, it is hard to verify the links in our networks: because our analysis is based on correlations, we need more targeted biological techniques to firmly establish causality. We are also peering into a deeply alien world about which we have relatively little data. I enjoy working with social science data because I have an intuition for what the confounds and interesting questions are; high-dimensional biological data remains huge and unintuitive to me. Still, it’s pretty cool that we’re kept alive every second by processes so far beyond our understanding -- stare in reverence at your palms, because that skin conceals a million microscopic mysteries.

The last thing we did was make a web tool so other people could use our networks for their own discoveries. The story of making this tool illustrates how research often works for me, so I will tell it because I think it’s useful to not just present final products. On the left is the original version, which I wrote; on the right is the pretty and much faster version, which my coauthors wisely hired a professional web programmer to revamp. (Definitely play with the latter if you’re actually curious.)

[Figure: the original web tool (left) and the revamped version (right)]

I liked my coauthors’ idea to build a website but had never made one. So I didn’t pursue it for a while, since the website would have to A) process very large genetic networks quickly and B) make complex graphics from them:

and I didn’t know how to do either. But that summer I was working at Coursera, and one day my boss showed me Flask, which is a tool that allows you to accomplish A. Then I went out to dinner with Sophia Westwood and Sarah Sterman, two computer science master's students, and Sophia showed us a network visualization she had built using a tool called d3. She was visualizing computer programs, not genes, but the similarity was there, and that showed me how to do part B.

The moral of this story, which I have learned repeatedly, is that hanging out with smart people is always useful because it gives you new ideas in unexpected directions. Relatedly, I learned a ton from working on this paper, mostly from my coauthors (two of whom became professors while we were writing it). The contrast in scale with my social science projects is striking: those usually take weeks, and this took years. I am still trying to decide what combination of blog posts, general audience pieces, and academic publications most efficiently gets new information to the people who will benefit from it.

Wednesday, May 6, 2015

Does Twitter Really Give Everyone a Voice?

My piece on this topic, which analyzes more than a dozen news events over the last year, was just published by Quartz. I think there are a lot of interesting subtleties related to quantifying influence online, some of which this analysis may not capture, so please write to me if you have thoughts.

Thursday, April 9, 2015

Whether You're Crazy Depends on Whether You're Female

Recently I wrote a tongue-in-cheek piece in which I statistically analyzed 5,500 emails exchanged with my boyfriend over the past four years. Thanks so much to the many friends who said nice things and shared the piece; it meant a lot to me, because the internet evidently has decided I’m undateable. Here’s a sample of responses:

“Don't be surprised if the BF dumps her.”

“That poor bastard.”

“Maybe he doesn't miss her because she's a lunatic.”

“I was originally going to suggest the author break up now and start collecting cats.”

“I'm guessing that this couple is very young and have a trivial understanding of love.”

“I'm thinking the guy should run, and fast. This is author is obsessive...”

I am, of course, obsessive -- note the title of this blog -- but about statistics, not my boyfriend. And so when I read these responses (after I walked into my boyfriend’s bedroom and asked worriedly, “Have I been totally horrible to you for four years? I haven’t, have I? Do you want me to make you a waffle?” [1]) I began to wonder why people assumed my obsession was with my relationship rather than with math. Would people have responded differently to the essay, I wondered, had I been male?

To answer this question, I created two abridged versions of the essay which were identical except that one was supposedly written by a man with a girlfriend and one was supposedly written by a woman with a boyfriend. (One interesting follow-up would be to run a version of this experiment with same-sex couples.) I put the essays on Mechanical Turk, a site social scientists use to run surveys, and asked people to rate the author along various dimensions:

with a total of 23 adjectives. (You can read more about the survey methodology in footnote [2].) Below, I refer to the person who read and rated the essay as the “reader” and the person who wrote the essay as the “author”.
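The design described in footnote [2], random assignment to one of the two versions followed by a comprehension filter, can be sketched in a few lines. This is a simulation with invented participants, not the code used for the survey; the 0.9 pass rate and the field names are assumptions made up for the example.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

def simulate_survey(n_participants=200, pass_rate=0.9):
    """Randomly assign an essay version, then keep readers who pass the checks."""
    responses = []
    for participant_id in range(n_participants):
        responses.append({
            "id": participant_id,
            # Each reader is randomized to one of the two versions.
            "author_version": rng.choice(["male", "female"]),
            # Stand-in for answering both comprehension questions correctly.
            "passed_comprehension": rng.random() < pass_rate,
        })
    kept = [r for r in responses if r["passed_comprehension"]]
    return responses, kept

responses, kept = simulate_survey()
print(len(responses), "surveyed;", len(kept), "kept after filtering")
```

Randomizing which version each reader sees is what lets differences in ratings be attributed to the stated gender of the author rather than to differences between the readers.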

It turns out that gender does affect how people respond; both the gender of the author and the gender of the reader were significantly correlated with the adjectives that were used [3]. Female authors were described differently than male authors. Below I plot each word by how much people thought it described the male author (horizontal axis) and female author (vertical axis). The left plot shows the results for female readers, and the right plot shows the results for male readers. (Apologies for the small font; zoom in.)

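[Figure: adjective ratings for the male author (horizontal axis) vs. the female author (vertical axis); left panel: female readers, right panel: male readers]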

If a word lies near the diagonal black line, it means that people used it to describe male and female authors equally often; I’ve highlighted in red words which are especially far from the black line (p < .05, t-test; I didn’t use multiple-hypothesis correction because I just wanted to visually highlight the most skewed words [4]). Male readers rated female authors as more “dangerous”, “bitchy”, “aggressive”, “crazy”, and “possessive” and less “smart” than male authors; female readers described male authors as “genius” and “weird” more often than female authors. When we combine results for both male and female readers, female authors were more often described as “bitchy”, “crazy”, “dangerous”, and “aggressive”, and male authors were more often described as “genius”.
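For readers who want to see what a per-word comparison looks like, here is a minimal sketch of a two-sample t statistic for a single adjective. The ratings and the 1-to-5 scale are invented, and the choice of Welch's unequal-variance version is an assumption for the example, since several variants of the t-test exist.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic for the difference in means of two samples."""
    se = sqrt(variance(sample_a) / len(sample_a)
              + variance(sample_b) / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / se

# Invented 1-to-5 ratings of one adjective; not data from the survey.
ratings_female_author = [4, 5, 3, 4, 5, 4, 3, 5]
ratings_male_author = [2, 3, 2, 1, 3, 2, 3, 2]

t = welch_t(ratings_female_author, ratings_male_author)
print(round(t, 2))
```

A positive value means the word was rated higher for the female author. Turning the statistic into a p-value requires the t distribution; a library routine such as SciPy's `ttest_ind` with `equal_var=False` does both steps at once.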

Here’s a slightly different way of looking at the data. Below I plot each adjective by how often male readers used it to describe the female author (horizontal axis) and how often female readers used it to describe the female author (vertical axis). Male and female readers differed significantly in how they reacted to the essay written by a female author.

[Figure: adjectives used for the female author, by male readers (horizontal axis) vs. female readers (vertical axis)]

Female readers were more likely to describe the female author as “likeable” and “dateable”, whereas male readers were more likely to describe her as “obsessive”, “possessive”, “aggressive”, “autistic”, “robotic”, and “dangerous” (thanks, y’all). (Interestingly, there were no large differences in how male and female readers reacted to male authors.)

These results are intriguing and not what I expected, so feel free to comment or email me if you have explanations or follow-up projects.

One final thought. This experiment identifies a fairly clean case, I think, in which I was treated differently because of my gender. In general, it is very hard to be sure whether treatment is due to gender. For example, I have been condescended to hundreds of times, but I have no idea which (if any) of those occurred because I was female. Ellen Pao’s defeat in her gender discrimination lawsuit -- and, more broadly, the low success rate of plaintiffs in such cases -- also illustrates the difficulty of proving that treatment is due to gender. The difficulty of proof implies, I think, two things: a) we shouldn’t dismiss an allegation of gender bias as false merely because it is unprovable; b) we should keep thinking of creative ways to prove bias (Experiments? Class actions?).

Notes:

[1] Perhaps this is too obvious to mention, but when you write an internet comment about the author of a personal essay, remember that the author may actually read it. (I have spoken to other authors who do this as well.) I want to continue to write personal essays about statistics because a) most people find pure math dry and b) one of the best writers I ever got to work with yelled at me repeatedly to be more intimate (“face your dragons! face your dragons!”). But like, it’s a bit of a downer to be told by random strangers on the internet that you ought to be dumped -- and obviously, I can deal with it and it’s your right to make such comments, but I would still think about what you’re really accomplishing.

[2] I gave the survey to 200 people on Mechanical Turk and randomized whether each person read the male or female version of the essay. I asked two comprehension questions to make sure people actually read the essay and filtered out people who gave incorrect responses. After filtering, I was left with 94 responses from male readers and 90 from female readers.

[3] I ran the regression author_gender ~ adj_1_score + adj_2_score + … + adj_23_score; then I did a joint F-test on all the coefficients, which yielded a p-value of 7.2 × 10^-4. Similarly, the joint F-test on reader_gender ~ adj_1_score + adj_2_score + … + adj_23_score yielded a p-value of 7.5 × 10^-4. These regressions are a little unintuitive because their structure implies that adjectives affect gender rather than the other way around, but they show that the adjective scores are significantly correlated with gender of both author and reader.

[4] So you should take any individual word with a grain of salt -- but the difference between words as a whole is highly significant, and the patterns are pretty consistent.
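To put a number on the caution in footnote [4], here is a small worked calculation of what testing 23 words at p < .05 implies. The independence assumption and the Bonferroni correction are choices made for this illustration only; Bonferroni is one common form of multiple-hypothesis correction, and it was not applied in the analysis above.

```python
n_words = 23   # adjectives rated in the survey
alpha = 0.05   # per-word significance threshold used for highlighting

# If no word truly differed, this many would still be flagged on average.
expected_false_positives = n_words * alpha

# Chance of at least one false flag, assuming the 23 tests are independent
# (an assumption made only for this illustration).
p_any_false_positive = 1 - (1 - alpha) ** n_words

# Bonferroni correction: divide the threshold by the number of tests.
bonferroni_alpha = alpha / n_words

print(f"{expected_false_positives:.2f} expected false positives")
print(f"{p_any_false_positive:.2f} chance of at least one")
print(f"{bonferroni_alpha:.4f} corrected per-word threshold")
```

So roughly one highlighted word could be noise, which is why the overall pattern matters more than any single adjective.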