Thursday, April 9, 2015

Whether You're Crazy Depends on Whether You're Female

Recently I wrote a tongue-in-cheek piece in which I statistically analyzed 5,500 emails exchanged with my boyfriend over the past four years. Thanks so much to the many friends who said nice things and shared the piece; it meant a lot to me, because the internet evidently has decided I’m undateable. Here’s a sample of responses:  

“Don't be surprised if the BF dumps her.”
“That poor bastard.”
“Maybe he doesn't miss her because she's a lunatic.”
“I was originally going to suggest the author break up now and start collecting cats.”
“I'm guessing that this couple is very young and have a trivial understanding of love.”
“I'm thinking the guy should run, and fast. This is author is obsessive...”

I am, of course, obsessive -- note the title of this blog -- but about statistics, not my boyfriend. And so when I read these responses (after I walked into my boyfriend’s bedroom and asked worriedly, “Have I been totally horrible to you for four years? I haven’t, have I? Do you want me to make you a waffle?” [1]) I began to wonder why people assumed my obsession was with my relationship rather than with math. Would people have responded differently to the essay, I wondered, had I been male?

To answer this question, I created two abridged versions of the essay which were identical except that one was supposedly written by a man with a girlfriend and one was supposedly written by a woman with a boyfriend. (One interesting follow-up would be to run a version of this experiment with same-sex couples.)  I put the essays on Mechanical Turk, a site social scientists use to run surveys, and asked people to rate the author along various dimensions:

[Image: screenshot of a sample survey rating question]

with a total of 23 adjectives. (You can read more about the survey methodology in footnote [2].) Below, I refer to the person who read and rated the essay as the “reader” and the person who wrote the essay as the “author”.

It turns out that gender does affect how people respond; both the gender of the author and the gender of the reader were significantly correlated with the adjectives that were used [3]. Female authors were described differently than male authors. Below I plot each word by how much people thought it described the male author (horizontal axis) and female author (vertical axis). The left plot shows the results for female readers, and the right plot shows the results for male readers. (Apologies for the small font; zoom in.)
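[Figure: each adjective plotted by how much readers thought it described the male author (horizontal axis) and the female author (vertical axis); left panel: female readers, right panel: male readers]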

If a word lies near the diagonal black line, it means that people used it to describe male and female authors equally often; I’ve highlighted in red words which are especially far from the black line (p < .05, t-test; I didn’t use multiple-hypothesis correction because I just wanted to visually highlight the most skewed words [4]). Male readers rated female authors as more “dangerous”, “bitchy”, “aggressive”, “crazy”, and “possessive” and less “smart” than male authors; female readers described male authors as “genius” and “weird” more often than female authors. When we combine results for both male and female readers, female authors were more often described as “bitchy”, “crazy”, “dangerous”, and “aggressive”, and male authors were more often described as “genius”.
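For readers who want to try the per-word comparison themselves, here is a minimal sketch of a two-sample t statistic computed one adjective at a time. It is not the code behind the plots: the ratings, the 1-5 scale, and the names `welch_t` and `ratings` are all assumptions for illustration, and it uses the Welch (unequal-variance) form of the test, which may differ from the exact variant used above. It stops at the t statistic and degrees of freedom; turning those into a p-value needs a t-distribution CDF (for example `scipy.stats.ttest_ind` with `equal_var=False`).

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for two samples."""
    se2_a = variance(sample_a) / len(sample_a)  # squared standard error of mean A
    se2_b = variance(sample_b) / len(sample_b)  # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t, df

# Hypothetical scores (assumed 1-5 scale), split by which version the reader saw.
ratings = {
    "crazy": {"female_author": [4, 3, 5, 4, 4, 3], "male_author": [2, 3, 2, 1, 3, 2]},
    "smart": {"female_author": [4, 4, 5, 3, 4, 4], "male_author": [4, 5, 4, 4, 3, 4]},
}

# One test per adjective, with no multiple-hypothesis correction.
t_by_word = {
    word: welch_t(groups["female_author"], groups["male_author"])
    for word, groups in ratings.items()
}
for word, (t, df) in t_by_word.items():
    print(f"{word}: t = {t:.2f}, df = {df:.1f}")
```

A positive t here means the word was rated as more descriptive of the female author. With 23 words tested at p < .05, roughly one word would be expected to cross the threshold by chance alone, which is why the highlighting is best read as a visual aid.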

Here’s a slightly different way of looking at the data. Below I plot each adjective by how often male readers used it to describe the female author (horizontal axis) and how often female readers used it to describe the female author (vertical axis). Male and female readers differed significantly in how they reacted to the essay written by a female author.

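[Figure: each adjective plotted by how often male readers (horizontal axis) and female readers (vertical axis) used it to describe the female author]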

Female readers were more likely to describe the female author as “likeable” and “dateable”, whereas male readers were more likely to describe her as “obsessive”, “possessive”, “aggressive”, “autistic”, “robotic”, and “dangerous” (thanks, y’all). (Interestingly, there were no large differences in how male and female readers reacted to male authors.)

These results are intriguing and not what I expected, so feel free to comment or email me if you have explanations or follow-up projects.

One final thought. This experiment identifies a fairly clean case, I think, in which I was treated differently because of my gender. In general, it is very hard to be sure whether treatment is due to gender. For example, I have been condescended to hundreds of times, but I have no idea which (if any) of those occurred because I was female. Ellen Pao’s defeat in her gender discrimination lawsuit -- and, more broadly, the low success rate of plaintiffs in such cases -- also illustrates the difficulty of proving that treatment is due to gender. The difficulty of proof implies, I think, two things: a) we shouldn’t dismiss an allegation of gender bias as false merely because it is unprovable; b) we should keep thinking of creative ways to prove bias (Experiments? Class actions?).

[1] Perhaps this is too obvious to mention, but when you write an internet comment about the author of a personal essay, remember that the author may actually read it. (I have spoken to other authors who do this as well.) I want to continue to write personal essays about statistics because a) most people find pure math dry and b) one of the best writers I ever got to work with yelled at me repeatedly to be more intimate (“face your dragons! face your dragons!”). But like, it’s a bit of a downer to be told by random strangers on the internet that you ought to be dumped -- and obviously, I can deal with it and it’s your right to make such comments, but I would still think about what you’re really accomplishing.
[2] I gave the survey to 200 people on Mechanical Turk and randomized whether each person read the male or female version of the essay. I asked two comprehension questions to make sure people actually read the essay and filtered out people who gave incorrect responses. After filtering, I was left with 94 responses from male readers and 90 from female readers.
[3] I ran the regression author_gender ~ adj_1_score + adj_2_score + … + adj_23_score; then I did a joint F-test on all the coefficients, which yielded a p-value of 7.2 × 10⁻⁴. Similarly, the joint F-test on reader_gender ~ adj_1_score + adj_2_score + … + adj_23_score yielded a p-value of 7.5 × 10⁻⁴. These regressions are a little unintuitive because their structure implies that adjectives affect gender rather than the other way around, but they show that the adjective scores are significantly correlated with gender of both author and reader.
[4] So you should take any individual word with a grain of salt -- but the difference between words as a whole is highly significant, and the patterns are pretty consistent.
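The joint F-test described in footnote [3] can be sketched in plain Python. This is an illustration under stated assumptions, not the original analysis: the data are made up, there are two adjective scores instead of 23, and the helper names `solve` and `joint_f_statistic` are hypothetical. It fits ordinary least squares with an intercept and computes the F statistic for the null hypothesis that every adjective coefficient is zero. Converting F to a p-value needs an F-distribution tail function (for example `scipy.stats.f.sf`).

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution

def joint_f_statistic(scores, outcome):
    """F statistic for H0: every slope in outcome ~ scores is zero."""
    n, k = len(scores), len(scores[0])
    design = [[1.0] + list(row) for row in scores]  # prepend the intercept column
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k + 1)] for i in range(k + 1)]
    xty = [sum(r[i] * y for r, y in zip(design, outcome)) for i in range(k + 1)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in design]
    mean_y = sum(outcome) / n
    rss = sum((y - f) ** 2 for y, f in zip(outcome, fitted))  # residual sum of squares
    tss = sum((y - mean_y) ** 2 for y in outcome)             # total sum of squares
    return ((tss - rss) / k) / (rss / (n - k - 1))

# Hypothetical data: two adjective scores per reader, author_gender coded 0/1.
adjective_scores = [[4, 2], [5, 1], [3, 3], [2, 4], [1, 5], [2, 3], [4, 1], [3, 4]]
author_gender = [1, 1, 1, 0, 0, 0, 1, 0]

f_stat = joint_f_statistic(adjective_scores, author_gender)
print(f"joint F = {f_stat:.2f} on (2, {len(author_gender) - 3}) degrees of freedom")
```

Solving the normal equations directly is fine for a toy example like this one; with 23 correlated predictors a statistics package would be the safer choice.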

Saturday, February 14, 2015

Should You Go Out for a Romantic Valentine's Dinner?

No, say the experts: the restaurants will be busier and serving new menus, producing worse food and worse service which you’ll pay more for. If your date is a disaster, you’re stuck for three courses while surrounded by canoodling couples; even if it isn’t, your love may not be best celebrated at restaurants where you “feed each other every bite of the meal” and “discover how a fondue fork can give Cupid’s arrow a run for its money” [1]. I find these arguments compelling, but I’m biased: I’m spending Valentine’s alone this year, and it would warm my frozen heart to know the champagne-sipping couples I watch through rosy windows are secretly miserable.

But does the data actually support my Valentine’s fantasy? (The main perk of being a computer scientist: more statistical schadenfreude.) Yelp just released 1.6 million reviews, which you should all download right now, and from these I identified 1,778 Valentine's restaurant reviews. Here is what I learned:

1. People do not report worse experiences on Valentine’s Day. People tend to go out to slightly nicer restaurants on Valentine’s, probably because it’s a special occasion. But they do not give them lower reviews: there is no difference between the rating a restaurant usually gets and the rating it gets on Valentine’s. I was pretty disappointed by this, so I tried looking just at restaurants described as romantic, classy, or expensive. I also found no difference for these groups [2]. But I found some consolation in the fact that...

2. If you do go out, it is reasonably likely that you will have a bad experience. 19% of Valentine’s reviewers gave restaurants 1 or 2 stars out of 5, so ⅕ of couples I see are not having a very good time. And perhaps this is also a compelling argument for staying home with your sweetheart: imagine cooking a nice dinner, drinking some wine, and getting up to whatever else you get up to, or down to. If you really think there’s a higher than 19% chance you’d give that 1 or 2 stars, you might want to reconsider your relationship.

Going to a restaurant on Valentine’s may be riskier than going on another day: people were more likely to give restaurants 1 or 2 star reviews, although they were also more likely to give them 5 star reviews. If you’re willing to take the risk, I might suggest casting it as romantic, because danger is titillating: tell your date, “We’re risking everything by getting dinner tonight...but I want to take that journey with you.” 
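The two comparisons above (a restaurant's Valentine's rating against its usual rating, and the share of very low and very high ratings) can be sketched as follows. The review tuples and the function names `valentines_gaps` and `low_and_top_share` are hypothetical, and the real dataset would first need a rule for deciding which reviews count as Valentine's reviews, so treat this as a rough outline rather than the analysis itself.

```python
from statistics import mean

# Hypothetical reviews: (restaurant id, star rating, is it a Valentine's review?)
reviews = [
    ("bistro", 4, True), ("bistro", 5, False), ("bistro", 3, False),
    ("taqueria", 2, True), ("taqueria", 4, False), ("taqueria", 4, False),
    ("steakhouse", 5, True), ("steakhouse", 4, False), ("steakhouse", 4, False),
]

def valentines_gaps(reviews):
    """For each Valentine's review, stars minus that restaurant's mean on other days."""
    usual = {}
    for restaurant, stars, is_valentines in reviews:
        if not is_valentines:
            usual.setdefault(restaurant, []).append(stars)
    return [
        stars - mean(usual[restaurant])
        for restaurant, stars, is_valentines in reviews
        if is_valentines and restaurant in usual
    ]

def low_and_top_share(star_ratings):
    """Return (share of 1-2 star ratings, share of 5 star ratings)."""
    low = sum(1 for s in star_ratings if s <= 2) / len(star_ratings)
    top = sum(1 for s in star_ratings if s == 5) / len(star_ratings)
    return low, top

gaps = valentines_gaps(reviews)
valentines_stars = [stars for _, stars, is_val in reviews if is_val]
other_stars = [stars for _, stars, is_val in reviews if not is_val]

print(f"mean gap vs. usual rating: {mean(gaps):+.2f} stars")
for label, stars in (("Valentine's", valentines_stars), ("other days", other_stars)):
    low, top = low_and_top_share(stars)
    print(f"{label}: {low:.0%} gave 1-2 stars, {top:.0%} gave 5 stars")
```

Comparing each Valentine's review with the same restaurant's other reviews matters because people pick slightly nicer restaurants on Valentine's; a raw average across all restaurants would mix that selection effect into the result.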

There are some caveats to the ratings. Maybe if I have a mediocre Valentine’s meal, I’m more reluctant to admit it because I spent a lot of money and it makes me sad about my relationship. Or maybe there’s an effect in the opposite direction: I’m more willing to criticize a meal on Valentine’s because my expectations are higher. Also, Yelp reviewers probably don’t represent the general population: I’m no psychologist, but I’m pretty sure some of the people who write one-star reviews have anger management issues. Finally, there are some false positives in my dataset: not everyone who mentions Valentine’s in their review was going out for a romantic dinner.

What differentiates a good Valentine’s date from a bad one? Here are the phrases most associated with high and low ratings [3]:
Phrases Associated with High Ratings: absolutely amazing, pretzel, crudo, flawless, worth every penny, intrusive, outstanding, so fresh, pistachio, it was delicious, gluten free, the farm, melted in my mouth

Phrases Associated with Low Ratings: will never go back, horrible, apology, was the worst, was cold, sucks, pissed, undercooked, awful, ruined, ranch

Pretzels: the food of love. Of course, what makes a good date will vary depending on the type of restaurant. Here are some phrases that indicated good dates in different types of restaurants.

Good Dates in Expensive Restaurants: worth every penny, crudo, melted in my mouth, cheddar, fillet, black cod, vinaigrette, hamachi, sashimi, grits

Good Dates in Inexpensive Restaurants: gluten free, the farm, pistachio, gyro, very fresh, cozy, sushi

Good Dates in Romantic Restaurants: creamy, tuna, rich, duck, oysters, attentive, local, cozy, green, las vegas, sorbet, bread pudding, scallops

3. Sandwiches aren’t sexy. To figure out which foods were most romantic, I compared the words people used to describe each restaurant on Valentine’s to the words they used to describe it on other days. Here are the foods most and least associated with Valentine’s:

Most Associated: set menu, cookie, champagne, lobster bisque, creamed, surf and turf, truffle, bruschetta, tenderloin, short ribs, rose, cherry, milk, chocolate covered, butternut squash, yellowtail, risotto

Least Associated: french toast, mexican food, happy hour, chicken, custard, bloody, taco, wings, eggs, pork belly, sake, halibut, french fries, mustard, sandwiches, broccoli, horseradish

The Valentine’s foods are mostly classics (“chocolate covered” is most frequently followed by “strawberry” but also describes bon bons and souffle). As for the non-Valentine’s foods: taking someone out for happy hour on Valentine’s might make you seem cheap, broccoli and french fries just aren’t romantic, and who wants to kiss someone who smells like mustard or horseradish? Some of the differences seem arbitrary: why is yellowtail sexier than halibut, short ribs than pork belly? Some may be due to nomenclature: perhaps on Valentine’s you rename your custard “creme brulee”.
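One way to score how strongly a word is associated with Valentine's is a smoothed log-odds ratio between the two sets of reviews. The post does not say which measure was actually used, so the choice of log-odds, the smoothing constant, the whitespace tokenizer, and the name `log_odds` are all assumptions, and the review snippets are made up.

```python
from collections import Counter
from math import log

def log_odds(valentines_texts, other_texts, smoothing=0.5):
    """Smoothed log-odds ratio per word: positive means more typical of Valentine's."""
    val_counts = Counter(w for text in valentines_texts for w in text.lower().split())
    other_counts = Counter(w for text in other_texts for w in text.lower().split())
    val_total = sum(val_counts.values())
    other_total = sum(other_counts.values())
    scores = {}
    for word in set(val_counts) | set(other_counts):
        # Odds of this word vs. every other word, within each group of reviews.
        val_odds = (val_counts[word] + smoothing) / (
            val_total - val_counts[word] + smoothing
        )
        other_odds = (other_counts[word] + smoothing) / (
            other_total - other_counts[word] + smoothing
        )
        scores[word] = log(val_odds) - log(other_odds)
    return scores

# Hypothetical review snippets for a single restaurant.
valentines_texts = ["champagne and risotto", "set menu with champagne"]
other_texts = ["happy hour wings", "great sandwiches and wings"]

scores = log_odds(valentines_texts, other_texts)
ranked = sorted(scores, key=scores.get, reverse=True)
print("most Valentine's:", ranked[:3])
print("least Valentine's:", ranked[-3:])
```

Scoring single words like this would miss phrases such as "set menu" or "happy hour"; counting two-word sequences as well would be a natural extension.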

4. There’s no statistically significant correlation between how expensive the restaurant is and the ratings people give it. So I’m definitely spending my next Valentine’s at Chipotle. Here are some phrases most associated with expensive and inexpensive Valentine’s dates [4]:
Expensive Dates: Paris, black truffle, bone in ribeye, caviar, creamed spinach, sommelier, beef wellington, foie gras, wine pairings, wagyu, souffle, amuse, black cod

Inexpensive Dates: burrito, slaw, pita, pizzas, diamond, gyro, fry, brisket, panini, meatloaf, sandwich, wrap, pad thai, chipotle, pastries, takeout, tilapia

Of course, many people may be hoping for more than black truffles from their Valentine. To determine which lovers had been sexually active after their dates, I performed a backtrace on the IPs used to submit the reviews, linked the IPs to home addresses, and looked for changes in electricity and water usage, ambient noise, and local seismological readings consistent with sexual activity. I found the following factors showed associations:

Haha naw I’m just messing with you. I’m not that creepy, and also it’s completely technically impossible. Finally, in case this wasn’t obvious, I don’t actually hate couples. Whether you’re single, married, or somewhere in between, have a very happy Valentine’s.


[1] I will admit I harbor a certain vitriol towards fondue -- a girl once used it to try to seduce my boyfriend -- and in general towards foods with sexual overtones. If you’re hungry, eat dinner, and if you’re horny, have sex -- but public culinary foreplay has always struck me as an awkward combination.
[2] I did find a significant difference for Yelp reviewers who had earned “elite” status: they tended to give restaurants significantly higher ratings on Valentine’s Day than the restaurants received overall. I wasn’t sure why this might be, so I consulted an expert on online restaurant reviews, who suggested that elite reviewers might be better at using the review system to filter out bad restaurants or that elite reviewers were encouraged to go to restaurants with few reviews, which might be trying harder to impress customers on Valentine’s.
[3] I filtered out phrases that were redundant or uninformative.
[4] It’s possible, of course, that the cheap dates weren’t full Valentine’s dinners, just lunches that occurred on Valentine’s Day, meaning that we’re not really comparing apples to apples.

Monday, January 26, 2015

How America Responded to All 339 Lines in the State of the Union

For non-Americans: the State of the Union is an annual address in which the president outlines how the country’s doing and the agenda for the future.

Which parts of Obama’s address particularly resonated with the public? One can judge the congressional response to each line by listening to applause in the chamber, but it's harder to know what the country as a whole thought. One way to find out is to look up every single sentence in the speech on Twitter and study each response. Yes, this took a while, and yes, I should probably find other hobbies, but the results were worth it.

Each point in the graph below represents a single sentence, ordered from left to right by order in the speech; the height of the point represents how many people retweeted it.

What is that point in the upper-right corner, the quote that is shared 10 times as widely as any other line in the speech? Surely it must be some rhetorical masterstroke, a particularly inspired policy proposal? No: it's the ad-libbed moment in the speech where Obama puts down the Republicans.

We criticize our politicians for lack of civility, but how do we expect them to behave when we pay more attention to their snappy retorts than their policy proposals?

Zooming in on the graph (click here and mouse over points to see which quotes they represent) we see that many of the most widely shared lines reprise the messages of hope and unity that so resonated when Obama originally took office: "I still believe we are one people", "I still believe that together, we can do great things", and "Let's begin this new chapter -- together -- and let's start the work right now". (More substantive lines, like those about gay marriage, community college, middle-class economics, and climate change, were also shared widely.)

The only problem with these pleas for unity? For the most part, only the Democrats found them compelling. Self-identified liberal tweeters outnumbered self-identified conservative tweeters more than two to one for every single one of the top 25 most retweeted lines in the speech. Among those sharing "My fellow Americans, we too are a strong, tight-knit family", I could identify 96 liberal tweeters and only 5 conservative tweeters [1]: perhaps not such a tight-knit family after all.

So what lines were shared among Republicans? Here are the lines whose retweeters skewed most conservative (among lines shared by at least five tweeters with identified political affiliation).

Republicans / Total, by line:

100% (14/14): Already, we've made strides towards ensuring that every veteran has access to the highest quality care.
99% (74/75): Helping hardworking families make ends meet.
96% (22/23): In Iraq and Syria, American leadership -- including our military power -- is stopping ISIL's advance.
91% (10/11): There are no guarantees that negotiations will succeed, and I keep all options on the table to prevent a nuclear Iran.
86% (32/37): We need to do more than just do no harm.
84% (42/50): That's what middle-class economics is -- the idea that this country does best when everyone gets their fair shot, everyone does their fair share, and everyone plays by the same set of rules.
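The ranking above boils down to a simple computation: for each line, count the retweeters whose affiliation could be identified, drop lines with fewer than five of them, and sort by the conservative share. Here is a minimal sketch with made-up data; the name `conservative_share` and the labels are hypothetical.

```python
def conservative_share(retweeters_by_line, min_identified=5):
    """Rank lines by the conservative fraction of retweeters with a known affiliation."""
    ranked = []
    for line, affiliations in retweeters_by_line.items():
        identified = [a for a in affiliations if a in ("conservative", "liberal")]
        if len(identified) >= min_identified:
            share = identified.count("conservative") / len(identified)
            ranked.append((share, line))
    return sorted(ranked, reverse=True)

# Hypothetical data: one affiliation label (or None) per retweeter of each line.
retweeters_by_line = {
    "line A": ["conservative"] * 4 + ["liberal"] + [None] * 20,
    "line B": ["liberal"] * 9 + ["conservative"],
    "line C": ["conservative"] * 3,  # dropped: fewer than five identified retweeters
}

for share, line in conservative_share(retweeters_by_line):
    print(f"{share:.0%} conservative: {line}")
```

The minimum-count filter keeps a line retweeted by two or three identified conservatives from topping the list on the strength of a tiny sample.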

Of course, just because conservatives were sharing Obama's words doesn't mean they agreed with them. For example, "Already, we've made strides towards ensuring that every veteran has access to the highest quality care" was shared by conservatives with the addendum "This one difficult to take". "We need to do more than just do no harm" was shared by conservatives who replied, "PLEASE DON'T".

Sometimes differences between the political parties emerged even in consecutive lines of the speech. For example, when Obama said "And today, America is number one in oil and gas. America is number one in wind power," 68% of people sharing the line about oil were conservative; 82% of people sharing the line about wind power were liberal.

Of course, political party isn't the only characteristic that affects how we react to a speech. Men and women [2] reacted in different ways as well.

Lines Whose Resharers Were Most Likely to Be Female:

That's why this Congress still needs to pass a law that makes sure a woman is paid the same as a man for doing the same work.
I want our actions to tell every child, in every neighborhood: your life matters, and we are as committed to improving your life chances as we are for our own kids.
It's time we stop treating childcare as a side issue, or a women's issue, and treat it like the national economic priority that it is for all of us.
[Child care]'s not a nice-to-have -- it's a must-have.
Today, we're the only advanced country on Earth that doesn't guarantee paid sick leave or paid maternity leave to our workers.
That's why we defend free speech, and advocate for political prisoners, and condemn the persecution of women, or religious minorities, or people who are lesbian, gay, bisexual, or transgender.
We still may not agree on a woman's right to choose, but surely we can agree it's a good thing that teen pregnancies and abortions are nearing all-time lows, and that every woman should have access to the health care she needs.

Lines Whose Resharers Were Most Likely to Be Male:

Instead of getting dragged into another ground war in the Middle East, we are leading a broad coalition, including Arab nations, to degrade and ultimately destroy this terrorist group.
We can't slow down businesses or put our economy at risk with government shutdowns or fiscal showdowns.
We believed we could reverse the tide of outsourcing, and draw new jobs to our shores.
Members of both parties have told me so.
We still need to make sure employees get the overtime they've earned.
Today, we have new tools to stop taxpayer-funded bailouts, and a new consumer watchdog to protect us from predatory lending and abusive credit card practices.
If we don't act, we'll leave our nation and our economy vulnerable.

Beyond gender and political party, people’s backgrounds guide their reactions in many subtler ways. Tweeters who reshare the speech’s quote from Pope Francis are more likely to describe themselves as Catholic; those who reshare figures on Iraq and Afghanistan are more likely to be veterans, and those who reshare figures on graduation rates and test scores are more likely to be educators.

After I completed this analysis, I could see why Obama’s speechwriter needs a stiff drink. He had to please a thousand different constituencies in a single speech, half the country wasn’t going to be happy no matter what he said, and the line that people liked best wasn’t even in his draft. I’ll stick to writing code.

[1] Most tweeters do not identify their political affiliation in their profile, so this approach works only for a small subset of tweeters.
[2] I used the name of the tweeter to identify their gender. This does not work for all tweeters, and it does not account for people whose gender does not fit a binary description or does not match that implied by their name, but I still think it’s worth doing on balance because it gives a useful (if noisy) signal.
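The two labeling steps in footnotes [1] and [2] can be sketched roughly as below. Everything here is an assumption for illustration: the keyword sets, the tiny name table, and the function names `affiliation_from_bio` and `gender_from_name` are hypothetical, and a real analysis would need much larger lists.

```python
import re

# Hypothetical keyword and name lists; a real analysis would need far larger ones.
LIBERAL_WORDS = {"liberal", "democrat", "progressive"}
CONSERVATIVE_WORDS = {"conservative", "republican"}
NAME_TO_GENDER = {"emma": "female", "maria": "female", "james": "male", "david": "male"}

def affiliation_from_bio(bio):
    """Return 'liberal', 'conservative', or None when the bio is silent or mixed."""
    words = set(re.findall(r"[a-z]+", bio.lower()))
    is_liberal = bool(words & LIBERAL_WORDS)
    is_conservative = bool(words & CONSERVATIVE_WORDS)
    if is_liberal == is_conservative:  # neither keyword set matched, or both did
        return None
    return "liberal" if is_liberal else "conservative"

def gender_from_name(display_name):
    """Guess gender from the first name; None when the name is not in the table."""
    parts = re.findall(r"[a-z]+", display_name.lower())
    return NAME_TO_GENDER.get(parts[0]) if parts else None

print(affiliation_from_bio("Proud Democrat, dog lover"))
print(gender_from_name("Maria Lopez"))
print(gender_from_name("xX_gamer_Xx"))
```

Returning None for anything ambiguous shrinks the labeled subset but keeps the signal cleaner, which fits the caveats in both footnotes.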