Recently I wrote a tongue-in-cheek piece in which I statistically analyzed 5,500 emails exchanged with my boyfriend over the past four years. Thanks so much to the many friends who said nice things and shared the piece; it meant a lot to me, because the internet evidently has decided I’m undateable. Here’s a sample of responses:
“Don't be surprised if the BF dumps her.”
“That poor bastard.”
“Maybe he doesn't miss her because she's a lunatic.”
“I was originally going to suggest the author break up now and start collecting cats.”
“I'm guessing that this couple is very young and have a trivial understanding of love.”
“I'm thinking the guy should run, and fast. This author is obsessive...”
I am, of course, obsessive -- note the title of this blog -- but about statistics, not my boyfriend. And so when I read these responses (after I walked into my boyfriend’s bedroom and asked worriedly, “Have I been totally horrible to you for four years? I haven’t, have I? Do you want me to make you a waffle?” [1]), I began to wonder why people assumed my obsession was with my relationship rather than with math. Would people have responded differently to the essay, I wondered, had I been male?
To answer this question, I created two abridged versions of the essay which were identical except that one was supposedly written by a man with a girlfriend and one was supposedly written by a woman with a boyfriend. (One interesting follow-up would be to run a version of this experiment with same-sex couples.) I put the essays on Mechanical Turk, a site social scientists use to run surveys, and asked people to rate the author along various dimensions -- 23 adjectives in total. (You can read more about the survey methodology in footnote [2].) Below, I refer to the person who read and rated the essay as the “reader” and the person who wrote the essay as the “author”.
It turns out that gender does affect how people respond; both the gender of the author and the gender of the reader were significantly correlated with the adjectives that were used [3]. Female authors were described differently than male authors. Below I plot each word by how much people thought it described the male author (horizontal axis) and female author (vertical axis). The left plot shows the results for female readers, and the right plot shows the results for male readers. (Apologies for the small font; zoom in.)
If a word lies near the diagonal black line, it means that people used it to describe male and female authors equally often; I’ve highlighted in red words which are especially far from the black line (p < .05, t-test; I didn’t use multiple-hypothesis correction because I just wanted to visually highlight the most skewed words [4]). Male readers rated female authors as more “dangerous”, “bitchy”, “aggressive”, “crazy”, and “possessive” and less “smart” than male authors; female readers described male authors as “genius” and “weird” more often than female authors. When we combine results for both male and female readers, female authors were more often described as “bitchy”, “crazy”, “dangerous”, and “aggressive”, and male authors were more often described as “genius”.
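(For the statistically inclined: the per-word comparison is just an unpaired t-test on each adjective’s ratings. Here’s a minimal sketch of the idea in Python -- the file name and column layout are hypothetical stand-ins, not the actual code or data.)

```python
# Minimal sketch of the per-word test: for each adjective, compare the
# ratings given to the male-author version against the female-author
# version with an unpaired t-test. File name and columns are hypothetical.
import pandas as pd
from scipy.stats import ttest_ind

df = pd.read_csv("ratings.csv")  # one row per (reader, adjective) rating

for adj, group in df.groupby("adjective"):
    male = group.loc[group["author_gender"] == "M", "score"]
    female = group.loc[group["author_gender"] == "F", "score"]
    t_stat, p_value = ttest_ind(male, female)
    if p_value < 0.05:  # uncorrected, as in the plots above
        print(f"{adj}: M={male.mean():.2f}, F={female.mean():.2f}, p={p_value:.4f}")
```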
Here’s a slightly different way of looking at the data. Below I plot each adjective by how often male readers used it to describe the female author (horizontal axis) and how often female readers used it to describe the female author (vertical axis). Male and female readers differed significantly in how they reacted to the essay written by a female author.
Female readers were more likely to describe the female author as “likeable” and “dateable”, whereas male readers were more likely to describe her as “obsessive”, “possessive”, “aggressive”, “autistic”, “robotic”, and “dangerous” (thanks, y’all). (Interestingly, there were no large differences in how male and female readers reacted to male authors.)
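(If you want to reproduce this kind of plot, the sketch below shows the idea with matplotlib. It reuses the hypothetical `df` from the earlier sketch; the diagonal line marks words that both groups of readers use equally often.)

```python
# Sketch of the reader-gender scatter: each adjective is plotted by its
# mean score from male readers (x) vs. female readers (y), restricted to
# the female-author version. Reuses the hypothetical `df` above.
import matplotlib.pyplot as plt

fem = df[df["author_gender"] == "F"]
means = fem.groupby(["adjective", "reader_gender"])["score"].mean().unstack()

fig, ax = plt.subplots()
ax.scatter(means["M"], means["F"])
for adj, row in means.iterrows():
    ax.annotate(adj, (row["M"], row["F"]), fontsize=8)
lims = [means.values.min(), means.values.max()]
ax.plot(lims, lims, color="black")  # diagonal: equal usage by both groups
ax.set_xlabel("mean score from male readers")
ax.set_ylabel("mean score from female readers")
plt.show()
```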
These results are intriguing and not what I expected, so feel free to comment or email me if you have explanations or ideas for follow-up projects.
One final thought. This experiment identifies a fairly clean case, I think, in which I was treated differently because of my gender. In general, it is very hard to be sure whether treatment is due to gender. For example, I have been condescended to hundreds of times, but I have no idea which (if any) of those instances occurred because I was female. Ellen Pao’s defeat in her gender discrimination lawsuit -- and, more broadly, the low success rate of plaintiffs in such cases -- also illustrates the difficulty of proving that treatment is due to gender. The difficulty of proof implies, I think, two things: a) we shouldn’t dismiss an allegation of gender bias as false merely because it is unprovable; b) we should keep thinking of creative ways to prove bias (experiments? class actions?).
Notes:
[1] Perhaps this is too obvious to mention, but when you write an internet comment about the author of a personal essay, remember that the author may actually read it. (I have spoken to other authors who read their comments as well.) I want to continue to write personal essays about statistics because a) most people find pure math dry and b) one of the best writers I ever got to work with yelled at me repeatedly to be more intimate (“face your dragons! face your dragons!”). But like, it’s a bit of a downer to be told by random strangers on the internet that you ought to be dumped -- and obviously, I can deal with it and it’s your right to make such comments, but I would still think about what you’re really accomplishing.
[2] I gave the survey to 200 people on Mechanical Turk and randomized whether each person read the male or female version of the essay. I asked two comprehension questions to make sure people actually read the essay and filtered out people who gave incorrect responses. After filtering, I was left with 94 responses from male readers and 90 from female readers.
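(Concretely, the filtering step just drops anyone who missed either comprehension question -- something like the sketch below, where the file, column names, and answers are all hypothetical.)

```python
# Sketch of the comprehension-check filter. File, column names, and the
# correct answers are hypothetical stand-ins for the actual survey fields.
import pandas as pd

raw = pd.read_csv("mturk_responses.csv")  # one row per survey respondent
correct = {"q1": "waffles", "q2": "four years"}  # hypothetical answers
passed = raw[(raw["q1"] == correct["q1"]) & (raw["q2"] == correct["q2"])]
print(passed["reader_gender"].value_counts())  # e.g. M: 94, F: 90
```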
[3] I ran the regression author_gender ~ adj_1_score + adj_2_score + … + adj_23_score; then I did a joint F-test on all the coefficients, which yielded a p-value of 7.2 × 10⁻⁴. Similarly, the joint F-test on reader_gender ~ adj_1_score + adj_2_score + … + adj_23_score yielded a p-value of 7.5 × 10⁻⁴. These regressions are a little unintuitive because their structure implies that adjectives affect gender rather than the other way around, but they show that the adjective scores are significantly correlated with the gender of both author and reader.
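(In statsmodels, the overall F-statistic of a fitted OLS is exactly this joint test that all slope coefficients are zero, so the whole thing is a few lines. The wide-format file below is a hypothetical layout: one row per reader, with author_gender coded 0/1.)

```python
# Sketch of the regression in this footnote: regress author gender (coded
# 0/1) on all 23 adjective scores, then read off the joint F-test that all
# slope coefficients are zero. File and column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

wide = pd.read_csv("responses_wide.csv")  # one row per reader
formula = "author_gender ~ " + " + ".join(f"adj_{i}_score" for i in range(1, 24))
fit = smf.ols(formula, data=wide).fit()
print(fit.f_pvalue)  # p-value of the joint F-test on all coefficients
```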
[4] So you should take any individual word with a grain of salt -- but the difference between words as a whole is highly significant, and the patterns are pretty consistent.