Monday, July 25, 2016

Arguments I’ve had with Shengwu Li, part 1: should you vote?

This piece comes from conversations with my housemate, Shengwu, who is leaving us for Harvard. We are all sad about this; I am sad about it in part because Sheng has sharpened my thinking as much as anyone I know. I still remember the first time we talked about math; sitting at a coffee shop I thought, shit, this guy with the Oxbridge accent and the penchant for Latin phrases might actually be as smart as he sounds. You won’t find someone who’s more fluent talking about both math and morality, and you won’t get a better intellectual guarantee than “this argument is Shengwu-proof”. I’m not really sure what we’ll do without him -- less rigorous social science, I guess. Any errors in this analysis remain my own.

Here are two questions we’ve been arguing about recently.

  1. Is it rational to vote in a national US election just because of the remote possibility you could change which candidate wins?
  2. How much does sleeping through the first round of the debate world championships hurt your final result?  

I’m going to talk about the first argument in this post and the second argument in a later post because I’ve been informed that there’s a limit to how much other people want to listen to Sheng and me argue.

There are many arguments for voting -- “what would happen if no one did?” “people died for your right to vote!” -- but let’s consider a crazy one: you should vote because you might be the tie-breaking vote in an election. Obviously, this is incredibly unlikely. On the other hand, if you were, it would be incredibly important. So just how unlikely is it?

Say you live in a state with a million expected voters where one candidate is polling at 52% and the other at 48%. Here’s a first attempt at figuring out how likely you are to be the tying vote: if each voter is like a coin with a 52% chance of landing heads, what is the probability you get exactly 50% heads? The answer is so small -- less than one in 10^100 -- that the computer returns 0. If you run a bunch of simulations of the fraction of heads, it looks like this -- pretty much every time you’ll end up with a fraction close to .52.

This implies that voting is a waste of time; you have no chance of influencing the outcome. But this simple analysis turns out to be wrong for a simple reason: polls aren’t perfect. Just because a poll tells you 52% of voters prefer a candidate doesn’t mean that 52% of voters actually do. One source of error is the margin of error in polls: as the Upshot pointed out recently, because pollsters only sample a relatively small group of people, polling results fluctuate just due to random noise. So if a poll tells you that 52% of voters prefer a candidate, the real percentage might be something like:

Which is a lot more uncertainty than we saw before.

Given this uncertainty, it turns out the chance that you're the tying vote is not that low -- somewhere between 1 in 100,000 and 1 in 10 million (see [1] if you want modeling details). Which sounds low but is actually astonishingly high. If the difference in value between the two candidates is, say, a trillion dollars -- which is plausible, given the size of the US budget and the very different uses Trump and Clinton would put it to -- then a 1 in 10 million chance of influencing the election outcome means that in expectation, you change the allocation of $100,000. Of course, I’ve only simulated a single state, not the whole election, using a very simple model. But after doing this I discovered a more sophisticated analysis from Andrew Gelman, Nate Silver, and Aaron Edlin which reached a similar conclusion: if you live in a state which is at all close, voting is probably worth your while simply because of the possibility you will be the tie-breaking vote in an election. If you are trying to decide whether your state is at all close, err on the side of “yes”; here are 538's predictions for every state.

This was surprising to both of us; I am definitely going to vote now (in Virginia, a swing state), and you should too!

Addendum: a corollary to this is that you should not vote for a third party candidate on the grounds that "my vote will never matter anyway, so I'll just vote my conscience". Vote for a third party candidate if you want to, but if your state is at all close -- and current polls imply that many states will be -- weigh your desire to vote your conscience against the fact that, mathematically, your vote really could affect the outcome of the election.


[1] Very simple model: draw the true fraction of voters u_t preferring a candidate from a normal, u_t ~ N(u_p, sigma), where u_p is the fraction preferring a candidate according to polls and sigma is the uncertainty on the polls (due not just to margin of error but to other things as well); then draw the number of voters voting for a candidate, V, from a binomial distribution, V ~ binom(T, u_t), where T is the total number of voters. What is the probability that V = T / 2?
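
Here is a minimal Python sketch of the model in [1], next to the coin-flip version from the main text. It is not the code behind the original numbers: the function names are made up for illustration, the sigma values are assumptions (the post does not say which ones were used), and the normal draw is replaced by a numerical average over a grid of u_t values, truncated to 8 sigma on either side of the poll figure, rather than random sampling.

```python
import math

def log_binom_pmf(k, n, p):
    """Log of the binomial probability of k successes in n trials."""
    return (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1 - p)
    )

def tie_probability(total_voters, poll_share, sigma, grid_points=20001):
    """Approximate P(V = T / 2) by averaging the binomial tie probability
    over u_t ~ N(poll_share, sigma) on an evenly spaced grid of u_t values."""
    half = total_voters // 2
    # Truncate the normal to 8 sigma and keep u_t strictly inside (0, 1).
    low = max(1e-9, poll_share - 8 * sigma)
    high = min(1 - 1e-9, poll_share + 8 * sigma)
    step = (high - low) / (grid_points - 1)
    total = 0.0
    for i in range(grid_points):
        u_t = low + i * step
        z = (u_t - poll_share) / sigma
        normal_density = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))
        tie_given_u_t = math.exp(log_binom_pmf(half, total_voters, u_t))
        total += normal_density * tie_given_u_t * step
    return total

T = 1_000_000

# Coin-flip version: every voter is a 52% coin, with no polling uncertainty.
# Work in log space because the raw probability underflows to 0.
naive_log10 = log_binom_pmf(T // 2, T, 0.52) / math.log(10)
print(f"coin-flip model: log10 P(tie) = {naive_log10:.0f}")

# Model from [1], for two assumed values of sigma.
for sigma in (0.01, 0.02):
    p = tie_probability(T, 0.52, sigma)
    print(f"sigma = {sigma}: about 1 in {1 / p:,.0f}")
```

The answer is very sensitive to sigma, which is why the main text quotes a range rather than a single number.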

Wednesday, June 29, 2016

How Angry to Be

This essay was written over the course of six months and was initially titled “Why I don’t write angry feminist essays”. Then I realized that I actually was pretty angry. The final product is two-part: first I talk about the benefits of staying calm, and then I talk about the benefits of anger.

The TV show Jessica Jones pits a woman with superhuman strength against her rapist, Kilgrave, who can control minds. Many people have commented on how Jones is really fighting the patriarchy: Kilgrave forces women to smile for him and sleep with him, stalks them and threatens the people they love, and Jones confronts many lesser sexists throughout the season. It is deeply satisfying to watch her throw sexist assholes through walls; I watched the entire season three times in three months.  As I did, however, I realized the very thing that made the show satisfying makes it a poor metaphor for fighting sexism.

Kilgrave is an Unequivocally Bad Dude -- a serial rapist and murderer who casually tortures people for no reason:
Figure 1: Seriously, dude?
Few agents of the patriarchy are so one-dimensional. Consider:

Female engineer 1: God, this guy at work asked a coding question today -- and when I answered, he ignored me until a male engineer gave the exact same answer.
Female engineer 2: I hate that. So what did you do?
Female engineer 1: I threw him through a wall. Fractured his spine.
Female engineer 2: You go, girl!

said nobody ever.

This isn’t a complaint about Jessica Jones; it’s a superheroine show, not an advice manual. But it is a complaint about the tactics used by many people who claim to be fighting the patriarchy. Too often, I think, we treat people as Kilgraves when they aren’t.

There are innumerable examples of this. Take the backlash against Scott Aaronson (professor accused of being a sexist, entitled nerd) or Chris Herries (student accused of comparing rape victims to bikes) or any of the targets of viral internet outrage; take this uncharitable response to Stack Overflow’s well-intentioned attempt to get more data on women among their users; take many, many things written by Jezebel. I try not to demonize people in my work for two reasons.

  1. It paints a false picture. People are complicated and contradictory. There are rapists whose wives love them [1]; fraternity men who call some women “slampieces” but treat other women with respect; men who campaign for gender equality but talk over women; women who call themselves feminists but attack Bill Clinton’s accusers; fathers who cheer on their daughters but are biased against their female employees; online trolls who feel remorse; philosophy professors who argue for consent but harass their female students; men who can quote Simone de Beauvoir but won’t do the damn dishes. Demonizing people who do sexist things ignores these complexities. Worse, it lets us relax in the comfortable lie that only demons do sexist things. The scarier truth is that the great injustices in history have not been carried out by Kilgraves, but by ordinary people -- perhaps there was a psychopath at the helm, but they needed willing executioners.

  2. Demonizing people alienates potential allies. Take this recent piece criticizing tech CEOs who think “diversity” refers only to gender: “Is it because they’re racist? Sexist? Ignorant? Some sick combination of all three? Probably.” This is a great way to make CEOs afraid to talk about diversity at all. Of course they should be thinking about diversity in more nuanced ways, but there are less vicious ways to convey that.
I know that such rhetoric alienates powerful men because I’ve talked to them. One male tech leader told me that while he cared about diversity, he was reluctant to speak up about it publicly because he worried about incurring backlash. Another told me that he was less likely to hire gender studies majors because he thought they were more likely to sue for discrimination. Perhaps these men should not have harbored such views. But given that they do, we should adapt our rhetoric if we wish to be persuasive.
And if you’re saying screw ‘em, I don’t care if my essays alienate men -- I would submit to you that men rule the world, and if we really want to change it, as opposed to just writing echo-chamber clickbait, we will do so faster if we don’t lose 90% of CEOs, members of Congress, and tech leaders. Which is not to say that you should try to persuade all men -- but if you don’t persuade any, you might want to revise your rhetorical approach. (It is of course sometimes necessary to voice difficult truths which will alienate people; I am not opposed to all radical feminism, but specifically to feminism which divides by demonizing. You are not speaking truth to power when you demonize people; you are simply being inaccurate in a way which also drives away allies.)
I worry too that the more extreme voices in the feminist movement get disproportionate attention (a phenomenon my sister and I also observed when studying campus activists). Anecdotally, men I talk to are more likely to have heard about public shaming campaigns than about the less controversial aspects of feminism. Zeynep Tufekci, a professor at UNC who studies activist campaigns, notes that if a protest movement does not self-regulate and consciously decide its boundaries, it gets defined by its noisy, flamboyant outliers. Always.


And yet. I can make reasoned arguments for seeing the good in sexists, for preaching to the unconverted, for cutting out vitriol and sarcasm. But the truth is that I’m filled with so much anger. People read my statistical pieces about gender and tell me that they like that I can stay so detached and I laugh -- because why would I possibly spend so much time doing math about gender inequality if I were detached?

And I’m getting angrier with age. I am only 25 so this is somewhat concerning. I have lost the ability to laugh at things. I’m writing this in my room after watching Top Gun, an 80s movie about a bunch of military pilots. In one scene, the hero and his copilots surround a woman in a bar, much too close, red-faced and sweating, and drunkenly serenade her; she eventually escapes, so the hero follows her into the women’s bathroom and tries to block her from leaving. On the one hand it’s dated and you want to laugh at these ridiculous men. But then I think about the rapes in the US military and how this movie was so influential that signups for the naval air force went up by 500% and I wonder how many rapes it helped cause. I read that the actress in the movie, a lesbian, was actually raped in real life, and spent decades thinking the rape was a punishment for her sexuality -- these are the hilarious things that happen when you aggressively push heterosexuality on everyone, haha! I remember the thousands of sexually aggressive comments of the fraternity men I studied and the papers that show how much more often rape occurs in fraternities and the hundreds of stories from survivors I perused and the ones who were brave enough to talk to me directly and the rape cases my mother prosecuted and I have to admit: I can no longer laugh at sexually aggressive bros. I close the door to my room and slam my fist into the wall over and over again, feeling nauseous.

On a day to day basis it isn’t the rapists who bother me; it is the subtle inequalities even among my progressive, thoughtful social circle. (I can only imagine how women who face more overt discrimination feel.) It’s the men who sit when the women clean after dinner; the women who don’t speak up when they’re uncomfortable because they want to be accommodating; the men who ramble at me even when I’m supposed to be the one giving a talk or giving advice; the dates and friends who can lecture but can’t listen; the professional meetings where comments are addressed only to my male colleague. And if I can see these inequalities already, what will happen if we have children, when our incomes further diverge? I am afraid for my friends and afraid for myself.

Part of this increased awareness is the accumulation of slights that are each insignificant, the chafing of a blister rubbed raw. Part of it is that I spend so much of my research time looking for discrimination. Part of it is dealing with online responses to my writing. My writing voice is mild, but in response, I’ve had people call me a lying cunt; insult my body; speculate that I’m bitter because no one will sleep with me and I’m too prudish to get invited to parties; advise my boyfriend to break up with me; call me a token admit to the schools I’ve attended or the conferences I’ve been invited to. These commenters, of course, are the warts on the long statistical tail of readers, but you don’t forget them. On the advice of a male colleague, I now pay $100 a year to keep my personal information off the internet.  

Part of it is that I’m now single. I could make a joke here about how attempting to date Silicon Valley men would make anyone a feminist, but that would contravene what I said above about not unnecessarily being an ass to men and isn’t the point I’m trying to make anyway. When I was in a relationship and I got angry about something, I’d come home to a boyfriend who, our breakup notwithstanding, was one of the most gentle and decent people I’ve met, and he would both calm me down and by his mere existence remind me that #NotAllMen are Satan.

These days I sleep alone. I wake at four in the morning and brood for hours; there’s no one to break up my thoughts, no one to vent to except the empty page. And I’m beginning to believe in the value of anger untempered.

A few months ago I went to dinner with a man who wouldn’t let me get a word in and I came home to an empty house; soon my anger got the better of me and I wrote an essay about mansplainers so quickly it was as if it was torn out of me. It was one of the angriest essays I’ve written but also, I think, one of the better ones. Another night, I woke up so irritated about a male collaborator who was not pulling his weight that I couldn’t go back to sleep. Perhaps men were simply better at getting credit for projects they hadn’t led, I thought, while women’s contributions were ignored [2]. I grabbed my laptop and wrote a computer program to scrape a database of scientific papers and look for gender disparities. A few months later I published a piece based on that 3 AM analysis. So there are times when I am glad I’ve had to reach for a laptop rather than a hug; when I’ve had to put my anger fresh onto the page because there was no one to dull it.

(My friend advised me to cut out the last three paragraphs because they invite nasty comments about how I’m just bitter because I’m single, but I want to trust you to read what I wrote and not skew my words.)

So as I said at the beginning, I’m conflicted about how angry to be. Often when I work I’m not angry at all -- I’m looking at the world through twin lenses which both help me stay detached. The warm lens comes from years of counseling training: to listen without judging, to want to understand how someone who seems despicable can be the hero of their own story. The cold lens comes from years of quantitative training: rapists, like cancer cells, are simply high-dimensional mathematical structures to be parsed.

At times I can master this difficult balance: to humanize without exculpating; to be both furious and curious; to be motivated by anger but not overwhelmed by it.

And then I read, say, the Palo Alto woman’s statement to her assaulter and my careful detachment shatters and I’m overwhelmed again.


[1] You might say, I don’t really care if a rapist had a golden childhood -- but I think you actually should care, not because it exculpates them but because the whole problem is to understand how people grow up to commit rape. If only devil spawn did evil things, we would have a much easier problem. Obviously, we need to put an end to media pieces which portray rapists as athletes first and criminals second. But we also need to analyze rapists as complex human beings, not monsters, while still giving full weight to the seriousness of their crimes. David Lisak’s seminal research on repeat rapists, which humanizes them without ever forgetting that they’re serial predators, is an example of this kind of work.
[2] There turns out to be some evidence for this: see here and here.

Thursday, May 19, 2016

Five things I learned from counting 900 engineers at Google I/O

I wrote this in a few hours to release it during the conference it discusses, so treat its conclusions with the appropriate grain of salt and let me know if you find mistakes.

Google gave me a ticket to their annual developer conference, Google I/O. I am a computer scientist but not really a developer (edit: for the benefit of the person on Hacker News alleging "reverse discrimination" -- I won the ticket in a coding competition. Maybe be a little careful throwing around statements like that. I describe myself as “not a developer” because I’m better at other aspects of computer science, not because I can’t code) so I decided that, rather than bugging people about full stack development or cross-platform coding, I could make myself more useful by analyzing diversity data. By this I mean I spent 6 hours wandering into various conference venues and tapping “F” on my phone when I saw a woman and “M” when I saw a man; in total I counted 916 people [1]. Shoutout to my extremely tolerant housemate Andrew Suciu, who received all these messages:
I focused on gender because I didn’t think I could guess people’s race with high accuracy. (I realize, of course, that inferring gender from appearance also has serious caveats, but the data is useful to collect and I wasn’t going to interrogate random strangers about their gender identity). Here are five things I learned.

  1. Women were unexpectedly well-represented. 29% of people I counted were women. (I tweeted at Google to get the official numbers and will update if they reply.) That means women were better-represented in my data than they are, for example, as software engineers at 80% of tech companies, or among software developers overall (21%) in Labor Department statistics, or among Google engineers (17%). I am puzzled by this and welcome your explanations. From what I saw at the conference Google made pretty good efforts on the gender diversity front: a) posting a very large sign with anti-harassment guidelines b) giving women free tickets to the conference (that’s how I ended up there) c) featuring professional woman emoticons in the keynote and d) having three women speak in the keynote.
  2. There was surprisingly little variation in gender ratio between conference events. I computed gender ratios at about a dozen conference events, and they were considerably more stable than I expected them to be -- almost always within 10% of the overall average of 29%. (Whether the variation is even statistically significant depends on how exactly you define the categories -- perils of categorical f-tests and p-values!) Full data at the end of the piece. This is considerably less dramatic than, say, gender variation across developer subfields in Stack Overflow’s developer survey. One explanation might be that at a conference, people wander randomly into lots of events, which homogenizes gender ratios by adding noise.
  3. Women cluster together. We don’t just have the total counts of women: we also have the groupings of women, because I tapped “M” and “F” in the order that I saw people, and for lines that order is meaningful. (In cases where people are just sitting around, it’s a little more arbitrary, so I exclude that data from this analysis.) So if, for example, I tapped “MMMMMFFFFFFFF” that would be highly grouped data -- all the men are together and so are all the women. So we can ask whether the women group together more than we would expect if the line were just in random order, and it turns out they do (statistical details here [2]). At some events I could see this clearly without statistics -- at the machine learning office hours, for example, one table had only 1 woman out of 18, and the other table had 7 women out of 13 (fine, statistics: Fisher’s exact test p = .004, t-test p = .002). I think a large driver of the clustering is probably that women arrive in groups because they work at a company together, not that they preferentially connect at the conference, but the latter could play a role as well. (Anecdotally, three of the four people who spoke to me during the conference were women.)
  4. Live-blogging is perilous. When I arrived at the conference at 8:30 AM, about an hour and a half before it started, a quick headcount implied that 90 - 95% of attendees were men, and I posted this online. But as the conference progressed and I got more data, it became clear the early figure was too skewed. I regret not waiting to get more data before posting. While I was clear about the lack of data, there was no advantage to posting so quickly. I often think about this when people rapidly tweet their reactions to complex events. It doesn’t matter how smart you are; you’ll still write a better piece if you reflect. And I realize, of course, that sometimes you have to work very quickly because an event demands it -- I wrote this post in a few hours so I could publish it during the conference, so take my statements with a grain of salt -- but I still wish we took more time to think.  
  5. Machine learning should not just be used for takeout delivery.

Stand back, I’m going to use a metaphor. Imagine King Arthur came back to the castle to find his son cutting pizza with Excalibur.

Arthur: Son, that is literally the magical blade of destiny I pulled from the stone to become king of England.
Son: Yeah, dad, but it cuts pizza really well.
Arthur: Sure, but can’t you think of anything more exciting to do with it?

This is how I feel about a lot of applications of machine learning, which I’m using here to mean “statistical methods that computers use to do cool things like learn to do complex tasks and understand images / text / speech”. Machine learning is revolutionary technology. You can use it to build apps that will understand “I want curry” and “play Viva la Vida”, but are those really the examples you want to highlight, as the conference keynote did? Let’s talk instead about how we can use machine learning to pick out police encounters before they become violent and stop searching and jailing so many innocent people; let’s talk about catching cancer using a phone’s snapshot of a mole or Parkinson’s using an accelerometer’s tremor or heart disease using a phone-based heart monitor. Those are the technologies that deserve the label “disruptive”, the you’re-a-goddamn-wizard-Harry applications that make your heart freeze. A few moments of the two-hour keynote emphasized big ideas -- the last ten minutes, an app to help Syrians relocate -- but most of the use cases were fixes to first-world problems [3].
Part of this, of course, is that it’s probably more profitable to drone-deliver San Franciscans Perrier than to bring people in Flint any water at all. But part of it is that people’s backgrounds influence what they choose to create. A woman would’ve been less likely to create this app which lets you stalk a random stranger, and someone who’d suffered from racism or classism would’ve been less likely to create this app which lets you identify “sketchy” areas. Machine learning is revolutionary technology, but if you want to use it to create revolutionary products, you need people who want revolution -- people who regularly suffer from deeper shortcomings in the status quo than waiting 15 minutes for curry [4].

[1] I’m not that worried about double-counting because there were 7,000 people at the conference.
[2] Call each MMMFMFMFMF... vector corresponding to a line at an event L_i. Then I use the following bootstrapping procedure: compute a clustering statistic which I’ll describe below, randomly permute each L_i separately, recompute the clustering statistic, repeat a few hundred times and compare the true clustering statistic to the random clustering statistics. (I permute each L_i separately because if you mix them all together, you’ll be confounded by event gender differences.) I tried two clustering statistics: first, the number of F’s who were followed by an F, and second, the bucket statistic, which is defined as bucket(L_i, n, k): the number of bins of size n within L_i that contain at least k women, with k chosen to be at least half of n. I used the second statistic because I thought I might automatically group women together when there was ambiguity in the order of the line (people standing side by side), which I was worried would bias the first statistic since it’s very local. For both statistics, the true clustering statistic was higher than the average random clustering statistic (regardless of the values of n and k I chose for the second statistic). For the first statistic, this was true for pretty much every random iterate, and for the second statistic, the percentage of random iterates for which it was true varied depending on n and k, averaging about 90% of iterates.
[3] This is based only on the keynote, not the rest of the conference. I think this is reasonable because the keynote was watched by millions of people, it was run by the CEO of Google, and a speech like that should reflect what you want to emphasize.
[4] To be clear, I’m not really blaming Google either for the trivial-app problem (although I think their examples could’ve been better chosen, as a company they do a lot of amazing things) or the lack-of-diversity problem; they’re industry-wide symptoms. But that isn’t really reassuring.
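
Two of the clustering checks from item 3 and footnote [2] can be sketched in Python. This is a rough illustration under stated assumptions, not the analysis code itself: the function names are hypothetical, the example lines are made up rather than taken from the conference data, and only the first clustering statistic (F's followed by an F) is implemented. The Fisher's exact test uses the two machine learning office hours tables quoted in item 3, with the common two-sided convention of summing every table that is no more likely than the observed one.

```python
import random
from math import comb

def adjacent_women(line):
    """First clustering statistic: number of F's immediately followed by an F."""
    return sum(1 for a, b in zip(line, line[1:]) if a == "F" and b == "F")

def clustering_permutation_test(lines, iterations=500, seed=0):
    """Compare the observed statistic (summed over lines) with shuffled versions.
    Each line is shuffled separately so event-level gender ratios are preserved."""
    rng = random.Random(seed)
    observed = sum(adjacent_women(line) for line in lines)
    shuffled_totals = []
    for _ in range(iterations):
        total = 0
        for line in lines:
            letters = list(line)
            rng.shuffle(letters)
            total += adjacent_women(letters)
        shuffled_totals.append(total)
    shuffled_mean = sum(shuffled_totals) / iterations
    share_below = sum(t < observed for t in shuffled_totals) / iterations
    return observed, shuffled_mean, share_below

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, total = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(col1, x) * comb(total - col1, row1 - x) / comb(total, row1)

    observed = prob(a)
    low, high = max(0, row1 + col1 - total), min(row1, col1)
    return sum(prob(x) for x in range(low, high + 1)
               if prob(x) <= observed * (1 + 1e-9))

# Made-up example lines, NOT the conference data.
example_lines = ["MMMMMFFFFFFFF", "MMFFFMMMMMFFMM"]
observed, shuffled_mean, share_below = clustering_permutation_test(example_lines)
print(observed, round(shuffled_mean, 2), share_below)

# Machine learning office hours: 1 woman of 18 at one table, 7 of 13 at the other.
p_value = fisher_exact_two_sided(1, 17, 7, 6)
print(f"Fisher's exact test p = {p_value:.3f}")  # 0.004
```

Shuffling within each line, rather than pooling all lines, is what keeps differences in gender ratio between events from masquerading as clustering.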

Full data:

People entering keynote speech, 9:30 AM: 65 / 236 are women, 28%.
People exiting keynote speech, 12:00 PM: 25 / 88, 28%.
High performance web user interfaces line: 21 / 58, 36%
Accessibility office hours: 10 / 40, 25%
Machine learning office hours: 8 / 31, 26%
People sitting around on grass: 33 / 87, 38%
Access and empathy tent: 13 / 38, 34%
Android studio / google play: 6 / 46, 13%
Making music station: 4 / 18, 22%
Project loon (giant balloon): 5 / 11, 45%
Android experiments (random cute stuff): 15 / 46, 33%
Audience arranged in ring around robotic arm spewing paint to music: 17 / 48, 35%
Android pay everywhere line: 6 / 25, 24%
Engineering cinematic experiences in VR line: 15 / 62, 24%

Devtools on rails line: 24 / 82, 29%
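
As a quick check, the counts above can be tallied in a few lines of Python. This is only a sketch: the event labels are shortened, and reading "within 10%" as "within 10 percentage points of the overall share" is my assumption about what the comparison in item 2 means.

```python
# (women, total) per event, copied from the list above.
counts = {
    "People entering keynote speech": (65, 236),
    "People exiting keynote speech": (25, 88),
    "High performance web user interfaces line": (21, 58),
    "Accessibility office hours": (10, 40),
    "Machine learning office hours": (8, 31),
    "People sitting around on grass": (33, 87),
    "Access and empathy tent": (13, 38),
    "Android studio / google play": (6, 46),
    "Making music station": (4, 18),
    "Project loon": (5, 11),
    "Android experiments": (15, 46),
    "Ring around robotic arm": (17, 48),
    "Android pay everywhere line": (6, 25),
    "Engineering cinematic experiences in VR line": (15, 62),
    "Devtools on rails line": (24, 82),
}

women = sum(w for w, _ in counts.values())
people = sum(n for _, n in counts.values())
overall = women / people
print(f"{women} / {people} = {overall:.1%}")  # 267 / 916 = 29.1%

# Events more than 10 percentage points away from the overall share.
far_from_average = [
    event for event, (w, n) in counts.items() if abs(w / n - overall) > 0.10
]
print(far_from_average)
```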