Tuesday, December 9, 2014

How Likely is Someone Accused of Assault by Multiple People to be Innocent?

This post discusses sexual assault. I have tried to keep descriptive details to the minimum required for the math. I am aware this is a sensitive and controversial topic and, as always, welcome your emails with comments, suggestions, or objections.

The Bill Cosby case has highlighted the threat of serial offenders, and may illustrate a larger trend. One study found that more than 90% of campus sexual assaults were committed by a very small proportion of the population (6%) who on average committed six assaults. If this study is accurate, it seems profoundly important, because it implies most assaults are not being committed by people who sincerely misinterpret their partner’s intentions once. While it can be difficult to determine guilt in a single case, my intuition is that a person independently accused of assault by multiple people is quite unlikely to be innocent.

Because, like many people, I feel strongly when it comes to sexual assault, I built a mathematical model to investigate whether this intuition was accurate. Based on the above discussion of serial offenders, I included two groups in my model: “disproportionate assaulters”, who assault people at a high rate, and everyone else, who assault people at a lower rate [1]. I refer to these groups as “DAs” and “non-DAs”.

One question of interest is: given that someone has been accused of assault by k people, what is the probability that they are guilty in at least one case [2]? I created a simulation you can play with to answer this question. The horizontal axis is how many people have accused someone of assault, and the vertical axis is the probability that they are completely innocent. I initially, pretty arbitrarily, set the parameters of the model as follows, but I invite you to modify them by playing with the sliders:

p0, proportion of people who are DAs
pr1, probability a DA will assault someone in a given encounter
pr2, probability a non-DA will assault someone in a given encounter
pag, probability someone who is guilty of assault in a given encounter will be accused
pai, probability someone who is innocent of assault in a given encounter will be accused
n1, number of sexual encounters had by a DA
n2, number of sexual encounters had by a non-DA

Shoot me an email if you have good ways to pin down any of these values; the values I chose yield roughly the results in the original paper on serial offenders, but there’s a lot of residual uncertainty [3].
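
For readers who prefer code to sliders, here is a minimal sketch of one way a two-group model like this could be computed in closed form. It is not the simulation linked above. It assumes that encounters are independent, that each encounter produces at most one accusation, and that "completely innocent" means zero assaults across all encounters. The function name and the parameter values in the usage lines are hypothetical placeholders, not the slider defaults.

```python
from math import comb


def prob_innocent_given_k(k, p0, pr1, pr2, pag, pai, n1, n2):
    """P(no assaults at all | accused by exactly k people), two-group mixture.

    Assumes independent encounters and at most one accusation per encounter.
    """
    numerator = 0.0    # P(k accusations AND zero assaults)
    denominator = 0.0  # P(k accusations)
    for weight, pr, n in ((p0, pr1, n1), (1 - p0, pr2, n2)):
        if k > n:
            continue  # this group cannot collect k accusations
        # Chance that a single encounter ends in an accusation, guilty or not.
        q = pr * pag + (1 - pr) * pai
        denominator += weight * comb(n, k) * q**k * (1 - q) ** (n - k)
        # Zero assaults in n encounters, then k false accusations among them.
        numerator += (
            weight * (1 - pr) ** n * comb(n, k) * pai**k * (1 - pai) ** (n - k)
        )
    if denominator == 0:
        raise ValueError("k accusations are impossible under these parameters")
    return numerator / denominator


# Hypothetical parameter values, chosen only for illustration.
params = dict(p0=0.06, pr1=0.3, pr2=0.005, pag=0.2, pai=0.005, n1=20, n2=10)
one = prob_innocent_given_k(1, **params)
two = prob_innocent_given_k(2, **params)
print(f"accused once: {one:.3f}, accused twice: {two:.3f}")
```

With these made-up values, the probability of complete innocence falls sharply from one accusation to two. Different values can change the picture, which is why the parameters matter so much.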

Obviously this model is idealized and does not capture all the complexities at play here. (Feel free to extend it yourself and write to me about it! Here’s some math and code; [4] has some notes on how I think you might extend it.) Still, from playing with it, we can make a few observations:

  1. The intuition that “if you are accused of assault by multiple people, you’re probably guilty” is often accurate. Importantly, even if you were to choose settings where someone who has been accused once is more likely than not to be innocent, the probability of innocence often drops dramatically if they have been accused twice.

  2. This is still mostly true even if we don’t buy the assumption that there are two different kinds of people (by setting pr1 = pr2 and n1 = n2).
  3. As we would expect, the probability that someone is innocent increases dramatically as we increase pai, the probability that someone who is innocent will be accused. But we know pai must be very small simply because the vast majority of people are never accused of assault. For example, if each person has 10 sexual encounters and 90% of people are never accused of sexual assault, pai can be no higher than about 1% even if only innocent people are accused. (Thanks to Seth Stephens-Davidowitz for pointing this out.)
  4. Increasing our certainty that the guilty are guilty is not just good for accusers: it is also good for the accused, because it potentially allows us to raise the standard of evidence while still catching the same number of guilty people.
  5. In some cases with multiple accusations, it is essentially impossible that someone is innocent. My mother, who worked as a prosecutor in sexual assault cases, observed that because some repeat offenders use very similar methodology each time, accusers’ testimonies can share distinctive details in a way that would be impossible if no assault occurred (assuming accusations are leveled independently).
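
The bound on pai in the observation above can be checked with a line of arithmetic. If only innocent people are accused, and each of a person's n encounters independently leads to an accusation with probability pai, then the chance of never being accused is (1 - pai)^n. Setting that equal to the share of people who are never accused and solving for pai gives an upper bound. The function name below is a hypothetical helper for illustration.

```python
def max_false_accusation_rate(encounters, share_never_accused):
    """Largest pai consistent with the share of people never accused.

    Solves (1 - pai) ** encounters == share_never_accused for pai, assuming
    accusations in different encounters are independent.
    """
    return 1 - share_never_accused ** (1 / encounters)


bound = max_false_accusation_rate(10, 0.90)
print(f"{bound:.4f}")  # prints 0.0105
```

So with 10 encounters per person and 90% of people never accused, pai comes out at roughly 1%. If some of the accused are in fact guilty, it must be lower still.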

But how can we combine multiple accusations of assault, given that survivors are usually unaware of each other and often reluctant to come forward? The New York Times recently reported on a tool designed to do this. It allows survivors to file accusations with a third party, who will keep the accusations confidential unless multiple accusations are leveled at the same person. The thought is that this could make survivors of assault more willing to come forward and make it easier to identify serial offenders. My initial reaction to this idea was that it was so exciting I should drop out of school to go work on it. After thinking about it further and reading this paper, I concluded that this idea has at least three downsides as well:

  1. It may discourage survivors from reporting assault via the usual avenues -- if they file a third-party accusation and no one else does, they may conclude that they were “mistaken” and never follow up, which seems very harmful.
  2. You do not want to create a world where only people accused of assault multiple times are ever convicted. This has echoes of the “woman’s testimony is worth half of a man’s” standard which is applied in some countries. You also really don’t want people feeling like they can commit “one free assault”.
  3. If we have to wait for multiple assaults to be reported, serial assaulters have more time to commit assaults.

I cannot overemphasize that this is a complicated and painful problem to which there are no easy technological or statistical solutions. But I do think the approach of combining multiple accusations is an interesting one, so I’d love to hear your thoughts.


After writing the main post, I wanted to add a statistical note on the doubts which have emerged about the UVA sexual assault case. (If you’re not familiar with the details, several weeks ago Rolling Stone published a story about a gang rape at UVA which got a lot of attention; a few days ago, they issued a statement saying there were “discrepancies” in the accuser’s account and their trust in her was “misplaced”, and the internet exploded.) I think the UVA episode illustrates precisely why statistics are so important -- because anything can happen in a single story, making it a risky thing to hang a cause on. Regardless of what really happened at UVA, the broader trend is clear: the rate of campus sexual assaults is high (20%, says CDC, although better data should be collected); the rate of false accusations is low (this review cites 6 studies which all yield estimates between 2% and 8%, lower than the rate of false reports for car theft). This is much more important than what happened in a single UVA fraternity on a single night. Similarly, to me the compelling story behind Ferguson is not contingent on what exactly happened between Darren Wilson and Michael Brown over the course of 90 seconds -- it is the systemic racial divides in Ferguson, and the research that makes discrimination against African-Americans by the police and justice system all too overwhelmingly clear. Causes are more robust to randomness when backed by statistics in addition to anecdotes.


[1] This is known as a mixture model, a very useful statistical tool that assumes that your data is generated by a combination of different groups. For example, you might assume that Tweets are generated by a mixture of Democrats and Republicans, or gene expression patterns are generated by a mixture of cancer cells and healthy cells. Obviously, this model should not be taken to imply that assault is committed only by “evil people” who are immune to social and cultural factors, just that some people tend to commit assaults more than others. The fact that rates of assault are much higher in some environments implies that social and cultural factors do play a role both in how likely someone is to commit assault and how likely they are to get away with it.
[2] There are other questions as well: for example, given that someone has been accused of assault by k people, what is the probability they are in the serial offender group? Given that someone has been accused of assault by k people, what is the probability that a particular allegation is true?
[3] Particularly pai, the probability that someone who is innocent is accused, since this is such an important parameter. For example, if only 2 - 8% of accusations are false, ought we choose a value of pai such that 92 - 98% of those accused of sexual assault by one person are guilty -- either the accusation is false or the person is guilty? Or is there some third possibility -- perhaps that the victim is telling the truth but their story does not meet a legal standard?
[4] For example, there probably aren’t really two clearly separate populations -- there’s some continuous distribution of propensities to assault, and of the number of people you have sexual encounters with.
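
The first question in footnote [2], the probability that someone accused by k people belongs to the DA group, follows from Bayes' rule under the same kind of two-group model. The sketch below is a simplified illustration rather than the post's own code: it assumes independent encounters with at most one accusation each, and the function name and parameter values are hypothetical.

```python
from math import comb


def prob_da_given_k(k, p0, pr1, pr2, pag, pai, n1, n2):
    """P(person is a DA | accused by exactly k people), via Bayes' rule."""

    def likelihood(pr, n):
        # P(exactly k accusations in n independent encounters) for one group.
        if k > n:
            return 0.0
        q = pr * pag + (1 - pr) * pai  # per-encounter accusation probability
        return comb(n, k) * q**k * (1 - q) ** (n - k)

    da = p0 * likelihood(pr1, n1)
    non_da = (1 - p0) * likelihood(pr2, n2)
    if da + non_da == 0:
        raise ValueError("k accusations are impossible under these parameters")
    return da / (da + non_da)


# Hypothetical parameter values, chosen only for illustration.
params = dict(p0=0.06, pr1=0.3, pr2=0.005, pag=0.2, pai=0.005, n1=20, n2=10)
posterior = [prob_da_given_k(k, **params) for k in range(4)]
print([round(p, 3) for p in posterior])
```

The prior p0 is the starting point. Zero accusations pull the posterior below p0, and each additional accusation pushes it upward.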

Thursday, November 27, 2014

Ferguson FAQ

Recently I published an analysis of the Ferguson conflict that showed, using Twitter data, that there was a “red group” and a “blue group” who rarely talked to each other, thought very different things, came from very different backgrounds, and often were uncivil even when they did talk. Thanks to everyone who wrote to me about the analysis! Here are answers to the most common questions I’ve received.

What data did you use?

215,000 tweets containing the Ferguson hashtag collected between November 17th and 19th (prior to the announcement of the verdict).

What tools did you use to collect the data?

Python -- specifically, the tweepy library and a program I wrote which you can find here (described at more length here).

What tools did you use to analyze the data and make the visualization?

Python for analysis; Gephi for visualization. See Gilad Lotan’s excellent tutorial on how to use Gephi to analyze Twitter data.

How did you divide Tweeters into red and blue groups?

I used Gephi's community detection algorithm, sometimes known as the Louvain method, on the adjacency matrix M for the most frequent tweeters, where M_ij was 1 if tweeter i had mentioned tweeter j in a tweet. Essentially, this divides Tweeters into groups that mention each other frequently.
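
To make the idea concrete, here is a small sketch of the two ingredients: building a mention matrix from tweets, and scoring a grouping with modularity, the quantity the Louvain method tries to maximize. It does not run Louvain itself and is not the code used for the analysis. The handles, tweets, and function names are made up, and treating a mention in either direction as an undirected link is a simplifying assumption, since how Gephi handles direction and weights is not described here.

```python
import re


def mention_matrix(tweets, users):
    """M[i][j] = 1 if user i mentioned user j in at least one tweet."""
    index = {user: i for i, user in enumerate(users)}
    M = [[0] * len(users) for _ in users]
    for author, text in tweets:
        for name in re.findall(r"@(\w+)", text):
            if author in index and name in index and name != author:
                M[index[author]][index[name]] = 1
    return M


def modularity(M, groups):
    """Modularity of a grouping, treating mentions as undirected links."""
    n = len(M)
    A = [[1 if (M[i][j] or M[j][i]) else 0 for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in A]
    two_m = sum(degree)  # twice the number of undirected links
    if two_m == 0:
        return 0.0
    total = 0.0
    for i in range(n):
        for j in range(n):
            if groups[i] == groups[j]:
                # Observed link minus the link expected by chance given degrees.
                total += A[i][j] - degree[i] * degree[j] / two_m
    return total / two_m


# Made-up handles and tweets: two pairs joined by a single cross mention.
users = ["ann", "bob", "cat", "dan"]
tweets = [
    ("ann", "agree with @bob"),
    ("bob", "thanks @ann, but @cat disagrees"),
    ("cat", "cc @dan"),
]
M = mention_matrix(tweets, users)
split_by_mentions = modularity(M, [0, 0, 1, 1])
split_at_random = modularity(M, [0, 1, 0, 1])
print(split_by_mentions > split_at_random)  # prints True
```

A grouping that keeps frequent mentioners together scores higher than one that separates them, and that is the sense in which the algorithm finds groups that mention each other frequently.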

Regarding whether this grouping is valid: as I note in the piece, I am mindful of the fact that there are many ways to group data, and I think this is worth exploring further. One problem we always face is how many groups there are (see here and here). You can always sort of make it look like people hate each other by clustering the data into groups even if there isn’t necessarily any separation between the groups -- this is something to be wary of when looking at analyses like this one.

But I think several pieces of evidence (in addition to Gephi's striking visual) point to the validity of the red / blue division. The fact that the two groups are associated with the tweeters’ self-descriptions (like race and political affiliation) is revealing; the fact that the two groups are associated with tweeting different things is also revealing (and by no means something I expected to see -- for example, if you divide Twitter datasets by gender, you will frequently find that men and women tweet essentially similar things). This evidence is powerful because it is external -- it was not used to come up with the grouping, but it supports it.

In general, we often bring in such external evidence to argue that a grouping is valid. For example, in a biological analysis we might cluster genes into groups that show similar expression patterns (group A highly expressed in the liver and not in the lungs; group B highly expressed in the lungs and not in the liver). We would be more sure that the groups we had found were “real” if there was external evidence like a transcription factor that was known to turn on all the genes in group A, or a biological function that was common to all the genes in group A.

You said the blue cluster is much larger than the red cluster. What happens if you break down the blue cluster further?

I don’t know! Someone should figure this out.

Can I see your data or code?

Yes. I cannot make the data publicly available because of Twitter’s terms of service, but if you are a researcher with a project, shoot me an email. In addition to the two days of data used in this analysis, I also have several million tweets both from several months ago, when Ferguson initially made the news, and from after the verdict was announced.

As always, if you work at Twitter and have any objection to any of this, please email me -- I am acting in good faith and more than happy to comply with your requests.

Saturday, November 1, 2014

Why I'm Not Flirting with Lesbians In Central Park

I have flown across the ocean to become a Very Serious Oxford Student who can read two books at once, tassel swinging:

I have been told that Oxford will actually expel me for wearing that hat. Today is my one-month anniversary of arriving in England and I’ve decided that I should write a piece or two about what I’ve learned here, in part to confirm to my family that I’m still alive. If you just want statistics, please skip this post and I promise the next one will have lots and lots of p-values.

I have never before gone weeks half-wondering if I’m dreaming. At first I thought it was just jetlag or social exhaustion, but I’ve come to realize that it’s something longer-term: I never fully understood that filling out those scholarship forms meant I would, in fact, fly across a real ocean and attend a real university. So when I sit at formal hall eating smoked duck and drinking white wine in a building about 40 times older than I am, part of me believes that I am, in this well-named “city of dreaming spires”, still asleep. That, of course, is a good dream.

Lesson 1: we forget how many ways there are to live a life. Keeping sane, I think, requires becoming willfully blind to possible lives. E.g., at the moment I am a long-haired computer science researcher in a committed straight relationship; but if I wanted to, by tomorrow I could be a spiky-haired harmonica-player flirting with lesbians in Central Park. In theory. But, of course, I don’t really consider that possibility, because it’s terrifying and paralyzing to constantly consider dumping your boyfriend, switching careers, and crossing an ocean; I get pretty overloaded just deciding what to eat for lunch. And because the grass is always greener I imagine that if we really did discard personas so lightly, we’d often do so prematurely.

But I worry that instead we go too far in the other direction. In Silicon Valley, at least, it’s easy to develop a tunnel vision which I will summarize in the following table. The middle column is somewhat hyperbolic [1], but the right column is (at least loosely) based on actual conversations I have had with people in Oxford.

| Complete The Following Sentence | Answer in Silicon Valley | Other Possible Answers |
| --- | --- | --- |
| “The fundamental problem is…” | “...our MySQL server won’t sync with the cloud.” | “...the lack of objective morality in a post-modern world.” |
| “You have to be careful when you sneak into…” | “...the front of the line at the Google cafeteria.” | |
| “You can use social media to…” | “...disrupt the groups-larger-than-three-but-smaller-than-five space.” | “...represent the parents of the children who died at Newtown.” |

There is such a range of ways to live! People here put on black robes for dinner and say grace in Latin and sit at “high table” so they can look down on us mortals and it all seems so absurd to me but they have been doing this for eight hundred years. And at the Oxford Union, the debating society, I see eighteen-year-olds in tuxedos giving grandiose speeches on subjects they don’t understand, playing at being members of parliament, and again it seems absurd to me -- but there’s a decent chance they really will be members of parliament. (I have also, incidentally, seen and heard of more sexism, racism, and classism in a month here than I did in a year working in tech companies, but we can talk about that another time.)

Perhaps more important, I think, than these differences in lifestyle is the diversity in worldviews. Part of this I’ve seen from the people who come to speak at Oxford: three-star generals who stand up and defend the Iraq War and Jan Brewer who says that the only thing Obama has done right is “be a good father”. Part of it is due to the other Rhodes scholars. It's nice to meet a bunch of people who don't, usually, code, and hear what it's really like to march in Ferguson and how one sneaks into Burma and what the hell is going on with Turkey and how you get water to remote Latin American towns and why you need boots on the ground to conduct an airstrike at all and why it’s so hard to prosecute war crimes and...

The part that really bakes my noodle is that, of course, even this relative diversity is only a tiny slice of human experience. In Palo Alto they drink $4 coffee, and in Oxford they drink $4 tea: this is a long way from how most people live. I realized that I could not remember the last time I’d had a long conversation with someone who hadn’t gone to college. (Can you?) Perhaps this shouldn’t surprise me, given my previous work on how birds of a feather flock together; we are astonishingly good at self-segregation, and we build complex mechanisms to facilitate it. After I did the birds-of-a-feather work, I was somewhat troubled to find that someone had used my results to support their dating app that only allows in elites. I’m now at a university where iron gates separate the black-robed students from the beggars outside, where even the way someone speaks is a clue to their class; I don’t think we need to build more walls.

Anyway, hit me up if you’re in Oxford and, assuming I don’t get hit by a car on the wrong side of the road, I’ll keep you posted on the other things I learn in England; also, if you have cool ideas for using statistics to understand the British, shoot me an email.


[1] I should also mention that Stanford, of course, has very strong humanities departments and students (indeed, the other two Rhodes scholars from my year studied history and political science) even if Palo Alto feels extremely tech-focused.
[2] I should perhaps clarify that I do not believe one needs to have spiky hair, or play the harmonica, to flirt successfully with lesbians in Central Park. Indeed, I don't have any idea how one flirts with lesbians in Central Park, or even if there are any to flirt with. Sadly, for the reasons discussed above, I will probably remain ignorant, but feel free to enlighten me.