Obsession with Regression: November 2014

Recently I published an analysis of the Ferguson conflict that showed, using Twitter data, that there was a “red group” and a “blue group” who rarely talked to each other, thought very different things, came from very different backgrounds, and often were uncivil even when they did talk. Thanks to everyone who wrote to me about the analysis! Here are answers to the most common questions I’ve received.

What data did you use?

215,000 tweets containing the Ferguson hashtag collected between November 17th and 19th (prior to the announcement of the verdict).

What tools did you use to collect the data?

Python -- specifically, the tweepy library and a program I wrote, which you can find here (described at greater length here).

What tools did you use to analyze the data and make the visualization?

Python for analysis; Gephi for visualization. See Gilad Lotan’s excellent tutorial on how to use Gephi to analyze Twitter data.

How did you divide Tweeters into red and blue groups?

I used Gephi's community detection algorithm, sometimes known as the Louvain method. I ran it on the adjacency matrix for the most frequent tweeters, where Mij was 1 if tweeter i had mentioned tweeter j in a tweet. Essentially, this divides Tweeters into groups that mention each other frequently.
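To make the matrix concrete, here is a minimal sketch of how Mij could be built from raw tweets. It is not the program used for the analysis: the (author, text) input format, the function name mention_matrix, the top_n cutoff, and the choice to ignore self-mentions are all assumptions made for illustration.

```python
import re
from collections import Counter

# Twitter handles are letters, digits and underscores after an "@".
MENTION = re.compile(r"@(\w+)")

def mention_matrix(tweets, top_n):
    """Build a 0/1 mention matrix for the top_n most frequent tweeters.

    tweets is a list of (author, text) pairs (a hypothetical format).
    Returns (users, M), where M[i][j] == 1 if users[i] mentioned users[j].
    """
    tweets = list(tweets)
    counts = Counter(author.lower() for author, _ in tweets)
    # Most frequent first; ties broken alphabetically so the order is stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    users = [name for name, _ in ranked[:top_n]]
    index = {name: i for i, name in enumerate(users)}

    M = [[0] * len(users) for _ in users]
    for author, text in tweets:
        i = index.get(author.lower())
        if i is None:
            continue  # author is not among the most frequent tweeters
        for name in MENTION.findall(text):
            j = index.get(name.lower())
            if j is not None and j != i:  # assumption: self-mentions ignored
                M[i][j] = 1
    return users, M

# Made-up tweets, purely for illustration.
tweets = [
    ("alice", "@bob did you see this? #Ferguson"),
    ("bob", "@alice yes #Ferguson"),
    ("alice", "more thoughts #Ferguson"),
    ("carol", "@dave look #Ferguson"),
    ("carol", "@alice disagree #Ferguson"),
    ("dave", "#Ferguson"),
    ("erin", "@alice hi"),
]
users, M = mention_matrix(tweets, top_n=4)
print(users)  # ['alice', 'carol', 'bob', 'dave']
for row in M:
    print(row)
```

A matrix like this can then be exported as an edge list and loaded into Gephi, which is where the community detection itself happens.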

Regarding whether this grouping is valid: as I note in the piece, I am mindful of the fact that there are many ways to group data, and I think this is worth exploring further. One problem we always face is deciding how many groups there are (see here and here). You can always sort of make it look like people hate each other by clustering the data into groups, even if there isn’t necessarily any separation between them -- this is something to be wary of when looking at analyses like this one.
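One rough way to check whether a split reflects real separation is modularity, the quantity the Louvain method tries to maximize. The sketch below is an illustration, not part of the original analysis: it treats mentions as undirected, unweighted edges (an assumption), and the two toy graphs are made up. A clearly positive score means more within-group edges than chance would give; a score near or below zero means the "groups" are not really separated.

```python
from itertools import combinations

def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each edge listed once.
    community: dict mapping node -> group label.
    Uses Q = sum over groups of [L_c / m - (d_c / 2m)^2], where m is the
    number of edges, L_c the edges inside group c, d_c its total degree.
    """
    m = len(edges)
    inside = {}      # L_c: edges with both endpoints in group c
    degree_sum = {}  # d_c: sum of degrees of nodes in group c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())

split = {0: "red", 1: "red", 2: "red", 3: "blue", 4: "blue", 5: "blue"}

# Two tight triangles joined by a single edge: real separation.
two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(f"{modularity(two_triangles, split):.3f}")  # 0.357

# Everyone talks to everyone: the same split is arbitrary.
everyone = list(combinations(range(6), 2))
print(f"{modularity(everyone, split):.3f}")  # -0.100
```

The same red / blue labels look meaningful on the first graph and meaningless on the second, which is exactly the trap: a clustering algorithm will hand you labels either way.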

But I think several pieces of evidence (in addition to Gephi's striking visual) point to the validity of the red / blue division. The fact that the two groups are associated with the tweeters’ self-descriptions (like race and political affiliation) is revealing; the fact that the two groups are associated with tweeting different things is also revealing (and by no means something I expected to see -- for example, if you divide Twitter datasets by gender, you will frequently find that men and women tweet essentially similar things). This evidence is powerful because it is external -- it was not used to come up with the grouping, but it supports it.
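A simple way to quantify this kind of external evidence is a chi-square test of independence between the cluster label and some attribute that played no part in the clustering. The sketch below is illustrative only: the counts are invented, the attribute ("profile contains some keyword") is hypothetical, and it assumes a 2x2 table with no empty row or column and no continuity correction.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    table: [[a, b], [c, d]], rows = cluster, columns = external attribute.
    Returns (chi2, p). With 1 degree of freedom the p-value has the closed
    form erfc(sqrt(chi2 / 2)). Assumes every row and column total is > 0.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Invented counts: rows are (red, blue) tweeters, columns are whether a
# profile does or does not contain a hypothetical keyword.
chi2, p = chi_square_2x2([[30, 10], [10, 30]])
print(chi2)      # 20.0
print(p < 1e-4)  # True
```

Because the attribute was not used to form the clusters, a strong association here is evidence for the grouping rather than an artifact of it.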

In general, we often bring in such external evidence to argue that a grouping is valid. For example, in a biological analysis we might cluster genes into groups that show similar expression patterns (group A highly expressed in the liver and not in the lungs; group B highly expressed in the lungs and not in the liver). We would be more sure that the groups we had found were “real” if there were external evidence, like a transcription factor that was known to turn on all the genes in group A, or a biological function that was common to all the genes in group A.

You said the blue cluster is much larger than the red cluster. What happens if you break down the blue cluster further?

I don’t know! Someone should figure this out.

Can I see your data or code?

Yes. I cannot make the data publicly available because of Twitter’s terms of service, but if you are a researcher with a project, shoot me an email. In addition to the two days of data used in this analysis, I also have several million tweets both from several months ago, when Ferguson initially made the news, and from after the verdict was announced.

As always, if you work at Twitter and have any objection to any of this, please email me -- I am acting in good faith and more than happy to comply with your requests.

I have flown across the ocean to become a Very Serious Oxford Student who can read two books at once, tassel swinging:

I have been told that Oxford will actually expel me for wearing that hat. Today is my one-month anniversary of arriving in England and I’ve decided that I should write a piece or two about what I’ve learned here, in part to confirm to my family that I’m still alive. If you just want statistics, please skip this post and I promise the next one will have lots and lots of p-values.

I have never before gone weeks half-wondering if I’m dreaming. At first I thought it was just jetlag or social exhaustion, but I’ve come to realize that it’s something longer-term: I never fully understood that filling out those scholarship forms meant I would, in fact, fly across a real ocean and attend a real university. So when I sit at formal hall eating smoked duck and drinking white wine in a building about 40 times older than I am, part of me believes that I am, in this well-named “city of dreaming spires”, still asleep. That, of course, is a good dream.

Lesson 1: we forget how many ways there are to live a life. Keeping sane, I think, requires becoming willfully blind to possible lives. E.g., at the moment I am a long-haired computer science researcher in a committed straight relationship; but if I wanted to, by tomorrow I could be a spiky-haired harmonica-player flirting with lesbians in Central Park [2]. In theory. But, of course, I don’t really consider that possibility, because it’s terrifying and paralyzing to constantly consider dumping your boyfriend, switching careers, and crossing an ocean; I get pretty overloaded just deciding what to eat for lunch. And because the grass is always greener, I imagine that if we really did discard personas so lightly, we’d often do so prematurely.

But I worry that instead we go too far in the other direction. In Silicon Valley, at least, it’s easy to develop a tunnel vision which I will summarize in the following table. The middle column is somewhat hyperbolic [1], but the right column is (at least loosely) based on actual conversations I have had with people in Oxford.

Complete The Following Sentence	Answer in Silicon Valley	Other Possible Answers
“The fundamental problem is…”	“...our MySQL server won’t sync with the cloud.”	“...the lack of objective morality in a post-modern world.”
“You have to be careful when you sneak into…”	“...the front of the line at the Google cafeteria.”	“...Syria.”
“You can use social media to…”	“...disrupt the groups-larger-than-three-but-smaller-than-five space.”	“...represent the parents of the children who died at Newtown.”

There is such a range of ways to live! People here put on black robes for dinner and say grace in Latin and sit at “high table” so they can look down on us mortals and it all seems so absurd to me but they have been doing this for eight hundred years. And at the Oxford Union, the debating society, I see eighteen-year-olds in tuxedos giving grandiose speeches on subjects they don’t understand, playing at being members of parliament, and again it seems absurd to me -- but there’s a decent chance they really will be members of parliament. (I have also, incidentally, seen and heard of more sexism, racism, and classism in a month here than I did in a year working in tech companies, but we can talk about that another time.)

Perhaps more important, I think, than these differences in lifestyle is the diversity in worldviews. Part of this I’ve seen from the people who come to speak at Oxford: three-star generals who stand up and defend the Iraq War, and Jan Brewer, who says that the only thing Obama has done right is “be a good father”. Part of it is due to the other Rhodes scholars. It's nice to meet a bunch of people who don't usually code, and hear what it's really like to march in Ferguson and how one sneaks into Burma and what the hell is going on with Turkey and how you get water to remote Latin American towns and why you need boots on the ground to conduct an airstrike at all and why it’s so hard to prosecute war crimes and...

The part that really bakes my noodle is that, of course, even this relative diversity is only a tiny slice of human experience. In Palo Alto they drink $4 coffee, and in Oxford they drink $4 tea: this is a long way from how most people live. I realized that I could not remember the last time I’d had a long conversation with someone who hadn’t gone to college. (Can you?) Perhaps this shouldn’t surprise me, given my previous work on how birds of a feather flock together; we are astonishingly good at self-segregation, and we build complex mechanisms to facilitate it. After I did the birds-of-a-feather work, I was somewhat troubled to find that someone had used my results to support their dating app that only allows in elites. I’m now at a university where iron gates separate the black-robed students from the beggars outside, where even the way someone speaks is a clue to their class; I don’t think we need to build more walls.

Anyway, hit me up if you’re in Oxford and, assuming I don’t get hit by a car on the wrong side of the road, I’ll keep you posted on the other things I learn in England; also, if you have cool ideas for using statistics to understand the British, shoot me an email.

Notes:

[1] I should also mention that Stanford, of course, has very strong humanities departments and students (indeed, the other two Rhodes scholars from my year studied history and political science) even if Palo Alto feels extremely tech-focused.
[2] I should perhaps clarify that I do not believe one needs to have spiky hair, or play the harmonica, to flirt successfully with lesbians in Central Park. Indeed, I don't have any idea how one flirts with lesbians in Central Park, or even if there are any to flirt with. Sadly, for the reasons discussed above, I will probably remain ignorant, but feel free to enlighten me.

Thursday, November 27, 2014

Ferguson FAQ

Saturday, November 1, 2014

Why I'm Not Flirting with Lesbians In Central Park