Thursday, May 19, 2016

Five things I learned from counting 900 engineers at Google I/O

I wrote this in a few hours to release it during the conference it discusses, so treat its conclusions with the appropriate grain of salt and let me know if you find mistakes.

Google gave me a ticket to their annual developer conference, Google I/O. I am a computer scientist but not really a developer (edit: for the benefit of the person on Hacker News alleging "reverse discrimination" -- I won the ticket in a coding competition. Maybe be a little careful throwing around statements like that. I describe myself as “not a developer” because I’m better at other aspects of computer science, not because I can’t code), so I decided that, rather than bugging people about full-stack development or cross-platform coding, I could make myself more useful by analyzing diversity data. By this I mean that I spent six hours wandering into various conference venues and tapping “F” on my phone when I saw a woman and “M” when I saw a man; in total I counted 916 people [1]. Shoutout to my extremely tolerant housemate Andrew Suciu, who received all these messages.
I focused on gender because I didn’t think I could guess people’s race with high accuracy. (I realize, of course, that inferring gender from appearance also has serious caveats, but the data is useful to collect, and I wasn’t going to interrogate random strangers about their gender identity.) Here are five things I learned.

  1. Women were unexpectedly well-represented. 29% of the people I counted were women. (I tweeted at Google to get the official numbers and will update if they reply.) That means women were better represented in my data than they are, for example, as software engineers at 80% of tech companies, among software developers overall (21%) in Labor Department statistics, or among Google engineers (17%). I am puzzled by this and welcome your explanations. From what I saw at the conference, Google made pretty good efforts on the gender-diversity front: a) posting a very large sign with anti-harassment guidelines, b) giving women free tickets to the conference (that’s how I ended up there), c) featuring professional women emoji in the keynote, and d) having three women speak in the keynote.
  2. There was surprisingly little variation in gender ratio between conference events. I computed gender ratios at about a dozen conference events, and they were considerably more stable than I expected -- almost always within 10 percentage points of the overall average of 29%. (Whether the variation is even statistically significant depends on exactly how you define the categories -- perils of categorical f-tests and p-values!) Full data at the end of the piece. This is considerably less dramatic than, say, the gender variation across developer subfields in Stack Overflow’s developer survey. One explanation might be that at a conference people wander somewhat randomly into lots of events, which homogenizes gender ratios by adding noise.
  3. Women cluster together. We don’t just have the total counts of women: we also have the groupings, because I tapped “M” and “F” in the order that I saw people, and for lines that order is meaningful. (Where people were just sitting around, the order is more arbitrary, so I exclude that data from this analysis.) If, for example, I tapped “MMMMMFFFFFFFF”, that would be highly grouped data -- all the men are together and so are all the women. So we can ask whether the women group together more than we would expect if each line were in random order, and it turns out they do (statistical details in [2]). At some events I could see this clearly without statistics -- at the machine learning office hours, for example, one table had only 1 woman out of 18, and the other table had 7 women out of 13 (fine, statistics: Fisher’s exact test p = .004, t-test p = .002). I think a large driver of the clustering is probably that women arrive in groups because they work at a company together, not that they preferentially connect at the conference, but the latter could play a role as well. (Anecdotally, three of the four people who spoke to me during the conference were women.)
  4. Live-blogging is perilous. When I arrived at the conference at 8:30 AM, about an hour and a half before it started, a quick headcount implied that 90-95% of attendees were men, and I posted this online. But as the conference progressed and I got more data, it became clear that the early figure was too skewed. I regret not waiting for more data before posting: while I was clear about the lack of data, there was no advantage to posting so quickly. I often think about this when people rapidly tweet their reactions to complex events. It doesn’t matter how smart you are; you’ll still write a better piece if you reflect. And I realize, of course, that sometimes you have to work very quickly because an event demands it -- I wrote this post in a few hours so I could publish it during the conference, so take my statements with a grain of salt -- but I still wish we took more time to think.
  5. Machine learning should not just be used for takeout delivery.

Stand back, I’m going to use a metaphor. Imagine King Arthur came back to the castle to find his son cutting pizza with Excalibur.

Arthur: Son, that is literally the magical blade of destiny I pulled from the stone to become king of England.
Son: Yeah, dad, but it cuts pizza really well.
Arthur: Sure, but can’t you think of anything more exciting to do with it?

This is how I feel about a lot of applications of machine learning, which I’m using here to mean “statistical methods that computers use to do cool things like learn to do complex tasks and understand images / text / speech”. Machine learning is revolutionary technology. You can use it to build apps that will understand “I want curry” and “play Viva la Vida”, but are those really the examples you want to highlight, as the conference keynote did? Let’s talk instead about how we can use machine learning to pick out police encounters before they become violent and stop searching and jailing so many innocent people; let’s talk about catching cancer using a phone’s snapshot of a mole or Parkinson’s using an accelerometer’s tremor or heart disease using a phone-based heart monitor. Those are the technologies that deserve the label “disruptive”, the you’re-a-goddamn-wizard-Harry applications that make your heart freeze. A few moments of the two-hour keynote emphasized big ideas -- the last ten minutes, an app to help Syrians relocate -- but most of the use cases were fixes to first-world problems [3].
Part of this, of course, is that it’s probably more profitable to drone-deliver San Franciscans Perrier than to bring people in Flint any water at all. But part of it is that people’s backgrounds influence what they choose to create. A woman would’ve been less likely to create this app which lets you stalk a random stranger, and someone who’d suffered from racism or classism would’ve been less likely to create this app which lets you identify “sketchy” areas. Machine learning is revolutionary technology, but if you want to use it to create revolutionary products, you need people who want revolution -- people who regularly suffer from deeper shortcomings in the status quo than waiting 15 minutes for curry [4].

Notes:
[1] I’m not that worried about double-counting because there were 7,000 people at the conference.
[2] Call each MMMFMFMFMF... vector corresponding to a line at an event Li. Then I use the following permutation procedure: compute a clustering statistic (described below), randomly permute each Li separately, recompute the clustering statistic, repeat a few hundred times, and compare the true clustering statistic to the permuted ones. (I permute each Li separately because if you mix them all together, you’ll be confounded by gender differences between events.) I tried two clustering statistics: first, the number of F’s immediately followed by another F, and second, the bucket statistic bucket(Li, n, k): the number of bins of size n within Li that contain at least k women, with k chosen to be at least half of n. I used the second statistic because I was worried I might automatically group women together when the order of a line was ambiguous (people standing side by side), which would bias the first, very local statistic. For both statistics, the true clustering statistic was higher than the average permuted clustering statistic (regardless of the values of n and k I chose for the second statistic). For the first statistic, this held against essentially every permuted iterate; for the second, the percentage of iterates it held against varied with n and k, averaging about 90%.
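The procedure in note [2] is only a few lines of code. Here is a minimal sketch of the first clustering statistic and the per-line permutation test, with made-up example lines (the raw M/F sequences aren’t reproduced in this post):

```python
import random

def clustering_stat(line):
    """First clustering statistic: number of F's immediately followed by an F."""
    return sum(1 for a, b in zip(line, line[1:]) if a == "F" and b == "F")

def permutation_test(lines, iters=1000, seed=0):
    """Permute each line separately (mixing lines together would confound
    the test with gender differences between events) and return the
    fraction of permutations whose total clustering statistic is at
    least the observed one -- a one-sided p-value."""
    rng = random.Random(seed)
    observed = sum(clustering_stat(line) for line in lines)
    at_least_as_clustered = 0
    for _ in range(iters):
        total = 0
        for line in lines:
            shuffled = list(line)
            rng.shuffle(shuffled)
            total += clustering_stat(shuffled)
        if total >= observed:
            at_least_as_clustered += 1
    return at_least_as_clustered / iters

# Made-up, strongly grouped example lines (not real conference data).
lines = [list("MMMMMFFFFF"), list("MMMFFFFMMM")]
p = permutation_test(lines)
```

On grouped data like this, permuted lines essentially never reach the observed statistic, so the estimated p-value comes out near zero; swapping in bucket(Li, n, k) for clustering_stat gives the second test.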
[3] This is based only on the keynote, not the rest of the conference. I think this is reasonable because the keynote was watched by millions of people, it was run by the CEO of Google, and a speech like that should reflect what you want to emphasize.
[4] To be clear, I’m not really blaming Google either for the trivial-app problem (although I think their examples could’ve been better chosen, as a company they do a lot of amazing things) or the lack-of-diversity problem; they’re industry-wide symptoms. But that isn’t really reassuring.
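For the office-hours split in point 3 (1 woman out of 18 at one table, 7 out of 13 at the other), the quoted Fisher p-value can be reproduced with nothing but the standard library. This is a from-scratch sketch of the test; any stats package’s fisher_exact will give the same answer:

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is at most as probable as the observed table."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def p_table(x):  # probability that the top-left cell equals x
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = p_table(a)
    lo, hi = max(0, col1 - row2), min(col1, row1)
    # The tiny slack guards against float round-off when comparing ties.
    return sum(p_table(x) for x in range(lo, hi + 1)
               if p_table(x) <= p_obs * (1 + 1e-12))

# Machine learning office hours: one table with 1 woman and 17 men,
# the other with 7 women and 6 men.
p = fisher_exact_two_sided(1, 17, 7, 6)  # ~0.004
```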

Full data:

People entering keynote speech, 9:30 AM: 65 / 236 women, 28%
People exiting keynote speech, 12:00 PM: 25 / 88, 28%
High performance web user interfaces line: 21 / 58, 36%
Accessibility office hours: 10 / 40, 25%
Machine learning office hours: 8 / 31, 26%
People sitting around on grass: 33 / 87, 38%
Access and empathy tent: 13 / 38, 34%
Android Studio / Google Play: 6 / 46, 13%
Making music station: 4 / 18, 22%
Project Loon (giant balloon): 5 / 11, 45%
Android experiments (random cute stuff): 15 / 46, 33%
Audience arranged in ring around robotic arm spewing paint to music: 17 / 48, 35%
Android Pay everywhere line: 6 / 25, 24%
Engineering cinematic experiences in VR line: 15 / 62, 24%
Devtools on rails line: 24 / 82, 29%
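Point 2’s “surprisingly little variation” can be quantified with a chi-square test of homogeneity over these counts. A standard-library sketch (23.68 is the 5% critical value for chi-square with 14 degrees of freedom):

```python
# (women, total) for each of the 15 events listed above.
counts = [(65, 236), (25, 88), (21, 58), (10, 40), (8, 31),
          (33, 87), (13, 38), (6, 46), (4, 18), (5, 11),
          (15, 46), (17, 48), (6, 25), (15, 62), (24, 82)]

women = sum(w for w, _ in counts)
total = sum(n for _, n in counts)
rate = women / total  # overall fraction of women, about 0.29

# Pearson chi-square for a 2 x 15 table: compare each event's observed
# number of women to the number expected under one shared rate.
chi2 = sum((w - n * rate) ** 2 / (n * rate * (1 - rate)) for w, n in counts)

# 5% critical value for 15 - 1 = 14 degrees of freedom.
significant = chi2 > 23.68
```

In this sketch the statistic comes out around 16, below the critical value -- consistent with the caveat in point 2 that the variation may not even be statistically significant (though note some expected counts here are on the small side for a chi-square approximation).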

4 comments:

  1. For the stats, I might reach for a runs test: https://en.wikipedia.org/wiki/Wald–Wolfowitz_runs_test or a chi-square test (observed data is the number of ff, fm, mf, mm pairs, expected data is p(f)*p(f)*n, p(f)*p(m)*n, etcetc)

  2. I consistently love your blog, and this is no exception. Thanks for towing the social justice line on this machine learning stuff. One of my projects involves using public health injury data and police stop data to suggest new policing strategies that are less explicitly or implicitly racist and more life-saving. We're really going in that direction - where "You're a goddam wizard harry!" becomes a rallying cry, bringing on the fly analysis and decision making to core activities. Cool stuff - here's to hoping it's got more powerful of an impact than "my perrier drone is late."
    - Mike, a social justice / environmental epidemiologist from NC

  3. I'm guessing that women were well represented at the conference because (apparently from the way you ended up there), tickets were given out on merit, so the only bias is awareness and self-selection.

    I guess the judges of coding contests could know the genders of the authors, but I'm guessing gender-bias is dampened when someone is evaluating lots of code.
