Using Baselines for Algorithm Audits
Jennifer A. Stark, Ph.D.
Nicholas Diakopoulos, Ph.D.
Science!
Lab-Coat Science
versus
Social Science
What we wanted to do:
Audit Google's main results page for biased representation of 2016 US presidential candidates
Hillary Clinton
& Donald Trump
Appearance of each candidate (gender bias?)
Political leaning of news source of the image (political bias?)
Candidates' Emotions:
Research shows that women in news media smile more than men.
Representation of emotions:
Candidates' Emotions:
Clinton's expressions are mostly happy
Does that mean that Google is biased?
Candidates' Emotions:
Trump's expressions are mostly neutral
Does that mean that Google is biased?
Representation of political ideology
News Sources
Clinton image sources mostly liberal
Trump image sources mostly liberal
What's the big story?
"Google purposely depicts Clinton as happy more than her male counterpart"
or
Clinton really is happier in general?
"Google privileges images from liberal sites"
or
Liberal sources are more prevalent*?
* As indexed by Google
We need an expectation...
... a B A S E L I N E ...
A Google U N I V E R S E
Using a Baseline
We assumed Google Images includes all of Google's indexed images: the image universe!
Representation of dominant emotion
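One way to read "dominant emotion" is the label with the highest score for each face, tallied across all images. The sketch below assumes each image has already been scored as a dictionary of emotion labels to confidences; the function names and the scores are hypothetical illustrations, not output from the Microsoft Emotion API or data from the study.

```python
from collections import Counter


def dominant_emotion(scores):
    """Return the emotion label with the highest score for one face."""
    return max(scores, key=scores.get)


def emotion_distribution(faces):
    """Return the share of faces per dominant emotion."""
    counts = Counter(dominant_emotion(scores) for scores in faces)
    total = sum(counts.values())
    return {emotion: n / total for emotion, n in counts.items()}


# Hypothetical per-face scores, made up for illustration only.
faces = [
    {"happiness": 0.90, "neutral": 0.08, "anger": 0.02},
    {"happiness": 0.20, "neutral": 0.75, "anger": 0.05},
    {"happiness": 0.60, "neutral": 0.30, "anger": 0.10},
    {"happiness": 0.10, "neutral": 0.85, "anger": 0.05},
]

distribution = emotion_distribution(faces)
print(distribution)
```

Running the same tally on the image box and on the baseline gives two distributions that can be compared side by side.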
Automated image source political bias ratings
Allsides + Facebook
+ Manual ratings
MondoTimes + my assessment
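Combining automated and manual ratings could look like a simple lookup with a fallback. This is a minimal sketch under stated assumptions: the slides do not say which rating took precedence, so checking the automated table first is an assumption, and the domains, labels, and the `rate_source` helper are hypothetical.

```python
def rate_source(domain, automated, manual):
    """Return a political-leaning label for a domain, or None if unrated.

    Assumption: automated ratings are consulted first and manual ratings
    fill the gaps. The original precedence is not stated in the slides.
    """
    if domain in automated:
        return automated[domain]
    return manual.get(domain)


# Hypothetical rating tables, made up for illustration only.
automated = {
    "example-left.com": "liberal",
    "example-right.com": "conservative",
}
manual = {"example-center.com": "center"}

sources = [
    "example-left.com",
    "example-center.com",
    "example-unknown.com",
]

ratings = {domain: rate_source(domain, automated, manual) for domain in sources}
unrated = [domain for domain, label in ratings.items() if label is None]
print(ratings)
print(unrated)
```

Keeping the unrated sources in a separate list makes it easy to report how much of the sample could not be rated.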
Representation of political ideology:
What's the big story, now?
Representation of candidates' emotions in the image box ≠ Google's "universe".
Distribution of news sources across ideology in the image box ≠ Google's "universe".
Bias may have been introduced by Google's image selection algorithm.
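A comparison against the baseline can be made concrete with a goodness-of-fit statistic: treat the baseline shares as the expectation and ask how far the image-box counts depart from it. The slides do not say which test was used, so the chi-square statistic below is one common option rather than the authors' method, and all counts and shares are hypothetical.

```python
def chi_square_statistic(observed, baseline_share):
    """Chi-square goodness-of-fit statistic.

    observed: category -> count seen in the image box.
    baseline_share: category -> proportion in the baseline (sums to 1).
    """
    total = sum(observed.values())
    statistic = 0.0
    for category, share in baseline_share.items():
        expected = total * share
        statistic += (observed.get(category, 0) - expected) ** 2 / expected
    return statistic


# Hypothetical numbers, made up for illustration only.
observed = {"liberal": 30, "center": 15, "conservative": 5}
baseline_share = {"liberal": 0.5, "center": 0.3, "conservative": 0.2}

statistic = chi_square_statistic(observed, baseline_share)

# Critical value of the chi-square distribution for 2 degrees of freedom
# (3 categories minus 1) at the 0.05 significance level.
CRITICAL_VALUE_DF2 = 5.991
differs_from_baseline = statistic > CRITICAL_VALUE_DF2
print(statistic, differs_from_baseline)
```

With these made-up counts the statistic is 3.5, below the critical value, so the image box would not be distinguishable from the baseline. Only a statistic above the threshold would support a claim that the selection differs from Google's "universe".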
Limitations
Emotion: Bias may also be introduced by photographers or editors
Emotion: Bias may also be introduced by the Microsoft Emotion API
Emotion: Not all baseline images were emotion-rated.
Sources: Not all baseline sources were bias-rated / rate-able.
Sources: Not all sources were represented in the baseline.