Using Baselines for Algorithm Audits

Jennifer A. Stark, Ph.D.

Nicholas Diakopoulos, Ph.D.

Science!

Lab-Coat Science

versus

Social Science

What we wanted to do:

Audit Google's main results page for biased representation of the 2016 US presidential candidates Hillary Clinton & Donald Trump

Appearance of each candidate (gender bias?)

Political leaning of news source of the image (political bias?)

Candidates' Emotions:

Research shows that women in news media smile more than men.

Representation of emotions:

Candidates' Emotions:

Clinton's expressions are mostly happy

Does that mean that Google is biased?

Candidates' Emotions:

Trump's expressions are mostly neutral

Does that mean that Google is biased?

News Sources

Representation of political ideology

News Sources

Clinton image sources mostly liberal

Trump image sources mostly liberal

What's the big story?

"Google purposely depicts Clinton as happy more than her male counterpart"

Clinton really is happier in general?

"Google privileges images from liberal sites"

Liberal sources are more prevalent*?

* As indexed by Google

We need an expectation...

... a B A S E L I N E ...

A Google U N I V E R S E

Using a Baseline

We assumed Google Images includes all of Google's indexed images: the image universe!

Representation of dominant emotion
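
A minimal sketch of what "dominant emotion" could mean in practice, assuming each image comes back with one score per emotion. The function names, emotion labels, and scores below are hypothetical placeholders for illustration, not data from the audit: take the highest-scoring emotion per image, then tally the share of images for each label.

```python
from collections import Counter

def dominant_emotion(scores):
    """Return the emotion label with the highest score for one image."""
    return max(scores, key=scores.get)

def emotion_shares(images):
    """Proportion of images whose dominant emotion is each label."""
    counts = Counter(dominant_emotion(s) for s in images)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Hypothetical per-image scores (illustration only, not audit data).
example_images = [
    {"happiness": 0.90, "neutral": 0.08, "anger": 0.02},
    {"happiness": 0.20, "neutral": 0.75, "anger": 0.05},
    {"happiness": 0.60, "neutral": 0.30, "anger": 0.10},
]
shares = emotion_shares(example_images)
```

The same tally can be run once on the image box and once on the baseline set, which gives two distributions to compare.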

Automated image source political bias ratings

Allsides + Facebook

+ Manual ratings

MondoTimes + my assessment
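
One way the two rating routes could be combined, sketched under assumptions: try the automated lookup first, fall back to the manual one, and keep track of sources that neither covers. The dictionaries, domain names, and `rate_source` helper are hypothetical; the actual rating pipeline may differ.

```python
def rate_source(domain, automated, manual):
    """Return (rating, origin) for a source, preferring automated ratings."""
    if domain in automated:
        return automated[domain], "automated"
    if domain in manual:
        return manual[domain], "manual"
    return None, "unrated"

# Hypothetical lookup tables (illustration only).
automated_ratings = {"example-left.com": "liberal"}
manual_ratings = {"example-right.com": "conservative"}

domains = ["example-left.com", "example-right.com", "example-unknown.com"]
rated = {d: rate_source(d, automated_ratings, manual_ratings) for d in domains}

# Sources with no rating from either route are reported, not silently dropped.
unrated = [d for d, (label, _) in rated.items() if label is None]
```

Keeping the origin of each rating and the list of unrated sources makes it easier to report how much of the sample could not be rated.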

Representation of political ideology:

What's the big story, now?

Representation of candidates' emotions in the image box ≠ Google's "universe".

Distribution of news sources across ideology in the image box ≠ Google's "universe".

Bias may have been introduced by Google's image selection algorithm.
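
The "≠" comparisons above can be made concrete with a goodness-of-fit check: treat the baseline shares as the expectation and ask how far the image box counts depart from it. The sketch below uses a chi-square statistic as one common choice; it is not necessarily the test used in the audit, and the counts and shares are hypothetical numbers chosen for illustration.

```python
def chi_square_statistic(observed, baseline_shares):
    """Chi-square goodness-of-fit statistic of observed counts vs. baseline shares."""
    total = sum(observed.values())
    stat = 0.0
    for label, share in baseline_shares.items():
        expected = share * total  # count expected if the image box mirrored the baseline
        stat += (observed.get(label, 0) - expected) ** 2 / expected
    return stat

# Hypothetical image box counts and baseline shares (not audit data).
image_box = {"happy": 30, "neutral": 15, "other": 5}
baseline = {"happy": 0.40, "neutral": 0.40, "other": 0.20}

stat = chi_square_statistic(image_box, baseline)

# Critical value of the chi-square distribution, 2 degrees of freedom, alpha = 0.05.
CRITICAL_DF2_05 = 5.991
differs_from_baseline = stat > CRITICAL_DF2_05
```

A statistic above the critical value says the image box distribution is unlikely to be a random draw from the baseline; it does not say where the difference was introduced.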

Limitations

Emotion: Bias may also be introduced by photographers or editors

Emotion: Bias may also be introduced by the Microsoft Emotion API

Emotion: Not all baseline images were emotion-rated.

Sources: Not all baseline sources were bias-rated / rate-able.

Sources: Not all sources were represented in the baseline.

Thank you!

@_JAStark

jastark@protonmail.com

Conference abstract:

"Using Baselines for Algorithm Audits", p. 8

Code on GitHub:

comp-journalism/Baseline_Problem_for_Algorithm_Audits