Using Baselines for Algorithm Audits

Jennifer A. Stark, Ph.D.

Nicholas Diakopoulos, Ph.D.


Lab-Coat Science


Social Science

What we wanted to do:

Audit Google's main results page for biased representation of 2016 US presidential candidates Hillary Clinton & Donald Trump

  • Appearance of each candidate (gender bias?)
  • Political leaning of news source of the image (political bias?)
    Candidates' Emotions:

    Research shows that women in news media smile more than men.
    Representation of emotions:

  • Clinton's expressions are mostly happy
  • Trump's expressions are mostly neutral
  • Does that mean that Google is biased?

    News Sources

    Representation of political ideology

  • Clinton image sources mostly liberal
  • Trump image sources mostly liberal

    What's the big story?

  • "Google purposely depicts Clinton as happy more than her male counterpart"
    or
  • Clinton really is happier in general

  • "Google privileges images from liberal sites"
    or
  • Liberal sources are more prevalent*?

    * As indexed by Google

    We need an expectation...

    ... a B A S E L I N E ...

    A Google U N I V E R S E

    Using a Baseline

    Assumption: Google Images includes all of Google's indexed images (the image "universe").
    Representation of dominant emotion

    Automated image source political bias ratings

  • AllSides + Facebook

    + Manual ratings

  • MondoTimes + my assessment
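
    The two-stage rating process above could be sketched as a lookup that tries the automated ratings first and falls back to the manual ones. This is a minimal Python sketch, not the authors' code: the function names, the rating labels, the fallback order, and the example domains are all assumptions made for illustration.

```python
def rate_source(domain, automated, manual):
    """Return (rating, method) for a domain; rating is None when unrated."""
    if domain in automated:          # assumed first choice: automated ratings
        return automated[domain], "automated"
    if domain in manual:             # assumed fallback: manual ratings
        return manual[domain], "manual"
    return None, "unrated"           # some sources may stay unrated


def coverage(domains, automated, manual):
    """Share of domains that received any rating (0.0 for an empty list)."""
    if not domains:
        return 0.0
    rated = sum(1 for d in domains
                if rate_source(d, automated, manual)[0] is not None)
    return rated / len(domains)


# Hypothetical data for illustration only, not the study's ratings.
automated = {"left.example": "liberal", "right.example": "conservative"}
manual = {"center.example": "center"}
domains = ["left.example", "center.example", "unknown.example"]

for d in domains:
    print(d, rate_source(d, automated, manual))
print("coverage:", coverage(domains, automated, manual))
```

    Tracking the method and the coverage alongside each rating makes it easier to report how many sources could not be rated at all.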
    Representation of political ideology:

    What's the big story, now?

  • Representation of candidates' emotions in the image box ≠ Google's "universe".
  • Distribution of news sources across ideology in the image box ≠ Google's "universe".

  • Bias may have been introduced by Google's image selection algorithm.
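
    One way to make the "≠" comparisons above concrete is a chi-square goodness-of-fit statistic: the counts seen in the image box are compared with the counts expected if the image box followed the baseline proportions. The Python sketch below is an illustration under stated assumptions, not the authors' analysis: the function names are hypothetical, the label lists are made-up toy data, and the slides do not say which statistical test was actually used.

```python
from collections import Counter


def proportions(labels):
    """Return each label's share of the total as a dict."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def chi_square_vs_baseline(observed_labels, baseline_labels):
    """Chi-square goodness-of-fit statistic of the observed counts
    against the counts expected under the baseline proportions."""
    observed = Counter(observed_labels)
    baseline = proportions(baseline_labels)
    unseen = set(observed) - set(baseline)
    if unseen:
        # A label absent from the baseline has an expected count of zero,
        # so the statistic is undefined for it.
        raise ValueError(f"labels missing from baseline: {sorted(unseen)}")
    n = sum(observed.values())
    stat = 0.0
    for label, share in baseline.items():
        expected = n * share
        stat += (observed.get(label, 0) - expected) ** 2 / expected
    return stat


# Toy data for illustration only, not the study's results.
image_box = ["happy"] * 7 + ["neutral"] * 3
baseline = ["happy"] * 40 + ["neutral"] * 60

print(proportions(image_box))
print(proportions(baseline))
print(chi_square_vs_baseline(image_box, baseline))
```

    A larger statistic means the image box departs further from the baseline. Judging significance would additionally require comparing the statistic against a chi-square distribution with (number of categories - 1) degrees of freedom.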
    Limitations

  • Emotion: Bias may also be introduced by photographers or editors
  • Emotion: Bias may also be introduced by the Microsoft Emotion API
  • Emotion: Not all baseline images were emotion-rated.
  • Sources: Not all baseline sources were bias-rated / rate-able.
  • Sources: Not all sources were represented in the baseline.
    Thank you!


    Conference abstract:

    "Using Baselines for Algorithm Audits", pg8

    Code on GitHub: