Amazon’s gender-biased recruiting software is a wake-up call

The recent news that Amazon inadvertently created gender-biased software for screening job applicants is a significant wake-up call for all organizations using AI. The software, which used machine learning to rank incoming resumes by comparison to resumes from people Amazon had already hired, could have discouraged recruiters from hiring women solely on the basis of their gender. Amazon, of all entities, should have known better. It should have expected and avoided this. If this can happen to Amazon, the question we really need to ask is: how many others are making the same mistake?

Photo by Rodion Kutsaev on Unsplash

Bias in hiring is a burden for our society as a whole, for tech companies in particular, and for Amazon specifically. Biased recruiting software exposes Amazon to a number of risks, among them:

  • If it hires less qualified male employees after excluding more qualified women, Amazon reduces the quality of its own workforce.
  • By employing fewer women, Amazon becomes less attractive to potential hires who prefer gender-diverse teams. (Anecdotally, a friend of mine who recently interviewed with Amazon told me it was a big red flag when all seven of their interviewers turned out to be male.)
  • Amazon could lose customers who value gender diversity and prefer to patronize more gender-diverse companies.
  • Less diverse teams frequently struggle to develop products that “work” properly for customers who don’t resemble themselves.
  • According to McKinsey research, companies that rank low in workforce diversity achieve poorer financial results than companies with higher diversity.
  • While there’s no indication that Amazon broke any laws here (and it claims that the software was only used on a trial basis), each new story like this lends ammunition to advocates of increased government regulation of AI, which Amazon might prefer to avoid rather than encourage.

What pushes this over the edge from ugly to outrageous is the fact that it was completely avoidable. To anyone familiar with both machine learning and bias in hiring, the very premise of this software is flawed on its face. Think about it. “Let’s hire people who resemble the people we’ve already hired (who we think are doing a great job)! What could possibly go wrong?” Well, if you employ mostly men, and you try to hire people who most closely resemble the people you already employ, you’ll be hiring mostly men.

But doesn’t software, and machine learning in particular, magically eliminate bias by focusing exclusively on data, excluding those messy human emotions and beliefs? Quite the opposite: machine learning is notorious for amplifying human biases. Its strength lies in its ability to discover and distill patterns from seemingly disorganized data points, and that includes the ability to discover and enshrine biases in hiring data. Historically, did Amazon or its recruiters explicitly intend to hire primarily men? Probably not, at least not consciously. But hiring mostly men is, in fact, what they did. And when the software discovered, among other patterns, that Amazon was hiring mostly men, it responded as designed by weeding out resumes that appeared to be from women (including resumes touting experiences like “women’s chess club captain”).
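To make that mechanism concrete, here is a minimal sketch in plain Python. It is not Amazon’s model, and every resume in it is invented for illustration. It assumes a toy scorer that weights each resume keyword by how much more often it appeared among past hires than among past rejections. Nothing in the code mentions gender, yet a keyword like "women's" ends up with a negative weight simply because the (hypothetical) history contains few hires with that keyword.

```python
import math
from collections import Counter

# Hypothetical, hand-made hiring history: (resume keywords, was the applicant hired?).
# The data is invented; it only mimics a company that has hired mostly men.
HISTORY = [
    ({"python", "aws", "chess"}, True),
    ({"java", "aws"}, True),
    ({"python", "java", "chess"}, True),
    ({"python", "aws"}, True),
    ({"python", "java", "women's"}, True),
    ({"python", "aws", "women's", "chess"}, False),
    ({"java", "women's"}, False),
    ({"java", "aws", "women's"}, False),
    ({"excel"}, False),
]


def token_weights(history):
    """Smoothed log-odds of each keyword appearing among hires vs. rejections."""
    hired, rejected = Counter(), Counter()
    n_hired = n_rejected = 0
    for keywords, was_hired in history:
        if was_hired:
            hired.update(keywords)
            n_hired += 1
        else:
            rejected.update(keywords)
            n_rejected += 1

    weights = {}
    for token in set(hired) | set(rejected):
        # Add-one smoothing so unseen combinations never divide by zero.
        p_hired = (hired[token] + 1) / (n_hired + 2)
        p_rejected = (rejected[token] + 1) / (n_rejected + 2)
        weights[token] = math.log(p_hired / p_rejected)
    return weights


def score(keywords, weights):
    """Higher score = 'looks more like the people we already hired'."""
    return sum(weights.get(token, 0.0) for token in keywords)


WEIGHTS = token_weights(HISTORY)

# Two resumes that are identical except for one gendered keyword.
without_club = score({"python", "chess"}, WEIGHTS)
with_club = score({"python", "chess", "women's"}, WEIGHTS)

print(f"weight learned for \"women's\": {WEIGHTS[chr(34) + 'women' + chr(39) + 's' + chr(34)] if False else WEIGHTS['women' + chr(39) + 's']:.3f}")
print(f"score without the keyword: {without_club:.3f}")
print(f"score with the keyword:    {with_club:.3f}")
```

The two test resumes differ by a single keyword, so the gap between their scores is exactly the weight the scorer learned for that keyword. The pattern came entirely from the historical outcomes it was trained on.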

The good news is that Amazon may have anticipated this would happen—as it should have, given the combination of high stakes and Amazon’s considerable machine learning expertise.

The fact that Amazon apparently didn’t move this software out of trial mode, and discovered the software’s gender bias the year after work on the software began, suggests that it was aware of the potential for bias and took steps to mitigate it. However, it also appears that Amazon didn’t actually abandon the project for another two years, and in fact, didn’t literally “abandon” it at all. A version of the software is still in use for limited purposes, and a new effort is being launched to achieve the original objective.

The best news is that Amazon claims the software “was never used by its recruiters to evaluate candidates”… although it did not dispute that recruiters did look at the software’s recommendations. So it’s not entirely clear that no recruiters were influenced in any way by the software’s bias.

For the sake of argument, let’s assume Amazon anticipated that bias would be an issue and took appropriate precautions from the outset. But imagine for a moment what could have happened at a company walking this path without caution. Let’s call this company Mississippi, or “Miss” for short. Miss decided (quite correctly) that historical recruiting data could be used to train a machine learning model that recommends the applicant resumes most like the resumes of its high-performing employees. So it assembled a bunch of super talented, well-intentioned (not consciously gender-biased) software developers, gave them this technical task to accomplish, then had them “throw it over the wall” to HR. The software developers, who weren’t familiar with how bias in hiring works (not their area of expertise), unwittingly built bias into their end product by using biased data to train their machine learning model. Meanwhile, Miss’s recruiters, who didn’t understand how machine learning can amplify bias, relied on the software developers to create unbiased code. This was simply a knowledge gap that could have and should have been closed, but wasn’t. The result of this effort would be software like the software Amazon developed, inflicting all of the harms discussed above with respect to Amazon. And Miss would be completely unaware of it.

If only because eliminating gender bias in hiring is so important, Miss should have included one critical step when developing this software. To keep bias out of the software from its inception, or else to pursue a different approach that avoids bias, Miss should have closed the knowledge gap between the creators and intended users of the software by facilitating conversations about machine learning and bias between the teams involved in every aspect of the project. Somebody with data literacy and recruiting domain knowledge should have sat down with these teams to align them around shared facts and goals. Had this happened, they would have quickly realized the potential for bias, and either shut the project down before it started, or limited its scope, put appropriate bias testing in place, and/or switched to an interpretable machine learning solution (by deploying an interpretable model from inception, or by adding an interpretability module like LIME) that would reveal biases. If Miss had done so, it could have avoided the cost of developing and rolling out unsuitable software, as well as the costs it unwittingly inflicted on itself.
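What might “appropriate bias testing” look like in practice? One simple option, sketched below, is to compare how often the software shortlists applicants from each group. The threshold used here is the “four-fifths” rule of thumb from US employee-selection guidelines; that choice is my assumption for illustration, not something Amazon is reported to have used, and it is a screening heuristic rather than a legal determination. The function names and the shortlist data are hypothetical, and the check presumes the audit team has group labels for a test batch of applicants.

```python
from collections import defaultdict

# Assumed threshold: the "four-fifths" rule of thumb (lowest rate / highest rate >= 0.8).
FOUR_FIFTHS = 0.8


def selection_rates(decisions):
    """decisions: iterable of (group, was_shortlisted) -> {group: shortlist rate}."""
    totals = defaultdict(int)
    chosen = defaultdict(int)
    for group, was_shortlisted in decisions:
        totals[group] += 1
        chosen[group] += int(was_shortlisted)
    return {group: chosen[group] / totals[group] for group in totals}


def adverse_impact_ratio(rates):
    """Lowest group selection rate divided by the highest one."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was shortlisted, so there is no disparity to measure
    return min(rates.values()) / highest


# Hypothetical audit batch: the software's shortlist decisions, labeled by group.
AUDIT_BATCH = (
    [("men", True)] * 30 + [("men", False)] * 70
    + [("women", True)] * 6 + [("women", False)] * 34
)

rates = selection_rates(AUDIT_BATCH)
ratio = adverse_impact_ratio(rates)
needs_review = ratio < FOUR_FIFTHS

print("shortlist rates:", rates)
print(f"adverse impact ratio: {ratio:.2f}")
print("flag for human review:", needs_review)
```

A check like this does not explain why the disparity exists. It only tells the data team and the recruiters that they need to sit down together and look, and it is cheap enough to run on every retrained model before anyone sees its recommendations.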

Is Amazon unusually “evil” for doing what it did here? I don’t think so. All organizations using machine learning and AI are facing these challenges today. It’s just surprisingly… sloppy, given Amazon’s combination of size (which gives this mistake weight) and resources (which give it the means to avoid such missteps). To Amazon’s credit, it says no one relied on this software exclusively to select candidates. It did ultimately detect the issue, and discontinued using the software. And no doubt it will have learned from this experience to take a more proactive approach going forward.

It’s also important to recognize that the only reason we’re hearing about Amazon’s experience now is that a group of Amazon employees reached out to journalists with this story. Other companies are without a doubt using AI to evaluate resumes as well (LinkedIn immediately comes to mind). And they may all have the same problem Amazon had. Or worse. Because the problem isn’t creating a resume screening engine with machine learning. That’s a relatively elementary project at this point (particularly if you have as much training data as Amazon). The problem is creating such an engine without building in bias. So what’s to be done?

The bottom line is that organizations of all sizes need to align their teams around understanding the data they are using, including where it came from, and how it is being used, then keep those teams aligned. This alignment must include a continuing meeting of minds between the teams who are obtaining, processing, and consuming data, facilitated by people who are both data literate and domain experts.


Reach out to me if you have stories about machine learning bias you want to share, or if you want to talk about digital ethics and what you can do to protect your organization from making preventable errors.
