Deep learning systems, which are the most headline-grabbing examples of the AI revolution—beating the best human chess and poker players, self-driving cars, etc.—impress us so very much in part because they are inscrutable. Not even the designers of these systems know exactly why they make the decisions they make. We only know that they are capable of being highly accurate…on average.
Meanwhile, software companies are developing complex systems for business and government that rely on “secret sauce” proprietary data and AI models. In order to protect their intellectual property rights, and profitability, the developers of these systems typically decline to reveal how exactly their systems work. This gives rise to a tradeoff between profit motive, which enables rapid innovation (something government in particular isn’t known for), and transparency, which enables detection and correction of mistakes and biases. And mistakes do occur…on average.

On the one hand, a lack of transparency in deep learning and proprietary AI models has drawn criticism from a number of sources. Organizations like AI Now and ProPublica are surfacing circumstances where a lack of transparency leads to abuses such as discriminatory bias. The EU has instituted regulations (namely GDPR) that guarantee its citizens the right to appeal to a human being when decisions are made by AI. And, last but not least, there is growing awareness that AI systems—including autonomous driving and health care systems—can be invisibly manipulated by those with motives ranging from fraud to simple mischief.
On the other hand, AI proponents oppose virtually any restrictions on the use of AI, arguing that such restrictions inhibit innovation, reduce competitiveness on a global scale, and deny consumers the benefits of the increased accuracy that AI can deliver in areas such as medical diagnosis.
On the third hand, over the past few months I’ve seen a smattering of posts on Twitter from some incredibly smart PhD-carrying data scientists saying, in effect: machine learning just works. So just use it. Don’t worry about how it works. In fact, if you think we have to ask why a machine learning model arrived at a particular decision, then you shouldn’t be participating in the conversation about AI at all—because you lack the capacity to understand or contribute to the conversation.
So why (or when) should we require reasons for AI decisions? To put this issue in a more intuitive frame, I recommend taking a closer look at the legal perspective on AI-based decision making. This may sound counterintuitive. After all, what do lawyers know about AI? And since when have lawyers been known for making anything clearer, rather than more complicated? Let me answer this way: what the legal profession does know about—as the inheritor of thousands of years of human cultural values surrounding decision making—is what human societies consider important about decision making. Although far from perfect, the legal perspective provides a nuanced window into what our society is likely to accept, over the long run, from machine-powered decision making systems.
A February 2018 panel discussion at NYU (now available on YouTube) delivered a strong dose of legal wisdom on this topic. Entitled “Accountability in the Age of Artificial Intelligence”, the panel included Mariano-Florentino Cuéllar, Justice of the California Supreme Court; Rashida Richardson, legislative counsel at the NYCLU; and Rachel Goodman, staff attorney at the ACLU’s Racial Justice Program; with moderator Jason Schultz, Professor of Clinical Law at the NYU School of Law and research lead for law and policy at the AI Now Institute at NYU.
Here are a few of the concepts shared by the panelists that really stood out. Please keep in mind that while the panel primarily talked about government decision making, the same principles of fairness apply—at least in customers’ minds, if not via consumer protection, public accommodation, and fair employment laws—to business decisions.
• Rashida Richardson expressed concern (at about minute 19) that although government officials are hopefully thinking about due process, they are under increasing pressure to make rapid decisions, which might influence them to adopt automated decision systems that have not been fully vetted, leading to undesirable consequences. (The same could be said of business users of AI systems, of course.)
• Rachel Goodman pointed out (at about minute 25) that it would be a “profound shift” to go from government adjudication of your rights as an individual, on a case-by-case basis, to allowing governments to make decisions about you based on your statistical profile—treating you and the people you most closely resemble exactly the same, without room for considering your unique circumstances.
“[With human decisions] there’s a sense that there’ll be a kind of correction that can happen if things have gone off the rails and a very bad result occurs.”
• Jason Schultz pointed out (at about minute 26:40) that we’re accustomed to holding humans accountable for their decisions, and to appealing human decisions to a higher authority, but we don’t necessarily have equivalent control mechanisms for an AI. He said: “When we have due process in the court system, one of the things that is required is that when a decision—an adjudication—is made, that it’s not just the answer, it’s the answer and why to some degree, right? Maybe not as much as we want, maybe it’s not in the form we want, but some sort of explanation. And then on some level maybe that decision gets appealed, right? So maybe at some point the Supreme Court gets to review it. And there’s a sense that there’ll be a kind of correction that can happen if things have gone off the rails and a very bad result occurs. And so when we start to look at these sorts of questions around computational adjudication, right, I think it raises other questions because the humans in the systems of government, the humans in the system of the court, we have some sense that they can be held accountable. Maybe politically, maybe in the press, maybe actually legally.”
“We have different layers of review, and it’s the forced reason-giving and ability to challenge those reasons that I think begins to sound in the key of what we mean when we say ‘accountability’.”
• Concerning the purposes for requiring legal decision makers to include reasons with their decisions, Justice Cuéllar said (at about minute 29): “So, think about the way adjudication plays out in our court system. We don’t have a trial court judge have the final word on something, although sometimes they should because they’re very smart. We have different layers of review, and it’s the forced reason-giving and ability to challenge those reasons that I think begins to sound in the key of what we mean when we say ‘accountability’. I’ll also add a reference to the jury. [It’s not that] juries are smarter than judges, or jurors always know what the right reasons are to explain their behavior, it’s something about constraining concentrated power with members of the community that are sort of forced to pass judgment on certain aspects of the argument and give it a certain kind of legitimacy….”
“[I]deally we ought not to deprive ourselves completely of the benefit of some of these new techniques that help us spot things that even really thoughtful humans in conversation with each other are unable to spot. So I think in a way what we ought to aim for is the proper level of trust and distrust.”
• Justice Cuéllar later said (at about minute 42:20): “If we expect there is some degree of dialog, back and forth…with the affected public [as is required for many government administrative decisions], we have to have enough clarity about what it is that is actually driving the decision. And at the same time ideally we ought not to deprive ourselves completely of the benefit of some of these new techniques that help us spot things that even really thoughtful humans in conversation with each other are unable to spot. So I think in a way what we ought to aim for is the proper level of trust and distrust.” He went on to cite the infamous Stanley Milgram experiment, in which subjects obediently administered what they believed were life-threatening electric shocks to strangers because authority figures in white lab coats, looking like doctors or scientists, ordered them to. He then concluded: “The risk of over-relying on something that seems too scientific and technically precise is very real, I think.”
• Finally, on the subject of decision making transparency, or interpretability, Rachel Goodman called out (at about minute 43:40) an “insane” but all-too-common paradox of secret sauce decision making, wherein decision makers may refuse to tell people how they can get a good score on a ratings system, presumably to make it harder for people to game the system…while simultaneously depriving people of the knowledge they need to do what they’re supposed to be doing in the first place.
I’ll offer a couple of examples to illustrate this last point: People with low credit scores would benefit from being told “you can change these things to get better credit”. And people on probation following a prison term would like to know that “you can get an early discharge of your probation as long as you don’t do these things”. Can such evaluations be considered fair if the people being judged aren’t allowed to know what “these things” are?
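To make the credit example concrete, here is a minimal sketch (purely illustrative, and not anything discussed on the panel) of what surfacing “these things” could look like: a simple, interpretable model whose per-applicant feature contributions can be read back to the person being scored as plain-language reason codes. The feature names, data, and applicant below are all invented for illustration.

```python
# Hypothetical illustration only: an interpretable "reason code" readout for a
# simple credit-approval model. Feature names, data, and the applicant are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
features = ["utilization_pct", "late_payments", "account_age_years", "hard_inquiries"]

# Synthetic training data: approval is more likely with low utilization,
# few late payments, older accounts, and few recent inquiries.
X = rng.normal(size=(500, 4)) * [30.0, 2.0, 5.0, 3.0] + [40.0, 1.0, 8.0, 2.0]
y = (X @ [-0.03, -0.8, 0.15, -0.3] + rng.normal(scale=0.5, size=500) > -1.4).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)

# Per-applicant contribution of each feature to the log-odds of approval,
# measured relative to the average applicant -- a crude set of "reason codes".
applicant = np.array([85.0, 3.0, 2.0, 6.0])
contributions = model.coef_[0] * (applicant - X.mean(axis=0))
for name, c in sorted(zip(features, contributions), key=lambda t: t[1]):
    direction = "hurting" if c < 0 else "helping"
    print(f"{name:>20}: {direction} your approval odds ({c:+.2f} log-odds)")
```

The point isn’t this particular model. The point is that the same information used to score someone can, in principle, be turned into a concrete, actionable list of what to change, which is exactly the knowledge Goodman argues people are too often denied.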
I, for one, think that decisions involving fundamental justice must include an appeal to a human decision-maker who can reconsider the decision entirely, without being allowed to simply reiterate an AI-generated decision. In other words, I like the EU’s human-in-the-loop requirement embodied in GDPR. Yes, this is potentially far less efficient than purely automated decision-making. But our culture isn’t ready to hand over justice (as we see it) to machines. At least not yet. By way of example, soon autonomous vehicles will indirectly hold life-or-death power, without human oversight. But delivering death is not their purpose. Their purpose is to save lives by improving driving quality overall. So I’m pretty OK with the lack of human oversight in this scenario, once a more-than-human level of safety is achieved. In contrast, criminal trials and death sentences should be left to humans, as informed by (but not ultimately controlled by) the best available technology, regardless of the “accuracy” of the AI systems aiding in such decisions.
Where the purpose is justice, a human-to-human chain of accountability is critical if decisions are to hold legitimacy. Drawing the line between decisions that “must be appealable to a human” and those that “do not have to be appealable” will often be more challenging. But where there is no monopoly (as with government and certain services) and a competitive market exists, best practices for customer experience already dictate access to sympathetic and empowered customer service representatives (don’t MAKE me ask to speak with a supervisor—right?) who have the power to correct AI “mistakes”, at least from the customer’s perspective.
So while we have our work cut out for us going forward, I think an answer to the question “when is transparency needed?” is closer than we may think.