What’s so important about interpretability in machine learning?
It’s a poorly kept secret that we lack insight into how complex machine learning models like neural networks make decisions. We see the data that goes in, the math that goes in, and the results that come out. But in the middle, where we want to see a chain of reasoning like a human could give us to explain decisions, there’s only a black box. Neither data scientists nor these complex machine learning models can provide insight into “why” a model chose output A rather than output B.
What does it matter whether we have an understandable explanation for why a machine learning model delivers a specific result? For example, when diagnosing whether or not a patient has cancer, isn’t it enough that the model is accurate, according to rigorous testing? I’ll look deeper into the implications of interpretability in future blog posts. But for now, the simplest answer to this question is: it’s a matter of trust (as Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin so elegantly point out).
Even the best machine learning models are less than 100% accurate. Meanwhile, the concept of accuracy in data science can wind up being a loaded deck. Accuracy is typically measured by comparing the output of a machine learning model against a held-out subset of the same data set it was trained on, rather than using real-world conditions as a benchmark. So, any systematic bias in that data set is incorporated into the model invisibly, and the accuracy score won’t reveal it. And sometimes machine learning models fall down a rabbit hole and output decisions that obviously make no sense when presented to a person. What if our own cancer diagnosis is one of those decisions?
Then, why should a customer—or an expert, or an employer, or an investor backing a machine learning-based solution—take the risk of betting on machine learning, if the reasoning behind recommendations can’t be explained, even by the data scientists who built the model?
In response, some organizations have settled on using only simpler data analysis models, like linear regression and decision trees, so that decisions made by their models can always be understood by humans. When researcher Brendan Frey was recently asked about the merits of this simplified approach, he pointed out that to say “you’ve got to look at the parameters, so therefore you have to use a simple system” is a fallacy. Not only does this approach deprive us of the incredible power of machine learning, but it requires far more detail than what we would ask a human to provide if they were making the same decision. As Frey says, you don’t have to crack a human’s head open to inspect the neurons they used to make a decision. You simply ask them to provide an explanation. By the same token, Frey envisions an extension of current machine learning output to provide explanations for each decision:
“I think the future of machine learning is all about using complex deep neural nets, but training them in such a way that they actually produce an explanation at the output. So we don’t crack them open and look at the parameters, we actually train the system so that the output of the neural network is an explanation as well as a decision….You can think of this as a multi-task training problem where one output is the decision and the other output is the explanation….There are obviously some really challenging technical issues for how we get that to work, and it’s not really working well yet, but I think that is really where things will go in that regard.”
Along these lines, machine learning researchers Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin have developed a method they call LIME (Local Interpretable Model-Agnostic Explanations) to explain the decisions of machine learning models. They designed LIME to be able to explain any machine learning model, hence the “Model-Agnostic” part of the acronym. LIME builds what is essentially a series of faithful, or closely tracking, approximations of the machine learning model. These approximations are “Local” in the sense that each approximation is based on the specific region of the global model nearest each decision. (Here, visualize a plot of data points. Instead of running an analysis of the entire plot, draw a circle around a group of points, and run an analysis on just that subset.)
The math is a bit beyond my pay grade, but in essence, for each decision made by the machine learning model, LIME polls the model with slightly varied versions of the input to generate a local data set of decisions that are similar to the decision in question. Then it fits a simple (“Interpretable”) model to that local data set. It’s like creating a mini-model that analyzes the output of the complex model for each decision. Testing LIME demonstrates that these mini-models are surprisingly useful for explaining a complex machine learning model’s decisions. And insights provided by LIME can be used to reveal and remedy persistent flaws in machine learning models.
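To make the “mini-model” idea a little more concrete, here is a rough sketch in plain Python. This is not the researchers’ implementation or the API of their LIME library. The black_box function, the explain_locally helper, and every parameter value (sample count, spread, kernel width) are assumptions I made up for illustration. The sketch samples points near one input, asks the model for its output at each point, weights the samples by how close they are to the original input, and fits a weighted straight-line model to them.

```python
import math
import random


def black_box(x):
    """Hypothetical stand-in for a complex model: nonlinear in x[0], ignores x[1]."""
    return x[0] ** 2


def solve(a, b):
    """Solve the small linear system a * beta = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta


def explain_locally(model, x, n_samples=500, spread=0.1, kernel_width=0.25, seed=0):
    """Fit a proximity-weighted linear model around x; return one weight per feature."""
    rng = random.Random(seed)
    d = len(x)
    xtwx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xtwy = [0.0] * (d + 1)
    for _ in range(n_samples):
        # Perturb the input slightly and poll the black-box model.
        offset = [rng.gauss(0.0, spread) for _ in range(d)]
        z = [xi + oi for xi, oi in zip(x, offset)]
        y = model(z)
        # Samples closer to x count more (an assumed exponential kernel).
        dist2 = sum(oi ** 2 for oi in offset)
        w = math.exp(-dist2 / kernel_width ** 2)
        row = [1.0] + offset  # intercept term plus centered features
        for i in range(d + 1):
            xtwy[i] += w * row[i] * y
            for j in range(d + 1):
                xtwx[i][j] += w * row[i] * row[j]
    beta = solve(xtwx, xtwy)
    return beta[1:]  # drop the intercept, keep per-feature local weights


weights = explain_locally(black_box, [2.0, 0.0])
print(weights)
```

For the input [2.0, 0.0], the first weight should land close to 4 (the local slope of x[0] squared at 2) and the second close to 0, which says that near this input only the first feature drives the model’s output. The real method adds more machinery, such as interpretable representations of the input and a limit on how many features the explanation may use.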
For example: Ribeiro, Singh, and Guestrin trained an ML model to decide whether photos showed wolves or huskies (a breed of dog that looks similar to a wolf), using a training set of photos showing both huskies and wolves. Then they used LIME to query the model to determine which areas (patches of contiguous, similarly colored pixels) of a particular photo were most significant for the model’s decision. LIME revealed that the ML model had learned, quite correctly based on the training data provided, that the presence of snow in the background of photos was determinative for deciding that there were wolves in the photo. Lo and behold, it turns out that the training data set used to create the model had pictures of wolves with snow in the background, while the pictures of huskies did not have snow in the background. (As Marco later said: “We’ve actually built a great snow detector.”) LIME showed that the model made a correct decision for the wrong reason. In other words, while the model worked great with its own training data, it wouldn’t be trustworthy in the real world.
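The snow problem is easy to reproduce with a toy example. The sketch below uses made-up data, not the researchers’ photos or model. The feature names (snow, wolf_like), the row counts, and the one-feature “model” are all hypothetical. In the training rows, snow lines up perfectly with the wolf label while the genuinely animal-related feature is right only 80% of the time, so a learner that simply picks the best-scoring feature chooses snow.

```python
def accuracy(feature, rows):
    """Fraction of rows where this single feature matches the wolf label."""
    return sum(r[feature] == r["is_wolf"] for r in rows) / len(rows)


# Hypothetical training photos: every wolf has snow, no husky does.
train = []
for i in range(10):
    train.append({"is_wolf": 1, "snow": 1, "wolf_like": 1 if i < 8 else 0})
    train.append({"is_wolf": 0, "snow": 0, "wolf_like": 0 if i < 8 else 1})

# Hypothetical real-world photos: snow appears behind half of each animal.
field = []
for i in range(10):
    field.append({"is_wolf": 1, "snow": i % 2, "wolf_like": 1 if i < 8 else 0})
    field.append({"is_wolf": 0, "snow": i % 2, "wolf_like": 0 if i < 8 else 1})

# The "model": keep whichever single feature scores best on the training rows.
chosen = max(["snow", "wolf_like"], key=lambda f: accuracy(f, train))
train_acc = accuracy(chosen, train)
field_acc = accuracy(chosen, field)

print(chosen, train_acc, field_acc)  # prints: snow 1.0 0.5
```

The chosen feature looks perfect on the training rows and is no better than a coin flip once snow stops tracking the label. An accuracy score alone would not flag this; an explanation showing which feature the model leaned on would.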
The researchers also demonstrated how LIME can be used to improve machine learning models by feature engineering. They successfully tested workflows using LIME in combination with either crowdsourced non-expert reviewers (via Amazon’s Mechanical Turk), or data science experts, to identify untrustworthy characteristics like the snow in the huskies-wolves example. When unreliable features are found, these can be eliminated from machine learning models. In the words of Ribeiro, Singh, and Guestrin:
Using LIME, we show that even non-experts are able to identify [machine learning model] irregularities when explanations are present. Further, LIME can complement these existing systems, and allow users to assess trust even when a prediction seems “correct” but is made for the wrong reasons.
Interpretability, and LIME in particular, represents a huge step towards making machine learning models less risky, more valuable, and more widespread.
Brendan Frey is Co-Founder and CEO of the startup Deep Genomics and a former Professor of Engineering and Medicine at the University of Toronto. Listen to his interview on Sam Charrington’s TWiML&AI podcast here.
Marco Túlio Ribeiro is a PhD student at the University of Washington. Sameer Singh is an associate professor at UC Irvine. Carlos Guestrin is a professor at the University of Washington. To learn more about LIME you can read their joint research paper presented at KDD 2016, watch Marco’s KDD presentation, or listen to Carlos’ TWiML&AI podcast interview. Sameer recently presented a webinar about interpretability and LIME jointly with Fast Forward Labs’ Mike Lee Williams and H2o.ai’s Patrick Hall. I’ll provide a link when it becomes available.