Bringing Explainability to AI

Machine intelligence has been getting smarter, and so have its applications. How we understand and trust them is still an open problem.

This is an abridged written version of my short talk on Sep 22, 2020, at the ‘Mphasis Laboratory of Machine Learning and Computational Thinking’ at Ashoka University.


Superhuman Machines

Over the last decade or so, we’ve been able to attain superhuman performance with machines on a variety of tasks we never thought they’d be able to do.

DeepMind’s AlphaGo beat Fan Hui at Go, a game with more possible move combinations than there are atoms in the universe. We’ve been able to beat expert human players in games ranging from Atari classics to Doom. Machines have outperformed humans in critical tasks too, from navigating forests for search and rescue operations to self-driving cars identifying traffic signs and making the corresponding decisions.

Risky Scenarios

Recently, a Tesla ran into the back of a white trailer, mistaking it for a cloud in the sky, and the driver was killed. As we deploy these machines in more and more critical scenarios, it’s important to understand how they work. However, most neural network-based models are black boxes: you don’t know why they are doing what they are doing.

For this exact reason, DARPA runs the Explainable AI (XAI) programme to promote the understanding of these black boxes. If you are using such machines for autonomous mobility tasks, it’s very important that we understand the answers to questions such as these:

  • Why did you do that? (For example, why a 5-degree right turn?)

  • Why did you not do something else? (Why not go straight?)

  • When do you succeed? (In edge cases, such as when an accident is imminent?)

  • When do you fail? (When are you not confident enough to make a decision?)

  • When can I trust you? (Can I trust you out on a crowded Chandni Chowk street?)

  • How do I correct an error you make?

The Next Big Challenge

Over the years, we’ve been able to consolidate huge amounts of data, take sophisticated deep neural networks, and train them to find patterns in that data and extract knowledge from it. But even if we know that such a network does well at predicting what a street sign is, we still don’t know what it’s thinking. The next challenge lies in interpreting and explaining such decisions.

Explainability vs Interpretability

Even at this juncture, there’s a critical difference between the explainability of a machine and its interpretability. Interpretability means the machine can show, end to end, how it came to a specific decision, and the rules based on which it makes its decisions. Explainability means that once the machine takes a decision, it goes back and explains why it did what it did.

This critical difference matters most as the field of application becomes higher-stakes. A physician would want to know why and how a machine predicts that a specific line of treatment will work, not just a hand-waved explanation of that specific case. At the same time, end-to-end interpretable models such as decision trees often give suboptimal results because of biases in choosing features and similar issues, while neural networks, which are far trickier to explain, provide significantly better performance.
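
As a rough illustration of what “end-to-end interpretable” means here, below is a minimal sketch using scikit-learn (the dataset and tree depth are arbitrary choices, not from the talk): the full set of rules a decision tree has learned can be printed and audited directly, something a deep network doesn’t offer.

```python
# Minimal sketch: a decision tree is interpretable end to end because
# the rules it has learned can be printed and audited directly.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(data.data, data.target)

# The entire decision process, as human-readable if/else rules.
print(export_text(tree, feature_names=list(data.feature_names)))
```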

How is this different?

Standard machine learning uses data to teach a model to make predictions, but it is prone to generalisation errors, as we saw with the Tesla that crashed into the back of the trailer.

Adding interpretability to the process allows human inspection: if the performance is not good enough, we can improve the model or the data it’s learning from, and human experience can be used to better vet its predictions. These procedures are critical for companies that are expected to deploy technologies compliant with regulations such as the GDPR, which mandates that the right to an explanation be preserved. Keeping human decision-making in the process also helps retain responsibility.

Besides, we must acknowledge that we might be able to learn from these machines. Move 37 in AlphaGo’s match against Lee Sedol became legendary, prompting Fan Hui to say:

"It's not a human move. I've never seen a human play this move. "

Techniques of Interpretation

Let’s say we’ve got a clinical decision support system working with a physician, helping them diagnose and treat a patient. There are primarily two ways we can try to understand such a machine (a short sketch contrasting the two follows after the list).

  1. Explain the Model

    This helps us get a better understanding of the machine’s internal representations. Questions we may ask are:

    • Which symptoms are most common for this disease?

    • Which drugs are most helpful for patients?

  2. Explain Individual Decisions

    This helps us understand why a machine is behaving a certain way, which is critical for many real-world applications. Questions we may ask are:

    • Which particular symptoms does this patient have?

    • What drugs does this patient need to recover?
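
As a rough sketch of the difference between these two modes (the symptom names, synthetic data, and plain logistic regression below are all stand-ins for a real clinical model), the snippet contrasts a global view of which symptoms matter overall with a local view of why one particular patient got their prediction.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical symptom features for a toy stand-in clinical model.
features = ["fever", "cough", "fatigue", "chest_pain"]
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 4))   # 200 synthetic patients
y = X[:, 0] & X[:, 3]                   # synthetic "disease" label

model = LogisticRegression().fit(X, y)

# 1. Explain the model: which symptoms matter overall?
for name, weight in zip(features, model.coef_[0]):
    print(f"global weight of {name}: {weight:+.2f}")

# 2. Explain an individual decision: why this prediction for this patient?
patient = X[0]
for name, contribution in zip(features, model.coef_[0] * patient):
    print(f"contribution of {name} for this patient: {contribution:+.2f}")
```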

Interpreting Models

Activation maximization helps us understand the internal representation a model has built of what it’s looking at. For instance, for an image classifier, we can synthesise the input that most strongly activates a chosen output class, giving us a picture of what the network “thinks” that class looks like.
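
Below is a minimal sketch of activation maximization in PyTorch (the pretrained model, class index, learning rate, and regularisation strength are placeholder choices, not the setup from the talk): starting from noise, we run gradient ascent on the input so the chosen output neuron fires as strongly as possible.

```python
import torch
from torchvision import models

# Sketch of activation maximization: gradient ascent on the input image
# to maximally excite one output class of a pretrained classifier.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
target_class = 483  # placeholder ImageNet class index

image = torch.randn(1, 3, 224, 224, requires_grad=True)  # start from noise
optimizer = torch.optim.Adam([image], lr=0.05)

for step in range(200):
    optimizer.zero_grad()
    score = model(image)[0, target_class]
    # Maximise the class score (minimise its negative), with a small
    # penalty to keep pixel values from blowing up.
    loss = -score + 1e-4 * image.norm()
    loss.backward()
    optimizer.step()

# `image` now approximates what the network "thinks" this class looks like.
```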

These explanations are very useful, yet they come with their own set of limitations. This is best illustrated with motorbikes: summarising the complex concept of a motorbike into a single image is difficult, since it can appear from different views and in different colours.

Explaining Decisions

Let’s say we have a black-box deep neural network that detects whether there is a castle in a given image. If we create a mask and move it around the image, measuring how the prediction changes, we can try to understand where the machine thinks the castle is.
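
A minimal sketch of this occlusion idea follows (the patch size, stride, and grey fill value are arbitrary choices): slide a patch across the image, record how much the castle score drops at each position, and the resulting heatmap shows which regions the model relies on.

```python
import torch

def occlusion_map(model, image, target_class, patch=32, stride=16):
    """Slide a grey patch over the image and record how much the target
    class score drops at each position: a rough saliency heatmap."""
    model.eval()
    with torch.no_grad():
        base_score = model(image)[0, target_class].item()
        _, _, h, w = image.shape
        heatmap = torch.zeros((h - patch) // stride + 1,
                              (w - patch) // stride + 1)
        for i, y in enumerate(range(0, h - patch + 1, stride)):
            for j, x in enumerate(range(0, w - patch + 1, stride)):
                occluded = image.clone()
                occluded[:, :, y:y + patch, x:x + patch] = 0.5  # grey patch
                score = model(occluded)[0, target_class].item()
                heatmap[i, j] = base_score - score  # big drop = important region
    return heatmap
```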

It turns out we can accomplish a lot just by adding these tiny perturbations to the input and repeating the process enough times. What’s even better about approaches like this is that it doesn’t matter what kind of machine sits underneath: as long as it’s a classifier, the approach can be applied to it. To illustrate this, I wrote code that takes a simple image classifier, detects what’s in a given image, and serves the answer as an API. From there, I added interpretability to the pipeline by asking it to explain why it took a specific decision and what parts of the image contributed to it.
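
The original code isn’t reproduced here, but a rough sketch of the idea, using FastAPI, a pretrained classifier, and the `occlusion_map` helper sketched above, might look something like this: the endpoint returns both the prediction and a heatmap of the regions that drove it.

```python
# Rough sketch (not the original pipeline) of serving a classifier with
# an explanation attached, reusing the occlusion_map helper from above.
import io

import torch
from fastapi import FastAPI, UploadFile
from PIL import Image
from torchvision import models, transforms

app = FastAPI()
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
preprocess = transforms.Compose([transforms.Resize((224, 224)),
                                 transforms.ToTensor()])

@app.post("/classify")
async def classify(file: UploadFile):
    image = Image.open(io.BytesIO(await file.read())).convert("RGB")
    batch = preprocess(image).unsqueeze(0)
    with torch.no_grad():
        probs = model(batch).softmax(dim=1)
    target = int(probs.argmax())
    # Explanation: which image regions most affect this prediction?
    heatmap = occlusion_map(model, batch, target)
    return {"predicted_class": target,
            "confidence": float(probs[0, target]),
            "explanation_heatmap": heatmap.tolist()}
```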

Let’s look at some of these results.

Why now, and in India?

Over recent years, we’ve seen a proliferation of facial recognition technology used by law enforcement across India. This has certainly enabled some positive outcomes in security; however, the technology is also prone to biases.

Interpretable, classical machine learning-based facial recognition, which relies on measurements between facial landmarks, is prone to problems with differing head rotations and tilts. Light intensity and angles cause huge differences in the ability to even detect these landmarks. Changes in facial expression and ageing-related changes to facial structure are also grey areas for these models.

On the other hand, deep neural network-based approaches do very well at these tasks but are prone to their own errors. It’s been shown that demographic misrepresentation in the datasets these models are trained on causes up to 40% more misidentifications for Black women than for white men. Besides, most of these models don’t explain their results, and since we do not have comprehensive localised benchmarking for these systems in India, we don’t even have estimates of how they really perform on Indian faces.

The Delhi police is now using Automated Facial Recognition System (AFRS), a software it acquired in March 2018, to screen alleged “rabble-rousers and miscreants”. - The Wire, 29 Dec 2019

Using the same decision-analysis technique as before, we can identify why a model thinks two faces are similar. When we look at misidentifications, we sometimes see that the model fixates on very small sections of the face that happen to be common across both pictures, and classifies them as the same person. Imagine if these same explanation tools were provided to law enforcement, or presented alongside evidence in lawsuits: a human could interpret the match, see how flimsy it is, and throw out the machine’s prediction.
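
A rough sketch of turning the same occlusion idea on a face verification model follows (here `embed` is a placeholder for whatever face-embedding network is in use, and the patch size and stride are arbitrary): occlude regions of one face and watch how its similarity to the other face changes.

```python
import torch
import torch.nn.functional as F

def similarity_map(embed, face_a, face_b, patch=24, stride=12):
    """Occlude regions of face_a and measure how much its cosine similarity
    to face_b drops, revealing which regions drive the claimed match.
    `embed` is any face-embedding network returning a feature vector."""
    with torch.no_grad():
        base = F.cosine_similarity(embed(face_a), embed(face_b), dim=-1).item()
        _, _, h, w = face_a.shape
        heat = torch.zeros((h - patch) // stride + 1, (w - patch) // stride + 1)
        for i, y in enumerate(range(0, h - patch + 1, stride)):
            for j, x in enumerate(range(0, w - patch + 1, stride)):
                occluded = face_a.clone()
                occluded[:, :, y:y + patch, x:x + patch] = 0.5
                sim = F.cosine_similarity(embed(occluded), embed(face_b), dim=-1).item()
                heat[i, j] = base - sim  # big drop = region drives the match
    return heat
```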

The Caveats

Quickness vs Correctness

Labs rush to publish results, and companies rush to replicate them in production. Correctness on edge cases is often not a priority.

Innovation vs Regulation

Regulation tends to lag behind innovation. Frameworks such as the GDPR’s right to an explanation exist, but the deployment of new models often outpaces the rules meant to govern them.

Accuracy vs Interpretability

As we saw with decision trees and neural networks, the most interpretable models are often not the most accurate, and the most accurate models are often the hardest to interpret. Deployments tend to favour accuracy.


The code for the approaches described above can be found here and here. These methods barely scratch the surface of the wealth of literature around explainability and interpretability in AI, and are meant as baseline starting points for getting introduced to the field.