Explainable AI in practice: Finding the right method to open the Black Box

Artificial Intelligence
Data Science
Human-centered AI

November 15, 2024

Jonas Wacker

Team AI Development

With the rise of increasingly complex and powerful AI models, the demand for transparency is also growing, not least due to legal requirements. While such black-box models are often more effective, flexible, and accurate than, for example, easily understandable regression models, they face the problem of lack of explainability. This is where Explainable AI (XAI) comes into play, an increasingly important component in both research and practice.
The demand for XAI is growing as companies and developers recognize that many AI solutions go unused if they are not explainable, especially in critical areas. One reason: In the past, XAI was either omitted completely or reduced to superficial methods used to dress up results in presentations. But that no longer works.
At statworx, we advocate considering XAI as a best practice in the development process. It must be integrated into the planning from the outset. This blog post is therefore primarily aimed at data scientists who want to delve deeper into the topic of XAI and apply XAI in practice. We present ten practical criteria to find the right XAI method and thereby ensure the success of AI projects.

1. Which target audience is addressed by the explanation?

“Explain {{TERM}} to a five-year-old child” is a well-known ChatGPT prompt. It expresses an implicit understanding: Explanations must be tailored to the needs and knowledge of the target audience to be effective. And if I have no idea about a topic, perhaps an explanation for a young child will help me the most. Unfortunately, this insight has not yet made its way into AI projects. XAI is a young research field and has so far mainly been noticed by developers who use it for debugging their AI models. However, with the spread of AI applications into more and more areas, numerous new stakeholders without a technical background are now entering the field and demanding explanations for the functionality of “their” AI. These new target groups bring different levels of knowledge as well as different motivations and questions. XAI must adequately address all these stakeholders in the future to gain acceptance and create value.

2. What business value should the explanations create?

Not every problem requires the use of AI. Likewise, not every AI necessarily requires explainability. In practice, XAI is not an end in itself but a means to create tangible added value. Therefore, it should be decided at the beginning of the project what level of explainability is required for the AI solution to be developed. If a music streaming service suggests a new song to me, I don’t want to read a twelve-page PDF report that explains the AI decision in detail. However, if an AI makes a treatment recommendation to my doctor, this detailed traceability is absolutely necessary so that my doctor can ensure that this recommendation is the most sensible and appropriate one. In short: The integration of XAI requires additional development effort, which must be justified by a clearly defined benefit for the company. Therefore, it is crucial to capture and quantify this benefit as clearly as possible before writing the first line of code.

3. What questions should the explanations answer?

An explanation is fundamentally an answer to a question. The usefulness of XAI is thus measured by how well it addresses the stakeholders’ questions. In AI projects, various types of questions may arise that require different XAI approaches and methods. Often, stakeholders want to know, for example, how a particular AI decision came about. Methods like SHAP or LIME can help identify relevant influencing factors and describe their impact on the prediction. A rejected applicant might want to know what they lacked to be hired. Here, “Counterfactual Explanations” or prototypes and criticisms can help them understand the decision and derive specific improvements for the next interview. Decision-makers, on the other hand, would like to know whether they can trust an AI decision. Here, XAI methods such as “Conformal Predictions” can reveal the prediction uncertainties of the AI model. The range of possible questions is endless. Therefore, the goal should be to define the truly relevant questions and select appropriate XAI methods to answer them.
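To make the trust question more concrete, here is a minimal sketch of split conformal prediction for a regression model. It is an illustration under assumptions, not a reference implementation: the toy model, the calibration pairs, and the helper names `conformal_quantile` and `prediction_interval` are all hypothetical, and the coverage guarantee only holds if calibration and future data are exchangeable.

```python
import math

def conformal_quantile(calibration_residuals, alpha):
    """Residual threshold that targets (1 - alpha) coverage."""
    n = len(calibration_residuals)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-based rank in the sorted residuals
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(calibration_residuals)[rank - 1]

def prediction_interval(model, x, threshold):
    """Symmetric interval around the point prediction."""
    center = model(x)
    return center - threshold, center + threshold

# Hypothetical toy model and held-out calibration pairs (x, y).
toy_model = lambda x: 2 * x
calibration = [(1, 2.5), (2, 3.8), (3, 6.1), (4, 7.0), (5, 10.4),
               (6, 12.3), (7, 13.5), (8, 16.2), (9, 18.9)]

residuals = [abs(y - toy_model(x)) for x, y in calibration]
threshold = conformal_quantile(residuals, alpha=0.2)
low, high = prediction_interval(toy_model, 10, threshold)
```

A wide interval signals to a decision-maker that the model is unsure about this particular prediction, which is often more useful than a single number.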

4. How important is the accuracy of the numbers in the explanation?

In mathematics, the rule is: the more accurate, the better. And indeed: ideally, XAI should always provide us with exact calculations that describe the model behavior without error. A fundamental problem with this approach is that we apply XAI precisely because we do not understand our model, so whether an explanation is accurate or not cannot be readily determined. Therefore, we should always understand XAI in practice as an approximation. Nevertheless, it is clear that, depending on the data and application case, some methods are more accurate than others. The popular “Shapley Values” can be calculated exactly, but because the exact calculation considers every possible subset of features, computation time explodes as the number of features and the size of the dataset grow. An approximation of the Shapley Values can often be determined in a fraction of the time. If only a rough picture of the model behavior is needed, we should be open to trading a certain degree of accuracy for more efficiency. In critical application cases, where every decimal place counts, a longer computation time must be accepted.
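The trade-off can be sketched in a few lines of plain Python. The example below compares an exact Shapley calculation, which enumerates all feature subsets, with a Monte Carlo approximation based on random feature orderings. The toy model, the all-zero baseline used to represent "missing" features, and the function names are assumptions made for illustration; libraries such as SHAP use more refined estimators.

```python
import itertools
import math
import random

def coalition_value(model, x, baseline, coalition):
    """Prediction when only features in `coalition` take their real value."""
    mixed = [x[i] if i in coalition else baseline[i] for i in range(len(x))]
    return model(mixed)

def exact_shapley(model, x, baseline):
    """Exact Shapley values: enumerates all subsets, exponential in len(x)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in itertools.combinations(others, size):
                s = set(subset)
                gain = (coalition_value(model, x, baseline, s | {i})
                        - coalition_value(model, x, baseline, s))
                phi[i] += weight * gain
    return phi

def sampled_shapley(model, x, baseline, n_permutations, seed=0):
    """Approximation: average marginal gains over random feature orderings."""
    rng = random.Random(seed)
    n = len(x)
    phi = [0.0] * n
    for _ in range(n_permutations):
        order = list(range(n))
        rng.shuffle(order)
        coalition = set()
        previous = coalition_value(model, x, baseline, coalition)
        for i in order:
            coalition.add(i)
            current = coalition_value(model, x, baseline, coalition)
            phi[i] += current - previous
            previous = current
    return [value / n_permutations for value in phi]

# Hypothetical toy model with one interaction term.
def toy_model(features):
    return 2 * features[0] + features[1] * features[2]

instance = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
exact = exact_shapley(toy_model, instance, baseline)
approx = sampled_shapley(toy_model, instance, baseline, n_permutations=2000)
```

With three features the exact version is trivial, but its number of model calls grows exponentially with the feature count, while the sampled version needs a fixed number of calls per permutation that the developer chooses.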

5. What type of data is available?

The world of data is diverse: in addition to tabular data, we encounter images, texts, audio, and graphs everywhere. Although many XAI algorithms are model-agnostic, very few are data-type-agnostic. SHAP, LIME, and others can often be abstracted and thus applied to non-tabular data. However, all too often the research is still quite thin here, and few ready-made code libraries are available. This results in a high effort for researching, implementing, and testing one's own algorithms. Another aspect is that many data types are associated with the application of certain model types. For example, “Random Forest” is often used for tabular data, while image data is mostly processed with neural networks like CNNs or transformers. In practice, the data type can thus limit the range of available and, above all, practically implementable XAI methods. On the other hand, dealing with the data type opens the way to model-specific explanation algorithms.

6. What is the dimensionality of the data?

The more complex a causal relationship, the more difficult it is to explain. This leads to XAI developers being among the few data scientists who are more unsettled than encouraged by large datasets. In practice, the number of features (in the tabular case: columns) is particularly relevant. Here, the rule is: the more features there are and the more they correlate, the more complex the calculation of an accurate explanation becomes. In other words: through an exact calculation of SHAP, correlations between all features can be taken into account. But this property, which sounds tempting with ten features, becomes a problem for the calculation with more than 100 features.

If data with many features are available, three approaches should be considered.

1. There is often the possibility to group features (e.g., through correlation analysis) and calculate explanations with grouped features.

2. Popular methods like SHAP often offer the possibility to achieve a compromise between accuracy and computational efficiency through sampling.

3. Simpler methods that ignore feature interactions may also be suitable. For global feature importance, for example, SHAP can be replaced by the more efficient Permutation Feature Importance (PFI).
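The first and third approaches can be combined in a small sketch: Permutation Feature Importance computed on groups of features that are shuffled together. Everything here is illustrative; the toy model, the data, the group names, and the function `grouped_permutation_importance` are assumptions, and in a real project the groups would come from a correlation analysis or domain knowledge.

```python
import random

def mean_squared_error(model, X, y):
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)

def grouped_permutation_importance(model, X, y, groups, n_repeats=5, seed=0):
    """Average error increase when the columns of a group are shuffled jointly."""
    rng = random.Random(seed)
    baseline_error = mean_squared_error(model, X, y)
    importance = {}
    for name, columns in groups.items():
        increases = []
        for _ in range(n_repeats):
            donor_rows = list(range(len(X)))
            rng.shuffle(donor_rows)
            shuffled = []
            for row, donor in zip(X, donor_rows):
                new_row = list(row)
                for column in columns:
                    # Take all group columns from the same donor row so the
                    # relationship inside the group stays intact.
                    new_row[column] = X[donor][column]
                shuffled.append(new_row)
            increases.append(mean_squared_error(model, shuffled, y) - baseline_error)
        importance[name] = sum(increases) / n_repeats
    return importance

# Hypothetical data: columns 0 and 1 drive the target, column 2 is ignored.
toy_model = lambda row: 3 * row[0] + row[1]
X = [[i, i % 3, (7 * i) % 5] for i in range(20)]
y = [toy_model(row) for row in X]

importance = grouped_permutation_importance(
    toy_model, X, y, groups={"size": [0, 1], "noise": [2]}
)
```

The cost scales with the number of groups rather than the number of individual features, which is what makes this attractive for wide datasets.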

7. Which model type and framework are used for training and inference?

Besides model-agnostic methods like SHAP and ALE, the XAI toolbox contains numerous model-specific methods and more efficient implementations of existing methods. For mathematically differentiable models like neural networks, for example, “Integrated Gradients” can be applied to calculate feature importance. For tree-based models like Random Forests, Tree-SHAP offers an efficient SHAP implementation. In individual cases, model-specific methods can achieve a better explanation or increase computational efficiency. In practice, in addition to the model type, the framework in which the model was developed or in which the model inference takes place is also relevant. This is particularly because code libraries for XAI are often designed for specific frameworks and may need to be adapted at great expense. If a Python library is, for example, designed for a scikit-learn model (model.predict(), model.predict_proba(), model.score(), etc.), a wrapper may need to be written for models from other frameworks such as XGBoost, TensorFlow, or PyTorch before the code works. Model type and framework thus have direct implications for the implementation of XAI methods.
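Such a wrapper is often small. The sketch below adapts a model that returns raw logits to the predict() and predict_proba() methods mentioned above. `ForeignClassifier` is a hypothetical stand-in for a model from another framework, and the adapter works on plain lists for brevity; real XAI libraries typically expect NumPy arrays, so the conversion would have to be added.

```python
import math

class ForeignClassifier:
    """Hypothetical stand-in for a model from another framework.

    Calling it on a batch returns one list of raw logits per row.
    """
    def __call__(self, batch):
        return [[row[0] - row[1], row[1] - row[0]] for row in batch]

class PredictProbaAdapter:
    """Exposes predict() and predict_proba() on top of a logit-returning model."""

    def __init__(self, foreign_model):
        self.foreign_model = foreign_model

    def predict_proba(self, batch):
        return [self._softmax(logits) for logits in self.foreign_model(batch)]

    def predict(self, batch):
        # Index of the most probable class for every row.
        return [max(range(len(p)), key=p.__getitem__)
                for p in self.predict_proba(batch)]

    @staticmethod
    def _softmax(logits):
        shift = max(logits)  # subtract the maximum for numerical stability
        exps = [math.exp(value - shift) for value in logits]
        total = sum(exps)
        return [value / total for value in exps]

adapter = PredictProbaAdapter(ForeignClassifier())
probabilities = adapter.predict_proba([[2.0, 0.0], [0.0, 3.0]])
labels = adapter.predict([[2.0, 0.0], [0.0, 3.0]])
```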

8. Is there access to the model and the training data?

To calculate explanations, one ideally has access to the model, its parameters, and the entire training data. In practice, however, one often only has access to an inference API that hides parts of the model from users. For example, those who retrieve GPT-4 via the OpenAI API do not have direct access to its model parameters. Some XAI methods, especially model-specific ones, can already be eliminated in such a scenario. Instead, model-agnostic methods like SHAP or LIME should be considered, as these work at the level of inputs and outputs.
OpenAI also does not grant access to training data (or at least a part of it). However, some XAI methods like SHAP rely on a reference dataset to draw correct conclusions about the structure and relationships of the data underlying the model. Access to the model and the training data is a factor that is particularly often overlooked and frequently leads to problems.
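The following sketch illustrates what working at the level of inputs and outputs means, and why a reference dataset matters. It uses a deliberately simple perturbation-based attribution, not SHAP or LIME themselves: each feature of an instance is replaced by values from reference rows, and the average change in the prediction is recorded. `black_box_predict` is a hypothetical stand-in for a remote inference API, and the reference rows are made up.

```python
def black_box_predict(features):
    """Hypothetical stand-in for a remote inference API: only inputs and
    outputs are visible, the parameters are not."""
    return 4 * features[0] - features[1]

def reference_attribution(predict, instance, reference_rows):
    """Prediction change when one feature is replaced by reference values."""
    original = predict(instance)
    attributions = []
    for i in range(len(instance)):
        replaced_predictions = []
        for reference in reference_rows:
            perturbed = list(instance)
            perturbed[i] = reference[i]
            replaced_predictions.append(predict(perturbed))
        average = sum(replaced_predictions) / len(replaced_predictions)
        attributions.append(original - average)
    return attributions

instance = [3, 1, 9]
reference_rows = [[1, 1, 0], [1, 3, 5]]  # ideally sampled from the training data
attributions = reference_attribution(black_box_predict, instance, reference_rows)
```

If the reference rows do not resemble the data the model was trained on, the perturbed inputs are unrealistic and the attributions become hard to interpret, which is why missing access to training data is a practical problem.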

9. What computing infrastructure should be used to calculate the explanations?

During development, machine learning models and XAI algorithms often reside on local notebooks. These are easier to create but are not secure, reliable, or flexible enough for the deployment of the AI solution. For example, if XAI components like Shapley Values are to be calculated in addition to the model predictions, it must be defined in advance when and how this additional computing power will be provided. Large deep learning models in particular often run on a virtual machine in the cloud that end users access only via an API. This raises the question of where, when, and how additional XAI algorithms should be executed.
There is also a second potential problem: An XAI method works in principle, but cannot be implemented time-efficiently on the available computing resources. This risk can be minimized through dedicated pre-planning of XAI components. Depending on the computing infrastructure, other solutions may also be possible: For example, the calculation of Shapley Values can be distributed across multiple computers on cloud platforms to drastically reduce computation time. In principle, XAI is not an “appendage” of a model but its own software component with individual risks and potentials.
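As a rough sketch of how such a distribution could look, the example below splits the instances to be explained into batches and hands them to a pool of workers. It is only a local stand-in: `explain_instance` is a hypothetical placeholder for an expensive per-instance explanation, and a thread pool is used to keep the example self-contained. For CPU-bound explanations one would more likely use separate processes or cloud workers.

```python
from concurrent.futures import ThreadPoolExecutor

def explain_instance(instance):
    """Hypothetical placeholder for an expensive per-instance explanation."""
    return [value * 2 for value in instance]

def split_into_batches(items, batch_size):
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

def explain_batch(batch):
    return [explain_instance(instance) for instance in batch]

def explain_distributed(instances, batch_size, max_workers):
    """Explain batches in parallel and return results in the input order."""
    batches = split_into_batches(instances, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(explain_batch, batches)  # map preserves batch order
        return [explanation for batch in results for explanation in batch]

instances = [[i, i + 1] for i in range(10)]
explanations = explain_distributed(instances, batch_size=3, max_workers=2)
```

Because each instance is explained independently, the work parallelizes cleanly; the main design decisions are the batch size and where the workers get their copy of the model.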

10. How frequently should new explanations be calculated?

XAI methods differ greatly in terms of their computational efficiency. For example, those who want to calculate global feature importance will be faster with Permutation Feature Importance than with SHAP. However, the computational efficiency of a one-time execution is only one of two important factors. In practice, explanations are calculated multiple times. A weekly SHAP analysis can then consume significantly fewer resources than a Permutation Feature Importance calculated hourly. Therefore, it is important to include the recalculation frequency in the planning and development of the computing infrastructure. In an ideal world, the recalculation frequency is static, e.g., once a week. However, scenarios are also conceivable in which explanations are calculated on-demand and the calculation frequency is subject to trends, seasonalities, and random effects. Including the calculation frequency is thus essential to ensure controlled operation of the AI system.
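A back-of-the-envelope calculation makes the point. The cost figures below are invented purely for illustration (30 CPU minutes per SHAP run, 2 CPU minutes per Permutation Feature Importance run); real values depend on the model, the data, and the infrastructure.

```python
HOURS_PER_WEEK = 7 * 24

def weekly_cost(cost_per_run, runs_per_week):
    """Total compute per week in the same unit as cost_per_run."""
    return cost_per_run * runs_per_week

# Hypothetical costs in CPU minutes per run.
shap_weekly = weekly_cost(cost_per_run=30.0, runs_per_week=1)
pfi_hourly = weekly_cost(cost_per_run=2.0, runs_per_week=HOURS_PER_WEEK)
```

Under these assumed numbers, the cheaper method ends up consuming more than ten times the weekly compute of the expensive one, simply because it runs 168 times as often.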

Conclusion

The integration of Explainable AI (XAI) into the development process of AI models is no longer just an optional add-on but a necessary best practice. The path to more transparency in AI is crucial, as many solutions remain unused due to a lack of explainability. To effectively implement XAI, companies and developers must proceed strategically and carefully select their methods.
In this post, ten practical criteria were presented to help select the right XAI methods. From target audience analysis to business objectives to technical aspects such as data types and computing infrastructure, each criterion plays a crucial role.
Creating explainability is a means of value creation that must be integrated into planning from the outset. Only in this way can it be ensured that AI solutions are not only powerful but also understandable and trustworthy. Companies that consider XAI as an integral part of their AI strategy will be able to better explain their models, build trust, and ultimately implement more successful AI projects.