References

  1. A Guide for Making Black Box Models Explainable
  2. Explainable Deep Learning Methods in Medical Diagnosis: A Survey
  3. A Review on Explainable Artificial Intelligence for Healthcare: Why, How, and When?
  4. Explainable Artificial Intelligence (XAI): What We Know and What Is Left to Attain Trustworthy Artificial Intelligence
  5. Training Calibration-Based Counterfactual Explainers for Deep Learning Models in Medical Image Analysis
  6. KAIST XAI Tutorial Series 2023, http://xai.kaist.ac.kr/Tutorial/2023/

Introduction

interpretability(ํ•ด์„๊ฐ€๋Šฅ์„ฑ) vs explainability(์„ค๋ช…๊ฐ€๋Šฅ์„ฑ)

  • Level of detail: Interpretability focuses on understanding the inner workings of the model, while explainability focuses on explaining the decisions it makes. Consequently, interpretability requires a greater level of detail than explainability.

  • Model complexity: More complex AI models, such as deep neural networks, can be difficult to interpret because of their intricate structure and the interactions between different parts of the model. In these cases, explainability may be more viable, as it focuses on explaining decisions rather than understanding the model itself. Because deep learning models are so complex, the emphasis in practice falls on explainability rather than interpretability.

  • Communication: Interpretability concerns the understanding of the model by AI experts and researchers, while explainability is more focused on communicating model decisions to end users. As a result, explainability requires a simpler and more intuitive presentation of information.

  • If you understand a model's equations well enough to roughly predict how it will behave, that model can be said to have high interpretability.

  • ==The more interpretable a model is, the more likely it is to also be explainable.== However, for models with low interpretability, such as deep learning models, explainability can still be raised through other methods, and that is what we focus on here. (ref4.figure4)

(Why?) Importance of XAI

  1. Human Curiosity and Learning: Humans update their mental models when faced with unexpected events, seeking explanations to understand and learn from these occurrences. Similarly, in research, interpretability in machine learning models is essential for uncovering scientific insights that remain hidden in purely predictive models.

  2. Finding Meaning: People seek to resolve contradictions in their knowledge. When a machine’s decision impacts a person’s life significantly, it becomes crucial for the machine to provide explanations. The greater the potential impact of a model’s decision on someone’s life, the more an explanation is needed.

  3. Scientific Goals: As various fields move towards quantitative methods and machine learning, models become knowledge sources. Interpretability allows for the extraction of this knowledge.

  4. Safety Measures: In critical applications like self-driving cars, ensuring the modelโ€™s decisions are error-free and understandable is vital for safety.

  5. Detecting Bias: Interpretability serves as a tool for identifying and correcting biases in machine learning models, ensuring fairness and compliance with ethical standards. The bias carried by the model itself must be checked, for example whether it performs well only on a particular cohort (population group).

  6. Social Acceptance: Interpretability increases the acceptance of machines and algorithms by providing explanations for their decisions, fostering trust and understanding. In healthcare in particular, this is key to gaining acceptance from clinicians.

  7. Managing Social Interactions: Explanations from machines can influence human actions, emotions, and beliefs, facilitating smoother interactions between humans and machines.

  8. Debugging and Auditing: Interpretability is essential for identifying and fixing errors in machine learning models, ensuring they function correctly and safely. It supports debugging the samples the model gets wrong, and improving the model accordingly.

(What?) Concepts of XAI

  1. Explainability: It involves making the workings of AI models transparent and understandable, aiming to bridge the gap between the complex decision-making processes of AI and human understanding.

  2. Interpretability: This concept focuses on the extent to which a human can comprehend and predict the modelโ€™s behavior, making it easier for users to trust and effectively use AI systems.

  3. Transparency: It emphasizes the need for AI models to be open about their operations, allowing users to see how and why decisions are made, thereby fostering trust and accountability. Is the entire process behind the model’s development open to inspection? This covers everything from model design to data composition, the training process, and the grounds for its decisions.

  4. Fairness: Concerned with ensuring AI decisions are impartial, non-discriminatory, and just, fairness is crucial for ethical AI applications, especially in sensitive areas like healthcare and law enforcement.

  5. Robustness: This concept highlights the importance of AI systems being reliable and consistent under varying conditions, ensuring they perform accurately and predictably across different scenarios.

  6. Satisfaction: Satisfaction refers to the degree to which users find the AI systemโ€™s explanations useful, relevant, and sufficient for their needs, impacting user acceptance and trust in AI technologies.

  7. Stability: Stability involves the consistency of AI explanations, ensuring that similar inputs produce similar explanations, which is vital for user trust and understanding.

  8. Responsibility: It encompasses creating AI systems that are ethically aligned, accountable for their actions, and designed with consideration for societal norms and values, promoting responsible usage and development of AI.

(ref.4 figure 3) Relations among XAI concepts.

Taxonomy of Interpretability Methods

  1. Intrinsic vs. Post Hoc:

    • Intrinsic interpretability involves using models that are naturally interpretable due to their simple structure, such as short decision trees or sparse linear models.
    • Post hoc interpretability refers to techniques applied after a model has been trained to analyze and explain its behavior. These methods can be used on both intrinsically interpretable models and more complex models.
  2. Result of the Interpretation Method: Interpretation methods can be differentiated by their outputs, which include:

    • Feature summary statistic: Provides summary statistics for each feature, like feature importance.
    • Feature summary visualization: Visual representations of feature summaries, such as partial dependence plots.
    • Model internals: Direct interpretation of model components, such as weights in linear models or the structure of decision trees.
    • Data point: Methods that return specific data points (existent or newly created) to explain model predictions, like counterfactual explanations or prototypes of predicted classes.
    • Intrinsically interpretable model: Approximating black box models with simpler, interpretable models to understand their behavior.
  3. Model-specific vs. Model-agnostic:

    • Model-specific tools are designed for specific types of models, such as linear models or neural networks, and often rely on the modelโ€™s internal structure for interpretation.
    • Model-agnostic tools can be applied to any machine learning model and work by analyzing the relationship between input features and model outputs without needing access to the modelโ€™s internals.
  4. Local vs. Global: Interpretation methods can also be classified based on their scopeโ€”whether they explain individual predictions (local) or the behavior of the entire model (global). (ref2.figure1)

Scope of Interpretability

  1. Algorithm Transparency: This refers to understanding how an algorithm learns a model from data and the kind of relationships it can learn. Itโ€™s about the general workings of the algorithm rather than the specifics of the learned model or individual predictions.

  2. Global, Holistic Model Interpretability: This level of interpretability is about comprehending the entire model at once, including its features, learned components (like weights or parameters), and how decisions are made. However, truly understanding a model globally is challenging, especially for models with many parameters.

  3. Global Model Interpretability on a Modular Level: While fully understanding a complex model may be impractical, itโ€™s often possible to understand parts of a model, such as the weights in linear models or the splits in decision trees. However, the interpretation of these components can be interdependent, making it difficult to isolate their effects.

  4. Local Interpretability for a Single Prediction: This focuses on explaining why a model made a specific prediction for an individual instance. Local interpretations can sometimes offer more accurate explanations than global ones, as the modelโ€™s behavior for a particular instance may be simpler than its overall behavior.

  5. (Glocal) Local Interpretability for a Group of Predictions: This involves explaining model predictions for a group of instances, either by applying global interpretation methods to the subset or by aggregating individual explanations.

Evaluation of Interpretability

  1. Application Level Evaluation (Real Task): This approach involves integrating the explanation directly into the product and having it tested by the end user. For example, a machine learning system designed for detecting fractures in X-rays would be evaluated by radiologists to assess the modelโ€™s interpretability. This method requires a well-designed experimental setup and a clear understanding of quality assessment criteria, with human explanation capabilities serving as a baseline for comparison.

  2. Human Level Evaluation (Simple Task): This is a simplified version of the application level evaluation, where experiments are not conducted with domain experts but with laypersons. This approach reduces costs and makes it easier to recruit a larger number of testers. An example could involve presenting users with different explanations for a modelโ€™s decision and asking them to select the one they find most understandable.

  3. Function Level Evaluation (Proxy Task): This level of evaluation does not involve human participants. It is most applicable when the model class has previously been evaluated in a human-level evaluation. For instance, if it is known that end users find decision trees understandable, the depth of the tree could serve as a proxy for the quality of explanation, with shorter trees being rated as more interpretable. However, itโ€™s important to ensure that the predictive performance of the model does not significantly deteriorate when simplifying it for the sake of interpretability.

Properties of Explanations

Properties of Explanation Methods

  1. Expressive Power: The complexity or simplicity of the explanations a method can generate, such as IF-THEN rules, decision trees, or natural language explanations.
  2. Translucency: The extent to which the explanation method relies on the internal workings of the machine learning model, such as its parameters.
  3. Portability: The ability of the explanation method to be applied across different machine learning models, with higher portability indicating applicability to a wider range of models.
  4. Algorithmic Complexity: The computational complexity of generating explanations, which is crucial when computation time is a limiting factor.

Properties of Individual Explanations

  1. Accuracy: The precision of an explanation in predicting unseen data, which is vital if the explanation is used for making predictions. Example: if a model predicts house prices and the explanation is “this house’s price is high because of its large garden”, the explanation has high accuracy if it applies consistently to other houses as well, correctly capturing the effect of garden size on price. This property concerns how well the explanation itself performs, and it usually matters when a separate surrogate model is used for the explanation.
  2. Fidelity: The accuracy with which the explanation approximates the prediction of the original model, highlighting the importance of the explanation’s relevance to the model’s decisions; in other words, whether the explanation model’s predictions match those of the actual model.
    • Example: suppose a complex machine learning model predicts that a drug will be ineffective for a particular patient. If a simple decision tree used to explain this prediction mirrors the original model’s reasoning and accurately explains why the drug was predicted to be ineffective, that decision tree is a high-fidelity explanation.
  3. Consistency: The similarity of explanations across models trained on the same task and producing similar predictions, indicating the reliability of the explanation method. Example: suppose two different machine learning models produce similar predictions. If the first model’s explanation is “the main reason the drug is ineffective for this patient is high blood pressure” and the second model gives the same reason, the explanations have high consistency.
  4. Stability: The uniformity of explanations for similar instances, ensuring that minor variations in data do not lead to significantly different explanations. Example: suppose a house-price model makes predictions for two houses with similar characteristics (e.g., location, size, number of rooms). If the explanations for the two houses do not differ much (e.g., “this house’s price is high because of its location”), the model’s explanations can be considered highly stable.
  5. Comprehensibility: The ease with which humans can understand the explanations, which varies depending on the audience and is crucial for the practical utility of explanations. Example: if, instead of a complex mathematical formula, the explanation uses language a layperson can follow, such as “this house is expensive because it was recently remodeled”, it has high comprehensibility.
  6. Certainty: Whether the explanation reflects the model’s confidence in its predictions, providing insight into the model’s reliability. Example: if the model predicts that a drug will work for a patient and also reports its confidence (e.g., “this prediction is correct with 90% probability”), the explanation reflects the model’s certainty well.
  7. Degree of Importance: The explanation’s ability to highlight the importance of features or parts of the explanation, aiding in understanding the model’s decision-making process. Example: if a house-price explanation states “this house’s price is high because of its large garden and modern kitchen, with the garden having the largest effect”, it reflects feature importance well.
  8. Novelty: The explanation’s indication of whether a data instance comes from a distribution far removed from the training data, affecting the model’s accuracy and the explanation’s usefulness.
    • Example: when the model predicts on a new type of house that did not appear in the training data, an explanation such as “this house has an unusual structure not present in our data, so the prediction is highly uncertain” reflects novelty well.
  9. Representativeness: The scope of instances an explanation covers, ranging from explanations that apply to the entire model to those specific to individual predictions. Example: an explanation that describes the general principle behind all of the model’s predictions rather than just one or two of them (e.g., “this model predicts house prices mainly from location and size”) has high representativeness.

Methods

CAM based method

GradCAM

  • Model specific, Local, Post-hoc, heatmap
  • Grad-CAM (Gradient-weighted Class Activation Mapping) is a technique for visualizing the decision process of a convolutional neural network (CNN). In particular, it shows visually which parts of an image the CNN attended to when assigning a class. The method uses the gradients flowing into the feature maps of the model’s last convolutional layer to compute how important each feature map is for the target class.

Grad-CAM is computed as follows:

  1. Compute the gradient of the class score with respect to the feature maps: take the score $y^c$ for the target class $c$ and compute its gradient with respect to the feature maps $A^k$ of the last convolutional layer, $\partial y^c / \partial A^k$, where $k$ indexes the feature maps.

  2. Compute the importance weight of each feature map: global-average-pool the gradients over each feature map to obtain a weight $\alpha_k^c$ that indicates how important that feature map is for the decision on the target class:

    $$\alpha_k^c = \frac{1}{Z} \sum_i \sum_j \frac{\partial y^c}{\partial A_{ij}^k}$$

    where $Z$ is the number of spatial locations in the feature map (width × height), and $i$ and $j$ index its rows and columns.

  3. Build the class activation map as a weighted combination of the feature maps: multiply each feature map $A^k$ by its importance weight $\alpha_k^c$ and sum over all feature maps to produce the class activation map:

    $$L_{\text{Grad-CAM}}^c = \mathrm{ReLU}\left(\sum_k \alpha_k^c A^k\right)$$

    Here the ReLU (Rectified Linear Unit) removes negative values, so that only the regions that push the model toward the target class are emphasized.

The class activation map produced by Grad-CAM can be overlaid on the original image, showing which parts of the image the model attended to in order to recognize a particular class. This improves the interpretability of the model and is useful for verifying whether the model reached its decision for the right reasons.

Perturbation based Method

  • Input์— ๋ณ€ํ˜•(perturbation)์„ ๊ฐ€ํ•จ์œผ๋กœ์จ ๋ชจ๋ธ์˜ ํŒ๋‹จ๊ทผ๊ฑฐ๋ฅผ ํ•ด์„ํ•˜๋ ค๋Š” ๋ฐฉ๋ฒ•

LIME

  • ๋Œ€ํ‘œ์ ์ธ Surrogate model์„ ๋งŒ๋“ค์–ด์„œ ์„ค๋ช…ํ•˜๋Š” XAI ๋ฐฉ๋ฒ• ์ค‘์— ํ•˜๋‚˜.

  • Model agnostic, Local, Post-hoc, surrogate $$$\text{explanation}(x) = \arg\min_{g \in G} \left[ L(f, g, \pi_x) + \Omega(g) \right]$$

  • is the loss function comparing the explainable model to the original function .

  • is the proximity measure which defines the neighborhood around instance .

  • is a measure of the complexity of the model .

  • The explainable model could be a Lasso regression or a Decision Tree.

  • ํ•˜๋‚˜์˜ instance๋ฅผ ์„ค๋ช…ํ•˜๊ธฐ ์œ„ํ•œ interpretable surrogate model์„ ๋งŒ๋“ค๊ณ  ๊ทธ ๋ชจ๋ธ์„ ํ•ด์„ํ•จ์œผ๋กœ์จ ๋ชจ๋ธ์˜ ์„ค๋ช…์„ ์‹œ๋„ํ•˜๋Š” ๋ฐฉ๋ฒ•
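
A minimal LIME-style sketch for tabular data, assuming `black_box` is any fitted classifier with a `predict_proba` method; the perturbation scale and kernel width are illustrative choices, and sklearn’s Lasso plays the role of the interpretable model $g$.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lime_explain(black_box, x, n_samples=5000, kernel_width=0.75, seed=0):
    """Fit a locally weighted sparse linear surrogate around instance x; return its coefficients."""
    rng = np.random.default_rng(seed)
    # 1. Perturb: sample points z in the neighborhood of x.
    Z = x + rng.normal(scale=0.5, size=(n_samples, x.shape[0]))
    # 2. Query the black-box model f on the perturbed samples (probability of class 1).
    y = black_box.predict_proba(Z)[:, 1]
    # 3. Proximity kernel pi_x(z): closer samples get larger weight.
    d = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(d ** 2) / (kernel_width ** 2))
    # 4. Fit the interpretable surrogate g; the L1 penalty plays the role of Omega(g).
    g = Lasso(alpha=0.01)
    g.fit(Z, y, sample_weight=w)
    return g.coef_                          # per-feature local contributions around x

if __name__ == "__main__":
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    X, y = make_classification(n_samples=500, n_features=5, random_state=0)
    f = RandomForestClassifier(random_state=0).fit(X, y)
    print(lime_explain(f, X[0]))            # which features drive f's prediction near X[0]
```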

RISE (Randomized Input Sampling for Explanation)

  • Model agnostic, Local, Post-hoc, heatmap
    • Generates random masks over the original image and finds the important regions by weight-summing the resulting changes in the model’s prediction (see the sketch below).

Counterfactual explanation

  • Model agnostic, Local, Post-hoc, example
  • Applies a perturbation to the actual input that is just large enough to flip its prediction, and interprets the model through the resulting contrastive prediction samples (a gradient-based sketch follows the link below).
  • For tabular data, SHAP values are commonly used, while for unstructured data, counterfactuals built on generative models are an active line of development.

(https://www.nature.com/articles/s41598-021-04529-5)
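
A hedged sketch of a simple gradient-based counterfactual search (in the spirit of Wachter et al.), assuming `model` is a differentiable PyTorch classifier taking inputs of shape (1, num_features); the step count, learning rate, and distance weight are illustrative.

```python
import torch
import torch.nn.functional as F

def counterfactual(model, x, target_class, steps=500, lr=0.05, dist_weight=0.1):
    """Perturb x until the model predicts target_class, while keeping the change small."""
    x_cf = x.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([x_cf], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        logits = model(x_cf)
        # Push the target-class score up while penalizing L1 distance from the original input.
        loss = F.cross_entropy(logits, torch.tensor([target_class])) \
               + dist_weight * (x_cf - x).abs().sum()
        loss.backward()
        optimizer.step()
        if logits.argmax(dim=1).item() == target_class:   # stop once the prediction flips
            break
    return x_cf.detach()       # a contrastive example: minimally changed, differently classified
```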

Concept based

  • ์—ฐ๊ตฌ์ž๊ฐ€ ํ™•์ธํ•˜๊ณ  ์‹ถ์€ ์ปจ์…‰์„ ์ •์˜ํ•˜๊ณ , ์ด๋ฅผ ๋ชจ๋ธ์ด ๋‚ด๋ถ€์ ์œผ๋กœ ํ™œ์šฉํ•˜๊ณ  ์žˆ๋Š”์ง€๋ฅผ ํ™•์ธํ•จ์œผ๋กœ์จ ํ•ด์„ํ•˜๋Š” ๋ฐฉ๋ฒ•
  • Properties for concept-based explanation
    • Meaningfulness: semantically meaningful on its own ๊ฐœ๋…์ด ์Šค์Šค๋กœ ์˜๋ฏธ๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ์–ด์•ผ ํ•œ๋‹ค๋Š” ๊ฒƒ์„ ๋งํ•ฉ๋‹ˆ๋‹ค. ์ฆ‰, ๊ทธ ๊ฐœ๋…์ด ๋ฌด์—‡์ธ์ง€ ์‚ฌ๋žŒ์ด ๋ณด๊ณ  ๋ฐ”๋กœ ์ดํ•ดํ•  ์ˆ˜ ์žˆ์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, โ€˜์ข…์–‘โ€™์ด๋ผ๋Š” ๊ฐœ๋…์€ ์˜๋ฃŒ ์ด๋ฏธ์ง€ ๋ถ„์„์—์„œ ์ค‘์š”ํ•˜๋ฉฐ, ์Šค์Šค๋กœ ์˜๋ฏธ๊ฐ€ ์žˆ๊ณ , ์‚ฌ๋žŒ๋“ค์ด ์ดํ•ดํ•  ์ˆ˜ ์žˆ๋Š” ๊ฐœ๋…์ž…๋‹ˆ๋‹ค.
    • Coherency: perceptually similar to each other while being different from examples of other concepts. ๊ฐœ๋…์ด ํ‘œํ˜„ํ•˜๋Š” ์˜ˆ์‹œ๋“ค์ด ์„œ๋กœ ๋น„์Šทํ•˜๋ฉด์„œ๋„, ๋‹ค๋ฅธ ๊ฐœ๋…์˜ ์˜ˆ์‹œ๋“ค๊ณผ๋Š” ํ™•์‹คํžˆ ๊ตฌ๋ถ„๋  ์ˆ˜ ์žˆ์–ด์•ผ ํ•œ๋‹ค๋Š” ๊ฒƒ์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, โ€˜์ข…์–‘โ€™์ด๋ผ๋Š” ๊ฐœ๋…์— ์†ํ•˜๋Š” ์ด๋ฏธ์ง€๋“ค์€ ์„œ๋กœ ๋น„์Šทํ•œ ํŠน์ง•์„ ๊ณต์œ ํ•˜์ง€๋งŒ, โ€˜์ •์ƒ ์กฐ์งโ€™์ด๋ผ๋Š” ๋‹ค๋ฅธ ๊ฐœ๋…์˜ ์ด๋ฏธ์ง€์™€๋Š” ํ™•์—ฐํžˆ ๋‹ค๋ฅธ ํŠน์ง•์„ ๋ณด์—ฌ์•ผ ํ•ฉ๋‹ˆ๋‹ค. ์ด๋ ‡๊ฒŒ ํ•จ์œผ๋กœ์จ, AI๊ฐ€ ํ•ด๋‹น ๊ฐœ๋…์„ ์ •ํ™•ํ•˜๊ฒŒ ์ธ์‹ํ•˜๊ณ  ๊ตฌ๋ถ„ํ•  ์ˆ˜ ์žˆ๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.
    • Importance: its presence is necessary for the true prediction of samples in that class. ๊ฐœ๋…์˜ ์กด์žฌ๊ฐ€ ํ•ด๋‹น ํด๋ž˜์Šค์˜ ์ƒ˜ํ”Œ์„ ์ •ํ™•ํ•˜๊ฒŒ ์˜ˆ์ธกํ•˜๋Š” ๋ฐ ํ•„์ˆ˜์ ์ด์–ด์•ผ ํ•œ๋‹ค๋Š” ๊ฒƒ์„ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค. ์ฆ‰, ๊ทธ ๊ฐœ๋…์ด ์—†์œผ๋ฉด AI๊ฐ€ ์˜ฌ๋ฐ”๋ฅธ ๊ฒฐ์ •์„ ๋‚ด๋ฆฌ๋Š” ๋ฐ ์–ด๋ ค์›€์„ ๊ฒช์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, โ€˜์ข…์–‘โ€™์ด๋ผ๋Š” ๊ฐœ๋…์€ ์•”์„ ์ง„๋‹จํ•˜๋Š” AI ๋ชจ๋ธ์—์„œ ๋งค์šฐ ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค. โ€˜์ข…์–‘โ€™์˜ ์กด์žฌ ์—ฌ๋ถ€๊ฐ€ ์ง„๋‹จ ๊ฒฐ๊ณผ์— ํฐ ์˜ํ–ฅ์„ ๋ฏธ์น  ์ˆ˜ ์žˆ๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค.

Network Dissection

  • Model-specific, Local, Post-hoc, metric
  • ๊ฐ convolution layer์—์„œ feature map์„ ์›๋ณธ ํฌ๊ธฐ๋กœ upsamplingํ•จ.
  • ํ™œ์„ฑ๊ฐ’์„ ์„ค์ •ํ•˜๊ณ , ๊ฐ conv feaure map์„ 0 ๋˜๋Š” 1์˜ binary๋กœ ๋งŒ๋“ฌ.
  • ์ธ๊ฐ„์ด ์ •์˜ํ•œ simentic segmentaion๊ณผ overlapํ•˜์—ฌ ์–ด๋–ค ๊ฐœ๋…์„ ์ฃผ๋กœ ๋ณด๊ณ  ์žˆ๋Š”์ง€๋ฅผ ์ถ”๋ก 

TCAV (Testing with Concept Activation Vectors)

  • Model-specific, Global, Post-hoc, metric
  • ํ™•์ธํ•˜๊ณ ์žํ•˜๋Š” ํŠน์ • ๊ฐœ๋…์„ ์ •์˜ํ•˜๊ณ , ์ด๋ฅผ ๋ฐ˜์˜ํ•˜๋Š” concept๋ฐ์ดํ„ฐ์…‹์„ ํ™œ์šฉํ•˜์—ฌ ๋ชจ๋ธ์„ ํ•ด์„ํ•˜๋Š” ๋ฐฉ๋ฒ•.
  • ๊ฒฐ๊ณผ์ ์œผ๋กœ TCAV score๋ฅผ ํ†ตํ•ด์„œ ๋ชจ๋ธ์— ์šฐ๋ฆฌ๊ฐ€ ํ™•์ธํ•˜๊ณ ์žํ•˜๋Š” ๊ฐœ๋…์ด ํ™œ์šฉ๋˜๋Š”์ง€ ์ •๋Ÿ‰์ ์œผ๋กœ ํ‰๊ฐ€ํ•จ.

Attention based

(To be continued)

Evaluation of XAI

Content

  • Correctness: Evaluates how faithful the explanation is to the black-box model it is meant to explain, checking that it accurately reflects the model’s reasoning and thereby underpins the explanation’s trustworthiness.
    • Example: if a machine learning model recognized a cat in a photo, correctness asks whether the explanation shows that the model based its decision on cat features (e.g., ears, eyes).
  • Completeness: Measures how much of the model’s behavior or reasoning the explanation covers. It splits into two parts:
    • Reasoning-completeness: how fully the explanation describes the model’s internal dynamics.
    • Output-completeness: how fully the explanation accounts for the model’s output.
      • Example: if the model used several features to reach a decision, all of them should appear in the explanation. If it judged “cat” from the ears and the eyes, the explanation should include both.
  • Consistency: Evaluates whether identical inputs produce identical explanations, emphasizing that the explanation method should be deterministic and implementation-invariant.
    • Example: when two users query the model with the same data, both should receive the same explanation.
  • Continuity: Small changes in the input should not cause large changes in the explanation, ensuring the explanation function is smooth.
    • Example: when a user slightly perturbs the pixels of an image, the explanation should change only marginally.
  • Contrastivity: Evaluates how well the explanation discriminates the prediction from other possible outcomes or events.
    • Example: it should be possible to explain both “why was this image classified as a cat?” and “why is it not a dog?”
  • Covariate Complexity: Evaluates the complexity and understandability of the features used in the explanation.
    • Example: the explanation should use simple, intuitive language instead of advanced statistics.

Representation

  • Compactness: Concerns the size of the explanation; it should be concise enough to be grasped within human cognitive limits.
    • Example: provide only the necessary information, briefly, so the user can understand it easily.
  • Composition: Considers the format, organization, and structure of the explanation to improve clarity.
    • Example: present the information in a form that is easy to understand (e.g., a visual graph).
  • Confidence: Whether the explanation includes confidence or other probability information, and how accurate that information is.
    • Example: the model should be able to express numerically how confident it is in its decision.

User

  • ์ƒํ™ฉ์  ๋งฅ๋ฝ(Context): ์„ค๋ช…์ด ์‚ฌ์šฉ์ž์˜ ํ•„์š”์™€ ์ „๋ฌธ ์ง€์‹ ์ˆ˜์ค€์„ ๊ณ ๋ คํ•˜๋Š” ์ •๋„๋ฅผ ํ‰๊ฐ€ํ•ฉ๋‹ˆ๋‹ค.
    • ์˜ˆ์‹œ: ์˜๋ฃŒ ๋ถ„์•ผ ์ „๋ฌธ๊ฐ€์—๊ฒŒ๋Š” ์ „๋ฌธ ์šฉ์–ด๋ฅผ ์‚ฌ์šฉํ•œ ์„ค๋ช…์„, ์ผ๋ฐ˜ ์‚ฌ์šฉ์ž์—๊ฒŒ๋Š” ๊ฐ„๋‹จํ•œ ์–ธ์–ด๋กœ ์„ค๋ช…์„ ์ œ๊ณตํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.
  • ์ผ์น˜์„ฑ(Coherence): ๊ธฐ์กด์˜ ๋ฐฐ๊ฒฝ ์ง€์‹, ๋ฏฟ์Œ ๋ฐ ์ผ๋ฐ˜์ ์ธ ํ•ฉ์˜์™€ ์–ผ๋งˆ๋‚˜ ์ผ์น˜ํ•˜๋Š”์ง€ ํ‰๊ฐ€ํ•ฉ๋‹ˆ๋‹ค.
    • ์˜ˆ์‹œ: ์„ค๋ช…์ด ์‚ฌ์šฉ์ž์˜ ๊ธฐ์กด ์ง€์‹๊ณผ ์ผ์น˜ํ•˜์—ฌ, ์„ค๋ช…์ด ํ•ฉ๋ฆฌ์ ์œผ๋กœ ๋А๊ปด์ ธ์•ผ ํ•ฉ๋‹ˆ๋‹ค.
  • ์ œ์–ด ๊ฐ€๋Šฅ์„ฑ(Controllability): ์‚ฌ์šฉ์ž๊ฐ€ ์„ค๋ช…๊ณผ ์ƒํ˜ธ ์ž‘์šฉํ•˜๊ณ , ์ œ์–ดํ•˜๊ฑฐ๋‚˜ ์กฐ์ •ํ•  ์ˆ˜ ์žˆ๋Š” ์ •๋„๋ฅผ ์ธก์ •ํ•ฉ๋‹ˆ๋‹ค.
    • ์˜ˆ์‹œ: ์‚ฌ์šฉ์ž๊ฐ€ ์„ค๋ช…์„ ์กฐ์ •ํ•˜์—ฌ ์ž์‹ ์—๊ฒŒ ๋” ์ ํ•ฉํ•˜๊ฒŒ ๋งŒ๋“ค ์ˆ˜ ์žˆ์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

Correctness (faithfulness)

  • ์„ค๋ช…์˜ ๊ฒฐ๊ณผ๊ฐ€ ์ •๋ง ๋ชจ๋ธ ์˜ˆ์ธก์— ์žˆ์–ด์„œ ์ค‘์š”ํ•œ ๋ถ€๋ถ„์ด ๋งž์•˜๋Š”์ง€๋ฅผ ์ธก์ •ํ•˜๋Š” ์ง€ํ‘œ

Deletion and Insertion

  • Quantifies the change in the model’s prediction as the regions rated most important by the heatmap are progressively deleted (or inserted), turning faithfulness into a number (a minimal deletion-curve sketch follows).
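
A minimal sketch of the deletion metric, assuming `predict` returns class probabilities for a batch of images and `saliency` is an (H, W) importance map (e.g., from Grad-CAM or RISE); the zero baseline and step count are illustrative. Insertion is the same loop run in reverse, starting from a blank or blurred image and adding the most important pixels back first.

```python
import numpy as np

def deletion_curve(predict, image, saliency, class_idx, n_steps=50):
    """Remove pixels from most to least important and track the target-class probability."""
    H, W = saliency.shape
    order = np.argsort(saliency.ravel())[::-1]           # most important pixels first
    per_step = int(np.ceil(order.size / n_steps))
    img = image.copy()
    scores = [predict(img[None])[0, class_idx]]
    for step in range(n_steps):
        idx = order[step * per_step:(step + 1) * per_step]
        ys, xs = np.unravel_index(idx, (H, W))
        img[ys, xs] = 0                                  # "delete" by setting pixels to a baseline
        scores.append(predict(img[None])[0, class_idx])
    # The mean of the curve approximates its area under the curve (AUC);
    # for deletion, a faithful explanation makes the probability drop fast, so lower is better.
    return np.array(scores), float(np.mean(scores))
```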

Completeness

Sanity check

  • Checks the explanation on a task where the relevant region is unambiguous.
  • For example, give the model a triangle-matching task and check whether the model’s explanation covers exactly the region of the triangle.

Robustness (continuity)

  • The explanation should remain robust even for slightly different data.