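Research Review: Interpretability Work

Introduction

As artificial intelligence advances rapidly, model interpretability has become an increasingly active research topic. Interpretability helps explain the internal logic behind a model's decisions, strengthens user trust in the model, satisfies regulatory requirements, and points to directions for model improvement.

This review covers three main directions of interpretability research: model-oriented, feature-oriented, and cognition-oriented interpretability. For each direction it summarizes the key methods, research results, and recent progress.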

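Model-Oriented Interpretability

1. Logistic Model Trees

Quinlan (1992): linear regression trees
Chan and Loh (2004), Landwehr, Hall, and Frank (2005): oblique decision boundaries
Djeundje and Crook (2018): application of transition matrices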

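2. Generative Models

Reizinger et al. (2022): independent mechanism analysis in VAEs
Giuliani et al. (2024): autoencoder-based dimensionality reduction to identify default clusters
Bashar and Nayak (2020), Kim et al. (2024): GAN-based anomaly detection, similar in spirit to the VAE approach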

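3. Tree Model Extensions

İrsoy, Yıldız, and Alpaydın (2012): soft decision trees (sigmoid routing plus gradient descent). Internal nodes make soft decisions: a sigmoid function gives the probability of taking the left or right child. The tree is built incrementally, adding new nodes when needed, and its parameters are learned by gradient descent.
Bertsimas et al. (2024): generalized soft trees that support image processing and yield a Fuzzy Regression Tree.
T. Chen and Guestrin (2016): in financial and medical settings, hard constraints (monotonicity, interaction, and feature gain weights) embed business rules in the model. This reduces black-box risk, makes decisions more interpretable, keeps model decisions consistent with business understanding, and lowers the cost of explaining bad cases.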
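
The soft routing described for İrsoy, Yıldız, and Alpaydın (2012) can be illustrated with a minimal sketch. It is not the authors' implementation: the class names `SoftNode` and `Leaf`, the helper `fit_gate`, the squared-error loss, the learning rate, and the toy data are all assumptions made for illustration, and the incremental addition of nodes is left out. The sketch only shows a sigmoid gate blending two children and a gradient-descent update of that gate.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class Leaf:
    """Leaf with a constant response (hypothetical helper)."""

    def __init__(self, value):
        self.value = value

    def predict(self, x):
        return self.value


class SoftNode:
    """Internal node: goes left with probability sigmoid(w . x + b)."""

    def __init__(self, weights, bias, left, right):
        self.weights, self.bias = weights, bias
        self.left, self.right = left, right

    def gate(self, x):
        return sigmoid(sum(w * xi for w, xi in zip(self.weights, x)) + self.bias)

    def predict(self, x):
        g = self.gate(x)
        # Soft decision: both children contribute, weighted by the gate.
        return g * self.left.predict(x) + (1.0 - g) * self.right.predict(x)


def fit_gate(node, data, lr=0.5, epochs=200):
    """Gradient descent on squared error for one gate; children are held fixed."""
    for _ in range(epochs):
        for x, y in data:
            g = node.gate(x)
            left, right = node.left.predict(x), node.right.predict(x)
            pred = g * left + (1.0 - g) * right
            # Chain rule: d(loss)/dz = 2 (pred - y) (left - right) g (1 - g)
            grad_z = 2.0 * (pred - y) * (left - right) * g * (1.0 - g)
            node.weights = [w - lr * grad_z * xi for w, xi in zip(node.weights, x)]
            node.bias -= lr * grad_z
    return node


# Toy data (assumed): positive inputs belong to the left leaf, negative to the right.
tree = SoftNode(weights=[0.0], bias=0.0, left=Leaf(1.0), right=Leaf(0.0))
data = [([2.0], 1.0), ([1.0], 1.0), ([-1.0], 0.0), ([-2.0], 0.0)]
before = tree.predict([2.0])  # the untrained gate splits evenly
fit_gate(tree, data)
after = tree.predict([2.0])
```

Because every prediction is a probability-weighted mix of leaf values, the output is differentiable in the gate parameters, which is what makes gradient-based training possible.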

4. Association Analysis

Mantegna and Stanley (2000): minimum spanning trees to build an indicator hierarchy
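
As a rough illustration of the minimum-spanning-tree idea, the sketch below runs Kruskal's algorithm on a small correlation matrix. The distance d = sqrt(2(1 - rho)) is the transformation commonly used in the econophysics literature and is assumed here, not taken from this review; the indicator names and correlation values are invented for the example.

```python
import math


def correlation_distance(rho):
    # Assumed transformation: highly correlated indicators are close together.
    return math.sqrt(2.0 * (1.0 - rho))


def minimum_spanning_tree(names, corr):
    """Kruskal's algorithm over the complete graph of indicators."""
    n = len(names)
    edges = sorted(
        (correlation_distance(corr[i][j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    tree = []
    for dist, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the edge joins two separate components
            parent[root_i] = root_j
            tree.append((names[i], names[j], round(dist, 3)))
    return tree


# Hypothetical indicators and correlations.
names = ["A", "B", "C", "D"]
corr = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.1],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
tree = minimum_spanning_tree(names, corr)
```

The tree keeps only the n - 1 strongest links needed to connect all indicators, so tightly correlated groups (here A with B, and C with D) appear as branches joined by a single weaker edge.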

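Feature-Oriented Interpretability

1. Feature Attribution Methods

Redell (2019): Shapley decomposition of R²
RankingSHAP
Lemaire, Féraud, and Voisine (2008): separates variable importance from variable-value importance
Strumbelj and Kononenko (2010): sampling-based approximation that makes the computation efficient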
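
The sampling idea attributed to Strumbelj and Kononenko (2010) can be sketched as a Monte Carlo estimate over random feature orderings. This is a simplified illustration, not the paper's exact procedure: the function name `sampled_shapley`, the background-data handling, the sample count, and the toy linear model are assumptions.

```python
import random


def sampled_shapley(model, x, background, feature, n_samples=2000, seed=0):
    """Estimate one feature's Shapley value by sampling permutations."""
    rng = random.Random(seed)
    n = len(x)
    total = 0.0
    for _ in range(n_samples):
        order = list(range(n))
        rng.shuffle(order)
        z = rng.choice(background)  # reference instance for "absent" features
        preceding = set(order[: order.index(feature)])
        # Features before `feature` in the ordering take their values from x;
        # the rest come from z. The two instances differ only in `feature`.
        with_feature = [
            x[j] if (j in preceding or j == feature) else z[j] for j in range(n)
        ]
        without_feature = [x[j] if j in preceding else z[j] for j in range(n)]
        total += model(with_feature) - model(without_feature)
    return total / n_samples


def linear_model(v):
    # Toy model (assumed): each contribution is known in closed form.
    return 2.0 * v[0] + 3.0 * v[1]


background = [[0.0, 0.0]]
phi_0 = sampled_shapley(linear_model, [1.0, 1.0], background, feature=0)
phi_1 = sampled_shapley(linear_model, [1.0, 1.0], background, feature=1)
```

For a linear model with a single all-zero background point, the estimate recovers coefficient times feature value, which gives a quick sanity check before applying the same routine to a black-box model.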

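2. Feature Relationship Analysis

Feature redundancy in machine learning and its applications, and Liang et al. (2023): quantifying feature redundancy through co-training
Janzing et al. (2019): Shapley attribution to parent nodes
Horel and Giesecke (2020): a significance test that assesses the influence of a single variable on a neural network's prediction and can rank variables by influence. The statistic squares the partial derivative, which prevents positive and negative values from canceling, preserves differentiability, and helps separate large values from small ones.
Bussmann et al. (2021): Shapley values quantify sample similarity and support label stratification. Computing Shapley values gives the feature contributions for each company (data point), and a similarity network measures the distances between these contributions to assess how similar companies are. Comparing the sets of SHAP values across samples shows how alike or different they are: two companies with similar SHAP values, for example, may share similar credit risk characteristics.
Cooper (2022), Talaei, Oztekin, and Motiwalla (2025): supervised clustering on SHAP values
Liang et al. (2023), Baron and Kenny: consider not only redundancy and synergy among variables but also mediation and moderation effects. Judging whether a variable is redundant, that is, whether it matters, comes down to checking whether ε follows a normal distribution. The idea resembles Janzing et al. (2019): regress the degree of anomaly on the parent nodes and decompose the result with SHAP. Stable learning similarly checks whether x remains significant across multiple datasets.
Janzing et al. (2019): uses Shapley values from cooperative game theory to quantify how much each ancestor node contributes to an outlier score, so that an outlier in the target variable can be attributed to outliers in its ancestors. In effect, this regresses the degree of anomaly on the parent nodes and decomposes the result with SHAP.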
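
The squared-partial-derivative idea noted for Horel and Giesecke (2020) can be sketched as follows. The sketch computes only the average squared partial derivative per feature and uses it for ranking; the asymptotic distribution needed for an actual significance test is not reproduced. The function name, the central finite-difference approximation, and the toy model are assumptions for illustration.

```python
def squared_gradient_importance(model, samples, eps=1e-5):
    """Mean squared partial derivative of `model` for each feature."""
    n_features = len(samples[0])
    scores = [0.0] * n_features
    for x in samples:
        for j in range(n_features):
            up, down = list(x), list(x)
            up[j] += eps
            down[j] -= eps
            # Central finite difference approximates d(model)/d(x_j) at x.
            partial = (model(up) - model(down)) / (2.0 * eps)
            # Squaring keeps positive and negative slopes from canceling out.
            scores[j] += partial ** 2
    return [s / len(samples) for s in scores]


def toy_model(v):
    # Stand-in for a trained network (assumed): the third feature is ignored.
    return 3.0 * v[0] - 1.0 * v[1]


samples = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]]
scores = squared_gradient_importance(toy_model, samples)
ranking = sorted(range(len(scores)), key=lambda j: -scores[j])
```

With a real network the partial derivatives would normally come from automatic differentiation; finite differences are used here only to keep the example dependency-free.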

3. Feature Engineering Applications

H. Chen, Lundberg, and Lee (2022): SHAP loss attribution to diagnose covariate shift and identify why a model performs poorly on test data

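Cognition-Oriented Interpretability

1. Decision Theory

Kahneman (2011), Lu and Zhang (2023): dual-system theory (System 1 and System 2). Task and information complexity stimulates deliberate System 2 thinking. Providing rich information and structured machine explanations can strengthen human awareness and active cognitive reasoning in decision tasks.
Bayer and Renou (2024): task complexity determines whether human or machine interaction is advantageous. On simple tasks, participants who interacted with other humans performed better; on complex problems, participants who interacted with algorithms performed better. Further experiments show that these differences are driven by knowledge that the AI reasons correctly, not by the AI being non-human. An additional treatment with human experts shows that when participants know they face a human expert instead of an AI, their reasoning quality is similar to that in the AI condition, which indicates that the type of strategic uncertainty, not its source, affects reasoning quality.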

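2. Cognitive Bias Mechanisms

Tversky and Kahneman (1974): heuristics and biases
Tversky and Kahneman (1981), Hsee, Imas, and Li (2024): framing effects
Chang et al. (2024): quantification bias ("quantification fixation"). People tend to weight quantitative information heavily when deciding and prefer the option that dominates on the numerically described dimension. This bias is driven by the feeling that numbers are easy to use in comparative decisions; people who are more comfortable with numbers (higher subjective numeracy) are more likely to show it. As quantification becomes more common, the comparison fluency of numbers may systematically distort decisions. The findings suggest that quantifying some features of a choice can substantially affect how decisions are made.
Baule, Korn, and Kuntz (2019), Bellalah, Ben Amar, and Clark (2024): regret theory. An investor's payoff depends not only on the realized investment outcome but also on the gap between it and the best possible outcome.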

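3. Trustworthiness Research

Vafa, Rambachan, and Mullainathan (2024): users generalize a model's capability boundaries from a limited number of interactions. As a result, stronger models (such as GPT-4) can fail in high-stakes settings because of this misalignment, or users can over-trust the model's capabilities.
Lyu et al. (2024): C1/C2 causal prompts. C1 represents the "slow thinking" process, in which the review content drives the sentiment rating; C2 represents the "fast thinking" process, in which the sentiment rating drives the review content.
B. Fogg (2009a), B. J. Fogg and Euchner (2019): behavior model. These works present the Fogg Behavior Model (FBM), B = MAP, with three core elements: Motivation, Ability, and Triggers (or Prompts). Together the three elements determine whether a person performs a given behavior. Behavior Grid: Fogg describes a framework that maps 15 ways behavior can change, organizing behavior types by type of change and duration.
B. Fogg (2009b): "Creating Persuasive Technologies: An Eight-Step Design Process" proposes eight steps for designing persuasive technologies. Drawing on successful cases from industry practice, the steps begin with defining the persuasion goal, move on to matching the target audience with an appropriate technology channel, and continue with imitating successful persuasive designs, running rapid trials, measuring behavioral outcomes, and building on small successes.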

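4. Other

Daniel Kahneman's research focuses on modes of thinking and the psychology of decision-making.

His paper with Amos Tversky, "Judgment under Uncertainty: Heuristics and Biases" (Tversky and Kahneman 1974), identifies three main heuristics: representativeness, availability, and anchoring and adjustment. These heuristics are very useful in everyday judgment but sometimes lead to severe and systematic errors. The paper offers an in-depth study of cognitive biases.
- Kahneman, Lovallo, and Sibony (2019): addresses the problems of System 1. Reducing errors in judgment requires a disciplined process. The paper stresses structured, systematic approaches to strategic decisions, which keep the decision process rigorous and reduce errors caused by cognitive bias or random noise.
- Kahneman, Lovallo, and Sibony (2011): proposes a systematic approach (slow thinking) in which managers review the decision process with a 12-question checklist to identify and reduce cognitive biases that may affect team decisions, for example by checking whether the recommending team shows self-interested bias, affective bias, or groupthink.
Their later paper, "Prospect Theory: An Analysis of Decision under Risk" (Kahneman and Tversky 1979), introduces prospect theory. Prospect theory describes behavior in risky decisions: people evaluate gains and losses relative to a reference point, and they tend to be risk-averse when facing gains and risk-seeking when facing losses. The paper clarifies regularities in how people behave under risk.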
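
The reference-point behavior described for prospect theory can be made concrete with a small numerical sketch. The power-form value function and the parameter values (curvature 0.88, loss-aversion coefficient 2.25) are assumptions taken from commonly cited later estimates by Tversky and Kahneman, not from this review, and probability weighting is omitted.

```python
def prospect_value(outcome, reference=0.0, alpha=0.88, beta=0.88, loss_aversion=2.25):
    """Subjective value of an outcome measured from a reference point.

    alpha, beta and loss_aversion are illustrative assumptions.
    """
    x = outcome - reference
    if x >= 0:
        return x ** alpha  # concave for gains
    return -loss_aversion * ((-x) ** beta)  # convex and steeper for losses


# Gains: a sure 50 versus a 50% chance of 100 (probabilities taken at face value).
sure_gain = prospect_value(50)
gamble_gain = 0.5 * prospect_value(100)

# Losses: a sure -50 versus a 50% chance of -100.
sure_loss = prospect_value(-50)
gamble_loss = 0.5 * prospect_value(-100)

risk_averse_in_gains = sure_gain > gamble_gain
risk_seeking_in_losses = gamble_loss > sure_loss
```

Under these assumed parameters the sure gain is valued above the gamble and the sure loss below it, which matches the risk-averse-for-gains, risk-seeking-for-losses pattern; a loss of 100 also weighs more heavily than a gain of 100.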
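Tversky and Kahneman (1981): the same decision problem, described in different ways (frames), can lead people to make different choices. This phenomenon is called the framing effect. Using several problem scenarios, including problems about monetary outcomes and loss of life, the paper shows how preferences reverse when a problem is presented differently. Under a positive frame, people tend to choose the more certain, lower-risk option; under a negative frame, they are more willing to take risks to avoid a sure loss. The shift in preference arises not from any change in the substance of the problem but only from the way it is described. In short, people prefer certainty for gains and uncertainty for losses.
Hsee, Imas, and Li (2024): the relationship between framing and risk. In the O-frame, the decision is described by presenting each possible outcome separately, which may lead individuals to focus on their own choice and outcome and thus lean toward self-interested choices. Because this description may emphasize the independence of individual decisions and the certainty of outcomes, it can be viewed as a positive frame. The R-frame, by contrast, emphasizes the relationship between participants' actions and how they jointly determine the outcome. This description may increase attention to cooperation and collective interest because it highlights synergy and the possibility of a collectively optimal choice. It may make maximizing the collective interest feel like a more socially valuable and coordinated decision, even when it involves higher risk.
Kahneman also published the review article "Choices, Values, and Frames" (Kahneman and Tversky 1984), which summarizes his work with Tversky and discusses the psychophysics of value, the psychophysics of chances, decision framing, mental accounting, and the relationship between decision value and experience value. The article shows common problems and errors in human decision-making and offers methods and strategies for deciding better.
Finally, Kahneman wrote the book Thinking, Fast and Slow (Kahneman 2011), a comprehensive account of his research on modes of thinking and the psychology of decision-making. Through cases and experiments it covers his findings and theories and the common errors in thinking and deciding, along with strategies for avoiding them. Reading it gives an overall picture of the theory and research in this field.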

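Bendoly et al. (2010): this article is among the ten most influential papers from the first thirty years of Production and Operations Management (POM) (Kumar and Singhal 2024). It examines how individual and group dynamics affect operational decisions and behavior in behavioral operations research. It analyzes several psychological and sociological theories, including cognitive bias, goal setting, feedback and control theory, and group dynamics, and how they shape decisions and behavior in operational settings (such as multi-queue supermarket checkouts). It also discusses the effect of system dynamics on operations, including misperceptions of feedback structure and dynamics and how those misperceptions cause operational problems.
Varpio et al. (2020): a theoretical framework focuses more on the application and development of theory, whereas a conceptual framework focuses on the justification (the research gap) and importance of the study.
Jabareen (2009): a theoretical framework usually refers to a broader theoretical foundation, which may include a conceptual framework as a component but may also contain more theoretical variables and causal relationships.
Glaser and Strauss (1967): the authors criticize the over-reliance of sociology at the time on "grand theory", arguing that such theory is often disconnected from empirical research. They advocate theory generated from data, which fits better and works better in practice. "A few men (like Parsons and Merton) have seen through this charismatic view of the great men sufficiently to generate 'grand' theories on their own."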

References

Bashar, Md Abul, and Richi Nayak. 2020. “TAnoGAN: Time Series Anomaly Detection with Generative Adversarial Networks.” In 2020 IEEE Symposium Series on Computational Intelligence (SSCI), 1778–85. IEEE.

Baule, Rainer, Olaf Korn, and Laura-Chloé Kuntz. 2019. “Markowitz with Regret.” Journal of Economic Dynamics and Control 103: 1–24.

Bayer, Ralph-Christopher, and Ludovic Renou. 2024. “Interacting with Man or Machine: When Do Humans Reason Better?” Management Science Articles in Advance: 1–15.

Bellalah, Makram, Amine Ben Amar, and Ephraim Clark. 2024. “Regret-Aversion over Different Maturities: Application to Energy Futures Markets.” Economics Letters 241: 111812.

Bendoly, Elliot, Rachel Croson, Paulo Goncalves, and Kenneth Schultz. 2010. “Bodies of Knowledge for Research in Behavioral Operations.” Production and Operations Management 19 (4): 434–52.

Bertsimas, Dimitris, Lisa Everest, Jiayi Gu, Matthew Peroni, and Vasiliki Stoumpou. 2024. “Deep Trees for (Un)structured Data: Tractability, Performance, and Interpretability.” arXiv Preprint arXiv:2410.21595.

Bussmann, Niklas, Paolo Giudici, Dimitri Marinelli, and Jochen Papenbrock. 2021. “Explainable Machine Learning in Credit Risk Management.” Computational Economics 57: 203–16.

Chan, Kin-Yee, and Wei-Yin Loh. 2004. “LOTUS: An Algorithm for Building Accurate and Comprehensible Logistic Regression Trees.” Journal of Computational and Graphical Statistics 13 (4): 826–52. https://doi.org/10.1198/106186004X13064.

Chang, Linda W, Erika L Kirgios, Sendhil Mullainathan, and Katherine L Milkman. 2024. “Does Counting Change What Counts? Quantification Fixation Biases Decision-Making.” Proceedings of the National Academy of Sciences 121 (46): e2400215121.

Chen, Hugh, Scott M Lundberg, and Su-In Lee. 2022. “Explaining a Series of Models by Propagating Shapley Values.” Nature Communications 13 (1): 4512.

Chen, Tianqi, and Carlos Guestrin. 2016. “XGBoost: A Scalable Tree Boosting System.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–94.

Cooper, Aidan. 2022. “Supervised Clustering: Cluster Analysis Using SHAP Values.” Aidan Cooper. https://www.aidancooper.co.uk/supervised-clustering-shap-values/.

Djeundje, Viani Biatat, and Jonathan Crook. 2018. “Incorporating Heterogeneity and Macroeconomic Variables into Multi-State Delinquency Models for Credit Cards.” European Journal of Operational Research 271 (2): 697–709. https://doi.org/10.1016/j.ejor.2018.05.040.

Fogg, B. J., and Jim Euchner. 2019. “Designing for Behavior Change—New Models and Moral Issues.” Research-Technology Management 62 (5): 14–19.

Fogg, BJ. 2009a. “A Behavior Model for Persuasive Design.” In Proceedings of the 4th International Workshop on Persuasive Technology, 1–7. ACM.

———. 2009b. “Creating Persuasive Technologies: An Eight-Step Design Process.” Persuasive’09. Claremont, California, USA: Stanford University. http://captology.stanford.edu/.

Giuliani, Alessandro, Roberto Savona, Salvatore Carta, Gianmarco Addari, and Alessandro Sebastian Podda. 2024. “Corporate Risk Stratification Through an Interpretable Autoencoder-Based Model.” Computers and Operations Research.

Glaser, Barney G., and Anselm L. Strauss. 1967. The Discovery of Grounded Theory: Strategies for Qualitative Research. New Brunswick, NJ: Transaction Publishers.

Horel, Enguerrand, and Kay Giesecke. 2020. “Significance Tests for Neural Networks.” Journal of Machine Learning Research 21: 1–29.

Hsee, Christopher K, Alex Imas, and Xilin Li. 2024. “How Framing Influences Strategic Interactions.” Management Science, 1–13.

İrsoy, Ozan, Olcay Taner Yıldız, and Ethem Alpaydın. 2012. “Soft Decision Trees.” In 21st International Conference on Pattern Recognition (ICPR 2012), 1819–22. Tsukuba, Japan: IEEE.

Jabareen, Yosef. 2009. “Building a Conceptual Framework: Philosophy, Definitions, and Procedure.” International Journal of Qualitative Methods 8 (4): 49–58.

Janzing, Dominik, Kailash Budhathoki, Lenon Minorics, and Patrick Blobaum. 2019. “Causal Structure Based Root Cause Analysis of Outliers.” Preprint.

Kahneman, Daniel. 2011. Thinking, Fast and Slow. Macmillan.

Kahneman, Daniel, Dan Lovallo, and Olivier Sibony. 2011. “Before You Make That Big Decision….” Harvard Business Review 89 (6): 50–60.

———. 2019. “A Structured Approach to Strategic Decisions.” MIT Sloan Management Review 67 (Spring): 1–10.

Kahneman, Daniel, and Amos Tversky. 1979. “Prospect Theory: An Analysis of Decision Under Risk.” Econometrica 47 (2): 263–92.

———. 1984. “Choices, Values, and Frames.” American Psychologist 39 (4): 341–50.

Kim, Jang Ho, Seyoung Kim, Yongjae Lee, Woo Chang Kim, and Frank J Fabozzi. 2024. “Enhancing Mean–Variance Portfolio Optimization Through GANs-Based Anomaly Detection.” Annals of Operations Research 235 (1-2): 1–23.

Kumar, Subodha, and Vinod R Singhal. 2024. “Ten Most Influential Papers from the First Thirty Years of the Production and Operations Management Journal.” Production and Operations Management 1: 3.

Landwehr, Niels, Mark Hall, and Eibe Frank. 2005. “Logistic Model Trees.” Machine Learning 59 (1-2): 161–205.

Lemaire, Vincent, Raphaël Féraud, and Nicolas Voisine. 2008. “Contact Personalization Using a Score Understanding Method.” In International Joint Conference on Neural Networks (IJCNN). IEEE.

Liang, Paul Pu, Yun Cheng, Xiang Fan, Chun Kai Ling, Suzanne Nie, Richard Chen, Zihao Deng, Faisal Mahmood, Ruslan Salakhutdinov, and Louis-Philippe Morency. 2023. “Quantifying & Modeling Feature Interactions: An Information Decomposition Framework.” arXiv Preprint arXiv:2302.12247. https://arxiv.org/abs/2302.12247.

Lu, Tian, and Yingjie Zhang. 2023. “1 + 1 > 2? Information, Humans, and Machines.” Arizona State University, Peking University. https://ssrn.com/abstract=4045718.

Lyu, Zhiheng, Zhijing Jin, Fernando Gonzalez, Rada Mihalcea, Bernhard Schölkopf, and Mrinmaya Sachan. 2024. “Do LLMs Think Fast and Slow? A Causal Study on Sentiment Analysis.” In Findings of the Association for Computational Linguistics: EMNLP, 9353–72. Association for Computational Linguistics.

Mantegna, Rosario N., and H. Eugene Stanley. 2000. An Introduction to Econophysics: Correlations and Complexity in Finance. Cambridge, UK: Cambridge University Press.

Quinlan, J. Ross. 1992. “Learning with Continuous Classes.” In Proceedings AI’92 (Adams & Sterling, Eds), 343–48. Singapore: World Scientific.

Redell, Nickalus. 2019. “Shapley Decomposition of r-Squared in Machine Learning Models.” arXiv Preprint arXiv:1908.09718.

Reizinger, Peter, Lorenzo Gresele, Jessy Brady, Johan von Kügelgen, et al. 2022. “Embrace the Gap: VAEs Perform Independent Mechanism Analysis.” arXiv Preprint arXiv:2205.13574.

Strumbelj, Erik, and Igor Kononenko. 2010. “An Efficient Explanation of Individual Classifications Using Game Theory.” Journal of Machine Learning Research 11: 1–18.

Talaei, Nolan M., Asil Oztekin, and Luvai Motiwalla. 2025. “From Rants to Raves: Unraveling Movie Critics’ Reviews with Explainable Artificial Intelligence.” Annals of Operations Research. https://doi.org/10.1007/s10479-025-06484-0.

Tversky, Amos, and Daniel Kahneman. 1974. “Judgment Under Uncertainty: Heuristics and Biases: Biases in Judgments Reveal Some Heuristics of Thinking Under Uncertainty.” Science 185 (4157): 1124–31.

———. 1981. “The Framing of Decisions and the Psychology of Choice.” Science 211 (4481): 453–58.

Vafa, Keyon, Ashesh Rambachan, and Sendhil Mullainathan. 2024. “Do Large Language Models Perform the Way People Expect? Measuring the Human Generalization Function.” In Proceedings of the 41st International Conference on Machine Learning, 235: to appear. PMLR.

Varpio, Lara, Elise Paradis, Sebastian Uijtdehaage, and Meredith Young. 2020. “The Distinctions Between Theory, Theoretical Framework, and Conceptual Framework.” Academic Medicine 95 (7): 989–94.
