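Research Review: Decision Learning and Optimization Techniques

Introduction

This review surveys the core theory, methods, and application scenarios of decision learning and optimization techniques. It covers planning algorithms, efficiency evaluation, multi-objective optimization, simulation optimization, uncertainty handling, and fairness and ethics, and aims to provide a systematic reference for research and practice in these areas.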

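I. Decision Learning

Development of the Foundational Theory

1. Planning Algorithms

1.1 Foundations of Linear Programming

Charnes, Cooper, and Ferguson (1955) laid the theoretical foundation of linear programming. The simplex method went from its conception in 1952, to its formal naming in 1961, to widespread use in management science and operations research in the 1970s (A. Charnes and Cooper 1977).

Evolution of the simplex method:

A. Charnes and Cooper (1961) applied the simplex method to managerial decisions. The simplex method adjusts the values of the decision variables, while satisfying all constraints, to find the combination that maximizes revenue. It starts from an initial feasible solution and moves step by step to an adjacent solution (a neighboring vertex along an edge) until it reaches an optimal solution.
Ehrgott (2005) gives a formal description of the simplex method, defining its input (a basic feasible solution) and its output (an optimal basis).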
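
The vertex-to-vertex movement described above can be illustrated with a small tableau implementation. This is a minimal sketch, not the formulation given by Charnes and Cooper or by Ehrgott: it assumes a problem of the form "maximize c·x subject to Ax ≤ b, x ≥ 0" with b ≥ 0, so that the slack variables supply the initial basic feasible solution. The function name `simplex_max` and the toy data are illustrative assumptions.

```python
from fractions import Fraction

def simplex_max(c, A, b):
    """Maximize c.x subject to A x <= b, x >= 0, assuming every b[i] >= 0."""
    m, n = len(A), len(c)
    # Tableau rows [A | I | b]: the slack variables form the initial basic feasible solution.
    rows = [[Fraction(v) for v in A[i]]
            + [Fraction(int(i == j)) for j in range(m)]
            + [Fraction(b[i])] for i in range(m)]
    # Objective row holds -c; its last entry tracks the current objective value.
    obj = [Fraction(-v) for v in c] + [Fraction(0)] * (m + 1)
    basis = [n + i for i in range(m)]
    while True:
        # Bland's rule: enter the lowest-index variable with a negative reduced cost.
        enter = next((j for j in range(n + m) if obj[j] < 0), None)
        if enter is None:
            break  # no improving neighbor: the current basis is optimal
        # Ratio test: the leaving row keeps the next vertex feasible.
        candidates = [(rows[i][-1] / rows[i][enter], basis[i], i)
                      for i in range(m) if rows[i][enter] > 0]
        if not candidates:
            raise ValueError("problem is unbounded")
        leave = min(candidates)[2]
        # Pivot: move to the adjacent basic feasible solution.
        pivot = rows[leave][enter]
        rows[leave] = [v / pivot for v in rows[leave]]
        for i in range(m):
            if i != leave and rows[i][enter] != 0:
                f = rows[i][enter]
                rows[i] = [a - f * p for a, p in zip(rows[i], rows[leave])]
        f = obj[enter]
        obj = [a - f * p for a, p in zip(obj, rows[leave])]
        basis[leave] = enter
    x = [Fraction(0)] * n
    for i, var in enumerate(basis):
        if var < n:
            x[var] = rows[i][-1]
    return x, obj[-1], basis

# Toy revenue problem: maximize 3x + 5y with three resource limits.
x, revenue, basis = simplex_max([3, 5], [[1, 0], [0, 2], [3, 2]], [4, 12, 18])
```

Exact `Fraction` arithmetic is used so that the pivots do not accumulate rounding error; a production solver would handle degeneracy, infeasible starts, and numerical scaling far more carefully.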

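1.2 Combinatorial Optimization Problems

Ehrgott (2005) studies combinatorial optimization problems such as the (multi-objective) knapsack problem and the traveling salesman problem (TSP), and presents branch and bound as a solution approach.

Knapsack problem: maximize quality while also diversifying the colors of the selected items.
Traveling salesman problem (TSP): find the shortest tour through n cities.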
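
To make the branch-and-bound idea concrete, the sketch below solves a single-objective 0/1 knapsack, which is a simplification of the multi-objective version discussed above. It assumes positive weights and uses the fractional (greedy) relaxation as the upper bound; the function name and the data are illustrative.

```python
def knapsack_branch_and_bound(values, weights, capacity):
    """Return (best_value, chosen_indices) for a 0/1 knapsack with positive weights."""
    # Sort by value density so the fractional relaxation is a simple greedy fill.
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i], reverse=True)

    def upper_bound(k, value, room):
        # Optimistic bound: fill greedily and allow a fraction of the first misfit.
        for i in order[k:]:
            if weights[i] <= room:
                room -= weights[i]
                value += values[i]
            else:
                return value + values[i] * room / weights[i]
        return value

    best = {"value": 0, "items": []}

    def branch(k, value, room, chosen):
        if value > best["value"]:
            best["value"], best["items"] = value, sorted(chosen)
        # Prune at a leaf, or when the bound cannot beat the incumbent.
        if k == len(order) or upper_bound(k, value, room) <= best["value"]:
            return
        i = order[k]
        if weights[i] <= room:
            branch(k + 1, value + values[i], room - weights[i], chosen + [i])
        branch(k + 1, value, room, chosen)

    branch(0, 0, capacity, [])
    return best["value"], best["items"]

best_value, items = knapsack_branch_and_bound([60, 100, 120], [10, 20, 30], 50)
```

The same skeleton (branch on one decision, bound the remaining subproblem, prune against the incumbent) carries over to the TSP, where a tour lower bound replaces the fractional knapsack bound.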

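1.3 Extended Methods

Nagurney, Thore, and Pan (1994) compare iterative algorithms with dynamic programming and the Lagrangian multiplier method, and propose a variational inequality theory for measuring negative economic externalities.

Column generation: Daş and Gzara (2024), used to solve large-scale problems
Interior point method (IPM): Deng et al. (2024), combining barrier functions with path following
ADMM: Deng et al. (2024), solving decomposed subproblems alternately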

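2. Efficiency Evaluation

Method	Core contribution	Key references
Farrell's method	Efficiency evaluation with multiple input factors	Farrell (1957)
FDH model	Nonparametric; avoids assuming a production function	Deprins, Simar, and Tulkens (1984)
DEA model	Relative efficiency evaluation with multiple inputs and multiple outputs	Abraham Charnes, Cooper, and Rhodes (1978)
Stochastic frontier analysis (SFA)	Separates symmetric noise (v_i) from asymmetric disturbance (u_i)	Aigner, Lovell, and Schmidt (1977), Lovell (1995)

Papaioannou and Podinovski (2024) note that the Free Disposal Hull (FDH) proposed earlier by Deprins, Simar, and Tulkens (1984) can be understood as a more flexible representation of the production frontier, one that allows the production set to be non-convex. In the name, "free disposal" refers to the free disposability of inputs and outputs, and "hull" indicates that the model sets a boundary, or hull, around the production possibilities.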
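
Because FDH compares each unit only with observed units rather than with convex combinations of them, its efficiency score can be computed by enumeration. The sketch below uses the common input-oriented form: among the units that produce at least as much of every output, find the one that needs the smallest proportional share of the evaluated unit's inputs. The function name and the three-unit data set are illustrative assumptions, and all inputs are assumed positive.

```python
def fdh_input_efficiency(inputs, outputs, o):
    """Input-oriented FDH efficiency of unit o (1.0 means no observed unit dominates it)."""
    scores = []
    for x_j, y_j in zip(inputs, outputs):
        # A reference unit must produce at least as much of every output as unit o.
        if all(yj >= yo for yj, yo in zip(y_j, outputs[o])):
            # Smallest proportional scaling of unit o's inputs that still covers x_j.
            scores.append(max(xj / xo for xj, xo in zip(x_j, inputs[o])))
    return min(scores)  # unit o is its own reference, so the list is never empty

# Three units, one input and one output each (hypothetical data).
inputs = [[2.0], [4.0], [5.0]]
outputs = [[1.0], [3.0], [2.0]]
scores = [fdh_input_efficiency(inputs, outputs, o) for o in range(3)]
```

Here the third unit is dominated by the second, which produces more output with less input, so its score falls below 1.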

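Drawing on Abraham Charnes, Cooper, and Rhodes (1978) and Espana et al. (2023):

A. Charnes and W. W. Cooper are among the founders of the data envelopment analysis (DEA) model; together with E. Rhodes, they published their study on measuring the efficiency of decision making units in 1978.
Sadeghi, Toloo, and Sahoo (2025) introduce the concepts of economies of scale and economies of scope.
Mohsenirad and Triantis (2025) point out that the validity of DEA depends on the comparability of the decision making units (DMUs), that is, on their sharing a common production technology (homogeneity). Weight restrictions and cluster analysis can mitigate this problem.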

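3. Multi-Objective Optimization

3.1 Problem Characteristics

Ehrgott (2005) points out that the core of a multi-objective optimization problem is the conflict among objectives, and that the way to resolve it is to look for Pareto optimal solutions.

3.2 Evolution of Solution Methods

Scalarization methods:

Weighted sum method: A. Charnes and Cooper (1977)
ϵ-constraint method: Ehrgott (2005). One objective is chosen for optimization, while the others are added as constraints, each with an upper bound ϵ. The criteria therefore do not need to be aggregated.
Weighted Chebyshev method: Ehrgott (2008). The goal is to minimize the distance between a feasible point and the ideal point.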
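
The three scalarizations can be compared on a small finite set of outcome vectors. The following is a minimal sketch, assuming both objectives are minimized and the feasible set is just a list of points; the function names and the three points are illustrative. The middle point is Pareto optimal but lies above the line joining the other two, so no weighted sum selects it, whereas the ϵ-constraint and weighted Chebyshev methods can.

```python
def weighted_sum(points, weights):
    """Minimize the weighted sum of the objectives."""
    return min(points, key=lambda f: sum(w * v for w, v in zip(weights, f)))

def epsilon_constraint(points, primary, eps):
    """Minimize objective `primary`; every other objective k must satisfy f[k] <= eps[k]."""
    feasible = [f for f in points
                if all(v <= eps[k] for k, v in enumerate(f) if k != primary)]
    return min(feasible, key=lambda f: f[primary])

def weighted_chebyshev(points, weights):
    """Minimize the weighted Chebyshev (max-norm) distance to the ideal point."""
    ideal = [min(f[k] for f in points) for k in range(len(points[0]))]
    return min(points,
               key=lambda f: max(w * (v - z) for w, v, z in zip(weights, f, ideal)))

# Objective vectors (f1, f2), both minimized; all three are Pareto optimal.
points = [(1, 10), (6, 6), (10, 1)]

ws_pick = weighted_sum(points, (0.5, 0.5))
ec_pick = epsilon_constraint(points, primary=0, eps=[None, 6])
ch_pick = weighted_chebyshev(points, (0.5, 0.5))
```

With these data, `ec_pick` and `ch_pick` both return the compromise point `(6, 6)`, while `weighted_sum` returns one of the two extreme points for every choice of non-negative weights.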

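Hierarchical decision-making:

α-strong-weak model: Lagos and Prokopyev (2023)
Multiple objective programming (MOP): Boubaker et al. (2024) proceed hierarchically, step by step, to maximize profit and minimize losses.

4. Dynamic Programming and Reinforcement Learning

Decision diagram framework: Bergman et al. (2016)
Applications of reinforcement learning:
- Model-free and model-based RL, Jaimungal (2021): In reinforcement learning (RL), a model-free method learns without relying on an internal model of the environment. A model-based method, by contrast, builds and uses a model of the environment to predict state transitions and rewards, and selects the optimal policy on that basis.
- Bello et al. (2017) introduce a framework that uses neural networks and reinforcement learning to solve combinatorial optimization problems.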
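
As an illustration of the model-free idea, the sketch below runs tabular Q-learning on a five-state chain. The learner never inspects the transition rule; it only observes sampled `(next_state, reward)` pairs returned by `step`. To keep the example deterministic, every state-action pair is tried in turn on each sweep, a stand-in for the random exploration policy (such as ε-greedy) normally used. The environment, names, and hyperparameters are all illustrative assumptions and are not taken from Jaimungal (2021) or Bello et al. (2017).

```python
N_STATES, GOAL = 5, 4      # chain 0-1-2-3-4; state 4 is terminal
ACTIONS = (0, 1)           # 0 = move left, 1 = move right

def step(state, action):
    """Environment: returns the sampled next state and reward."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    return nxt, (1.0 if nxt == GOAL else 0.0)

def q_learning(sweeps=200, alpha=0.5, gamma=0.9):
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(sweeps):
        for s in range(GOAL):              # non-terminal states only
            for a in ACTIONS:
                nxt, r = step(s, a)
                # Temporal-difference target; no bootstrapping from the terminal state.
                target = r if nxt == GOAL else r + gamma * max(q[nxt])
                q[s][a] += alpha * (target - q[s][a])
    return q

q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]
```

A model-based variant would instead estimate the transition and reward functions from the same samples and then plan with them, for example by value iteration.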

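II. Simulation Optimization Techniques

(1) Response Surface Methodology (RSM)

Development

Box and Wilson (1951): foundation of RSM
Irizarry, Wilson, and Trevino (2001): describes the background of the algorithm and lays out the experiments, inputs, and outputs clearly.
Ebru Angün et al. (2002): implementation of multi-stage optimization

Key Comparisons

vs. the Taguchi method: Myers (1999); Khuri and Cornell (1996), focus on variance reduction
Compact Model: Den Hertog and Stehouwer (2002), four-step framework
Angun and Kleijnen (2024) replace the expected value with the 90% quantile to handle chance constraints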
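
The difference between constraining the expected value and constraining the 90% quantile can be seen on a handful of simulated outputs. The sketch below is illustrative only: the cost figures, the budget, and the function name are assumptions and are not taken from Angun and Kleijnen (2024). A chance constraint of the form P(cost ≤ budget) ≥ 0.9 is checked by comparing the empirical 90% quantile with the budget.

```python
import math

def empirical_quantile(samples, p):
    """Smallest sample value whose empirical CDF is at least p."""
    ordered = sorted(samples)
    # 1-based rank; the small tolerance guards against floating-point round-up.
    rank = max(1, math.ceil(p * len(ordered) - 1e-9))
    return ordered[rank - 1]

# Hypothetical simulated costs from ten replications of one configuration.
costs = [8, 9, 10, 10, 11, 11, 12, 12, 25, 30]
budget = 15

mean_cost = sum(costs) / len(costs)
q90 = empirical_quantile(costs, 0.9)

meets_expected_value = mean_cost <= budget   # constraint on the mean
meets_chance_constraint = q90 <= budget      # constraint on the 90% quantile
```

In this example the mean satisfies the budget while the 90% quantile does not, which is why a quantile-based formulation is the more conservative choice when the output distribution has a heavy right tail.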

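(2) Simulation Techniques

1. Core Theory

Distribution types: Law (2015), continuous, discrete, and empirical distributions
Modeling framework: Michael C. Fu et al. (2014), state variables + decision variables + objective function

2. Key Challenges

Fixing the random numbers: Donohue, Houck, and Myers (1993)
Autocorrelation: Law (2015) (Ch. 6)
Warm-up: Law (2015) (Ch. 9), use time plots and warm-up runs to determine the adjustment coefficient.
Experimental design: Law (2015) (Ch. 12), A/B testing vs. factorial designs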
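
Fixing the random numbers is commonly done by giving every compared configuration the same seed, a practice usually called common random numbers. The sketch below is a minimal illustration on a single-server queue simulated with the Lindley recursion. The queue model, parameter values, and function name are assumptions made for this example and do not come from the cited works.

```python
import random

def mean_wait(service_rate, seed, n_customers=2000, arrival_rate=1.0):
    """Average waiting time in a single-server FIFO queue (Lindley recursion)."""
    rng = random.Random(seed)      # the seed fixes the random-number stream
    wait, total = 0.0, 0.0
    for _ in range(n_customers):
        total += wait
        service = rng.expovariate(service_rate)
        gap = rng.expovariate(arrival_rate)    # time until the next arrival
        wait = max(0.0, wait + service - gap)
    return total / n_customers

# Same seed for both configurations: each customer sees the same underlying draws.
crn_gaps = [mean_wait(1.2, seed=s) - mean_wait(1.5, seed=s) for s in range(10)]

# Different seeds: the comparison also absorbs stream-to-stream noise.
independent_gaps = [mean_wait(1.2, seed=s) - mean_wait(1.5, seed=100 + s)
                    for s in range(10)]
```

With a shared seed, the faster server receives the same arrival pattern and proportionally shorter service times, so every entry of `crn_gaps` is non-negative. The entries of `independent_gaps` carry extra noise and can vary more widely.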

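3. Frontier Directions

Active learning: Cohn (1993), early work → Wilson and Sahinidis (2017), sampling where model error is largest
Derivative-free optimization (DFO): Cozad, Sahinidis, and Miller (2014), AIC/EMS sampling logic

(3) Ranking and Selection (R&S)

Nelson (2022): settings with a finite number of feasible solutions
Extended framework: Shen, Hong, and Zhang (2021), R&S + classification (R&S+C)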
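
In its simplest form, ranking and selection simulates each of a finite number of alternatives and picks the one with the best sample mean. The sketch below uses a naive equal-allocation rule and is not one of the procedures from Nelson (2022) or Shen, Hong, and Zhang (2021). The systems, noise model, and names are illustrative assumptions, and larger values are taken to be better.

```python
import random

def select_best(systems, n_reps, rng):
    """Simulate each system n_reps times; return the index with the largest sample mean."""
    means = [sum(simulate(rng) for _ in range(n_reps)) / n_reps
             for simulate in systems]
    best = max(range(len(systems)), key=lambda i: means[i])
    return best, means

# Hypothetical systems: a true mean plus bounded uniform noise.
true_means = [1.0, 2.0, 5.0]
systems = [lambda rng, m=m: m + rng.uniform(-0.5, 0.5) for m in true_means]

best, means = select_best(systems, n_reps=30, rng=random.Random(0))
```

Practical R&S procedures go further: they choose the number of replications per system adaptively so that the probability of correct selection meets a stated target.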

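III. Uncertainty Handling

(1) The VUCA Framework

Dimension	Approach	Representative references
Volatility	SPO framework	Elmachtoub and Grigas (2022)
Uncertainty	Data-driven robust optimization	Dimitris Bertsimas, Gupta, and Kallus (2018)
Complexity	Entropy as a measure of market depth	Olbryś and Ostrowski (2021)
Ambiguity	Dynamic scenario trees	Geyer and Ziemba (2008), D. Bertsimas and Mundru (2021)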
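
The idea behind the SPO ("Smart Predict, then Optimize") framework is to judge a prediction by the cost of the decision it induces rather than by its prediction error. The sketch below assumes the simplest possible downstream problem, choosing the single cheapest option, and computes the SPO loss as the true cost of the decision made under the predicted costs minus the true optimal cost. The data and function names are illustrative assumptions.

```python
def argmin_decision(costs):
    """Downstream optimization: pick the index of the cheapest option."""
    return min(range(len(costs)), key=lambda i: costs[i])

def spo_loss(predicted, actual):
    """Excess true cost incurred by optimizing against the predicted costs."""
    chosen = argmin_decision(predicted)
    return actual[chosen] - min(actual)

def mse(predicted, actual):
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

actual = [4.0, 5.0, 9.0]
pred_a = [6.0, 7.0, 12.0]   # large errors, but the cheapest option is still ranked first
pred_b = [4.5, 4.4, 9.0]    # small errors, but the two cheapest options are swapped

loss_a, loss_b = spo_loss(pred_a, actual), spo_loss(pred_b, actual)
```

Here `pred_b` has the smaller mean squared error yet the larger decision loss, which is the gap between prediction accuracy and decision quality that SPO-style training targets.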

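(2) Interactions Among Decision Variables

Substitution effects: Feldman et al. (2022), customer choice behavior
Complementarity effects: Michael C. Fu et al. (2024), microstructure dependence between firms; Tulabandhula, Sinha, and Patidar (2020), choice sets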

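IV. Fairness and Ethics

1. Sources of Bias

Bias in historical data: Das et al. (2021), models propagate discrimination
Discretization in prediction techniques: Fuster et al. (2022), inference of qualification

2. Solutions

2.1 Preprocessing Stage

Feature removal: Kasmi (2021)
Resampling

2.2 Modeling Stage

Group-specific threshold adjustment: Kleinberg et al. (2018)
Data debiasing: Jain et al. (2024). D3M debiases by identifying and removing the training samples that cause the model to perform poorly on minority groups.

2.3 Evaluation Metrics

Statistical parity difference: Hort et al. (2024) cover statistical parity difference (SPD), equality of opportunity, and equalized odds. The harm unfolds as follows: groups A and B have the same true qualification → bias in the data or features leads the model to TPR(A) > TPR(B) → the model identifies a smaller share of B's positive cases → in credit decisions, B is judged to be high-risk → B is charged higher interest rates → resources tilt toward A, and B is treated unfairly.
Equal opportunity: D'Amour et al. (2020); L. T. Liu et al. (2018); Hardt, Price, and Srebro (2016), equal TPR across groups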
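
The two metrics named above can be computed directly from labels, predictions, and group membership. The sketch below assumes binary labels and predictions (1 is the favorable outcome) and reports each gap as group A minus group B. The sign convention, function names, and eight-record data set are assumptions for illustration.

```python
def positive_rate(preds):
    """Share of individuals who receive the favorable prediction."""
    return sum(preds) / len(preds)

def true_positive_rate(labels, preds):
    """Share of truly qualified individuals (label 1) predicted as positive."""
    hits = [p for y, p in zip(labels, preds) if y == 1]
    return sum(hits) / len(hits)

def fairness_gaps(labels, preds, groups, a="A", b="B"):
    """Return (statistical parity difference, TPR gap), each computed as A minus B."""
    def subset(g):
        idx = [i for i, v in enumerate(groups) if v == g]
        return [labels[i] for i in idx], [preds[i] for i in idx]
    y_a, p_a = subset(a)
    y_b, p_b = subset(b)
    spd = positive_rate(p_a) - positive_rate(p_b)
    tpr_gap = true_positive_rate(y_a, p_a) - true_positive_rate(y_b, p_b)
    return spd, tpr_gap

# Both groups have the same true qualification rate (2 of 4), but B is under-approved.
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
preds  = [1, 1, 1, 0, 1, 0, 0, 0]

spd, tpr_gap = fairness_gaps(labels, preds, groups)
```

A TPR gap of zero corresponds to equal opportunity in the sense of Hardt, Price, and Srebro (2016); equalized odds additionally requires equal false positive rates across groups.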

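V. Application Scenarios

1. Engineering Implementation

Toolchain: Alibaba DAMO Academy (阿里巴巴达摩院) (2024) (MindOpt, `mindoptpy`); Eckman et al. (2021) (repository)
Workflow management: 张鑫航 (2023), three stages: prediction → optimization → distribution

2. Typical Problems

2.1 Traffic Distribution

Offline allocation: Tomlin (2000), entropy smoothing
Online distribution: Devanur and Hayes (2009), competitive ratio analysis

2.2 Domain Applications

Finance: Jaimungal (2021), deep reinforcement learning (DRL)
Logistics: Kandula, Roy, and Akartunalı (2024), package size optimization
Healthcare: 林方全 (2023), energy scheduling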

References

Aigner, Dennis J., C. A. Knox Lovell, and Peter Schmidt. 1977. "Formulation and Estimation of Stochastic Frontier Production Function Models." Journal of Econometrics 6 (1): 21–37.

Angun, Ebru, and Jack Kleijnen. 2024. "The Cost of Risk-Aversion in Inventory Management: An (s,s) Case Study." 2024-005. CentER, Center for Economic Research.

Angün, Ebru, Jack PC Kleijnen, Dick Den Hertog, and Gül Gürkan. 2002. "Response Surface Methodology Revisited." In Proceedings of the 2002 Winter Simulation Conference, edited by Enver Yücesan, Chiang-Huang Chen, James L Snowdon, and John M Charnes, 157–64. Piscataway, NJ: IEEE.

Angün, E., J. Kleijnen, D. D. Hertog, and G. Gürkan. 2009. "Response Surface Methodology with Stochastic Constraints for Expensive Simulation." Journal of the Operational Research Society 60 (6): 735–46.

Aouad, Ali, Adam N. Elmachtoub, Kris J. Ferreira, and Ryan McNellis. 2023. "Market Segmentation Trees." Manufacturing & Service Operations Management 25 (1): 1–19.

Barton, Russell R., and Martin Meckesheimer. 2006. "Chapter 18 Metamodel-Based Simulation Optimization." Handbooks in Operations Research and Management Science 13 (C): 535–74. https://doi.org/10.1016/S0927-0507(06)13018-2.

Beldiceanu, Nicolas, and Helmut Simonis. 2016. "ModelSeeker: Extracting Global Constraint Models from Positive Examples." In Data Mining and Constraint Programming: Foundations of a Cross-Disciplinary Approach, edited by Christian Bessiere, Luc De Raedt, Lars Kotthoff, Siegfried Nijssen, Barry O’Sullivan, and Dino Pedreschi, 77–95. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-50137-6_4.

Bello, Irwan, Hieu Pham, Quoc V. Le, Mohammad Norouzi, and Samy Bengio. 2017. "Neural Combinatorial Optimization with Reinforcement Learning." https://arxiv.org/abs/1611.09940.

Bergman, David, Andre A. Cire, Willem-Jan van Hoeve, and John Hooker. 2016. Decision Diagrams for Optimization. Springer.

Bertsimas, Dimitris, Vishal Gupta, and Nathan Kallus. 2018. "Data-Driven Robust Optimization." Mathematical Programming 167 (1-2): 235–92.

Bertsimas, Dimitris, and Nathan Kallus. 2019. "From Predictive to Prescriptive Analytics." Management Science 65 (11): 5072–95.

Bertsimas, Dimitris, and Cheol Woo Kim. "A Machine Learning Approach to Two-Stage Adaptive Robust Optimization." European Journal of Operational Research. https://arxiv.org/abs/2307.12409v1.

Bertsimas, Dimitris, and Berk Öztürk. 2019. "Global Optimization via Optimal Decision Trees." Journal of Global Optimization, 1–41.

Bertsimas, Dimitris, and Bartolomeo Stellato. 2021. "The Voice of Optimization." Machine Learning 110 (2): 249–77.

Bertsimas, D., and N. Mundru. 2021. "Optimization-Based Scenario Reduction for Data-Driven Two-Stage Stochastic Optimization." Operations Research 71 (4): 1343–61.

Bessiere, Christian, Luc De Raedt, Lars Kotthoff, Siegfried Nijssen, Barry O’Sullivan, and Dino Pedreschi, eds. n.d. Data Mining and Constraint Programming: Foundations of a Cross-Disciplinary Approach. Lecture Notes in Computer Science. Springer Cham. https://doi.org/10.1007/978-3-319-50137-6.

Bharadwaj, Vijay, Peiji Chen, Wenjing Ma, Chandrashekhar Nagarajan, John Tomlin, Sergei Vassilvitskii, Erik Vee, and Jian Yang. 2012. "SHALE: An Efficient Algorithm for Allocation of Guaranteed Display Advertising." arXiv Preprint arXiv:1203.3619, March. https://arxiv.org/abs/1203.3619.

Boubaker, Sabri, Tu D. Q. Le, Riadh Manita, and Thanh Ngo. 2024. "Balancing Bank Profits and Nonperforming Loans: A Multiple Objective Programming Approach." Annals of Operations Research. https://doi.org/10.1007/s10479-024-05831-x.

Box, George EP, and Kenneth B Wilson. 1951. "On the Experimental Attainment of Optimum Conditions." Journal of the Royal Statistical Society 13 (1): 1–45.

Charnes, A., W. W. Cooper, and R. O. Ferguson. 1955. "Optimal Estimation of Executive Compensation by Linear Programming." Management Science 1 (2): 138–51.

Charnes, Abraham, William W Cooper, and Edwardo Rhodes. 1978. "Measuring the Efficiency of Decision Making Units." European Journal of Operational Research 2 (6): 429–44.

Charnes, A., and W. W. Cooper. 1961. Management Models and Industrial Applications of Linear Programming. New York: John Wiley & Sons.

———. 1977. "Goal Programming and Multiple Objectives Optimization: Part i." European Journal of Operational Research 1: 39–54.

Chen, Ye, Pavel Berkhin, Bo Anderson, and Nikhil R. Devanur. 2011. "Real-Time Bidding Algorithms for Performance-Based Display Ad Allocation." In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Diego, CA: ACM.

Cohn, David. 1993. "Neural Network Exploration Using Optimal Experiment Design." In Advances in Neural Information Processing Systems, edited by J. Cowan, G. Tesauro, and J. Alspector. Vol. 6. Morgan-Kaufmann. https://proceedings.neurips.cc/paper_files/paper/1993/file/d840cc5d906c3e9c84374c8919d2074e-Paper.pdf.

Cooper, William W., and Andrew B. Whinston. 1994. New Directions in Computational Economics. Boston: Kluwer Academic Publishers.

Cozad, Alison, Nikolaos V. Sahinidis, and David C. Miller. 2014. "Learning Surrogate Models for Simulation-Based Optimization." AIChE Journal 60 (7): 2211–27. https://doi.org/10.1002/aic.14418.

D’Amour, Alexander, Hansa Srinivasan, James Atwood, Pallavi Baljekar, D. Sculley, and Yoni Halpern. 2020. "Fairness Is Not Static: Deeper Understanding of Long Term Fairness via Simulation Studies." In Conference on Fairness, Accountability, and Transparency (FAT* ’20), 10. ACM. https://doi.org/10.1145/3351095.3372878.

Das, Sanjiv, Michele Donini, Jason Gelman, Kevin Haas, Mila Hardt, Jared Katzman, Krishnaram Kenthapadi, Pedro Larroy, Pinar Yilmaz, and Bilal Zafar. 2021. "Fairness Measures for Machine Learning in Finance."

Das, Sanjiv, Richard Stanton, and Nancy Wallace. 2023. "Algorithmic Fairness." Annual Review of Financial Economics 15: 565–93. https://doi.org/10.1146/annurev-financial-110921-125930.

Daş, Gülesin Sena, and Fatma Gzara. 2024. "Column Generation Based Solution for Bi-Objective Gate Assignment Problems." Mathematical Methods of Operations Research, 1–29.

Den Hertog, D., and H. P. Stehouwer. 2002. "Optimizing Color Picture Tubes by High-Cost Nonlinear Programming." European Journal of Operational Research 140 (2): 197–211.

Deng, Qi, Qing Feng, Wenzhi Gao, Dongdong Ge, Bo Jiang, Yuntian Jiang, Jingsong Liu, et al. 2024. "An Enhanced Alternating Direction Method of Multipliers-Based Interior Point Method for Linear and Conic Optimization." INFORMS Journal on Computing.

Deprins, Dominique, Leopold Simar, and Henry Tulkens. 1984. "Measuring Labor-Efficiency in Post Offices." In The Performance of Public Enterprises: Concepts and Measurement, edited by M. Marchand, P. Pestieau, and H. Tulkens, 243–68. North-Holland.

Devanur, Nikhil R., and Thomas P. Hayes. 2009. "The Adwords Problem: Online Keyword Matching with Budgeted Bidders Under Random Permutations." In Proceedings of the EC’09, 5–10. Stanford, California, USA: ACM; ACM. https://doi.org/10.1145/1566374.1566406.

Ding, Wenhao, Chejian Xu, Mansur Arief, Haohong Lin, Bo Li, and Ding Zhao. 2023. "A Survey on Safety-Critical Driving Scenario Generation-a Methodological Perspective." IEEE Transactions on Intelligent Transportation Systems. https://doi.org/10.1109/TITS.2023.XXXXXXX.

Donohue, Joan M, Ernest C Houck, and Raymond H Myers. 1993. "An Interactive Approach to Multi-Response Simulation Optimization." Management Science 39 (8): 929–42.

Eckman, Brian, Jordan Jalving, and Juan Pablo Vielma. 2021. "JuMP: A Modeling Language for Mathematical Optimization." SIAM Review 63 (2): 295–320.

Ehrgott, Matthias. 2005. Multicriteria Optimization. 2nd ed. Springer.

———. 2008. "Multicriteria Optimization." In International Series in Operations Research & Management Science, 102:1–37. Springer US.

Eltantawy, Sondos, and Christine M. Anderson-Cook. 2018. "A Tutorial on Design of Experiments for Simulation: A Systematic and Efficient Approach to Sensitivity Analysis." Quality Engineering 30 (1): 116–33.

Elmachtoub, Adam N., and Thanasis Stathopoulos. 2020. "Data-Driven Optimization of Stochastic Systems with Statistical Guarantees." Management Science 66 (10): 4381–4402.

Elmachtoub, Adam N., and Georgios B. Grigas. 2022. "Smart 'Predict, then Optimize'." Management Science 68 (8): 5780–5804.

Espana, Manuela, José L. Zofío, and Laureano F. Escudero. 2023. "A Comprehensive Guide to Data Envelopment Analysis." European Journal of Operational Research 304 (3): 801–16.

Feldman, Jacob, Yiding Feng, Suvrit Sra, and Dylan J. Foster. 2022. "Efficiently Learning Market Equilibria from Samples." Advances in Neural Information Processing Systems 35: 27431–43.

Feng, Q., Q. Deng, W. Gao, D. Ge, B. Jiang, Y. Jiang, J. Liu, et al. 2024. "An Enhanced Alternating Direction Method of Multipliers-Based Interior Point Method for Linear and Conic Optimization." INFORMS Journal on Computing.

Farrell, M. J. 1957. "The Measurement of Productive Efficiency." Journal of the Royal Statistical Society. Series A (General) 120 (3): 253–81.

Fuster, Andreas, Paul Goldsmith-Pinkham, and Maximilian Goldzmidt. 2022. "Consumer Credit: Machine Learning, Bias, and Fairness." Annual Review of Economics 14 (1): 265–90.

Geyer, Axel, and William T. Ziemba. 2008. "Optimal Versus Naive Diversification: How Inefficient is the 1/N Portfolio Strategy?" Review of Financial Studies 21 (5): 1915–53.

Hardt, Moritz, Eric Price, and Nati Srebro. 2016. "Equality of Opportunity in Supervised Learning." In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 3315–23. Curran Associates, Inc.

Hort, M., T. Kammerlander, F. Pallasch, and J. Pöschko. 2024. "Bias Mitigation for Machine Learning Models in Credit Scoring: A Systematic Literature Review." European Journal of Operational Research.

Irizarry, Rafael, James R. Wilson, and Jorge Trevino. 2001. "Response Surface Methodology for Stochastic Simulation: A Review." In Proceedings of the 2001 Winter Simulation Conference, edited by B. Peters, J. Smith, D. Medeiros, and M. Rohrer, 143–50. Piscataway, NJ: IEEE.

Jain, Saurabh, Byron C. Wallace, and Hoda Heidari. 2024. "Dataset Debugging: A Machine Learning Approach to Identifying and Mitigating Performance Disparities." Journal of Machine Learning Research 25 (23): 1–40.

Jaimungal, S. 2021. "Deep Reinforcement Learning for Optimal Execution." Journal of Machine Learning Research 22 (1): 1–42.

Kandula, S., S. Roy, and B. Akartunali. 2024. "A Data-Driven Approach for Optimizing Package Sizes in Logistics Networks." Transportation Science.

Kasmi, Sarah. 2021. "Feature Selection for Fair Classification: A Survey and New Perspectives." arXiv Preprint arXiv:2107.00683.

Kleinberg, Jon, Sendhil Mullainathan, and Manish Raghavan. 2018. "Inherent Trade-Offs in the Fair Determination of Risk Scores." 8 (1): 1–45.

Khuri, A. I., and J. A. Cornell. 1996. Response Surfaces: Designs and Analyses. 2nd ed. New York: Marcel Dekker.

Lagos, Francisco J., and Oleg Prokopyev. 2023. "Multi-Objective Optimization with α-Strong and α-Weak Pareto Optimal Solutions." Operations Research Letters 51 (2): 159–64.

Law, Averill M. 2015. Simulation Modeling and Analysis. 5th ed. McGraw-Hill.

林方全. 2023. "医疗资源调度中的优化算法研究." 中国运筹学会会刊 11 (2): 45–62.

Liu, L. T., M. Sim, and K. T. Teo. 2018. "Distributionally Robust Optimization: A Review." INFORMS Journal on Computing 30 (3): 433–51.

Lovell, C. A. Knox. 1995. "Production Frontiers and Productive Efficiency." In Handbook of Applied Economic Statistics, edited by A. Ullah and D. E. A. Giles, 561–98. New York: Marcel Dekker.

Michael C. Fu, Jianqiang Hu, and Yijie Peng. 2014. "Simulation Optimization: A Review of Algorithms and Applications." Handbook of Simulation Optimization, 1–32.

Michael C. Fu, Yijie Peng, and Rong Qu. 2024. "Simulation Optimization for Stochastic Systems: A Review and New Perspectives." INFORMS Journal on Computing.

Mohsenirad, S. M., and K. Triantis. 2025. "Addressing Heterogeneity in Data Envelopment Analysis: A Survey and New Approaches." European Journal of Operational Research.

Myers, Raymond H. 1999. "Response Surface Methodology: A Retrospective and Literature Survey." Journal of Quality Technology 31 (1): 30–45.

Nagurney, Anna, Paul Thore, and Jianhua Pan. 1994. "A Comparison of Alternative Algorithms for Spatial Price Equilibrium Problems." In New Directions in Computational Economics, edited by William W Cooper and Andrew B Whinston, 199–224. Boston: Kluwer Academic Publishers.

Nelson, Barry L. 2022. "Selecting the Best System." In Handbook of Simulation Optimization, edited by S. Andradóttir, K. J. Healey, B. L. Nelson, and S. Pasupathy, 353–78. Springer International Publishing.

Olbryś, Anna, and Adam Ostrowski. 2021. "Entropy as a Measure of Market Depth in Order-Driven Markets." Entropy 23 (3): 310.

Papaioannou, E., and V. V. Podinovski. 2024. "Free Disposal Hull and Data Envelopment Analysis: A Historical Perspective and New Developments." European Journal of Operational Research.

Sadeghi, S., M. Toloo, and B. K. Sahoo. 2025. "Economies of Scale and Scope in Data Envelopment Analysis: A Critical Review." Omega.

Shen, Z., L. Hong, and B. L. Nelson. 2021. "A Classification-Based Approach to Ranking and Selection." Operations Research 69 (3): 823–40.

Tomlin, John. 2000. "The Price of Anarchy in Supply Chains with Linear Costs." Operations Research 48 (5): 785–95.

Tulabandhula, Theja, Saurabh Sinha, and Sachin Patidar. 2020. "Choice Set Generation for Assortment Optimization Under the Multinomial Logit Model." Management Science 66 (3): 1046–67.

Wilson, James R., and Nikolaos V. Sahinidis. 2017. "Adaptive Sampling for Simulation Optimization with Stochastic Constraints." INFORMS Journal on Computing 29 (2): 255–69.

张鑫航. 2023. "智能决策系统中的预测-优化-分发框架研究." 计算机学报 46 (5): 901–18.