
Printed from https://ideas.repec.org/a/eee/ejores/v324y2025i1p104-117.html

Deep Controlled Learning for Inventory Control

Author

Listed:
  • Temizöz, Tarkan
  • Imdahl, Christina
  • Dijkman, Remco
  • Lamghari-Idrissi, Douniel
  • van Jaarsveld, Willem
Abstract
The application of Deep Reinforcement Learning (DRL) to inventory management is an emerging field. However, traditional DRL algorithms, originally developed for diverse domains such as game-playing and robotics, may not be well-suited for the specific challenges posed by inventory management. Consequently, these algorithms often fail to outperform established heuristics; for instance, no existing DRL approach consistently surpasses the capped base-stock policy in lost sales inventory control. This highlights a critical gap in the practical application of DRL to inventory management: the highly stochastic nature of inventory problems requires tailored solutions. In response, we propose Deep Controlled Learning (DCL), a new DRL algorithm designed for highly stochastic problems. DCL is based on approximate policy iteration and incorporates an efficient simulation mechanism, combining Sequential Halving with Common Random Numbers. Our numerical studies demonstrate that DCL consistently outperforms state-of-the-art heuristics and DRL algorithms across various inventory settings, including lost sales, perishable inventory systems, and inventory systems with random lead times. DCL achieves lower average costs in all test cases while maintaining an optimality gap of no more than 0.2%. Remarkably, this performance is achieved using the same hyperparameter set across all experiments, underscoring the robustness and generalizability of our approach. These findings contribute to the ongoing exploration of tailored DRL algorithms for inventory management, providing a foundation for further research and practical application in this area.
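The abstract describes DCL's simulation mechanism as a combination of Sequential Halving with Common Random Numbers (CRN). As a hedged illustration only (not the authors' implementation), the sketch below shows how these two standard techniques compose: Sequential Halving splits a fixed simulation budget across elimination rounds, while CRN evaluates every surviving candidate on the same random seeds so that cost differences reflect the candidates rather than the noise. The cost function, budget, and seed scheme are illustrative assumptions.

```python
import math
import random

def sequential_halving(candidates, simulate, budget, seed=0):
    """Pick the lowest-cost candidate with Sequential Halving + CRN.

    `simulate(candidate, rng)` returns one sampled cost (lower is better).
    In each round, all survivors are evaluated on the SAME seed list
    (common random numbers), then the worse half is eliminated.
    """
    survivors = list(candidates)
    rounds = max(1, math.ceil(math.log2(len(survivors))))
    per_round = budget // rounds  # simulation budget per elimination round
    for r in range(rounds):
        if len(survivors) == 1:
            break
        n = max(1, per_round // len(survivors))  # sims per candidate this round
        # Same seeds for every survivor in this round -> common random numbers;
        # fresh seeds across rounds so eliminations use independent samples.
        seeds = [seed + r * 10_000 + i for i in range(n)]
        means = [
            sum(simulate(c, random.Random(s)) for s in seeds) / n
            for c in survivors
        ]
        # Keep the better half (lowest mean sampled cost).
        order = sorted(range(len(survivors)), key=lambda i: means[i])
        survivors = [survivors[i] for i in order[: max(1, len(survivors) // 2)]]
    return survivors[0]

def sim_cost(level, rng):
    # Hypothetical one-period newsvendor-style cost: holding vs. lost sales.
    demand = rng.gauss(10, 3)
    return max(level - demand, 0) * 1.0 + max(demand - level, 0) * 5.0

best = sequential_halving(range(5, 21), sim_cost, budget=4000)
```

Because each candidate sees identical demand realizations within a round, the comparison is a paired one, which sharply reduces the variance of the cost differences that drive elimination; this is why the combination suits highly stochastic problems like the inventory settings discussed above.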

Suggested Citation

  • Temizöz, Tarkan & Imdahl, Christina & Dijkman, Remco & Lamghari-Idrissi, Douniel & van Jaarsveld, Willem, 2025. "Deep Controlled Learning for Inventory Control," European Journal of Operational Research, Elsevier, vol. 324(1), pages 104-117.
  • Handle: RePEc:eee:ejores:v:324:y:2025:i:1:p:104-117
    DOI: 10.1016/j.ejor.2025.01.026

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0377221725000463
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.ejor.2025.01.026?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Alexander L. Stolyar & Qiong Wang, 2022. "Exploiting Random Lead Times for Significant Inventory Cost Savings," Operations Research, INFORMS, vol. 70(4), pages 2496-2516, July.
    2. Gérard P. Cachon & Karan Girotra & Serguei Netessine, 2020. "Interesting, Important, and Impactful Operations Management," Manufacturing & Service Operations Management, INFORMS, vol. 22(1), pages 214-222, January.
    3. De Moor, Bram J. & Gijsbrechts, Joren & Boute, Robert N., 2022. "Reward shaping to improve the performance of deep reinforcement learning in perishable inventory management," European Journal of Operational Research, Elsevier, vol. 301(2), pages 535-545.
    4. Itir Z. Karaesmen & Alan Scheller–Wolf & Borga Deniz, 2011. "Managing Perishable and Aging Inventories: Review and Future Research Directions," International Series in Operations Research & Management Science, in: Karl G. Kempf & Pınar Keskinocak & Reha Uzsoy (ed.), Planning Production and Inventories in the Extended Enterprise, chapter 0, pages 393-436, Springer.
    5. Xueying Ding & Xiao Liao & Wei Cui & Xiangliang Meng & Ruosong Liu & Qingshan Ye & Donghe Li, 2024. "A Deep Reinforcement Learning Optimization Method Considering Network Node Failures," Energies, MDPI, vol. 17(17), pages 1-13, September.
    6. Linwei Xin, 2021. "Technical Note—Understanding the Performance of Capped Base-Stock Policies in Lost-Sales Inventory Models," Operations Research, INFORMS, vol. 69(1), pages 61-70, January.
    7. Wang, Yihua & Minner, Stefan, 2024. "Deep reinforcement learning for demand fulfillment in online retail," International Journal of Production Economics, Elsevier, vol. 269(C).
    8. Jinzhi Bu & Xiting Gong & Xiuli Chao, 2023. "Asymptotic Optimality of Base-Stock Policies for Perishable Inventory Systems," Management Science, INFORMS, vol. 69(2), pages 846-864, February.
    9. Linwei Xin & David A. Goldberg, 2016. "Optimality Gap of Constant-Order Policies Decays Exponentially in the Lead Time for Lost Sales Models," Operations Research, INFORMS, vol. 64(6), pages 1556-1565, December.
    10. Dehaybe, Henri & Catanzaro, Daniele & Chevalier, Philippe, 2024. "Deep Reinforcement Learning for inventory optimization with non-stationary uncertain demand," European Journal of Operational Research, Elsevier, vol. 314(2), pages 433-445.
    11. Haijema, René & Minner, Stefan, 2019. "Improved ordering of perishables: The value of stock-age information," International Journal of Production Economics, Elsevier, vol. 209(C), pages 316-324.
    12. Xiuli Chao & Xiting Gong & Cong Shi & Chaolin Yang & Huanan Zhang & Sean X. Zhou, 2018. "Approximation Algorithms for Capacitated Perishable Inventory Systems with Positive Lead Times," Management Science, INFORMS, vol. 64(11), pages 5038-5061, November.
    13. James R. Bradley & Lawrence W. Robinson, 2005. "Improved Base-Stock Approximations for Independent Stochastic Lead Times with Order Crossover," Manufacturing & Service Operations Management, INFORMS, vol. 7(4), pages 319-329, November.
    14. Boute, Robert N. & Gijsbrechts, Joren & van Jaarsveld, Willem & Vanvuchelen, Nathalie, 2022. "Deep reinforcement learning for inventory control: A roadmap," European Journal of Operational Research, Elsevier, vol. 298(2), pages 401-412.
    15. Wei Chen & Milind Dawande & Ganesh Janakiraman, 2014. "Fixed-Dimensional Stochastic Dynamic Programs: An Approximation Scheme and an Inventory Application," Operations Research, INFORMS, vol. 62(1), pages 81-103, February.
    16. Volodymyr Mnih & Koray Kavukcuoglu & David Silver & Andrei A. Rusu & Joel Veness & Marc G. Bellemare & Alex Graves & Martin Riedmiller & Andreas K. Fidjeland & Georg Ostrovski & Stig Petersen & Charle, 2015. "Human-level control through deep reinforcement learning," Nature, Nature, vol. 518(7540), pages 529-533, February.
    17. Kaynov, Illya & van Knippenberg, Marijn & Menkovski, Vlado & van Breemen, Albert & van Jaarsveld, Willem, 2024. "Deep Reinforcement Learning for One-Warehouse Multi-Retailer inventory management," International Journal of Production Economics, Elsevier, vol. 267(C).
    18. Disney, Stephen M. & Maltz, Arnold & Wang, Xun & Warburton, Roger D.H., 2016. "Inventory management for stochastic lead times with order crossovers," European Journal of Operational Research, Elsevier, vol. 248(2), pages 473-486.
    19. Thomas E. Morton, 1971. "The Near-Myopic Nature of the Lagged-Proportional-Cost Inventory Problem with Lost Sales," Operations Research, INFORMS, vol. 19(7), pages 1708-1716, December.
    20. Steven Nahmias, 1975. "Optimal Ordering Policies for Perishable Inventory—II," Operations Research, INFORMS, vol. 23(4), pages 735-749, August.
    21. Woonghee Tim Huh & Ganesh Janakiraman & John A. Muckstadt & Paat Rusmevichientong, 2009. "Asymptotic Optimality of Order-Up-To Policies in Lost Sales Inventory Systems," Management Science, INFORMS, vol. 55(3), pages 404-420, March.
    22. Morris A. Cohen & Dov Pekelman, 1978. "LIFO Inventory Systems," Management Science, INFORMS, vol. 24(11), pages 1150-1162, July.
    23. Powell, Warren B., 2019. "A unified framework for stochastic optimization," European Journal of Operational Research, Elsevier, vol. 275(3), pages 795-821.
    24. Marcus Ang & Karl Sigman & Jing-Sheng Song & Hanqin Zhang, 2017. "Closed-Form Approximations for Optimal ( r , q ) and ( S , T ) Policies in a Parallel Processing Environment," Operations Research, INFORMS, vol. 65(5), pages 1414-1428, October.
    25. Lotte van Hezewijk & Nico Dellaert & Tom Van Woensel & Noud Gademann, 2023. "Using the proximal policy optimisation algorithm for solving the stochastic capacitated lot sizing problem," International Journal of Production Research, Taylor & Francis Journals, vol. 61(6), pages 1955-1978, March.
    26. Joren Gijsbrechts & Robert N. Boute & Jan A. Van Mieghem & Dennis J. Zhang, 2022. "Can Deep Reinforcement Learning Improve Inventory Management? Performance on Lost Sales, Dual-Sourcing, and Multi-Echelon Problems," Manufacturing & Service Operations Management, INFORMS, vol. 24(3), pages 1349-1368, May.
    27. Paul Zipkin, 2008. "Old and New Methods for Lost-Sales Inventory Systems," Operations Research, INFORMS, vol. 56(5), pages 1256-1263, October.
    28. Kumar Muthuraman & Sridhar Seshadri & Qi Wu, 2015. "Inventory Management with Stochastic Lead Times," Mathematics of Operations Research, INFORMS, vol. 40(2), pages 302-327, February.
    29. David A. Goldberg & Dmitriy A. Katz-Rogozhnikov & Yingdong Lu & Mayank Sharma & Mark S. Squillante, 2016. "Asymptotic Optimality of Constant-Order Policies for Lost Sales Inventory Models with Large Lead Times," Mathematics of Operations Research, INFORMS, vol. 41(3), pages 898-913, August.
    30. Francesco Stranieri & Fabio Stella & Chaaben Kouki, 2024. "Performance of deep reinforcement learning algorithms in two-echelon inventory control systems," International Journal of Production Research, Taylor & Francis Journals, vol. 62(17), pages 6211-6226, September.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wang, Zihao & Wang, Wenlong & Liu, Tianjun & Chang, Jasmine & Shi, Jim, 2025. "IoT-driven dynamic replenishment of fresh produce in the presence of seasonal variations: A deep reinforcement learning approach using reward shaping," Omega, Elsevier, vol. 134(C).
    2. Jinzhi Bu & Xiting Gong & Xiuli Chao, 2023. "Asymptotic Optimality of Base-Stock Policies for Perishable Inventory Systems," Management Science, INFORMS, vol. 69(2), pages 846-864, February.
    3. Li, Zhaolin (Erick) & Liang, Guitian & Fu, Qi (Grace) & Teo, Chung-Piaw, 2023. "Base-Stock Policies with Constant Lead Time: Closed-Form Solutions and Applications," Working Papers BAWP-2023-01, University of Sydney Business School, Discipline of Business Analytics.
    4. Ding, Jingying & Peng, Zhenkang, 2024. "Heuristics for perishable inventory systems under mixture issuance policies," Omega, Elsevier, vol. 126(C).
    5. Hailun Zhang & Jiheng Zhang & Rachel Q. Zhang, 2020. "Simple Policies with Provable Bounds for Managing Perishable Inventory," Production and Operations Management, Production and Operations Management Society, vol. 29(11), pages 2637-2650, November.
    6. Jake Clarkson & Michael A. Voelkel & Anna‐Lena Sachs & Ulrich W. Thonemann, 2023. "The periodic review model with independent age‐dependent lifetimes," Production and Operations Management, Production and Operations Management Society, vol. 32(3), pages 813-828, March.
    7. van Hezewijk, Lotte & Dellaert, Nico P. & van Jaarsveld, Willem L., 2025. "Scalable deep reinforcement learning in the non-stationary capacitated lot sizing problem," International Journal of Production Economics, Elsevier, vol. 284(C).
    8. De Moor, Bram J. & Gijsbrechts, Joren & Boute, Robert N., 2022. "Reward shaping to improve the performance of deep reinforcement learning in perishable inventory management," European Journal of Operational Research, Elsevier, vol. 301(2), pages 535-545.
    9. Fleuren, Tijn, 2025. "Stochastic approaches for production-inventory planning : Applications to high-tech supply chains," Other publications TiSEM 1fe1bbe5-fd90-4077-8606-d, Tilburg University, School of Economics and Management.
    10. Jinzhi Bu & Xiting Gong & Dacheng Yao, 2019. "Technical Note—Constant-Order Policies for Lost-Sales Inventory Models with Random Supply Functions: Asymptotics and Heuristic," Operations Research, INFORMS, vol. 68(4), pages 1063-1073, July.
    11. Pahr, Alexander & Grunow, Martin & Amorim, Pedro, 2025. "Learning from the aggregated optimum: Managing port wine inventory in the face of climate risks," European Journal of Operational Research, Elsevier, vol. 323(2), pages 671-685.
    12. Ralfs, Jana & Pham, Dai T. & Kiesmüller, Gudrun P., 2025. "Optimal outbound shipment policy for an inventory system with advance demand information," European Journal of Operational Research, Elsevier, vol. 324(1), pages 92-103.
    13. Cui, Geng & Imura, Naoto & Nishinari, Katsuhiro & Ezaki, Takahiro, 2025. "On order smoothing interpolating the order-up-to and constant order policies," Omega, Elsevier, vol. 136(C).
    14. Linwei Xin, 2021. "Technical Note—Understanding the Performance of Capped Base-Stock Policies in Lost-Sales Inventory Models," Operations Research, INFORMS, vol. 69(1), pages 61-70, January.
    15. Yanyi Xu & Sang-Phil Kim & Arnab Bisi & Maqbool Dada & Suresh Chand, 2018. "Base-Stock Models for Lost Sales: A Markovian Approach," Purdue University Economics Working Papers 1305, Purdue University, Department of Economics.
    16. Hansen, Ole & Transchel, Sandra & Friedrich, Hanno, 2023. "Replenishment strategies for lost sales inventory systems of perishables under demand and lead time uncertainty," European Journal of Operational Research, Elsevier, vol. 308(2), pages 661-675.
    17. Linwei Xin & David A. Goldberg, 2016. "Optimality Gap of Constant-Order Policies Decays Exponentially in the Lead Time for Lost Sales Models," Operations Research, INFORMS, vol. 64(6), pages 1556-1565, December.
    18. Xiuli Chao & Xiting Gong & Cong Shi & Chaolin Yang & Huanan Zhang & Sean X. Zhou, 2018. "Approximation Algorithms for Capacitated Perishable Inventory Systems with Positive Lead Times," Management Science, INFORMS, vol. 64(11), pages 5038-5061, November.
    19. Verleijsdonk, Peter & van Jaarsveld, Willem & Kapodistria, Stella, 2024. "Scalable policies for the dynamic traveling multi-maintainer problem with alerts," European Journal of Operational Research, Elsevier, vol. 319(1), pages 121-134.
    20. Shouchang Chen & Yanzhi Li & Yi Yang & Weihua Zhou, 2021. "Managing Perishable Inventory Systems with Age‐differentiated Demand," Production and Operations Management, Production and Operations Management Society, vol. 30(10), pages 3784-3799, October.

    More about this item



    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:ejores:v:324:y:2025:i:1:p:104-117. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu (email available below). General contact details of provider: http://www.elsevier.com/locate/eor .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.