Machine Learning Models for Optimizing Online Order Fulfillment - Forecasting Lead Time and Late Delivery Risk

Document Type: Research Paper

Authors

1 Lecturer, Department of Mechatronics & Industrial Engineering, Chittagong University of Engineering & Technology

2 Department of Mechatronics & Industrial Engineering, Chittagong University of Engineering & Technology, Bangladesh.

3 Assistant Professor, Department of Mechatronics & Industrial Engineering, Chittagong University of Engineering & Technology

4 Department of Mechatronics & Industrial Engineering, Chittagong University of Engineering & Technology

Abstract

Objective: This study investigates the use of machine learning models to predict delivery performance in the e-commerce industry, focusing on two tasks: forecasting delivery lead time and anticipating late delivery risk.
Methods: The DataCo Smart Supply Chain dataset, which contains a variety of order fulfillment attributes, was used to train and evaluate several models, including Linear Regression, Decision Tree, Random Forest, and XGBoost.
Results: XGBoost outperformed the competing models in both the regression and the classification task. In forecasting delivery lead time, the model achieved an R-squared value of 0.70 and a root mean square error (RMSE) of 0.88 days. The classification of late delivery risk achieved an accuracy of 0.89, precision of 0.92, recall of 0.89, and an F1-score of 0.90. Feature importance analysis identified the chosen shipping method as the strongest predictor of both delivery time and the likelihood of late delivery. The next most important features were order status and latitude for late delivery risk, and latitude together with cycle time features for delivery time.
Conclusion: These findings underscore the potential of machine learning to improve delivery performance predictions in e-commerce, enabling companies to set realistic delivery expectations, optimize logistics operations, and proactively mitigate the risk of late deliveries. This research contributes to data-driven supply chain management and highlights the importance of accurate delivery predictions in the competitive online retail landscape.
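
The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names and the toy lead times and late-delivery labels below are hypothetical, chosen only to show how RMSE, R-squared, accuracy, precision, recall, and F1-score are computed from predictions.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error, in the same unit as the target (here: days)."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary label (1 = late)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical toy data (not taken from the DataCo dataset).
actual_days = [2, 3, 4, 5, 6]
predicted_days = [2, 4, 4, 4, 6]
actual_late = [1, 1, 1, 0, 0, 1, 0, 1]
predicted_late = [1, 1, 0, 0, 0, 1, 1, 1]

print("RMSE (days):", round(rmse(actual_days, predicted_days), 3))
print("R-squared:", round(r_squared(actual_days, predicted_days), 3))
print(classification_metrics(actual_late, predicted_late))
```

As a consistency check on the reported figures, the F1-score is the harmonic mean of precision and recall: 2 × 0.92 × 0.89 / (0.92 + 0.89) ≈ 0.90, which matches the value stated in the abstract.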
