734 - Predictive Algorithm for Child Undernutrition: Performance over 10-years
Friday, April 25, 2025
5:30pm – 7:45pm HST
Publication Number: 734.6483
Binoy V. Shah, University of Massachusetts Medical School, Worcester, MA, United States; Andres Colubri, University of Massachusetts Medical School, Worcester, MA, United States; Nisha Fahey, UMass Chan Medical School, Worcester, MA, United States; Somashekhar M. Nimbalkar, Bhaikaka University, Karamsad, Gujarat, India; Apurv Soni, University of Massachusetts Medical School, Worcester, MA, United States
Assistant Professor and Director University of Massachusetts Medical School Worcester, Massachusetts, United States
Background: More than a third of India’s children suffer from some form of undernutrition. Despite a reduction in these indicators over the past decade, India is projected to fall short of the Sustainable Development Goal (SDG) targets set for 2030. The modest improvement has also been uneven, and disparities across socioeconomic classes have widened. A more precise methodology is therefore needed to address this public health crisis in India.
Objective: To independently validate a predictive model, developed by our team from 2015-16 data from India, on new data collected in 2019-2021, and to assess whether machine learning models offer an improvement over a logistic regression model.
Design/Methods: Data from the National Family Health Survey (NFHS-5) were used to evaluate a child undernutrition predictive model (Model A) previously published in the literature using NFHS-4 data. This model uses data available at the time of childbirth to predict the probability of undernutrition in the first five years of life. Model performance was assessed in terms of discrimination and calibration. Potential enhancement of model performance was explored using Lasso feature selection. The selected variables were then used to train a new model (Model B) on the NFHS-5 data with logistic regression, XGBoost, and neural networks.
Results: Among the 134,670 children included in the analysis, approximately half (68,536, or 51%) were classified as undernourished based on the Composite Index of Anthropometric Failure (CIAF) definition. Model A, trained on NFHS-4 data, demonstrated moderate discrimination with an AUC of 65.5%. Despite an accuracy of 57.85% and high sensitivity (90.44%), the model’s low specificity (22.81%) resulted in a substantial number of false positives and an overestimation of risk (Table 1). For Model B, predictor variables were selected using the Lasso method, which retained all predictors identified in the previous model (based on clinical evidence) and introduced additional predictors. The new model, trained on NFHS-5 data, achieved a similar AUC of 68.0%, indicating a moderate level of discrimination. However, it showed improved overall accuracy (62.97%) with a more balanced trade-off between sensitivity (65.64%) and specificity (60.20%), leading to fewer false positives (Table 2). Figure 1 shows the performance of both models and illustrates the efficiency of traditional logistic regression models.
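The reported accuracy, sensitivity, and specificity are linked through prevalence. The sketch below is illustrative only and is not the study's analysis code: the function names are hypothetical, and it assumes the metrics were computed on a sample with the full-cohort CIAF prevalence (68,536 of 134,670), which the abstract does not confirm.

```python
def accuracy_from_rates(sensitivity, specificity, prevalence):
    """Overall accuracy implied by sensitivity, specificity, and prevalence."""
    return prevalence * sensitivity + (1.0 - prevalence) * specificity


def false_positive_rate(specificity):
    """Share of truly well-nourished children flagged as at risk."""
    return 1.0 - specificity


# Assumption: prevalence equals the cohort-level CIAF prevalence in the abstract.
prevalence = 68536 / 134670

# Model B figures as reported in the abstract.
model_b_accuracy = accuracy_from_rates(0.6564, 0.6020, prevalence)
print(f"Model B implied accuracy: {model_b_accuracy:.4f}")

# False-positive rates implied by the reported specificities.
print(f"Model A false-positive rate: {false_positive_rate(0.2281):.4f}")
print(f"Model B false-positive rate: {false_positive_rate(0.6020):.4f}")
```

Under that assumption, Model B's sensitivity and specificity imply an accuracy of about 63%, in line with the reported 62.97%. The drop in false-positive rate follows directly from the higher specificity.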
Conclusion(s): The updated predictive model, trained and tested on NFHS-5 data, shows only slightly improved accuracy but a more balanced trade-off between sensitivity and specificity, reducing false positives compared with the older logistic regression model.
Table 1 - Model A Hosmer-Lemeshow goodness of fit table for distribution of observed CIAF prevalence vs predicted prevalence across decile groups
Table 2 - Model B Hosmer-Lemeshow goodness of fit table for distribution of observed CIAF prevalence vs predicted prevalence across decile groups
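Tables 1 and 2 compare observed and predicted CIAF prevalence across decile groups. A minimal sketch of how such a table and the Hosmer-Lemeshow statistic might be computed is given below, assuming equal-size groups ordered by predicted risk. The function names and the toy data are hypothetical and do not come from the study.

```python
def decile_calibration_table(y_true, y_prob, n_groups=10):
    """Rows of (group, n, observed prevalence, mean predicted risk).

    Children are sorted by predicted risk and split into n_groups
    bins of (nearly) equal size.
    """
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    rows = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue  # fewer observations than groups
        observed = sum(y for _, y in chunk) / len(chunk)
        predicted = sum(p for p, _ in chunk) / len(chunk)
        rows.append((g + 1, len(chunk), observed, predicted))
    return rows


def hosmer_lemeshow_statistic(rows):
    """Sum over groups of n * (observed - predicted)^2 / (predicted * (1 - predicted))."""
    total = 0.0
    for _, n, observed, predicted in rows:
        if 0.0 < predicted < 1.0:  # skip degenerate groups to avoid division by zero
            total += n * (observed - predicted) ** 2 / (predicted * (1.0 - predicted))
    return total


# Toy data (hypothetical): two risk strata with well-calibrated predictions.
toy_prob = [0.1] * 10 + [0.9] * 10
toy_true = [0] * 9 + [1] + [1] * 9 + [0]
for row in decile_calibration_table(toy_true, toy_prob, n_groups=2):
    print(row)
```

A large gap between the observed and predicted columns in the upper groups would correspond to the overestimation of risk described for Model A.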
Figure 1 - ROC curves for model evaluation using NFHS-5 data
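The AUC values summarized by the ROC curves can be read as the probability that a randomly chosen undernourished child receives a higher predicted risk than a randomly chosen well-nourished child. The sketch below computes AUC through that rank-based (Mann-Whitney) identity. It is an illustration with a hypothetical function name and toy inputs, not the procedure used in the study.

```python
def roc_auc(y_true, y_score):
    """AUC via average ranks; tied scores share the mean of their ranks.

    Assumes y_true contains only 0/1 labels and at least one of each.
    """
    n = len(y_score)
    order = sorted(range(n), key=lambda i: y_score[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < n and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = n - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy example (hypothetical labels and scores).
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why values of 65.5% and 68.0% are described as moderate discrimination.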