789 - Combining NHANES Data and Machine Learning to Develop an Online Screener for Youth Prediabetes and Diabetes
Monday, April 28, 2025
7:00am – 9:15am HST
Publication Number: 789.4514
Yan Chak Li, Icahn School of Medicine at Mount Sinai, Staten Island, NY, United States; Catherine McDonough, Icahn School of Medicine at Mount Sinai, New York, NY, United States; Bian Liu, Icahn School of Medicine at Mount Sinai, New York, NY, United States; Gaurav Pandey, Icahn School of Medicine at Mount Sinai, New York, NY, United States; Nita Vangeepuram, Icahn School of Medicine at Mount Sinai, New York, NY, United States
Associate Professor, Icahn School of Medicine at Mount Sinai, New York, New York, United States
Background: The prevalence of prediabetes and diabetes (preDM/DM) among youth is increasing rapidly in the United States. Tools for early screening and diagnosis of this complex disorder are crucial, but existing screening guidelines and tools are geared only towards health providers, not the general public. Among adults, easily usable questionnaire-based screeners have greatly expanded diabetes awareness and screening. However, no similar tools exist for youth.
Objective: This study aimed to address this gap by developing a publicly accessible online screener for youth preDM/DM risk using machine learning methods and the rich data collected in the National Health and Nutrition Examination Survey (NHANES).
Design/Methods: This process leveraged a curated multi-domain NHANES dataset (1999-2018) including 95 potential preDM/DM risk factors for over 15,000 youth aged 12-19 years. Figure 1 provides an overview of the process we used to build our online youth preDM/DM screener. Specifically, we applied a systematic machine learning approach consisting of feature selection and classification methods to 80% of this dataset to identify the top variables that were effective at predicting youth preDM/DM status. The final model based on these variables was evaluated on the remaining 20% of the data. We also compared the predictive performance of our screener with that of screening based on four preDM/DM risk factors identified in the current American Diabetes Association (ADA) guidelines for children and adolescents.
Results: Our machine learning-based model, comprising 14 preDM/DM risk-related variables (Table 1, bold), outperformed the adapted ADA guideline in identifying preDM/DM cases (Table 2). The model was transformed into a publicly available online 15-question screener (https://rstudio-connect.hpc.mssm.edu/POND/#section-screener) to assess preDM/DM risk.
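As an illustration of the 80%/20% design described in Design/Methods, the following is a minimal sketch only, not the authors' pipeline. It assumes a stratified split and swaps in a toy univariate filter (absolute difference in class means) for the unspecified feature selection methods. The function names, the synthetic data, and the feature names are all hypothetical.

```python
import random
from statistics import mean


def stratified_split(labels, train_frac=0.8, seed=0):
    """Return (train_idx, test_idx), keeping the class ratio in both parts."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return sorted(train_idx), sorted(test_idx)


def rank_features(rows, labels, names):
    """Rank features by |mean(cases) - mean(non-cases)|.

    A toy stand-in for feature selection; it assumes features are on
    comparable scales and that both classes are present.
    """
    scores = {}
    for j, name in enumerate(names):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores[name] = abs(mean(pos) - mean(neg))
    return sorted(names, key=lambda n: scores[n], reverse=True)


# Synthetic stand-in data (NOT NHANES): 200 records, 25% labelled as cases.
rng = random.Random(42)
names = ["informative", "noise_a", "noise_b"]  # hypothetical feature names
labels = [1 if i % 4 == 0 else 0 for i in range(200)]
rows = [[y + rng.gauss(0, 0.5), rng.gauss(0, 1), rng.gauss(0, 1)] for y in labels]

# Select variables on the 80% portion only; the 20% portion stays untouched
# until the final model is evaluated.
train_idx, test_idx = stratified_split(labels)
ranking = rank_features(
    [rows[i] for i in train_idx], [labels[i] for i in train_idx], names
)
print(len(train_idx), len(test_idx), ranking)
```

Ranking variables on the training portion alone keeps the held-out 20% free of selection bias, which is the point of evaluating the final model on data that played no part in choosing its inputs.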
Conclusion(s): This study demonstrated the potential of machine learning-based approaches for developing effective screeners for youth preDM/DM risk. The online screener we developed addresses the previous lack of easily usable youth preDM/DM risk assessment tools and includes risk factors not considered in current screening guidelines (e.g., social determinants of diabetes risk, screen time, and self-reported perceived health status). Our work can help raise awareness about diabetes risk and prevention strategies among youth.
Figure 1. Overview of the process for building our online youth preDM/DM screener.
Table 1: Questions included in the screener and related screener variables identified from NHANES (in bold) using machine learning.
Table 2: Performance measures of the final machine learning-based screener and modified ADA guidelines on 100 bootstrapped versions of the validation dataset.
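The bootstrap evaluation named in the Table 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not list which performance measures were reported, so sensitivity and specificity are used here as examples, and the labels and predictions are made-up toy values rather than study results. The function names are hypothetical.

```python
import random
from statistics import mean


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def bootstrap_metrics(y_true, y_pred, n_boot=100, seed=0):
    """Resample the validation set with replacement n_boot times."""
    rng = random.Random(seed)
    n = len(y_true)
    sens_list, spec_list = [], []
    while len(sens_list) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        truth = [y_true[i] for i in idx]
        if len(set(truth)) < 2:
            continue  # skip resamples missing a class (metrics undefined)
        sens, spec = sensitivity_specificity(truth, [y_pred[i] for i in idx])
        sens_list.append(sens)
        spec_list.append(spec)
    return sens_list, spec_list


# Toy validation set: 20 cases (16 flagged), 40 non-cases (10 flagged).
y_true = [1] * 20 + [0] * 40
y_pred = [1] * 16 + [0] * 4 + [0] * 30 + [1] * 10

sens_list, spec_list = bootstrap_metrics(y_true, y_pred, n_boot=100)
print(f"mean sensitivity over bootstraps: {mean(sens_list):.3f}")
print(f"mean specificity over bootstraps: {mean(spec_list):.3f}")
```

Running the same resampling indices through two screeners (for example, a model-based one and a guideline-based one) would give paired distributions that can be compared directly.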