Academic dropout is a concern at all levels of education, and a vast body of research attempts to understand and reduce it. Machine learning models have attracted interest as a means of statistical prediction in service of the latter goal. This research used machine learning models to predict student dropout with data from the Education Longitudinal Study of 2002 (ELS:2002), which comprises over 4,000 variables and 16,000 observations, and compared several models to determine which predicts academic dropout most accurately. The dataset, which is nationally representative and follows students' educational progress from high school into the postsecondary years, contains a wide array of academic and non-academic student characteristics, including academic performance, socioeconomic status, school engagement, and family background. We aimed to use these attributes to develop predictive models that identify students at risk of dropping out of high school. We used a variety of supervised machine learning models suited to classification. The data first underwent feature selection based on the number of missing values in each feature column, so that only high-quality attributes contributed to the prediction. Subsequently, preprocessing methods such as imputation, feature encoding, and oversampling were applied to further format and clean the dataset. Precision, recall, and F1-score were used to assess each model's classification performance. Ultimately, we found that applying XGBoost to the ELS:2002 dataset, preprocessed with data imputation and SMOTE (Synthetic Minority Oversampling Technique) oversampling, yielded the most favorable balance among these metrics.
These results agree with previous studies that confirm XGBoost’s effectiveness in predicting student dropout, and they contribute new findings on the ELS:2002 dataset, which, to our knowledge, has not previously been used for dropout prediction with machine learning methods.
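As a rough illustration of two steps named above, the following dependency-free Python sketch filters feature columns by their fraction of missing values and computes precision, recall, and F1-score for a binary dropout label. It is a minimal sketch under stated assumptions, not the study's actual pipeline: the column names, the toy values, and the 0.5 missingness threshold are all hypothetical, and the imputation, encoding, SMOTE, and XGBoost steps are omitted (in practice they would typically come from libraries such as imbalanced-learn and xgboost).

```python
from typing import Dict, List, Optional, Sequence, Tuple

Column = List[Optional[float]]


def select_features_by_missingness(
    columns: Dict[str, Column], max_missing_frac: float
) -> Dict[str, Column]:
    """Keep columns whose fraction of missing (None) values is at most the threshold."""
    kept: Dict[str, Column] = {}
    for name, values in columns.items():
        if not values:
            continue  # an empty column carries no information
        missing_frac = sum(v is None for v in values) / len(values)
        if missing_frac <= max_missing_frac:
            kept[name] = values
    return kept


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1-score for the positive (dropout) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Hypothetical toy data; these are NOT real ELS:2002 variable names or values.
toy_columns: Dict[str, Column] = {
    "gpa": [3.1, None, 2.4, 3.8],          # 25% missing
    "absences": [None, None, None, 12.0],  # 75% missing
    "ses_index": [0.2, -0.5, 1.1, 0.0],    # 0% missing
}

# Assumed threshold: drop any column with more than half of its values missing.
kept = select_features_by_missingness(toy_columns, max_missing_frac=0.5)

# Hypothetical labels: 1 = dropped out, 0 = did not drop out.
y_true = [1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Reporting all three metrics, rather than accuracy alone, matters when one class is rare: with few dropouts, a classifier that always predicts "no dropout" can score high accuracy while its recall on the dropout class is zero.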
A Machine Learning Approach to Predicting Student Dropouts on the ELS:2002 Study
Category
Student Abstract Submission