Article

Addressing Class Imbalance in Predicting Student Academic Outcomes: A Comparative Study of Resampling Techniques with Machine Learning Classifiers in Higher Education 

Authors
  • Tri Rochmadi
  • Anantian Mahendra Tirta Saputra

Abstract

The increasing availability of educational data presents a significant opportunity for higher education institutions to proactively identify and support students at risk of academic failure or dropout. However, datasets in this domain are often characterized by a severe class imbalance, where successful students vastly outnumber those who drop out or struggle, posing a substantial challenge for standard predictive modeling techniques. This study addresses this issue by conducting a comprehensive, comparative analysis of machine learning classifiers and data resampling techniques to accurately predict student academic outcomes—categorized as Graduate, Dropout, or Enrolled. Using a dataset of 4,424 undergraduate students from a Portuguese institution, we evaluate six distinct classifiers, including Logistic Regression, Random Forest, and Support Vector Machines. The models are first trained on the original, imbalanced data to establish a performance baseline, which reveals a significant weakness in identifying the minority 'Enrolled' class. Subsequently, we implement a suite of oversampling, undersampling, and hybrid resampling techniques, such as SMOTE, ADASYN, and RandomUnderSampler, to balance the training data. The results demonstrate that data resampling, particularly oversampling, provides a significant performance improvement across all models. The combination of a Random Forest classifier with the ADASYN technique emerged as the most effective approach, achieving the highest macro-averaged F1-score of 0.7081. Crucially, this method substantially improved the model's ability to correctly classify the underrepresented 'Enrolled' students. This research validates a robust methodology for handling imbalanced data in educational analytics and underscores the necessity of such techniques for building fair and effective early-warning systems. 
The findings provide a clear pathway for institutions to leverage AI for more equitable and targeted student support, ultimately fostering higher retention and success rates.
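To illustrate why the abstract reports a macro-averaged F1-score rather than plain accuracy, the following is a minimal sketch in plain Python. It is not the authors' code: the function names and the ten toy labels are hypothetical, invented only to show the arithmetic. Macro F1 computes an F1-score per class and averages them with equal weight, so a class the model never predicts correctly (here 'Enrolled') pulls the score down even when overall accuracy looks acceptable.

```python
def per_class_f1(y_true, y_pred):
    """Return {label: F1}, computed one-vs-rest for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels (NOT from the study): a classifier that handles
# 'Graduate' and 'Dropout' well but labels every 'Enrolled' student 'Graduate'.
y_true = ["Graduate"] * 6 + ["Dropout"] * 2 + ["Enrolled"] * 2
y_pred = ["Graduate"] * 6 + ["Dropout"] * 2 + ["Graduate"] * 2

if __name__ == "__main__":
    print("accuracy:", round(accuracy(y_true, y_pred), 4))
    print("per-class F1:", per_class_f1(y_true, y_pred))
    print("macro F1:", round(macro_f1(y_true, y_pred), 4))
```

On this toy data the accuracy is 0.80 while the macro F1 is about 0.62, because the 'Enrolled' class contributes an F1 of 0 to the average. This is the kind of gap that resampling the training data is meant to narrow.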

Keywords: Class Imbalance, Educational Data Mining, Learning Analytics, Predictive Modelling, Student Registration

How to Cite:

Rochmadi, T. & Saputra, A. (2025) “Addressing Class Imbalance in Predicting Student Academic Outcomes: A Comparative Study of Resampling Techniques with Machine Learning Classifiers in Higher Education”, Artificial Intelligence in Learning 1(3), 211-227. doi: https://doi.org/10.63913/ail.v1i3.32

Published on
2025-09-23