Artificial Intelligence in Learning

Enhancing Open Access to Data Science Education: Analyzing Skill Patterns Using LDA and K-Means Clustering in the Learning Path Index Dataset

2025-05-29T15:08:47+07:00

This study examines the application of Latent Dirichlet Allocation (LDA) and K-Means clustering techniques to analyze the Learning Path Index Dataset, with the aim of identifying and categorizing data science education skills. By employing these machine learning models, the research reveals distinct skill patterns and clusters that characterize the dataset, highlighting prevalent skills and potential gaps in data science education accessible through open educational resources (OER). The findings demonstrate specific clusters of beginner to advanced data science topics, offering insights into the accessibility and distribution of educational content. These results can guide educators and platform developers in enhancing the structure and delivery of data science education, thereby improving learner outcomes and resource allocation. The study also discusses the broader implications for educational strategy and policy, emphasizing the role of targeted analytics in optimizing educational offerings in an increasingly digital landscape. Future research directions include expanding the dataset and applying similar analytical frameworks to other fields within open education to further validate and refine these findings.
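One plausible way to combine the two techniques named in this abstract is to fit LDA on bag-of-words counts and then run K-Means on the resulting document-topic proportions. The sketch below assumes scikit-learn is installed. The six skill descriptions, the choice of three topics, and the choice of two clusters are hypothetical illustrations, not values from the study or from the Learning Path Index Dataset.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical course descriptions (not taken from the study's dataset).
docs = [
    "python programming basics variables loops functions",
    "introduction to python data types and functions",
    "linear regression statistics hypothesis testing",
    "probability statistics distributions and inference",
    "deep learning neural networks backpropagation",
    "convolutional neural networks for image classification",
]

# Step 1: turn text into word counts, the input LDA expects.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)

# Step 2: fit LDA; each row of doc_topics is one document's topic mixture.
lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topics = lda.fit_transform(counts)

# Step 3: cluster documents by their topic mixtures (assumed pipeline).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(doc_topics)

for text, label in zip(docs, labels):
    print(label, text)
```

Clustering on topic proportions rather than raw word counts keeps the K-Means input low-dimensional, which is one reason the two methods are often paired; whether the study paired them in this way is an assumption.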

Predicting University Rankings Using Random Forest Regression on Institutional Metrics: A Data Mining Approach for Enhancing Higher Education Decision-Making

2025-05-29T15:07:32+07:00

This study investigates the prediction of university rankings using Random Forest regression, leveraging institutional metrics as input features. The primary objective is to enhance the decision-making process in higher education by providing a data-driven model capable of forecasting rankings with greater transparency and accuracy. The research utilizes a comprehensive dataset containing institutional metrics such as research quality, teaching effectiveness, international outlook, and industry impact. Random Forest regression is chosen for its robustness, handling both linear and non-linear relationships between features and the target ranking variable. Feature selection techniques, including correlation analysis and dimensionality reduction, are applied to identify key metrics that influence rankings. Through rigorous model training and hyperparameter tuning, a tuned Random Forest model is developed that shows strong predictive accuracy. Evaluation metrics such as Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and R² are used to assess model performance. The feature importance analysis reveals that research quality and research environment have the highest impact on university rankings, followed by teaching and international outlook. These findings align with common assumptions in higher education rankings, while also revealing the potential of less-studied metrics, such as industry impact and international student population, to influence rankings. This study contributes to the field of open education by presenting a transparent and accessible method for predicting university rankings. It empowers students, administrators, and policymakers with a data-driven approach to assess institutional performance. The research also highlights the limitations of current ranking systems and suggests avenues for future studies, including the use of multi-year datasets and alternative machine learning models.
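The three evaluation metrics named in this abstract can be computed by hand, which makes their definitions concrete. The following is a minimal sketch using only the Python standard library; the four true and predicted values are toy numbers chosen for illustration and are not results from the study.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for two equal-length sequences of numbers."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2

# Toy values (hypothetical ranking scores, not from the study).
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 37.0]

mae, rmse, r2 = regression_metrics(y_true, y_pred)
print(f"MAE={mae:.3f} RMSE={rmse:.3f} R2={r2:.3f}")
```

RMSE penalizes large errors more heavily than MAE because errors are squared before averaging, so reporting both gives a sense of how unevenly the errors are spread.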

Predicting Online Course Popularity Using LightGBM: A Data Mining Approach on Udemy's Educational Dataset

2025-05-29T15:11:28+07:00

The increasing demand for online education has led to a rapid expansion of platforms such as Udemy, where predicting the popularity of courses can provide valuable insights for course creators and platform managers. This research aims to predict the popularity of online courses on Udemy using LightGBM, a powerful gradient boosting framework that is well-suited for classification tasks. The study begins with a dataset overview, which includes key course features such as payment type (is_paid), price, number of lectures, course level, content duration, subject, published timestamp, and number of subscribers. The preprocessing steps involve handling missing values, encoding categorical variables, and extracting temporal features from the publication date to capture trends over time. Exploratory Data Analysis (EDA) is conducted to uncover patterns and relationships within the dataset, including descriptive statistics and visualizations to understand distributions and correlations between variables. A correlation heatmap is used to identify significant associations between the predictors and the target variable, course popularity (measured by the number of subscribers). The core of the study employs the LightGBM model, which is trained using a train-test split approach and evaluated based on performance metrics such as accuracy, precision, and recall. The results show that features such as the number of lectures, price, and content duration have the greatest influence on course popularity, while certain features like course level show a limited impact. A comparative analysis with a baseline model reveals that LightGBM outperforms simple mean-based predictions in terms of predictive accuracy. The findings underscore the importance of course content structure and pricing strategies for increasing enrollment. Finally, the study discusses limitations, such as the lack of course quality metrics, and suggests avenues for future research, including the exploration of more advanced machine learning techniques and incorporating additional data sources for a more comprehensive model.
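The preprocessing step of extracting temporal features from the publication date can be sketched as follows. This is an illustration only: the timestamp format (`YYYY-MM-DDTHH:MM:SSZ`), the function name `extract_temporal_features`, and the particular features chosen are assumptions, since the abstract does not specify them.

```python
from datetime import datetime

def extract_temporal_features(published_timestamp: str) -> dict:
    """Split a publication timestamp into simple calendar features.

    Assumes the hypothetical format 'YYYY-MM-DDTHH:MM:SSZ' (UTC).
    """
    parsed = datetime.strptime(published_timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return {
        "year": parsed.year,
        "month": parsed.month,
        "weekday": parsed.weekday(),  # Monday = 0, Sunday = 6
        "hour": parsed.hour,
    }

# Hypothetical example value, not a record from the study's dataset.
features = extract_temporal_features("2017-01-18T20:58:58Z")
print(features)
```

Features like these let a tree-based model pick up yearly or seasonal publication trends that a raw timestamp string cannot express.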

Machine Learning for Wage Growth Prediction: Analyzing the Role of Experience, Education, and Union Membership in Workforce Earnings Using Gradient Boosting

2025-05-29T15:10:51+07:00

This research investigates the application of machine learning, specifically gradient boosting, to predict wage growth by analyzing the roles of experience, education, and union membership. As labor market dynamics become increasingly complex, accurate wage prediction models are essential for informing workforce planning and educational strategies. This study utilizes a dataset that includes variables such as years of experience, education level, union affiliation, and industry type. Gradient boosting, a powerful ensemble learning algorithm, is employed to predict wages and is evaluated against a baseline linear regression model. The model’s performance is assessed using Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE), showing that gradient boosting significantly outperforms linear regression in terms of predictive accuracy. Feature importance analysis reveals that education level (schooling) is the most influential factor in wage prediction, followed by years of experience, union membership, and marital status. The study highlights the importance of education and union support in driving wage growth, offering valuable insights for policymakers and workforce planners. Despite promising results, limitations such as dataset constraints and the need for broader socioeconomic factors suggest avenues for future research. Further exploration into the integration of alternative machine learning algorithms, such as Random Forest or Neural Networks, and the inclusion of more diverse variables could improve model robustness and generalizability. The findings have practical applications in AI-powered workforce development systems, offering a data-driven approach to career guidance, educational planning, and labor market policy development. This research underscores the potential of AI and machine learning to enhance economic modeling and workforce development strategies.
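The comparison described in this abstract, gradient boosting against a linear regression baseline scored with RMSE and MAE, can be sketched with scikit-learn. The data below is synthetic and deliberately non-linear; it is not the study's wage dataset, and the model settings are library defaults rather than the study's configuration.

```python
import math

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data with a strongly non-linear target (assumption for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(400, 1))
y = 5 * np.sin(X[:, 0]) + rng.normal(0, 0.3, size=400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

def evaluate(model):
    """Fit on the training split and return (RMSE, MAE) on the test split."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = math.sqrt(mean_squared_error(y_test, pred))
    mae = mean_absolute_error(y_test, pred)
    return rmse, mae

lin_rmse, lin_mae = evaluate(LinearRegression())
gb_rmse, gb_mae = evaluate(GradientBoostingRegressor(random_state=0))

print(f"linear regression: RMSE={lin_rmse:.3f} MAE={lin_mae:.3f}")
print(f"gradient boosting: RMSE={gb_rmse:.3f} MAE={gb_mae:.3f}")
```

Gradient boosting wins here because the synthetic target was built to be non-linear; on data where the relationship is close to linear, the gap can shrink or reverse, which is why a baseline comparison is worth reporting.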

Predicting User Engagement in E-Learning Platforms Using Decision Tree Classification: Analyzing Early Activity and Device Interaction Patterns

2025-05-29T15:10:16+07:00

This study investigates the prediction of user engagement in e-learning platforms by applying a Decision Tree classification model. Early user activity and device interaction patterns are explored as key predictors of engagement levels. With increasing demand for personalized learning strategies, identifying patterns of engagement early in the learning process can provide valuable insights for improving retention and learner outcomes. The dataset used in this study consists of various features, including user activity metrics (e.g., homework completion, task performance) and device interaction data (e.g., operating system, device type). After preprocessing and feature selection, a Decision Tree classifier was trained on the dataset to predict user engagement. The model's performance was evaluated using accuracy, precision, recall, and F1-score metrics. The results revealed that the Decision Tree model achieved an accuracy of 74.24%, with precision for the low-engagement class significantly lower than that for high-engagement users, indicating challenges in predicting less-engaged users. The study highlights the potential of using early engagement signals to predict learner behavior, providing a foundation for the development of personalized interventions. While the model provides useful insights, the study also acknowledges limitations, including dataset imbalance and limited generalizability across different e-learning platforms. Future research could explore the inclusion of additional engagement indicators, such as emotional response or interaction with course content, and the use of more advanced machine learning techniques. Overall, this research contributes to the growing body of knowledge on AI-driven user engagement prediction in e-learning, offering practical implications for improving student retention and learning outcomes.
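The gap between overall accuracy and per-class precision reported in this abstract is easier to see with a small worked example. The sketch below uses only the Python standard library; the ten labels are toy values that mimic an imbalanced dataset and are not the study's predictions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, F1), treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy labels: 7 high-engagement users, 3 low-engagement users (hypothetical).
y_true = ["high"] * 7 + ["low"] * 3
y_pred = ["high"] * 6 + ["low"] + ["low"] + ["high"] * 2

acc = accuracy(y_true, y_pred)
high_precision, high_recall, high_f1 = per_class_metrics(y_true, y_pred, "high")
low_precision, low_recall, low_f1 = per_class_metrics(y_true, y_pred, "low")

print(f"accuracy={acc:.2f}")
print(f"high: precision={high_precision:.2f} recall={high_recall:.2f} f1={high_f1:.2f}")
print(f"low:  precision={low_precision:.2f} recall={low_recall:.2f} f1={low_f1:.2f}")
```

In this toy case the majority class dominates the accuracy figure while the minority class is predicted poorly, which is the pattern an imbalanced dataset tends to produce.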