Comparative Evaluation of Machine Learning Algorithms for Early Prediction of Student Mental Health Risk

Tushita¹, Aashima², Manpreet Singh³
¹,²Research Scholar, ³Assistant Professor, ¹,²,³Maharaja Surajmal Institute (GGSIPU), New Delhi, India

Abstract: Psychological distress has become a serious concern worldwide in recent years. Academic competition, financial problems, unhealthy lifestyle patterns, and societal expectations together raise stress and anxiety levels among students, and sustained pressure of this kind can lead to mental health problems that harm both well-being and academic performance. Early identification of vulnerable students is therefore essential for timely intervention and preventive support. This research presents a systematic machine learning–based framework that uses survey data containing demographic and behavioral attributes to predict mental health risk, with the main objective of identifying patterns that indicate psychological vulnerability. Four techniques are applied to classify students into high-risk and low-risk groups: Logistic Regression, K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Random Forest. Before model development, the dataset was preprocessed by treating missing values and encoding features; the numerical variables were then normalized, and the dataset was divided into training and testing subsets in order to assess the robustness and generalization capability of the models. The models were evaluated using accuracy, precision, recall, F1-score, and the confusion matrix. In the experiments, the Random Forest algorithm produced the most accurate classification and the most balanced metric performance among the four models. In a later phase of the project, an ensemble majority voting strategy, improved statistical validation, and a structured evaluation framework were added to improve the stability of the predictions. The proposed system thus provides an evidence-based approach that can help educational institutions proactively identify students at risk of mental health problems and provide timely support and intervention.
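
As an illustration of the majority voting and evaluation steps named in the abstract, the following minimal Python sketch combines the binary predictions of four classifiers and computes accuracy, precision, recall, F1-score, and the confusion matrix. It is not the authors' implementation. The label coding (1 = high-risk, 0 = low-risk), the rule that a tied vote is resolved toward high-risk, the function names, and the toy prediction vectors are all assumptions made for illustration; none of the numbers are results from the study.

```python
def majority_vote(predictions_per_model, positive=1, negative=0):
    """Combine per-model label lists into one list by majority vote.

    Assumption: with an even number of models, a tied vote is resolved
    toward the positive (high-risk) class.
    """
    n_models = len(predictions_per_model)
    combined = []
    for votes in zip(*predictions_per_model):
        positives = sum(1 for v in votes if v == positive)
        combined.append(positive if 2 * positives >= n_models else negative)
    return combined


def binary_metrics(y_true, y_pred, positive=1):
    """Return the confusion matrix and standard binary classification metrics."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        # Rows are true classes (low-risk, high-risk); columns are predictions.
        "confusion_matrix": [[tn, fp], [fn, tp]],
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical toy labels and predictions (1 = high-risk, 0 = low-risk).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
model_predictions = {
    "logistic_regression": [1, 0, 0, 1, 0, 1, 1, 0],
    "knn":                 [1, 1, 0, 1, 0, 1, 0, 0],
    "svm":                 [1, 0, 1, 1, 0, 0, 1, 1],
    "random_forest":       [1, 0, 1, 1, 0, 0, 1, 0],
}

ensemble = majority_vote(list(model_predictions.values()))
metrics = binary_metrics(y_true, ensemble)
print(ensemble)
print(metrics)
```

Resolving ties toward the high-risk class is one possible design choice for a screening setting, where a missed at-risk student is arguably costlier than a false alarm; a different tie rule, or an odd number of voters, would work equally well with the same metric code.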

Keywords: Student Mental Health, Machine Learning, Random Forest, SVM, Predictive Analytics, Risk Classification

IITM Journal of Information Technology

ISSN (P) 2395-5457 | Single Blind Peer Reviewed Journal

Published By

INSTITUTE OF INNOVATION IN TECHNOLOGY & MANAGEMENT
Affiliated to GGSIPU, NAAC Grade ‘A’, ISO 14001:2015, 17020:2012, 21001:2018 & 50001:2018 Certified, A Grade by GNCTD, A++ Grade by SFRC
