
International Journal of Computer Science Engineering Techniques

ISSN: 2455-135X

Volume 10, Issue 3 | Published: May–June 2026

Authors

Saurabh Sharma, Vaibhav Paliwal, Manohar Singh, Bhavesh Kumawat, Smita Dandge

Abstract

Predicting student academic performance early is a critical challenge for modern educational institutions. This paper presents an end-to-end machine learning pipeline that applies the Decision Tree ID3 algorithm to predict student grade categories and identify at-risk students before final examinations. The system accepts ten student-level features (attendance, previous scores, study hours, internal marks, assignment completion, participation score, lab performance, number of backlogs, parental education, and internet access) and classifies each student into one of four grade categories: Distinction (≥75%), First Class (60–74%), Pass (40–59%), or Fail (<40%). A preprocessing pipeline of median imputation, label encoding, and standard scaling is applied before model training. The trained model is deployed via a Flask REST API with a browser-based HTML interface that provides real-time predictions with confidence scores, interpretable decision paths, and proactive risk warnings. Experimental evaluation on a 500-record synthetic dataset yields an accuracy of 57%, a precision of 60%, and an F1-score of 57%. The results indicate that attendance and previous academic scores are the two dominant predictors of student outcomes, consistent with prior literature. The system gives educators an interpretable, actionable tool for early intervention.
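The grade thresholds and the three preprocessing steps named above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names and toy values are hypothetical, and the handling of percentages that fall between the stated ranges (for example 74.5%) is an assumption, since the abstract does not specify it.

```python
from statistics import mean, median, pstdev


def grade_category(percentage):
    """Map a percentage to one of the four grade categories.

    Assumption: values between the stated ranges (e.g. 74.5) fall into
    the lower category; the abstract does not say how they are handled.
    """
    if percentage >= 75:
        return "Distinction"
    if percentage >= 60:
        return "First Class"
    if percentage >= 40:
        return "Pass"
    return "Fail"


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def label_encode(values):
    """Map each distinct category to an integer code, in sorted label order."""
    mapping = {label: code for code, label in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def standard_scale(values):
    """Rescale to zero mean and unit variance (population standard deviation)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mu) / sigma for v in values]


# Hypothetical toy values, not taken from the paper's dataset.
attendance = [90, None, 60, 75]
parental_education = ["graduate", "school", "graduate", "postgraduate"]

attendance_filled = impute_median(attendance)
attendance_scaled = standard_scale(attendance_filled)
education_codes, education_map = label_encode(parental_education)
```

In this sketch the missing attendance value is filled with the median of the three observed values before scaling, so the imputed record sits at the centre of the scaled column.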

Keywords

attendance prediction, decision tree, educational data mining, ID3 algorithm, machine learning, student performance prediction

Conclusion

This paper presented an end-to-end machine learning system for student performance prediction using the Decision Tree ID3 algorithm. The system processes ten student-level features through a preprocessing pipeline, classifies students into four grade categories, and delivers real-time predictions via a Flask web interface. On a 500-record synthetic dataset, the model achieves 57% accuracy and 60% precision. Attendance and previous academic score emerge as the most influential predictors, in line with findings from the existing literature. Future work will replace the synthetic dataset with real institutional records, benchmark ensemble methods (Random Forest, Gradient Boosting) against the current Decision Tree baseline, and extend the interface with a teacher dashboard supporting batch CSV upload and automated email alerts for at-risk students.
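ID3 chooses each split by information gain, that is, the drop in label entropy after partitioning on a feature, which is one way a feature such as attendance can come out as dominant. The sketch below shows that calculation on hypothetical toy records. The feature names, the discretisation of attendance into bands, and the data are assumptions for illustration, not the paper's dataset or code.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy reduction from partitioning the labels by a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the partitions produced by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy records: attendance separates the outcomes perfectly,
# while internet access carries no information about them.
attendance_band = ["high", "high", "low", "low"]
internet_access = ["yes", "no", "yes", "no"]
outcome = ["Pass", "Pass", "Fail", "Fail"]

gain_attendance = information_gain(attendance_band, outcome)
gain_internet = information_gain(internet_access, outcome)
```

On these toy records ID3 would split on the attendance band first, because it yields the larger gain. Classic ID3 works on categorical features, so continuous inputs such as attendance percentage would need to be discretised first or handled by a threshold-based variant.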

References

[1] C. Romero and S. Ventura, “Educational data mining: A survey from 1995 to 2005,” Expert Systems with Applications, vol. 33, no. 1, pp. 135–146, 2007.
[2] P. Cortez and A. Silva, “Using data mining to predict secondary school student performance,” in Proc. 5th Future Business Technology Conference (FUBUTEC), Porto, Portugal, pp. 5–12, 2008.
[3] S. Pal, “Mining educational data to reduce dropout rates of engineering students,” Int. Journal of Information Engineering and Electronic Business, vol. 4, no. 2, pp. 1–7, 2012.
[4] H. Hamsa, S. Indiradevi, and J. J. Kizhakkethottam, “Student academic performance prediction model using decision tree and fuzzy genetic algorithm,” Procedia Technology, vol. 25, pp. 326–332, 2016.
[5] M. Hlosta, Z. Zdrahal, and J. Zendulka, “Ouroboros: Early identification of at-risk students without prior knowledge,” in Proc. 7th Int. Learning Analytics & Knowledge Conference (LAK), Vancouver, pp. 6–15, 2017.
[6] Scikit-learn Developers, “scikit-learn: Machine learning in Python,” Version 1.3, 2023. [Online]. Available: https://scikit-learn.or