TY - JOUR
T1 - Predicting and analyzing secondary education placement-test scores
T2 - A data mining approach
AU - Şen, Baha
AU - Uçar, Emine
AU - Delen, Dursun
PY - 2012/8/1
Y1 - 2012/8/1
N2 - Understanding the factors that lead to success (or failure) of students at placement tests is an interesting and challenging problem. Since the centralized placement tests and future academic achievements are considered to be related concepts, analysis of the success factors behind placement tests may help understand and potentially improve academic achievement. In this study, using a large and feature-rich dataset from the Secondary Education Transition System in Turkey, we developed models to predict secondary education placement test results, and using sensitivity analysis on those prediction models, we identified the most important predictors. The results showed that the C5 decision tree algorithm is the best predictor, with 95% accuracy on the hold-out sample, followed by support vector machines (with an accuracy of 91%) and artificial neural networks (with an accuracy of 89%). Logistic regression models were the least accurate of the four, with an overall accuracy of 82%. The sensitivity analysis revealed that previous test experience, whether a student has a scholarship, the student's number of siblings, and previous years' grade point average are among the most important predictors of the placement test scores.
AB - Understanding the factors that lead to success (or failure) of students at placement tests is an interesting and challenging problem. Since the centralized placement tests and future academic achievements are considered to be related concepts, analysis of the success factors behind placement tests may help understand and potentially improve academic achievement. In this study, using a large and feature-rich dataset from the Secondary Education Transition System in Turkey, we developed models to predict secondary education placement test results, and using sensitivity analysis on those prediction models, we identified the most important predictors. The results showed that the C5 decision tree algorithm is the best predictor, with 95% accuracy on the hold-out sample, followed by support vector machines (with an accuracy of 91%) and artificial neural networks (with an accuracy of 89%). Logistic regression models were the least accurate of the four, with an overall accuracy of 82%. The sensitivity analysis revealed that previous test experience, whether a student has a scholarship, the student's number of siblings, and previous years' grade point average are among the most important predictors of the placement test scores.
KW - Classification
KW - Data mining
KW - Prediction
KW - Sensitivity analysis
KW - SETS
UR - http://www.scopus.com/inward/record.url?scp=84859217470&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2012.02.112
DO - 10.1016/j.eswa.2012.02.112
M3 - Article
AN - SCOPUS:84859217470
SN - 0957-4174
VL - 39
SP - 9468
EP - 9476
JO - Expert Systems with Applications
JF - Expert Systems with Applications
IS - 10
ER -