Predicting and analyzing secondary education placement-test scores: A data mining approach

Baha Şen, Emine Uçar, Dursun Delen

Research output: Contribution to journal › Article

36 Citations (Scopus)

Abstract

Understanding the factors that lead to students' success (or failure) on placement tests is an interesting and challenging problem. Since centralized placement tests and future academic achievement are considered related concepts, analyzing the success factors behind placement tests may help us understand and potentially improve academic achievement. In this study, using a large, feature-rich dataset from the Secondary Education Transition System (SETS) in Turkey, we developed models to predict secondary education placement-test results and, by applying sensitivity analysis to those prediction models, identified the most important predictors. The results showed that the C5 decision tree algorithm was the best predictor, with 95% accuracy on the hold-out sample, followed by support vector machines (91% accuracy) and artificial neural networks (89% accuracy). Logistic regression models were the least accurate of the four, with an overall accuracy of 82%. The sensitivity analysis revealed that previous test experience, whether a student has a scholarship, the student's number of siblings, and previous years' grade point average are among the most important predictors of placement-test scores.
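
The abstract names sensitivity analysis as the tool for ranking predictors but does not say how it was computed. The sketch below assumes one common variant, not necessarily the authors' method: retrain a model with each predictor left out and measure how much hold-out accuracy drops. It uses a hand-rolled logistic regression (one of the four model types compared in the study) so it runs with only the Python standard library; the C5, SVM, and neural-network models are not reproduced. The data, the predictor names `x_strong`, `x_weak`, `x_noise`, and all function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, epochs=200, lr=0.5):
    """Fit logistic-regression weights (bias stored last) by batch gradient descent."""
    n = len(X[0])
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        grad = [0.0] * (n + 1)
        for row, label in zip(X, y):
            # zip(w, row) pairs only the n feature weights; w[n] is the bias.
            err = sigmoid(w[n] + sum(wj * xj for wj, xj in zip(w, row))) - label
            for j, xj in enumerate(row):
                grad[j] += err * xj
            grad[n] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def accuracy(w, X, y):
    # Share of rows whose predicted class (p >= 0.5) matches the label.
    n = len(X[0])
    hits = sum(
        (sigmoid(w[n] + sum(wj * xj for wj, xj in zip(w, row))) >= 0.5) == bool(label)
        for row, label in zip(X, y)
    )
    return hits / len(y)


def drop_column(X, k):
    # Copy of X without predictor k.
    return [row[:k] + row[k + 1:] for row in X]


def sensitivity_ranking(X_tr, y_tr, X_te, y_te, names):
    """Rank predictors by the hold-out accuracy lost when each one is left out."""
    baseline = accuracy(train_logreg(X_tr, y_tr), X_te, y_te)
    drops = []
    for k, name in enumerate(names):
        w = train_logreg(drop_column(X_tr, k), y_tr)
        drops.append((name, baseline - accuracy(w, drop_column(X_te, k), y_te)))
    return sorted(drops, key=lambda item: item[1], reverse=True)


# Hypothetical synthetic data: x_strong drives the label, x_weak barely does,
# and x_noise is unrelated to it.
random.seed(0)
X, y = [], []
for _ in range(500):
    row = [random.gauss(0, 1) for _ in range(3)]
    X.append(row)
    y.append(1 if random.random() < sigmoid(3.0 * row[0] + 0.5 * row[1]) else 0)

names = ["x_strong", "x_weak", "x_noise"]
# First 300 rows train the models; the last 200 act as the hold-out sample.
ranking = sensitivity_ranking(X[:300], y[:300], X[300:], y[300:], names)
for name, drop in ranking:
    print(f"{name}: accuracy drop {drop:+.3f}")
```

Because every predictor requires a retrained model, the cost grows linearly with the number of predictors; swapping in another classifier only means replacing `train_logreg` and `accuracy`, while the ranking loop stays the same.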

Original language: English
Pages (from-to): 9468-9476
Number of pages: 9
Journal: Expert Systems with Applications
Volume: 39
Issue number: 10
DOI: 10.1016/j.eswa.2012.02.112
State: Published, 1 Aug 2012
Externally published: Yes

Fingerprint

Data mining
Education
Students
Sensitivity analysis
Decision trees
Support vector machines
Logistic regression
Neural networks

Keywords

  • Classification
  • Data mining
  • Prediction
  • Sensitivity analysis
  • SETS

Cite this

@article{375d6cf50aea44478c6874daa6845c1a,
title = "Predicting and analyzing secondary education placement-test scores: A data mining approach",
abstract = "Understanding the factors that lead to students' success (or failure) on placement tests is an interesting and challenging problem. Since centralized placement tests and future academic achievement are considered related concepts, analyzing the success factors behind placement tests may help us understand and potentially improve academic achievement. In this study, using a large, feature-rich dataset from the Secondary Education Transition System (SETS) in Turkey, we developed models to predict secondary education placement-test results and, by applying sensitivity analysis to those prediction models, identified the most important predictors. The results showed that the C5 decision tree algorithm was the best predictor, with 95{\%} accuracy on the hold-out sample, followed by support vector machines (91{\%} accuracy) and artificial neural networks (89{\%} accuracy). Logistic regression models were the least accurate of the four, with an overall accuracy of 82{\%}. The sensitivity analysis revealed that previous test experience, whether a student has a scholarship, the student's number of siblings, and previous years' grade point average are among the most important predictors of placement-test scores.",
keywords = "Classification, Data mining, Prediction, Sensitivity analysis, SETS",
author = "Baha {\c{S}}en and Emine U{\c{c}}ar and Dursun Delen",
year = "2012",
month = "8",
day = "1",
doi = "10.1016/j.eswa.2012.02.112",
language = "English",
volume = "39",
pages = "9468--9476",
journal = "Expert Systems with Applications",
issn = "0957-4174",
publisher = "Elsevier Ltd",
number = "10",

}

Predicting and analyzing secondary education placement-test scores: A data mining approach. / Şen, Baha; Uçar, Emine; Delen, Dursun.

In: Expert Systems with Applications, Vol. 39, No. 10, 01.08.2012, p. 9468-9476.

TY  - JOUR
T1  - Predicting and analyzing secondary education placement-test scores
T2  - A data mining approach
AU  - Şen, Baha
AU  - Uçar, Emine
AU  - Delen, Dursun
PY  - 2012/8/1
Y1  - 2012/8/1
AB  - Understanding the factors that lead to students' success (or failure) on placement tests is an interesting and challenging problem. Since centralized placement tests and future academic achievement are considered related concepts, analyzing the success factors behind placement tests may help us understand and potentially improve academic achievement. In this study, using a large, feature-rich dataset from the Secondary Education Transition System (SETS) in Turkey, we developed models to predict secondary education placement-test results and, by applying sensitivity analysis to those prediction models, identified the most important predictors. The results showed that the C5 decision tree algorithm was the best predictor, with 95% accuracy on the hold-out sample, followed by support vector machines (91% accuracy) and artificial neural networks (89% accuracy). Logistic regression models were the least accurate of the four, with an overall accuracy of 82%. The sensitivity analysis revealed that previous test experience, whether a student has a scholarship, the student's number of siblings, and previous years' grade point average are among the most important predictors of placement-test scores.
KW  - Classification
KW  - Data mining
KW  - Prediction
KW  - Sensitivity analysis
KW  - SETS
UR  - http://www.scopus.com/inward/record.url?scp=84859217470&partnerID=8YFLogxK
U2  - 10.1016/j.eswa.2012.02.112
DO  - 10.1016/j.eswa.2012.02.112
M3  - Article
AN  - SCOPUS:84859217470
VL  - 39
SP  - 9468
EP  - 9476
JO  - Expert Systems with Applications
JF  - Expert Systems with Applications
SN  - 0957-4174
IS  - 10
ER  - 