A comparative analysis of data mining methods in predicting NCAA bowl outcomes

Dursun Delen, Douglas Cogdell, Nihat Kasap

Research output: Contribution to journal › Article

21 Citations (Scopus)

Abstract

Predicting the outcome of a college football game is an interesting and challenging problem. Most previous studies have concentrated on ranking the bowl-eligible teams according to their perceived strengths, and using these rankings to predict the winner of a specific bowl game. In this study, using eight years of data and three popular data mining techniques (namely artificial neural networks, decision trees and support vector machines), we have developed both classification- and regression-type models in order to assess the predictive abilities of different methodologies (classification versus regression-based classification) and techniques. In the end, the results showed that the classification-type models predict the game outcomes better than regression-based classification models, and of the three classification techniques, decision trees produced the best results, with better than an 85% prediction accuracy on the 10-fold holdout sample. The sensitivity analysis on trained models revealed that the non-conference team winning percentage and average margin of victory are the two most important variables among the 28 that were used in this study.
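The comparison described in the abstract (direct classification versus regression-based classification, each scored with 10-fold cross-validation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the single feature and the point-margin formula are hypothetical, and a threshold rule and a least-squares line stand in for the neural networks, decision trees and support vector machines used in the study. It does not reproduce the paper's data, models or results.

```python
import random


def make_games(n, seed=0):
    """Synthetic (feature, point_margin) pairs; NOT the paper's data."""
    rng = random.Random(seed)
    games = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)               # hypothetical strength difference
        margin = 14.0 * x + rng.gauss(0.0, 7.0)  # hypothetical point margin
        games.append((x, margin))
    return games


def k_folds(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_stump(train):
    """Direct classifier: pick the threshold on x with the best training accuracy."""
    best_t, best_hits = 0.0, -1
    for t, _ in train:
        hits = sum((x > t) == (m > 0) for x, m in train)
        if hits > best_hits:
            best_t, best_hits = t, hits
    return best_t


def fit_line(train):
    """Least-squares fit of margin = a + b * x."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_m = sum(m for _, m in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    b = sum((x - mean_x) * (m - mean_m) for x, m in train) / sxx
    return mean_m - b * mean_x, b


def accuracy(predict_win, test):
    """Share of test games whose predicted winner matches the actual winner."""
    return sum(predict_win(x) == (m > 0) for x, m in test) / len(test)


def cross_validate(games, k=10):
    """Return mean held-out accuracy for (classification, regression-based)."""
    cls_scores, reg_scores = [], []
    for fold in k_folds(len(games), k):
        held_out = set(fold)
        train = [g for i, g in enumerate(games) if i not in held_out]
        test = [games[i] for i in fold]
        t = fit_stump(train)    # learns the win/loss label directly
        a, b = fit_line(train)  # learns the margin, then takes its sign
        cls_scores.append(accuracy(lambda x: x > t, test))
        reg_scores.append(accuracy(lambda x: a + b * x > 0, test))
    return sum(cls_scores) / k, sum(reg_scores) / k


if __name__ == "__main__":
    cls_acc, reg_acc = cross_validate(make_games(200))
    print(f"classification: {cls_acc:.3f}  regression-based: {reg_acc:.3f}")
```

Both approaches are scored on the same held-out folds, so the two accuracies are directly comparable. Which one comes out ahead on this toy data says nothing about the ranking reported in the study.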

Original language: English
Pages (from-to): 543-552
Number of pages: 10
Journal: International Journal of Forecasting
Volume: 28
Issue number: 2
DOI: 10.1016/j.ijforecast.2011.05.002
State: Published - 1 Apr 2012
Externally published: Yes

Fingerprint

Comparative analysis
Data mining
Ranking
Decision tree
Prediction accuracy
Support vector machine
Football
Predictive ability
Margin
Decision support
Methodology
Artificial neural network
Sensitivity analysis

Keywords

  • Classification
  • College football
  • Knowledge discovery
  • Machine learning
  • Prediction
  • Regression

Cite this

@article{7ee6590b5a4d434d99f54abb8ad0a612,
title = "A comparative analysis of data mining methods in predicting NCAA bowl outcomes",
abstract = "Predicting the outcome of a college football game is an interesting and challenging problem. Most previous studies have concentrated on ranking the bowl-eligible teams according to their perceived strengths, and using these rankings to predict the winner of a specific bowl game. In this study, using eight years of data and three popular data mining techniques (namely artificial neural networks, decision trees and support vector machines), we have developed both classification- and regression-type models in order to assess the predictive abilities of different methodologies (classification versus regression-based classification) and techniques. In the end, the results showed that the classification-type models predict the game outcomes better than regression-based classification models, and of the three classification techniques, decision trees produced the best results, with better than an 85{\%} prediction accuracy on the 10-fold holdout sample. The sensitivity analysis on trained models revealed that the non-conference team winning percentage and average margin of victory are the two most important variables among the 28 that were used in this study.",
keywords = "Classification, College football, Knowledge discovery, Machine learning, Prediction, Regression",
author = "Dursun Delen and Douglas Cogdell and Nihat Kasap",
year = "2012",
month = "4",
day = "1",
doi = "10.1016/j.ijforecast.2011.05.002",
language = "English",
volume = "28",
pages = "543--552",
journal = "International Journal of Forecasting",
issn = "0169-2070",
publisher = "Elsevier",
number = "2",
}

A comparative analysis of data mining methods in predicting NCAA bowl outcomes. / Delen, Dursun; Cogdell, Douglas; Kasap, Nihat.

In: International Journal of Forecasting, Vol. 28, No. 2, 01.04.2012, p. 543-552.

TY - JOUR

T1 - A comparative analysis of data mining methods in predicting NCAA bowl outcomes

AU - Delen, Dursun

AU - Cogdell, Douglas

AU - Kasap, Nihat

PY - 2012/4/1

Y1 - 2012/4/1

AB - Predicting the outcome of a college football game is an interesting and challenging problem. Most previous studies have concentrated on ranking the bowl-eligible teams according to their perceived strengths, and using these rankings to predict the winner of a specific bowl game. In this study, using eight years of data and three popular data mining techniques (namely artificial neural networks, decision trees and support vector machines), we have developed both classification- and regression-type models in order to assess the predictive abilities of different methodologies (classification versus regression-based classification) and techniques. In the end, the results showed that the classification-type models predict the game outcomes better than regression-based classification models, and of the three classification techniques, decision trees produced the best results, with better than an 85% prediction accuracy on the 10-fold holdout sample. The sensitivity analysis on trained models revealed that the non-conference team winning percentage and average margin of victory are the two most important variables among the 28 that were used in this study.

KW - Classification

KW - College football

KW - Knowledge discovery

KW - Machine learning

KW - Prediction

KW - Regression

UR - http://www.scopus.com/inward/record.url?scp=84856708562&partnerID=8YFLogxK

U2 - 10.1016/j.ijforecast.2011.05.002

DO - 10.1016/j.ijforecast.2011.05.002

M3 - Article

AN - SCOPUS:84856708562

VL - 28

SP - 543

EP - 552

JO - International Journal of Forecasting

JF - International Journal of Forecasting

SN - 0169-2070

IS - 2

ER -