Predicting and explaining corruption across countries: A machine learning approach

Marcio Salles Melo Lima, Dursun Delen

Research output: Contribution to journal › Article

Abstract

In the era of Big Data, Analytics, and Data Science, corruption is still ubiquitous and is perceived as one of the major challenges of modern societies. A large body of academic studies has attempted to identify and explain the potential causes and consequences of corruption, at varying levels of granularity, mostly through theoretical lenses by using correlations and regression-based statistical analyses. The present study approaches the phenomenon from the predictive analytics perspective by employing contemporary machine learning techniques to discover the most important corruption perception predictors based on enriched/enhanced nonlinear models with a high level of predictive accuracy. Specifically, within the multiclass classification modeling setting that is employed herein, the Random Forest (an ensemble-type machine learning algorithm) is found to be the most accurate prediction/classification model, followed by Support Vector Machines and Artificial Neural Networks. From the practical standpoint, the enhanced predictive power of machine learning algorithms coupled with a multi-source database revealed the most relevant corruption-related information, contributing to the related body of knowledge and generating actionable insights for administrators, scholars, citizens, and politicians. The variable importance results indicated that government integrity, property rights, judicial effectiveness, and education index are the most influential factors in defining the corruption level of significance.

Original language: English
Article number: 101407
Journal: Government Information Quarterly
Volume: 37
Issue number: 1
DOI: 10.1016/j.giq.2019.101407
State: Accepted/In press - 1 Jan 2019
Externally published: Yes

Fingerprint

corruption
learning
non-linear model
studies (academic)
right of ownership
neural network
integrity
politician
citizen
regression
cause
science
society
knowledge
education

Keywords

  • Corruption perception
  • Government integrity
  • Machine learning
  • Predictive modeling
  • Random forest
  • Social development
  • Society policies and regulations

Cite this

@article{e13a7aea1ab842f6b721a3558d248cf1,
title = "Predicting and explaining corruption across countries: A machine learning approach",
keywords = "Corruption perception, Government integrity, Machine learning, Predictive modeling, Random forest, Social development, Society policies and regulations",
author = "Lima, {Marcio Salles Melo} and Delen, Dursun",
year = "2019",
month = "1",
day = "1",
doi = "10.1016/j.giq.2019.101407",
language = "English",
volume = "37",
journal = "Government Information Quarterly",
issn = "0740-624X",
number = "1",

}

Predicting and explaining corruption across countries: A machine learning approach. / Lima, Marcio Salles Melo; Delen, Dursun.

In: Government Information Quarterly, Vol. 37, No. 1, 101407, 01.01.2019.

TY - JOUR

T1 - Predicting and explaining corruption across countries

T2 - A machine learning approach

AU - Lima, Marcio Salles Melo

AU - Delen, Dursun

PY - 2019/1/1

Y1 - 2019/1/1

KW - Corruption perception

KW - Government integrity

KW - Machine learning

KW - Predictive modeling

KW - Random forest

KW - Social development

KW - Society policies and regulations

UR - http://www.scopus.com/inward/record.url?scp=85073157082&partnerID=8YFLogxK

U2 - 10.1016/j.giq.2019.101407

DO - 10.1016/j.giq.2019.101407

M3 - Article

AN - SCOPUS:85073157082

VL - 37

JO - Government Information Quarterly

JF - Government Information Quarterly

SN - 0740-624X

IS - 1

M1 - 101407

ER -