Development of a new metric to identify rare patterns in association analysis: The case of analyzing diabetes complications

Saeed Piri, Dursun Delen, Tieming Liu, William Paiva

Research output: Contribution to journal › Article

10 Citations (Scopus)

Abstract

Diabetes, one of the most serious and fast-growing chronic health conditions, often leads to other serious complications such as neurological, renal, ophthalmic, and heart diseases. Research has shown that more than 85% of diabetic patients develop at least one of these complications. Studying comorbidities among diabetic patients with association analysis is therefore a worthwhile research endeavor. Association analysis is a well-known data mining method that aims to reveal association (affinity) patterns, or rules, among items (objects or events) that occur together. One of the most critical problems in association analysis is the difficulty of identifying rare items and patterns. In ordinary association analysis, setting a high minimum support causes rare rules to go undiscovered, while setting a low minimum support over-generates rules that may be neither strong nor useful. In this study, we propose a new assessment metric, called adjusted_support, to address this problem. Applying this metric can retrieve rare patterns without over-generating association rules. To test the proposed metric, we extracted data from a large, feature-rich electronic medical records data warehouse and performed association analysis on the resulting data set, which included 492,025 unique patients diagnosed with diabetes and related complications. Using adjusted_support, we discovered interesting associations among diabetes complications, such as neurological manifestations with diabetic arthropathy and gastroparesis; renal manifestations with retinopathy; gastroparesis with ketoacidosis and retinopathy; and skin complications with hyperglycemia, peripheral circulatory disorder, heart disease, and neurological manifestations. We also performed association analysis within various demographic groups at more granular levels. Besides association analysis, we analyzed comorbidity among different demographic groups of diabetic patients. Finally, we studied and compared the prevalence of diabetes complications in each demographic group.
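The minimum-support trade-off described above can be illustrated with ordinary support/confidence filtering. The following is a minimal brute-force sketch over item pairs, assuming an invented toy data set; the item names, counts, and the helpers `support` and `pair_rules` are hypothetical. It does not implement the paper's adjusted_support metric, whose definition is not given in this abstract.

```python
from itertools import combinations

# Hypothetical toy transactions (one set of diagnosis items per patient).
# Invented for illustration only; NOT data from the study. 100 transactions total.
TRANSACTIONS = (
    [{"common_a", "common_b"}] * 40
    + [{"common_a"}] * 30
    + [{"common_b"}] * 25
    + [{"rare_x", "rare_y", "common_a"}] * 3
    + [{"rare_x"}] * 2
)


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def pair_rules(transactions, min_support, min_confidence):
    """Enumerate all one-item -> one-item rules passing both thresholds.

    This is plain support/confidence filtering as in ordinary association
    analysis, not the paper's adjusted_support.
    """
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        s = support({a, b}, transactions)
        if s < min_support:
            continue  # pair is too infrequent under this threshold
        for lhs, rhs in ((a, b), (b, a)):
            conf = s / support({lhs}, transactions)
            if conf >= min_confidence:
                rules.append((lhs, rhs, s, conf))
    return rules


if __name__ == "__main__":
    # Compare a "high" and a "low" minimum support on the same data.
    for min_sup in (0.10, 0.02):
        rules = pair_rules(TRANSACTIONS, min_support=min_sup, min_confidence=0.5)
        print(f"min_support={min_sup}: {len(rules)} rules")
        for lhs, rhs, s, c in rules:
            print(f"  {lhs} -> {rhs}  support={s:.2f}  confidence={c:.2f}")
```

With `min_support=0.10`, only the two rules between `common_a` and `common_b` survive, so the rare `rare_x`/`rare_y` association is never found. Lowering the threshold to 0.02 recovers it but also admits `rare_x -> common_a`, whose confidence (0.60) is below the baseline support of `common_a` (0.73), so it is not actually informative. On real data with many items, such a low threshold would multiply these weak rules, which is the over-generation problem the abstract describes.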

Original language: English
Pages (from-to): 112-125
Number of pages: 14
Journal: Expert Systems with Applications
Volume: 94
DOI: 10.1016/j.eswa.2017.09.061
State: Published - 15 Mar 2018

Keywords

  • Adjusted_support
  • Association rule mining
  • Comorbidity
  • Data mining
  • Diabetes
  • Rare-pattern identification

Cite this

@article{dd4db04ddaba4b3d870f6d9be3cd47f5,
title = "Development of a new metric to identify rare patterns in association analysis: The case of analyzing diabetes complications",
keywords = "Adjusted_support, Association rule mining, Comorbidity, Data mining, Diabetes, Rare-pattern identification",
author = "Saeed Piri and Dursun Delen and Tieming Liu and William Paiva",
year = "2018",
month = "3",
day = "15",
doi = "10.1016/j.eswa.2017.09.061",
language = "English",
volume = "94",
pages = "112--125",
journal = "Expert Systems with Applications",
issn = "0957-4174",
publisher = "Elsevier Ltd",

}

Development of a new metric to identify rare patterns in association analysis: The case of analyzing diabetes complications. / Piri, Saeed; Delen, Dursun; Liu, Tieming; Paiva, William.

In: Expert Systems with Applications, Vol. 94, 15.03.2018, p. 112-125.

TY - JOUR

T1 - Development of a new metric to identify rare patterns in association analysis

T2 - The case of analyzing diabetes complications

AU - Piri, Saeed

AU - Delen, Dursun

AU - Liu, Tieming

AU - Paiva, William

PY - 2018/3/15

Y1 - 2018/3/15

KW - Adjusted_support

KW - Association rule mining

KW - Comorbidity

KW - Data mining

KW - Diabetes

KW - Rare-pattern identification

UR - http://www.scopus.com/inward/record.url?scp=85032491288&partnerID=8YFLogxK

U2 - 10.1016/j.eswa.2017.09.061

DO - 10.1016/j.eswa.2017.09.061

M3 - Article

AN - SCOPUS:85032491288

VL - 94

SP - 112

EP - 125

JO - Expert Systems with Applications

JF - Expert Systems with Applications

SN - 0957-4174

ER -