TY - JOUR
T1 - Development of a new metric to identify rare patterns in association analysis
T2 - The case of analyzing diabetes complications
AU - Piri, Saeed
AU - Delen, Dursun
AU - Liu, Tieming
AU - Paiva, William
N1 - Publisher Copyright: © 2017 Elsevier Ltd
PY - 2018/3/15
Y1 - 2018/3/15
AB - Diabetes, one of the most serious and fast-growing chronic health conditions, often leads to other serious complications such as neurological, renal, ophthalmic, and heart diseases. Research has shown that more than 85% of diabetic patients develop at least one of these complications. Therefore, studying comorbidities among diabetic patients using association analysis is a worthy research endeavor. Association analysis is a well-known data mining method that aims to reveal association (affinity) patterns or rules among items (objects or events) that occur together. One of the most critical problems in association analysis is the difficulty of identifying rare items and patterns. In ordinary association analysis, specifying a large minimum support causes rare rules to be missed, while setting a small minimum support over-generates rules that may be neither strong nor beneficial. In this study, we propose a new assessment metric, called adjusted_support, to address this problem. Applying this new metric can retrieve rare patterns without over-generating association rules. To test the proposed metric, we extracted data from a large and feature-rich electronic medical records data warehouse and performed association analysis on the resultant data set, which included 492,025 unique patients diagnosed with diabetes and related complications. By applying adjusted_support, we discovered interesting associations among diabetes complications, such as neurological manifestations with diabetic arthropathy and gastroparesis; renal manifestations with retinopathy; gastroparesis with ketoacidosis and retinopathy; and skin complications with hyperglycemia, peripheral circulatory disorder, heart disease, and neurological manifestations. We also performed association analysis within various demographic groups at more granular levels. Besides association analysis, we analyzed comorbidity among different demographic groups of diabetic patients. Finally, we studied and compared the prevalence of diabetes complications in every demographic group of patients.
KW - Adjusted_support
KW - Association rule mining
KW - Comorbidity
KW - Data mining
KW - Diabetes
KW - Rare-pattern identification
UR - http://www.scopus.com/inward/record.url?scp=85032491288&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2017.09.061
DO - 10.1016/j.eswa.2017.09.061
M3 - Article
AN - SCOPUS:85032491288
SN - 0957-4174
VL - 94
SP - 112
EP - 125
JO - Expert Systems with Applications
JF - Expert Systems with Applications
ER -