A Framework to Detect Disguised Missing Data

Belen, Rahime; TAŞKAYA TEMİZEL, TUĞBA

doi:10.4018/978-1-60960-067-9.ch001

KNOWLEDGE DISCOVERY PRACTICES AND EMERGING APPLICATIONS OF DATA MINING: TRENDS AND NEW DOMAINS, pp. 1-22, 2011 (SCI-Expanded)

Publication Type: Article / Full Article
Volume Number:
Publication Date: 2011
DOI Number: 10.4018/978-1-60960-067-9.ch001
Journal Name: KNOWLEDGE DISCOVERY PRACTICES AND EMERGING APPLICATIONS OF DATA MINING: TRENDS AND NEW DOMAINS
Journal Indexed In: Science Citation Index Expanded (SCI-EXPANDED)
Page Numbers: pp. 1-22
Affiliated with Middle East Technical University: Yes

Abstract

Many very large, manually populated databases suffer from data quality problems such as missing or inaccurate data and duplicate entries. A recently recognized data quality problem is disguised missing data, which arises when no explicit code for missing data, such as NA (Not Available), is provided and a legitimate data value is used instead. The presence of these values can severely affect the outcome of data mining tasks: association mining algorithms may produce biased, inaccurate association rules, and clustering techniques may produce invalid clusters. Detecting and eliminating these values is necessary but burdensome to carry out manually. This chapter first explains methods for detecting disguised missing values by visual inspection. The authors then describe methods used to detect these values automatically. Finally, a framework for detecting disguised missing data is proposed and demonstrated on spatial and categorical data sets.
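
To make the notion of a disguised missing value concrete, the following is a minimal sketch and not the framework proposed in the chapter. It assumes a hypothetical hand-entered `ages` column in which 0 was typed wherever the age was unknown, and it uses a crude frequency heuristic (the function `suspicious_values` and the `min_share` threshold are illustrative assumptions) to flag a legitimate-looking value that dominates the column.

```python
from collections import Counter
from statistics import mean

# Hypothetical column: ages entered by hand. The value 0 stands in for
# "unknown" because the entry form offered no explicit NA code.
ages = [34, 0, 27, 0, 45, 0, 31, 52, 0, 29]


def suspicious_values(values, min_share=0.3):
    """Return the values whose share of the column is at least min_share.

    Illustrative assumption: a single value that is unusually frequent in
    a column is a candidate disguised missing code. This is a simple
    heuristic, not the detection method described in the chapter.
    """
    counts = Counter(values)
    n = len(values)
    return sorted(v for v, c in counts.items() if c / n >= min_share)


candidates = suspicious_values(ages)
cleaned = [a for a in ages if a not in candidates]

naive_mean = mean(ages)        # treats the placeholder 0 as a real age
cleaned_mean = mean(cleaned)   # excludes the suspected placeholder

print("candidate disguised values:", candidates)
print("mean with placeholders:", naive_mean)
print("mean without placeholders:", cleaned_mean)
```

In this toy column the placeholder pulls the average age well below that of the genuinely recorded values, which is the kind of bias the abstract describes for downstream mining tasks. A frequency threshold alone would also flag values that are legitimately common, so any candidate still needs checking against domain knowledge.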