Information in a data collection that is missing due to a specified reason.
This indicator provides an overview of the frequency of coded categories for missing data, such as missing by design, refusal, technical error, or met exclusion criteria. Depending on how missing values are coded, implementations may provide insight into why data is missing.
In total, 1000 subjects participated in a health study consisting of several examinations. One of them is a magnetic resonance imaging (MRI) substudy. Participation status in the MRI substudy is coded as follows:
Code | Missing type | n | Percent |
---|---|---|---|
1 | examination conducted | 640 | 64% |
11 | refusal | 200 | 20% |
12 | met exclusion criterion | 100 | 10% |
13 | no show | 20 | 2% |
14 | not examined due to technical reason | 10 | 1% |
15 | no examination data possible | 30 | 3% |
The fourth column gives the percentage of subjects for each code. Codes 11 to 15 are the categories for data that is missing due to a specified reason.
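The percentages in the table can be reproduced with a few lines of Python. This is only a minimal sketch: the dictionary `counts` and the function `percent_by_code` are hypothetical names introduced for illustration, and the denominator is assumed to be all 1000 subjects, as in the table.

```python
# Counts per status code, copied from the example table.
counts = {
    1: ("examination conducted", 640),
    11: ("refusal", 200),
    12: ("met exclusion criterion", 100),
    13: ("no show", 20),
    14: ("not examined due to technical reason", 10),
    15: ("no examination data possible", 30),
}


def percent_by_code(counts):
    """Return the percentage of all subjects that falls into each code."""
    total = sum(n for _, n in counts.values())
    return {code: 100 * n / total for code, (_, n) in counts.items()}


percentages = percent_by_code(counts)

# Codes 11 to 15 are the missing categories; code 1 is not a missing code.
missing_percent = sum(p for code, p in percentages.items() if code != 1)

for code, (label, n) in counts.items():
    print(f"{code:>2} | {label} | {n} | {percentages[code]:.0f}%")
```

Summing the percentages for codes 11 to 15 gives the overall share of subjects without MRI data under this coding.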
Note that eligibility is not taken into account here. Computing the DQ2003 refusal rate would give a different result because ineligible individuals do not enter its denominator. The refusal rate would therefore be 22% (200 of the 900 eligible subjects), while the corresponding percentage with this indicator is 20%.
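The difference between the two figures comes down to the denominator. The sketch below works through the arithmetic under one assumption that is not stated explicitly in the text: only the 100 subjects with code 12 (met exclusion criterion) are treated as ineligible. The variable names are illustrative, and the exact definition of DQ2003 may differ.

```python
total_subjects = 1000
refusals = 200      # code 11
ineligible = 100    # code 12; assumed to be the only ineligible group

# This indicator: all subjects form the denominator.
indicator_percent = 100 * refusals / total_subjects

# Refusal rate: ineligible subjects are removed from the denominator.
refusal_rate = 100 * refusals / (total_subjects - ineligible)

print(f"indicator percentage: {indicator_percent:.0f}%")
print(f"refusal rate: {refusal_rate:.0f}%")
```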
An overview of the count and percentage of missing observations provides important insight into the potential reasons underlying missing values. Context knowledge about the missing categories is particularly relevant for making inferences about potential missingness mechanisms (Schafer & Graham 2002).
The higher the number or percentage of occurrences, the lower the data quality.
Stausberg J, Nasseh D, Nonnemacher M. Measuring data quality: A review of the literature between 2005 and 2013. Stud Health Technol Inform. 2015;210:712-716.
Schafer JL, Graham JW. Missing data: our view of the state of the art. Psychol Methods. 2002;7(2):147-177.