DQI-2001

Definition

Data fields without a measurement value.

Explanation

The indicator “missing values” reports all missing data values in the targeted structure of a data set. It does not take eligibility into account, and all missing codes are treated alike.

Example

Study data from a survey is provided as an R data frame with 10000 observations. For a variable on “Have you been treated in a hospital in the past year?” the following response categories emerge:

Code  Meaning   Percent
0     no        60%
1     yes       10%
999   no reply  5%

In addition, there are NAs for 25% of the subjects.

For the latter, it is not entirely clear why the values are NA: measurements may be missing, but the 999 coding may also have been applied incompletely. To acknowledge this uncertainty, only metrics within the domain crude missingness are computed. All data fields without a valid measurement (0 or 1) are assumed to be missing. In this example, that yields 30% missing data values for the assessed item (5% coded 999 plus 25% NA).
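
The arithmetic above can be reproduced in a few lines. The following is a minimal sketch in Python rather than R, using only the standard library. The function name `crude_missingness`, the use of `None` to stand in for R's NA, and the reconstructed vector of 10000 values are assumptions made for illustration; this is not an existing implementation of the indicator.

```python
from typing import Iterable, Optional, Set


def crude_missingness(values: Iterable[Optional[int]], valid_codes: Set[int]) -> float:
    """Return the share of data fields that hold no valid measurement.

    Anything outside `valid_codes` counts as missing, so missing codes
    (e.g. 999) and empty fields (None, standing in for R's NA) are
    treated alike.
    """
    values = list(values)
    if not values:
        raise ValueError("no observations")
    missing = sum(1 for v in values if v not in valid_codes)
    return missing / len(values)


# Rebuild the example: 10000 observations with the stated distribution
# (60% no, 10% yes, 5% coded 999, 25% NA).
responses = [0] * 6000 + [1] * 1000 + [999] * 500 + [None] * 2500

share = crude_missingness(responses, valid_codes={0, 1})
print(f"{share:.0%}")  # 30%
```

Defining missingness as "not a valid code" rather than "equal to a known missing code" is what makes the result crude: it needs no trustworthy list of missing codes.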

Crude missingness is frequently the only way to assess missingness when missing data are not properly coded, but its results may be misleading.

In this example, it may turn out that only participants who affirmed any medical therapy in the past year were asked about hospitalizations, and that the question was omitted for the 25% who responded “no” to any medical treatment. In that case, NA would not mean missing data but would rather imply that the correct answer for those 25% is 0 = no, leaving only 5% true missingness.

To obtain valid results, all NAs need to be recoded either to 0 or to another value code that indicates a designed jump. With 100% of the values properly coded, results from the domain qualified missingness may be calculated.
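
The recoding step can be sketched as follows, again in plain Python under stated assumptions: `None` stands in for R's NA, the helper `recode_na` is hypothetical, and the jump code 888 is invented for illustration (the source names no specific code for a designed jump). The final share is only the simple proportion of informative missing codes, not a full set of qualified missingness metrics.

```python
from collections import Counter

MISSING_CODES = {999: "no reply"}  # taken from the example table
JUMP_CODE = 888                    # hypothetical code for a designed jump


def recode_na(values, replacement):
    """Replace empty fields (None, standing in for R's NA) with `replacement`."""
    return [replacement if v is None else v for v in values]


responses = [0] * 6000 + [1] * 1000 + [999] * 500 + [None] * 2500

# Option A: the skipped question implies the answer 0 = no.
recoded_as_no = recode_na(responses, 0)
# Option B: keep the skip visible with a dedicated jump code.
recoded_as_jump = recode_na(responses, JUMP_CODE)

for recoded in (recoded_as_no, recoded_as_jump):
    # After recoding, every field carries an interpretable code.
    assert None not in recoded
    counts = Counter(recoded)
    true_missing = sum(counts[code] for code in MISSING_CODES) / len(recoded)
    print(f"{true_missing:.0%}")  # 5% in both cases
```

Option B preserves the distinction between an answered “no” and a question that was never asked, which Option A discards.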

Guidance

The indicator “missing values” should be applied when there is no complete and fully interpretable coding of missing values.

A related indicator from the integrity dimension is uncertain missingness status (DQI-1008). It counts only occurrences of no or uninformative missing value codes (e.g. NA/./Null…), while missing values counts any missing value code. Therefore, DQI-2001 will always yield a proportion of missing values that is higher than or equal to that of DQI-1008.
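
The relation between the two indicators can be illustrated with the survey example from this page. This is a rough sketch, not either indicator's actual implementation: it assumes that `None` (standing in for R's NA) is the only uninformative code in the data, and the helper `share` is hypothetical.

```python
def share(values, predicate):
    """Fraction of values for which `predicate` is true."""
    return sum(1 for v in values if predicate(v)) / len(values)


VALID_CODES = {0, 1}
responses = [0] * 6000 + [1] * 1000 + [999] * 500 + [None] * 2500

# DQI-1008 (sketch): fields with no or an uninformative missing code; here only None.
uncertain_status = share(responses, lambda v: v is None)
# DQI-2001 (sketch): any field without a valid measurement, coded or not.
missing_values = share(responses, lambda v: v not in VALID_CODES)

print(f"{uncertain_status:.0%} <= {missing_values:.0%}")  # 25% <= 30%
```

The inequality holds by construction: every field counted by the first predicate also satisfies the second, so the first set is a subset of the second.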

Interpretation

A higher percentage of missing values indicates potentially lower data quality.

Because there is uncertainty about the proper interpretation of missing values, the loss in data quality is only described as potential.

Descriptors

Implementations

Literature

  • Richter A, Schössow J, Werner A, et al. Data quality monitoring in clinical and observational epidemiologic studies: the role of metadata and process information. MIBE 2019;15(1).

  • https://www.hl7.org/fhir/v3/NullFlavor/cs.html