DQI-1004

Definition

Data records across different data sets do not match as expected.

Explanation

Data for data quality assessments is frequently distributed across different files or tables. All information must be correctly assigned to the observational units in the study; this is a necessary precondition for subsequent data quality analyses.

Example

A report on the data quality of a blood pressure examination is based on two files, “T_INTRO” and “T_BLOODPRE”, stored in some location, e.g. “path\”. Both files contain 2200 observational units, as expected, and have passed the check on “Unexpected set of observational units”. Yet when the two files are merged, no records match. A closer examination reveals that the files use different keys (ID variables). The keys are therefore recoded so that both files share the same key. The merge then works as intended and yields a result file of 2200 subjects.
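
The situation in this example can be sketched in Python as follows. This is only an illustration under stated assumptions: the ID formats (a hypothetical “BP-” prefix in T_BLOODPRE), the variable names “age” and “sbp”, and the helper functions merge_on_key and recode_key are invented here, and three units stand in for the 2200 in the study.

```python
# Hypothetical miniature of T_INTRO and T_BLOODPRE: records keyed by ID.
# The IDs, the "BP-" prefix, and the variables are assumptions for illustration.
t_intro = {"0001": {"age": 54}, "0002": {"age": 61}, "0003": {"age": 47}}
t_bloodpre = {
    "BP-0001": {"sbp": 128},
    "BP-0002": {"sbp": 141},
    "BP-0003": {"sbp": 119},
}


def merge_on_key(left, right):
    """Inner merge: keep only units whose key occurs in both data sets."""
    return {key: {**left[key], **right[key]} for key in left.keys() & right.keys()}


def recode_key(key, prefix="BP-"):
    """Assumed recoding rule: strip the prefix so both files share one key format."""
    return key[len(prefix):] if key.startswith(prefix) else key


# Both files hold the same number of units, yet no record matches.
merged_before = merge_on_key(t_intro, t_bloodpre)

# After recoding the key in T_BLOODPRE, every unit matches.
t_bloodpre_recoded = {recode_key(k): v for k, v in t_bloodpre.items()}
merged_after = merge_on_key(t_intro, t_bloodpre_recoded)

print(len(merged_before), len(merged_after))  # prints: 0 3
```

Comparing the size of the merged result with the number of units in each input file is what exposes the mismatch here. A check on the number of units per file alone passes in both cases.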

Guidance

A data record mismatch is a severe data quality problem because it entails a wrong assignment of information to observational units. It likely leads to erroneous estimates of other data quality measures.

Any deficit should be remedied by appropriate data management processes. Afterwards, the data quality reporting processes should be restarted.

Note that this indicator only detects unexpected mismatches between data sets. A wrong assignment of keys within a data set will not be detected.

Interpretation

The higher the number or percentage of occurrences, the lower the data quality.
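
One possible way to obtain such a number and percentage is sketched below. The function name record_mismatch and the example keys are hypothetical, and the denominator (all distinct keys found in either data set) is an assumption, since the definition above does not fix one.

```python
def record_mismatch(keys_a, keys_b):
    """Count keys that occur in only one of two data sets.

    Assumption: the percentage is taken over all distinct keys found in
    either data set; other denominators are equally conceivable.
    """
    set_a, set_b = set(keys_a), set(keys_b)
    unmatched = set_a ^ set_b  # symmetric difference: keys without a partner
    total = len(set_a | set_b)
    percent = 100.0 * len(unmatched) / total if total else 0.0
    return {
        "only_in_a": sorted(set_a - set_b),
        "only_in_b": sorted(set_b - set_a),
        "n_mismatch": len(unmatched),
        "percent": percent,
    }


# Hypothetical keys: "0001" and "0005" each occur in only one data set.
result = record_mismatch(
    ["0001", "0002", "0003", "0004"],
    ["0002", "0003", "0004", "0005"],
)
print(result["n_mismatch"], result["percent"])  # prints: 2 40.0
```

Because the keys are converted to sets, duplicate or wrongly assigned keys inside a single data set go unnoticed, in line with the scope of this indicator.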

Literature

  • Mitchell MN. Data Management Using Stata: A Practical Handbook. College Station, Texas: Stata Press; 2020.