
Definition

The observed structure of a data set differs from the expected structure.

Explanation

There may be expectations about the technical structure of a data set, such as the number of data records (e.g. cases, observational units, the rows in a data set) or the number of data elements (e.g. variables, the columns in a data set). The indicators within this domain target deviations from the expected data set structure.
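Such a check can be sketched as follows, assuming the data set is held as a list of dictionaries (one per data record) and the expected counts are known in advance. The function name `check_structure`, its parameters, and the sample records are hypothetical illustrations, not part of any particular tool.

```python
def check_structure(records, expected_n_records, expected_n_elements):
    """Compare observed record and element counts with the expected counts.

    Hypothetical helper: a deviation of 0 means the count matches.
    """
    observed_n_records = len(records)

    # Collect every data element (column name) that occurs in any record.
    observed_elements = set()
    for record in records:
        observed_elements.update(record.keys())

    return {
        "records_deviation": observed_n_records - expected_n_records,
        "elements_deviation": len(observed_elements) - expected_n_elements,
    }


# Illustrative data: two records are present although three are expected.
records = [
    {"id": 1, "age": 54, "sex": "f"},
    {"id": 2, "age": 61, "sex": "m"},
]
result = check_structure(records, expected_n_records=3, expected_n_elements=3)
```

Here `result["records_deviation"]` is negative because one record is missing, while the number of data elements matches the expectation.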

Example

In a data quality assessment, the provided metadata file specifies 47 study variables, but the study data set contains only 40 of them. This discrepancy is targeted by the indicator “Unexpected data elements” within the domain “Structural data set error”.
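The comparison in this example can be sketched by matching the variable names listed in the metadata against those found in the data set. The variable names `v1` to `v47` and the function `compare_data_elements` are assumptions made for illustration only; a real metadata file would supply the actual names.

```python
def compare_data_elements(expected, observed):
    """Return the data elements that are missing from, or unexpected in, the data set."""
    expected_set, observed_set = set(expected), set(observed)
    return {
        # Specified in the metadata but absent from the data set.
        "missing": sorted(expected_set - observed_set),
        # Present in the data set but not specified in the metadata.
        "unexpected": sorted(observed_set - expected_set),
    }


# Hypothetical names: the metadata lists v1..v47, the data set holds v1..v40.
expected_variables = [f"v{i}" for i in range(1, 48)]
observed_variables = [f"v{i}" for i in range(1, 41)]

report = compare_data_elements(expected_variables, observed_variables)
print(len(report["missing"]))  # 7
```

Reporting the names rather than only the counts shows which element-specific checks cannot be run until the mismatch is resolved.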

Guidance

Issues within this domain may lead to incomplete or even erroneous data quality reports because the number of addressed data elements (e.g. variables) or data records (e.g. observational units) is not correct.

A wrong number of data records may impair the validity of all subsequently computed data quality indicators, whereas a mismatch of data elements selectively impedes only those checks that relate to the affected data elements. Any such issue must be corrected prior to further analyses.

Note: “Structural data set errors” only target deviations from an expected data set structure. In contrast, indicators within the data quality dimension completeness analyze missing codes within data fields and assume a correct data set structure.
