
Definition

The technical representation of data values within a data set does not conform to the expected representation.

Explanation

Value format integrity checks the formatting of data values against a defined reference standard. Study data and metadata may be targeted alike. The reference for study data is the corresponding metadata. For metadata, the reference standard may be provided by other metadata sets.

Example

A blood pressure examination needs to be evaluated. For this purpose, a study data set is provided which contains, among others, variables on:

  • measurements (systolic and diastolic blood pressure)

  • process variables (examination time and date, examiner number, device number)

Furthermore, a metadata file is provided which contains information about the study data variables, such as:

  • the name of all variables

  • the datatype of each variable

Checks using indicators from this domain match the observed variables and their expected format against the reference standard provided by the metadata file. Based on these checks, the following deficiencies are detected:

  • the study variable for one blood pressure measurement is stored as a string instead of in a numeric format

  • the examination date is provided as a string instead of in a date-time format (e.g.: September 5th, 2010)
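
The kind of check described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of any particular tool: the variable names (sbp, dbp, exam_datetime, examiner_id, device_id), the type labels in METADATA, and the helper functions conforms and find_format_mismatches are all hypothetical and chosen for this example. It also assumes that missing values are handled by other indicators and are therefore not reported here.

```python
from datetime import datetime

# Hypothetical metadata: variable name -> expected data type label.
METADATA = {
    "sbp": "float",               # systolic blood pressure
    "dbp": "float",               # diastolic blood pressure
    "exam_datetime": "datetime",  # examination time and date
    "examiner_id": "integer",
    "device_id": "integer",
}


def conforms(value, expected_type):
    """Return True if value has the expected technical representation."""
    if value is None:
        # Assumption: missing values are not treated as format issues.
        return True
    if expected_type == "float":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if expected_type == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    if expected_type == "datetime":
        return isinstance(value, datetime)
    if expected_type == "string":
        return isinstance(value, str)
    raise ValueError(f"unknown expected type: {expected_type}")


def find_format_mismatches(records, metadata):
    """Return (row index, variable, value) for each nonconforming value."""
    mismatches = []
    for i, record in enumerate(records):
        for variable, expected_type in metadata.items():
            if variable in record and not conforms(record[variable], expected_type):
                mismatches.append((i, variable, record[variable]))
    return mismatches


# Hypothetical study data: the second record has two format issues.
records = [
    {"sbp": 128.0, "dbp": 82.0, "exam_datetime": datetime(2010, 9, 5, 9, 30),
     "examiner_id": 3, "device_id": 1},
    {"sbp": "131", "dbp": 85.0, "exam_datetime": "September 5th, 2010",
     "examiner_id": 3, "device_id": 1},
]

mismatches = find_format_mismatches(records, METADATA)
for row, variable, value in mismatches:
    print(f"row {row}: {variable} = {value!r} does not match {METADATA[variable]}")
```

In this sketch, only the two values in the second record are reported, while all correctly formatted values pass unaffected.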

Based on this feedback, an update of the data set is requested. After the database has been updated, the data quality check is repeated.

Guidance

Issues within this domain may compromise the assessment of the affected data elements or data records, but they do not impair the assessment of correctly formatted data structures. Most issues thus have a local impact.

Findings within this domain should be remedied so that subsequent indicators can be assessed appropriately for the affected data structures.
