Definition

The degree to which data values are free of breaks in conventions or contradictions.

Explanation

“Consistency” targets convention breaks in the data as well as contradictions. This is generally assessed by checking data values against expected values or ranges, or by checking different data values against each other using logical rules.

The key issue is: Do data values comply with formal rules?

Indicators within this subdomain therefore target impossible, inadmissible or uncertain data values, or combinations of data values. Conducting the related checks requires extensive knowledge of permissible ranges or values.
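
To illustrate how such knowledge might be operationalized, the following Python sketch (not part of the original concept) stores an assumed hard limit for inadmissible values and an assumed, narrower plausibility limit for uncertain values as metadata next to the data. The variable name and both limits are purely illustrative.

    # Illustrative metadata for one blood pressure variable:
    # the hard limit marks inadmissible values, the narrower
    # (assumed) plausibility range marks uncertain values.
    METADATA = {
        "SBP": {"hard": (10, 300), "plausible": (70, 250)},
    }

    def classify(value: float, var: str) -> str:
        """Classify a single data value against the annotated limits."""
        hard_lo, hard_hi = METADATA[var]["hard"]
        plaus_lo, plaus_hi = METADATA[var]["plausible"]
        if not (hard_lo <= value <= hard_hi):
            return "inadmissible"   # violates the formal rule
        if not (plaus_lo <= value <= plaus_hi):
            return "uncertain"      # admissible, but should be reviewed
        return "ok"

    print(classify(350.0, "SBP"))  # -> inadmissible
    print(classify(40.0, "SBP"))   # -> uncertain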

Example

A blood pressure measurement in a population-based study comprises three repeated measurements.

For systolic blood pressure, stored in the variables SBP1, SBP2, and SBP3, no values below 10 mmHg or above 300 mmHg are permitted. The permitted range [10; 300] is annotated in the metadata with reference to these three variables. Implementations of the related inadmissible numerical values indicator count the number and percentage of violations of this rule.
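
A minimal sketch of this inadmissible numerical values check, assuming the study data are available as a pandas DataFrame with the columns SBP1, SBP2, and SBP3 (the data values are made up for illustration):

    import pandas as pd

    # Toy data; 5 mmHg and 400 mmHg violate the permitted range [10; 300]
    data = pd.DataFrame({
        "SBP1": [120, 135, 5, 142],
        "SBP2": [118, 400, 128, 140],
        "SBP3": [119, 131, 126, 138],
    })

    # Range annotation as described in the metadata above
    LIMITS = {"SBP1": (10, 300), "SBP2": (10, 300), "SBP3": (10, 300)}

    for var, (lo, hi) in LIMITS.items():
        n_viol = int((~data[var].between(lo, hi)).sum())
        print(f"{var}: {n_viol} violations ({100 * n_viol / len(data):.1f}%)")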

Another requirement with regard to these three measurements is the appropriate time order (T1, T2, T3) of the blood pressure measurements. The third measurement may only take place after the second one, and the second one after the first, logically: T1 < T2 < T3. Any violation of this order should be targeted by indicators within the contradictions domain.
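
This contradiction check amounts to a simple logical rule across variables. The sketch below assumes the measurement times are stored in hypothetical columns T1, T2, and T3 of a pandas DataFrame; the timestamps are invented for illustration.

    import pandas as pd

    times = pd.DataFrame({
        "T1": pd.to_datetime(["2020-01-01 08:00", "2020-01-01 08:10"]),
        "T2": pd.to_datetime(["2020-01-01 08:02", "2020-01-01 08:09"]),  # row 2 violates T1 < T2
        "T3": pd.to_datetime(["2020-01-01 08:04", "2020-01-01 08:12"]),
    })

    # Flag every observation that violates the required order T1 < T2 < T3
    contradiction = ~((times["T1"] < times["T2"]) & (times["T2"] < times["T3"]))
    print(f"{contradiction.sum()} of {len(times)} observations violate T1 < T2 < T3")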

Guidance

When using appropriate electronic case report forms (eCRFs), consistency-related issues may largely be avoided by controlling the data entry process. Yet this is not always fully possible, and consistency checks are therefore indispensable in any data quality assessment pipeline.

During ongoing data collection, consistency checks trigger data management activities to correct data values classified as inadmissible or impossible. They may also trigger quality management activities to assess whether uncertain values are correct or wrong.

With regard to statistical analyses, they safeguard the exclusion of inadmissible values.

Within a data quality pipeline, it is recommended to exclude detected violations from subsequent accuracy-targeted analyses to avoid classifying a single data quality issue multiple times under different indicators. A viable option is to recode such extraneous data values to missing.
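
A minimal sketch of this recoding step, assuming pandas and the permitted range from the example above: detected inadmissible values are set to missing (NaN) so that subsequent accuracy analyses do not pick up the same issue again.

    import numpy as np
    import pandas as pd

    data = pd.DataFrame({"SBP1": [120.0, 5.0, 142.0, 400.0]})
    lo, hi = 10, 300

    # Keep values inside the permitted range, recode everything else to missing
    data["SBP1"] = data["SBP1"].where(data["SBP1"].between(lo, hi), other=np.nan)
    print(data["SBP1"])  # out-of-range values are now NaN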
