This document describes the generation of a quality report using a 50% random data sample from the Study of Health in Pomerania START cohort, baseline examination (SHIP-START-0, 1997-2001). For further information on this study please see Völzke et al. 2010. Some noise has been introduced to the data to secure anonymity and for illustrative purposes.
The first step in the data quality assessment workflow evaluates the compliance of the study data to the respective metadata regarding formal and structural requirements. Users must provide both data and metadata as data frames.
Note: The metadata file is the primary point of reference for generating data quality reports.
In this example, the SHIP data are loaded from the dataquieR package:
sd1 <- readRDS(system.file("extdata", "ship.RDS", package = "dataquieR"))
The imported study data consists of:
Similarly, the respective metadata are loaded from dataquieR:
metadata_file <- system.file("extdata", "ship_meta_v2.xlsx", package = "dataquieR")
prep_load_workbook_like_file(metadata_file)
md1 <- prep_get_data_frame("item_level") # item_level is a sheet in ship_meta_v2.xlsx
The imported metadata provide information for:
An identical number of variables in both files is desirable but not necessary. Attributes (i.e. columns in the metadata) comprise information on each variable of the study data file, such as labels or admissibility limits.
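A minimal way to compare both files is to inspect the dimensions of the study data and the attribute names of the metadata:
# Number of observations and variables in the study data
dim(sd1)
# Attributes (columns) available in the item-level metadata
colnames(md1)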
The integrity check starts by calling the function pro_applicability_matrix(). The data quality indicators covered by this function are:
pro_applicability_matrix() generates a heatmap-like plot for the applicability of all dataquieR functions to the study data, using the provided metadata as a point of reference:
appmatrix <- pro_applicability_matrix(study_data = sd1,
meta_data = md1,
label_col = LABEL,
split_segments = TRUE)
The heatmap can be retrieved by the command:
appmatrix$ApplicabilityPlot
As split_segments = TRUE
was used as an argument, all
output is organized by the study segments defined in the metadata. In
this case, there are data from four examination segments: the
computer-assisted interview, intro (basic information on the
participants, such as sociodemographic information and examination
date), laboratory analysis, and somatometric examination. The assignment
of variables to segments is done in the metadata file.
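The segment assignment can be inspected directly in the metadata; a minimal sketch, assuming the segment column of this item-level metadata is named STUDY_SEGMENT:
# Tabulate the number of variables assigned to each study segment
# (STUDY_SEGMENT is assumed to be the segment column in this metadata version)
table(md1$STUDY_SEGMENT)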
The results of the applicability checks are technical, i.e., the function compares, for example, the data types defined in the metadata with those observed in the study data. The light blue areas indicate that additional checks would be possible for many variables if additional metadata were provided.
Note: Applying all technically feasible data quality implementations to all study data variables is not advisable. For example, detection limits are not meaningful for participants' IDs. However, the variable ID is represented as an integer, which technically allows checking detection limits.
All datatype issues found by pro_applicability_matrix() should be checked data element by data element. For instance, a major issue was found in the variable WAIST_CIRC_0. This variable is represented in the study data with datatype character, which differs from the expected datatype float defined in the metadata. Some basic checks show the misuse of a comma as the decimal delimiter.
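A minimal sketch of such a check, using the study-data column waist that also appears in the correction step below:
# Inspect the character representation of the waist circumference values
head(sd1$waist)
# Count values that use a comma instead of a decimal point
sum(grepl(",", sd1$waist))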
A direct conversion of WAIST_CIRC_0 to datatype numeric would coerce the affected values to NA, which should be avoided. Hence, to correct the issue, we replace the comma with the correct delimiter and convert the datatype without losing data values. The resulting applicability plot shows no more issues.
# replace comma with the correct delimiter
sd1$waist <- as.numeric(gsub(",", ".", sd1$waist))
pro_applicability_matrix(study_data = sd1, meta_data = md1, label_col = LABEL)$ApplicabilityPlot
The next major step in the data quality assessment workflow is to assess the occurrence and patterns of missing data. The sequence of checks in this example is ordered according to common stages of data collection:
| Level | Description |
|---|---|
| Unit missingness | Subjects without information on any of the provided study variables |
| Segment missingness | Subjects without information for all variables of a defined study segment (e.g., some examination) |
| Item missingness | Subjects without information on data fields within segments |
Following this sequence enables calculating the correct denominators to compute item missingness. Such calculations are particularly important for complex cohort studies in which different levels of examination programs are conducted. For example, only half of a study population might be selected for an MRI examination. In the remaining 50%, the respective MRI variables are not included per study design. This should be considered if item missingness is examined.
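As a purely illustrative calculation with hypothetical numbers: if, by design, only 500 of 1000 participants were selected for the MRI examination, the denominator for an MRI item is 500, not 1000.
# Hypothetical numbers illustrating the denominator correction
n_selected_mri <- 500   # participants selected for MRI by design
n_missing_item <- 25    # missing values observed for an MRI item
n_missing_item / n_selected_mri   # 5% item missingness, not 25/1000 = 2.5%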
This check identifies subjects without any measurements on the provided target variables for a data quality check.
Note: The interpretation of findings depends on the scope of the provided variables and data records. In this example, the study data set comprises examined SHIP participants, not the target sample. Accordingly, the check is not about study participation. Rather, it identifies cases for which unexpectedly no information has been provided at all. Any identified case would indicate a data management problem.
The indicator covered by com_unit_missingness() is:
Unit missingness can be assessed with:
my_unit_missings2 <- com_unit_missingness(study_data = sd1,
meta_data = md1,
label_col = LABEL,
id_vars = "ID")
my_unit_missings2$SummaryData
In total, 0 units have missing values in all variables of the study data. Thus, for each participant there is at least one variable with information.
Subsequently, a check is performed that identifies subjects without any measurements within each of the four defined study segments.
The indicator covered by com_segment_missingness() is:
The table output can be retrieved with:
MissSegs <- com_segment_missingness(study_data = sd1,
meta_data = md1,
threshold_value = 1,
direction = "high",
exclude_roles = c("secondary", "process"))
MissSegs$SummaryData
Exploring segment missingness over time requires another variable in the study data. Information regarding this variable can be added to the metadata using the dataquieR function prep_add_to_meta():
# create a discretized version of the examination year
sd1$exyear <- as.integer(lubridate::year(sd1$exdate))
# add metadata for this variable
md1 <- dataquieR::prep_add_to_meta(VAR_NAMES = "exyear",
DATA_TYPE = "integer",
LABEL = "EX_YEAR_0",
VALUE_LABELS = "1997 = 1st | 1998 = 2nd | 1999 = 3rd | 2000 = 4th | 2001 = 5th",
VARIABLE_ROLE = "process",
meta_data = md1)
With a discretized variable for examination year (EX_YEAR_0), the occurrence pattern by year can subsequently be assessed using the command com_segment_missingness():
MissSegs <- com_segment_missingness(study_data = sd1,
meta_data = md1,
threshold_value = 1,
label_col = LABEL,
group_vars = "EX_YEAR_0",
direction = "high",
exclude_roles = "process")
MissSegs$SummaryPlot
The plot is a descriptor of the indicator:
It illustrates that missing information from the laboratory examination is distributed unequally across examination years, with the highest proportion of missing data occurring in the 1st, 2nd, and 5th years.
The final check in the completeness dimension identifies subjects with missing information on individual variables across all study segments. The indicators covered by the function com_item_missingness() are:
Item missingness can be assessed by using the following call:
item_miss <- com_item_missingness(study_data = sd1,
meta_data = md1,
show_causes = TRUE,
label_col = "LABEL",
include_sysmiss = TRUE,
threshold_value = 95
)
An overview of the results can be obtained by requesting the summary table of this function:
item_miss$SummaryTable
The table provides one line for each of the 29 variables. Of particular interest are:
The table shows that the variable HOUSE_INCOME_MONTH_0 (monthly net household income) is affected by many missing values. In addition, age of diabetes onset (DIAB_AGE_ONSET_0) was only coded for 173 subjects, but most values are missing because of an intended jump.
Note: If jump codes have been used, e.g., for the variable CONTRACEPTIVE_EVER_0, the denominator for the calculation of item missingness is corrected for the number of jump codes used.
The summary plot delivers a different view of missing data by providing the frequency of the specified reasons for missing data. The corresponding indicator is:
item_miss$SummaryPlot
In the plot, the balloon size is determined by the number of missing data fields. It can now be inferred that, for example, the elevated number of missing values for the item HOUSE_INCOME_MONTH_0 is mainly caused by participants refusing to answer the respective question.
Consistency is targeted after completeness has been examined because it requires data without missing and jump codes. Consistency, a main aspect of correctness, describes the degree to which data values are free of breaks in conventions or contradictions. Different data types may be addressed in respective checks.
The indicator covered by con_limit_deviations() when specifying limits = "HARD_LIMITS" is:
Note: When specifying limits = "SOFT_LIMITS", the check identifies values that are not inadmissible but uncertain, according to the specified ranges. The related indicator is then:
The call in this example is:
MyValueLimits <- con_limit_deviations(study_data = sd1,
meta_data = md1,
label_col = "LABEL",
limits = "HARD_LIMITS")
A table output provides the number and percentage of all range violations for the variables with limits specified in the metadata:
MyValueLimits$SummaryData
The last column of the table also provides a GRADING. If the percentage of violations is above some threshold, a GRADING of 1 is assigned; in this example, any occurrence of violations is classified as problematic. Otherwise, the GRADING is 0.
The following statement assigns all variables identified as problematic to the R object whichdeviate to enable a more targeted output, for example to plot the distributions of any variable with violations of the specified limits:
# select variables with deviations
whichdeviate <- as.character(MyValueLimits$SummaryTable$Variables)[MyValueLimits$SummaryTable$GRADING == 1]
We can restrict the plots to those variables with limit deviations, i.e., those with a GRADING of 1 in the table above, using MyValueLimits$SummaryPlotList[whichdeviate] (only the first two are displayed below to reduce file size):
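A minimal sketch, assuming whichdeviate contains at least two variables:
# Display only the first two plots for variables with limit deviations
MyValueLimits$SummaryPlotList[whichdeviate[1:2]]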
A comparable check may be performed for categorical variables using the command con_inadmissible_categorical():
The covered indicator is:
The call is:
IAVCatAll <- con_inadmissible_categorical(study_data = sd1,
meta_data = md1,
label_col = "LABEL")
As with inadmissible numerical values, a table output displays the observed categories, the defined categories, any non-matching level, its count, and a GRADING:
IAVCatAll$SummaryData
The results show that there is one variable, SCHOOL_GRAD_0, with one inadmissible level.
The second main type of checks within the consistency dimension concerns contradictions. The indicators covered by the function con_contradictions_redcap() are:
The rules to identify contradictions must first be loaded from a spreadsheet. The creation of this spreadsheet is supported by a Shiny app. Overall, 12 different logical comparisons can be applied; an overview is given in the respective tutorial. Each line within the spreadsheet defines one check rule.
checks <- prep_get_data_frame("cross-item_level") # cross-item_level is a sheet in ship_meta_v2.xlsx, which was loaded earlier
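To get an impression of how the rules are defined, the first rows of this table can be inspected:
# Inspect the first check rules (one rule per row)
head(checks)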
Subsequently, the contradictions assessment may be triggered by con_contradictions_redcap(), using the table checks as the point of reference:
AnyContradictions <- con_contradictions_redcap(study_data = sd1,
meta_data = md1,
label_col = "LABEL",
meta_data_cross_item = checks,
threshold_value = 1)
A summary table shows the number and percentage of contradictions for each defined rule:
AnyContradictions$SummaryTable
In this example, rule seven leads to the identification of 35 contradictions: age of onset for diabetes is provided (DIAB_AGE_ONSET_0), but the variable on the presence of diabetes (DIABETES_KNOWN_0) does not indicate a known disease.
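To review the definition of this rule, the corresponding row of the checks table can be displayed (assuming the rules appear in the same order as in the summary table, one rule per row):
# Display the definition of rule seven
checks[7, ]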
The distributions may also be displayed as a plot:
AnyContradictions$SummaryPlot
In contrast to most consistency-related indicators, accuracy findings indicate an elevated probability that some data quality issue exists, rather than a certain issue.
Univariate outliers are assessed based on statistical criteria. The covered indicator is:
The function acc_robust_univariate_outlier() identifies outliers according to the approaches of Tukey, SixSigma, Hubert, and the heuristic approach of SigmaGap. It may be called as follows:
UnivariateOutlier <- dataquieR:::acc_robust_univariate_outlier(study_data = sd1,
meta_data = md1,
label_col = "LABEL")
The first output is a table that provides descriptive statistics and detected outliers according to the different criteria:
UnivariateOutlier$SummaryTable
Most variables contain outliers according to at least two criteria, but only for the diastolic blood pressure variables (DBP_0.1 and DBP_0.2) have two outliers been detected using the SigmaGap criterion.
To obtain better insight into the univariate distributions, a plot is provided (call it with UnivariateOutlier$SummaryPlotList). It highlights observations for each variable according to the number of violated rules (only the first four are shown here):
The function acc_multivariate_outlier() identifies outliers related to the indicator:
acc_multivariate_outlier() uses the same rules as acc_robust_univariate_outlier() for the identification of outliers.
The following function call relates the systolic and diastolic blood pressure measurements to age and weight, and a table output is created showing the number of detected multivariate outliers:
MVO_SBP0.1 <- acc_multivariate_outlier(variable_group = c("SBP_0.1", "DBP_0.1", "AGE_0", "BODY_WEIGHT_0"),
study_data = sd1,
meta_data = md1,
id_vars = "ID",
label_col = "LABEL")
MVO_SBP0.1$SummaryTable
The number of outliers varies considerably, depending on the criterion. Subsequently, a parallel coordinate plot may be requested to inspect the results further:
MVO_SBP0.1$SummaryPlot
Another example is the inspection of the first and second systolic blood pressure measurements:
MVO_DBP <- acc_multivariate_outlier(variable_group = c("SBP_0.1", "SBP_0.2"),
study_data = sd1,
meta_data = md1,
label_col = "LABEL")
MVO_DBP$SummaryTable
MVO_DBP$SummaryPlot
The function acc_distributions() describes distributions using histograms and displays empirical cumulative distribution functions (ECDFs) if a grouping variable is provided. The function is only descriptive and not related to a specific indicator. Instead, the results may be relevant to most indicators within the unexpected distribution domain.
The following example examines measurements in which a possible influence of the examiners is considered:
ECDFSoma <- acc_distributions(resp_vars = c("WAIST_CIRC_0", "BODY_HEIGHT_0", "BODY_WEIGHT_0"),
group_vars = "OBS_SOMA_0",
study_data = sd1,
meta_data = md1,
label_col = "LABEL")
The respective list of plots may be displayed using ECDFSoma$SummaryPlotList (only the first two plots are displayed below):
The function acc_margins() is mainly related to the indicators:
However, it also provides descriptive output such as violin and box plots for continuous variables, count plots for categorical data, and density plots for both. The main application of acc_margins() is to make inferences about effects related to process variables such as examiners, devices, or study centers. The function determines whether measurements are provided as continuous or discrete. Alternatively, metadata specifications may provide this information.
In the first example, acc_margins() is applied to the variable waist circumference (WAIST_CIRC_0). In this case, dependencies related to the examiners (OBS_SOMA_0) are assessed while the raw measurements are controlled for the variables age and sex (AGE_0, SEX_0):
marginal_dists <- acc_margins(resp_vars = "WAIST_CIRC_0",
co_vars = c("AGE_0", "SEX_0"),
group_vars = "OBS_SOMA_0",
study_data = sd1,
meta_data = md1,
label_col = "LABEL")
A plot is provided to view the results:
marginal_dists$SummaryPlot
Based on a statistical test, no examiner's mean waist circumference differed significantly (p < 0.05) from the overall mean.
The situation is different when assessing the coded myocardial infarction (MYOCARD_YN_0) across examiners while controlling for age and sex:
marginal_dists <- acc_margins(resp_vars = "MYOCARD_YN_0",
co_vars =c("AGE_0", "SEX_0"),
group_vars = "OBS_INT_0",
study_data = sd1,
meta_data = md1,
label_col = "LABEL")
marginal_dists$SummaryPlot
The result shows elevated proportions for the examiners 05 and 07.
An important and related issue is the quantification of the observed examiner effects, which is accomplished by the function acc_varcomp(), related to the indicators:
acc_varcomp() computes the proportion of variance in a target variable that is attributable to the grouping variable, while controlling for other variables (here age and sex). The output may be reviewed in a table format:
vcs <- acc_varcomp(resp_vars = "WAIST_CIRC_0",
co_vars = c("AGE_0", "SEX_0"),
group_vars = "OBS_SOMA_0",
study_data = sd1,
meta_data = md1,
label_col = "LABEL")
vcs$SummaryTable
For the variable WAIST_CIRC_0, an ICC of 0.019 has been found, which is below the threshold. The same is the case for the variable MYOCARD_YN_0, probably because the case count for the two deviating observers 05 and 07 is low:
vcs <- acc_varcomp(resp_vars = "MYOCARD_YN_0",
co_vars =c("AGE_0", "SEX_0"),
group_vars = "OBS_INT_0",
study_data = sd1,
meta_data = md1,
label_col = "LABEL")
vcs$SummaryTable
The study of effects across groups and time is particularly complex. The function acc_loess() provides a descriptor related to the indicator:
acc_loess() may also be used to obtain information related to other indicators in the domain of unexpected distributions. A sample call using waist circumference as the target variable is:
timetrends <- acc_loess(resp_vars = "WAIST_CIRC_0",
co_vars =c("AGE_0", "SEX_0"),
group_vars = "OBS_SOMA_0",
time_vars = "EXAM_DT_0",
study_data = sd1,
meta_data = md1,
label_col = "LABEL")
invisible(lapply(timetrends$SummaryPlotList, print))
The graph for this variable indicates no major discrepancies between the examiners over the examination period.
Next to location parameters, the shape of a distribution is an important aspect of accuracy.
The related indicator is:
Observed distributions can be tested against expected distributions using the function acc_shape_or_scale().
In this example the uniform distribution for the use of measurement devices is examined:
MyUnexpDist1 <- acc_shape_or_scale(resp_vars = "DEV_BP_0",
guess = TRUE,
label_col = "LABEL",
dist_col = "DISTRIBUTION",
study_data = sd1,
meta_data = md1)
MyUnexpDist1$SummaryPlot
The plot illustrates that devices have not been used with comparable frequencies.
Another example examines the normal distribution of blood pressure:
MyUnexpDist2 <- acc_shape_or_scale(resp_vars = "SBP_0.2",
guess = TRUE,
label_col = "LABEL",
dist_col = "DISTRIBUTION",
study_data = sd1,
meta_data = md1)
MyUnexpDist2$SummaryPlot
The result reveals a slight discrepancy from the normality assumption. It is up to the person responsible for the data quality assessments to decide whether such a discrepancy is relevant.
The analysis of end digit preferences is a specific implementation related to the indicator:
In this example, the uniform distribution of the end digits of body height is examined. Body height in SHIP-START-0 was a measurement that required manual reading and transfer of the data into an eCRF.
MyEndDigits <- acc_end_digits(resp_vars = "BODY_HEIGHT_0",
label_col = LABEL,
study_data = sd1,
meta_data = md1)
MyEndDigits$SummaryPlot
The graph highlights no relevant effects across the ten categories.
Output within the accuracy dimension frequently combines descriptive and inferential content, which is necessary to facilitate valid conclusions on data quality issues. Further details on all functions can be obtained following the links and in the software section.