
DQI-3003

Definition

Observed categorical data values are not admissible according to the allowed categories.

Explanation

Many measurements are collected on categorical scales, such as educational level or responses on a numerical rating scale. The permissible values are known a priori. Values outside the predefined categories are not admissible and should be corrected or flagged.

Example

The smoking item “Do you smoke currently?”, administered by postal questionnaire in an adult sample, has the following two response options:

  • 0=no

  • 1=yes

However, the value “2” appears in 6 of 766 observations. A check against the available questionnaires, conducted to identify data entry errors, reveals that some values were transferred incorrectly: a “2” was entered instead of a “1”.
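The check described in this example can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation of the indicator: the function name `inadmissible_counts` is hypothetical, and the data are synthetic. Only the totals (766 observations, 6 of them coded “2”) come from the example; the split between 0 and 1 is invented.

```python
from collections import Counter

# Allowed categories for the smoking item: 0 = no, 1 = yes.
ALLOWED = {0, 1}

def inadmissible_counts(values, allowed):
    """Count every observed value that is not among the allowed categories."""
    return Counter(v for v in values if v not in allowed)

# Synthetic data (assumption): 766 observations, 6 of which carry the code 2.
observations = [0] * 500 + [1] * 260 + [2] * 6

counts = inadmissible_counts(observations, ALLOWED)
print(counts)  # Counter({2: 6})
```

The counter lists each inadmissible code separately, which helps to decide whether a single systematic entry error (here: “2” typed for “1”) is a plausible cause.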

Guidance

Any violation of an admissibility rule triggers a data cleaning process that aims to replace the wrong value with the correct one. If this is not possible, the affected observations should at least be flagged so that they can be handled adequately during analysis.
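One possible way to organise this correct-or-flag step is sketched below. It assumes that verified values, for instance from re-reading the paper questionnaires, are available as a lookup keyed by record ID. The function `clean_or_flag`, the record IDs, and the `verified` lookup are all hypothetical names introduced for illustration.

```python
def clean_or_flag(records, allowed, verified):
    """Return a list of (record_id, value, flagged) tuples.

    records  -- iterable of (record_id, value) pairs
    allowed  -- set of admissible category codes
    verified -- dict mapping record_id to a value checked against the source
    """
    result = []
    for rec_id, value in records:
        if value in allowed:
            # Admissible value: keep it unchanged and unflagged.
            result.append((rec_id, value, False))
        elif verified.get(rec_id) in allowed:
            # Inadmissible, but a verified admissible value exists: correct it.
            result.append((rec_id, verified[rec_id], False))
        else:
            # Inadmissible and not correctable: keep the value, set the flag.
            result.append((rec_id, value, True))
    return result

# Hypothetical records: 102 could be verified on paper, 103 could not.
records = [(101, 0), (102, 2), (103, 2)]
verified = {102: 1}

print(clean_or_flag(records, {0, 1}, verified))
# [(101, 0, False), (102, 1, False), (103, 2, True)]
```

Keeping the original value next to the flag, rather than overwriting it with a missing code, leaves the decision on how to treat the observation to the analysis stage.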

Erroneous values that fall within the admissible categories are not detected by this indicator.

Interpretation

The higher the number or percentage of inadmissible categorical values, the lower the data quality.
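The number and percentage can be reported per variable, as in the following sketch. It assumes the percentage is taken over all supplied observations of a variable; how missing values enter the denominator is not specified here and would have to be decided separately. The function `admissibility_summary` and the variable names are hypothetical, and the data are synthetic.

```python
def admissibility_summary(data, allowed_by_variable):
    """Return {variable: (n_inadmissible, percent_inadmissible)}."""
    summary = {}
    for name, values in data.items():
        allowed = allowed_by_variable[name]
        n_bad = sum(1 for v in values if v not in allowed)
        # Guard against empty variables to avoid division by zero.
        pct = 100.0 * n_bad / len(values) if values else 0.0
        summary[name] = (n_bad, pct)
    return summary

# Synthetic data (assumption): a smoking item with 6 stray codes among 766
# observations, and an education item with no inadmissible values.
data = {
    "smoking": [0] * 500 + [1] * 260 + [2] * 6,
    "education": [1, 2, 3, 2, 1],
}
allowed_by_variable = {"smoking": {0, 1}, "education": {1, 2, 3}}

summary = admissibility_summary(data, allowed_by_variable)
for name, (n_bad, pct) in summary.items():
    print(f"{name}: {n_bad} inadmissible ({pct:.2f}%)")
# smoking: 6 inadmissible (0.78%)
# education: 0 inadmissible (0.00%)
```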

Implementations

Literature

  • Nonnemacher M, Nasseh D, Stausberg J. Datenqualität in der medizinischen Forschung: Leitlinie zum Adaptiven Datenmanagement in Kohortenstudien und Registern. Berlin: TMF e.V.; 2014.

  • Stausberg J, Bauer U, Nasseh D, et al. Indicators of data quality: review and requirements from the perspective of networked medical research. MIBE 2019;15(1):1-8.