Introduction

Cross-item level metadata contains descriptions and expectations about the joint use of two or more data elements for data quality assessments. A separate table is necessary because checks and data elements stand in a 1:n relationship: any single data element may be involved in several checks.


Cross-item level metadata for data quality reporting


VARIABLE_LIST

Defines the list of variables to be assessed for a defined check. The list of variables must be a string in which each variable is separated by a pipe character (|).
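
As an illustration, a pipe-separated VARIABLE_LIST string could be split into individual variable names as in the Python sketch below. The function name parse_variable_list is hypothetical and not part of dataquieR, and treating the whitespace around each name as insignificant is an assumption based on the examples in this section.

```python
def parse_variable_list(variable_list: str) -> list[str]:
    """Split a pipe-separated VARIABLE_LIST string into variable names."""
    # Assumption: whitespace around each name is not significant.
    names = [name.strip() for name in variable_list.split("|")]
    # Drop empty entries, e.g. those produced by a trailing pipe.
    return [name for name in names if name]


print(parse_variable_list("v00004 | v00005"))  # ['v00004', 'v00005']
```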


CHECK_LABEL

Specifies a short, plain-language description of the check. The label is used to improve the readability of the report. For example, the label for a check may be as follows.

     VARIABLE_LIST      CHECK_LABEL
12   v00004 | v00005    Blood pressure checks


CONTRADICTION_TERM

Defines the rule that identifies a contradiction. The rule must be written as a readable logical expression in REDCap format. See the contradictions dataquieR function for an explanation of how contradictions are defined.

For instance, we may define contradiction checks for the age, sex and smoking variables as below.

    CHECK_LABEL              CONTRADICTION_TERM
1   Age follow-up            [AGE_1] < [AGE_0]
2   Sex follow-up            [SEX_1] <> [SEX_0]
8   Smokers inconsistency    ([SMOKING_0] = "yes") and ([SMOKE_SHOP_0] = "never")
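
To make the meaning of such terms concrete, the following Python sketch evaluates simple terms of this kind against a single record. It is not how dataquieR parses rules: the helper names term_to_python and is_contradiction are hypothetical, only bracketed variable names, comparison operators, and/or, and straight-quoted strings are handled, and the record values are invented for illustration.

```python
import re


def term_to_python(term: str) -> str:
    """Translate a simple REDCap-style term into a Python expression."""
    # [NAME] becomes a lookup in the record dictionary.
    expr = re.sub(r"\[(\w+)\]", r'rec["\1"]', term)
    # REDCap-style "not equal" is written as <>.
    expr = expr.replace("<>", "!=")
    # A lone = is an equality test; <=, >=, != and == stay untouched.
    expr = re.sub(r"(?<![<>!=])=(?!=)", "==", expr)
    return expr


def is_contradiction(term: str, rec: dict) -> bool:
    """Return True if the record fulfils the contradiction term."""
    # eval is tolerable only because this is a sketch with trusted input.
    return bool(eval(term_to_python(term), {"__builtins__": {}}, {"rec": rec}))


# Invented example record, not taken from any study data.
record = {"AGE_0": 54, "AGE_1": 52, "SMOKING_0": "yes", "SMOKE_SHOP_0": "never"}
print(is_contradiction("[AGE_1] < [AGE_0]", record))  # True
print(is_contradiction('([SMOKING_0] = "yes") and ([SMOKE_SHOP_0] = "never")', record))  # True
```

A production implementation would use a proper parser instead of text substitution, since a string value containing = or <> would be rewritten by this sketch.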


CONTRADICTION_TYPE

Establishes whether the contradiction is logical or empirical.

For example, for the contradictions defined above, we may specify the following contradiction types.

    CHECK_LABEL              CONTRADICTION_TERM                                      CONTRADICTION_TYPE
1   Age follow-up            [AGE_1] < [AGE_0]                                       LOGICAL
2   Sex follow-up            [SEX_1] <> [SEX_0]                                      LOGICAL
8   Smokers inconsistency    ([SMOKING_0] = "yes") and ([SMOKE_SHOP_0] = "never")    EMPIRICAL


MULTIVARIATE_OUTLIER_CHECKTYPE

Sets the type of check for multivariate assessments of outliers.

     VARIABLE_LIST      CHECK_LABEL              MULTIVARIATE_OUTLIER_CHECKTYPE
12   v00004 | v00005    Blood pressure checks    Hubert


N_RULES

Specifies the number of rules that must be violated for an observation to be flagged as an outlier. It applies to all potential assessment rules for multivariate outliers.

     CHECK_LABEL              MULTIVARIATE_OUTLIER_CHECKTYPE    N_RULES
12   Blood pressure checks    Hubert                            1
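
The effect of N_RULES can be sketched in Python as follows. The function flag_outliers and the violation matrix are hypothetical, and the sketch assumes that an observation is flagged once it violates at least N_RULES of the rules applied to it.

```python
def flag_outliers(rule_violations: list[list[bool]], n_rules: int) -> list[bool]:
    """Flag observations that violate at least n_rules outlier rules."""
    # Each inner list holds one True/False entry per outlier rule.
    return [sum(violations) >= n_rules for violations in rule_violations]


# Three observations checked against four hypothetical rules.
violations = [
    [False, False, False, False],
    [True, False, False, False],
    [True, True, False, True],
]
print(flag_outliers(violations, n_rules=1))  # [False, True, True]
print(flag_outliers(violations, n_rules=2))  # [False, False, True]
```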


ASSOCIATION_RANGE

Specifies the allowable range of an association. The inclusion of the endpoints follows standard mathematical notation using round brackets for open intervals and square brackets for closed intervals. Values must be separated by a semicolon.

The metadata excerpt below shows an example of a possible interval for an association.

     CHECK_LABEL              ASSOCIATION_RANGE
12   Blood pressure checks    (0.7;)
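
The interval notation can be read mechanically, as the Python sketch below shows. The names Interval and parse_range are hypothetical and not part of dataquieR. Treating an omitted endpoint, as in (0.7;), as unbounded on that side is an assumption of this sketch.

```python
import math
import re
from dataclasses import dataclass


@dataclass
class Interval:
    lower: float
    upper: float
    lower_closed: bool
    upper_closed: bool

    def contains(self, value: float) -> bool:
        """Check a value against both endpoints, honouring open/closed ends."""
        above = value >= self.lower if self.lower_closed else value > self.lower
        below = value <= self.upper if self.upper_closed else value < self.upper
        return above and below


def parse_range(text: str) -> Interval:
    """Parse strings such as "(0.7;)" or "[0;1]" into an Interval."""
    match = re.fullmatch(r"\s*([\[(])\s*([^;]*?)\s*;\s*([^;]*?)\s*([\])])\s*", text)
    if match is None:
        raise ValueError(f"not a valid interval: {text!r}")
    left, low, high, right = match.groups()
    # Assumption: an omitted endpoint means "unbounded on that side".
    lower = float(low) if low else -math.inf
    upper = float(high) if high else math.inf
    # Square brackets mark closed ends, round brackets open ends.
    return Interval(lower, upper, left == "[", right == "]")


allowed = parse_range("(0.7;)")
print(allowed.contains(0.85))  # True
print(allowed.contains(0.7))   # False, because the lower end is open
```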


ASSOCIATION_METRIC

Specifies the metric underlying the association in ASSOCIATION_RANGE. The input is a string that specifies the analysis algorithm to be used.

For instance, in the example below, the Pearson correlation is specified as the association metric.

     CHECK_LABEL              ASSOCIATION_RANGE    ASSOCIATION_METRIC
12   Blood pressure checks    (0.7;)               Pearson


ASSOCIATION_DIRECTION

Specifies the allowable direction of an association. The input is a string that can be either “positive” or “negative”.

In the following example, a positive association is expected for the blood pressure variables.

     CHECK_LABEL              ASSOCIATION_METRIC    ASSOCIATION_DIRECTION
12   Blood pressure checks    Pearson               positive


ASSOCIATION_FORM

Specifies the allowable form of an association. The input is a string chosen from a predefined list of forms.

In the metadata excerpt below, a linear form of association is expected for the blood pressure checks.

     CHECK_LABEL              ASSOCIATION_METRIC    ASSOCIATION_FORM
12   Blood pressure checks    Pearson               linear


REL_VAL

Specifies the type of reliability or validity analysis. The string specifies the analysis algorithm to be used, and can be either “inter-class” or “intra-class”.

In the following example, an inter-class reliability is defined for the blood pressure variables.

     CHECK_LABEL              REL_VAL
12   Blood pressure checks    inter_class


GOLDSTANDARD

Defines the measurement variable to be used as a known gold standard. Only one variable can be defined as the gold standard.


DATA_PREPARATION

Defines the pre-processing steps that are applied before the contradiction rules are checked. The following options can be specified:

  • LABEL: the value levels will be replaced by the value labels;

  • MISSING_NA: missing codes will be replaced by NAs;

  • MISSING_LABEL: missing codes will be replaced by their labels;

  • MISSING_INTERPRET: missing codes will be replaced by the corresponding AAPOR codes;

  • LIMITS: hard limits violations will be replaced by NAs.

More than one option can be specified by separating them with the pipe symbol (|), e.g. LABEL | LIMITS. However, the three MISSING_ options are mutually exclusive.

If the column is not present in the metadata or it is empty, the default is to use the three options: LABEL (if VALUE_LABELS are specified in the item level metadata), MISSING_NA, and LIMITS.

The user can disable all options by writing only the pipe symbol | in this column.
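
The rules for this column can be summarised in a short Python sketch. The function parse_data_preparation and its has_value_labels parameter are hypothetical and not part of dataquieR. The sketch covers only the behaviour described above: the defaults for an absent or empty column, the mutual exclusivity of the MISSING_ options, and a lone pipe symbol disabling all options.

```python
ALLOWED = {"LABEL", "MISSING_NA", "MISSING_LABEL", "MISSING_INTERPRET", "LIMITS"}


def parse_data_preparation(cell, has_value_labels=True):
    """Return the set of pre-processing options for one metadata cell."""
    # Column absent (None) or empty: fall back to the documented defaults.
    if cell is None or cell.strip() == "":
        defaults = {"MISSING_NA", "LIMITS"}
        if has_value_labels:
            # LABEL is a default only if VALUE_LABELS are specified.
            defaults.add("LABEL")
        return defaults
    # A lone "|" yields only empty parts, i.e. no options at all.
    options = {part.strip() for part in cell.split("|") if part.strip()}
    unknown = options - ALLOWED
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    if sum(opt.startswith("MISSING_") for opt in options) > 1:
        raise ValueError("the MISSING_ options are mutually exclusive")
    return options


print(sorted(parse_data_preparation("LABEL | LIMITS")))  # ['LABEL', 'LIMITS']
print(sorted(parse_data_preparation("")))  # ['LABEL', 'LIMITS', 'MISSING_NA']
print(parse_data_preparation("|"))  # set()
```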