To assess the consistency of different data elements, we apply the function con_contradictions_redcap(). The rules that identify contradictions must first be defined in the cross-item metadata, where each row of the spreadsheet defines one rule; an overview is given in the respective tutorial. The contradiction assessment is then run with this table as its reference:

# Load dataquieR
library(dataquieR)

# Load data
sd1 <- prep_get_data_frame("ship")

# Load metadata
file_name <- system.file("extdata", "ship_meta_v2.xlsx", package = "dataquieR")
prep_load_workbook_like_file(file_name)
meta_data_item <- prep_get_data_frame("item_level") # item_level is a sheet in ship_meta_v2.xlsx
meta_data_cross_item <- prep_get_data_frame("cross-item_level") # cross-item_level is another sheet in ship_meta_v2.xlsx

# Apply indicator functions
AnyContradictions <- con_contradictions_redcap(
  study_data = sd1,
  meta_data = meta_data_item,
  label_col = "LABEL",
  meta_data_cross_item = meta_data_cross_item,
  threshold_value = 1
)
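
For orientation, each row of the cross-item metadata passed above describes one rule. The following is a minimal sketch of what one such row holds, written as a hand-built data frame. The column names and values are copied from rule 7 as shown in this tutorial; the object names rule_sketch and rule_variables are hypothetical, and whether a table with only these columns would be accepted by con_contradictions_redcap() has not been verified here.

```r
# Hand-built illustration of a single rule row (not read from ship_meta_v2.xlsx).
# Values are copied from rule 7 as reported in this tutorial.
rule_sketch <- data.frame(
  CHECK_ID = 7,
  VARIABLE_LIST = "DIAB_AGE_ONSET_0 | DIABETES_KNOWN_0",
  CHECK_LABEL = "Diabetes age inconsistency 2",
  CONTRADICTION_TERM = '[DIAB_AGE_ONSET_0] > 0 and not([DIABETES_KNOWN_0] = "yes")',
  CONTRADICTION_TYPE = "LOGICAL",
  stringsAsFactors = FALSE
)

# Split the variable list on the " | " separator to see which items the rule uses
rule_variables <- strsplit(rule_sketch$VARIABLE_LIST, " | ", fixed = TRUE)[[1]]
print(rule_variables)
```

In practice the rules are maintained in the cross-item sheet of the metadata workbook rather than typed in code; the sketch only shows the shape of one entry.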

A summary table shows the number and percentage of contradictions for each defined rule:

AnyContradictions$SummaryTable
| VARIABLE_LIST | CHECK_LABEL | CONTRADICTION_TERM | CONTRADICTION_TYPE | CHECK_ID | DATA_PREPARATION | NUM_con_con | PCT_con_con | GRADING |
|---|---|---|---|---|---|---|---|---|
| DBP_0.1 \| SBP_0.1 | Systolic blood pressure lower than dyastolic blood pressure, first measurement | [sbp1] < [dbp1] | LOGICAL | 1 | LABEL \| MISSING_NA \| LIMITS | 0 | 0.00 | 0 |
| DBP_0.2 \| SBP_0.2 | Systolic blood pressure lower than dyastolic blood pressure, second measurement | [sbp2] < [dbp2] | LOGICAL | 2 | LABEL \| MISSING_NA \| LIMITS | 0 | 0.00 | 0 |
| BODY_HEIGHT_0 \| BODY_WEIGHT_0 | Body height lower than body weight | [BODY_HEIGHT_0] < [BODY_WEIGHT_0] | LOGICAL | 3 | LABEL \| MISSING_NA \| LIMITS | 0 | 0.00 | 0 |
| BODY_HEIGHT_0 \| WAIST_CIRC_0 | Body height lower than waist circumference | [BODY_HEIGHT_0] < [WAIST_CIRC_0] | LOGICAL | 4 | LABEL \| MISSING_NA \| LIMITS | 0 | 0.00 | 0 |
| CONTRACEPTIVA_EVER_0 \| SEX_0 | Contraception inconsistency | [SEX_0] = "males" and [CONTRACEPTIVA_EVER_0] = "yes" | LOGICAL | 5 | LABEL \| MISSING_NA \| LIMITS | 12 | 0.56 | 0 |
| DIAB_AGE_ONSET_0 \| DIABETES_KNOWN_0 | Diabetes age inconsistency 1 | [DIABETES_KNOWN_0] = "yes" and [DIAB_AGE_ONSET_0] = "" | EMPIRICAL | 6 | LABEL \| MISSING_NA \| LIMITS | 63 | 2.92 | 1 |
| DIAB_AGE_ONSET_0 \| DIABETES_KNOWN_0 | Diabetes age inconsistency 2 | [DIAB_AGE_ONSET_0] > 0 and not([DIABETES_KNOWN_0] = "yes") | LOGICAL | 7 | LABEL \| MISSING_NA \| LIMITS | 35 | 1.62 | 1 |

The columns MULTIVARIATE_OUTLIER_CHECKTYPE, N_RULES, ASSOCIATION_RANGE, ASSOCIATION_METRIC, ASSOCIATION_DIRECTION, ASSOCIATION_FORM, REL_VAL, and GOLDSTANDARD are NA for every rule and are not shown.


In this example, rule seven identifies 35 contradictions (1.62%): an age of onset for diabetes is provided (DIAB_AGE_ONSET_0), but the variable on the presence of diabetes (DIABETES_KNOWN_0) does not indicate a known disease.
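
To make the logic of rule seven concrete, the following base R sketch evaluates the same condition by hand on a small made-up data frame. The toy values and all object names are assumptions for illustration only. The sketch does not call dataquieR and does not reproduce its data preparation steps (label mapping, missing-code handling, limits), so it shows the idea of the rule and not how the package computes it.

```r
# Toy data, invented for illustration (not SHIP data)
toy <- data.frame(
  DIABETES_KNOWN_0 = c("yes", "no", "no", "yes", "no"),
  DIAB_AGE_ONSET_0 = c(45, NA, 52, NA, NA),
  stringsAsFactors = FALSE
)

# Rule 7: [DIAB_AGE_ONSET_0] > 0 and not([DIABETES_KNOWN_0] = "yes")
# A missing age of onset counts as "no age given" in this sketch.
onset_given <- !is.na(toy$DIAB_AGE_ONSET_0) & toy$DIAB_AGE_ONSET_0 > 0
diabetes_yes <- toy$DIABETES_KNOWN_0 == "yes"
contradiction <- onset_given & !diabetes_yes

# Count and percentage of rows that violate the rule
n_contradictions <- sum(contradiction)
pct_contradictions <- round(100 * n_contradictions / nrow(toy), 2)
print(c(n = n_contradictions, pct = pct_contradictions))
```

Only the third toy row has an age of onset without a known diabetes diagnosis, so it is the single row flagged by this hand-written check.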

The results can also be displayed as a plot:

AnyContradictions$SummaryPlot 
