To assess the consistency of different data elements, we apply the
function con_contradictions_redcap(). The rules to identify
contradictions must first be defined in the cross-item metadata. An
overview is given in the respective tutorial.
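As a rough illustration of what such rule definitions look like, the sketch below builds a small table in base R. The column names and the two rules are copied from the summary table shown later in this tutorial; the object name cross_item_sketch is hypothetical, and the real cross-item sheet may require further columns.

```r
# Hypothetical sketch of two contradiction rules, one per row.
# Column names and rule texts follow the summary table in this tutorial;
# this is not the complete cross-item metadata format.
cross_item_sketch <- data.frame(
  CHECK_ID = c(3L, 7L),
  CHECK_LABEL = c(
    "Body height lower than body weight",
    "Diabetes age inconsistency 2"
  ),
  CONTRADICTION_TERM = c(
    "[BODY_HEIGHT_0] < [BODY_WEIGHT_0]",
    '[DIAB_AGE_ONSET_0] > 0 and not([DIABETES_KNOWN_0] = "yes")'
  ),
  CONTRADICTION_TYPE = c("LOGICAL", "LOGICAL"),
  stringsAsFactors = FALSE
)

# Each row is one rule; the term references variables in square brackets.
print(cross_item_sketch[, c("CHECK_ID", "CONTRADICTION_TERM")])
```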
Each line within the spreadsheet defines one rule. Subsequently, the
contradictions assessment may be triggered using the table as the point
of reference:
# Load dataquieR
library(dataquieR)
# Load data
sd1 <- prep_get_data_frame("ship")
# Load metadata
file_name <- system.file("extdata", "ship_meta_v2.xlsx", package = "dataquieR")
prep_load_workbook_like_file(file_name)
meta_data_item <- prep_get_data_frame("item_level") # item_level is a sheet in ship_meta_v2.xlsx
meta_data_cross_item <- prep_get_data_frame("cross-item_level") # cross-item_level is another sheet in ship_meta_v2.xlsx
# Apply indicator functions
AnyContradictions <- con_contradictions_redcap(
  study_data = sd1,
  meta_data = meta_data_item,
  label_col = "LABEL",
  meta_data_cross_item = meta_data_cross_item,
  threshold_value = 1
)
A summary table shows the number and percentage of contradictions for each defined rule:
AnyContradictions$SummaryTable
VARIABLE_LIST | CHECK_LABEL | CONTRADICTION_TERM | CONTRADICTION_TYPE | MULTIVARIATE_OUTLIER_CHECKTYPE | N_RULES | ASSOCIATION_RANGE | ASSOCIATION_METRIC | ASSOCIATION_DIRECTION | ASSOCIATION_FORM | REL_VAL | GOLDSTANDARD | CHECK_ID | DATA_PREPARATION | NUM_con_con | PCT_con_con | GRADING |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
DBP_0.1 \| SBP_0.1 | Systolic blood pressure lower than dyastolic blood pressure, first measurement | [sbp1] < [dbp1] | LOGICAL | NA | NA | NA | NA | NA | NA | NA | NA | 1 | LABEL \| MISSING_NA \| LIMITS | 0 | 0.00 | 0 |
DBP_0.2 \| SBP_0.2 | Systolic blood pressure lower than dyastolic blood pressure, second measurement | [sbp2] < [dbp2] | LOGICAL | NA | NA | NA | NA | NA | NA | NA | NA | 2 | LABEL \| MISSING_NA \| LIMITS | 0 | 0.00 | 0 |
BODY_HEIGHT_0 \| BODY_WEIGHT_0 | Body height lower than body weight | [BODY_HEIGHT_0] < [BODY_WEIGHT_0] | LOGICAL | NA | NA | NA | NA | NA | NA | NA | NA | 3 | LABEL \| MISSING_NA \| LIMITS | 0 | 0.00 | 0 |
BODY_HEIGHT_0 \| WAIST_CIRC_0 | Body height lower than waist circumference | [BODY_HEIGHT_0] < [WAIST_CIRC_0] | LOGICAL | NA | NA | NA | NA | NA | NA | NA | NA | 4 | LABEL \| MISSING_NA \| LIMITS | 0 | 0.00 | 0 |
CONTRACEPTIVA_EVER_0 \| SEX_0 | Contraception inconsistency | [SEX_0] = "males" and [CONTRACEPTIVA_EVER_0] = "yes" | LOGICAL | NA | NA | NA | NA | NA | NA | NA | NA | 5 | LABEL \| MISSING_NA \| LIMITS | 12 | 0.56 | 0 |
DIAB_AGE_ONSET_0 \| DIABETES_KNOWN_0 | Diabetes age inconsistency 1 | [DIABETES_KNOWN_0] = "yes" and [DIAB_AGE_ONSET_0] = "" | EMPIRICAL | NA | NA | NA | NA | NA | NA | NA | NA | 6 | LABEL \| MISSING_NA \| LIMITS | 63 | 2.92 | 1 |
DIAB_AGE_ONSET_0 \| DIABETES_KNOWN_0 | Diabetes age inconsistency 2 | [DIAB_AGE_ONSET_0] > 0 and not([DIABETES_KNOWN_0] = "yes") | LOGICAL | NA | NA | NA | NA | NA | NA | NA | NA | 7 | LABEL \| MISSING_NA \| LIMITS | 35 | 1.62 | 1 |
In this example, rule seven identifies 35 contradictions (1.62%): an age of onset for diabetes is provided (DIAB_AGE_ONSET_0), but the variable on the presence of diabetes (DIABETES_KNOWN_0) does not indicate a known disease.
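To make the logic of rule seven concrete, the following base R sketch evaluates the same condition on a small toy data frame. The values are invented for illustration and are not taken from the SHIP example data. How missing values are treated here is an assumption of the sketch and may differ from what con_contradictions_redcap() does internally.

```r
# Toy data with hypothetical values (not from the SHIP example data)
toy <- data.frame(
  DIABETES_KNOWN_0 = c("yes", "no", "no", "yes", NA),
  DIAB_AGE_ONSET_0 = c(54, NA, 47, NA, 61),
  stringsAsFactors = FALSE
)

# Rule seven: an age of onset is given, but diabetes is not recorded as "yes".
# Assumption of this sketch: a missing DIABETES_KNOWN_0 counts as "not yes"
# (%in% returns FALSE for NA instead of propagating it).
onset_given <- !is.na(toy$DIAB_AGE_ONSET_0) & toy$DIAB_AGE_ONSET_0 > 0
diabetes_yes <- toy$DIABETES_KNOWN_0 %in% "yes"
contradiction <- onset_given & !diabetes_yes

# Count and percentage of flagged rows, relative to all rows of the toy data
n_contradictions <- sum(contradiction)
pct_contradictions <- round(100 * n_contradictions / nrow(toy), 2)

print(toy[contradiction, ])
```

Rows 3 and 5 of the toy data are flagged: both report an age of onset without a "yes" for known diabetes.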
The distributions may also be displayed as a plot:
AnyContradictions$SummaryPlot
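When many rules are defined, it can help to narrow the summary table to the rules that flagged at least one observation. The sketch below does this in base R on a stand-in data frame whose values are copied from the summary table above; the object name summary_sketch is hypothetical and only a few of the columns are reproduced.

```r
# Stand-in for AnyContradictions$SummaryTable, reduced to four columns.
# Values are copied from the summary table shown in this tutorial.
summary_sketch <- data.frame(
  CHECK_ID = 1:7,
  NUM_con_con = c(0, 0, 0, 0, 12, 63, 35),
  PCT_con_con = c(0, 0, 0, 0, 0.56, 2.92, 1.62),
  GRADING = c(0, 0, 0, 0, 0, 1, 1)
)

# Keep only rules with at least one contradiction, most frequent first
flagged <- subset(summary_sketch, NUM_con_con > 0)
flagged <- flagged[order(-flagged$NUM_con_con), ]

print(flagged)
```

Assuming the real summary table is a data frame with the same NUM_con_con column, the same subset() call should work on AnyContradictions$SummaryTable directly.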