Both Uncertain numerical values and Inadmissible numerical values, as well as Uncertain time-date values and Inadmissible time-date values, can be calculated using con_limit_deviations(). When specifying limits = "SOFT_LIMITS", the check identifies uncertain rather than inadmissible values, according to the ranges specified in the metadata. An example call is:

# Load dataquieR
library(dataquieR)

# Load data
sd1 <- prep_get_data_frame("ship")

# Load metadata
file_name <- system.file("extdata", "ship_meta_v2.xlsx", package = "dataquieR")
prep_load_workbook_like_file(file_name)
meta_data_item <- prep_get_data_frame("item_level") # item_level is a sheet in ship_meta_v2.xlsx

# Apply indicator function
MyValueLimits <- con_limit_deviations(
  study_data = sd1,
  meta_data  = meta_data_item,
  label_col  = "LABEL",
  limits     = "HARD_LIMITS"
)
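
To make the distinction between the limit types concrete, the following base R sketch classifies a few made-up readings against two intervals. The helper classify_limits, the example values, and the limits themselves are assumptions for illustration only; they are not part of dataquieR and do not reproduce how con_limit_deviations is implemented.

```r
# Hypothetical helper (not part of dataquieR): classify each value as
# below, within, or above a closed interval [lower, upper].
classify_limits <- function(x, lower, upper) {
  section <- ifelse(x < lower, "below",
             ifelse(x > upper, "above", "within"))
  factor(section, levels = c("below", "within", "above"))
}

# Made-up diastolic blood pressure readings (mmHg); NA is a missing value
dbp <- c(35, 62, 80, 95, 130, NA, 210)

# Assumed limits, chosen for illustration only
hard <- classify_limits(dbp, lower = 0,  upper = 200)  # outside: inadmissible
soft <- classify_limits(dbp, lower = 40, upper = 120)  # outside: uncertain

# Counts per section; table() drops the missing value by default
hard_counts <- table(hard)
soft_counts <- table(soft)

print(hard_counts)
print(soft_counts)
```

In this sketch a value can lie within the hard limits yet outside the soft limits, which is why the two checks are reported separately.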

A table output provides the number and percentage of range violations for each variable with limits specified in the metadata:

MyValueLimits$SummaryData
Variables Section Limits Number Percentage
DBP_0.2 below HARD_LIMITS 0 0.00
DBP_0.2 within HARD_LIMITS 2148 100.00
DBP_0.2 above HARD_LIMITS 0 0.00
DBP_0.2 below DETECTION_LIMITS 0 0.00
DBP_0.2 within DETECTION_LIMITS 2148 100.00
DBP_0.2 above DETECTION_LIMITS 0 0.00
DBP_0.2 below SOFT_LIMITS 4 0.19
DBP_0.2 within SOFT_LIMITS 2134 99.35
DBP_0.2 above SOFT_LIMITS 10 0.47
BODY_HEIGHT_0 below HARD_LIMITS 0 0.00
BODY_HEIGHT_0 within HARD_LIMITS 2151 100.00
BODY_HEIGHT_0 above HARD_LIMITS 0 0.00
BODY_WEIGHT_0 below HARD_LIMITS 0 0.00
BODY_WEIGHT_0 within HARD_LIMITS 2150 100.00
BODY_WEIGHT_0 above HARD_LIMITS 0 0.00
WAIST_CIRC_0 below HARD_LIMITS 0 0.00
WAIST_CIRC_0 within HARD_LIMITS 2148 100.00
WAIST_CIRC_0 above HARD_LIMITS 0 0.00
EXAM_DT_0 below HARD_LIMITS 0 0.00
EXAM_DT_0 within HARD_LIMITS 2154 100.00
EXAM_DT_0 above HARD_LIMITS 0 0.00
CHOLES_HDL_0 below HARD_LIMITS 0 0.00
CHOLES_HDL_0 within HARD_LIMITS 2138 100.00
CHOLES_HDL_0 above HARD_LIMITS 0 0.00
CHOLES_LDL_0 below HARD_LIMITS 0 0.00
CHOLES_LDL_0 within HARD_LIMITS 2126 100.00
CHOLES_LDL_0 above HARD_LIMITS 0 0.00
CHOLES_ALL_0 below HARD_LIMITS 0 0.00
CHOLES_ALL_0 within HARD_LIMITS 2139 100.00
CHOLES_ALL_0 above HARD_LIMITS 0 0.00
AGE_0 below HARD_LIMITS 1 0.05
AGE_0 within HARD_LIMITS 2153 99.95
AGE_0 above HARD_LIMITS 0 0.00
SBP_0.1 below HARD_LIMITS 0 0.00
SBP_0.1 within HARD_LIMITS 2131 99.02
SBP_0.1 above HARD_LIMITS 21 0.98
SBP_0.1 below DETECTION_LIMITS 0 0.00
SBP_0.1 within DETECTION_LIMITS 2131 100.00
SBP_0.1 above DETECTION_LIMITS 0 0.00
SBP_0.1 below SOFT_LIMITS 4 0.19
SBP_0.1 within SOFT_LIMITS 2031 95.31
SBP_0.1 above SOFT_LIMITS 96 4.50
SBP_0.2 below HARD_LIMITS 0 0.00
SBP_0.2 within HARD_LIMITS 2134 99.35
SBP_0.2 above HARD_LIMITS 14 0.65
SBP_0.2 below DETECTION_LIMITS 0 0.00
SBP_0.2 within DETECTION_LIMITS 2134 100.00
SBP_0.2 above DETECTION_LIMITS 0 0.00
SBP_0.2 below SOFT_LIMITS 4 0.19
SBP_0.2 within SOFT_LIMITS 2071 97.05
SBP_0.2 above SOFT_LIMITS 59 2.76
DBP_0.1 below HARD_LIMITS 0 0.00
DBP_0.1 within HARD_LIMITS 2150 99.91
DBP_0.1 above HARD_LIMITS 2 0.09
DBP_0.1 below DETECTION_LIMITS 0 0.00
DBP_0.1 within DETECTION_LIMITS 2150 100.00
DBP_0.1 above DETECTION_LIMITS 0 0.00
DBP_0.1 below SOFT_LIMITS 2 0.09
DBP_0.1 within SOFT_LIMITS 2139 99.49
DBP_0.1 above SOFT_LIMITS 9 0.42


The last column of the table also provides a GRADING. If the percentage of violations exceeds a given threshold, a GRADING of 1 is assigned; otherwise, the GRADING is 0. In this case, any occurrence of a violation is classified as problematic.
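
The grading rule can be sketched in base R as follows. The counts are taken from the HARD_LIMITS rows of the table above; the data frame layout, the column names Below, Above and N, and the threshold of 0 percent are assumptions made for this illustration, not the package's internal representation.

```r
# Hard-limit counts for three variables, copied from the table above
summary_rows <- data.frame(
  Variables = c("AGE_0", "SBP_0.1", "BODY_HEIGHT_0"),
  Below     = c(1, 0, 0),
  Above     = c(0, 21, 0),
  N         = c(2154, 2152, 2151)   # below + within + above
)

# Assumed threshold: any violation at all counts as problematic
threshold_pct <- 0

# Percentage of values outside the limits (unrounded, used for grading)
pct <- 100 * (summary_rows$Below + summary_rows$Above) / summary_rows$N

summary_rows$Percentage <- round(pct, 2)
summary_rows$GRADING    <- as.integer(pct > threshold_pct)

print(summary_rows)
```

Grading on the unrounded percentage avoids a very small share of violations being rounded down to 0.00 and then missed.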

The following statement assigns all variables identified as problematic to an object, whichdeviate. This enables more targeted output, for example, plotting the distributions of only those variables that violate the specified limits:

# select variables with deviations
whichdeviate <- as.character(
  MyValueLimits$SummaryTable$Variables)[
    MyValueLimits$SummaryTable$FLG_con_rvv_unum == 1 |
    MyValueLimits$SummaryTable$FLG_con_rvv_utdat == 1 |
    MyValueLimits$SummaryTable$FLG_con_rvv_inum == 1 |
    MyValueLimits$SummaryTable$FLG_con_rvv_itdat == 1
  ]
whichdeviate <- whichdeviate[!is.na(whichdeviate)]

We can restrict the plots to those where variables have limit deviations, i.e., those with a GRADING of 1 in the table above, using MyValueLimits$SummaryPlotList[whichdeviate] (only the first two are displayed below to reduce file size):
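
The subsetting step can be tried without running the full check. The sketch below uses a plain named list as a stand-in for MyValueLimits$SummaryPlotList and a hand-written character vector in place of whichdeviate; both objects and their contents are hypothetical.

```r
# Stand-in for MyValueLimits$SummaryPlotList: a named list with one
# entry per variable (strings here instead of plot objects)
plot_list <- list(
  AGE_0         = "plot for AGE_0",
  BODY_HEIGHT_0 = "plot for BODY_HEIGHT_0",
  SBP_0.1       = "plot for SBP_0.1",
  SBP_0.2       = "plot for SBP_0.2"
)

# Stand-in for whichdeviate; DBP_0.1 deliberately has no entry in plot_list
whichdeviate_demo <- c("AGE_0", "SBP_0.1", "SBP_0.2", "DBP_0.1")

# Keep only names that exist in the list, so no NULL entries are created
available <- intersect(whichdeviate_demo, names(plot_list))

# Restrict to the deviating variables and keep the first two
first_two <- head(plot_list[available], 2)

print(names(first_two))
```

Subsetting a list by a name it does not contain yields a NULL element, so the intersect() call is a cheap safeguard when the flagged variables and the available plots may differ.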
