Uncertain numerical values and Inadmissible numerical values, as
well as Uncertain time-date values and Inadmissible time-date values,
can all be calculated using con_limit_deviations().
When specifying limits = "SOFT_LIMITS", the check identifies uncertain
values rather than inadmissible ones, according to the specified
ranges. An example call is:
# Load dataquieR
library(dataquieR)
# Load data
sd1 <- prep_get_data_frame("ship")
# Load metadata
file_name <- system.file("extdata", "ship_meta_v2.xlsx", package = "dataquieR")
prep_load_workbook_like_file(file_name)
meta_data_item <- prep_get_data_frame("item_level") # item_level is a sheet in ship_meta_v2.xlsx
# Apply indicator function
MyValueLimits <- con_limit_deviations(
  study_data = sd1,
  meta_data = meta_data_item,
  label_col = "LABEL",
  limits = "HARD_LIMITS"
)
A table output provides the number and percentage of range violations for all variables that have limits specified in the metadata:
MyValueLimits$SummaryData
Variables | Section | Limits | Number | Percentage |
---|---|---|---|---|
DBP_0.2 | below | HARD_LIMITS | 0 | 0.00 |
DBP_0.2 | within | HARD_LIMITS | 2148 | 100.00 |
DBP_0.2 | above | HARD_LIMITS | 0 | 0.00 |
DBP_0.2 | below | DETECTION_LIMITS | 0 | 0.00 |
DBP_0.2 | within | DETECTION_LIMITS | 2148 | 100.00 |
DBP_0.2 | above | DETECTION_LIMITS | 0 | 0.00 |
DBP_0.2 | below | SOFT_LIMITS | 4 | 0.19 |
DBP_0.2 | within | SOFT_LIMITS | 2134 | 99.35 |
DBP_0.2 | above | SOFT_LIMITS | 10 | 0.47 |
BODY_HEIGHT_0 | below | HARD_LIMITS | 0 | 0.00 |
BODY_HEIGHT_0 | within | HARD_LIMITS | 2151 | 100.00 |
BODY_HEIGHT_0 | above | HARD_LIMITS | 0 | 0.00 |
BODY_WEIGHT_0 | below | HARD_LIMITS | 0 | 0.00 |
BODY_WEIGHT_0 | within | HARD_LIMITS | 2150 | 100.00 |
BODY_WEIGHT_0 | above | HARD_LIMITS | 0 | 0.00 |
WAIST_CIRC_0 | below | HARD_LIMITS | 0 | 0.00 |
WAIST_CIRC_0 | within | HARD_LIMITS | 2148 | 100.00 |
WAIST_CIRC_0 | above | HARD_LIMITS | 0 | 0.00 |
EXAM_DT_0 | below | HARD_LIMITS | 0 | 0.00 |
EXAM_DT_0 | within | HARD_LIMITS | 2154 | 100.00 |
EXAM_DT_0 | above | HARD_LIMITS | 0 | 0.00 |
CHOLES_HDL_0 | below | HARD_LIMITS | 0 | 0.00 |
CHOLES_HDL_0 | within | HARD_LIMITS | 2138 | 100.00 |
CHOLES_HDL_0 | above | HARD_LIMITS | 0 | 0.00 |
CHOLES_LDL_0 | below | HARD_LIMITS | 0 | 0.00 |
CHOLES_LDL_0 | within | HARD_LIMITS | 2126 | 100.00 |
CHOLES_LDL_0 | above | HARD_LIMITS | 0 | 0.00 |
CHOLES_ALL_0 | below | HARD_LIMITS | 0 | 0.00 |
CHOLES_ALL_0 | within | HARD_LIMITS | 2139 | 100.00 |
CHOLES_ALL_0 | above | HARD_LIMITS | 0 | 0.00 |
AGE_0 | below | HARD_LIMITS | 1 | 0.05 |
AGE_0 | within | HARD_LIMITS | 2153 | 99.95 |
AGE_0 | above | HARD_LIMITS | 0 | 0.00 |
SBP_0.1 | below | HARD_LIMITS | 0 | 0.00 |
SBP_0.1 | within | HARD_LIMITS | 2131 | 99.02 |
SBP_0.1 | above | HARD_LIMITS | 21 | 0.98 |
SBP_0.1 | below | DETECTION_LIMITS | 0 | 0.00 |
SBP_0.1 | within | DETECTION_LIMITS | 2131 | 100.00 |
SBP_0.1 | above | DETECTION_LIMITS | 0 | 0.00 |
SBP_0.1 | below | SOFT_LIMITS | 4 | 0.19 |
SBP_0.1 | within | SOFT_LIMITS | 2031 | 95.31 |
SBP_0.1 | above | SOFT_LIMITS | 96 | 4.50 |
SBP_0.2 | below | HARD_LIMITS | 0 | 0.00 |
SBP_0.2 | within | HARD_LIMITS | 2134 | 99.35 |
SBP_0.2 | above | HARD_LIMITS | 14 | 0.65 |
SBP_0.2 | below | DETECTION_LIMITS | 0 | 0.00 |
SBP_0.2 | within | DETECTION_LIMITS | 2134 | 100.00 |
SBP_0.2 | above | DETECTION_LIMITS | 0 | 0.00 |
SBP_0.2 | below | SOFT_LIMITS | 4 | 0.19 |
SBP_0.2 | within | SOFT_LIMITS | 2071 | 97.05 |
SBP_0.2 | above | SOFT_LIMITS | 59 | 2.76 |
DBP_0.1 | below | HARD_LIMITS | 0 | 0.00 |
DBP_0.1 | within | HARD_LIMITS | 2150 | 99.91 |
DBP_0.1 | above | HARD_LIMITS | 2 | 0.09 |
DBP_0.1 | below | DETECTION_LIMITS | 0 | 0.00 |
DBP_0.1 | within | DETECTION_LIMITS | 2150 | 100.00 |
DBP_0.1 | above | DETECTION_LIMITS | 0 | 0.00 |
DBP_0.1 | below | SOFT_LIMITS | 2 | 0.09 |
DBP_0.1 | within | SOFT_LIMITS | 2139 | 99.49 |
DBP_0.1 | above | SOFT_LIMITS | 9 | 0.42 |
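The way the Number and Percentage columns relate to each other can be illustrated with a minimal base R sketch. This is not the implementation used by con_limit_deviations; the values, the limits, and the object names (x, lower, upper, summary_sketch) are hypothetical, and the sketch assumes that percentages refer to non-missing observations, which is consistent with the differing totals per variable in the table above.

```r
# Toy measurements (hypothetical values, not taken from the ship data)
x <- c(55, 62, 70, 71, 80, 85, 90, 118, 130, NA)

# Hypothetical closed limit interval [lower, upper]
lower <- 60
upper <- 120

# Classify each non-missing value relative to the interval
observed <- x[!is.na(x)]
section <- ifelse(observed < lower, "below",
                  ifelse(observed > upper, "above", "within"))
section <- factor(section, levels = c("below", "within", "above"))

# Tabulate counts and percentages of the non-missing observations
counts <- table(section)
summary_sketch <- data.frame(
  Section    = names(counts),
  Number     = as.integer(counts),
  Percentage = round(100 * as.integer(counts) / length(observed), 2)
)
print(summary_sketch)
```

Running the same classification once with hard limits and once with soft limits would yield the separate HARD_LIMITS and SOFT_LIMITS rows for a variable.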
The last column of the table also provides a GRADING. If the
percentage of violations exceeds a threshold, a GRADING of 1 is
assigned; otherwise, the GRADING is 0. In this case, any occurrence
is classified as problematic.
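The grading rule can be sketched in base R as follows. The percentages are the hard-limit violation percentages of three variables from the table above; the threshold of 0 is an assumption derived from the statement that any occurrence is classified as problematic, and the object names are hypothetical rather than part of dataquieR.

```r
# Percentage of values outside the hard limits, taken from the table above
pct_violations <- c(AGE_0 = 0.05, SBP_0.1 = 0.98, BODY_HEIGHT_0 = 0.00)

# Assumed threshold: any violation at all counts as problematic
threshold <- 0

# GRADING is 1 if the percentage exceeds the threshold, otherwise 0
grading <- as.integer(pct_violations > threshold)
names(grading) <- names(pct_violations)
grading
```

A larger threshold would tolerate a small share of violations before a variable is graded as problematic.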
The following statement assigns all variables identified as
problematic to an object whichdeviate to enable a more targeted
output, for example, to plot the distributions of any variable with
violations together with the specified limits:
# select variables with deviations (any of the four flags equals 1)
whichdeviate <- as.character(
  MyValueLimits$SummaryTable$Variables)[
    MyValueLimits$SummaryTable$FLG_con_rvv_unum == 1 |
    MyValueLimits$SummaryTable$FLG_con_rvv_utdat == 1 |
    MyValueLimits$SummaryTable$FLG_con_rvv_inum == 1 |
    MyValueLimits$SummaryTable$FLG_con_rvv_itdat == 1 ]
# drop NA entries that result from missing flags
whichdeviate <- whichdeviate[!is.na(whichdeviate)]
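The final !is.na() step is needed because a flag that is NA makes the combined condition NA whenever no other flag is 1 (in R, NA | FALSE is NA), and indexing with NA returns an NA element. The following base R sketch shows an alternative that sidesteps this by counting flags per row. The data frame toy_summary is a made-up stand-in for MyValueLimits$SummaryTable, and only the column naming pattern is taken from the statement above.

```r
# Toy stand-in for MyValueLimits$SummaryTable (values are made up)
toy_summary <- data.frame(
  Variables         = c("AGE_0", "EXAM_DT_0", "SBP_0.1", "BODY_HEIGHT_0"),
  FLG_con_rvv_inum  = c(1, NA, 1, 0),
  FLG_con_rvv_itdat = c(NA, 0, NA, NA),
  stringsAsFactors  = FALSE
)

# Collect all flag columns by their common prefix
flag_cols <- grep("^FLG_con_rvv_", names(toy_summary), value = TRUE)

# A variable deviates if any of its flags equals 1; NA flags are ignored
has_deviation <- rowSums(toy_summary[flag_cols] == 1, na.rm = TRUE) > 0
toy_whichdeviate <- toy_summary$Variables[has_deviation]
toy_whichdeviate
```

Because na.rm = TRUE removes missing flags before summing, the logical index contains no NA and no cleanup step is required afterwards.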
We can restrict the plots to those variables that have limit
deviations, i.e., those with a GRADING of 1 in the table above, using
MyValueLimits$SummaryPlotList[whichdeviate]
(only the first two are displayed below to reduce file size):