Both Uncertain numerical values
and Inadmissible numerical values, as
well as Uncertain time-date values
and Inadmissible time-date values,
can be calculated using con_limit_deviations().
When specifying limits = "SOFT_LIMITS", the check identifies
uncertain rather than inadmissible values, according to the specified
ranges. An example call is:
# Load dataquieR
library(dataquieR)
# Load data
sd1 <- prep_get_data_frame("ship")
# Load metadata
prep_load_workbook_like_file("ship_meta_v2")
meta_data_item <- prep_get_data_frame("item_level") # item_level is a sheet in ship_meta_v2.xlsx
# Apply indicator function
MyValueLimits <- con_limit_deviations(
  study_data = sd1,
  meta_data = meta_data_item,
  label_col = "LABEL",
  limits = "HARD_LIMITS"
)
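If the aim is to flag uncertain rather than inadmissible values, the same call can be repeated with soft limits. The following sketch reuses the objects created above and assumes that SOFT_LIMITS are filled in for the relevant variables in the item-level metadata:
# Sketch: same study data and metadata, but evaluating SOFT_LIMITS,
# so that uncertain rather than inadmissible values are flagged
MySoftLimits <- con_limit_deviations(
  study_data = sd1,
  meta_data = meta_data_item,
  label_col = "LABEL",
  limits = "SOFT_LIMITS"
)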
The table output provides the number and percentage of range violations for all variables with limits specified in the metadata:
MyValueLimits$SummaryData
| | Variables | Limits | Below limits N (%) | Within limits N (%) | Above limits N (%) |
|---|---|---|---|---|---|
| 1 | DBP_0.2 | HARD_LIMITS | 0 (0) | 2148 (100) | 0 (0) |
| 4 | DBP_0.2 | DETECTION_LIMITS | 0 (0) | 2148 (100) | 0 (0) |
| 7 | DBP_0.2 | SOFT_LIMITS | 4 (0.19) | 2134 (99.35) | 10 (0.47) |
| 10 | BODY_HEIGHT_0 | HARD_LIMITS | 0 (0) | 2151 (100) | 0 (0) |
| 13 | BODY_WEIGHT_0 | HARD_LIMITS | 0 (0) | 2150 (100) | 0 (0) |
| 16 | WAIST_CIRC_0 | HARD_LIMITS | 0 (0) | 2148 (100) | 0 (0) |
| 19 | EXAM_DT_0 | HARD_LIMITS | 0 (0) | 2154 (100) | 0 (0) |
| 22 | CHOLES_HDL_0 | HARD_LIMITS | 0 (0) | 2138 (100) | 0 (0) |
| 25 | CHOLES_LDL_0 | HARD_LIMITS | 0 (0) | 2126 (100) | 0 (0) |
| 28 | CHOLES_ALL_0 | HARD_LIMITS | 0 (0) | 2139 (100) | 0 (0) |
| 31 | AGE_0 | HARD_LIMITS | 1 (0.05) | 2153 (99.95) | 0 (0) |
| 34 | SBP_0.1 | HARD_LIMITS | 0 (0) | 2131 (99.02) | 21 (0.98) |
| 37 | SBP_0.1 | DETECTION_LIMITS | 0 (0) | 2131 (100) | 0 (0) |
| 40 | SBP_0.1 | SOFT_LIMITS | 4 (0.19) | 2031 (95.31) | 96 (4.5) |
| 43 | SBP_0.2 | HARD_LIMITS | 0 (0) | 2134 (99.35) | 14 (0.65) |
| 46 | SBP_0.2 | DETECTION_LIMITS | 0 (0) | 2134 (100) | 0 (0) |
| 49 | SBP_0.2 | SOFT_LIMITS | 4 (0.19) | 2071 (97.05) | 59 (2.76) |
| 52 | DBP_0.1 | HARD_LIMITS | 0 (0) | 2150 (99.91) | 2 (0.09) |
| 55 | DBP_0.1 | DETECTION_LIMITS | 0 (0) | 2150 (100) | 0 (0) |
| 58 | DBP_0.1 | SOFT_LIMITS | 2 (0.09) | 2139 (99.49) | 9 (0.42) |
The last column of the table also provides a GRADING. If the
percentage of violations exceeds a threshold, a GRADING of 1 is
assigned; in this example, any occurrence is classified as problematic.
Otherwise, the GRADING is 0.
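As a quick check, the grading can be tabulated directly. This is a minimal sketch that assumes the GRADING column is accessible in MyValueLimits$SummaryData as described above:
# Count how many variable/limit combinations received GRADING 0 vs. 1
# (assumes a GRADING column in the summary data frame)
table(MyValueLimits$SummaryData$GRADING)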
The following statement assigns all variables identified as
problematic to an object whichdeviate, enabling more
targeted output, for example, plotting the distributions of only those
variables that violate the specified limits:
# Select variables with deviations
whichdeviate <- as.character(
  MyValueLimits$SummaryTable$Variables)[
    MyValueLimits$SummaryTable$FLG_con_rvv_unum == 1 |
    MyValueLimits$SummaryTable$FLG_con_rvv_utdat == 1 |
    MyValueLimits$SummaryTable$FLG_con_rvv_inum == 1 |
    MyValueLimits$SummaryTable$FLG_con_rvv_itdat == 1]

# Drop entries for which no flag could be computed
whichdeviate <- whichdeviate[!is.na(whichdeviate)]
We can restrict the plots to variables with limit
deviations, i.e., those with a GRADING of 1 in the table above, using
MyValueLimits$SummaryPlotList[whichdeviate]
(only the first two plots are displayed below to reduce file size).
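The corresponding call is shown next; the explicit subsetting to the first two entries with head() is an illustrative sketch that assumes whichdeviate contains at least two variables:
# Plots for all variables with limit deviations
MyValueLimits$SummaryPlotList[whichdeviate]

# Only the first two of these plots, as displayed here
MyValueLimits$SummaryPlotList[head(whichdeviate, 2)]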