The acc_end_digits function focuses on the last decimal, or end digit, of the response variables. Examining end digits may be relevant when data are transferred or edited manually, because a preference for certain digits (for example, through rounding) can occur.
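To make the notion of an end digit concrete, here is a minimal base R sketch. The helper `last_digit()` and the blood pressure readings are made up for illustration and are not part of dataquieR; the package's own extraction logic may differ.

```r
# Hypothetical helper: return the last recorded digit of each value,
# given the number of decimals the variable is recorded with.
last_digit <- function(x, decimals = 0) {
  # Shift the last recorded decimal into the units position, round to
  # guard against floating-point error, then keep the units digit.
  scaled <- round(abs(x) * 10^decimals)
  scaled %% 10
}

# Made-up systolic blood pressure readings recorded without decimals.
sbp <- c(120, 125, 130, 118, 140, 135, 120, 110, 122, 130)

# Count how often each end digit 0-9 occurs (unused digits get a count of 0).
end_digit_counts <- table(factor(last_digit(sbp), levels = 0:9))
end_digit_counts
```

In this toy sample, the digits 0 and 5 dominate, which is the kind of pattern a preference for rounded values would produce.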
The implementation of the acc_end_digits function is similar to the acc_shape_or_scale function and is adapted from the idea of rootograms (Tukey 1977, Kleiber and Zeileis 2016). However, the emphasis is on the last decimals of the measurement variables rather than their overall distribution. In this way, the acc_end_digits function is an implementation of the Unexpected shape indicator and a descriptor for Unexpected proportions, which belong to the Unexpected distributions domain in the Accuracy dimension.
For more details, see the user’s manual and source code.
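The underlying idea, comparing observed end-digit counts with the counts expected under a uniform distribution, can be sketched in base R as follows. The counts are made up, and the chi-squared goodness-of-fit test is an assumption chosen for illustration; it is not necessarily the test that dataquieR applies internally.

```r
# Made-up counts of the end digits 0-9 for 200 observations.
observed <- c(`0` = 48, `1` = 14, `2` = 17, `3` = 15, `4` = 16,
              `5` = 35, `6` = 13, `7` = 14, `8` = 15, `9` = 13)

# Under a uniform distribution, each digit is expected equally often.
expected <- rep(sum(observed) / 10, 10)

# Chi-squared goodness-of-fit test against equal probabilities.
test <- chisq.test(observed, p = rep(1 / 10, 10))

# Rootogram-style comparison: differences on the square-root scale,
# which puts large and small counts on a comparable footing.
root_diff <- sqrt(observed) - sqrt(expected)

round(root_diff, 2)
test$p.value
```

Positive values of `root_diff` mark digits that occur more often than expected, negative values mark digits that occur less often.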
acc_end_digits(
  resp_vars  = NULL,
  label_col  = LABEL,
  study_data = sd1,
  meta_data  = md1
)
The function has the following arguments:
There is no implementation of thresholds.
To illustrate the output, we use the example synthetic data and metadata that are bundled with the dataquieR package. See the introductory tutorial for instructions on importing these files into R, as well as details on their structure and contents.
For the acc_end_digits function, the metadata columns DATA_TYPE, MISSING_LIST, and DECIMALS (the number of decimals) are relevant:
| | VAR_NAMES | LABEL | MISSING_LIST | DATA_TYPE | DECIMALS |
|---|---|---|---|---|---|
| 9 | v00004 | SBP_0 | 99980 \| 99981 \| 99982 \| 99983 \| 99984 \| 99985 \| 99986 \| 99987 \| 99988 \| 99989 \| 99990 \| 99991 \| 99992 \| 99993 \| 99994 \| 99995 | float | 0 |
| 10 | v00005 | DBP_0 | 99980 \| 99981 \| 99982 \| 99983 \| 99984 \| 99985 \| 99986 \| 99987 \| 99988 \| 99989 \| 99990 \| 99991 \| 99992 \| 99993 \| 99994 \| 99995 | float | 0 |
| 11 | v00006 | GLOBAL_HEALTH_VAS_0 | 99980 \| 99983 \| 99987 \| 99988 \| 99989 \| 99990 \| 99991 \| 99992 \| 99993 \| 99994 \| 99995 | float | 1 |
| 14 | v00009 | ARM_CIRC_0 | 99980 \| 99981 \| 99982 \| 99983 \| 99984 \| 99985 \| 99986 \| 99987 \| 99988 \| 99989 \| 99990 \| 99991 \| 99992 \| 99993 \| 99994 \| 99995 | float | 0 |
| 21 | v00014 | CRP_0 | 99980 \| 99981 \| 99982 \| 99983 \| 99984 \| 99985 \| 99986 \| 99988 \| 99989 \| 99990 \| 99991 \| 99992 \| 99994 \| 99995 | float | 3 |
| 22 | v00015 | BSG_0 | 99980 \| 99981 \| 99982 \| 99983 \| 99984 \| 99985 \| 99986 \| 99988 \| 99989 \| 99990 \| 99991 \| 99992 \| 99994 \| 99995 | float | 0 |
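Because MISSING_LIST stores the missing codes as a single string separated by `|`, the codes have to be split before they can be excluded from an end-digit analysis. The following base R sketch shows one way to do this by hand; `strip_missing_codes()` and the example values are hypothetical, and dataquieR handles missing codes internally in its own way.

```r
# Hypothetical helper: replace values listed in a MISSING_LIST string with NA.
strip_missing_codes <- function(x, missing_list) {
  # Split on the literal "|" separator and convert the codes to numbers.
  codes <- as.numeric(trimws(strsplit(missing_list, "|", fixed = TRUE)[[1]]))
  x[x %in% codes] <- NA
  x
}

crp_missing_list <- "99980 | 99981 | 99982"  # shortened, illustrative list
crp <- c(0.512, 2.340, 99980, 1.105, 99982)  # made-up measurements

crp_clean <- strip_missing_codes(crp, crp_missing_list)
crp_clean
```

Without this step, the missing codes would enter the end-digit counts as if they were measurements.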
This example specifies the analysis of end digits for the variable CRP_0 (C-reactive protein):
end_digits <- acc_end_digits(
  resp_vars  = "CRP_0",
  label_col  = LABEL,
  study_data = sd1,
  meta_data  = md1
)
The output is a list containing SummaryTable and SummaryPlot. SummaryTable is a table that lists the response variable and indicates whether its end digits follow a uniform distribution (GRADING = 0) or deviate from it (GRADING = 1). This table is needed by the generic function dataquieR::dq_report() to summarize all information for the examined variables.
Run end_digits$SummaryData to see the output:
The second output, SummaryPlot, is a bar chart that indicates significant deviations from the uniform distribution. Call it with end_digits$SummaryPlot:
Any deviation from the distribution specified in the metadata is indicated in red.
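A rough impression of such a chart can be produced with base R graphics. This is only a sketch under stated assumptions: the counts are made up, and highlighting bars whose Pearson residual exceeds 2 in absolute value is an arbitrary illustrative rule, not the criterion used by dataquieR.

```r
# Made-up counts of the end digits 0-9 for 200 observations.
observed <- c(48, 14, 17, 15, 16, 35, 13, 14, 15, 13)
names(observed) <- 0:9
expected <- sum(observed) / 10  # expected count per digit under uniformity

# Pearson residuals; |residual| > 2 is an arbitrary illustrative cut-off.
pearson <- (observed - expected) / sqrt(expected)
bar_col <- ifelse(abs(pearson) > 2, "red", "grey70")

# Bar chart of observed counts with the uniform expectation as a dashed line.
barplot(observed, col = bar_col, xlab = "End digit", ylab = "Count")
abline(h = expected, lty = 2)
```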
… resp_vars (if these are defined in the metadata).
… acc_shape_or_scale.
… resp_vars.
Deviations from a uniform distribution of end digits are only informative if the response variable has a symmetric distribution. If the underlying measurement has a skewed distribution, the end digits will not follow a uniform distribution.
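This limitation can be illustrated with a small simulation in base R. The two distributions, their parameters, and the helper `end_digit_p()` are assumptions chosen for illustration; the chi-squared test stands in for whatever test the package uses.

```r
set.seed(1)

# Hypothetical helper: p-value of a chi-squared test for uniform end digits
# of values rounded to whole numbers.
end_digit_p <- function(x) {
  digits <- factor(round(abs(x)) %% 10, levels = 0:9)
  chisq.test(table(digits))$p.value
}

# A right-skewed variable whose values cluster near zero (mean 3) ...
skewed <- rexp(1000, rate = 1 / 3)
# ... and a symmetric variable spread over many tens (mean 120, SD 15).
symmetric <- rnorm(1000, mean = 120, sd = 15)

p_skewed <- end_digit_p(skewed)
p_symmetric <- end_digit_p(symmetric)

c(skewed = p_skewed, symmetric = p_symmetric)
```

In the skewed sample, small end digits occur far more often than large ones even though no digit preference was simulated, so a flagged deviation there would not point to a data quality problem.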