The acc_varcomp function examines the impact of so-called process variables on the measurement variables through variance-based models and intraclass correlations (ICC). This implementation is model-based. The function can be applied to variables of type float.
Note: The term ICC is more frequently used to describe the agreement between different observers, examiners or even devices. In such settings, good agreement is the goal. ICC values lie in the interval \([-1; \: 1]\), and an ICC close to \(1\) is desired (Koo and Li 2016, Müller and Büttner 1994).
In multi-level analysis the ICC is interpreted differently; see Snijders and Bosker (1999). In this context, the proportion of variance explained by the respective group levels indicates an influence of at least one level of the respective group_vars.
Irrespective of the terminology used, for data quality it is desirable that process variables do not systematically explain components of variance. Therefore, values close to \(0\) are desired.
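For reference, the ICC in this multi-level sense is commonly defined from a random-intercept model. The notation below is introduced here for illustration only and is not taken from the package documentation; whether acc_varcomp computes the ICC in exactly this form is an assumption. For measurement \(i\) in group level \(j\) with covariates \(\mathbf{x}_{ij}\):

\[
y_{ij} = \mathbf{x}_{ij}^\top \boldsymbol{\beta} + u_j + \varepsilon_{ij},
\qquad u_j \sim N(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)
\]

\[
\mathrm{ICC} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2}
\]

Under this definition, an ICC of \(0\) means that the group levels (for example, examiners) explain none of the remaining variance after adjustment.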
acc_varcomp is an implementation of the Unexpected location indicator, which belongs to the Unexpected distributions domain in the Accuracy dimension.
For more details, see the user’s manual and source code.
acc_varcomp(
  resp_vars = NULL,
  group_vars = NULL,
  co_vars = NULL,
  min_obs_in_subgroup = 30,
  min_subgroups = 5,
  label_col = NULL,
  threshold_value = 0.05,
  study_data = sd1,
  meta_data = md1
)
The function has the following arguments:
- … NULL for output without grouping.
- min_obs_in_subgroup: applies if group_vars is used. Specifies the minimum number of observations required to include a subgroup (level) of the group_vars in the analysis. Subgroups with fewer observations are excluded. The default is 30.
- min_subgroups: applies if group_vars is used. Specifies the minimum number of subgroups (levels) included in group_vars. If the variable defined in group_vars has fewer subgroups, it is not used for analysis. The default is 5.

To illustrate the output, we use the example synthetic data and metadata that are bundled with the dataquieR package. See the introductory tutorial for instructions on importing these files into R, as well as details on their structure and contents.
Similar to the approach of the acc_margins function, we assume that at least one examiner does not adhere to the SOP and may influence the measurement process:
v00000 | v00001 | v00002 | v00003 | v00004 | v00005 | v01003 | v01002 | v00103 | v00006 |
---|---|---|---|---|---|---|---|---|---|
3 | LEIIX715 | 0 | 49 | 127 | 77 | 49 | 0 | 40-49 | 3.8 |
1 | QHNKM456 | 0 | 47 | 114 | 76 | 47 | 0 | 40-49 | 1.9 |
1 | HTAOB589 | 0 | 50 | 114 | 71 | 50 | 0 | 50-59 | 0.8 |
5 | HNHFV585 | 0 | 48 | 120 | 65 | 48 | 0 | 40-49 | 3.8 |
1 | UTDLS949 | 0 | 56 | 119 | 78 | 56 | 0 | 50-59 | 4.1 |
5 | YQFGE692 | 1 | 47 | 133 | 81 | 47 | 1 | 40-49 | 9.5 |
1 | AVAEH932 | 0 | 53 | 114 | 78 | 53 | 0 | 50-59 | 5.0 |
3 | QDOPT378 | 1 | 48 | 116 | 86 | 48 | 1 | 40-49 | 9.6 |
3 | BMOAK786 | 0 | 44 | 115 | 71 | 44 | 0 | 40-49 | 2.0 |
5 | ZDKNF462 | 0 | 50 | 116 | 74 | 50 | 0 | 50-59 | 2.4 |
For the acc_varcomp function, the columns DATA_TYPE, MISSING_LIST and HARD_LIMITS in the metadata are relevant:
 | VAR_NAMES | LABEL | MISSING_LIST | DATA_TYPE | HARD_LIMITS |
---|---|---|---|---|---|
9 | v00004 | SBP_0 | 99980 \| 99981 \| 99982 \| 99983 \| 99984 \| 99985 \| 99986 \| 99987 \| 99988 \| 99989 \| 99990 \| 99991 \| 99992 \| 99993 \| 99994 \| 99995 | float | [80;180] |
10 | v00005 | DBP_0 | 99980 \| 99981 \| 99982 \| 99983 \| 99984 \| 99985 \| 99986 \| 99987 \| 99988 \| 99989 \| 99990 \| 99991 \| 99992 \| 99993 \| 99994 \| 99995 | float | [50;Inf) |
11 | v00006 | GLOBAL_HEALTH_VAS_0 | 99980 \| 99983 \| 99987 \| 99988 \| 99989 \| 99990 \| 99991 \| 99992 \| 99993 \| 99994 \| 99995 | float | [0;10] |
14 | v00009 | ARM_CIRC_0 | 99980 \| 99981 \| 99982 \| 99983 \| 99984 \| 99985 \| 99986 \| 99987 \| 99988 \| 99989 \| 99990 \| 99991 \| 99992 \| 99993 \| 99994 \| 99995 | float | [0;Inf) |
21 | v00014 | CRP_0 | 99980 \| 99981 \| 99982 \| 99983 \| 99984 \| 99985 \| 99986 \| 99988 \| 99989 \| 99990 \| 99991 \| 99992 \| 99994 \| 99995 | float | [0;Inf) |
22 | v00015 | BSG_0 | 99980 \| 99981 \| 99982 \| 99983 \| 99984 \| 99985 \| 99986 \| 99988 \| 99989 \| 99990 \| 99991 \| 99992 \| 99994 \| 99995 | float | [0;100] |
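To make the role of these metadata columns more concrete, the following base R sketch shows one way that missing codes and hard limits written in this notation could be applied to a numeric vector before modelling. The helpers parse_missing_list and apply_hard_limits are hypothetical and not part of dataquieR, the values are made up, and the package's internal handling may differ.

```r
# Hypothetical helpers for illustration; NOT part of dataquieR.

# Split a MISSING_LIST string such as "99980 | 99981" into numeric codes.
parse_missing_list <- function(missing_list) {
  as.numeric(trimws(strsplit(missing_list, "|", fixed = TRUE)[[1]]))
}

# Set values outside an interval such as "[80;180]" or "[0;Inf)" to NA.
# Square brackets mark inclusive bounds, round brackets exclusive ones.
apply_hard_limits <- function(x, limits) {
  n <- nchar(limits)
  lower_incl <- substr(limits, 1, 1) == "["
  upper_incl <- substr(limits, n, n) == "]"
  bounds <- as.numeric(trimws(
    strsplit(substr(limits, 2, n - 1), ";", fixed = TRUE)[[1]]
  ))
  below <- if (lower_incl) x < bounds[1] else x <= bounds[1]
  above <- if (upper_incl) x > bounds[2] else x >= bounds[2]
  x[which(below | above)] <- NA  # which() skips values that are already NA
  x
}

sbp <- c(127, 114, 99980, 250, 120)  # made-up values, one missing code, one outlier
codes <- parse_missing_list("99980 | 99981 | 99982")
sbp[sbp %in% codes] <- NA            # replace missing codes by NA
sbp_clean <- apply_hard_limits(sbp, "[80;180]")
print(sbp_clean)
```

Only the values that survive both steps would then enter a variance component model.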
Here, the function is applied to examine the agreement between observers (USR_BP_0) for the systolic and diastolic blood pressure variables (SBP_0 and DBP_0, respectively):
varcomp_1 <- acc_varcomp(resp_vars = c("SBP_0", "DBP_0"),
                         group_vars = c("USR_BP_0"),
                         co_vars = c("AGE_0", "SEX_0"),
                         label_col = "LABEL",
                         min_obs_in_subgroup = 20,
                         min_subgroups = 3,
                         study_data = sd1,
                         meta_data = md1)
## Did not find any 'SCALE_LEVEL' column in item-level meta_data. Predicting it from the data -- please verify these predictions, they may be wrong and lead to functions claiming not to be reasonably applicable to a variable.
## using the same group var "USR_BP_0" for all resp_vars
names(varcomp_1)
## [1] "SummaryTable" "SummaryData" "ScalarValue_max_icc"
## [4] "ScalarValue_argmax_icc"
Output: Summary table
The summary data frame is accessed using varcomp_1$SummaryTable:
Variables | Object | Model.Call | ICC_acc_ud_loc | Class.Number | Mean.Class.Size | Median.Class.Size | Min.Class.Size | Max.Class.Size | convergence.problem | GRADING |
---|---|---|---|---|---|---|---|---|---|---|
SBP_0 | USR_BP_0 | SBP_0 ~ AGE_0 + SEX_0 + (1 \| USR_BP_0) | 0.153 | 15 | 165.8 | 160 | 29 | 413 | FALSE | 1 |
DBP_0 | USR_BP_0 | DBP_0 ~ AGE_0 + SEX_0 + (1 \| USR_BP_0) | 0.172 | 15 | 165.0 | 162 | 28 | 413 | FALSE | 1 |
In addition to this table, two scalar values are returned: “ScalarValue_max_icc”, the highest ICC (proportion of variance explained by the group levels) across the response variables, and “ScalarValue_argmax_icc”, the response variable for which this highest ICC occurs.
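The relation between the ICC column, the threshold and the two scalar values can be sketched in base R using the two ICC values reported in the table above. The grading rule used here (GRADING is 1 when the ICC exceeds threshold_value) is an assumption that is consistent with the table but has not been checked against the package source.

```r
# ICC values copied from the summary table above.
summary_tab <- data.frame(
  Variables = c("SBP_0", "DBP_0"),
  ICC = c(0.153, 0.172),
  stringsAsFactors = FALSE
)
threshold_value <- 0.05  # default shown in the usage section

# Assumption: a variable is flagged (GRADING = 1) when its ICC exceeds the threshold.
summary_tab$GRADING <- as.integer(summary_tab$ICC > threshold_value)

# Analogues of the two scalar outputs: highest ICC and the variable it belongs to.
max_icc <- max(summary_tab$ICC)
argmax_icc <- summary_tab$Variables[which.max(summary_tab$ICC)]

print(summary_tab)
print(max_icc)
print(argmax_icc)
```

With the reported values, both variables exceed the threshold, and DBP_0 has the higher ICC.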
ICC or the analysis of variance components should be applied in combination with MARGINS. Extended tests showed that ICC is less susceptible to false-positive indications of data quality issues than margins.
- … resp_vars (if defined in the metadata).
- … resp_vars using co_vars and group_vars for adjustment.
- … group_vars indicating the ICC.

Sufficient numbers of observations within each level of the group_vars are required. The minimum can be set with the argument min_obs_in_subgroup. Nevertheless, the linear mixed-effects model may fail to converge when the numbers of observations are low or imbalanced across levels.
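The effect of the subgroup requirements can be sketched on simulated data. The example below is not the dataquieR implementation: it filters group levels in base R and, if the lme4 package happens to be installed, fits a random-intercept model and derives the ICC as the group variance divided by the total variance. All variable names and the simulated examiner effects are assumptions made for illustration.

```r
set.seed(1)

# Simulated data (not the dataquieR example data): six examiners with
# very unequal numbers of measurements.
n_per_group <- c(A = 120, B = 80, C = 60, D = 40, E = 12, F = 5)
sim <- data.frame(
  examiner = rep(names(n_per_group), times = n_per_group),
  stringsAsFactors = FALSE
)
examiner_effect <- rnorm(length(n_per_group), sd = 3)
names(examiner_effect) <- names(n_per_group)
sim$age <- round(runif(nrow(sim), 40, 70))
sim$sbp <- 100 + 0.4 * sim$age + examiner_effect[sim$examiner] +
  rnorm(nrow(sim), sd = 8)

min_obs_in_subgroup <- 30
min_subgroups <- 3

# Keep only levels with enough observations.
level_sizes <- table(sim$examiner)
kept_levels <- names(level_sizes)[level_sizes >= min_obs_in_subgroup]
sim_kept <- sim[sim$examiner %in% kept_levels, ]

# Use the grouping variable only if enough levels remain.
enough_levels <- length(kept_levels) >= min_subgroups

# Optional: fit a random-intercept model if lme4 is available.
if (enough_levels && requireNamespace("lme4", quietly = TRUE)) {
  fit <- lme4::lmer(sbp ~ age + (1 | examiner), data = sim_kept)
  vc <- as.data.frame(lme4::VarCorr(fit))
  # ICC: variance of the examiner intercepts over total variance.
  icc <- vc$vcov[vc$grp == "examiner"] / sum(vc$vcov)
  print(icc)
}
```

Here the two smallest levels (E and F) are dropped before modelling. With only a few remaining levels the estimate of the group variance is imprecise, which illustrates why both a minimum level size and a minimum number of levels are useful.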