The function `con_inadmissible_categorical` examines whether all observed levels of each categorical variable in the study data are admissible, i.e., listed in the value list defined for that variable in the metadata. It implements the Inadmissible categorical values indicator, which belongs to the Range and value violations domain of the Consistency dimension. For more details, see the user's manual and the source code.
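The core idea of the check can be illustrated in a few lines of base R. This is a minimal sketch, not dataquieR's implementation: the admissible codes and observed values are hypothetical and hard-coded, whereas the function derives the admissible codes from the metadata.

```r
# Hypothetical admissible codes, e.g. from VALUE_LABELS "0 = no | 1 = yes"
admissible <- c("0", "1")
# Hypothetical observed values of one categorical variable
observed <- c("0", "1", "1", "7", NA, "0")

# Levels that occur in the data (ignoring NA) but are not admissible
inadmissible_levels <- setdiff(unique(observed[!is.na(observed)]), admissible)
inadmissible_levels
## [1] "7"
```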
```r
con_inadmissible_categorical(
  resp_vars = NULL,
  study_data = sd1,
  meta_data = md1,
  label_col = NULL,
  threshold = NULL
)
```
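The `con_inadmissible_categorical` function has the following arguments:

- `resp_vars`: the names of the categorical variables to be examined; if `NULL`, all variables with `VALUE_LABELS` assigned in the metadata are examined.
- `study_data`: the study data (here `sd1`).
- `meta_data`: the metadata (here `md1`).
- `label_col`: the metadata column containing the variable labels, e.g. `"LABEL"`, which are then used to refer to the variables in the output.
- `threshold`: not used, since no threshold is implemented.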
To illustrate the output, we use the example synthetic data and metadata that are bundled with the dataquieR package. See the introductory tutorial for instructions on importing these files into R, as well as details on their structure and contents.
For the `con_inadmissible_categorical` function, the metadata columns `VALUE_LABELS`, `MISSING_LIST` and `JUMP_LIST` are particularly relevant.
`VALUE_LABELS` must be specified as pairs of code and label, separated by a vertical bar (`|`), for example:

- `0 = male | 1 = female`
- `A = good | B = moderate | C = bad`
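The `code = label | code = label` format can be read programmatically. The sketch below uses a hypothetical helper `parse_value_labels()` (not a dataquieR function) that splits the string at each `|` and each pair at its first `=`; the codes become the names of the returned vector.

```r
# Hypothetical helper (not part of dataquieR): turn a VALUE_LABELS string
# such as "0 = male | 1 = female" into a named vector c("0" = "male", ...)
parse_value_labels <- function(x) {
  pairs  <- trimws(strsplit(x, "|", fixed = TRUE)[[1]])  # "0 = male", "1 = female"
  pairs  <- pairs[nzchar(pairs)]                         # drop empty pieces
  codes  <- trimws(sub("=.*$", "", pairs))               # text before the first "="
  labels <- trimws(sub("^[^=]*=", "", pairs))            # text after the first "="
  stats::setNames(labels, codes)
}

parse_value_labels("0 = male | 1 = female")
parse_value_labels("A = good | B = moderate | C = bad ")
```

The names of the result (`"0"`, `"1"` or `"A"`, `"B"`, `"C"`) are the admissible codes against which observed values would be compared.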
The following table shows the metadata of the example data that are relevant for this function:
| | VAR_NAMES | LABEL | MISSING_LIST | JUMP_LIST | VALUE_LABELS |
|---|---|---|---|---|---|
| 5 | v00103 | AGE_GROUP_0 | NA | NA | NA |
| 12 | v00007 | ASTHMA_0 | 99980 \| 99988 \| 99989 \| 99991 \| 99993 \| 99994 \| 99995 | NA | 0 = no \| 1 = yes |
| 39 | v00030 | MEDICATION_0 | 99980 \| 99983 \| 99988 \| 99989 \| 99990 \| 99991 \| 99993 \| 99994 \| 99995 | NA | 0 = no \| 1 = yes |
| 36 | v00027 | N_BIRTH_0 | 99980 \| 99983 \| 99988 \| 99989 \| 99990 \| 99991 \| 99993 \| 99994 \| 99995 | 88880 | NA |
| 40 | v00031 | N_ATC_CODES_0 | 99980 \| 99983 \| 99988 \| 99989 \| 99990 \| 99991 \| 99993 \| 99994 \| 99995 | NA | NA |
| 43 | v40000 | PART_INTERVIEW | NA | NA | 0 = no \| 1 = yes |
| 31 | v00022 | EATING_PREFS_0 | 99980 \| 99983 \| 99988 \| 99989 \| 99990 \| 99991 \| 99993 \| 99994 \| 99995 | NA | 0 = none \| 1 = vegetarian \| 2 = vegan |
| 8 | v10000 | PART_STUDY | NA | NA | 0 = no \| 1 = yes |
| 20 | v20000 | PART_PHYS_EXAM | NA | NA | 0 = no \| 1 = yes |
| 10 | v00005 | DBP_0 | 99980 \| 99981 \| 99982 \| 99983 \| 99984 \| 99985 \| 99986 \| 99987 \| 99988 \| 99989 \| 99990 \| 99991 \| 99992 \| 99993 \| 99994 \| 99995 | NA | NA |
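`MISSING_LIST` and `JUMP_LIST` matter because their codes (e.g. 99980 or 88880) occur in the data alongside the regular categories. The sketch below assumes, without confirmation from the source, that such codes count as admissible and therefore are not reported as inadmissible categories. The helper `split_codes()` is hypothetical; the codes are those of `ASTHMA_0` in the metadata table, and the observed values are made up.

```r
# Hypothetical helper: split a "99980 | 99988" code list; NA means no codes
split_codes <- function(x) {
  if (is.na(x)) return(character(0))
  codes <- trimws(strsplit(x, "|", fixed = TRUE)[[1]])
  codes[nzchar(codes)]
}

# Metadata of ASTHMA_0 (VALUE_LABELS "0 = no | 1 = yes")
value_codes  <- c("0", "1")
missing_list <- "99980 | 99988 | 99989 | 99991 | 99993 | 99994 | 99995"
jump_list    <- NA_character_

# Assumption: value codes, missing codes and jump codes are all admissible
admissible <- c(value_codes, split_codes(missing_list), split_codes(jump_list))

observed   <- c("0", "1", "99980", "2", "1")   # hypothetical values
unexpected <- observed[!observed %in% admissible]
unexpected
## [1] "2"
```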
Since `resp_vars` is not specified in this example, all variables with assigned `VALUE_LABELS` are examined.
```r
IAVCatAll <- con_inadmissible_categorical(study_data = sd1,
                                          meta_data = md1,
                                          label_col = "LABEL")
names(IAVCatAll)
## [1] "SummaryData" "SummaryTable" "ModifiedStudyData"
## [4] "FlaggedStudyData"
```
Summary Table:
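The summary table lists all examined variables (data elements) together with the number (`NUM_con_rvv_icat`) and percentage (`PCT_con_rvv_icat`) of inadmissible values. Variables showing categories that are not specified in the metadata are flagged, i.e., `GRADING` is 1 and `FLG_con_rvv_icat` is `TRUE`.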
Variables | NUM_con_rvv_icat | PCT_con_rvv_icat | GRADING | FLG_con_rvv_icat |
---|---|---|---|---|
CENTER_0 | 0 | 0.0 | 0 | FALSE |
SEX_0 | 0 | 0.0 | 0 | FALSE |
SEX_1 | 0 | 0.0 | 0 | FALSE |
PART_STUDY | 0 | 0.0 | 0 | FALSE |
ASTHMA_0 | 0 | 0.0 | 0 | FALSE |
VO2_CAPCAT_0 | 0 | 0.0 | 0 | FALSE |
ARM_CIRC_DISC_0 | 0 | 0.0 | 0 | FALSE |
ARM_CUFF_0 | 0 | 0.0 | 0 | FALSE |
USR_VO2_0 | 0 | 0.0 | 0 | FALSE |
USR_BP_0 | 0 | 0.0 | 0 | FALSE |
PART_PHYS_EXAM | 0 | 0.0 | 0 | FALSE |
PART_LAB | 0 | 0.0 | 0 | FALSE |
EDUCATION_0 | 0 | 0.0 | 0 | FALSE |
EDUCATION_1 | 3 | 0.1 | 1 | TRUE |
FAM_STAT_0 | 2389 | 79.6 | 1 | TRUE |
MARRIED_0 | 0 | 0.0 | 0 | FALSE |
EATING_PREFS_0 | 0 | 0.0 | 0 | FALSE |
MEAT_CONS_0 | 0 | 0.0 | 0 | FALSE |
SMOKING_0 | 0 | 0.0 | 0 | FALSE |
SMOKE_SHOP_0 | 24 | 0.8 | 1 | TRUE |
INCOME_GROUP_0 | 0 | 0.0 | 0 | FALSE |
PREGNANT_0 | 0 | 0.0 | 0 | FALSE |
MEDICATION_0 | 349 | 11.6 | 1 | TRUE |
USR_SOCDEM_0 | 172 | 5.7 | 1 | TRUE |
PART_INTERVIEW | 0 | 0.0 | 0 | FALSE |
PART_QUESTIONNAIRE | 0 | 0.0 | 0 | FALSE |
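How the columns of the summary table relate to each other can be sketched as follows. This is an illustration under assumptions, not dataquieR's code: the percentage is assumed to be taken over all observations of a variable, and `GRADING` is set to 1 as soon as one inadmissible value occurs, which agrees with the rows shown. The function name `summarise_iav()` and the data are hypothetical.

```r
# Hypothetical helper: summary columns for one variable, given a logical
# vector that marks its inadmissible observations
summarise_iav <- function(is_iav, variable) {
  num <- sum(is_iav, na.rm = TRUE)
  # Assumption: percentage relative to all observations of the variable
  pct <- round(100 * num / length(is_iav), 1)
  data.frame(
    Variables        = variable,
    NUM_con_rvv_icat = num,
    PCT_con_rvv_icat = pct,
    GRADING          = as.integer(num > 0),  # 1 if any value is inadmissible
    FLG_con_rvv_icat = num > 0
  )
}

# Hypothetical data: 3 inadmissible values among 3000 observations
is_iav <- c(rep(TRUE, 3), rep(FALSE, 2997))
res <- summarise_iav(is_iav, "EXAMPLE_VAR")
res
```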
Modified data:
The modified study data (`ModifiedStudyData`) are identical to the study data, except that inadmissible values have been replaced by `NA`. For example, the three values of category “7” in EDUCATION_1 have been replaced by `NA`.
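The replacement step can be sketched in base R. This is a minimal illustration, not dataquieR's implementation; the helper `set_iav_to_na()`, the values, and the admissible codes are hypothetical, with 7 playing the role of an undefined category.

```r
# Hypothetical helper: replace inadmissible values by NA, keeping admissible
# codes and already missing values unchanged
set_iav_to_na <- function(x, admissible) {
  x[!is.na(x) & !x %in% admissible] <- NA
  x
}

values     <- c(1, 3, 7, 2, 7, NA, 7)  # hypothetical values; 7 is not defined
admissible <- c(0, 1, 2, 3, 4, 5, 6)   # hypothetical codes from VALUE_LABELS
modified   <- set_iav_to_na(values, admissible)
modified
## [1]  1  3 NA  2 NA NA NA
sum(is.na(modified)) - sum(is.na(values))  # number of replaced values
## [1] 3
```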
Flagged data:
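In the flagged study data (`FlaggedStudyData`), a separate column is added for each variable with inadmissible values. It is named after the variable with the suffix `_IAV` and flags the observations with inadmissible categories (1 = inadmissible, 0 = otherwise).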
EDUCATION_1_IAV | FAM_STAT_0_IAV | SMOKE_SHOP_0_IAV | MEDICATION_0_IAV | USR_SOCDEM_0_IAV |
---|---|---|---|---|
0 | 0 | 0 | 0 | 0 |
0 | 1 | 0 | 0 | 0 |
0 | 1 | 0 | 0 | 0 |
0 | 1 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 1 |
0 | 0 | 0 | 0 | 0 |
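Creating such flag columns can be sketched as below. This is an assumption-based illustration rather than dataquieR's code: the helper `add_iav_flags()`, the variables `VAR_A` and `VAR_B`, and their admissible codes are hypothetical. Following the text, a `_IAV` column is only added for variables that actually contain inadmissible values.

```r
# Hypothetical helper: add a 0/1 column "<variable>_IAV" for every variable
# that has at least one inadmissible value
add_iav_flags <- function(data, admissible_by_var) {
  for (v in names(admissible_by_var)) {
    is_iav <- !is.na(data[[v]]) & !data[[v]] %in% admissible_by_var[[v]]
    if (any(is_iav)) data[[paste0(v, "_IAV")]] <- as.integer(is_iav)
  }
  data
}

# Hypothetical study data; VAR_B contains the undefined category 9
study   <- data.frame(VAR_A = c(0, 1, 1, 0), VAR_B = c(1, 9, 2, NA))
flagged <- add_iav_flags(study, list(VAR_A = c(0, 1), VAR_B = c(0, 1, 2)))
flagged
```

Here only `VAR_B_IAV` is added, with a 1 in the second row.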
The higher the number of inadmissible values, the lower the data quality of the respective data element. Similarly, the more data elements contain inadmissible values, the lower the overall data quality.
Note: If the majority of data values appear inadmissible, as for FAM_STAT_0 (79.6 %) in this example, the metadata specification should be reviewed for correctness.
In this check, admissible categories are exactly those defined by the `VALUE_LABELS` as supplied in the metadata.
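Following the note on reviewing the metadata, candidates for such a review can be selected from the summary table. The sketch below takes "majority" to mean more than 50 % inadmissible values; the percentages are copied from three rows of the summary table, and the object names are hypothetical.

```r
# Three rows of the summary table shown earlier
summary_tab <- data.frame(
  Variables        = c("EDUCATION_1", "FAM_STAT_0", "MEDICATION_0"),
  PCT_con_rvv_icat = c(0.1, 79.6, 11.6),
  stringsAsFactors = FALSE
)

# Variables where most values are inadmissible: check their VALUE_LABELS
needs_review <- summary_tab$Variables[summary_tab$PCT_con_rvv_icat > 50]
needs_review
## [1] "FAM_STAT_0"
```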