Missing codes and jump codes are values that record why a data value is missing. NAs in the data are better replaced with value codes that indicate either a planned jump (a value skipped by design) or the reason the value is missing.
To improve the coding of missing values and replace NAs in study data and metadata with an actual jump or missing code, you can use the dataquieR function `prep_add_missing_codes`. This function requires a rule that specifies which NAs are replaced with which value code.
Consider an example with two variables. Data and metadata can be amended with a rule stating that whenever the variable `v_attend` is "no", `v_no_yrs` should contain a jump code (for example 9999) instead of NA. This rule can be used to modify the study data (replacing NAs with 9999) and also to amend the metadata by adding the new jump code to the item_level column `JUMP_LIST`.
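The logic of such a rule can be sketched in base R, independently of dataquieR. This is a minimal illustration on a hypothetical four-row data frame (`toy` and its values are invented for the example). It reproduces only the replacement of NAs in the study data, not the update of `JUMP_LIST`, and it is not how `prep_add_missing_codes` works internally.

```r
# Hypothetical toy data: v_no_yrs is only asked if v_attend is "yes"
toy <- data.frame(
  v_attend = c("yes", "no", "no", "yes"),
  v_no_yrs = c(3, NA, 2, NA)
)

# Rule: whenever v_attend is "no", an NA in v_no_yrs is a planned jump
is_jump <- toy$v_attend == "no" & is.na(toy$v_no_yrs)

# Replace only those NAs with the jump code 9999
toy$v_no_yrs[is_jump] <- 9999

toy$v_no_yrs
```

Row 3 keeps its value 2 because only NAs are replaced, and row 4 stays NA because `v_attend` is not "no" there.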
In this example we use the SHIP-based example data to modify study data and metadata using rules and the function `prep_add_missing_codes`.
We want to add a jump code 88888 to replace NAs in the variable `sbp1` for female participants (`sex` is `females`), and a missing code 99999 to replace NAs in the variable `waist` for participants whose `age` is greater than 60 (note that only NAs will be replaced).
To start, import the SHIP-based example data and modify it:

```r
library(dataquieR)
ship1 <- prep_get_data_frame("ship")         # import study data
ship1 <- head(ship1, n = 10)                 # keep only the first 10 rows
ship1 <- ship1[, c(1:4, 11:12, 17:18, 22)]   # select a subset of columns
ship1$sbp1[1:4] <- NA                        # introduce NAs for the purpose of this example
ship1$waist <- NA                            # introduce NAs for the purpose of this example
```
Here is the resulting study data `ship1`:

```r
ship1
```
id | exdate | age | sex | sbp1 | sbp2 | waist | cholesterol | family |
---|---|---|---|---|---|---|---|---|
3861 | 1998-09-22 | 65 | 1 | NA | 176 | NA | 5.169 | 1 |
6506 | 1998-01-21 | 70 | 1 | NA | 138 | NA | 5.751 | 1 |
6096 | 1999-04-07 | 43 | 2 | NA | 114 | NA | 5.612 | 3 |
6674 | 2000-10-06 | 55 | 2 | NA | 129 | NA | 5.214 | 1 |
6490 | 1998-11-17 | 69 | 2 | 145 | 145 | NA | 5.504 | 3 |
5366 | 1997-11-27 | 65 | 1 | 194 | 179 | NA | 6.558 | 1 |
5735 | 1999-09-01 | 40 | 2 | 154 | 148 | NA | 5.159 | 5 |
4031 | 1999-08-12 | 51 | 2 | 187 | 177 | NA | 5.953 | 1 |
3578 | 2000-02-26 | 25 | 1 | 122 | 123 | NA | 3.188 | 3 |
4807 | 2000-07-13 | 80 | 2 | 184 | 170 | NA | 6.596 | 1 |
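As a cross-check, the rows each rule should affect can be predicted from the columns in the table above. This base R sketch copies `age`, `sex` and `sbp1` from that table. It assumes that sex code 2 carries the value label `females` in the SHIP metadata; that mapping is an assumption of the sketch, not read from the metadata.

```r
# Columns copied from the ship1 table above
age  <- c(65, 70, 43, 55, 69, 65, 40, 51, 25, 80)
sex  <- c(1, 1, 2, 2, 2, 1, 2, 2, 1, 2)
sbp1 <- c(NA, NA, NA, NA, 145, 194, 154, 187, 122, 184)

# Assumption: sex code 2 has the value label "females"
is_female <- sex == 2

# Rule 1 (JUMP): sbp1 is NA and the participant is female
jump_rows <- which(is_female & is.na(sbp1))

# Rule 2 (MISSING): age > 60 (waist is NA in every row here)
missing_rows <- which(age > 60)

jump_rows
missing_rows
```

With only NAs replaced, rule 1 should affect rows 3 and 4, and rule 2 should affect rows 1, 2, 5, 6 and 10.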
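Next, write the rule (or, in this case, rules) used to add the jump or missing codes. The rules have to be stored in a table with the following columns:

- `resp_vars`: the variable to be modified
- `CODE_VALUE`: the code that replaces the NAs
- `RULE`: the condition under which the code is inserted
- `CODE_CLASS`: either `JUMP` or `MISSING`
- `CODE_LABEL`: a label describing the code

Here are the rules used to amend the two variables as described above: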
```r
rule_example1 <- data.frame(resp_vars = c("sbp1", "waist"),
                            CODE_VALUE = c(88888, 99999),
                            RULE = c('[sex] == "females"', "[age] > 60"),
                            CODE_CLASS = c("JUMP", "MISSING"),
                            CODE_LABEL = c("Jump code added as example",
                                           "Missing code added as example"))
rule_example1
```
resp_vars | CODE_VALUE | RULE | CODE_CLASS | CODE_LABEL |
---|---|---|---|---|
sbp1 | 88888 | [sex] == "females" | JUMP | Jump code added as example |
waist | 99999 | [age] > 60 | MISSING | Missing code added as example |
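A misspelled column name or code class is an easy mistake, so a rules table can be checked before it is used. The helper `check_rules()` below is hypothetical and not part of dataquieR; it only verifies that the columns used in this example are present and that `CODE_CLASS` is `JUMP` or `MISSING`. Whether dataquieR accepts further columns or classes is outside the scope of this sketch.

```r
# Hypothetical helper (not part of dataquieR): sanity-check a rules table
check_rules <- function(rules) {
  required <- c("resp_vars", "CODE_VALUE", "RULE", "CODE_CLASS", "CODE_LABEL")
  missing_cols <- setdiff(required, names(rules))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  # Only the two code classes used in this example are allowed
  bad_class <- setdiff(unique(rules$CODE_CLASS), c("JUMP", "MISSING"))
  if (length(bad_class) > 0) {
    stop("CODE_CLASS must be JUMP or MISSING, got: ",
         paste(bad_class, collapse = ", "))
  }
  invisible(TRUE)
}

# Same rules as rule_example1 above
rules <- data.frame(resp_vars = c("sbp1", "waist"),
                    CODE_VALUE = c(88888, 99999),
                    RULE = c('[sex] == "females"', "[age] > 60"),
                    CODE_CLASS = c("JUMP", "MISSING"),
                    CODE_LABEL = c("Jump code added as example",
                                   "Missing code added as example"))
check_rules(rules)
```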
Now you can modify study data and metadata using the function `prep_add_missing_codes`:

```r
ship_new <- prep_add_missing_codes(study_data = ship1,
                                   meta_data_v2 = "ship_meta_v2",
                                   rules = rule_example1)
```
The study data now contain two new jump codes 88888 for the variable `sbp1` (the two female participants whose value was NA) and five new missing codes 99999 for the variable `waist` (all participants older than 60).

```r
ship_new$ModifiedStudyData
```
id | exdate | age | sex | sbp1 | sbp2 | waist | cholesterol | family |
---|---|---|---|---|---|---|---|---|
3861 | 1998-09-22 | 65 | 1 | NA | 176 | 99999 | 5.169 | 1 |
6506 | 1998-01-21 | 70 | 1 | NA | 138 | 99999 | 5.751 | 1 |
6096 | 1999-04-07 | 43 | 2 | 88888 | 114 | NA | 5.612 | 3 |
6674 | 2000-10-06 | 55 | 2 | 88888 | 129 | NA | 5.214 | 1 |
6490 | 1998-11-17 | 69 | 2 | 145 | 145 | 99999 | 5.504 | 3 |
5366 | 1997-11-27 | 65 | 1 | 194 | 179 | 99999 | 6.558 | 1 |
5735 | 1999-09-01 | 40 | 2 | 154 | 148 | NA | 5.159 | 5 |
4031 | 1999-08-12 | 51 | 2 | 187 | 177 | NA | 5.953 | 1 |
3578 | 2000-02-26 | 25 | 1 | 122 | 123 | NA | 3.188 | 3 |
4807 | 2000-07-13 | 80 | 2 | 184 | 170 | 99999 | 6.596 | 1 |
In the item_level metadata, the new code has been added to the `JUMP_LIST` column for the variable `sbp1`. Moreover, the new missing code for the variable `waist` has been added to the other missing codes already present for that variable in the `MISSING_LIST` column.

```r
ship_new$ModifiedMetaData
```
row | VAR_NAMES | LABEL | JUMP_LIST |
---|---|---|---|
3 | sbp1 | SBP_0.1 | 88888 = Jump code added as example |
row | VAR_NAMES | LABEL | MISSING_LIST |
---|---|---|---|
29 | waist | WAIST_CIRC_0 | 99900 = Missing - other reason \| 99901 = Missing - refusal \| 99902 = Missing - not assessable \| 99903 = Missing - technical problem \| 99904 = Missing - not available (material) \| 99905 = Missing - not usable (material) \| 99906 = Missing - reason unknown \| 99907 = Missing - optional value \| 99908 = Deleted - other reason \| 99909 = Deleted - contradiction \| 99910 = Deleted - value outside limits \| 99912 = Value above detection limit \| 99913 = Value below detection limit \| 99914 = Data management ongoing \| 99999 = Missing code added as example |
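Each `MISSING_LIST` entry in the output above follows a `CODE = label` pattern, with entries separated by ` | `. The following base R sketch assumes exactly that format: it appends a code to a shortened list (excerpted from the table) and parses the list back into a named numeric vector. It does not show how dataquieR itself stores or parses these lists.

```r
# Shortened MISSING_LIST value in the "CODE = label | CODE = label" format
missing_list <- "99900 = Missing - other reason | 99901 = Missing - refusal"

# Append the new code in the same format
new_entry <- "99999 = Missing code added as example"
missing_list <- paste(missing_list, new_entry, sep = " | ")

# Split into entries, then split each entry into code and label
entries <- strsplit(missing_list, " | ", fixed = TRUE)[[1]]
parts <- strsplit(entries, " = ", fixed = TRUE)
codes <- vapply(parts, function(p) as.numeric(p[1]), numeric(1))
names(codes) <- vapply(parts, function(p) p[2], character(1))

codes
```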
Codes can also be inserted wherever the rule applies, not only where the values are NA. To do so, set the argument `overwrite = TRUE`; existing values that match the rule are then replaced as well.

```r
ship_new2 <- prep_add_missing_codes(study_data = ship1,
                                    meta_data_v2 = "ship_meta_v2",
                                    rules = rule_example1,
                                    overwrite = TRUE)
```
The study data now contain six jump codes 88888 for the variable `sbp1`, one for every female participant (including the four whose measured values were overwritten), and five missing codes 99999 for the variable `waist`.

```r
ship_new2$ModifiedStudyData
```
id | exdate | age | sex | sbp1 | sbp2 | waist | cholesterol | family |
---|---|---|---|---|---|---|---|---|
3861 | 1998-09-22 | 65 | 1 | NA | 176 | 99999 | 5.169 | 1 |
6506 | 1998-01-21 | 70 | 1 | NA | 138 | 99999 | 5.751 | 1 |
6096 | 1999-04-07 | 43 | 2 | 88888 | 114 | NA | 5.612 | 3 |
6674 | 2000-10-06 | 55 | 2 | 88888 | 129 | NA | 5.214 | 1 |
6490 | 1998-11-17 | 69 | 2 | 88888 | 145 | 99999 | 5.504 | 3 |
5366 | 1997-11-27 | 65 | 1 | 194 | 179 | 99999 | 6.558 | 1 |
5735 | 1999-09-01 | 40 | 2 | 88888 | 148 | NA | 5.159 | 5 |
4031 | 1999-08-12 | 51 | 2 | 88888 | 177 | NA | 5.953 | 1 |
3578 | 2000-02-26 | 25 | 1 | 122 | 123 | NA | 3.188 | 3 |
4807 | 2000-07-13 | 80 | 2 | 88888 | 170 | 99999 | 6.596 | 1 |
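The difference between the default and `overwrite = TRUE` can be sketched with a hypothetical helper `apply_code()`, which is not the dataquieR implementation. It copies `sex` and `sbp1` from the `ship1` table and again assumes that sex code 2 corresponds to `females`.

```r
# Hypothetical helper (not dataquieR): insert `code` where `condition` holds;
# without overwrite, only NAs are replaced
apply_code <- function(x, condition, code, overwrite = FALSE) {
  target <- condition & (overwrite | is.na(x))
  target[is.na(target)] <- FALSE   # treat an undecidable condition as "no"
  x[target] <- code
  x
}

# Columns copied from the ship1 table; assumption: code 2 = "females"
sex  <- c(1, 1, 2, 2, 2, 1, 2, 2, 1, 2)
sbp1 <- c(NA, NA, NA, NA, 145, 194, 154, 187, 122, 184)
is_female <- sex == 2

only_na  <- apply_code(sbp1, is_female, 88888)
all_rows <- apply_code(sbp1, is_female, 88888, overwrite = TRUE)

sum(only_na == 88888, na.rm = TRUE)   # 2
sum(all_rows == 88888, na.rm = TRUE)  # 6
```

Without `overwrite`, only the two NAs of female participants are coded; with `overwrite = TRUE`, all six female rows receive 88888, which matches the counts reported above.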