Missing codes and jump codes are values that record why a data value is missing. NAs in the data are better replaced with value codes that indicate either a designed jump or a reason for the missing data.

To improve the coding of missing values and replace NAs in study data and metadata with an actual jump or missing code, you can use the dataquieR function prep_add_missing_codes. This function requires a rule that states when NAs are replaced with a value code.

Consider an example with two variables:

  • v_attend, with label “The person interviewed attended University” that can be either “yes” or “no”
  • v_no_yrs, with label “number of years of University done” that contains a number (corresponding to the number of years spent as a student) or NA (if not answered or not applicable).

It is possible to amend data and metadata with a rule stating that any time the variable “v_attend” is “no”, “v_no_yrs” should contain a jump code (for example 9999) instead of NA. This rule can be used to modify the study data (replacing NAs with 9999), but also to amend the metadata with the new jump code in the item_level column JUMP_LIST.
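The effect of such a rule on the study data can be sketched in base R. This is only an illustration of the idea with made-up toy values, not how dataquieR implements it; the object names toy, jump_code and is_jump are hypothetical.

```r
# Toy data for the two variables described above (values are made up).
toy <- data.frame(
  v_attend = c("yes", "no", "yes", "no"),
  v_no_yrs = c(3, NA, NA, NA)
)

jump_code <- 9999  # jump code from the example rule

# Rule: v_attend is "no" and v_no_yrs is NA, so the NA is a designed jump.
is_jump <- toy$v_attend == "no" & is.na(toy$v_no_yrs)
toy$v_no_yrs[is_jump] <- jump_code

toy$v_no_yrs
# 3 9999 NA 9999
```

The NA in the third row stays untouched: that person attended University, so the value is missing for some other reason and the jump rule does not apply.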

An example with the SHIP-based Example data

In this example we use the SHIP-based example data to modify study data and metadata using rules and the function prep_add_missing_codes.

We want to add a jump code 88888 to replace NAs in the variable sbp1 for females, and a missing code 99999 to replace NAs in the variable waist for participants older than 60 (note that only NAs will be replaced).

To start, import the SHIP-based Example data and modify it.

library(dataquieR)
ship1 <- prep_get_data_frame("ship") #import study data
ship1 <- head(ship1, n = 10) #keep only the first 10 rows
ship1 <- ship1[, c(1:4,11:12, 17:18, 22)] #select a subset of columns
ship1$sbp1[1:4] <- NA #introduce NAs for the purpose of this example
ship1$waist <- NA #introduce NAs for the purpose of this example

Here is the resulting study data ship1

ship1
id exdate age sex sbp1 sbp2 waist cholesterol family
3861 1998-09-22 65 1 NA 176 NA 5.169 1
6506 1998-01-21 70 1 NA 138 NA 5.751 1
6096 1999-04-07 43 2 NA 114 NA 5.612 3
6674 2000-10-06 55 2 NA 129 NA 5.214 1
6490 1998-11-17 69 2 145 145 NA 5.504 3
5366 1997-11-27 65 1 194 179 NA 6.558 1
5735 1999-09-01 40 2 154 148 NA 5.159 5
4031 1999-08-12 51 2 187 177 NA 5.953 1
3578 2000-02-26 25 1 122 123 NA 3.188 3
4807 2000-07-13 80 2 184 170 NA 6.596 1
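Before applying any rule, it can help to count by hand how many NAs each condition should hit. The sketch below copies the relevant columns from the printed table and uses plain base R. It assumes that sex == 2 encodes females, which is inferred from the value labels of the example data and is not stated in the table itself.

```r
# Columns copied from the printed ship1 table above.
age   <- c(65, 70, 43, 55, 69, 65, 40, 51, 25, 80)
sex   <- c(1, 1, 2, 2, 2, 1, 2, 2, 1, 2)
sbp1  <- c(NA, NA, NA, NA, 145, 194, 154, 187, 122, 184)
waist <- rep(NA_real_, 10)

# Assumption: sex == 2 encodes females.
n_jump    <- sum(is.na(sbp1) & sex == 2)    # NAs in sbp1 among females
n_missing <- sum(is.na(waist) & age > 60)   # NAs in waist for age > 60

c(n_jump = n_jump, n_missing = n_missing)
# n_jump = 2, n_missing = 5
```

These counts give you a reference to compare against once the codes have been added.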


Next, write the rule (or, as in this case, the rules) used to add the jump or missing codes. The rules must be stored in a table with the following columns:

  • resp_vars, the variable to amend in the study data;
  • CODE_VALUE, the code to use in the study data to replace NAs;
  • RULE, the rule, written in REDCap syntax, stating the condition under which values are replaced with the new missing or jump code;
  • CODE_CLASS, either JUMP or MISSING;
  • CODE_LABEL, the label of the new added code.

Here are the rules used to amend the two variables as described above.

rule_example1 <- data.frame(resp_vars = c("sbp1", "waist"), 
                  CODE_VALUE = c(88888, 99999), 
                  RULE = c('[sex] == "females"', "[age] > 60"), 
                  CODE_CLASS = c("JUMP", "MISSING"), 
                  CODE_LABEL = c("Jump code added as example" , 
                                 "Missing code added as example"))
rule_example1
resp_vars CODE_VALUE RULE CODE_CLASS CODE_LABEL
sbp1 88888 [sex] == "females" JUMP Jump code added as example
waist 99999 [age] > 60 MISSING Missing code added as example
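A quick structural check of the rule table can catch typos in column names or code classes early. The following is a minimal base-R sketch. The check itself is a hypothetical helper and not part of dataquieR, which may perform its own validation.

```r
# The rule table from above, rebuilt so that this snippet is self-contained.
rules <- data.frame(
  resp_vars  = c("sbp1", "waist"),
  CODE_VALUE = c(88888, 99999),
  RULE       = c('[sex] == "females"', "[age] > 60"),
  CODE_CLASS = c("JUMP", "MISSING"),
  CODE_LABEL = c("Jump code added as example",
                 "Missing code added as example")
)

# Columns listed in the text above.
required_cols <- c("resp_vars", "CODE_VALUE", "RULE",
                   "CODE_CLASS", "CODE_LABEL")

missing_cols <- setdiff(required_cols, names(rules))          # absent columns
bad_class    <- !rules$CODE_CLASS %in% c("JUMP", "MISSING")   # invalid classes

rules_ok <- length(missing_cols) == 0 && !any(bad_class)
rules_ok
# TRUE
```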


Now you can modify the study data and metadata using the function prep_add_missing_codes:

ship_new <- prep_add_missing_codes(study_data = ship1,
                                   meta_data_v2 = "ship_meta_v2",
                                   rules = rule_example1)


You can see that now the study data contains two new jump codes 88888 for the variable sbp1 and five new missing codes 99999 for the variable waist.

ship_new$ModifiedStudyData
id exdate age sex sbp1 sbp2 waist cholesterol family
3861 1998-09-22 65 1 NA 176 99999 5.169 1
6506 1998-01-21 70 1 NA 138 99999 5.751 1
6096 1999-04-07 43 2 88888 114 NA 5.612 3
6674 2000-10-06 55 2 88888 129 NA 5.214 1
6490 1998-11-17 69 2 145 145 99999 5.504 3
5366 1997-11-27 65 1 194 179 99999 6.558 1
5735 1999-09-01 40 2 154 148 NA 5.159 5
4031 1999-08-12 51 2 187 177 NA 5.953 1
3578 2000-02-26 25 1 122 123 NA 3.188 3
4807 2000-07-13 80 2 184 170 99999 6.596 1

The item_level metadata contains the new code in the JUMP_LIST column for the variable sbp1.

Moreover, the new missing code for the variable waist has been appended to the missing codes already present for that variable in the column MISSING_LIST.

ship_new$ModifiedMetaData
VAR_NAMES LABEL JUMP_LIST
3 sbp1 SBP_0.1 88888 = Jump code added as example


VAR_NAMES LABEL MISSING_LIST
29 waist WAIST_CIRC_0 99900 = Missing - other reason | 99901 = Missing - refusal | 99902 = Missing - not assessable | 99903 = Missing - technical problem | 99904 = Missing - not available (material) | 99905 = Missing - not usable (material) | 99906 = Missing - reason unknown | 99907 = Missing - optional value | 99908 = Deleted - other reason | 99909 = Deleted - contradiction | 99910 = Deleted - value outside limits | 99912 = Value above detection limit | 99913 = Value below detection limit | 99914 = Data management ongoing | 99999 = Missing code added as example
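The MISSING_LIST entry shown above is a single string in which code and label pairs are separated by " | " and each code is separated from its label by " = ". If you want to look up a label by its code, the string can be split with base R. This is a sketch that assumes exactly this separator layout and uses a shortened copy of the string; the names entries, codes and lookup are hypothetical.

```r
# Shortened copy of the MISSING_LIST string printed above.
missing_list <- paste(
  "99900 = Missing - other reason",
  "99901 = Missing - refusal",
  "99999 = Missing code added as example",
  sep = " | "
)

# Split into "code = label" entries, then separate codes from labels.
entries <- strsplit(missing_list, " | ", fixed = TRUE)[[1]]
codes   <- sub(" = .*$", "", entries)       # text before the first " = "
labels  <- sub("^[^=]* = ", "", entries)    # text after the first " = "

lookup <- setNames(labels, codes)
lookup[["99999"]]
# "Missing code added as example"
```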


How to replace values even if they are not NAs

Missing codes can be inserted not only where values are NA but in every case where the rule applies. To do that, set the argument overwrite = TRUE.

ship_new2 <- prep_add_missing_codes(study_data = ship1,
                                    meta_data_v2 = "ship_meta_v2",
                                    rules = rule_example1,
                                    overwrite = TRUE)


You can see that now the study data contains six new jump codes 88888 for the variable sbp1 (for all females) and five new missing codes 99999 for the variable waist.

ship_new2$ModifiedStudyData
id exdate age sex sbp1 sbp2 waist cholesterol family
3861 1998-09-22 65 1 NA 176 99999 5.169 1
6506 1998-01-21 70 1 NA 138 99999 5.751 1
6096 1999-04-07 43 2 88888 114 NA 5.612 3
6674 2000-10-06 55 2 88888 129 NA 5.214 1
6490 1998-11-17 69 2 88888 145 99999 5.504 3
5366 1997-11-27 65 1 194 179 99999 6.558 1
5735 1999-09-01 40 2 88888 148 NA 5.159 5
4031 1999-08-12 51 2 88888 177 NA 5.953 1
3578 2000-02-26 25 1 122 123 NA 3.188 3
4807 2000-07-13 80 2 88888 170 99999 6.596 1
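Because overwriting replaces measured values, it is worth checking which observations were lost. The sketch below compares the sbp1 column before and after, with the values copied by hand from the printed tables; the vector names are hypothetical.

```r
# sbp1 as printed in ship1 and in the result obtained with overwrite = TRUE.
sbp1_original  <- c(NA, NA, NA, NA, 145, 194, 154, 187, 122, 184)
sbp1_overwrite <- c(NA, NA, 88888, 88888, 88888, 194, 88888, 88888, 122, 88888)

# Rows where a measured (non-NA) value was replaced by the jump code.
overwritten <- which(!is.na(sbp1_original) & sbp1_overwrite == 88888)

overwritten                 # rows 5, 7, 8, 10
sbp1_original[overwritten]  # 145 154 187 184
```

Four measured blood pressure values were replaced in addition to the two NAs, which accounts for the six jump codes in total.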
