Description

The function com_unit_missingness targets unit missingness, also called unit nonresponse (Kalton and Kasprzyk 1986), without analyzing why data are missing. It therefore implements the Missing values indicator, which belongs to the Crude Missingness domain in the Completeness dimension.

com_unit_missingness checks whether all measurement variables in the provided study dataset are missing for an observational unit. Any decision on unit missingness therefore depends on the scope of the provided dataset.
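
The idea behind this check can be sketched in base R. This is a minimal illustration under stated assumptions, not the dataquieR implementation: the data frame toy and all of its column names are invented for this example, and missing-value codes are assumed to have been recoded to NA already.

```r
# Toy data: one ID column and three measurement variables (all names hypothetical).
toy <- data.frame(
  id  = c("a", "b", "c", "d"),
  sbp = c(120, NA, 135, NA),
  dbp = c(80, NA, NA, NA),
  bmi = c(24.1, NA, 27.3, NA)
)

# Measurement variables are all columns except the ID.
measure_vars <- setdiff(names(toy), "id")

# A unit counts as missing if no measurement variable has a value in that row.
unit_missing <- rowSums(!is.na(toy[, measure_vars, drop = FALSE])) == 0

# Number and percentage of units without any measurement.
n_missing <- sum(unit_missing)
pct_missing <- 100 * mean(unit_missing)
```

Unit "c" has one missing value but still contributes measurements, so only units "b" and "d" are flagged.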

For more details, see the user’s manual and the source code.

Usage and arguments

com_unit_missingness(study_data,
  meta_data,
  id_vars = NULL,
  strata_vars = NULL,
  label_col
)

The com_unit_missingness function has the following arguments:

  • study_data: mandatory, the data frame containing the measurements.
  • meta_data: mandatory, the data frame containing the study data’s metadata.
  • id_vars: optional, a character vector of ID variables that should not be considered when calculating unit-missingness.
  • strata_vars: optional, a string or integer variable used for stratification.
  • label_col: optional, the column in the metadata data frame containing the labels of all the variables in the study data.

Crude unit missingness can also be calculated for stratified data, in which case strata_vars must be specified. No threshold value is implemented.
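
As a rough sketch of what a stratified calculation involves, the following base R code counts unit missingness per stratum. It is an illustration only and does not reproduce the package internals; the data frame toy, the column center, and the output column names are assumptions made for this example.

```r
# Toy data: a stratification variable and two measurement variables (hypothetical names).
toy <- data.frame(
  center = c("A", "A", "A", "B", "B"),
  x = c(1, NA, NA, NA, 5),
  y = c(2, NA, 3, NA, NA)
)
measure_vars <- c("x", "y")

# Flag rows in which every measurement variable is NA.
toy$unit_missing <- rowSums(!is.na(toy[, measure_vars, drop = FALSE])) == 0

# Count observations and flagged units within each stratum.
n_obs <- tapply(toy$unit_missing, toy$center, length)
n_miss <- tapply(toy$unit_missing, toy$center, sum)

by_center <- data.frame(
  center = names(n_obs),
  n_obs = as.vector(n_obs),
  n_unit_missing = as.vector(n_miss),
  pct_unit_missing = round(100 * as.vector(n_miss) / as.vector(n_obs), 2)
)
```

Each stratum gets its own denominator, so the percentages are comparable across strata of different sizes.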

Example output

To illustrate the output, we use the example synthetic data and metadata that are bundled with the dataquieR package. See the introductory tutorial for instructions on importing these files into R, as well as details on their structure and contents.

No stratification

The first example analyzes unit missingness without stratification:

unit_miss_1 <- com_unit_missingness(
  study_data = sd1,
  meta_data = md1,
  id_vars = c("CENTER_0", "PSEUDO_ID"),
  label_col = "LABEL"
)

The function returns a list with the elements FlaggedStudyData and SummaryData. FlaggedStudyData contains a data frame of the study data in which a flag marks observations without any measurements at all. SummaryData contains two elements: (1) the number of observations showing unit missingness, and (2) the percentage of unit missingness.

Run unit_miss_1$SummaryData to see the summary output. In this example, unit missingness is observed in n = 60 observations, which is about 2% of this dataset.

Stratification

Unit missingness can also be calculated using a discrete variable for stratification, for example, in multi-center studies:

unit_miss_2 <- com_unit_missingness(
  study_data = sd1,
  meta_data = md1,
  id_vars = c("CENTER_0", "PSEUDO_ID"),
  strata_vars = "CENTER_0",
  label_col = "LABEL"
)

The stratified summary data frame, unit_miss_2$SummaryData, shows unit missingness for each stratum:

CENTER_0   N_OBS   N_UNIT_MISSINGS   N_UNIT_MISSINGS_(%)
Berlin       617                15                  2.43
Hamburg      581                11                  1.89
Leipzig      593                 9                  1.52
Cologne      564                13                  2.30
Munich       585                12                  2.05
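
The figures in this table can be cross-checked with a few lines of base R. The sketch below simply retypes the counts shown above into a data frame (the names strata and PCT are chosen for this example) and recomputes the percentages; it does not call dataquieR.

```r
# Counts copied from the stratified summary table above.
strata <- data.frame(
  CENTER_0 = c("Berlin", "Hamburg", "Leipzig", "Cologne", "Munich"),
  N_OBS = c(617, 581, 593, 564, 585),
  N_UNIT_MISSINGS = c(15, 11, 9, 13, 12)
)

# Per-stratum percentage of units without any measurement.
strata$PCT <- round(100 * strata$N_UNIT_MISSINGS / strata$N_OBS, 2)

# Pooling the strata gives the unstratified result.
total_obs <- sum(strata$N_OBS)
total_missing <- sum(strata$N_UNIT_MISSINGS)
overall_pct <- round(100 * total_missing / total_obs, 2)
```

The pooled counts give 60 missing units out of 2940 observations, that is 2.04%, which agrees with the n = 60 and roughly 2% reported for the unstratified example.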

Interpretation

com_unit_missingness provides the number and proportion of units without a single valid measurement value on any provided variable. Generally, the higher the proportion of units with missing data, the lower the data quality.

Unit missingness should be distinguished from segment and item missingness because it may have different causes and underlying mechanisms. For example, unit-nonresponse may be selective regarding the targeted study population or may occur due to technical reasons, such as record linkage.

Some notes of caution apply:

  • com_unit_missingness calculates a crude rate of unit missingness, meaning that it ignores why information is missing. Because missingness may have several causes, com_unit_missingness does not, for example, distinguish design-related missingness, which does not per se indicate inferior data quality.

  • com_unit_missingness only looks at the provided variables. The results therefore indicate for how many observational units no information is available within the intended scope of variables. In terms of the conceptual distinction between unit, segment, and item missingness, the meaning of the results may vary. For example, if the variables to be checked come from only one segment (e.g., one examination) of a study, the results in fact describe segment or possibly even item missingness rather than unit missingness. Users must therefore keep the scope of variables in mind to interpret results correctly.

  • Variables that carry relevant non-measurement information, such as IDs, must be excluded from the analysis to obtain meaningful results. In general, all variables that are filled in completely by default, independent of participation status, must be excluded.
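
The last two points can be made concrete with a small base R sketch. It is an illustration under assumptions, not dataquieR code: the data frame toy, its column names, and the helper count_unit_missing are all hypothetical and exist only for this example.

```r
# Toy study: an ID, two interview items, and one laboratory value (hypothetical names).
toy <- data.frame(
  id = c("p1", "p2", "p3"),
  interview_q1 = c("yes", NA, NA),
  interview_q2 = c("no", NA, NA),
  lab_glucose = c(5.4, NA, 6.1)
)

# Hypothetical helper: number of rows with no value in any of the given variables.
count_unit_missing <- function(data, vars) {
  sum(rowSums(!is.na(data[, vars, drop = FALSE])) == 0)
}

all_measures <- c("interview_q1", "interview_q2", "lab_glucose")
interview_only <- c("interview_q1", "interview_q2")

# Including the always-filled ID hides every missing unit.
n_with_id <- count_unit_missing(toy, c("id", all_measures))

# With all measurement variables in scope, only p2 has no data at all.
n_all <- count_unit_missing(toy, all_measures)

# Restricted to the interview segment, p3 is flagged as well.
n_interview <- count_unit_missing(toy, interview_only)
```

Participant p3 took part in the study but skipped the interview, so the count from the interview-only scope reflects segment missingness rather than unit missingness.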

Concept relations

Kalton, G., and Kasprzyk, D. (1986). The treatment of missing survey data. Survey Methodology 12, 1–16.