Description

The function com_qualified_item_missingness describes missingness at the level of single variables. It reports the proportion of eligible observational units with a missing value, broken down by the reason for missingness:

  • Non-response rate indicates the proportion of eligible observational units missing due to no response to contact attempts.
  • Refusal rate represents the proportion of eligible observational units that are missing due to refusal to provide the requested information.

com_qualified_item_missingness provides indicators for the non-response rate and the refusal rate. Both indicators belong to the qualified missingness domain of the Completeness dimension.

For more details, see the user’s manual and source code.

Usage and arguments

com_qualified_item_missingness(
  study_data = "study_data",
  meta_data_v2 = "meta_data_v2"
)

The function has the following arguments:

  • study_data: mandatory, the data frame containing the study measurements
  • meta_data_v2: mandatory, the data frame containing the metadata
  • resp_vars: optional, a character vector specifying the measurement variables of interest
  • label_col: optional, the column in the metadata data frame containing the labels of all the variables in the study data.
  • item_level: optional, the data frame that contains metadata attributes of study data
  • meta_data: alias for item_level
  • expected_observations: a character vector indicating which observations are expected, using one of three options based on the old PART_VAR concept: ALL (all observations are expected and included), SEGMENT (the column PART_VAR is expected to point to a variable with values 0 and 1 that indicates whether the variable was expected to be observed and is therefore included in the check), or HIERARCHY (a recursive check: if a variable points to such a participation variable in PART_VAR, and that participation variable itself has a PART_VAR entry pointing to a further variable, the observation of the initial variable is expected only if both segment variables are 1).
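
The difference between ALL, SEGMENT, and HIERARCHY is easiest to see on a toy example. The following base R sketch is an illustration only: the data frame and the participation variables PART_LAB and PART_STUDY are hypothetical, as is the assumption that CRP_0 points to PART_LAB, which in turn points to PART_STUDY. It does not reproduce the package's internal code.

```r
# Hypothetical toy data: CRP_0 is measured within a lab segment (PART_LAB),
# and PART_LAB itself is nested within overall participation (PART_STUDY).
toy <- data.frame(
  PART_STUDY = c(1, 1, 1, 0),
  PART_LAB   = c(1, 0, 1, 1),
  CRP_0      = c(2.1, NA, NA, NA)
)

# Which rows count as "expected" for CRP_0 under each option
expected_all       <- rep(TRUE, nrow(toy))                     # ALL
expected_segment   <- toy$PART_LAB == 1                        # SEGMENT
expected_hierarchy <- toy$PART_LAB == 1 & toy$PART_STUDY == 1  # HIERARCHY

# Number of expected observations under each option
c(ALL = sum(expected_all),
  SEGMENT = sum(expected_segment),
  HIERARCHY = sum(expected_hierarchy))
```

In this toy case the fourth row is expected under SEGMENT but not under HIERARCHY, because the higher-level participation variable is 0.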

Example output

To illustrate the output, we use a subset of the example synthetic data and metadata that are bundled with the dataquieR package. See the introductory tutorial for instructions on importing these files into R, as well as details on their structure and contents.

qual1 <- com_qualified_item_missingness(
  resp_vars = c("CRP_0", "BSG_0"),
  study_data = "study_data",
  meta_data_v2 = "meta_data_v2"
)

The function generates two outputs: SummaryTable and SummaryData.

Output 1: SummaryTable. This data frame contains, for each variable, information on the missing values based on user-defined value codes. In the following example, the codes are based on the AAPOR definitions (American Association for Public Opinion Research 2016).

The SummaryTable is accessed with qual1$SummaryTable.

Variables  O   NE  R  NC  I     N     N2    RR1       NRR1      PCT_com_qum_nonresp  RR2       NRR2      REF1       PCT_com_qum_refusal
CRP_0      39  9   5  16  2699  2768  2940  0.978253  0.021747  2.174701             0.978253  0.021747  0.0018123  0.1812251
BSG_0      45  11  6  10  2686  2758  2940  0.977794  0.022206  2.220604             0.977794  0.022206  0.0021842  0.2184201
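
As a quick plausibility check, the rates in the CRP_0 row can be recomputed by hand from the counts. The sketch below assumes, based on the values shown, that the denominator is I + R + NC + O (so NE does not enter the denominator) and that all disposition classes not listed in the table are zero. This is an inference from the example output, not a statement about the internals of the function.

```r
# Counts taken from the CRP_0 row of the SummaryTable
O  <- 39    # column O
R  <- 5     # column R
NC <- 16    # column NC
I  <- 2699  # column I (complete participation)

# Assumption: NE is excluded from the denominator
denominator <- I + R + NC + O

RR1  <- I / denominator   # response rate
NRR1 <- 1 - RR1           # non-response rate
REF1 <- R / denominator   # refusal rate

# Rates as proportions and as percentages, rounded for comparison
round(c(RR1 = RR1, NRR1 = NRR1, REF1 = REF1), 6)
round(100 * c(PCT_com_qum_nonresp = NRR1, PCT_com_qum_refusal = REF1), 4)
```

Under these assumptions the results agree with the tabulated values for CRP_0 up to rounding.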

Output 2: SummaryData. This data frame summarizes the information in SummaryTable and provides only the percentages of the two indicators. It is accessed with qual1$SummaryData.

Variables  Non-response rate (Percentage (0 to 100))  Refusal rate (Percentage (0 to 100))
CRP_0      2.17%                                      0.18%
BSG_0      2.22%                                      0.22%

Interpretation

The higher the non-response rate and the refusal rate, the lower the data quality.

Algorithm of the implementation

  1. The lists of missing codes and labels are selected from the metadata.
  2. The number of occurrences of each missing code (e.g., I for complete participation) is counted for each variable.
  3. The non-response rate and the refusal rate are calculated as percentages. The non-response rate is 1 - RR1, where RR1 is the response rate based on participation only, i.e., RR1 = (I+P) / ((I+P+PL) + (R+BO+NC+O) + (UH+UO)). The refusal rate (REF1) includes all who refused at any stage, i.e., REF1 = (R+BO) / ((I+P+PL) + (R+BO+NC+O) + (UH+UO)).
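
The three steps can be sketched in base R. This is only an illustration under stated assumptions: the helper name aapor_rates and its input, a character vector with one disposition letter per observation, are hypothetical. The actual function derives the dispositions from the missing-code lists in the metadata rather than from a ready-made vector.

```r
# Hypothetical helper, not part of the dataquieR API.
# 'dispositions' holds one disposition letter per observation of a variable.
aapor_rates <- function(dispositions) {
  classes <- c("I", "P", "PL", "R", "BO", "NC", "O", "UH", "UO")

  # Step 2: count each class; classes that never occur get a count of 0,
  # and codes outside 'classes' (or NA) are ignored
  n <- vapply(classes,
              function(k) sum(dispositions == k, na.rm = TRUE),
              integer(1))

  # Step 3: apply the formulas given in the list above
  denominator <- sum(n)
  rr1  <- (n[["I"]] + n[["P"]]) / denominator
  ref1 <- (n[["R"]] + n[["BO"]]) / denominator

  c(PCT_com_qum_nonresp = 100 * (1 - rr1),
    PCT_com_qum_refusal = 100 * ref1)
}

# Toy input: 90 participants, 4 refusals, 5 non-contacts, 1 other
x <- rep(c("I", "R", "NC", "O"), times = c(90, 4, 5, 1))
res <- aapor_rates(x)
res
```

For this toy input, 10 of 100 observations are non-responses and 4 of 100 are refusals, so the two percentages come out at about 10 and 4.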

Concept relations

American Association for Public Opinion Research (2016). Standard definitions: Final dispositions of case codes and outcome rates for surveys. AAPOR.