Description

The function com_qualified_item_missingness describes missingness at the level of single variables. It reports the proportion of eligible observational units with missing values, broken down by the reason for missingness:

Non-response rate indicates the proportion of eligible observational units missing due to no response to contact attempts.
Refusal rate represents the proportion of eligible observational units that are missing due to refusal to provide the requested information.

com_qualified_item_missingness provides indicators for the non-response rate and the refusal rate. Both indicators belong to the qualified missingness domain in the Completeness dimension.

For more details, see the user’s manual and source code.

Usage and arguments

com_qualified_item_missingness(
   study_data = "study_data",
   meta_data_v2 = "meta_data_v2"
 )

The function has the following arguments:

study_data: mandatory, the data frame containing the study measurements
meta_data_v2: mandatory, the data frame containing the metadata
resp_vars: optional, a character vector specifying the measurement variables of interest
label_col: optional, the column in the metadata data frame containing the labels of all the variables in the study data.
item_level: optional, the data frame that contains metadata attributes of study data
meta_data: alias for item_level
expected_observations: a character vector indicating which observations are expected. There are three options, based on the old PART_VAR concept: ALL (all observations are expected and included), SEGMENT (the column PART_VAR is expected to point to a variable with values 0 and 1 that indicates whether the variable was expected to be observed and is therefore included in the check), or HIERARCHY (a recursive check: if a variable points to such a participation variable in PART_VAR, and that participation variable itself has a PART_VAR entry pointing to another variable, the observation of the initial variable is expected only if both segment variables are 1).
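
The difference between the three options (ALL, SEGMENT, HIERARCHY) can be illustrated with a small base R sketch. This is not the dataquieR implementation; the toy data, the variable names (PART_EXAM, PART_LAB, CRP), the lookup vector part_var, and the helper is_expected are all hypothetical and only mirror the description above.

```r
# Toy study data: two participation flags and one measurement
# (all names are hypothetical, chosen for illustration only).
study <- data.frame(
  PART_EXAM = c(1, 1, 0, 1),
  PART_LAB  = c(1, 0, 1, 1),
  CRP       = c(2.1, NA, NA, NA)
)

# Toy stand-in for the PART_VAR metadata column:
# CRP points to PART_LAB, which in turn points to PART_EXAM.
part_var <- c(CRP = "PART_LAB", PART_LAB = "PART_EXAM", PART_EXAM = NA)

# Return, per observation, whether a value of `var` is expected.
is_expected <- function(var, mode = c("HIERARCHY", "ALL", "SEGMENT")) {
  mode <- match.arg(mode)
  expected <- rep(TRUE, nrow(study))
  if (mode == "ALL") return(expected)      # PART_VAR is ignored
  parent <- part_var[[var]]
  while (!is.na(parent)) {
    expected <- expected & study[[parent]] == 1
    if (mode == "SEGMENT") break           # only the direct PART_VAR counts
    parent <- part_var[[parent]]           # HIERARCHY: follow the chain upwards
  }
  expected
}

is_expected("CRP", "ALL")        # TRUE  TRUE  TRUE  TRUE
is_expected("CRP", "SEGMENT")    # TRUE FALSE  TRUE  TRUE
is_expected("CRP", "HIERARCHY")  # TRUE FALSE FALSE  TRUE
```

In this toy case the third observation is expected under SEGMENT, because PART_LAB is 1, but not under HIERARCHY, because the higher-level flag PART_EXAM is 0.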

Example output

To illustrate the output, we use a subset of the example synthetic data and metadata that are bundled with the dataquieR package. See the introductory tutorial for instructions on importing these files into R, as well as details on their structure and contents.

qual1 <- com_qualified_item_missingness( 
  resp_vars = c("CRP_0", "BSG_0"), 
  study_data = "study_data",
  meta_data_v2 = "meta_data_v2"
 )

The function generates two outputs: SummaryTable and SummaryData.

Output 1: SummaryTable This data frame contains, for each variable, information on the missing values based on user-defined value codes. In the following example, the codes are based on the AAPOR definitions (The American Association for Public Opinion Research 2016).

The SummaryTable is accessed with qual1$SummaryTable.

Variables	O	NE	R	NC	I	N	N2	RR1	NRR1	PCT_com_qum_nonresp	RR2	NRR2	REF1	PCT_com_qum_refusal
CRP_0	39	9	5	16	2699	2768	2940	0.978253	0.021747	2.174701	0.978253	0.021747	0.0018123	0.1812251
BSG_0	45	11	6	10	2686	2758	2940	0.977794	0.022206	2.220604	0.977794	0.022206	0.0021842	0.2184201

Output 2: SummaryData This data frame summarizes the information in SummaryTable and provides only the percentages of the two indicators. It is accessed with qual1$SummaryData.

Variables	Non-response rate (Percentage (0 to 100))	Refusal rate (Percentage (0 to 100))
CRP_0	2.17%	0.18%
BSG_0	2.22%	0.22%

Interpretation

The higher the non-response rate and the refusal rate, the lower the data quality.

Algorithm of the implementation

The lists of missing codes and labels are selected from the metadata
The number of occurrences of each missing code (e.g., I for Complete participation) is counted for each variable
The non-response rate and the refusal rate are calculated as percentages using the following formulas. The non-response rate is 1 - RR1, where RR1 is the response rate based on participation only, i.e., RR1 = (I+P)/((I+P+PL) + (R+BO+NC+O) + (UH+UO)). The refusal rate (REF1) includes all who refused at any stage, i.e., REF1 = (R+BO)/((I+P+PL) + (R+BO+NC+O) + (UH+UO))
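
As a plausibility check, the formulas can be applied by hand to the counts shown in the example SummaryTable. The following base R sketch is not the package code: the helper n is hypothetical, and it assumes that codes without a column in the example table (P, PL, BO, UH, UO) have a count of zero. NE is listed in the table but does not appear in either formula.

```r
# Counts copied from the example SummaryTable.
counts <- data.frame(
  Variables = c("CRP_0", "BSG_0"),
  O  = c(39, 45),
  NE = c(9, 11),     # not used in either formula
  R  = c(5, 6),
  NC = c(16, 10),
  I  = c(2699, 2686)
)

# Hypothetical helper: count for a code, or 0 if the table has no such column
# (assumption: a missing column means the code did not occur).
n <- function(code) if (code %in% names(counts)) counts[[code]] else 0

# Shared denominator of RR1 and REF1.
denominator <- (n("I") + n("P") + n("PL")) +
  (n("R") + n("BO") + n("NC") + n("O")) +
  (n("UH") + n("UO"))

rates <- data.frame(
  Variables = counts$Variables,
  RR1  = (n("I") + n("P")) / denominator,
  REF1 = (n("R") + n("BO")) / denominator
)
rates$NRR1 <- 1 - rates$RR1
rates$PCT_com_qum_nonresp <- 100 * rates$NRR1
rates$PCT_com_qum_refusal <- 100 * rates$REF1

print(rates, digits = 7)
```

Under these assumptions, the computed percentages agree with the PCT_com_qum_nonresp and PCT_com_qum_refusal columns of the example table up to rounding, for instance 60/2759 for the non-response rate of CRP_0.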

Concept relations

Data quality Indicator Non-response rate
Data quality Indicator Refusal rate

The American Association for Public Opinion Research (2016). Standard definitions: Final dispositions of case codes and outcome rates for surveys. AAPOR.

R implementation of qualified item missingness