Description

The function com_qualified_segment_missingness computes the non-response rate and the refusal rate for each segment.

com_qualified_segment_missingness provides indicators for the non-response rate and the refusal rate. Both indicators belong to the qualified missingness domain in the Completeness dimension.

For more details, see the user’s manual and source code.

Usage and arguments

com_qualified_segment_missingness(
   study_data = "study_data",
   meta_data_v2 = "meta_data_v2"
 )

The function has the following arguments:

study_data: mandatory, the data frame containing the study measurements
meta_data_v2: mandatory, the data frame containing the metadata
label_col: optional, the column in the metadata data frame containing the labels of all the variables in the study data.
item_level: optional, the data frame that contains metadata attributes of study data
meta_data: alias for item_level
expected_observations: a character vector indicating which observations are expected, using one of three options based on the old PART_VAR concept: ALL (all observations are expected and included), SEGMENT (the column PART_VAR is expected to point to a variable with values 0 and 1, indicating whether the variable was expected to be observed and is therefore included in the check), or HIERARCHY (a recursive check: if a variable points to such a participation variable in PART_VAR, and that other variable also has a PART_VAR entry pointing to a variable, the observation of the initial variable is only expected if both segment variables are 1)
meta_data_segment: the data frame containing the metadata for the segment level
segment_level: alias for meta_data_segment
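The three expected_observations options can be illustrated with a minimal base R sketch. This is not the dataquieR implementation: the helper expected_rows(), the toy data frame, and the variable names SBP, PART_EXAM, and PART_STUDY are hypothetical and only mirror the description above.

```r
# Hypothetical toy metadata: each variable may point to a participation
# variable via PART_VAR (NA means "no participation variable").
part_var <- c(SBP = "PART_EXAM", PART_EXAM = "PART_STUDY", PART_STUDY = NA)

# Hypothetical toy study data with 0/1 participation indicators.
toy <- data.frame(
  PART_STUDY = c(1, 1, 0, 1),
  PART_EXAM  = c(1, 0, 1, 1),
  SBP        = c(120, NA, NA, 135)
)

# Returns a logical vector: is an observation of `var` expected in each row?
# Assumes the PART_VAR chain contains no cycles.
expected_rows <- function(var, data, part_var,
                          mode = c("HIERARCHY", "ALL", "SEGMENT")) {
  mode <- match.arg(mode)
  expected <- rep(TRUE, nrow(data))
  if (mode == "ALL") return(expected)      # everything is expected
  pv <- part_var[[var]]
  while (!is.na(pv)) {
    expected <- expected & data[[pv]] == 1
    if (mode == "SEGMENT") break           # only the direct PART_VAR counts
    pv <- part_var[[pv]]                   # HIERARCHY: follow the chain upwards
  }
  expected
}

exp_all       <- expected_rows("SBP", toy, part_var, "ALL")
exp_segment   <- expected_rows("SBP", toy, part_var, "SEGMENT")
exp_hierarchy <- expected_rows("SBP", toy, part_var, "HIERARCHY")
```

In this toy example, the third row is expected under SEGMENT (PART_EXAM is 1) but not under HIERARCHY, because the parent variable PART_STUDY is 0.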

Example output

To illustrate the output, we use a subset of the SHIP example data and metadata that are bundled with the dataquieR package. See the introductory tutorial for instructions on importing these files into R, as well as details on their structure and contents.

qual_seg <- com_qualified_segment_missingness( 
  study_data = "ship",
  meta_data_v2 = "ship_meta_v2"
 )

The function generates two outputs: SegmentTable and SegmentData.

Output 1: SegmentTable. This data frame contains, for each segment, information on the missing values based on user-defined value codes. In the following example, the codes are based on the AAPOR definitions (The American Association for Public Opinion Research 2016).

The SegmentTable is accessed using qual_seg$SegmentTable.

Segment	O	R	UO	P	I	RR1	NRR1	PCT_com_qum_nonresp	RR2	NRR2	REF1	PCT_com_qum_refusal	N	N2
INTRO	0	0	0	0	2154	1.0000000	0.0000000	0.0000000	1.0000000	0.0000000	0.0000000	0.0000000	2154	2154
SOMATOMETRY	2	1	1	0	2150	0.9981430	0.0018570	0.1857010	0.9981430	0.0018570	0.0004643	0.0464253	2154	2154
INTERVIEW	4	7	0	2143	0	0.9948932	0.0051068	0.5106778	0.9948932	0.0051068	0.0032498	0.3249768	2154	2154
LABORATORY	16	1	0	0	2137	0.9921077	0.0078923	0.7892293	0.9921077	0.0078923	0.0004643	0.0464253	2154	2154

Output 2: SegmentData. This data frame summarizes the information in SegmentTable and provides only the percentages of the two indicators. It is accessed using qual_seg$SegmentData.

Segment	Non-response rate (Percentage (0 to 100))	Refusal rate (Percentage (0 to 100))
INTRO	0%	0%
SOMATOMETRY	0.19%	0.05%
INTERVIEW	0.51%	0.32%
LABORATORY	0.79%	0.05%
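The relation between the two outputs can be sketched in base R. The snippet below does not call dataquieR; it copies the two percentage columns from the SegmentTable shown above and assumes that the SegmentData values are those percentages rounded to two decimals. The helper fmt_pct() and the shortened column names nonresponse and refusal are hypothetical.

```r
# Percentage columns copied from the SegmentTable example above.
seg_table <- data.frame(
  Segment = c("INTRO", "SOMATOMETRY", "INTERVIEW", "LABORATORY"),
  PCT_com_qum_nonresp = c(0, 0.1857010, 0.5106778, 0.7892293),
  PCT_com_qum_refusal = c(0, 0.0464253, 0.3249768, 0.0464253)
)

# Hypothetical helper: round to two decimals and append a percent sign.
fmt_pct <- function(x) paste0(round(x, 2), "%")

# Condensed view in the spirit of SegmentData (column names shortened here).
seg_data <- data.frame(
  Segment     = seg_table$Segment,
  nonresponse = fmt_pct(seg_table$PCT_com_qum_nonresp),
  refusal     = fmt_pct(seg_table$PCT_com_qum_refusal)
)
```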

Interpretation

The higher the non-response rate and the refusal rate, the lower the data quality.
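As a small illustration of this reading, the sketch below ranks the example segments by non-response rate and flags those above a cut-off. The values are copied from the example output; the threshold of 0.5 percent is purely hypothetical and is not a dataquieR default or a recommendation.

```r
# Non-response percentages copied from the example output above.
nonresp_pct <- c(
  INTRO = 0, SOMATOMETRY = 0.1857010,
  INTERVIEW = 0.5106778, LABORATORY = 0.7892293
)

# Hypothetical cut-off in percent; choose one suited to the study at hand.
threshold <- 0.5

# Segments ordered from highest to lowest non-response rate.
ranked <- names(sort(nonresp_pct, decreasing = TRUE))

# Segments whose non-response rate exceeds the cut-off.
flagged <- names(nonresp_pct)[nonresp_pct > threshold]
```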

Algorithm of the implementation

The lists of missing codes and labels are selected from the metadata
The number of occurrences of each missing code (e.g., I for Complete participation) in each segment is counted
The non-response rate and the refusal rate are calculated using the following formulas: the non-response rate is 1 - RR1, where RR1 is the response rate based on participation only, i.e., RR1 = (I+P)/((I+P+PL) + (R+BO+NC+O) + (UH+UO)); the refusal rate (REF1) includes all who refused at any stage, i.e., REF1 = (R+BO)/((I+P+PL) + (R+BO+NC+O) + (UH+UO)). Both rates are multiplied by 100 to obtain the reported percentages
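The formulas in the last step can be checked by hand with a short base R sketch. The function outcome_rates() is a hypothetical helper, not part of dataquieR; its arguments are the code counts named in the formulas, and counts that are not supplied are assumed to be zero. The call uses the SOMATOMETRY counts from the example SegmentTable.

```r
# Hypothetical helper implementing the formulas stated above.
outcome_rates <- function(I = 0, P = 0, PL = 0, R = 0, BO = 0,
                          NC = 0, O = 0, UH = 0, UO = 0) {
  # Common denominator: all cases in the segment.
  total <- (I + P + PL) + (R + BO + NC + O) + (UH + UO)
  rr1  <- (I + P) / total    # response rate based on participation only
  ref1 <- (R + BO) / total   # refusals at any stage
  c(
    RR1 = rr1,
    NRR1 = 1 - rr1,
    PCT_com_qum_nonresp = 100 * (1 - rr1),
    REF1 = ref1,
    PCT_com_qum_refusal = 100 * ref1
  )
}

# SOMATOMETRY row of the example SegmentTable: O = 2, R = 1, UO = 1, I = 2150.
somatometry <- outcome_rates(I = 2150, R = 1, O = 2, UO = 1)
round(somatometry, 7)
```

With these counts the denominator is 2154, so the sketch yields a non-response percentage of about 0.1857 and a refusal percentage of about 0.0464, in line with the SOMATOMETRY row of the table.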

Concept relations

Data quality Indicator Non-response rate
Data quality Indicator Refusal rate

The American Association for Public Opinion Research (2016). Standard definitions: Final dispositions of case codes and outcome rates for surveys. AAPOR.

R implementation of qualified segment missingness