In an epidemiological study, data may be grouped by examination, such as laboratory, blood pressure, or ultrasound measurements. The metadata describing individual segments are termed segment level metadata.
dataquieR uses segment level metadata. To analyze data quality at the segment level, the item level metadata must record, in the column labelled STUDY_SEGMENT, the segment to which each variable belongs. This column contains the name of the study segment (as a string), defined for each variable.
SEGMENT_RECORD_COUNT specifies the number of expected data records in each study segment. The value must be an integer. The check is only conducted if a number is entered.
For example, the expected record counts at the segment level may be:
STUDY_SEGMENT | SEGMENT_RECORD_COUNT |
---|---|
STUDY | 3000 |
PHYS_EXAM | 2000 |
LAB | 1990 |
INTERVIEW | 3000 |
QUESTIONNAIRE | 2981 |
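
The logic of the record count check can be sketched as follows. This is a minimal illustration in plain Python, not dataquieR's implementation: the function name `check_record_counts` and the observed counts are hypothetical, and INTERVIEW is deliberately left without an expected count to show that such segments are skipped.

```python
def check_record_counts(expected, observed):
    """Compare expected and observed record counts per segment.

    Segments whose expected count is None (no number entered) are skipped,
    mirroring the rule that the check only runs if a number is given.
    """
    result = {}
    for segment, n_expected in expected.items():
        if n_expected is None:
            continue  # no expected count entered: no check for this segment
        n_observed = observed.get(segment, 0)
        result[segment] = n_observed == n_expected
    return result


# Expected counts as in the table above, except INTERVIEW (left empty here).
expected = {"STUDY": 3000, "PHYS_EXAM": 2000, "LAB": 1990, "INTERVIEW": None}
# Hypothetical observed counts, invented for this example.
observed = {"STUDY": 3000, "PHYS_EXAM": 1998, "LAB": 1990, "INTERVIEW": 3000}

print(check_record_counts(expected, observed))
# {'STUDY': True, 'PHYS_EXAM': False, 'LAB': True}
```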
SEGMENT_ID_TABLE gives the name of the table containing the reference IDs to be compared with the IDs in the targeted segment. The input must be a string and can refer to a spreadsheet in the same or another workbook, or to a URL.
In the example below, the IDs for the first four segments are specified in the sheet called expected_id of the same workbook. In contrast, the IDs for QUESTIONNAIRE are provided in the pseudo_id sheet of the questionnaire_data.xlsx workbook. Since this is a different workbook, its path must be specified.
STUDY_SEGMENT | SEGMENT_ID_TABLE |
---|---|
STUDY | expected_id |
PHYS_EXAM | expected_id |
LAB | expected_id |
INTERVIEW | expected_id |
QUESTIONNAIRE | d:/data/questionnaire_data.xlsx \| pseudo_id |
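
To make the two reference styles concrete, the sketch below splits such an entry into a workbook part and a sheet part. It assumes only the convention visible in the example above (a sheet name alone, or a path and a sheet name separated by a pipe). The function `parse_table_reference` is hypothetical, and how dataquieR itself resolves these references is not shown here.

```python
def parse_table_reference(cell):
    """Split a reference table entry into (workbook, sheet).

    Assumed convention: 'sheet' alone refers to the current workbook,
    while 'path | sheet' names a sheet in another workbook.
    """
    parts = [part.strip() for part in cell.split("|")]
    if len(parts) == 1:
        return None, parts[0]  # None stands for "same workbook"
    if len(parts) == 2:
        return parts[0], parts[1]
    raise ValueError(f"Unexpected table reference: {cell!r}")


print(parse_table_reference("expected_id"))
# (None, 'expected_id')
print(parse_table_reference("d:/data/questionnaire_data.xlsx | pseudo_id"))
# ('d:/data/questionnaire_data.xlsx', 'pseudo_id')
```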
SEGMENT_RECORD_CHECK is a string that sets the type of check to be conducted when comparing the reference ID table with the IDs in a segment. Two checks are possible: exact, where the IDs in SEGMENT_ID_REF_TABLE and the IDs in STUDY_SEGMENT must match, or subset, where the IDs in STUDY_SEGMENT are a subset of SEGMENT_ID_TABLE.
For instance, STUDY, INTERVIEW and QUESTIONNAIRE may comprise all participants from a study, while particular sections, such as PHYS_EXAM and LAB, may have only been collected from a smaller participant sample:
STUDY_SEGMENT | SEGMENT_RECORD_CHECK |
---|---|
STUDY | exact |
PHYS_EXAM | subset |
LAB | subset |
INTERVIEW | exact |
QUESTIONNAIRE | exact |
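
The difference between the two check types comes down to set equality versus set inclusion. The following plain Python sketch illustrates this; the function name `check_segment_ids` and the example IDs are hypothetical and are not part of dataquieR.

```python
def check_segment_ids(segment_ids, reference_ids, check):
    """Compare the IDs found in a segment with the reference IDs.

    check == "exact":  both ID sets must be identical.
    check == "subset": every segment ID must occur in the reference IDs.
    """
    segment_ids = set(segment_ids)
    reference_ids = set(reference_ids)
    if check == "exact":
        return segment_ids == reference_ids
    if check == "subset":
        return segment_ids <= reference_ids
    raise ValueError(f"Unknown check type: {check!r}")


reference = ["A1", "A2", "A3"]  # hypothetical reference IDs
lab_ids = ["A1", "A3"]          # hypothetical IDs observed in one segment

print(check_segment_ids(lab_ids, reference, "subset"))  # True
print(check_segment_ids(lab_ids, reference, "exact"))   # False
```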
SEGMENT_ID_VARS defines all variables to be used as one single ID variable (a combined key) in a segment. The list of variables must be a string in which the variables are separated by a pipe character (|).
For example, the ID for PHYS_EXAM is a combined key consisting of the “PSEUDO_ID” and “CENTER_0” variables. For the rest of the segments, the ID is specified by the variable “v00001”:
STUDY_SEGMENT | SEGMENT_ID_VARS |
---|---|
STUDY | v00001 |
PHYS_EXAM | PSEUDO_ID \| CENTER_0 |
LAB | v00001 |
INTERVIEW | v00001 |
QUESTIONNAIRE | v00001 |
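
A combined key can be pictured as a tuple built from the listed variables. The sketch below, in plain Python, splits the pipe-separated string and builds one key per record. The function `build_combined_keys`, the example records, and the measurement variable SBP_0 are hypothetical and serve only as an illustration.

```python
def build_combined_keys(rows, id_vars):
    """Build one key tuple per record from a pipe-separated variable list."""
    names = [name.strip() for name in id_vars.split("|")]
    return [tuple(row[name] for name in names) for row in rows]


# Hypothetical records; SBP_0 is an invented measurement variable.
rows = [
    {"PSEUDO_ID": "A1", "CENTER_0": 1, "SBP_0": 120},
    {"PSEUDO_ID": "A1", "CENTER_0": 2, "SBP_0": 118},
]

print(build_combined_keys(rows, "PSEUDO_ID | CENTER_0"))
# [('A1', 1), ('A1', 2)]
```

In this example PSEUDO_ID alone would not identify a record, whereas the combined key does.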
SEGMENT_UNIQUE_ROWS specifies whether identical data is permitted across rows in a segment (excluding ID variables). The input is a Boolean: true means that rows must be unique, and false means that identical rows are permitted.
For instance, row repetitions may be allowed for PHYS_EXAM and LAB but not for the rest of the segments.
STUDY_SEGMENT | SEGMENT_UNIQUE_ROWS |
---|---|
STUDY | true |
PHYS_EXAM | false |
LAB | false |
INTERVIEW | true |
QUESTIONNAIRE | true |
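
One way to picture this check is to drop the ID variables from each record and count how often the remaining values repeat. The plain Python sketch below does this. The function names, the example records, and the variables SBP_0 and DBP_0 are hypothetical, and the code is not dataquieR's implementation.

```python
from collections import Counter


def find_repeated_rows(rows, id_vars):
    """Return the data tuples (ID variables excluded) that occur more than once."""
    counts = Counter(
        tuple(sorted((name, value) for name, value in row.items()
                     if name not in id_vars))
        for row in rows
    )
    return [data for data, n in counts.items() if n > 1]


def segment_passes(rows, id_vars, unique_rows):
    """Apply the rule: repeated rows only matter when unique_rows is True."""
    if not unique_rows:
        return True
    return not find_repeated_rows(rows, id_vars)


# Hypothetical records: A1 and A2 carry identical measurements.
rows = [
    {"v00001": "A1", "SBP_0": 120, "DBP_0": 80},
    {"v00001": "A2", "SBP_0": 120, "DBP_0": 80},
    {"v00001": "A3", "SBP_0": 131, "DBP_0": 85},
]

print(find_repeated_rows(rows, {"v00001"}))
# [(('DBP_0', 80), ('SBP_0', 120))]
print(segment_passes(rows, {"v00001"}, unique_rows=True))   # False
print(segment_passes(rows, {"v00001"}, unique_rows=False))  # True
```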
SEGMENT_PART_VARS provides the name of the variable that indicates participation in the respective segment. For instance:
STUDY_SEGMENT | SEGMENT_PART_VARS |
---|---|
STUDY | seg_study_part |
PHYS_EXAM | seg_phys_exam_part |
LAB | seg_lab_part |
INTERVIEW | seg_interview_part |
QUESTIONNAIRE | seg_questionnaire_part |
In the study data, each segment participation variable contains participation and missing codes (e.g., -10000, 99980, 99981). If interpretation codes are provided in a separate table (e.g., segment_missing_table), the participation codes allow the calculation of qualified missingness rates per segment.