Metadata is defined as “data that describe other data” (Nadkarni 2011). Metadata provides information that supports the correct interpretation of study data and guides data quality (DQ) assessments as well as statistical analyses. Examples of metadata are lists of value codes, used to examine reasons for incomplete data, and value labels, used to produce interpretable reports. Some metadata is specific to certain DQ assessments, while other metadata is used across most DQ implementations.
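The role of value codes and value labels can be illustrated with a minimal base R sketch. The variable `smoke`, the missing code 99, and all labels are hypothetical and chosen only for illustration; this is not dataquieR's own implementation.

```r
# Hypothetical study data: smoking status, where 99 encodes a missing value
study_data <- data.frame(id = 1:5, smoke = c(0, 1, 99, 1, 0))

# Hypothetical metadata: value labels and missing codes for `smoke`
value_labels  <- c("0" = "no", "1" = "yes")
missing_codes <- c("99" = "refused to answer")

codes      <- as.character(study_data$smoke)
is_missing <- codes %in% names(missing_codes)

# Value codes make it possible to tabulate the reasons for incomplete data
missing_reasons <- table(missing_codes[codes[is_missing]])

# Value labels turn the remaining codes into interpretable categories;
# observations carrying a missing code become NA
smoke_labelled <- factor(
  ifelse(is_missing, NA, codes),
  levels = names(value_labels),
  labels = value_labels
)
```

Without the metadata, the value 99 would be indistinguishable from a measurement, and the codes 0 and 1 would have to be interpreted by guesswork.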
Metadata is commonly stored in data dictionaries (DDs). DDs
frequently contain the name of a variable, its data type, and, if
applicable, labels for the levels of a categorical variable
(Meyer et al. 2012). DDs should be available
for the study data of each research study. However, DDs often host only
a subset of all information necessary for data quality assessments.
Thus, DDs need to be extended with information related to data quality. If
this is not possible, metadata may also be stored in a spreadsheet-type
format, such as data frames. dataquieR
uses predefined
metadata provided as data frames, as described below.
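As a rough sketch of what a data dictionary stored as a data frame might look like, the following base R example builds a small table with one row per study variable. The column names `VAR_NAMES`, `LABEL`, `DATA_TYPE`, and `VALUE_LABELS`, the `code = label | code = label` notation, and all values are assumptions made for illustration; the exact columns and formats that dataquieR expects should be taken from the package documentation.

```r
# Illustrative metadata table: one row per study variable.
# Column names and the value-label notation are assumptions for this sketch.
item_metadata <- data.frame(
  VAR_NAMES    = c("v001", "v002", "v003"),
  LABEL        = c("AGE", "SEX", "SMOKING"),
  DATA_TYPE    = c("integer", "integer", "integer"),
  VALUE_LABELS = c(NA, "0 = female | 1 = male", "0 = no | 1 = yes"),
  stringsAsFactors = FALSE
)

# Hypothetical study data described by the table above
study_data <- data.frame(
  v001 = c(34L, 51L),
  v002 = c(0L, 1L),
  v003 = c(1L, 0L)
)

# A simple consistency check: study variables that lack a metadata row
undocumented <- setdiff(names(study_data), item_metadata$VAR_NAMES)
```

An empty `undocumented` vector indicates that every variable in the study data has an entry in the metadata table.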
The metadata schema used by dataquieR is based on a formal data quality framework for observational studies (Schmidt et al. 2021). dataquieR makes use of metadata that has been organized in a structured form across four tables:
Each metadata table is arranged as a spreadsheet in a workbook to facilitate user input. Users can provide metadata directly in the spreadsheet or by specifying the source file for a specific item (e.g., another spreadsheet or a URL). The metadata schema also allows users to enter reference tables for participant IDs at the different study levels, as well as missing and jump assignments per variable. Additionally, the tables can contain information to control the report output (e.g., the role or order of variables in the report) and the calculation of the quality indicators.
In all metadata tables, the column names are written in upper case letters to distinguish them from the column names in the study data.
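The naming convention can be checked with a few lines of base R. The helper `has_uppercase_names()` is hypothetical and not part of dataquieR, and the two data frames are made up for this sketch.

```r
# Hypothetical helper (not a dataquieR function): TRUE if all column
# names of a data frame are written in upper case
has_uppercase_names <- function(df) {
  all(names(df) == toupper(names(df)))
}

# Illustrative metadata table and study data
item_metadata <- data.frame(
  VAR_NAMES = c("v001", "v002"),
  DATA_TYPE = c("integer", "string"),
  stringsAsFactors = FALSE
)
study_data <- data.frame(
  v001 = 1:2,
  v002 = c("a", "b"),
  stringsAsFactors = FALSE
)

metadata_ok   <- has_uppercase_names(item_metadata)  # TRUE
study_data_uc <- has_uppercase_names(study_data)     # FALSE
```

Because the study variable names appear only as values in the metadata table, not as its column names, the two sets of column names stay visually distinct.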