Any extensive data quality report requires study data (for example,
clinical measurements) and metadata. dataquieR supports a
spreadsheet-type structure with several tables, which is described in
more detail in the metadata annotation tutorial.
Below, we list all existing implementations in dataquieR (see Download
for installation instructions) with links to their respective
documentation. Additional examples, alternative implementations, and
guidelines for contributing code are available as tutorials.
These are functions from dataquieR that can be used to trigger single
data quality checks. Their use is recommended for rather specific
applications. For standard reports, it is usually easier to use the
dq_report2 function.
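As a minimal sketch of the standard-report route, the call below passes study data and a metadata workbook to dq_report2. The file names are hypothetical placeholders, and the argument names (study_data, meta_data_v2, label_col, dimensions) and the dimension labels are assumptions based on the package documentation; please check them against the installed dataquieR version before relying on them.

```r
# Minimal sketch, not a definitive workflow.
# Both file names are hypothetical placeholders.
library(dataquieR)

# Study data as a plain data frame (hypothetical CSV file).
study_data <- read.csv("my_study_data.csv")

# Assumption: the metadata workbook follows the spreadsheet-type
# structure with several tables and can be passed by file name.
report <- dq_report2(
  study_data   = study_data,
  meta_data_v2 = "my_metadata.xlsx",
  label_col    = "LABEL",                            # assumed label column in the item-level metadata
  dimensions   = c("Completeness", "Consistency")    # assumed dimension names
)
```

Restricting dimensions to a subset keeps a first run short; the single-check functions remain the better choice when only one specific indicator is needed.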
All of dataquieR's functions are linked to the underlying data quality
concept, as described in the table below.
The indicator functions are aided by 352 support functions. The main
task of these functions is to ensure stable operation of dataquieR in
light of potentially deficient data, which requires extensive data
preprocessing steps.
In Stata, the package dqrep can be used for data quality analyses. It
can be installed using the following command syntax:
net from https://packages.qihs.uni-greifswald.de/repository/stata/dqrep
net install dqrep, replace
net get dqrep, replace
Note: In the rare case of issues when installing dqrep from the
repository above, please contact us.
dqrep stands for “Data Quality REPorter”. This wrapper command triggers
an analysis pipeline to generate data quality assessments. Assessments
range from simple descriptive variable overviews to full-scale data
quality reports that cover missing data, extreme values, value
distributions, observer and device effects, or the time course of
measurements. Reports are provided as .pdf or .docx files, which are
accompanied by a data set of assessment results. Reports are highly
customizable and visualize the severity and number of data quality
issues. In addition, there are options for benchmarking results between
examinations and studies.
There are two fundamentally different approaches to running dqrep:
First, dqrep can be used to assess variables of the active dataset.
While most functionalities are available, checks that depend on
information that varies at the variable level (e.g. range violations)
cannot be performed. Any variable used in a certain role (e.g.
observervars, keyvars) must also be listed in varlist.
Second, dqrep can be used to perform checks of variables across a
number of datasets that are specified in the targetfiles option. In
addition, the metadatafile option can be used to specify a metadata
file that holds information on variables and checks. This allows for a
more flexible application to variables in distinct data sets, making
use of all implemented dqrep functionalities.
For more details on how to use dqrep, see this help file.