dataquieR provides many outputs ready to be integrated with a quality report. However, requirements are usually more specific. The following documentation can be used for adjusting outputs to meet specific requirements for the two most common types of output, data frames and ggplot2 graphics.
dataquieR
The basic example used in this documentation requires two objects which are mandatory for all dataquieR functions: the study data and the metadata. These data are loaded from the dataquieR package.
load(system.file("extdata", "study_data.RData", package = "dataquieR"))
sd1 <- study_data
load(system.file("extdata", "meta_data.RData", package = "dataquieR"))
md1 <- meta_data
The example output is generated using the dataquieR function com_item_missingness().
tab_ex1 <- com_item_missingness(study_data = sd1,
meta_data = md1,
threshold_value = 90,
include_sysmiss = TRUE,
show_causes = FALSE)
#> Warning: The argument "show_causes" has been deprecated. It will be ignored and in a future version be removed.
#> Warning: There are 60 meassurements of "v00000" for participants not being part of one of the segments "PART_STUDY"
#> Warning: There are 60 meassurements of "v00001" for participants not being part of one of the segments "PART_STUDY"
#> Warning: There are 16 meassurements of "v00033" for participants not being part of one of the segments "PART_STUDY", "PART_INTERVIEW"
#> Warning: There are 76 meassurements of "v00042" for participants not being part of one of the segments "PART_STUDY", "PART_INTERVIEW", "PART_QUESTIONNAIRE"
This function returns four objects: SummaryTable, SummaryData, SummaryPlot, and ReportSummaryTable. SummaryTable is a data frame and SummaryPlot is a ggplot2 object. The following steps show how to edit these two types of objects.
For the use of data frames in data quality reporting, there are two important aspects.
First, they should be displayed in a neat and comprehensible way. For this aspect, many packages exist, e.g. xtable, kableExtra, pixiedust, huxtable and DT, each of which integrates with some of the most common output formats supported by rmarkdown/pandoc, namely html, docx, pdf, and flexdashboard. For using these packages, we ask the reader to refer to the respective package documentation.
Second, given the size of data frames, there must be ways to filter and/or sort them, to add or remove columns, and to rename columns. For these tasks, a good choice is the tidyverse with the dplyr package.
Related to the ggplot2 graphics generated by dataquieR, which are discussed further below, the distinction between wide and long format is another aspect of working with tables. tidyr is one possible choice for transforming tables between long and wide format.
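The reshaping mentioned above can be sketched on a small toy table. The data frame and its column names (sysmiss, codes) are made up for illustration and are not dataquieR output; only pivot_longer() and pivot_wider() from tidyr are used.

```r
library(tidyr)

# Toy table in wide format (hypothetical values, not dataquieR output)
wide <- data.frame(
  Variables = c("v1", "v2"),
  sysmiss   = c(0, 12),
  codes     = c(3, 7)
)

# Wide -> long: one row per variable and type of missingness
long <- pivot_longer(wide,
                     cols = c(sysmiss, codes),
                     names_to = "type",
                     values_to = "n")

# Long -> wide again: one column per type of missingness
wide_again <- pivot_wider(long, names_from = type, values_from = n)
```

The long format is usually the more convenient input for ggplot2, whereas the wide format tends to be easier to read in a printed table.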
The simplest output of the data frame looks like this (only the first 10 rows are shown to reduce file size):
knitr::kable(head(tab_ex1$SummaryTable, 10))
Variables | Observations N | Sysmiss N (%) | Datavalues N (%) | Missing codes N (%) | Jumps N (%) | Measurements N (%) | GRADING |
---|---|---|---|---|---|---|---|
v00000 | 2940 | 0 (0) | 2940 (100) | 0 (0) | 0 (0) | 2940 (100) | 0 |
v00001 | 2940 | 0 (0) | 2940 (100) | 0 (0) | 0 (0) | 2940 (100) | 0 |
v00002 | 2940 | 0 (0) | 2940 (100) | 0 (0) | 0 (0) | 2940 (100) | 0 |
v00003 | 2940 | 0 (0) | 2940 (100) | 0 (0) | 0 (0) | 2940 (100) | 0 |
v00103 | 2940 | 0 (0) | 2940 (100) | 0 (0) | 0 (0) | 2940 (100) | 0 |
v01003 | 2940 | 0 (0) | 2940 (100) | 0 (0) | 0 (0) | 2940 (100) | 0 |
v01002 | 2940 | 0 (0) | 2940 (100) | 0 (0) | 0 (0) | 2940 (100) | 0 |
v10000 | 3000 | 60 (2) | 2940 (98) | 0 (0) | 0 (0) | 2940 (98) | 0 |
v00004 | 2940 | 239 (8.13) | 2701 (91.87) | 140 (4.76) | 0 (0) | 2561 (87.11) | 1 |
v00005 | 2940 | 233 (7.93) | 2707 (92.07) | 163 (5.54) | 0 (0) | 2544 (86.53) | 1 |
The table above comprises information on the missing values of the variables in the study data. Nevertheless, it is not the most appealing output. We may use some functionality of the kableExtra package and attach its formatting to the table using the pipe operator (%>%) provided by dplyr.
suppressPackageStartupMessages(library(dplyr))
library(kableExtra)
kable(tab_ex1$SummaryTable, "html") %>%
kable_styling(bootstrap_options = c("hover"))
Variables | Observations N | Sysmiss N (%) | Datavalues N (%) | Missing codes N (%) | Jumps N (%) | Measurements N (%) | GRADING |
---|---|---|---|---|---|---|---|
v00000 | 2940 | 0 (0) | 2940 (100) | 0 (0) | 0 (0) | 2940 (100) | 0 |
v00001 | 2940 | 0 (0) | 2940 (100) | 0 (0) | 0 (0) | 2940 (100) | 0 |
v00002 | 2940 | 60 (2.04) | 2880 (97.96) | 0 (0) | 0 (0) | 2880 (97.96) | 0 |
v00003 | 2940 | 60 (2.04) | 2880 (97.96) | 0 (0) | 0 (0) | 2880 (97.96) | 0 |
v00103 | 2940 | 60 (2.04) | 2880 (97.96) | 0 (0) | 0 (0) | 2880 (97.96) | 0 |
v01003 | 2940 | 60 (2.04) | 2880 (97.96) | 0 (0) | 0 (0) | 2880 (97.96) | 0 |
v01002 | 2940 | 60 (2.04) | 2880 (97.96) | 0 (0) | 0 (0) | 2880 (97.96) | 0 |
v10000 | 2940 | 60 (2.04) | 2880 (97.96) | 0 (0) | 0 (0) | 2880 (97.96) | 0 |
v00004 | 2940 | 299 (10.17) | 2641 (89.83) | 140 (4.76) | 0 (0) | 2501 (85.07) | 1 |
v00005 | 2940 | 293 (9.97) | 2647 (90.03) | 163 (5.54) | 0 (0) | 2484 (84.49) | 1 |
v00006 | 2940 | 306 (10.41) | 2634 (89.59) | 76 (2.59) | 0 (0) | 2558 (87.01) | 1 |
v00007 | 2940 | 287 (9.76) | 2653 (90.24) | 72 (2.45) | 0 (0) | 2581 (87.79) | 1 |
v00008 | 2940 | 285 (9.69) | 2655 (90.31) | 120 (4.08) | 0 (0) | 2535 (86.22) | 1 |
v00009 | 2940 | 280 (9.52) | 2660 (90.48) | 63 (2.14) | 0 (0) | 2597 (88.33) | 1 |
v00109 | 2940 | 298 (10.14) | 2642 (89.86) | 69 (2.35) | 0 (0) | 2573 (87.52) | 1 |
v00010 | 2940 | 296 (10.07) | 2644 (89.93) | 81 (2.76) | 0 (0) | 2563 (87.18) | 1 |
v00011 | 2940 | 149 (5.07) | 2791 (94.93) | 69 (2.35) | 0 (0) | 2722 (92.59) | 0 |
v00012 | 2940 | 140 (4.76) | 2800 (95.24) | 85 (2.89) | 0 (0) | 2715 (92.35) | 0 |
v00013 | 2940 | 60 (2.04) | 2880 (97.96) | 0 (0) | 0 (0) | 2880 (97.96) | 0 |
v20000 | 2940 | 60 (2.04) | 2880 (97.96) | 0 (0) | 0 (0) | 2880 (97.96) | 0 |
v00014 | 2940 | 232 (7.89) | 2708 (92.11) | 69 (2.35) | 0 (0) | 2639 (89.76) | 1 |
v00015 | 2940 | 242 (8.23) | 2698 (91.77) | 72 (2.45) | 0 (0) | 2626 (89.32) | 1 |
v00016 | 2940 | 308 (10.48) | 2632 (89.52) | 0 (0) | 0 (0) | 2632 (89.52) | 1 |
v00017 | 2940 | 60 (2.04) | 2880 (97.96) | 0 (0) | 0 (0) | 2880 (97.96) | 0 |
v30000 | 2940 | 60 (2.04) | 2880 (97.96) | 0 (0) | 0 (0) | 2880 (97.96) | 0 |
v00018 | 2940 | 148 (5.03) | 2792 (94.97) | 380 (12.93) | 0 (0) | 2412 (82.04) | 1 |
v01018 | 2924 | 159 (5.44) | 2765 (94.56) | 416 (14.23) | 0 (0) | 2349 (80.34) | 1 |
v00019 | 2924 | 198 (6.77) | 2726 (93.23) | 413 (14.12) | 0 (0) | 2313 (79.1) | 1 |
v00020 | 2924 | 202 (6.91) | 2722 (93.09) | 432 (14.77) | 0 (0) | 2290 (78.32) | 1 |
v00021 | 2924 | 236 (8.07) | 2688 (91.93) | 428 (14.64) | 0 (0) | 2260 (77.29) | 1 |
v00022 | 2924 | 224 (7.66) | 2700 (92.34) | 448 (15.32) | 0 (0) | 2252 (77.02) | 1 |
v00023 | 2924 | 247 (8.45) | 2677 (91.55) | 451 (15.42) | 0 (0) | 2226 (76.13) | 1 |
v00024 | 2924 | 259 (8.86) | 2665 (91.14) | 449 (15.36) | 0 (0) | 2216 (75.79) | 1 |
v00025 | 2924 | 1681 (57.49) | 1243 (42.51) | 513 (17.54) | 0 (0) | 730 (24.97) | 1 |
v00026 | 2924 | 320 (10.94) | 2604 (89.06) | 481 (16.45) | 0 (0) | 2123 (72.61) | 1 |
v00027 | 2924 | 289 (9.88) | 2635 (90.12) | 499 (17.07) | 1113 (38.06) | 1023 (56.49) | 1 |
v00028 | 2924 | 311 (10.64) | 2613 (89.36) | 515 (17.61) | 0 (0) | 2098 (71.75) | 1 |
v00029 | 2924 | 350 (11.97) | 2574 (88.03) | 519 (17.75) | 1066 (36.46) | 989 (53.23) | 1 |
v00030 | 2924 | 1809 (61.87) | 1115 (38.13) | 550 (18.81) | 0 (0) | 565 (19.32) | 1 |
v00031 | 2924 | 386 (13.2) | 2538 (86.8) | 556 (19.02) | 0 (0) | 1982 (67.78) | 1 |
v00032 | 2924 | 382 (13.06) | 2542 (86.94) | 332 (11.35) | 0 (0) | 2210 (75.58) | 1 |
v00033 | 2924 | 60 (2.05) | 2864 (97.95) | 0 (0) | 0 (0) | 2864 (97.95) | 0 |
v40000 | 2924 | 60 (2.05) | 2864 (97.95) | 0 (0) | 0 (0) | 2864 (97.95) | 0 |
v00034 | 2924 | 453 (15.49) | 2471 (84.51) | 299 (10.23) | 0 (0) | 2172 (74.28) | 1 |
v00035 | 2864 | 479 (16.72) | 2385 (83.28) | 324 (11.31) | 0 (0) | 2061 (71.96) | 1 |
v00036 | 2864 | 491 (17.14) | 2373 (82.86) | 325 (11.35) | 0 (0) | 2048 (71.51) | 1 |
v00037 | 2864 | 483 (16.86) | 2381 (83.14) | 374 (13.06) | 0 (0) | 2007 (70.08) | 1 |
v00038 | 2864 | 552 (19.27) | 2312 (80.73) | 374 (13.06) | 0 (0) | 1938 (67.67) | 1 |
v00039 | 2864 | 563 (19.66) | 2301 (80.34) | 389 (13.58) | 0 (0) | 1912 (66.76) | 1 |
v00040 | 2864 | 531 (18.54) | 2333 (81.46) | 401 (14) | 0 (0) | 1932 (67.46) | 1 |
v00041 | 2864 | 560 (19.55) | 2304 (80.45) | 427 (14.91) | 0 (0) | 1877 (65.54) | 1 |
v00042 | 2864 | 60 (2.09) | 2804 (97.91) | 0 (0) | 0 (0) | 2804 (97.91) | 0 |
v50000 | 2864 | 60 (2.09) | 2804 (97.91) | 0 (0) | 0 (0) | 2804 (97.91) | 0 |
The table above is getting very long. Another possibility is to use paged output of data frames. To enable it, add the line df_print: paged under output in the YAML header. Simply calling the data frame then allows browsing its rows and columns. Alternatively, you may use the DT package, even as the default printer for data.frames.
tab_ex1$SummaryTable
To use DT, you would have to add a chunk like the following to your R Markdown file:
```{r include=FALSE}
library(knitr)
library(DT)
knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)
knit_print.data.frame = function(x, ...) { knit_print(DT::datatable(x), ...) }
registerS3method("knit_print", "data.frame", knit_print.data.frame)
```
The column Observations N varies only slightly between rows and, if it is not needed in the report, it can be removed. This is achieved with the minus operator (-) inside select():
tab_ex1$SummaryTable %>%
select(-'Observations N')
The column Variables contains rather technical variable names that do not allow the content to be interpreted. For this reason, all dataquieR functions have an option called label_col. The selected label can be any column in the metadata; our model suggests naming that column LABEL. For the time being, the labels must be valid in R formulas, which means that they should basically not contain characters other than letters or numbers. We plan to relax this condition.
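A rough way to screen labels before passing them to label_col is to test whether they are syntactically valid R names. This is only a proxy for the requirement described above and not a check provided by dataquieR; the labels below are hypothetical.

```r
# Hypothetical labels; only the first is a syntactically valid R name
labels <- c("SBP_0", "Blood pressure", "1st_visit")

# make.names() rewrites invalid names, so entries it leaves unchanged are valid
is_valid <- labels == make.names(labels)

# Labels that would need to be revised
labels[!is_valid]
```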
tab_ex2 <- com_item_missingness(study_data = sd1,
meta_data = md1,
threshold_value = 90,
label_col = "LABEL",
include_sysmiss = TRUE,
show_causes = FALSE)
#> Warning: The argument "show_causes" has been deprecated. It will be ignored and in a future version be removed.
#> Warning: There are 60 meassurements of "CENTER_0" for participants not being part of one of the segments "PART_STUDY"
#> Warning: There are 60 meassurements of "PSEUDO_ID" for participants not being part of one of the segments "PART_STUDY"
#> Warning: There are 16 meassurements of "INT_DT_0" for participants not being part of one of the segments "PART_STUDY", "PART_INTERVIEW"
#> Warning: There are 76 meassurements of "QUEST_DT_0" for participants not being part of one of the segments "PART_STUDY", "PART_INTERVIEW", "PART_QUESTIONNAIRE"
tab_ex2$SummaryTable %>%
select(-'Observations N')
Maybe we want to sort columns or rows. This can also be achieved with dplyr functions:
tab_ex2$SummaryTable %>%
select(-'Observations N') %>%
arrange(desc(`Measurements N (%)`))
Sorting by the number of measurements is a bit more complicated for now, because dataquieR currently returns text in these columns, so the previous call sorts the values as text rather than as numbers. The numbers can be extracted using the following code:
splitted_measurements_col <- # this will be a list of character vectors of length 2 (part before and part after the '(' character for each row)
strsplit(tab_ex2$SummaryTable$`Measurements N (%)`, # the measurement count column
'(', # split at the opening bracket
fixed = TRUE # fixed string match, no pattern match
)
percent_part_in_col <- # this will be a character vector of the percentages
unlist( # we don't want to have a list but a vector of percentages as usually for data frame columns
lapply(splitted_measurements_col, `[[`, 2) # select the second entry of each entry in the list
)
sort_order <- as.numeric(sub(')', '', percent_part_in_col, fixed = TRUE)) # remove the closing bracket and convert the characters to numbers
tab_ex2$SummaryTable %>%
select(-'Observations N') %>%
arrange(desc(sort_order))
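The same extraction can be written more compactly with regular expressions. The following is a minimal sketch on a toy character vector in the "N (%)" text format shown in the tables above; the vector itself is made up for illustration.

```r
# Toy column in the "N (%)" text format used by the summary table
measurements <- c("2940 (100)", "2561 (87.11)", "730 (24.97)")

# Keep only the number between the brackets and convert it
pct <- as.numeric(sub("^.*\\((.*)\\)$", "\\1", measurements))

# Keep only the count in front of the opening bracket
n <- as.integer(sub(" *\\(.*$", "", measurements))

# Row order for sorting by percentage, largest first
order(pct, decreasing = TRUE)
```

Either vector can then be passed to arrange() in the same way as sort_order.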
Maybe the columns should be in some other order too:
tab_ex2$SummaryTable %>%
select(-'Observations N') %>% # Observations N must be removed before everything() is used in the next row, so we keep two lines.
select(`Variables`, `Measurements N (%)`, everything()) # everything() adds all columns not yet selected.
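Renaming columns, mentioned earlier as a typical requirement, fits into the same pipeline style. The sketch below uses a toy data frame that mimics two columns of the summary table; the values and the new column names are hypothetical.

```r
library(dplyr)

# Toy table mimicking two columns of the summary table (hypothetical values)
tab <- data.frame(
  Variables = c("SBP_0", "DBP_0"),
  `Measurements N (%)` = c("2561 (87.11)", "2544 (86.53)"),
  check.names = FALSE  # keep the non-syntactic column name as it is
)

# rename(new = old); backticks are needed for non-syntactic names
tab_renamed <- tab %>%
  rename(Variable = Variables,
         `Available measurements` = `Measurements N (%)`)

names(tab_renamed)
```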
ggplot2
The versatile ggplot2 package provides possibilities to modify graphics after they have been created, to render them in vector formats, and even to extract the underlying data. It is handy for interfacing with user code. Also, ggplot2 is built on a comprehensive concept, a grammar of graphics, which makes it highly structured and its code easy to understand. For more advice about the ggplot2 package, we kindly refer to the vignettes of that package:
browseVignettes(package = "ggplot2")
The package dataquieR generates two types of ggplot2 objects: a single SummaryPlot and a SummaryPlotList. The latter is used if several plots are generated, typically one for each variable of the study data. As the handling and manipulation of a single SummaryPlot is more straightforward, we use a plot list as the example, generated by the dataquieR function acc_distributions:
ex1 <- acc_distributions(resp_vars = NULL,
group_vars = NULL,
label_col = "LABEL",
study_data = sd1,
meta_data = md1)
#> Warning: All variables defined to be integer or float in the metadata are used by acc_distributions.
#> Warning: Variable 'PART_STUDY' (resp_vars) has fewer distinct values than required
#> Warning: Variable 'PART_PHYS_EXAM' (resp_vars) has fewer distinct values than required
#> Warning: Variable 'PART_LAB' (resp_vars) has fewer distinct values than required
#> Warning: Variable 'MEDICATION_0' (resp_vars) has fewer distinct values than required
#> Warning: In "resp_vars", variables "PART_LAB", "PART_PHYS_EXAM", "MEDICATION_0", "PART_STUDY" were excluded.
#> Warning: For GLOBAL_HEALTH_VAS_0, ARM_CIRC_DISC_0, ARM_CUFF_0, BSG_0, DEV_NO_0, EDUCATION_0, EDUCATION_1, FAM_STAT_0, MARRIED_0, N_CHILD_0, EATING_PREFS_0, MEAT_CONS_0, SMOKING_0, SMOKE_SHOP_0, N_INJURIES_0, N_BIRTH_0, INCOME_GROUP_0, PREGNANT_0, N_ATC_CODES_0, PART_INTERVIEW, ITEM_1_0, ITEM_2_0, ITEM_3_0, ITEM_5_0, ITEM_6_0, ITEM_7_0, ITEM_8_0, PART_QUESTIONNAIRE, there is no metadata on expected location or expected proportions available.
This yields a set of 39 figures, all of which are ggplot2 objects:
unique(unlist(lapply(ex1$SummaryPlotList, class)))
#> [1] "gg" "ggplot"
There is a package named ggedit for editing ggplot2 objects easily. Nevertheless, the following sections discuss the basics of doing so by hand. For more complex adjustments, we recommend ggedit.
To list them all, simply printing ex1$SummaryPlotList works, but this will also print the “normal” output of printing a list, i.e. the names or numbers of all its elements. To avoid this, you can print each element of the list separately:
# for (i in 1:length(ex1$SummaryPlotList)) # substituted by the next row to shorten the output of this vignette:
for (i in head(seq_along(ex1$SummaryPlotList), 4)) {
print(ex1$SummaryPlotList[[i]])
}
Of course, an apply-style iteration would be possible too, but for the purpose of plotting figures, the for loop fits perfectly.
Using this code, all figures are printed one below the other. To have them in columns, the chunk option out.width can be handy. rmarkdown places figures side by side if the current row is not yet filled, so something like out.width=c('50%', '50%') can be used to achieve a two-column image list.
Another possibility to arrange lists of plots is the ggpubr package, which handles a specific format for lists of ggplot2 objects.
ggpubr::ggarrange(plotlist = ex1$SummaryPlotList[1:4])
An alternative to ggpubr is the patchwork package, which provides a very intuitive way of aligning ggplot2 graphics:
library(patchwork)
p1 <- ex1$SummaryPlotList[[1]]
p2 <- ex1$SummaryPlotList[[2]]
p3 <- ex1$SummaryPlotList[[3]]
p1 | (p2 / p3)
See the patchwork vignette for more details.
Please note that the plot has obviously been rotated, so the x/y coordinates may not always be intuitive in the following. There are reasons for rotating histograms that way, but one of the following examples will re-rotate the plot to the more common presentation with the counts on the y-axis.
As an example of manipulating figures, we first want to add a red line. This is easily achieved with ggplot2's + operator. We use the annotate function instead of the geom_* functions to draw objects that are not directly mapped (by aes) to specific data points/samples; this avoids redundantly plotting the very same object for each data point/sample:
library(ggplot2)
print(
ex1$SummaryPlotList[[3]] +
annotate("segment", x = -Inf, xend = Inf, y = 0, yend = 0, colour = "red")
)
Then, we may like to highlight the largest bin in red. For this, we need to access the bins calculated by geom_histogram, which the ggplot_build function makes accessible for ggplot2 objects:
p <- ex1$SummaryPlotList[[3]] # choose the third figure generated by dataquieR.
x <- ggplot_build(p) # make its graphical properties accessible.
largest_bin <- which.max(x[["data"]][[1]][["count"]]) # find the largest bin.
print(x[["data"]][[1]][largest_bin, c("xmin", "xmax", "ymin", "ymax")]) # this would print out the cartesian coordinates of the largest bin.
#> xmin xmax ymin ymax
#> 17 50.55 51.45 0 264
# see also the helpful contribution there: https://community.rstudio.com/t/geom-histogram-max-bin-height/10026
print( # print
p + # the plot
annotate("segment", x = -Inf, xend = Inf, y = 0, yend = 0, colour = "red") + # annotate it with the red line again
annotate("rect", # and highlight the largest bin by overplotting it with red framed black rectangle.
xmin = x[["data"]][[1]]$xmin[[largest_bin]],
xmax = x[["data"]][[1]]$xmax[[largest_bin]],
ymin = x[["data"]][[1]]$ymin[[largest_bin]],
ymax = x[["data"]][[1]]$ymax[[largest_bin]], color = "red")
)
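The lookup of the largest bin can also be tried out independently of dataquieR. The sketch below builds a toy histogram with a known peak; the data frame toy and the object names are made up for illustration.

```r
library(ggplot2)

# Toy data with a known peak: the value 5 occurs most often
toy <- data.frame(x = c(1, 2, 2, 5, 5, 5, 5, 8))

p_toy <- ggplot(toy, aes(x = x)) +
  geom_histogram(binwidth = 1)

# The first layer of the built plot holds one row per histogram bin
bins <- ggplot_build(p_toy)[["data"]][[1]]

# Row of the bin with the highest count, with its horizontal limits
biggest <- bins[which.max(bins$count), c("xmin", "xmax", "count")]
biggest
```

The columns xmin, xmax, ymin and ymax of such a row are the values that the annotate("rect", ...) call above expects.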
Unfortunately, the documentation of the annotate function is maybe a bit sparse. The geom parameter refers to existing implementations of graphics in ggplot2, all of which are prefixed with geom_. Usually, they extract their coordinates from the data using the mapping given in the aes parameter of the whole ggplot2 object or of the specific geom. A useful geom besides segment and rect is text, for actually annotating the plot:
print( # print
p + # the plot
annotate("segment", x = -Inf, xend = Inf, y = 0, yend = 0, colour = "red") + # annotate it with the red line again
annotate("rect", # and highlight the largest bin by overplotting it with red framed black rectangle.
xmin = x[["data"]][[1]]$xmin[[largest_bin]],
xmax = x[["data"]][[1]]$xmax[[largest_bin]],
ymin = x[["data"]][[1]]$ymin[[largest_bin]],
ymax = x[["data"]][[1]]$ymax[[largest_bin]], color = "red") +
annotate("text", label = "Largest bin", x = x[["data"]][[1]]$xmax[[largest_bin]], y = x[["data"]][[1]]$ymax[[largest_bin]], angle = 270, vjust = -.5)
)
You may consult the documentation of ggplot2::annotate for some examples.
Coordinates are given in the same coordinate system that is shown in the plot, so drawing a line at 100 observations is as easy as directly choosing 100 as the y coordinate.
print( # print
p + # the plot
annotate("segment", x = -Inf, xend = Inf, y = 100, yend = 100, colour = "red") + # annotate it with the red line again
annotate("segment", x = -Inf, xend = Inf, y = 0, yend = 0, colour = "red") + # annotate it with the red line again
annotate("rect", # and highlight the largest bin by overplotting it with red framed black rectangle.
xmin = x[["data"]][[1]]$xmin[[largest_bin]],
xmax = x[["data"]][[1]]$xmax[[largest_bin]],
ymin = x[["data"]][[1]]$ymin[[largest_bin]],
ymax = x[["data"]][[1]]$ymax[[largest_bin]], color = "red") +
annotate("text", label = "Largest bin", x = x[["data"]][[1]]$xmax[[largest_bin]], y = x[["data"]][[1]]$ymax[[largest_bin]], angle = 270, vjust = -.5)
)
As promised above, we will now re-rotate the whole plot.
p2 <- p + # the plot
annotate("segment", x = -Inf, xend = Inf, y = 100, yend = 100, colour = "red") + # annotate it with the red line again
annotate("segment", x = -Inf, xend = Inf, y = 0, yend = 0, colour = "red") + # annotate it with the red line again
annotate("rect", # and highlight the largest bin by overplotting it with red framed black rectangle.
xmin = x[["data"]][[1]]$xmin[[largest_bin]],
xmax = x[["data"]][[1]]$xmax[[largest_bin]],
ymin = x[["data"]][[1]]$ymin[[largest_bin]],
ymax = x[["data"]][[1]]$ymax[[largest_bin]], color = "red") +
annotate("text", label = "Largest bin", x = x[["data"]][[1]]$xmax[[largest_bin]], y = x[["data"]][[1]]$ymax[[largest_bin]], angle = 0, vjust = -.5)
# coord_cartesian() restores the original cartesian coordinate system, replacing the flipped one introduced by acc_distributions.
# However, it emits a message about replacing the coordinate system, which we suppress here with suppressMessages().
suppressMessages(p2 + coord_cartesian())
Note that neither ggplot2::coord_flip nor ggpubr::rotate can solve this issue. These functions are not aware of already-rotated plots, so the following will not rotate the plot back:
p2 + coord_flip() # does not rotate the plot but prints
#> Coordinate system already present. Adding new coordinate system, which will
#> replace the existing one.
# Coordinate system already present. Adding new coordinate
# system, which will replace the existing one.
p2 + ggpubr::rotate() # does not rotate the plot but prints
#> Coordinate system already present. Adding new coordinate system, which will
#> replace the existing one.
# Coordinate system already present. Adding new coordinate
# system, which will replace the existing one.
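The behaviour can be reproduced on a toy plot without dataquieR. The sketch below assumes that the coordinate system of a ggplot2 object can be inspected via its coordinates element; the plot itself is built from the mtcars data shipped with R.

```r
library(ggplot2)

# Toy histogram that is already flipped
p_flipped <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10) +
  coord_flip()

# A second coord_flip() only replaces the first one, so the plot stays flipped
still_flipped <- suppressMessages(p_flipped + coord_flip())
inherits(still_flipped$coordinates, "CoordFlip")

# coord_cartesian() replaces the flipped system with the default one
unflipped <- suppressMessages(p_flipped + coord_cartesian())
inherits(unflipped$coordinates, "CoordFlip")
```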
All functions of the dataquieR package use the data as they are imported, i.e. variables of the study data can be examined and used for grouping/stratification of results. All information on these variables must be attached to the metadata. In some situations, particularly during exploratory data quality reporting, it is necessary to use a newly calculated/transformed variable. Naturally, the respective information is not defined in the metadata. This peculiarity would preclude the use of such calculated or transformed variables in data quality reporting.
The need for a helper function is illustrated with the following example from com_segment_missingness():
The SummaryPlot shows the frequency of observations in which all measurements of respective study segments are missing.
Exploring the segment missingness over time would require another variable in the study data. We will generate such a variable using the lubridate package.
sd1$exq <- as.integer(lubridate::quarter(sd1$v00013))
table(sd1$exq)
#>
#> 1 2 3 4
#> 724 713 776 727
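To see what lubridate::quarter() does in isolation, it can be applied to a few hand-picked dates. The dates below are hypothetical and not taken from the study data.

```r
library(lubridate)

# Hypothetical examination dates
dates <- as.Date(c("2018-01-15", "2018-05-01", "2018-12-31"))

# quarter() maps each date to the calendar quarter it falls in
exq <- as.integer(quarter(dates))
exq

# Frequency of each quarter, as in the table() call above
table(exq)
```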
Information regarding this variable is then added to a copy of the metadata (md2) using the dataquieR function prep_add_to_meta():
md2 <- dataquieR::prep_add_to_meta(VAR_NAMES = "exq",
DATA_TYPE = "integer",
LABEL = "EX_QUARTER_0",
VALUE_LABELS = "1 = 1st | 2 = 2nd | 3 = 3rd | 4 = 4th",
VARIABLE_ROLE = "process",
MISSING_LIST = "",
meta_data = md1)
MissSegs <- com_segment_missingness(study_data = sd1,
meta_data = md2,
threshold_value = 1,
label_col = LABEL,
group_vars = "EX_QUARTER_0",
direction = "high",
exclude_roles = "process")
#> Warning: No specification of color gradient direction found. The function interprets values above the threshold as violations.
#> Warning: "direction" is deprecated.
#> Warning: Study variables: "ARM_CUFF_0", "USR_VO2_0", "USR_BP_0", "EXAM_DT_0", "DEV_NO_0", "LAB_DT_0", "USR_SOCDEM_0", "INT_DT_0", "QUEST_DT_0" are not considered due to their VARIABLE_ROLE.
MissSegs$SummaryPlot