The des_summary function provides descriptive statistics for numerical and categorical variables in the study data. Depending on the type of data, the function provides the appropriate measures of central tendency (i.e., mean, median, and mode); measures of dispersion (i.e., standard deviation, interquartile range, median absolute deviation, range of values, and coefficient of variation); information on the number of categories and their frequencies; information on the shape of the distribution (skewness and kurtosis); and information on missing data. It also provides plots that give an overview of the data distribution. The derived functions des_summary_categorical and des_summary_continuous provide only the descriptive statistics appropriate for variables of the data type named in the function. The functions also work without metadata.
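To make the listed measures concrete, the following base-R sketch computes them for a small toy vector. It is not the dataquieR implementation: the data are invented, and the moment-based formulas for skewness and kurtosis are only one of several conventions, chosen here as an assumption. The MAD values reported later in this tutorial (for example 4.448 for AGE_0, which equals 1.4826 × 3) are consistent with the scaled median absolute deviation returned by R's mad(), so that function is used here.

```r
# Toy numeric vector with one missing value (invented data, not from the study)
x <- c(47, 50, 51, 51, 53, 49, NA)
valid <- x[!is.na(x)]

# Central tendency
m <- mean(valid)
med <- median(valid)
# Base R has no statistical mode function; take the most frequent value(s)
tab <- table(valid)
mode_val <- as.numeric(names(tab)[as.vector(tab) == max(tab)])

# Dispersion
s <- sd(valid)
iqr <- IQR(valid)          # default quantile type 7
mad_v <- mad(valid)        # median absolute deviation, scaled by 1.4826
rng <- range(valid)
cv <- 100 * s / m          # coefficient of variation in percent

# Shape (assumption: simple moment-based estimators)
z <- valid - m
skew <- mean(z^3) / mean(z^2)^1.5
kurt <- mean(z^4) / mean(z^2)^2 - 3   # excess kurtosis

# Missing data
n_missing <- sum(is.na(x))
```

The coefficient of variation is expressed in percent here because the tutorial's own output suggests that scale: for AGE_0, 100 × 4.423 / 49.914 is approximately 8.86.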
des_summary(
  resp_vars = NULL,
  study_data = sd1,
  label_col = LABEL,
  meta_data = md1
)
The function has the following arguments:
study_data
are assessed; prep_load_workbook_like_file(), prep_load_folder_with_metadata(), or prep_get_data_frame();

To illustrate the output, we use the example synthetic data and metadata that are bundled with the dataquieR package. See the introductory tutorial for instructions on importing these files into R, as well as details on their structure and contents.
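library(dataquieR)
# Load the example metadata workbook bundled with dataquieR
prep_load_workbook_like_file("meta_data_v2")
# Fetch the bundled synthetic study data
sd1 <- prep_get_data_frame("study_data")
# Compute descriptive statistics for all variables, labelled by the LABEL column
des_sum <- des_summary(
  study_data = sd1,
  label_col = LABEL
)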
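As a hedged sketch of the other entry points mentioned above, the following example restricts the summary to selected variables through resp_vars and calls the two derived functions. It assumes that des_summary_continuous and des_summary_categorical accept the same study_data and label_col arguments as des_summary, and that AGE_0 and SBP_0 are valid labels in the example metadata; check the package reference before relying on either assumption. The requireNamespace() guard only skips the example when dataquieR is not installed.

```r
# Sketch only: skipped when dataquieR is not installed
if (requireNamespace("dataquieR", quietly = TRUE)) {
  library(dataquieR)

  # Same setup as in the tutorial
  prep_load_workbook_like_file("meta_data_v2")
  sd1 <- prep_get_data_frame("study_data")

  # Restrict the summary to two variables via resp_vars
  # (assumption: AGE_0 and SBP_0 are labels in the example metadata)
  des_sub <- des_summary(
    resp_vars = c("AGE_0", "SBP_0"),
    study_data = sd1,
    label_col = LABEL
  )

  # Assumption: the derived functions take the same arguments as des_summary
  des_cont <- des_summary_continuous(study_data = sd1, label_col = LABEL)
  des_cat <- des_summary_categorical(study_data = sd1, label_col = LABEL)
}
```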
The function generates two outputs, SummaryData and SummaryTable, which are identical in this case but are used differently when a report is created.
Output 1: Summary data frame
The summary data frame is accessed with des_sum$SummaryData. It can be displayed either as an interactive data.tables table:
des_sum
or as a kable:
|  | Variables | Type | STUDY_SEGMENT | Mean | SD | Median | Mode | IQR (Quartiles) | MAD | Range (Min - Max) | CV | Skewness (SE) | Kurtosis | No. categories (incl. NAs) | Frequency table | Valid | Missing | Graph |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Examination center CENTER_0 v00000 | nominal, integer | STUDY |  |  |  | Berlin |  |  |  |  |  |  | 5 |  | 3000 (100%) | 0 (0%) |  |
| 31 | Sex B/L SEX_0 v00002 | nominal, integer | STUDY |  |  |  | females |  |  |  |  |  |  | 3 |  | 2940 (98%) | 60 (2%) |  |
| 42 | Age B/L AGE_0 v00003 | ratio, integer | STUDY | 49.914 | 4.423 | 50 | 51 | 6 (Q1 = 47 \| Q3 = 53) | 4.448 | 30 (33 - 63) | 8.862 | -0.037 (0.045) | -0.176 |  |  | 2940 (98%) | 60 (2%) |  |
| 15 | Age group B/L AGE_GROUP_0 v00103 | ordinal, string | STUDY |  |  | 50-59 | 50-59 | 1 (Q1 = 40-49 \| Q3 = 50-59) |  | (30-39 - 60-69) |  |  |  | 5 |  | 2940 (98%) | 60 (2%) |  |
| 20 | Age F/U AGE_1 v01003 | ratio, integer | STUDY | 49.872 | 4.429 | 50 | 51 | 6 (Q1 = 47 \| Q3 = 53) | 4.448 | 30 (33 - 63) | 8.881 | -0.032 (0.045) | -0.186 |  |  | 2940 (98%) | 60 (2%) |  |
| 25 | Sex F/U SEX_1 v01002 | nominal, integer | STUDY |  |  |  | females |  |  |  |  |  |  | 3 |  | 2940 (98%) | 60 (2%) |  |
| 43 | Study consent PART_STUDY v10000 | nominal, integer | STUDY |  |  |  | yes |  |  |  |  |  |  | 2 |  | 2940 (98%) | 60 (2%) |  |
| 53 | Systolic blood pressure SBP_0 v00004 | ratio, float | PHYS_EXAM | 126.516 | 9.613 | 127 | 130 | 13 (Q1 = 120 \| Q3 = 133) | 8.896 | 63 (97 - 160) | 7.598 | 0.064 (0.045) | -0.564 |  |  | 2561 (85.37%) | 439 (14.63%) |  |
| 2 | Diastolic blood pressure DBP_0 v00005 | ratio, float | PHYS_EXAM | 81.29 | 9.214 | 81 | 80 | 12 (Q1 = 75 \| Q3 = 87) | 8.896 | 57 (54 - 111) | 11.335 | 0.081 (0.045) | -0.556 |  |  | 2544 (84.8%) | 456 (15.2%) |  |
| 3 | Self-reported global health GLOBAL_HEALTH_VAS_0 v00006 | ratio, float | PHYS_EXAM | 5.027 | 2.918 | 5 | 3.2 | 5.075 (Q1 = 2.5 \| Q3 = 7.575) | 3.706 | 10 (0 - 10) | 58.051 | -0.002 (0.045) | -1.437 |  |  | 2618 (87.27%) | 382 (12.73%) |  |
| 4 | Known asthma ASTHMA_0 v00007 | nominal, integer | PHYS_EXAM |  |  |  | no |  |  |  |  |  |  | 3 |  | 2641 (88.03%) | 359 (11.97%) |  |
| 9 | Aerobic capacity category VO2_CAPCAT_0 v00008 | ordinal, string | PHYS_EXAM |  |  | good | excellent | 3 (Q1 = excellent \| Q3 = restricted) |  | (excellent - pathological) |  |  |  | 6 |  | 2595 (86.5%) | 405 (13.5%) |  |
| 10 | Upper arm circumference ARM_CIRC_0 v00009 | ratio, float | PHYS_EXAM | 25.033 | 3.958 | 25 | 24 | 6 (Q1 = 22 \| Q3 = 28) | 4.448 | 27 (11 - 38) | 15.81 | -0.024 (0.045) | -0.359 |  |  | 2657 (88.57%) | 343 (11.43%) |  |
| 11 | Upper arm circumference cat ARM_CIRC_DISC_0 v00109 | ordinal, integer | PHYS_EXAM |  |  | (20,30] | (20,30] | 0 (Q1 = (20,30] \| Q3 = (20,30]) |  | ((-Inf,20] - (30, Inf]) |  |  |  | 4 |  | 2633 (87.77%) | 367 (12.23%) |  |
| 12 | Upper arm circumference device ARM_CUFF_0 v00010 | ordinal, integer | PHYS_EXAM |  |  | (20,30] | (20,30] | 0 (Q1 = (20,30] \| Q3 = (20,30]) |  | ((-Inf,20] - (30, Inf]) |  |  |  | 4 |  | 2623 (87.43%) | 377 (12.57%) |  |
| 13 | Aerobic capacity examiner USR_VO2_0 v00011 | nominal, string | PHYS_EXAM |  |  |  | USR_321 |  |  |  |  |  |  | 16 |  | 2782 (92.73%) | 218 (7.27%) |  |
| 14 | Blood pressure examiner USR_BP_0 v00012 | nominal, string | PHYS_EXAM |  |  |  | USR_301 |  |  |  |  |  |  | 16 |  | 2775 (92.5%) | 225 (7.5%) |  |
| 16 | Examination date and time EXAM_DT_0 v00013 | interval, datetime | PHYS_EXAM | 2018-07-02 10:09:59 UTC | 3.45 months | 2018-07-05 19:45:30 UTC | 2018-03-21 20:44:05 UTC and other 4 dates | 5.84 months (Q1 = 2018-04-03 04:24:05 UTC \| Q3 = 2018-09-27 19:32:04 UTC) | 4.38 months | 11 months, 4 weeks, 2 days (2018-01-01 UTC - 2018-12-31 UTC) |  |  |  |  |  | 2940 (98%) | 60 (2%) |  |
| 18 | Physical exam consent PART_PHYS_EXAM v20000 | nominal, integer | PHYS_EXAM |  |  |  | yes |  |  |  |  |  |  | 2 |  | 2940 (98%) | 60 (2%) |  |
| 19 | C-reactive protein CRP_0 v00014 | ratio, float | LAB | 2.888 | 1.805 | 2.587 | 0.16 | 2.27 (Q1 = 1.608 \| Q3 = 3.878) | 1.637 | 11.894 (0.118 - 12.012) | 62.507 | 0.897 (0.045) | 0.998 |  |  | 2699 (89.97%) | 301 (10.03%) |  |
| 21 | Erythrocyte sedimentation rate BSG_0 v00015 | ratio, float | LAB | 14.857 | 12.135 | 11 | 10 | 14 (Q1 = 6 \| Q3 = 20) | 10.378 | 96 (0 - 96) | 81.677 | 1.377 (0.045) | 2.678 |  |  | 2686 (89.53%) | 314 (10.47%) |  |
| 22 | Device ID DEV_NO_0 v00016 | nominal, integer | LAB |  |  |  | 2 |  |  |  |  |  |  | 6 |  | 2692 (89.73%) | 308 (10.27%) |  |
| 23 | Lab analysis date and time LAB_DT_0 v00017 | interval, datetime | LAB | 2018-07-02 12:00:36 UTC | 3.45 months | 2018-07-05 21:50:00 UTC | 2018-01-01 02:00:00 UTC and other 60 dates | 5.83 months (Q1 = 2018-04-03 06:21:05 UTC \| Q3 = 2018-09-27 20:03:04 UTC) | 4.38 months | 11 months, 4 weeks, 2 days, 1 minute (2018-01-01 02:00:00 UTC - 2018-12-31 02:01:00 UTC) |  |  |  |  |  | 2940 (98%) | 60 (2%) |  |
| 24 | Lab analysis consent PART_LAB v30000 | nominal, integer | LAB |  |  |  | yes |  |  |  |  |  |  | 2 |  | 2940 (98%) | 60 (2%) |  |
| 26 | Highest educational level B/L EDUCATION_0 v00018 | ordinal, integer | INTERVIEW |  |  | uppersecond | uppersecond | 2 (Q1 = secondary \| Q3 = postsecond) |  | (pre-primary - secondtertiary) |  |  |  | 8 |  | 2472 (82.4%) | 528 (17.6%) |  |
| 28 | Highest educational level F/U EDUCATION_1 v01018 | ordinal, integer | INTERVIEW |  |  | uppersecond | uppersecond | 2 (Q1 = secondary \| Q3 = postsecond) |  | (pre-primary - secondtertiary) |  |  |  | 8 |  | 2422 (80.73%) | 578 (19.27%) |  |
| 29 | Marital status FAM_STAT_0 v00019 | nominal, integer | INTERVIEW |  |  |  | NA |  |  |  |  |  |  | 1 |  | 0 (0%) | 3000 (100%) |  |
| 30 | Currently married MARRIED_0 v00020 | nominal, integer | INTERVIEW |  |  |  | no |  |  |  |  |  |  | 3 |  | 2366 (78.87%) | 634 (21.13%) |  |
| 32 | Number of children N_CHILD_0 v00021 | ratio, integer | INTERVIEW | 2.499 | 1.53 | 2 | 2 | 2 (Q1 = 1 \| Q3 = 3) | 1.483 | 9 (0 - 9) | 61.21 | 0.491 (0.045) | -0.291 |  |  | 2336 (77.87%) | 664 (22.13%) |  |
| 33 | Eating preferences EATING_PREFS_0 v00022 | nominal, integer | INTERVIEW |  |  |  | none |  |  |  |  |  |  | 4 |  | 2328 (77.6%) | 672 (22.4%) |  |
| 34 | Meat consumption MEAT_CONS_0 v00023 | ordinal, integer | INTERVIEW |  |  | 1-2d a week | never | 2 (Q1 = never \| Q3 = 3-4d a week) |  | (never - daily) |  |  |  | 6 |  | 2302 (76.73%) | 698 (23.27%) |  |
| 35 | Current smoker SMOKING_0 v00024 | nominal, integer | INTERVIEW |  |  |  | no |  |  |  |  |  |  | 3 |  | 2292 (76.4%) | 708 (23.6%) |  |
| 36 | Purchasing tobacco products SMOKE_SHOP_0 v00025 | ordinal, integer | INTERVIEW |  |  | 3-4d a week | 3-4d a week | 2 (Q1 = 1-2d a week \| Q3 = 5-6d a week) |  | (never - daily) |  |  |  | 6 |  | 782 (26.07%) | 2218 (73.93%) |  |
| 37 | Number of injuries N_INJURIES_0 v00026 | ratio, integer | INTERVIEW | 4.588 | 2.422 | 4 | 4 | 3 (Q1 = 3 \| Q3 = 6) | 2.965 | 14 (0 - 14) | 52.792 | 0.474 (0.045) | -0.572 |  |  | 2199 (73.3%) | 801 (26.7%) |  |
| 38 | Number of births N_BIRTH_0 v00027 | ratio, integer | INTERVIEW | 3.458 | 1.771 | 3 | 3 | 2 (Q1 = 2 \| Q3 = 4) | 1.483 | 12 (0 - 12) | 51.212 | 0.211 (0.045) | -1.703 |  |  | 1099 (36.63%) | 1901 (63.37%) |  |
| 39 | Income group INCOME_GROUP_0 v00028 | ordinal, integer | INTERVIEW |  |  | [30-50k) | [30-50k) | 2 (Q1 = [10-30k) \| Q3 = [50-70k)) |  | (below 10k - above 90k) |  |  |  | 7 |  | 2174 (72.47%) | 826 (27.53%) |  |
| 40 | Currently pregnant PREGNANT_0 v00029 | nominal, integer | INTERVIEW |  |  |  | no |  |  |  |  |  |  | 3 |  | 1065 (35.5%) | 1935 (64.5%) |  |
| 41 | Medication use MEDICATION_0 v00030 | nominal, integer | INTERVIEW |  |  |  | yes |  |  |  |  |  |  | 2 |  | 292 (9.73%) | 2708 (90.27%) |  |
| 44 | Number of ATC codes N_ATC_CODES_0 v00031 | ratio, integer | INTERVIEW | 2.262 | 2.726 | 1 | 0 | 3 (Q1 = 0 \| Q3 = 3) | 1.483 | 22 (0 - 22) | 120.479 | 1.406 (0.045) | 3.055 |  |  | 2058 (68.6%) | 942 (31.4%) |  |
| 45 | Sociodemographics examiner USR_SOCDEM_0 v00032 | nominal, string | INTERVIEW |  |  |  | USR_321 |  |  |  |  |  |  | 15 |  | 2114 (70.47%) | 886 (29.53%) |  |
| 46 | Interview date and time INT_DT_0 v00033 | interval, datetime | INTERVIEW | 2018-07-02 12:40:34 UTC | 3.45 months | 2018-07-05 22:27:30 UTC | 2018-08-15 22:55:34 UTC 2018-08-24 01:57:42 UTC | 5.84 months (Q1 = 2018-04-03 06:44:50 UTC \| Q3 = 2018-09-27 22:05:04 UTC) | 4.38 months | 11 months, 4 weeks, 1 day, 23 hours, 56 minutes (2018-01-01 02:24:00 UTC - 2018-12-31 02:20:00 UTC) |  |  |  |  |  | 2940 (98%) | 60 (2%) |  |
| 47 | Interview consent PART_INTERVIEW v40000 | nominal, integer | INTERVIEW |  |  |  | yes |  |  |  |  |  |  | 3 |  | 2940 (98%) | 60 (2%) |  |
| 48 | Item 1 ITEM_1_0 v00034 | ratio, integer | QUESTIONNAIRE | 3.037 | 1.764 | 3 | 3 | 2 (Q1 = 2 \| Q3 = 4) | 1.483 | 9 (0 - 9) | 58.085 | 0.414 (0.045) | -0.589 |  |  | 2248 (74.93%) | 752 (25.07%) |  |
| 49 | Item 2 ITEM_2_0 v00035 | ratio, integer | QUESTIONNAIRE | 2.988 | 1.701 | 3 | 2 | 2 (Q1 = 2 \| Q3 = 4) | 1.483 | 10 (0 - 10) | 56.928 | 0.408 (0.045) | -0.648 |  |  | 2197 (73.23%) | 803 (26.77%) |  |
| 50 | Item 3 ITEM_3_0 v00036 | ratio, integer | QUESTIONNAIRE | 3.014 | 1.718 | 3 | 3 | 2 (Q1 = 2 \| Q3 = 4) | 1.483 | 10 (0 - 10) | 56.99 | 0.41 (0.045) | -0.606 |  |  | 2184 (72.8%) | 816 (27.2%) |  |
| 51 | Item 4 ITEM_4_0 v00037 | ratio, integer | QUESTIONNAIRE | 3 | 1.721 | 3 | 3 | 2 (Q1 = 2 \| Q3 = 4) | 1.483 | 10 (0 - 10) | 57.37 | 0.435 (0.045) | -0.556 |  |  | 2143 (71.43%) | 857 (28.57%) |  |
| 52 | Item 5 ITEM_5_0 v00038 | ratio, integer | QUESTIONNAIRE | 6.021 | 2.374 | 6 | 6 | 4 (Q1 = 4 \| Q3 = 8) | 2.965 | 10 (0 - 10) | 39.424 | -0.183 (0.045) | -1.352 |  |  | 2074 (69.13%) | 926 (30.87%) |  |
| 5 | Item 6 ITEM_6_0 v00039 | ratio, integer | QUESTIONNAIRE | 5.948 | 2.371 | 6 | 6 | 4 (Q1 = 4 \| Q3 = 8) | 2.965 | 10 (0 - 10) | 39.857 | -0.119 (0.045) | -1.414 |  |  | 2048 (68.27%) | 952 (31.73%) |  |
| 7 | Item 7 ITEM_7_0 v00040 | ratio, integer | QUESTIONNAIRE | 6.037 | 2.401 | 6 | 6 | 4 (Q1 = 4 \| Q3 = 8) | 2.965 | 10 (0 - 10) | 39.769 | -0.174 (0.045) | -1.398 |  |  | 2068 (68.93%) | 932 (31.07%) |  |
| 6 | Item 8 ITEM_8_0 v00041 | ratio, integer | QUESTIONNAIRE | 5.895 | 2.397 | 6 | 6 | 4 (Q1 = 4 \| Q3 = 8) | 2.965 | 10 (0 - 10) | 40.67 | -0.097 (0.045) | -1.487 |  |  | 2013 (67.1%) | 987 (32.9%) |  |
| 27 | Questionnaire date and time QUEST_DT_0 v00042 | interval, datetime | QUESTIONNAIRE | 2018-08-03 23:09:06 UTC | 4.59 months | 2018-07-29 21:52:59 UTC | 2017-12-31 22:59:59 UTC | 6.31 months (Q1 = 2018-04-20 11:41:02 UTC \| Q3 = 2018-10-29 13:49:41 UTC) | 4.66 months | 2 years, 10 months, 1 week, 14 hours, 43 minutes, 29.47 seconds (2017-12-31 22:59:59 UTC - 2020-11-08 13:43:28 UTC) |  |  |  |  |  | 2940 (98%) | 60 (2%) |  |
| 8 | Questionnaire consent PART_QUESTIONNAIRE v50000 | nominal, integer | QUESTIONNAIRE |  |  |  | yes |  |  |  |  |  |  | 3 |  | 2940 (98%) | 60 (2%) |  |
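For categorical variables, the table above reports the mode, the number of categories, and the valid and missing counts. The following base-R sketch reproduces those columns for an invented toy vector. It is an illustration under assumptions, not the dataquieR implementation: the data are made up, and counting NA as its own category is inferred from the table (for example, SEX_0 shows 3 categories with 2% missing values, and FAM_STAT_0 shows 1 category with 100% missing values).

```r
# Toy categorical vector with missing values (invented data, not from the study)
sex <- c("females", "males", "females", NA, "females", "males", NA, "females")

# Frequency table of the observed categories, most frequent first
freq <- sort(table(sex), decreasing = TRUE)
mode_cat <- names(freq)[1]

# Number of categories, counting NA as its own category (assumption, see above)
n_categories <- length(unique(sex))

# Valid and missing counts with percentages, formatted like the table columns
n_total <- length(sex)
n_valid <- sum(!is.na(sex))
n_missing <- n_total - n_valid
fmt <- function(n) sprintf("%d (%s%%)", n, format(round(100 * n / n_total, 2)))
valid_txt <- fmt(n_valid)
missing_txt <- fmt(n_missing)
```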