The des_summary function provides descriptive statistics for numerical
and categorical variables in the study data.
Depending on the type of data, the function provides the appropriate
measures of central tendency (i.e., mean, median, and mode); measures of
dispersion (i.e., standard deviation, interquartile range, median absolute
deviation, range of values, and coefficient of variation); information
on the number of categories and their frequency, on the shape of the
distribution (skewness and kurtosis), and on missing data. It also
provides plots to give an overview of the data distribution.
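As a rough illustration of the measures listed above, the following base R sketch computes most of them for a small toy vector. The data and the names `x`, `v`, and `stats` are made up for this example, and whether des_summary computes each value in exactly this way is an assumption. Skewness and kurtosis are left out because their definitions vary between implementations.

```r
# Toy numeric vector with one missing value (illustrative, not taken from sd1)
x <- c(75, 80, 80, 81, 87, 90, NA)
v <- x[!is.na(x)]                       # valid (non-missing) observations

# Most frequent value; ties are resolved by taking the smallest one
tab <- table(v)
mode_val <- as.numeric(names(tab)[which.max(tab)])

stats <- list(
  mean    = mean(v),
  median  = median(v),
  mode    = mode_val,
  sd      = sd(v),
  iqr     = IQR(v),                     # Q3 - Q1, default quantile type 7
  mad     = mad(v),                     # median absolute deviation, scaled by 1.4826
  range   = diff(range(v)),             # max - min
  cv      = sd(v) / mean(v) * 100,      # coefficient of variation in percent
  valid   = length(v),
  missing = sum(is.na(x))
)
str(stats)
```

Note that `mad()` in base R returns the scaled median absolute deviation, so its result is a multiple of 1.4826 times the raw median deviation.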
des_summary(
  resp_vars = NULL,
  study_data = sd1,
  label_col = LABEL,
  meta_data = md1
)
The function has the following arguments:
study_data
are assessed; prep_load_workbook_like_file(), prep_load_folder_with_metadata(), or prep_get_data_frame();
To illustrate the output, we use the synthetic example data and metadata that are bundled with the dataquieR package. See the introductory tutorial for instructions on importing these files into R, as well as details on their structure and contents.
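# Load the example metadata workbook bundled with dataquieR
prep_load_workbook_like_file("meta_data_v2")
# Fetch the bundled example study data
sd1 <- prep_get_data_frame("study_data")
# Compute the descriptive summary, using the LABEL column for variable names
des_sum <- des_summary(
  study_data = sd1,
  label_col = LABEL
)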
The function generates two outputs, SummaryData and SummaryTable. They are
identical in this case but are used differently when a report is created.
Output 1: Summary data frame
The summary data frame is accessed using des_sum$SummaryData. It can be
displayed either as an interactive DataTables table:
DT::datatable(des_sum$SummaryData, escape = FALSE)
Or as a kable:
Variables | Labels | STUDY_SEGMENT | Mean | Median | Mode | SD | IQR (Quartiles) | MAD | Range (Min - Max) | CV | Skewness | Kurtosis | No. categories/Freq. table | Valid | Missing | Graph
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
CENTER_0 | Examination center CENTER_0 v00000 [nominal, integer] | STUDY | | | Berlin | | | | | | | | No. unique values (incl. NA): 5 | 3000 100.000% | 0 0.000% | |
DBP_0 | Diastolic blood pressure DBP_0 v00005 [ratio, float] | PHYS_EXAM | 81.29 | 81 | 80 | 9.2142 | 12 (Q1 = 75; Q3 = 87) | 8.896 | 57 (54 - 111) | 11.335 | 0.0805 (0.0447) | -0.5562 | | 2544 84.800% | 456 15.200% | |
GLOBAL_HEALTH_VAS_0 | Self-reported global health GLOBAL_HEALTH_VAS_0 v00006 [ratio, float] | PHYS_EXAM | 5.027 | 5 | 3.2 | 2.9184 | 5.075 (Q1 = 2.5; Q3 = 7.575) | 3.706 | 10 (0 - 10) | 58.0511 | -0.0015 (0.0447) | -1.4368 | | 2618 87.267% | 382 12.733% | |
ASTHMA_0 | Known asthma ASTHMA_0 v00007 [nominal, integer] | PHYS_EXAM | | | no | | | | | | | | No. unique values (incl. NA): 3 | 2641 88.033% | 359 11.967% | |
VO2_CAPCAT_0 | Aerobic capacity category VO2_CAPCAT_0 v00008 [ordinal, string] | PHYS_EXAM | | good | excellent | | 3 (Q1 = excellent; Q3 = restricted) | | (excellent - pathological) | | | | No. unique values (incl. NA): 6 | 0 0.000% | 405 13.500% | |
ARM_CIRC_0 | Upper arm circumference ARM_CIRC_0 v00009 [ratio, float] | PHYS_EXAM | 25.033 | 25 | 24 | 3.9576 | 6 (Q1 = 22; Q3 = 28) | 4.448 | 27 (11 - 38) | 15.8097 | -0.0237 (0.0447) | -0.3594 | | 2657 88.567% | 343 11.433% | |
ARM_CIRC_DISC_0 | Upper arm circumference cat ARM_CIRC_DISC_0 v00109 [ordinal, integer] | PHYS_EXAM | | (20,30] | (20,30] | | 0 (Q1 = (20,30]; Q3 = (20,30]) | | ((-Inf,20] - (30, Inf]) | | | | No. unique values (incl. NA): 4 | 2633 87.767% | 367 12.233% | |
ARM_CUFF_0 | Upper arm circumference device ARM_CUFF_0 v00010 [ordinal, integer] | PHYS_EXAM | | (20,30] | (20,30] | | 0 (Q1 = (20,30]; Q3 = (20,30]) | | ((-Inf,20] - (30, Inf]) | | | | No. unique values (incl. NA): 4 | 2623 87.433% | 377 12.567% | |
USR_VO2_0 | Aerobic capacity examiner USR_VO2_0 v00011 [nominal, string] | PHYS_EXAM | | | USR_321 | | | | | | | | No. unique values (incl. NA): 16 | 0 0.000% | 218 7.267% | |
USR_BP_0 | Blood pressure examiner USR_BP_0 v00012 [nominal, string] | PHYS_EXAM | | | USR_301 | | | | | | | | No. unique values (incl. NA): 16 | 0 0.000% | 225 7.500% | |
EXAM_DT_0 | Examination date and time EXAM_DT_0 v00013 [interval, datetime] | PHYS_EXAM | 2018-07-02 10:09:59 UTC | 2018-07-05 19:45:30 UTC | 2018-03-21 20:44:05 UTC, 2018-04-12 05:25:04 UTC and other 3 dates | 9064795 secs | 15347279 secs (Q1 = 2018-04-03 04:24:05 UTC; Q3 = 2018-09-27 19:32:04 UTC) | 11520753 secs | 364 days (2018-01-01 UTC - 2018-12-31 UTC) | | | | | 2940 98.000% | 60 2.000% | |
PART_PHYS_EXAM | Physical exam consent PART_PHYS_EXAM v20000 [nominal, integer] | PHYS_EXAM | | | yes | | | | | | | | No. unique values (incl. NA): 2 | 2940 98.000% | 60 2.000% | |
CRP_0 | C-reactive protein CRP_0 v00014 [ratio, float] | LAB | 2.888 | 2.587 | 0.16 | 1.8053 | 2.27 (Q1 = 1.608; Q3 = 3.878) | 1.637 | 11.894 (0.118 - 12.012) | 62.5065 | 0.8966 (0.0447) | 0.9983 | | 2699 89.967% | 301 10.033% | |
BSG_0 | Erythrocyte sedimentation rate BSG_0 v00015 [ratio, float] | LAB | 14.857 | 11 | 10 | 12.1348 | 14 (Q1 = 6; Q3 = 20) | 10.378 | 96 (0 - 96) | 81.6771 | 1.3774 (0.0447) | 2.678 | | 2686 89.533% | 314 10.467% | |
DEV_NO_0 | Device ID DEV_NO_0 v00016 [nominal, integer] | LAB | | | 2 | | | | | | | | No. unique values (incl. NA): 6 | 2692 89.733% | 308 10.267% | |
LAB_DT_0 | Lab analysis date and time LAB_DT_0 v00017 [interval, datetime] | LAB | 2018-07-02 12:00:36 UTC | 2018-07-05 21:50:00 UTC | 2018-01-01 02:00:00 UTC, 2018-01-10 15:55:28 UTC and other 59 dates | 9064818 secs | 15342119 secs (Q1 = 2018-04-03 06:21:05 UTC; Q3 = 2018-09-27 20:03:04 UTC) | 192027.4 mins | 364.0007 days (2018-01-01 02:00:00 UTC - 2018-12-31 02:01:00 UTC) | | | | | 2940 98.000% | 60 2.000% | |
PART_LAB | Lab analysis consent PART_LAB v30000 [nominal, integer] | LAB | | | yes | | | | | | | | No. unique values (incl. NA): 2 | 2940 98.000% | 60 2.000% | |
EDUCATION_0 | Highest educational level B/L EDUCATION_0 v00018 [ordinal, integer] | INTERVIEW | | uppersecond | uppersecond | | 2 (Q1 = secondary; Q3 = postsecond) | | (pre-primary - secondtertiary) | | | | No. unique values (incl. NA): 8 | 2472 82.400% | 528 17.600% | |
EDUCATION_1 | Highest educational level F/U EDUCATION_1 v01018 [ordinal, integer] | INTERVIEW | | uppersecond | uppersecond | | 2 (Q1 = secondary; Q3 = postsecond) | | (pre-primary - secondtertiary) | | | | No. unique values (incl. NA): 8 | 2422 80.733% | 578 19.267% | |
FAM_STAT_0 | Marital status FAM_STAT_0 v00019 [nominal, integer] | INTERVIEW | | | 1 | | | | | | | | No. unique values (incl. NA): 5 | 2389 79.633% | 611 20.367% | |
MARRIED_0 | Currently married MARRIED_0 v00020 [nominal, integer] | INTERVIEW | | | no | | | | | | | | No. unique values (incl. NA): 3 | 2366 78.867% | 634 21.133% | |
SEX_0 | Sex B/L SEX_0 v00002 [nominal, integer] | STUDY | | | females | | | | | | | | No. unique values (incl. NA): 3 | 2940 98.000% | 60 2.000% | |
N_CHILD_0 | Number of children N_CHILD_0 v00021 [ratio, integer] | INTERVIEW | 2.499 | 2 | 2 | 1.5297 | 2 (Q1 = 1; Q3 = 3) | 1.483 | 9 (0 - 9) | 61.2097 | 0.4907 (0.0447) | -0.2911 | | 2336 77.867% | 664 22.133% | |
EATING_PREFS_0 | Eating preferences EATING_PREFS_0 v00022 [nominal, integer] | INTERVIEW | | | none | | | | | | | | No. unique values (incl. NA): 4 | 2328 77.600% | 672 22.400% | |
MEAT_CONS_0 | Meat consumption MEAT_CONS_0 v00023 [ordinal, integer] | INTERVIEW | | 1-2d a week | never | | 2 (Q1 = never; Q3 = 3-4d a week) | | (never - daily) | | | | No. unique values (incl. NA): 6 | 2302 76.733% | 698 23.267% | |
SMOKING_0 | Current smoker SMOKING_0 v00024 [nominal, integer] | INTERVIEW | | | no | | | | | | | | No. unique values (incl. NA): 3 | 2292 76.400% | 708 23.600% | |
SMOKE_SHOP_0 | Purchasing tobacco products SMOKE_SHOP_0 v00025 [ordinal, integer] | INTERVIEW | | 3-4d a week | 3-4d a week | | 2 (Q1 = 1-2d a week; Q3 = 5-6d a week) | | (never - daily) | | | | No. unique values (incl. NA): 6 | 782 26.067% | 2218 73.933% | |
N_INJURIES_0 | Number of injuries N_INJURIES_0 v00026 [ratio, integer] | INTERVIEW | 4.588 | 4 | 4 | 2.4221 | 3 (Q1 = 3; Q3 = 6) | 2.965 | 14 (0 - 14) | 52.7921 | 0.4743 (0.0447) | -0.5724 | | 2199 73.300% | 801 26.700% | |
N_BIRTH_0 | Number of births N_BIRTH_0 v00027 [ratio, integer] | INTERVIEW | 3.458 | 3 | 3 | 1.7707 | 2 (Q1 = 2; Q3 = 4) | 1.483 | 12 (0 - 12) | 51.2115 | 0.2105 (0.0447) | -1.7027 | | 1099 36.633% | 1901 63.367% | |
INCOME_GROUP_0 | Income group INCOME_GROUP_0 v00028 [ordinal, integer] | INTERVIEW | | [30-50k) | [30-50k) | | 2 (Q1 = [10-30k); Q3 = [50-70k)) | | (below 10k - above 90k) | | | | No. unique values (incl. NA): 7 | 2174 72.467% | 826 27.533% | |
PREGNANT_0 | Currently pregnant PREGNANT_0 v00029 [nominal, integer] | INTERVIEW | | | no | | | | | | | | No. unique values (incl. NA): 3 | 1065 35.500% | 1935 64.500% | |
MEDICATION_0 | Medication use MEDICATION_0 v00030 [nominal, integer] | INTERVIEW | | | yes | | | | | | | | No. unique values (incl. NA): 2 | 292 9.733% | 2708 90.267% | |
AGE_0 | Age B/L AGE_0 v00003 [ratio, integer] | STUDY | 49.914 | 50 | 51 | 4.4232 | 6 (Q1 = 47; Q3 = 53) | 4.448 | 30 (33 - 63) | 8.8616 | -0.0367 (0.0447) | -0.1761 | | 2940 98.000% | 60 2.000% | |
N_ATC_CODES_0 | Number of ATC codes N_ATC_CODES_0 v00031 [ratio, integer] | INTERVIEW | 2.262 | 1 | 0 | 2.7257 | 3 (Q1 = 0; Q3 = 3) | 1.483 | 22 (0 - 22) | 120.4786 | 1.4057 (0.0447) | 3.0554 | | 2058 68.600% | 942 31.400% | |
USR_SOCDEM_0 | Sociodemographics examiner USR_SOCDEM_0 v00032 [nominal, string] | INTERVIEW | | | USR_321 | | | | | | | | No. unique values (incl. NA): 16 | 0 0.000% | 714 23.800% | |
INT_DT_0 | Interview date and time INT_DT_0 v00033 [interval, datetime] | INTERVIEW | 2018-07-02 12:40:34 UTC | 2018-07-05 22:27:30 UTC | 2018-08-15 22:55:34 UTC, 2018-08-24 01:57:42 UTC | 9064782 secs | 15348014 secs (Q1 = 2018-04-03 06:44:50 UTC; Q3 = 2018-09-27 22:05:04 UTC) | 11522577 secs | 363.9972 days (2018-01-01 02:24:00 UTC - 2018-12-31 02:20:00 UTC) | | | | | 2940 98.000% | 60 2.000% | |
PART_INTERVIEW | Interview consent PART_INTERVIEW v40000 [nominal, integer] | INTERVIEW | | | yes | | | | | | | | No. unique values (incl. NA): 3 | 2940 98.000% | 60 2.000% | |
ITEM_1_0 | Item 1 ITEM_1_0 v00034 [ratio, integer] | QUESTIONNAIRE | 3.037 | 3 | 3 | 1.764 | 2 (Q1 = 2; Q3 = 4) | 1.483 | 9 (0 - 9) | 58.0849 | 0.4138 (0.0447) | -0.5894 | | 2248 74.933% | 752 25.067% | |
ITEM_2_0 | Item 2 ITEM_2_0 v00035 [ratio, integer] | QUESTIONNAIRE | 2.988 | 3 | 2 | 1.7008 | 2 (Q1 = 2; Q3 = 4) | 1.483 | 10 (0 - 10) | 56.9277 | 0.4079 (0.0447) | -0.648 | | 2197 73.233% | 803 26.767% | |
ITEM_3_0 | Item 3 ITEM_3_0 v00036 [ratio, integer] | QUESTIONNAIRE | 3.014 | 3 | 3 | 1.7175 | 2 (Q1 = 2; Q3 = 4) | 1.483 | 10 (0 - 10) | 56.9898 | 0.4105 (0.0447) | -0.6058 | | 2184 72.800% | 816 27.200% | |
ITEM_4_0 | Item 4 ITEM_4_0 v00037 [ratio, integer] | QUESTIONNAIRE | 3 | 3 | 3 | 1.7214 | 2 (Q1 = 2; Q3 = 4) | 1.483 | 10 (0 - 10) | 57.3701 | 0.4347 (0.0447) | -0.5563 | | 2143 71.433% | 857 28.567% | |
ITEM_5_0 | Item 5 ITEM_5_0 v00038 [ratio, integer] | QUESTIONNAIRE | 6.021 | 6 | 6 | 2.3738 | 4 (Q1 = 4; Q3 = 8) | 2.965 | 10 (0 - 10) | 39.4237 | -0.1831 (0.0447) | -1.3516 | | 2074 69.133% | 926 30.867% | |
ITEM_6_0 | Item 6 ITEM_6_0 v00039 [ratio, integer] | QUESTIONNAIRE | 5.948 | 6 | 6 | 2.3706 | 4 (Q1 = 4; Q3 = 8) | 2.965 | 10 (0 - 10) | 39.8567 | -0.1194 (0.0447) | -1.4139 | | 2048 68.267% | 952 31.733% | |
AGE_GROUP_0 | Age group B/L AGE_GROUP_0 v00103 [ordinal, string] | STUDY | | 50-59 | 50-59 | | | | | | | | No. unique values (incl. NA): 5 | 0 0.000% | 60 2.000% | |
ITEM_7_0 | Item 7 ITEM_7_0 v00040 [ratio, integer] | QUESTIONNAIRE | 6.037 | 6 | 6 | 2.4007 | 4 (Q1 = 4; Q3 = 8) | 2.965 | 10 (0 - 10) | 39.7687 | -0.1739 (0.0447) | -1.3982 | | 2068 68.933% | 932 31.067% | |
ITEM_8_0 | Item 8 ITEM_8_0 v00041 [ratio, integer] | QUESTIONNAIRE | 5.895 | 6 | 6 | 2.3974 | 4 (Q1 = 4; Q3 = 8) | 2.965 | 10 (0 - 10) | 40.6699 | -0.0971 (0.0447) | -1.4874 | | 2013 67.100% | 987 32.900% | |
QUEST_DT_0 | Questionnaire date and time QUEST_DT_0 v00042 [interval, datetime] | QUESTIONNAIRE | 2018-08-04 14:59:47 UTC | 2018-07-30 08:21:46 UTC | 2018-03-17 01:21:08 UTC, 2018-04-07 19:44:05 UTC and other 4 dates | 12050776 secs | 16542372 secs (Q1 = 2018-04-21 09:39:33 UTC; Q3 = 2018-10-29 20:45:45 UTC) | 12215994 secs | 1032.387 days (2018-01-11 04:26:00 UTC - 2020-11-08 13:43:28 UTC) | | | | | 2931 97.700% | 69 2.300% | |
PART_QUESTIONNAIRE | Questionnaire consent PART_QUESTIONNAIRE v50000 [nominal, integer] | QUESTIONNAIRE | | | yes | | | | | | | | No. unique values (incl. NA): 3 | 2940 98.000% | 60 2.000% | |
AGE_1 | Age F/U AGE_1 v01003 [ratio, integer] | STUDY | 49.872 | 50 | 51 | 4.4291 | 6 (Q1 = 47; Q3 = 53) | 4.448 | 30 (33 - 63) | 8.8808 | -0.0315 (0.0447) | -0.1855 | | 2940 98.000% | 60 2.000% | |
SEX_1 | Sex F/U SEX_1 v01002 [nominal, integer] | STUDY | | | females | | | | | | | | No. unique values (incl. NA): 3 | 2940 98.000% | 60 2.000% | |
PART_STUDY | Study consent PART_STUDY v10000 [nominal, integer] | STUDY | | | yes | | | | | | | | No. unique values (incl. NA): 2 | 2940 98.000% | 60 2.000% | |
SBP_0 | Systolic blood pressure SBP_0 v00004 [ratio, float] | PHYS_EXAM | 126.516 | 127 | 130 | 9.613 | 13 (Q1 = 120; Q3 = 133) | 8.896 | 63 (97 - 160) | 7.5982 | 0.0636 (0.0447) | -0.5639 | | 2561 85.367% | 439 14.633% | |
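The Valid and Missing columns pair a count with its share of all observations. A minimal base R sketch of that formatting follows, using a made-up data frame `toy` and a hypothetical helper `fmt`. The counting rule used by des_summary itself may differ from this simple `is.na()` count (for instance, several string variables in the table show 0 valid values), so this only illustrates the "count percent" layout.

```r
# Toy data frame standing in for study data (hypothetical values)
toy <- data.frame(
  SBP = c(120, 133, NA, 127),
  SEX = c("females", NA, NA, "males")
)

# Format a count together with its share of all rows, e.g. "3 75.000%"
fmt <- function(k, n) sprintf("%d %.3f%%", k, 100 * k / n)

n <- nrow(toy)
# Number of missing entries per column
n_missing <- unname(vapply(toy, function(col) sum(is.na(col)), integer(1)))

overview <- data.frame(
  Variables = names(toy),
  Valid     = fmt(n - n_missing, n),
  Missing   = fmt(n_missing, n)
)
print(overview)
```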