The Data type mismatch indicator be calculated using int_datatype_matrix in the following way:

# Load dataquieR
library(dataquieR)

# Load data
sd1 <- prep_get_data_frame("ship")

# Load metadata
file_name <- system.file("extdata", "ship_meta_v2.xlsx", package = "dataquieR")
prep_load_workbook_like_file(file_name)
meta_data_item <- prep_get_data_frame("item_level") # item_level is a sheet in ship_meta_v2.xlsx

# Apply indicator function
datatype_res <- int_datatype_matrix(
  study_data = sd1, 
  meta_data = meta_data_item, 
  label_col = "LONG_LABEL"
)

A plot and a table are provided to view the results:

datatype_res$SummaryPlot

datatype_res$SummaryData
Variables Data type mismatch N (%) Convertible mismatch, stable N (%) Convertible mismatch, unstable N (%) Data type match N (%) Expected DATA_TYPE Observed DATA_TYPE State, given threshold STUDY_SEGMENT
25 Participant ID 0 (0.00%) 0 ( 0.00%) 0 (0.00%) 2154 (100.00%) integer integer Matching datatype INTRO
10 Diastolic blood pressure 2 0 (0.00%) 0 ( 0.00%) 0 (0.00%) 2154 (100.00%) integer integer Matching datatype SOMATOMETRY
28 Somatometry examiner 0 (0.00%) 0 ( 0.00%) 0 (0.00%) 2154 (100.00%) integer integer Matching datatype SOMATOMETRY
5 Body height 0 (0.00%) 0 ( 0.00%) 0 (0.00%) 2154 (100.00%) integer integer Matching datatype SOMATOMETRY
6 Body height scale ID 0 (0.00%) 0 ( 0.00%) 0 (0.00%) 2154 (100.00%) integer integer Matching datatype SOMATOMETRY
7 Body weight 0 (0.00%) 0 ( 0.00%) 0 (0.00%) 2154 (100.00%) float float Matching datatype SOMATOMETRY
8 Body weight scale ID 0 (0.00%) 0 ( 0.00%) 0 (0.00%) 2154 (100.00%) integer integer Matching datatype SOMATOMETRY
33 Waist circumference 300 (0.14%) 2148 ( 99.72%) 0 (0.00%) 3 ( 0.14%) float string Non-matching datatype SOMATOMETRY
17 Interview examiner 0 (0.00%) 0 ( 0.00%) 0 (0.00%) 2154 (100.00%) integer integer Matching datatype INTERVIEW
16 Highest educational level 0 (0.00%) 0 ( 0.00%) 0 (0.00%) 2154 (100.00%) integer integer Matching datatype INTERVIEW
23 Marital status 0 (0.00%) 0 ( 0.00%) 0 (0.00%) 2154 (100.00%) integer integer Matching datatype INTERVIEW
14 Examination date and time 0 (0.00%) 0 ( 0.00%) 0 (0.00%) 2154 (100.00%) datetime datetime Matching datatype INTRO
27 Smoking status 0 (0.00%) 0 ( 0.00%) 0 (0.00%) 2154 (100.00%) integer integer Matching datatype INTERVIEW
12 Ever had stroke 0 (0.00%) 0 ( 0.00%) 0 (0.00%) 2154 (100.00%) integer integer Matching datatype INTERVIEW
11 Ever had myocardial infarction 0 (0.00%) 0 ( 0.00%) 0 (0.00%) 2154 (100.00%) integer integer Matching datatype INTERVIEW
20 Known diabetes 0 (0.00%) 0 ( 0.00%) 0 (0.00%) 2154 (100.00%) integer integer Matching datatype INTERVIEW
2 Age of diabetes onset 0 (0.00%) 0 ( 0.00%) 0 (0.00%) 2154 (100.00%) integer integer Matching datatype INTERVIEW
13 Ever taken birth control pills 0 (0.00%) 0 ( 0.00%) 0 (0.00%) 2154 (100.00%) integer integer Matching datatype INTERVIEW
24 Monthly household income 0 (0.00%) 0 ( 0.00%) 0 (0.00%) 2154 (100.00%) integer integer Matching datatype INTERVIEW
15 HDL-cholesterol 0 (0.00%) 0 ( 0.00%) 0 (0.00%) 2154 (100.00%) float float Matching datatype LABORATORY
22 LDL-cholesterol 0 (0.00%) 0 ( 0.00%) 0 (0.00%) 2154 (100.00%) float float Matching datatype LABORATORY
32 Total cholesterol 0 (0.00%) 0 ( 0.00%) 0 (0.00%) 2154 (100.00%) float float Matching datatype LABORATORY
26 Sex 0 (0.00%) 0 ( 0.00%) 0 (0.00%) 2154 (100.00%) integer integer Matching datatype INTRO
19 Intro segment consent 0 (0.00%) 0 ( 0.00%) 0 (0.00%) 2154 (100.00%) integer integer Matching datatype INTRO
29 Somatometry segment consent 0 (0.00%) 0 ( 0.00%) 0 (0.00%) 2154 (100.00%) integer integer Matching datatype INTRO
18 Interview segment consent 0 (0.00%) 0 ( 0.00%) 0 (0.00%) 2154 (100.00%) integer integer Matching datatype INTRO
21 Laboratory segment consent 0 (0.00%) 0 ( 0.00%) 0 (0.00%) 2154 (100.00%) integer integer Matching datatype INTRO
1 Age 0 (0.00%) 0 ( 0.00%) 0 (0.00%) 2154 (100.00%) integer integer Matching datatype INTRO
4 Blood pressure examiner 0 (0.00%) 0 ( 0.00%) 0 (0.00%) 2154 (100.00%) integer integer Matching datatype SOMATOMETRY
3 Blood pressure device ID 0 (0.00%) 0 ( 0.00%) 0 (0.00%) 2154 (100.00%) integer integer Matching datatype SOMATOMETRY
30 Systolic blood pressure 1 0 (0.00%) 0 ( 0.00%) 0 (0.00%) 2154 (100.00%) integer integer Matching datatype SOMATOMETRY
31 Systolic blood pressure 2 0 (0.00%) 0 ( 0.00%) 0 (0.00%) 2154 (100.00%) integer integer Matching datatype SOMATOMETRY
9 Diastolic blood pressure 1 0 (0.00%) 0 ( 0.00%) 0 (0.00%) 2154 (100.00%) integer integer Matching datatype SOMATOMETRY


All datatype issues found by int_datatype_matrix should be checked data element by data element. For instance, a major issue was found in the variable WAIST_CIRC_0. This variable is in the study data with datatype character, which differs from the expected datatype float defined in the metadata. Some basic checks show the misuse of commas as the decimal delimiter.

int_inspect_char(sd1$waist)
Character Count
, 3
. 2144
0 933
1 1443
2 908
3 884
4 889
5 898
6 1018
7 1279
8 1355
9 1409
NA 3


To correct this issue, converting WAIST_CIRC_0 to datatype numeric will coerce respective values to NA’s, which should be avoided. Hence, we replace the comma with the correct delimiter and correct the datatype without losing data values. The resulting applicability plot shows no more issues.

# replace comma with the correct delimiter
sd1$waist <- as.numeric(gsub(",", ".", sd1$waist))

int_datatype_matrix(
  study_data = sd1, 
  meta_data = meta_data_item, 
  label_col = "LONG_LABEL"
)$SummaryPlot

Back to Example data quality assessment of SHIP data