The function prep_get_data_frame is powerful, and its many possible uses
are worth showing. The following is a list of use cases.
The function can import data when given a file path, a URL, or the name of a data frame:
library(dataquieR)
prep_get_data_frame("~/Documents/data/study_data1.csv") # an example PATH
prep_get_data_frame("https://.../study_data1.csv") # an example URL
prep_get_data_frame("ship") # for the SHIP-based example data
Any data frame that is available in the data frame storage area of dataquieR can be fetched by its name.
# Import the dataquieR metadata example for the synthetic data
prep_load_workbook_like_file("meta_data_v2") # load the data frames to dataquieR storage area
item_level1 <- prep_get_data_frame("item_level") # store the data frame in an object called item_level1
item_level1
To import a specific sheet from an Excel file, append the sheet name to the file name, separated by a | symbol.
For example, imagine you downloaded one of the dataquieR example data sets from the Example Datasets (the synthetic example metadata in XLSX format) and saved it on your Desktop. To import the sheet “segment_level” directly, you can use the following code.
# Import the segment_level sheet from the synthetic example metadata
# saved on the desktop
prep_get_data_frame("~/Desktop/meta_data_v2.xlsx|segment_level")
Attention: This works not only for Excel files, but for all files that feature more than one table (e.g., RData, OpenDocument Spreadsheet - ODS).
It is also possible to indicate the number of the sheet in the file instead of its name. For example, missing_table is the second sheet in the Excel file.
# Import the missing table (sheet: missing_table)
prep_get_data_frame("meta_data_v2|2")
It is also possible to access one or more specific columns from a file or from a data frame in the cache environment by combining the | and + symbols.
# Import the segment_level sheet from the synthetic example metadata
# saved on the desktop, but only the selected columns STUDY_SEGMENT and SEGMENT_ID_VARS
prep_get_data_frame("~/Desktop/meta_data_v2.xlsx|segment_level|STUDY_SEGMENT+SEGMENT_ID_VARS")
# Import the missing table (sheet: missing_table), only columns CODE_VALUE and CODE_LABEL
prep_get_data_frame("meta_data_v2|missing_table|CODE_VALUE+CODE_LABEL")
# Import the item-level metadata (sheet: item_level), only columns VAR_NAMES and LABEL
prep_get_data_frame("meta_data_v2|item_level|VAR_NAMES+LABEL")
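When the sheet or the columns are chosen programmatically, the location string can be assembled rather than typed. The following is a minimal sketch in base R; build_location is a hypothetical helper introduced here for illustration and is not part of dataquieR. It only covers the source|table|columns form shown in the examples above.

```r
# Hypothetical helper (not part of dataquieR): assemble a location string
# of the form "source|table|col1+col2" as used in the examples above.
build_location <- function(source, table, columns = NULL) {
  # Source and table (sheet name or sheet number) are separated by "|"
  location <- paste(source, table, sep = "|")
  if (!is.null(columns)) {
    # Column names are joined by "+" and appended after another "|"
    location <- paste(location, paste(columns, collapse = "+"), sep = "|")
  }
  location
}

wanted <- c("VAR_NAMES", "LABEL")
location <- build_location("meta_data_v2", "item_level", wanted)
print(location)
```

The resulting string could then be passed to prep_get_data_frame(location) in the same way as the hand-written strings above.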
The function is able to import data frames that are inside RData files. For example, we create an RData file.
# Create an RData file with 2 objects
vector1 <- 5:10
table1_ex <- data.frame(x = 1:5, y = LETTERS[1:5])
save(table1_ex, vector1, file = "table_example.RData")
rm(table1_ex)
df1 <- prep_get_data_frame("table_example.RData|table1_ex")
df1
# vector1 is a vector, so the following line should fail, as prep_get_data_frame
# works only with data frames
try(v1 <- prep_get_data_frame("table_example.RData|vector1"))
## Error in base::tryCatch(base::withCallingHandlers({ :
## File "table_example.RData" did not contain a table (data frame)
## according to 'rio'
If only one data frame is present in the RData file, there is no need to specify the object name.
# Create an RData file with a single data frame
table2_ex <- data.frame(x = 1:5, y = LETTERS[1:5])
save(table2_ex, file = "table_example_2.RData")
rm(table2_ex)
df2 <- prep_get_data_frame("table_example_2.RData")
df2
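If it is unclear which objects in an RData file are data frames, and therefore which name to append after the | symbol, the file can be inspected with base R first. This is a minimal sketch that does not use dataquieR; the temporary file and the object names are chosen only for illustration.

```r
# Write a small RData file holding one data frame and one vector
tmp_file <- tempfile(fileext = ".RData")
table1_ex <- data.frame(x = 1:5, y = LETTERS[1:5])
vector1 <- 5:10
save(table1_ex, vector1, file = tmp_file)

# Load into a separate environment so the global workspace stays untouched;
# load() returns the names of the restored objects
e <- new.env()
obj_names <- load(tmp_file, envir = e)

# Keep only the names of objects that are data frames
is_df <- vapply(obj_names, function(nm) is.data.frame(e[[nm]]), logical(1))
df_names <- obj_names[is_df]
print(df_names)
```

Here only table1_ex is reported, so it is the only object of this file that could be requested as a table.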
dataquieR provides bookmarks for some standardized vocabularies.
The following bookmarks are available:
## [1] "ICD10GM" "ICD10" "ICPC" "SPAT" "NOMESCO"
## [6] "ATC" "ICD9" "SNOMEDrokan" "SNOMED3" "ICD7"
Bookmarks can be indicated by enclosing the name in < and >, or by using the prefix voc:.
# Import the codes of ICD7
prep_get_data_frame("<ICD7>")
# Import the codes of ICD7 using the voc: prefix
prep_get_data_frame("voc:ICD7")
Note: Custom bookmarks are also possible. To define them, execute the following:
prep_add_data_frames(`<>` = data.frame(voc = c("bookmark1", "bookmark2"),
                                       url = c("data:datasets|cars", "data:datasets|iris")))
prep_get_data_frame("<bookmark1>")
data:
Let’s say we want to import the data frame iris from the package
datasets, but only some of its columns. This can be done with
prep_get_data_frame using the prefix data: as follows.
# Import the data frame iris from the package datasets, only columns
# Species, Sepal.Length and Sepal.Width
prep_get_data_frame("data:datasets|iris|Species+Sepal.Length+Sepal.Width")
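For comparison, the call above requests three columns of iris. The same selection can be written in base R as follows. This is only a reference sketch for checking the result; that prep_get_data_frame returns the columns in exactly this order is an assumption, not something stated above.

```r
# Base R equivalent of selecting three columns from datasets::iris
wanted <- c("Species", "Sepal.Length", "Sepal.Width")
iris_subset <- datasets::iris[, wanted]

# Inspect the structure of the selected columns
str(iris_subset)
```

Comparing names(iris_subset) with the names of the imported data frame is a quick way to confirm that the column selection worked as intended.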
extdata:
If we are interested in a data frame that is stored in the extdata
folder of a package (inst/extdata in the package source), we
can use the prefix extdata: as follows.
if (!rlang::is_installed("tor")) {
  install.packages("tor")
}
prep_get_data_frame("extdata:tor/csv/csv1.csv")
package:
To access the data frame, we need to indicate the full path to the
specific data frame inside the package, preceded by the prefix package:.
For example, for the package tor we should write as follows.
prep_get_data_frame("package:tor/extdata/csv/csv1.csv")
The data frame storage also works as a cache area, i.e., whenever
prep_get_data_frame fetches a data frame, it is also added
to the data frame storage. Use prep_list_dataframes() to see
the content of this storage area.
Using prep_load_workbook_like_file() and/or
prep_load_folder_with_metadata() you can add a set of data
frames from a file or folder to the storage.
To add single data frames to the storage manually, use
prep_add_data_frames().
To clear every data frame from the storage, use
prep_purge_data_frame_cache().
Use prep_remove_from_cache() to remove specific data
frames from the storage.
Example:
# Import the dataquieR metadata example for the synthetic data
prep_load_workbook_like_file("meta_data_v2")
# Look at the content of the storage
prep_list_dataframes()
# To get a data frame from the storage
df_md <- prep_get_data_frame("dataframe_level")
# Remove all data frames from the storage
prep_purge_data_frame_cache()