How to use the function prep_get_data_frame in dataquieR

The function prep_get_data_frame is versatile, and it is worth showing its many possible uses. The following list presents the main use cases.


1. Import files using a path, a URL, or the name of an example dataset (package dataquieR)

The function can be used to import data providing:

  • the path of the file (in case the file is in a local folder),
  • the URL of the file,
  • just the name of the file, for the example data shipped with the package.
library(dataquieR)

prep_get_data_frame("~/Documents/data/study_data1.csv") # an example PATH
prep_get_data_frame("https://.../study_data1.csv")  # an example URL
prep_get_data_frame("ship") # for the SHIP-based example data
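To try the path-based import without an existing file, the following sketch writes a small CSV file to a temporary location with base R and reads it back by its path. The file and its columns x and y are made up for this illustration. The exact column types of the result may depend on the reader that dataquieR uses internally, so only the shape of the table is worth checking.

```r
library(dataquieR)

# Write a small, made-up table to a temporary CSV file (base R)
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:3, y = c("a", "b", "c")), csv_path, row.names = FALSE)

# Import it by giving the full path of the file
df_csv <- prep_get_data_frame(csv_path)
df_csv
```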


2. Import data frames from the data frame storage area of dataquieR

Any data frame available in the data frame storage area of dataquieR can be fetched by its name.

# Import the dataquieR metadata example for the synthetic data 
prep_load_workbook_like_file("meta_data_v2") # load the data frames to dataquieR storage area
item_level1 <- prep_get_data_frame("item_level") # get the data frame and store it in an object called item_level1
item_level1
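The storage area can also hold data frames that you add yourself. You can do this with prep_add_data_frames(), the function used in section 6 for custom bookmarks, and then fetch them by name. Here is a minimal sketch. The name toy_table and its contents are made up for illustration.

```r
library(dataquieR)

# Store a made-up data frame under the name "toy_table"
prep_add_data_frames(toy_table = data.frame(id = 1:3, score = c(10, 20, 30)))

# Fetch it back from the storage area by its name
toy <- prep_get_data_frame("toy_table")
toy
```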


3. Import specific sheets/tables from a file

3A. Import by name

To import a specific sheet from an Excel file, append the sheet name to the file name, separated by a | symbol.

For example, suppose you downloaded one of the dataquieR example datasets from the Example Datasets (the synthetic example metadata in XLSX format) and saved it on your Desktop. To import the sheet “segment_level” directly, you can use the following code.

# Import the segment_level sheet from the synthetic example metadata
# saved on the desktop 
prep_get_data_frame("~/Desktop/meta_data_v2.xlsx|segment_level")

Note: this works not only for Excel files but for all file formats that can contain more than one table (e.g., RData or OpenDocument Spreadsheet (ODS) files).

3B. Import by index number

Instead of the name, you can also give the position (index number) of the sheet in the file. For example, missing_table is the second sheet in the example metadata workbook.

# Import the missing table (sheet: missing_table)
prep_get_data_frame("meta_data_v2|2")
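The sketch below builds a small two-sheet workbook in a temporary folder. It then imports one sheet by name and the other by position. It assumes that the rio package, which dataquieR relies on for reading files, can write XLSX files through rio::export() when given a named list of data frames (one sheet per list element). The sheet names cars_sheet and iris_sheet are arbitrary.

```r
library(dataquieR)
library(rio)

# Write a workbook with two sheets: "cars_sheet" (1st) and "iris_sheet" (2nd)
xlsx_path <- tempfile(fileext = ".xlsx")
export(list(cars_sheet = datasets::cars, iris_sheet = datasets::iris), xlsx_path)

# 3A: select the sheet by its name
by_name <- prep_get_data_frame(paste0(xlsx_path, "|cars_sheet"))
# 3B: select the second sheet by its position
by_index <- prep_get_data_frame(paste0(xlsx_path, "|2"))

# Compare the row counts with those of the original data frames
c(by_name = nrow(by_name), cars = nrow(datasets::cars))
c(by_index = nrow(by_index), iris = nrow(datasets::iris))
```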


4. Import specific columns from a given sheet in a file or from a data frame in the dataquieR storage area

You can also select one or more columns from a file or from a data frame in the storage area (cache). Separate the column names from the sheet name with |, and join several column names with +.

# Import the segment_level sheet from the synthetic example metadata
# saved on the desktop, but only the selected columns STUDY_SEGMENT and SEGMENT_ID_VARS
prep_get_data_frame("~/Desktop/meta_data_v2.xlsx|segment_level|STUDY_SEGMENT+SEGMENT_ID_VARS")
# Import the missing table (sheet: missing_table), only the columns CODE_VALUE and CODE_LABEL
prep_get_data_frame("meta_data_v2|missing_table|CODE_VALUE+CODE_LABEL")
# Import the item-level metadata (sheet: item_level), only the columns VAR_NAMES and LABEL
prep_get_data_frame("meta_data_v2|item_level|VAR_NAMES+LABEL")
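The hypothetical helper parse_spec() below makes the | and + syntax explicit. It uses base R only and splits a specifier into its source, its sheet or table, and its columns. It is not a dataquieR function, and it is not how dataquieR parses specifiers internally. For example, dataquieR additionally interprets a bare number such as 2 as a sheet position (section 3B), whereas this helper just returns it as text.

```r
# Hypothetical helper, NOT part of dataquieR: splits a specifier of the form
# "source|table|col1+col2" into its parts, only to illustrate the syntax
parse_spec <- function(spec) {
  parts <- strsplit(spec, "|", fixed = TRUE)[[1]]
  list(
    source  = parts[1],
    table   = if (length(parts) >= 2) parts[2] else NA_character_,
    # column names are joined with "+" in the third part, if present
    columns = if (length(parts) >= 3) {
      strsplit(parts[3], "+", fixed = TRUE)[[1]]
    } else {
      character(0)
    }
  )
}

spec <- parse_spec("~/Desktop/meta_data_v2.xlsx|segment_level|STUDY_SEGMENT+SEGMENT_ID_VARS")
spec$source   # "~/Desktop/meta_data_v2.xlsx"
spec$table    # "segment_level"
spec$columns  # "STUDY_SEGMENT" "SEGMENT_ID_VARS"

# A specifier without columns (here a sheet given by position) yields no columns
parse_spec("meta_data_v2|2")$columns  # character(0)
```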


5. Import RData file

The function can import data frames stored in RData files. As an example, we first create an RData file.

# Create an RData file with 2 objects: a vector and a data frame
vector1 <- 5:10

table1_ex <- data.frame(x = 1:5, y = LETTERS[1:5])
save(table1_ex, vector1, file = "table_example.RData")

rm(table1_ex) # remove the object from the session, so that it is read back from the file
df1 <- prep_get_data_frame("table_example.RData|table1_ex")
df1
# vector1 is a vector, not a data frame, so the following call is expected to fail,
# because prep_get_data_frame only imports data frames
try(v1 <- prep_get_data_frame("table_example.RData|vector1"))
## Error in eval(expr, envir, enclos) : 
##   File "table_example.RData" did not contain a table (data frame) according to 'rio'
## when calling try(v1 <- prep_get_data_frame("table_example.RData|vector1"))
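Only data frames can be imported, so it may help to check in advance which objects in an RData file are data frames. The hypothetical helper rdata_data_frames() below uses base R only and is not part of dataquieR. It loads the file into a separate environment, which leaves the current session untouched, and keeps the names of the objects that are data frames. The object names df_example and vec_example are made up.

```r
# Hypothetical helper, NOT part of dataquieR: returns the names of the
# objects in an RData file that are data frames
rdata_data_frames <- function(path) {
  env <- new.env()
  # load() restores the objects into env and returns their names
  object_names <- load(path, envir = env)
  is_df <- vapply(object_names,
                  function(nm) is.data.frame(get(nm, envir = env)),
                  logical(1))
  object_names[is_df]
}

# Example file with one data frame and one vector
rdata_path <- tempfile(fileext = ".RData")
df_example <- data.frame(x = 1:5, y = LETTERS[1:5])
vec_example <- 5:10
save(df_example, vec_example, file = rdata_path)

rdata_data_frames(rdata_path)
# [1] "df_example"
```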

If only one data frame is present in the RData file, there is no need to specify the object name.

# Create an RData file containing a single data frame
table2_ex <- data.frame(x = 1:5, y = LETTERS[1:5])
save(table2_ex, file = "table_example_2.RData")

rm(table2_ex)
df2 <- prep_get_data_frame("table_example_2.RData")
df2


6. Import data using predefined sources (like bookmarks)

dataquieR provides bookmarks for some standardized vocabularies.

This is the list of all the bookmarks for standardized vocabularies available in dataquieR:

A bookmark can be referenced either by enclosing its name in < > or by using the prefix voc:.

# Import the codes of ICD7
prep_get_data_frame("<ICD7>")
# Same as above, using the prefix voc:
prep_get_data_frame("voc:ICD7")

Note: you can also define custom bookmarks. Store a data frame named <> with the columns voc (the bookmark name) and url (the source to fetch). To do so, execute the following:

# Define two custom bookmarks pointing to data frames of the package datasets
prep_add_data_frames(`<>` = data.frame(voc = c("bookmark1", "bookmark2"),
                                       url = c("data:datasets|cars", "data:datasets|iris")))

prep_get_data_frame("<bookmark1>")
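The two notations point to the same entry of the bookmark table. The hypothetical helper resolve_bookmark() below uses base R only. It is not part of dataquieR and not its internal lookup. It shows how both "<name>" and "voc:name" can be mapped to the url column of a table like the one defined above.

```r
# Hypothetical helper, NOT part of dataquieR: looks up the url of a bookmark
# written either as "<name>" or as "voc:name" in a table with columns voc and url
resolve_bookmark <- function(spec, bookmarks) {
  name <- if (grepl("^<.*>$", spec)) {
    substr(spec, 2, nchar(spec) - 1)   # strip the angle brackets
  } else if (startsWith(spec, "voc:")) {
    sub("^voc:", "", spec)             # strip the voc: prefix
  } else {
    stop("not a bookmark: ", spec)
  }
  url <- bookmarks$url[bookmarks$voc == name]
  if (length(url) != 1) stop("unknown bookmark: ", name)
  url
}

bookmarks <- data.frame(voc = c("bookmark1", "bookmark2"),
                        url = c("data:datasets|cars", "data:datasets|iris"),
                        stringsAsFactors = FALSE)
resolve_bookmark("<bookmark1>", bookmarks)    # "data:datasets|cars"
resolve_bookmark("voc:bookmark1", bookmarks)  # "data:datasets|cars"
```

The resolved url is itself a specifier (section 7) that could be passed to prep_get_data_frame().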


7. Import data from other packages (from their data) using the prefix data:

Let’s say we want to import the data frame iris from the package datasets, but only some of its columns. This can be done with prep_get_data_frame using the prefix data: as follows.

# Import the data frame iris from package datasets, only the columns
# Species, Sepal.Length and Sepal.Width
prep_get_data_frame("data:datasets|iris|Species+Sepal.Length+Sepal.Width")
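The specifier is an ordinary string, so it can also be built from a vector of column names with paste(). The sketch below does this and compares the result with the same selection in base R. Only the number of rows and the set of column names are compared, because column types may differ after import.

```r
library(dataquieR)

cols <- c("Species", "Sepal.Length", "Sepal.Width")
# Build "data:datasets|iris|Species+Sepal.Length+Sepal.Width"
spec <- paste0("data:datasets|iris|", paste(cols, collapse = "+"))
iris_sub <- prep_get_data_frame(spec)

# The same columns selected with base R, for comparison
iris_base <- datasets::iris[, cols]
c(dataquieR = nrow(iris_sub), base = nrow(iris_base))
```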


8. Import data from other packages (from their extdata folder) using the prefix extdata:

If the data frame is stored in the extdata folder of a package (inst/extdata in the package sources), you can use the prefix extdata:, as follows. For example, for the package dataquieR:

prep_get_data_frame("extdata:dataquieR/meta_data.RData")


9. Import data from other packages (specifying the path) using the prefix package:

To access the data frame, indicate the full path to the file inside the package together with the prefix package:. For example, for the package dataquieR:

prep_get_data_frame("package:dataquieR/extdata/meta_data.RData")
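Sections 8 and 9 both point to a file inside an installed package. The hypothetical helper resolve_package_path() below illustrates the difference between the two prefixes with base R's system.file(). It is not part of dataquieR and not its implementation. With extdata:, the path is taken relative to the package's extdata folder. With package:, it is taken relative to the root of the installed package. system.file() returns "" if the file does not exist.

```r
# Hypothetical helper, NOT part of dataquieR: maps "extdata:pkg/file" and
# "package:pkg/path/to/file" to the location of the file in the installed package
resolve_package_path <- function(spec) {
  if (startsWith(spec, "extdata:")) {
    rest <- sub("^extdata:", "", spec)
    sub_dir <- "extdata"   # relative to the extdata folder
  } else if (startsWith(spec, "package:")) {
    rest <- sub("^package:", "", spec)
    sub_dir <- NULL        # relative to the package root
  } else {
    stop("unsupported prefix in: ", spec)
  }
  parts <- strsplit(rest, "/", fixed = TRUE)[[1]]
  pkg <- parts[1]
  file_parts <- c(sub_dir, parts[-1])
  do.call(system.file, c(as.list(file_parts), package = pkg))
}

# Both specifiers from sections 8 and 9 resolve to the same location
resolve_package_path("extdata:dataquieR/meta_data.RData")
resolve_package_path("package:dataquieR/extdata/meta_data.RData")
```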


Review the content of the dataquieR data frame storage area

The data frame storage also works as a cache, i.e., whenever prep_get_data_frame fetches a data frame, that data frame is also added to the storage. Use prep_list_dataframes() to see the content of this storage area.

Using prep_load_workbook_like_file() and/or prep_load_folder_with_metadata(), you can add a set of data frames from a file or a folder to the storage.

To add single data frames to the storage manually, use prep_add_data_frames().

To clear every data frame from the storage, use prep_purge_data_frame_cache().

Use prep_remove_from_cache() to remove specific data frames from the storage.

Example:

# Import the dataquieR metadata example for the synthetic data
prep_load_workbook_like_file("meta_data_v2") 
# Look at the content of the storage
prep_list_dataframes() 
# To get a data frame from the storage
df_md <- prep_get_data_frame("dataframe_level") 
# Remove all data frames from the storage 
prep_purge_data_frame_cache()
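You can also remove a single data frame instead of clearing the whole storage. The sketch below adds a made-up data frame named toy_table and checks that prep_list_dataframes() reports it. It then removes the data frame with prep_remove_from_cache(). It assumes that prep_list_dataframes() returns the names of the stored data frames as a character vector. It also assumes that prep_remove_from_cache() accepts the name of the data frame as its first argument.

```r
library(dataquieR)

# Add a made-up data frame to the storage area
prep_add_data_frames(toy_table = data.frame(id = 1:3))
stored_before <- "toy_table" %in% prep_list_dataframes()

# Remove only this data frame; the rest of the storage is kept
prep_remove_from_cache("toy_table")
stored_after <- "toy_table" %in% prep_list_dataframes()

c(before = stored_before, after = stored_after)
```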