
Get Data Table from improve Repository
Retrieves a data table from the improve repository and loads it into R together with metadata such as a caption and description. The file format is detected automatically and the matching reader is used: .xls/.xlsx files are read as Excel, .rds as RDS, .sas7bdat as SAS, and anything else as CSV.
Usage
getData(
  ident,
  from = pwd(),
  addAsLink = TRUE,
  caption = "",
  description = "",
  parser = NULL,
  ...
)
Arguments
- ident
Path, resource ID, entity ID, or entity version ID of the data file. Accepts relative paths (starting with ./ or ../), absolute paths (starting with /), UUIDs, or improve identifiers. See loadResource() for full details on identifier formats.
- from
Root path for resolving relative paths. Defaults to pwd(), the step that initiated the R session (set via the IMPROVER_STEP environment variable).
- addAsLink
Logical. If TRUE (default), creates a link in the inventory for provenance tracking; use improveClean() at the end of the workflow to clean up links. Set to FALSE for one-off loads that do not need provenance tracking.
- caption
Custom caption text. If left as the empty string (default), uses "Source (Entity version ID): , Last modified on: ".
- description
Custom description text. If left as the empty string (default), uses the filename.
- parser
Optional custom parsing function that takes a local file path as its first argument and returns a data frame. When supplied, it overrides automatic format detection.
- ...
Additional arguments passed to the underlying read function: readxl::read_excel(), readRDS(), haven::read_sas(), or utils::read.csv().
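A function passed as parser only needs to accept a local file path as its first argument and return a data frame. The sketch below defines a hypothetical parse_pipe_delimited() helper (not part of improve), writes a small pipe-delimited sample file to a temporary location, and checks that the helper parses it. In real use you would hand the function to getData(parser = parse_pipe_delimited) instead of calling it yourself.

```r
# Hypothetical custom parser: takes a local file path, returns a data frame.
# Intended to be passed as getData(parser = parse_pipe_delimited).
parse_pipe_delimited <- function(path) {
  utils::read.delim(path, sep = "|", stringsAsFactors = FALSE)
}

# Write a small pipe-delimited sample file to try the parser locally
sample_path <- tempfile(fileext = ".txt")
writeLines(c("id|value", "1|a", "2|b"), sample_path)

parsed <- parse_pipe_delimited(sample_path)
str(parsed)
```

Setting stringsAsFactors explicitly inside the parser keeps the result independent of the R version's default.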
Value
A single-row data frame (descriptor) with the following columns, or
NULL if the resource cannot be found:
- caption
Character. Caption text for display/reporting
- path
Character. Local file path (normalized with forward slashes)
- entityId
Character. The improve entity ID of the resource
- name
Character. Original filename from the repository
- description
Character. Description text for the resource
- resource
List column containing the full resource metadata data frame from the improve server (includes fields such as resourceId, entityVersionId, nodeType, lastModifiedOn, etc.). Access via result$resource[[1]].
- data
List column containing the parsed data frame. Access the actual data via result$data[[1]].
- dataType
Character. File type detected: "Excel", "RDS", "SAS", "CSV", or "Custom parser".
If ident matches multiple resources, returns a data frame with multiple rows
(one per resource).
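Because data and resource are list columns, each cell holds a whole data frame, so a multi-row result is easiest to handle by iterating over the list. The sketch below builds a hypothetical two-row descriptor by hand, mimicking the documented columns with placeholder values (it is not output from getData()), and shows how to pull out one table or all of them.

```r
# Hypothetical two-row descriptor mimicking the documented columns.
# All values are placeholders; nothing here comes from improve.
desc <- data.frame(
  caption = c("First table", "Second table"),
  name = c("a.csv", "b.csv"),
  dataType = c("CSV", "CSV"),
  stringsAsFactors = FALSE
)

# List columns: each cell holds a whole data frame
desc$data <- list(data.frame(x = 1:3), data.frame(x = 4:5))
desc$resource <- list(
  data.frame(nodeType = "file"),
  data.frame(nodeType = "file")
)

# Parsed data for the first matching resource
first <- desc$data[[1]]

# All parsed tables, named by original filename
tables <- setNames(desc$data, desc$name)
row_counts <- vapply(tables, nrow, integer(1))
print(row_counts)
```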
Details
File format detection is case-insensitive and based on the file extension:
- .xls, .xlsx: Excel via readxl::read_excel()
- .rds: R Data Serialization via readRDS()
- .sas7bdat: SAS via haven::read_sas()
- All others: CSV via utils::read.csv() (note: the default for stringsAsFactors depends on the R version and is FALSE from R 4.0.0 onward; pass it explicitly through ... if you need consistent behavior)
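The extension rules above can be illustrated with a small base-R sketch. The detect_reader() helper is hypothetical and only mirrors the documented mapping; it returns the name of the reader rather than calling it, so it runs without readxl or haven installed, and it is not the package's actual implementation.

```r
# Hypothetical helper mirroring the documented extension rules.
# Returns the reader's name instead of calling it.
detect_reader <- function(path) {
  ext <- tolower(tools::file_ext(path))  # case-insensitive comparison
  if (ext %in% c("xls", "xlsx")) {
    "readxl::read_excel"
  } else if (ext == "rds") {
    "readRDS"
  } else if (ext == "sas7bdat") {
    "haven::read_sas"
  } else {
    "utils::read.csv"  # fallback for every other extension
  }
}

files <- c("Summary.XLSX", "model.RDS", "trial.sas7bdat", "notes.txt")
readers <- vapply(files, detect_reader, character(1))
print(readers)
```

Note that a file without any extension also falls through to the CSV reader under these rules.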
The returned descriptor integrates with improve's provenance tracking when
addAsLink = TRUE. The file is downloaded to a local data/ subdirectory
in the current workspace.
See also
getTextString for loading text files,
getGraphics for loading image files,
getR for loading R objects,
loadResource for loading resource metadata without file content
Examples
if (FALSE) { # \dontrun{
# Load CSV data from current step
data_desc <- getData("analysis_data.csv")
df <- data_desc$data[[1]]
# Check what was loaded
data_desc$name # Original filename
data_desc$entityId # improve entity ID
data_desc$dataType # "CSV"
# Access full resource metadata
resource_info <- data_desc$resource[[1]]
resource_info$lastModifiedOn
# Load Excel file with custom caption
excel_desc <- getData(
"results/summary.xlsx",
caption = "Study Results Summary"
)
# Load without provenance tracking (one-off use)
temp_data <- getData("temp_file.csv", addAsLink = FALSE)
# Use custom parser for special format
custom_desc <- getData(
"special_format.txt",
parser = function(path) read.delim(path, sep = "|")
)
# Handle case when resource not found
result <- getData("nonexistent.csv")
if (is.null(result)) {
message("Resource not found")
}
} # }