
Retrieves a data table from the improve repository and loads it into R together with metadata, including a caption and description. The file format is detected automatically and the appropriate reader is used (.xls/.xlsx as Excel, .rds as RDS, .sas7bdat as SAS, all others as CSV).

Usage

getData(
  ident,
  from = pwd(),
  addAsLink = TRUE,
  caption = "",
  description = "",
  parser = NULL,
  ...
)

Arguments

ident

Path, resource ID, entity ID, or entity version ID of the data file. Accepts relative paths (starting with ./ or ../), absolute paths (starting with /), UUIDs, or improve identifiers. See loadResource() for full details on identifier formats

from

Root path for resolving relative paths. Defaults to pwd(), which is the step that initiated the R session (set via IMPROVER_STEP environment variable)

addAsLink

Logical. If TRUE (default), creates a link in the inventory for provenance tracking. Use improveClean() at workflow end to clean up links. Set to FALSE for one-off loads without provenance tracking

caption

Custom caption text. If empty string (default), uses "Source (Entity version ID): , Last modified on: "

description

Custom description text. If empty string (default), uses the filename

parser

Optional custom parsing function that takes a local file path as first argument and returns a data frame. Overrides automatic format detection

...

Additional arguments passed to the underlying read function: readxl::read_excel(), readRDS(), haven::read_sas(), or utils::read.csv()
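
As a rough illustration of the parser contract described above (a function that takes a local file path as its first argument and returns a data frame), the following sketch defines a parser and exercises it on a temporary file without calling getData(). The function name read_pipe_delimited and the sample file contents are hypothetical and not part of the package.

```r
# Hypothetical parser: reads a pipe-delimited text file into a data frame.
# The first argument is a local file path, as the parser argument requires.
read_pipe_delimited <- function(path) {
  utils::read.delim(path, sep = "|", stringsAsFactors = FALSE)
}

# A temporary file stands in for the file that would be downloaded locally.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id|dose", "1|10", "2|20"), tmp)

# Call the parser directly to check that it returns a data frame.
parsed <- read_pipe_delimited(tmp)
str(parsed)

unlink(tmp)
```

A function of this shape could then be supplied as getData("special_format.txt", parser = read_pipe_delimited).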

Value

A single-row data frame (descriptor) with the following columns, or NULL if the resource cannot be found:

caption

Character. Caption text for display/reporting

path

Character. Local file path (normalized with forward slashes)

entityId

Character. The improve entity ID of the resource

name

Character. Original filename from the repository

description

Character. Description text for the resource

resource

List column containing the full resource metadata data frame from the improve server (includes fields like resourceId, entityVersionId, nodeType, lastModifiedOn, etc.). Access via result$resource[[1]]

data

List column containing the parsed data frame. Access the actual data via result$data[[1]]

dataType

Character. File type detected: "Excel", "RDS", "SAS", "CSV", or "Custom parser"

If ident matches multiple resources, returns a data frame with multiple rows (one per resource).
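
The list columns can be easier to picture with a small stand-in. The sketch below builds a mock descriptor by hand, using a few of the documented column names with made-up values, and shows how the data column might be accessed for one row or for several. It does not call getData(); mock_desc and its contents are purely illustrative.

```r
# Mock descriptor using some of the documented column names (values made up).
mock_desc <- data.frame(
  caption = c("Table A", "Table B"),
  name = c("a.csv", "b.csv"),
  dataType = c("CSV", "CSV"),
  stringsAsFactors = FALSE
)

# A list column holds one object per row; here, one parsed data frame each.
mock_desc$data <- list(
  data.frame(x = 1:3),
  data.frame(x = 4:5)
)

# Single resource: extract the data frame stored in the first row.
first_df <- mock_desc$data[[1]]

# Multiple resources: iterate over the list column, one element per row.
row_counts <- vapply(mock_desc$data, nrow, integer(1))
names(row_counts) <- mock_desc$name
row_counts
```

The same [[i]] indexing applies to the resource column, which holds one metadata data frame per row.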

Details

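File format detection is case-insensitive and based on file extension:

.xls, .xlsx: read as Excel with readxl::read_excel()

.rds: read as RDS with readRDS()

.sas7bdat: read as SAS with haven::read_sas()

Any other extension: read as CSV with utils::read.csv()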

The returned descriptor integrates with improve's provenance tracking when addAsLink = TRUE. The file is downloaded to a local data/ subdirectory in the current workspace.
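
The case-insensitive, extension-based detection described above could be sketched roughly as follows. This is not the package's implementation; detect_data_type is a hypothetical helper that only mirrors the documented mapping from extension to the dataType label.

```r
# Hypothetical helper mirroring the documented extension-to-type mapping.
detect_data_type <- function(path) {
  # Lower-case the extension so that ".XLSX" and ".xlsx" are treated alike.
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    xls = ,
    xlsx = "Excel",
    rds = "RDS",
    sas7bdat = "SAS",
    "CSV"  # any other extension falls back to CSV
  )
}

# Apply the helper to a few example file names.
vapply(
  c("summary.XLSX", "model.rds", "adsl.sas7bdat", "notes.txt"),
  detect_data_type,
  character(1)
)
```

Because any unrecognized extension falls back to CSV, a file such as notes.txt would be read as CSV unless a custom parser is supplied.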

References

ics1141

See also

getTextString for loading text files, getGraphics for loading image files, getR for loading R objects, loadResource for loading resource metadata without file content

Examples

if (FALSE) { # \dontrun{
# Load CSV data from current step
data_desc <- getData("analysis_data.csv")
df <- data_desc$data[[1]]

# Check what was loaded
data_desc$name       # Original filename
data_desc$entityId   # improve entity ID
data_desc$dataType   # "CSV"

# Access full resource metadata
resource_info <- data_desc$resource[[1]]
resource_info$lastModifiedOn

# Load Excel file with custom caption
excel_desc <- getData(
  "results/summary.xlsx",
  caption = "Study Results Summary"
)

# Load without provenance tracking (one-off use)
temp_data <- getData("temp_file.csv", addAsLink = FALSE)

# Use custom parser for special format
custom_desc <- getData(
  "special_format.txt",
  parser = function(path) read.delim(path, sep = "|")
)

# Handle case when resource not found
result <- getData("nonexistent.csv")
if (is.null(result)) {
  message("Resource not found")
}
} # }