| Title: | A Magical Framework for Collaborative & Reproducible Data Analysis |
|---|---|
| Description: | A comprehensive data analysis framework for NIH-funded research that streamlines both data cleaning and the preparation of NIH Data Archive ('NDA') submission templates. Provides unified access to multiple data sources ('REDCap', 'MongoDB', 'Qualtrics', 'SQL', 'ORACLE') through their APIs, with specialized functions for data cleaning, filtering, merging, and parsing. Features automatic validation, field harmonization, and memory-aware processing to enhance reproducibility in multi-site collaborative research, as described in Mittal et al. (2021) <doi:10.20900/jpbs.20210011>. |
| Authors: | Joshua G. Kenney [aut, cre], Trevor F. Williams [aut], Minerva K. Pappu [aut], Michael J. Spilka [aut], Danielle N. Pratt [ctb], Victor J. Pokorny [ctb], Santiago Castiello de Obeso [ctb], Praveen Suthaharan [ctb], Christian R. Horgan [ctb] |
| Maintainer: | Joshua G. Kenney |
| License: | MIT + file LICENSE |
| Version: | 0.6.16 |
| Built: | 2026-06-08 20:18:24 UTC |
| Source: | https://github.com/belieflab/wizardry |
This function processes requests for clean data sequentially for specified measures. It makes a request to the appropriate API for the named measure or measures and runs the associated data cleaning routines. It then runs a series of unit tests to verify that the data quality standards are met.
clean(..., csv = FALSE, rdata = FALSE, spss = FALSE, skip_prompt = TRUE)
| Argument | Description |
|---|---|
| ... | Strings specifying the measures to process; each can be a Mongo collection, REDCap instrument, or Qualtrics survey. |
| csv | Optional; Boolean. If TRUE, creates a .csv extract in ./tmp. |
| rdata | Optional; Boolean. If TRUE, creates an .rdata extract in ./tmp. |
| spss | Optional; Boolean. If TRUE, creates a .sav extract in ./tmp. |
| skip_prompt | Logical. If TRUE (default), skips confirmation prompts. If FALSE, prompts for confirmation unless the user has previously chosen to remember their preference. |
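The csv and rdata flags above write file extracts into ./tmp. The sketch below shows one way such extracts can be written with base R; export_extracts() is a hypothetical helper, not a wizardry function, and the file naming (<name>.csv, <name>.rdata) is an assumption.

```r
# Hypothetical helper (not part of wizardry): write extracts of a cleaned
# data frame into an output directory, as clean() does for ./tmp.
export_extracts <- function(df, name, csv = FALSE, rdata = FALSE, dir = "tmp") {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  written <- character(0)
  if (csv) {
    path <- file.path(dir, paste0(name, ".csv"))
    utils::write.csv(df, path, row.names = FALSE)
    written <- c(written, path)
  }
  if (rdata) {
    path <- file.path(dir, paste0(name, ".rdata"))
    # Store the object under `name` so load() restores it by that name
    env <- new.env()
    assign(name, df, envir = env)
    save(list = name, envir = env, file = path)
    written <- c(written, path)
  }
  invisible(written)
}

# Usage: writes tmp/prl.csv and tmp/prl.rdata relative to the working directory
# export_extracts(data.frame(x = 1:3), "prl", csv = TRUE, rdata = TRUE)
```

An SPSS extract could be added the same way with haven::write_sav(); that dependency is left out to keep the sketch base-R only.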
Prints the time taken for the data request process.
Joshua Kenney
## Not run:
clean("prl", csv=TRUE)
clean("rgpts", "kamin", rdata=TRUE)
# Skip confirmation prompts
clean("prl", csv=TRUE, skip_prompt=TRUE)
## End(Not run)
This function is deprecated. Please use 'to.csv' instead. This is a legacy alias for the 'to.csv' function to maintain compatibility with older code.
createCsv(...)
| Argument | Description |
|---|---|
| ... | Additional arguments passed through to to.csv(). |
Invisibly returns TRUE if successful. The function writes a CSV file to the specified path and prints a message indicating the file's location.
## Not run:
# DEPRECATED - use to.csv() instead
createCsv(prl01)
## End(Not run)
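Each deprecated alias in this manual forwards its arguments to a newer function. A minimal sketch of that pattern using base R's .Deprecated(); newFunction() and oldFunction() are hypothetical stand-ins (for example, for to.csv() and createCsv()), and whether wizardry emits its deprecation notice this way is an assumption.

```r
# Hypothetical replacement function standing in for the new API (e.g. to.csv()).
newFunction <- function(x, times = 2) {
  x * times
}

# Hypothetical legacy alias: emit a deprecation warning, then forward
# every argument unchanged to the replacement.
oldFunction <- function(...) {
  .Deprecated("newFunction")
  newFunction(...)
}

# Calling the legacy name still works; the warning is caught and reported.
res <- withCallingHandlers(
  oldFunction(21),
  warning = function(w) {
    message("Deprecation notice: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(res)
```

Forwarding with `...` keeps the alias in sync with the new function's signature, so old call sites keep working even if arguments are added later.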
This function is deprecated. Please use 'to.rds' instead. This is a legacy alias for the 'to.rds' function to maintain compatibility with older code.
createRds(...)
| Argument | Description |
|---|---|
| ... | Additional arguments passed through to to.rds(). |
Invisibly returns TRUE if successful. The function writes an RDS file to the specified path and prints a message indicating the file's location.
## Not run:
# DEPRECATED - use to.rds() instead
createRds(prl01)
## End(Not run)
This function is deprecated. Please use 'to.sav' instead. This is a legacy alias for the 'to.sav' function to maintain compatibility with older code.
createSpss(...)
| Argument | Description |
|---|---|
| ... | Additional arguments passed through to to.sav(). |
Invisibly returns TRUE if successful. The function writes an SPSS file to the specified path and prints a message indicating the file's location.
## Not run:
# DEPRECATED - use to.sav() instead
createSpss(prl01)
## End(Not run)
This function is deprecated. Please use 'sift' instead. This is a legacy alias for the 'sift' function to maintain compatibility with older code.
dataFilter(...)
| Argument | Description |
|---|---|
| ... | Additional arguments passed through to sift(). |
A data frame filtered according to the provided parameters and containing only the columns specified in 'cols'. If no columns are specified, returns the entire data frame with the row filters applied.
## Not run:
# DEPRECATED - use sift() instead
filtered <- dataFilter(df, sex="F")
## End(Not run)
This function is deprecated. Please use 'meld' instead. This is a legacy alias for the 'meld' function to maintain compatibility with older code.
dataMerge(...)
| Argument | Description |
|---|---|
| ... | Clean data frames to be merged. |
A merged data frame based on the specified or common candidate keys.
## Not run:
# DEPRECATED - use meld() instead
merged <- dataMerge(df1_clean, df2_clean)
## End(Not run)
This function is deprecated. Please use 'clean' instead. This is a legacy alias for the 'clean' function to maintain compatibility with older code.
dataRequest(...)
| Argument | Description |
|---|---|
| ... | Strings specifying the measures to process; each can be a Mongo collection, REDCap instrument, or Qualtrics survey. |
Prints the time taken for the data request process.
## Not run:
# DEPRECATED - use clean() instead
prl <- dataRequest("prl")
## End(Not run)
This function is deprecated. Please use 'redcap' instead. This is a legacy alias for the 'redcap' function to maintain compatibility with older code.
getRedcap(...)
| Argument | Description |
|---|---|
| ... | Optional column names to filter for. Only rows with non-missing values in ALL specified columns will be returned. This is useful for filtering data to only include complete cases for specific variables of interest. |
A data frame containing the requested REDCap data
## Not run:
# DEPRECATED - use redcap() instead
survey_data <- getRedcap("demographics")
## End(Not run)
This function is deprecated. Please use 'qualtrics' instead. This is a legacy alias for the 'qualtrics' function to maintain compatibility with older code.
getSurvey(...)
| Argument | Description |
|---|---|
| ... | Optional column names to filter for. Only rows with non-missing values in ALL specified columns will be returned. This is useful for filtering data to only include complete cases for specific variables of interest. |
A cleaned and harmonized data frame containing the survey data with superkeys first.
## Not run:
# DEPRECATED - use qualtrics() instead
survey_data <- getSurvey("your_survey_alias")
## End(Not run)
This function is deprecated. Please use 'mongo' instead. This is a legacy alias for the 'mongo' function to maintain compatibility with older code.
getTask(...)
| Argument | Description |
|---|---|
| ... | Optional column names to filter for. Only rows with non-missing values in ALL specified columns will be returned. This is useful for filtering data to only include complete cases for specific variables of interest. |
A data frame containing the MongoDB data with superkeys first
## Not run:
# DEPRECATED - use mongo() instead
survey_data <- getTask("task_alias")
## End(Not run)
This function simplifies the process of merging multiple cleaned data frames by automatically determining common merge keys or utilizing user-specified keys. Supports both inner and outer join methods, and offers options for exporting the merged data.
meld( ..., by = NULL, all = TRUE, no.dups = FALSE, csv = FALSE, rdata = FALSE, spss = FALSE )
| Argument | Description |
|---|---|
| ... | Clean data frames to be merged. |
| by | A vector of strings specifying the column names to be used as merge keys. If NULL, the function automatically determines common keys from the provided data frames. |
| all | Logical; if TRUE, performs an OUTER JOIN. If FALSE, performs an INNER JOIN. |
| no.dups | Logical; if TRUE, duplicates are removed post-merge. |
| csv | Logical; if TRUE, the merged data frame is exported as a CSV file. |
| rdata | Logical; if TRUE, the merged data frame is saved as an Rda file. |
| spss | Logical; if TRUE, the merged data frame is exported as an SPSS file. |
A merged data frame based on the specified or common candidate keys.
Joshua Kenney
## Not run:
# Create sample dataframes for demonstration
df1 <- data.frame(
  src_subject_id = c("S001", "S002", "S003"),
  visit = c(1, 2, 1),
  measure1 = c(10, 15, 12),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  src_subject_id = c("S001", "S002", "S004"),
  visit = c(1, 2, 2),
  measure2 = c(85, 92, 78),
  stringsAsFactors = FALSE
)
# Perform an OUTER JOIN using default keys:
merged1 <- meld(df1, df2, all = TRUE)
# Perform an INNER JOIN using specified keys:
merged2 <- meld(df1, df2, by = "src_subject_id", all = FALSE)
## End(Not run)
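When by is NULL, meld() merges on keys common to its inputs. The base R sketch below approximates that behaviour with merge() on the same example data; meld_sketch() is hypothetical, and treating "common keys" as the columns shared by every input is an assumption. In base merge(), all = TRUE is a full outer join and all = FALSE an inner join, matching the wording of the all argument.

```r
df1 <- data.frame(
  src_subject_id = c("S001", "S002", "S003"),
  visit = c(1, 2, 1),
  measure1 = c(10, 15, 12),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  src_subject_id = c("S001", "S002", "S004"),
  visit = c(1, 2, 2),
  measure2 = c(85, 92, 78),
  stringsAsFactors = FALSE
)

# Hypothetical approximation of meld(): default keys are the column names
# shared by every data frame; inputs are merged pairwise from left to right.
meld_sketch <- function(..., by = NULL, all = TRUE) {
  dfs <- list(...)
  if (is.null(by)) by <- Reduce(intersect, lapply(dfs, names))
  Reduce(function(x, y) merge(x, y, by = by, all = all), dfs)
}

# Keys default to src_subject_id and visit; unmatched rows from both sides are kept.
merged_outer <- meld_sketch(df1, df2, all = TRUE)
# Only S001 and S002 appear in both inputs; visit is suffixed .x/.y.
merged_inner <- meld_sketch(df1, df2, by = "src_subject_id", all = FALSE)
```

Because the inner join uses only src_subject_id as the key, the shared visit column is kept from both sides with merge()'s default .x/.y suffixes.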
Fetches data from a MongoDB collection and returns it as a data frame.
mongo( collection, ..., database = NULL, identifier = NULL, chunk_size = NULL, verbose = FALSE, interview_date = NULL )
| Argument | Description |
|---|---|
| collection | The name of the MongoDB collection. |
| ... | Optional column names to filter for. Only rows with non-missing values in ALL specified columns will be returned. This is useful for filtering data to only include complete cases for specific variables of interest. |
| database | The database name (optional). |
| identifier | Field to use as identifier (optional). |
| chunk_size | Number of records per chunk (optional). |
| verbose | Logical; if TRUE, displays detailed progress messages. Default is FALSE. |
| interview_date | Optional; either a date string in various formats (ISO, US, etc.) to filter data up to that date, or TRUE to return only rows with non-NA interview_date values. |
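The ... filter shared by mongo(), redcap(), qualtrics(), and oracle() keeps only complete cases for the named columns. A minimal base R sketch of that rule; keep_complete() is hypothetical, and it assumes only NA (not empty strings) counts as missing.

```r
# Hypothetical helper mirroring the documented `...` behaviour: keep rows
# with non-missing values in ALL of the named columns.
keep_complete <- function(df, ...) {
  cols <- c(...)
  if (length(cols) == 0) return(df)  # no columns named: no filtering
  unknown <- setdiff(cols, names(df))
  if (length(unknown) > 0) {
    stop("unknown column(s): ", paste(unknown, collapse = ", "))
  }
  df[stats::complete.cases(df[, cols, drop = FALSE]), , drop = FALSE]
}

d <- data.frame(id = 1:4, rt = c(350, NA, 410, 390), acc = c(1, 0, NA, 1))
keep_complete(d, "rt", "acc")  # rows 1 and 4 remain
```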
A data frame containing the MongoDB data with superkeys first
## Not run:
# Get data from MongoDB collection
data <- mongo("collection")
## End(Not run)
Retrieves a list of all available collections in the configured MongoDB database.
mongo.index(pattern = NULL, database = NULL)
| Argument | Description |
|---|---|
| pattern | Optional regex string; if supplied, only collections whose name matches (case-insensitive) are shown. |
| database | Optional; the name of the database to connect to. If NULL, uses the database specified in the configuration file. |
A character vector containing the names of all available collections in the configured MongoDB database.
This function fetches a MongoDB collection that contains data from multiple measures and separates it into an individual data frame for each measure detected in the data. It identifies the appropriate identifier column (e.g., participantId, workerId) and splits the data based on column name prefixes.
mongo.rune(collection, prefix = NULL, db_name = NULL, lower = TRUE)
| Argument | Description |
|---|---|
| collection | Character string specifying the Mongo collection. |
| prefix | Character string; if specified, returns only the data frame with this prefix. Default NULL. |
| db_name | Character string specifying the Mongo database. |
| lower | Logical; if TRUE (default), converts prefixes to lower case. |
The function performs the following steps:
1. Retrieves the raw data from the specified MongoDB collection
2. Identifies which identifier column to use (participantId, workerId, PROLIFIC_PID, or src_subject_id)
3. Determines survey prefixes by analyzing column names
4. Creates separate dataframes for each survey prefix found
5. Assigns each dataframe to the global environment with names matching the survey prefixes
If prefix is specified, returns a single dataframe with that prefix. Otherwise, creates multiple dataframes in the global environment, one for each survey detected in the data. Each dataframe is named after its survey prefix.
## Not run:
# Parse a MongoDB collection into its component dataframes
mongo.rune("combined_surveys")
# After running, access individual survey dataframes directly:
head(pss)   # Access the PSS survey dataframe
head(cesd)  # Access the CESD survey dataframe
# Parse a single survey from composite collection
rgpts <- mongo.rune("combined_surveys", prefix = "rgpts")
## End(Not run)
This function processes requests for clean data sequentially for specified measures. It makes a request to the NIH NDA API for the named data structures and runs the associated data remediation routines. It then runs a series of unit tests to verify that the data quality standards are met.
nda( ..., csv = FALSE, rdata = FALSE, spss = FALSE, limited_dataset = FALSE, skip_prompt = TRUE, verbose = FALSE, strict = TRUE, dcc = FALSE )
| Argument | Description |
|---|---|
| ... | Strings specifying the measures to process; each can be a Mongo collection, REDCap instrument, or Qualtrics survey. |
| csv | Optional; Boolean. If TRUE, creates a .csv extract in ./tmp. |
| rdata | Optional; Boolean. If TRUE, creates an .rdata extract in ./tmp. |
| spss | Optional; Boolean. If TRUE, creates a .sav extract in ./tmp. |
| limited_dataset | Optional; Boolean. If TRUE, does not perform date-shifting of interview_date or age-capping of interview_age. |
| skip_prompt | Logical. If TRUE (default), skips confirmation prompts unless preferences have not been set yet. If FALSE, prompts for confirmation unless the user has previously chosen to remember their preference. |
| verbose | Logical. If TRUE, shows detailed processing information. If FALSE (default), shows only essential user-facing messages. |
| strict | Logical. If TRUE (default), enforces strict NDA validation: required fields with ANY missing data, or recommended fields with ALL data missing, cause validation to fail. If FALSE (lenient mode), missing data triggers warnings but processing continues. |
| dcc | Logical. If TRUE, includes 11 DCC (Data Coordinating Center) fields from ndar_subject01 (7 required + 4 recommended). Default FALSE. |
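The strict argument describes two missing-data rules and two ways of reacting to them. A sketch of those rules as a standalone validator; validate_nda() and the field lists in the example are hypothetical, and the sketch covers only the missing-data rules stated above, not any other checks nda() may perform.

```r
# Hypothetical validator for the documented strict/lenient rules:
#   - a required field fails if ANY value is missing (or the column is absent)
#   - a recommended field fails if ALL values are missing (or the column is absent)
validate_nda <- function(df, required, recommended, strict = TRUE) {
  problems <- character(0)
  for (f in required) {
    if (!f %in% names(df) || anyNA(df[[f]])) {
      problems <- c(problems, sprintf("required field '%s' has missing data", f))
    }
  }
  for (f in recommended) {
    if (!f %in% names(df) || all(is.na(df[[f]]))) {
      problems <- c(problems, sprintf("recommended field '%s' is entirely missing", f))
    }
  }
  if (length(problems) > 0) {
    # strict: stop with every problem listed; lenient: warn and carry on
    if (strict) stop(paste(problems, collapse = "\n"), call. = FALSE)
    for (p in problems) warning(p, call. = FALSE)
  }
  length(problems) == 0
}

d <- data.frame(subjectkey = c("A", "B"), interview_age = c(240, NA), sex = c(NA, NA))
# Lenient mode reports both problems as warnings and returns FALSE
validate_nda(d, required = c("subjectkey", "interview_age"), recommended = "sex", strict = FALSE)
```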
Prints the time taken for the data request process.
Joshua Kenney
## Not run:
nda("prl", csv=TRUE)
nda("rgpts", "kamin", rdata=TRUE)
# Skip confirmation prompts
nda("prl", csv=TRUE, skip_prompt=TRUE)
# Show detailed processing information
nda("prl", verbose=TRUE)
# Use lenient validation mode (allow missing data with warnings)
nda("prl", strict=FALSE)
# Include DCC fields from ndar_subject01
nda("prl", dcc=TRUE)
## End(Not run)
This function is deprecated. Please use 'nda' instead. This is a legacy alias for the 'nda' function to maintain compatibility with older code.
ndaRequest(...)
| Argument | Description |
|---|---|
| ... | Strings specifying the measures to process; each can be a Mongo collection, REDCap instrument, or Qualtrics survey. |
Prints the time taken for the data request process.
## Not run:
# DEPRECATED - use nda() instead
prl01 <- ndaRequest("prl01")
## End(Not run)
Retrieves data from an Oracle table or view and optionally joins it with a primary keys table as specified in the configuration.
oracle( table_name = NULL, ..., fields = NULL, where_clause = NULL, join_primary_keys = TRUE, custom_query = NULL, max_rows = NULL, date_format = NULL, batch_size = 1000, pii = FALSE, interview_date = NULL, all = FALSE, schema = NULL )
| Argument | Description |
|---|---|
| table_name | Name of the SQL table or view to query. |
| ... | Optional column names to filter for. Only rows with non-missing values in ALL specified columns will be returned. |
| fields | Optional vector of specific fields to select. |
| where_clause | Optional WHERE clause to filter results (without the "WHERE" keyword). |
| join_primary_keys | Boolean; whether to join with the primary keys table (default: TRUE). |
| custom_query | Optional custom SQL query to execute instead of building one. |
| max_rows | Optional limit on the number of rows to return. |
| date_format | Optional format for date fields (default uses ISO format). |
| batch_size | Number of records to retrieve per batch for large datasets (default: 1000). |
| pii | Logical; if FALSE (default), removes fields marked as PII. TRUE keeps PII. |
| interview_date | Optional; either a date string in various formats (ISO, US, etc.) to filter data up to that date, or TRUE to return only rows with non-NA interview_date values. |
| all | Logical; if TRUE, uses a LEFT OUTER JOIN instead of an INNER JOIN (default: FALSE), analogous to all.x = TRUE in base R's merge(). |
| schema | Optional schema name to use for table qualification. |
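Unless custom_query is supplied, oracle() builds its own query from table_name, fields, where_clause, max_rows, and schema. The sketch below shows one plausible way those arguments compose into SQL; build_oracle_query() is hypothetical, and using Oracle's FETCH FIRST n ROWS ONLY clause (available from Oracle 12c) for max_rows is an assumption about how a row limit might be applied.

```r
# Hypothetical query builder; wizardry's actual SQL may differ.
build_oracle_query <- function(table_name, fields = NULL, where_clause = NULL,
                               max_rows = NULL, schema = NULL) {
  # Qualify the table with its schema when one is given
  target <- if (is.null(schema)) table_name else paste0(schema, ".", table_name)
  cols <- if (is.null(fields)) "*" else paste(fields, collapse = ", ")
  sql <- paste("SELECT", cols, "FROM", target)
  # where_clause is inserted verbatim, without the WHERE keyword
  if (!is.null(where_clause)) sql <- paste(sql, "WHERE", where_clause)
  # Row-limiting clause supported by Oracle 12c and later
  if (!is.null(max_rows)) {
    sql <- paste(sql, sprintf("FETCH FIRST %d ROWS ONLY", as.integer(max_rows)))
  }
  sql
}

build_oracle_query("vw_surveyquestionresults", where_clause = "resultidentifier = 'NRS'")
build_oracle_query("survey_results", fields = c("id", "score"), max_rows = 10, schema = "STUDY_DATA")
```

Because where_clause is pasted into the statement as-is, it should only ever come from trusted code, never from unchecked user input.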
A data frame containing the requested SQL data
## Not run:
# Get data from a specific table
data <- oracle("participants")
# Get data with a where clause
survey_data <- oracle("vw_surveyquestionresults",
                      where_clause = "resultidentifier = 'NRS'")
# Get all records, including those without matching primary key
all_data <- oracle("candidate", all = TRUE)
# Specify schema explicitly
schema_data <- oracle("survey_results", schema = "STUDY_DATA")
## End(Not run)
Get Oracle table columns/metadata
oracle.desc(table_name, schema = NULL)
| Argument | Description |
|---|---|
| table_name | Name of the table to get metadata for. |
| schema | Optional schema name. |
A data frame with column information
Get a list of tables from the Oracle database
oracle.index(pattern = NULL, schema = NULL)
| Argument | Description |
|---|---|
| pattern | Optional regex string; if supplied, only tables whose name matches (case-insensitive) are shown. |
| schema | Optional schema name to filter tables. |
A data frame with table information
Perform a direct Oracle query with minimal processing
oracle.query(query, pii = FALSE, schema = NULL)
| Argument | Description |
|---|---|
| query | The SQL query to execute. |
| pii | Logical; if FALSE (default), removes fields marked as PII. TRUE keeps PII. |
| schema | Optional schema name to qualify table names in the query. |
A data frame with the query results
Tests the connection to the Oracle database using the configured DSN and credentials. This is a simple connectivity test that doesn't perform any data operations.
oracle.test()
A logical value indicating whether the connection was successful
## Not run:
# Test the Oracle connection
if (oracle.test()) {
  message("Oracle connection successful!")
} else {
  message("Oracle connection failed!")
}
## End(Not run)
Retrieve Survey Data from Qualtrics
qualtrics( qualtrics_alias, ..., institution = NULL, label = FALSE, interview_date = NULL, complete = FALSE )
| Argument | Description |
|---|---|
| qualtrics_alias | The alias for the Qualtrics survey to be retrieved. |
| ... | Optional column names to filter for. Only rows with non-missing values in ALL specified columns will be returned. This is useful for filtering data to only include complete cases for specific variables of interest. |
| institution | Optional. The institution name (e.g., "temple" or "nu"). If NULL, all institutions will be searched. |
| label | Logical; if TRUE, returns the labels associated with coded values instead of the coded values (default FALSE). |
| interview_date | Optional; either a date string in various formats (ISO, US, etc.) to filter data up to that date, or TRUE to return only rows with non-NA interview_date values. |
| complete | Logical; if TRUE, returns only rows where Progress == 100. Default FALSE. |
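Several functions accept interview_date as either a date string or TRUE. A base R sketch of those two modes; filter_interview_date() and parse_dates() are hypothetical, the list of accepted formats is an assumption, and the cutoff is treated as inclusive ("up to that date").

```r
# Hypothetical parser: try each format in turn, per element, so a column
# may mix ISO ("2023-01-15"), slash ("2023/02/20"), and US ("03/10/2023") dates.
# The US pattern is tried before "%Y/%m/%d" so "03/10/2023" is not read as year 3.
parse_dates <- function(x, formats = c("%Y-%m-%d", "%m/%d/%Y", "%Y/%m/%d")) {
  x <- as.character(x)
  out <- as.Date(rep(NA_real_, length(x)), origin = "1970-01-01")
  for (fmt in formats) {
    todo <- is.na(out) & !is.na(x)
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

# Hypothetical filter mirroring the documented interview_date modes.
filter_interview_date <- function(df, interview_date = NULL) {
  if (is.null(interview_date)) return(df)
  if (isTRUE(interview_date)) {
    # TRUE: keep only rows that have an interview_date value
    return(df[!is.na(df$interview_date), , drop = FALSE])
  }
  cutoff <- parse_dates(interview_date)
  if (is.na(cutoff)) stop("could not parse interview_date: ", interview_date)
  dates <- parse_dates(df$interview_date)
  # Date string: keep rows on or before the cutoff (rows with NA dates are dropped)
  df[!is.na(dates) & dates <= cutoff, , drop = FALSE]
}

visits <- data.frame(
  id = 1:4,
  interview_date = c("2023-01-15", "2023/02/20", NA, "03/10/2023"),
  stringsAsFactors = FALSE
)
filter_interview_date(visits, "2023-02-20")  # ids 1 and 2
```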
A cleaned and harmonized data frame containing the survey data with superkeys first.
## Not run:
# Get survey by alias (will search all institutions)
survey_data <- qualtrics("rgpts")
## End(Not run)
This function extracts column mappings from the metadata of a Qualtrics survey data frame. It accepts a data frame containing Qualtrics data, a variable name given as a string, or a survey alias.
qualtrics.dict(survey_alias, exclude_embedded = TRUE)
| Argument | Description |
|---|---|
| survey_alias | An existing data frame, a variable name given as a string, or a survey alias string. |
| exclude_embedded | Logical; if TRUE (default), selects only question (QID) columns. |
A list containing the mappings of column names to survey questions.
Retrieves a list of all available surveys from the Qualtrics API. Shows every survey returned by Qualtrics, with alias and institution information merged in from config.yml where available.
qualtrics.index(pattern = NULL, institution = NULL, all = FALSE)
| Argument | Description |
|---|---|
| pattern | Optional regex string; if supplied, only surveys whose name or alias matches (case-insensitive) are shown. |
| institution | Optional; the institution identifier to use. If NULL, uses all institutions specified in the configuration file (or all available credentials if no config). |
| all | Logical; deprecated parameter kept for backward compatibility. All surveys are now shown by default. Default is FALSE. |
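The pattern argument of the index functions is a case-insensitive regular expression matched against one or more name columns. A sketch of that filtering on a hypothetical index table; filter_index() and the id, name, and alias column names are assumptions. Surveys without an alias (NA) simply fail to match on that column, because grepl() returns FALSE for NA input.

```r
# Hypothetical filter: keep rows where ANY of the listed columns matches
# `pattern`, ignoring case.
filter_index <- function(index, pattern = NULL, cols = c("name", "alias")) {
  if (is.null(pattern)) return(index)
  hits <- lapply(cols, function(col) grepl(pattern, index[[col]], ignore.case = TRUE))
  index[Reduce(`|`, hits), , drop = FALSE]
}

idx <- data.frame(
  id = c("SV_1", "SV_2", "SV_3"),
  name = c("RGPTS baseline", "Demographics", "Kamin task"),
  alias = c("rgpts", NA, "kamin"),
  stringsAsFactors = FALSE
)
filter_index(idx, "^rg")  # matches SV_1 on both name and alias
```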
A data frame containing the IDs and names of all available surveys from the Qualtrics API. Surveys with aliases configured in config.yml will show the alias and institution; unmapped surveys will show NA for these fields.
This function fetches a Qualtrics data frame containing multiple surveys and separates it into individual data frames for each survey detected in the data. It identifies the appropriate identifier column (e.g., participantId, workerId) and splits the data based on column name prefixes.
qualtrics.rune( qualtrics_alias, prefix = NULL, institution = NULL, label = FALSE, interview_date = NULL, complete = FALSE, lower = TRUE )
| Argument | Description |
|---|---|
| qualtrics_alias | Character string specifying the Qualtrics survey alias to retrieve. |
| prefix | Character string; if specified, returns only the data frame with this prefix. Default NULL. |
| institution | Character string specifying the institution (location). Default NULL. |
| label | Logical; if TRUE, returns coded values as labels instead of raw values. Default FALSE. |
| interview_date | Logical or date string; when a date string is given, returns all data before that date. |
| complete | Logical; if TRUE, returns only rows where Progress == 100. Default FALSE. |
| lower | Logical; if TRUE (default), converts prefixes to lower case. |
The function performs the following steps:
1. Retrieves the raw Qualtrics data using the getSurvey() function
2. Identifies which identifier column to use (participantId, workerId, PROLIFIC_PID, or src_subject_id)
3. Determines survey prefixes by analyzing column names
4. Creates separate dataframes for each survey prefix found
5. Assigns each dataframe to the global environment with names matching the survey prefixes
Creates multiple dataframes in the global environment, one for each survey detected in the data. Each dataframe is named after its survey prefix.
## Not run:
# Parse a Qualtrics survey into its component dataframes
qualtrics.rune("combined_surveys", label = FALSE)
# After running, access individual survey dataframes directly:
head(pss)   # Access the PSS survey dataframe
head(cesd)  # Access the CESD survey dataframe
# Parse a single Qualtrics survey from composite survey
rgpts <- qualtrics.rune("combined_surveys", prefix = "rgpts")
## End(Not run)
Retrieves data from a REDCap instrument and ensures that subject identifiers are propagated across all events.
redcap( instrument_name = NULL, ..., raw_or_label = "raw", redcap_event_name = NULL, batch_size = 1000, records = NULL, fields = NULL, pii = FALSE, interview_date = NULL, date_format = "ymd", complete = NULL )
| Argument | Description |
|---|---|
| instrument_name | Name of the REDCap instrument. |
| ... | Optional column names to filter for. Only rows with non-missing values in ALL specified columns will be returned. This is useful for filtering data to only include complete cases for specific variables of interest. |
| raw_or_label | Whether to return raw or labeled values (default "raw"). |
| redcap_event_name | Optional event name filter. Can be a single string or a vector of event names (e.g., …). |
| batch_size | Number of records to retrieve per batch (default 1000). |
| records | Optional vector of specific record IDs. |
| fields | Optional vector of specific fields. |
| pii | Logical; if FALSE (default), removes fields marked as PII. TRUE keeps PII. |
| interview_date | Optional; either a date string in various formats (ISO, US, etc.) to filter data up to that date, or TRUE to return only rows with non-NA interview_date values. |
| date_format | Date format used to interpret interview_date (default "ymd"). |
| complete | Optional; if TRUE, returns only forms marked as complete in REDCap. |
A data frame containing the requested REDCap data
## Not run:
# Get data from a specific instrument
data <- redcap("demographics")
## End(Not run)
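redcap() is described as propagating subject identifiers across all events. In longitudinal REDCap projects, a subject-level field often lives on an instrument assigned to one event only, so the rows for other events come back without it. A base R sketch of filling such gaps within each record; propagate_ids() and the record_id and src_subject_id column names are assumptions about the project layout, not wizardry's code, and only NA (not empty strings) is treated as missing.

```r
# Hypothetical sketch: within each record, fill missing identifier values
# with the first non-missing value found for that record.
propagate_ids <- function(df, record_col = "record_id", id_cols = "src_subject_id") {
  for (col in id_cols) {
    df[[col]] <- ave(df[[col]], df[[record_col]], FUN = function(v) {
      known <- v[!is.na(v)]
      if (length(known) > 0) v[is.na(v)] <- known[1]
      v
    })
  }
  df
}

events <- data.frame(
  record_id = c(1, 1, 2, 2),
  redcap_event_name = c("baseline_arm_1", "followup_arm_1", "baseline_arm_1", "followup_arm_1"),
  src_subject_id = c("S001", NA, "S002", NA),
  stringsAsFactors = FALSE
)
propagate_ids(events)$src_subject_id  # "S001" "S001" "S002" "S002"
```

Only missing values are filled, so a record that already carries an identifier on every event row is left untouched.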
This function extracts metadata/dictionary information from REDCap. It can accept either an instrument name to fetch new data, an existing data frame with instrument attributes, or a variable name as string.
redcap.dict(instrument_name)
| Argument | Description |
|---|---|
| instrument_name | Can either be an instrument name to fetch new data, a data frame returned by redcap(), or a variable name as string. |
A data frame containing the data dictionary/metadata for the specified instrument
Retrieves a list of all available REDCap forms as a formatted table
redcap.index(pattern = NULL)
| Argument | Description |
|---|---|
| pattern | Optional regex string; if supplied, only instruments whose name or label matches (case-insensitive) are shown. |
A formatted table (kable) of available REDCap instruments/forms
This function fetches a REDCap instrument and separates it into individual data frames for each survey/collection detected in the data based on column name prefixes. It identifies the appropriate identifier column and splits the data accordingly.
redcap.rune( instrument_name, prefix = NULL, raw_or_label = "raw", redcap_event_name = NULL, batch_size = 1000, records = NULL, fields = NULL, pii = FALSE, interview_date = NULL, date_format = "ymd", lower = TRUE )
| Argument | Description |
|---|---|
| instrument_name | Name of the REDCap instrument. |
| prefix | Character string; if specified, returns only the data frame with this prefix. Default NULL. |
| raw_or_label | Whether to return raw or labeled values (default "raw"). |
| redcap_event_name | Optional event name filter. Can be a single string or a vector of event names (e.g., …). |
| batch_size | Number of records to retrieve per batch (default 1000). |
| records | Optional vector of specific record IDs. |
| fields | Optional vector of specific fields. |
| pii | Logical; if FALSE (default), removes fields marked as PII. TRUE keeps PII. |
| interview_date | Optional; date filtering parameter. |
| date_format | Date format used to interpret interview_date (default "ymd"). |
| lower | Logical; if TRUE (default), converts prefixes to lower case. |
If prefix is specified, returns a single dataframe with that prefix. Otherwise, creates multiple dataframes in the parent environment, one for each survey detected in the data. Each dataframe is named after its survey prefix.
## Not run:
# Parse a REDCap instrument into its component dataframes
redcap.rune("baseline_assessment")
# After running, access individual survey dataframes directly:
head(pss)   # Access the PSS survey dataframe
head(cesd)  # Access the CESD survey dataframe
# Parse a single survey from composite instrument
rgpts <- redcap.rune("baseline_assessment", prefix = "rgpts")
## End(Not run)
This function takes a data frame containing multiple measures and separates it into individual data frames for each measure detected in the data. It identifies the appropriate identifier column (e.g., participantId, workerId) and splits the data based on column name prefixes.
rune(df, prefix = NULL, lower = TRUE)
| Argument | Description |
|---|---|
| df | A data frame containing multiple prefixed measures. |
| prefix | Character string; if specified, returns only the data frame with this prefix. Default NULL. |
| lower | Logical; if TRUE (default), converts prefixes to lower case. |
The function performs the following steps:
Identifies which identifier column to use (participantId, workerId, PROLIFIC_PID, or src_subject_id)
Determines survey prefixes by analyzing column names
Creates separate dataframes for each survey prefix found
Assigns each dataframe to the global environment with names matching the survey prefixes
If prefix is specified, returns a single dataframe with that prefix. Otherwise, creates multiple dataframes in the global environment, one for each survey detected in the data. Each dataframe is named after its survey prefix.
# Parse a data frame containing multiple surveys
combined_df <- data.frame(
  record_id = c("REC001", "REC002", "REC003", "REC004"),
  src_subject_id = c("SUB001", "SUB002", "SUB003", "SUB004"),
  subjectkey = c("KEY001", "KEY002", "KEY003", "KEY004"),
  site = c("Yale", "NU", "Yale", "NU"),
  phenotype = c("A", "B", "A", "C"),
  visit = c(1, 2, 2, 1),
  state = c("complete", "completed baseline", "in progress", NA),
  status = c(NA, NA, NA, "complete"),
  lost_to_followup = c(FALSE, FALSE, TRUE, NA),
  interview_date = c("2023-01-15", "2023/02/20", NA, "2023-03-10"),
  foo_1 = c(1, 3, 5, 7),
  foo_2 = c("a", "b", "c", "d"),
  bar_1 = c(2, 4, 6, 8),
  bar_2 = c("w", "x", "y", "z")
)
rune(combined_df)
# After running, access individual survey dataframes directly:
head(foo) # Access the foo dataframe
head(bar) # Access the bar dataframe
# Parse a single survey from composite dataframe
foo_df <- rune(combined_df, prefix = "foo")
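The prefix-splitting steps described above can be approximated in base R. The sketch below is hypothetical and is not the package's rune(): it assumes measure items are named <prefix>_<number> (as foo_1 and bar_2 are in the example), keeps only the identifier column plus each measure's items, and returns a named list instead of assigning into the global environment.

```r
# Hypothetical stand-in for rune(): split a data frame into one data frame
# per measure prefix and return them as a named list (no global assignment).
split_by_prefix <- function(df, lower = TRUE,
                            id_candidates = c("participantId", "workerId",
                                              "PROLIFIC_PID", "src_subject_id")) {
  # Use the first candidate identifier column that exists in the data
  id_col <- id_candidates[id_candidates %in% names(df)][1]
  if (is.na(id_col)) stop("no identifier column found")
  # Assumption: measure items are named <prefix>_<number>, e.g. foo_1
  item_cols <- grep("^[A-Za-z0-9]+_[0-9]+$", names(df), value = TRUE)
  item_prefix <- sub("_[0-9]+$", "", item_cols)
  prefixes <- unique(item_prefix)
  parts <- lapply(prefixes, function(p) {
    df[, c(id_col, item_cols[item_prefix == p]), drop = FALSE]
  })
  names(parts) <- if (lower) tolower(prefixes) else prefixes
  parts
}

combined_df <- data.frame(
  src_subject_id = c("SUB001", "SUB002"),
  interview_date = c("2023-01-15", "2023-03-10"),
  foo_1 = c(1, 3), foo_2 = c("a", "b"),
  bar_1 = c(2, 4), bar_2 = c("w", "x")
)
parts <- split_by_prefix(combined_df)
names(parts)  # "foo" "bar"
```

Returning a list keeps the side effect explicit; list2env(parts, envir = globalenv()) would reproduce the assign-to-environment behaviour if that is wanted.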
Creates the standard directory structure required for the wizaRdry package to function properly. This includes folders for data cleaning scripts, NDA submission templates, and temporary outputs. It can detect and repair incomplete directory structures, and optionally create an R project.
scry(
  study_alias = NULL,
  path = ".",
  overwrite = FALSE,
  repair = FALSE,
  show_tree = NULL,
  create_project = FALSE,
  examples = FALSE,
  skip_prompt = TRUE
)
study_alias |
Character string specifying the short name for the study (e.g., impact, capr, sing). |
path |
Character string specifying the directory path where the structure should be created. Defaults to the current working directory. |
overwrite |
Logical. If TRUE, will overwrite existing files. If FALSE (default), will not replace existing files. |
repair |
Logical. If TRUE, will attempt to repair an incomplete directory structure. If FALSE, will abort with an error message when encountering an incomplete structure. |
show_tree |
Logical. If TRUE (default on first run), will display a visual file tree. Set to FALSE to suppress the tree view. |
create_project |
Logical. If TRUE, will create an R project file if one doesn't exist. If FALSE (default), will not create an R project. |
examples |
Logical. If TRUE (default when not repairing), will create example script templates. If FALSE (default when repairing), will skip creating example scripts. |
skip_prompt |
Logical. If TRUE (default), skips the initial confirmation prompt that is otherwise shown when y/n preferences have not yet been set. If FALSE, shows the prompt. |
The function creates the following directory structure:
clean/
  csv/
  mongo/
  qualtrics/
  redcap/
  oracle/
  sql/
nda/
  csv/
  mongo/
  qualtrics/
  redcap/
  oracle/
  sql/
tmp/
It also creates template config.yml and secrets.R files, and optionally an R project file.
Invisible TRUE if successful.
## Not run:
# Initialize in current directory
scry()
# Repair structure in current directory
scry(repair = TRUE)
# Initialize in a specific directory with an R project
scry(path = "path/to/project", create_project = TRUE, repair = TRUE)
# Skip the tree display
scry(repair = TRUE, show_tree = FALSE)
# Explicitly create example scripts when repairing
scry(repair = TRUE, examples = TRUE)
# Skip the confirmation prompt
scry(skip_prompt = TRUE)
## End(Not run)
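The folder layout listed above can be reproduced with base R's dir.create(). This is a minimal sketch, not scry() itself: it creates only the clean/, nda/ and tmp/ folders, not config.yml, secrets.R, example scripts, or an R project, and the helper name is hypothetical.

```r
# Hypothetical helper, not scry(): create the clean/, nda/ and tmp/ layout
# under `path`. Existing folders are left alone, so rerunning it only fills
# in missing directories (a rough analogue of repair = TRUE).
make_layout_sketch <- function(path = ".") {
  sources <- c("csv", "mongo", "qualtrics", "redcap", "oracle", "sql")
  dirs <- c(file.path(path, "clean", sources),
            file.path(path, "nda", sources),
            file.path(path, "tmp"))
  for (d in dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)
  invisible(dirs)
}

root <- file.path(tempdir(), "scry_demo")
dirs <- make_layout_sketch(root)
all(dir.exists(dirs))  # TRUE
```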
Filter data frame by superkey parameters, rows, and columns
sift(
  df,
  rows = NULL,
  cols = NULL,
  record_id = NULL,
  src_subject_id = NULL,
  subjectkey = NULL,
  site = NULL,
  subsiteid = NULL,
  sex = NULL,
  race = NULL,
  ethnic_group = NULL,
  phenotype = NULL,
  phenotype_description = NULL,
  status = NULL,
  lost_to_followup = NULL,
  twins_study = NULL,
  sibling_study = NULL,
  family_study = NULL,
  sample_taken = NULL,
  visit = NULL,
  week = NULL,
  arm = NULL,
  interview_date = NULL
)
df |
Dataframe to be filtered and trimmed based on the provided parameters. |
rows |
Optional; either a single row name or a vector of row names to be retained in the final output. If NULL or empty, all rows in the dataframe are retained. |
cols |
Optional; either a single column name or a vector of column names to be retained in the final output. If NULL or empty, all columns in the dataframe are retained. |
record_id |
Optional; either a single record_id or a vector of record_ids to filter the dataframe by |
src_subject_id |
Optional; either a single subject ID or a vector of subject IDs to filter the dataframe by |
subjectkey |
Optional; either a single subjectkey or a vector of subjectkeys to filter the dataframe by |
site |
Optional; either a single site value or a vector of site values to filter the dataframe by (e.g., Yale, NU) |
subsiteid |
Optional; either a single subsiteid or a vector of subsiteids to filter the dataframe by |
sex |
Optional; either a single sex-at-birth value or a vector of such values to filter the dataframe by (e.g., 'M', 'F') |
race |
Optional; either a single race value or a vector of race values to filter the dataframe by |
ethnic_group |
Optional; either a single ethnic_group value or a vector of ethnic_group values to filter the dataframe by |
phenotype |
Optional; either a single phenotype value or a vector of phenotype values to filter the dataframe by |
phenotype_description |
Optional; either a single phenotype_description or a vector of phenotype_descriptions to filter the dataframe by |
status |
Optional; either a single status string or a vector of status conditions to filter the dataframe by. Used if either 'state' or 'status' column exists in the dataframe. Can include values like 'complete', 'completed baseline', 'completed 12m', 'completed 24m', etc. |
lost_to_followup |
Optional; either a single value or a vector of values to filter the dataframe by (checks both 'lost_to_followup' and 'lost_to_follow-up' columns) |
twins_study |
Optional; either a single twins_study value or a vector of twins_study values to filter the dataframe by |
sibling_study |
Optional; either a single sibling_study value or a vector of sibling_study values to filter the dataframe by |
family_study |
Optional; either a single family_study value or a vector of family_study values to filter the dataframe by |
sample_taken |
Optional; either a single sample_taken value or a vector of sample_taken values to filter the dataframe by |
visit |
Optional; either a single visit value or a vector of visit values to filter the dataframe by. Only used if 'visit' column exists in the dataframe. |
week |
Optional; either a single week value or a vector of week values to filter the dataframe by. Only used if 'week' column exists in the dataframe. |
arm |
Optional; either a single arm value or a vector of arm values to filter the dataframe by (e.g., drug, placebo) |
interview_date |
Optional; either a date string in one of several formats (ISO, US, etc.), to filter data up to that date, or the logical TRUE, to return only rows with non-NA interview_date values. |
A filtered dataframe based on the provided parameters, and containing only the columns specified in 'cols'. If no columns are specified, returns the entire dataframe with applied row filters.
# Create a sample dataframe
sample_df <- data.frame(
  record_id = c("REC001", "REC002", "REC003", "REC004"),
  src_subject_id = c("SUB001", "SUB002", "SUB003", "SUB004"),
  subjectkey = c("KEY001", "KEY002", "KEY003", "KEY004"),
  site = c("Yale", "NU", "Yale", "NU"),
  phenotype = c("A", "B", "A", "C"),
  visit = c(1, 2, 2, 1),
  state = c("complete", "completed baseline", "in progress", NA),
  status = c(NA, NA, NA, "complete"),
  lost_to_followup = c(FALSE, FALSE, TRUE, NA),
  interview_date = c("2023-01-15", "2023/02/20", NA, "2023-03-10")
)
# Set row names for demonstration
rownames(sample_df) <- c("foo", "bar", "baz", "qux")
# Filter by specific date
filtered1 <- sift(sample_df, cols = c("src_subject_id", "phenotype"), visit = 2, interview_date = "01/31/2023")
# Filter to include only rows with non-NA interview dates
filtered2 <- sift(sample_df, interview_date = TRUE)
# Filter by status (works with either state or status column)
filtered3 <- sift(sample_df, status = c("complete", "completed baseline"))
# Filter with specific row names
filtered4 <- sift(sample_df, rows = c("foo", "qux"))
# Filter with vector of visit values
filtered6 <- sift(sample_df, visit = c(1, 2))
# Filter by lost_to_followup
filtered10 <- sift(sample_df, lost_to_followup = FALSE)
# Filter by src_subject_id
filtered11 <- sift(sample_df, src_subject_id = c("SUB001", "SUB004"))
# Multiple filters combined
filtered12 <- sift(sample_df, site = "Yale", visit = 1, cols = c("record_id", "src_subject_id", "site"))
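The interview_date behaviour documented for sift() (a cutoff date in mixed formats, or TRUE for "non-missing only") can be sketched in base R. This is a hypothetical illustration covering only site, visit, interview_date and cols; the list of accepted formats and the inclusive cutoff are assumptions, not the package's rules.

```r
# Parse dates that may arrive in mixed formats, trying each format per element.
# Assumed format list; "%m/%d/%Y" is tried before "%Y/%m/%d" because as.Date()
# ignores trailing characters and "%Y/%m/%d" would misread "01/05/2023".
parse_mixed_dates <- function(x, formats = c("%Y-%m-%d", "%m/%d/%Y", "%Y/%m/%d")) {
  x <- as.character(x)
  out <- as.Date(rep(NA_character_, length(x)))
  for (fmt in formats) {
    todo <- is.na(out) & !is.na(x)
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

# Hypothetical subset of sift(): row filters combine with AND, then columns are trimmed
sift_sketch <- function(df, site = NULL, visit = NULL,
                        interview_date = NULL, cols = NULL) {
  keep <- rep(TRUE, nrow(df))
  if (!is.null(site))  keep <- keep & df$site %in% site
  if (!is.null(visit)) keep <- keep & df$visit %in% visit
  if (isTRUE(interview_date)) {
    keep <- keep & !is.na(df$interview_date)       # TRUE: drop missing dates
  } else if (is.character(interview_date)) {
    cutoff <- parse_mixed_dates(interview_date)    # cutoff assumed inclusive
    dates <- parse_mixed_dates(df$interview_date)
    keep <- keep & !is.na(dates) & dates <= cutoff
  }
  out <- df[keep, , drop = FALSE]
  if (!is.null(cols)) out <- out[, cols, drop = FALSE]
  out
}

sample_df <- data.frame(
  src_subject_id = c("SUB001", "SUB002", "SUB003", "SUB004"),
  site = c("Yale", "NU", "Yale", "NU"),
  visit = c(1, 2, 2, 1),
  interview_date = c("2023-01-15", "2023/02/20", NA, "2023-03-10")
)
sift_sketch(sample_df, interview_date = "01/31/2023")$src_subject_id  # "SUB001"
```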
Retrieves data from a SQL table and optionally joins it with a primary keys table as specified in the configuration.
sql(
  table_name = NULL,
  ...,
  fields = NULL,
  where_clause = NULL,
  join_primary_keys = TRUE,
  custom_query = NULL,
  max_rows = NULL,
  date_format = NULL,
  batch_size = 1000,
  pii = FALSE,
  interview_date = NULL,
  all = FALSE
)
table_name |
Name of the SQL table or view to query |
... |
Optional column names to filter for. Only rows with non-missing values in ALL specified columns will be returned. |
fields |
Optional vector of specific fields to select |
where_clause |
Optional WHERE clause to filter results (without the "WHERE" keyword) |
join_primary_keys |
Boolean, whether to join with the primary keys table (default: TRUE) |
custom_query |
Optional custom SQL query to execute instead of building one |
max_rows |
Optional limit on the number of rows to return |
date_format |
Optional format for date fields (default uses ISO format) |
batch_size |
Number of records to retrieve per batch for large datasets |
pii |
Logical; if FALSE (default), remove fields marked as PII. TRUE keeps PII. |
interview_date |
Optional; either a date string in one of several formats (ISO, US, etc.), to filter data up to that date, or the logical TRUE, to return only rows with non-NA interview_date values. |
all |
Logical; if TRUE, use LEFT OUTER JOIN instead of INNER JOIN (default: FALSE), similar to the 'all' parameter in base R's merge() function |
A data frame containing the requested SQL data
## Not run:
# Get data from a specific table
data <- sql("participants")
# Get data with a where clause
survey_data <- sql("vw_surveyquestionresults", where_clause = "resultidentifier = 'NRS'")
# Get all records, including those without matching primary key
all_data <- sql("candidate", all = TRUE)
## End(Not run)
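The join semantics behind the all argument, and the "non-missing in every named column" rule for ..., can be illustrated locally with base R on in-memory tables. The table and column names below are hypothetical, and this only mirrors the documented behaviour; sql() itself builds and runs the query against the database. Note that in base R a LEFT OUTER JOIN is merge(all.x = TRUE), whereas merge(all = TRUE) is a FULL OUTER JOIN.

```r
# Hypothetical tables standing in for a study table and the primary-keys table
candidate <- data.frame(candidate_id = c(1, 2, 3), score = c(10, NA, 30))
primary_keys <- data.frame(candidate_id = c(1, 2),
                           src_subject_id = c("SUB001", "SUB002"))

# all = FALSE analogue: INNER JOIN, only candidates with a matching key survive
inner <- merge(candidate, primary_keys, by = "candidate_id")
# all = TRUE analogue: LEFT OUTER JOIN, every candidate row is kept
left <- merge(candidate, primary_keys, by = "candidate_id", all.x = TRUE)

# `...` analogue: keep rows with non-missing values in ALL named columns
require_cols <- function(df, ...) {
  cols <- c(...)
  if (length(cols) == 0) return(df)
  df[stats::complete.cases(df[, cols, drop = FALSE]), , drop = FALSE]
}

nrow(inner)                                          # 2
nrow(left)                                           # 3
nrow(require_cols(left, "score", "src_subject_id"))  # 1
```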
Get SQL table columns/metadata
sql.desc(table_name)
table_name |
Name of the table to get metadata for |
A data frame with column information
Get a list of tables from the SQL database
sql.index(pattern = NULL, schema = NULL)
pattern |
Optional regex string; if supplied, only tables whose name matches (case-insensitive) are shown. |
schema |
Optional schema name to filter tables |
A data frame with table information
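The pattern argument is described as a case-insensitive regex on table names. A minimal local sketch of that filtering rule, assuming a table listing with table_name and schema columns (both column names and the exact-match schema rule are assumptions):

```r
# Hypothetical table listing, as a database might return it
tables <- data.frame(
  table_name = c("candidate", "participants", "vw_SurveyQuestionResults"),
  schema = c("dbo", "dbo", "reporting")
)

# Case-insensitive regex on table names; exact match on schema (assumed)
filter_tables <- function(tables, pattern = NULL, schema = NULL) {
  keep <- rep(TRUE, nrow(tables))
  if (!is.null(pattern)) {
    keep <- keep & grepl(pattern, tables$table_name, ignore.case = TRUE)
  }
  if (!is.null(schema)) keep <- keep & tables$schema == schema
  tables[keep, , drop = FALSE]
}

filter_tables(tables, pattern = "^vw_survey")$table_name  # "vw_SurveyQuestionResults"
```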
Perform a direct SQL query with minimal processing
sql.query(query, pii = FALSE)
query |
The SQL query to execute |
pii |
Logical; if FALSE (default), remove fields marked as PII. TRUE keeps PII. |
A data frame with the query results
This function exports a given R data frame to a CSV file. The resulting file is saved in the "tmp" directory under path. If a filename is not specified, the function uses the name of the data frame variable. The ".csv" extension is appended automatically to the filename. When skip_prompt = FALSE, the function prompts for confirmation before creating the file, with an option to remember the user's preference for future calls.
to.csv(df, df_name = NULL, path = ".", skip_prompt = TRUE)
df |
Data frame to be exported to CSV format. |
df_name |
Optional; a custom file name for the saved CSV file. If not provided, the name of the data frame variable is used. The function adds the ".csv" extension automatically. |
path |
Character string specifying the directory path where the "tmp" folder and CSV file should be created. Defaults to the current working directory. |
skip_prompt |
Logical. If TRUE (default), skips the confirmation prompt. If FALSE, will prompt for confirmation unless the user has previously chosen to remember their preference. |
Invisible TRUE if successful. The function writes a CSV file to the specified path and prints a message indicating the file's location.
Joshua Kenney [email protected]
## Not run:
# Create a sample data frame
sample_df <- data.frame(
  id = 1:3,
  name = c("Alice", "Bob", "Charlie")
)
# Basic usage (skip_prompt = TRUE by default, so no prompt is shown)
to.csv(sample_df)
# Custom filename
to.csv(sample_df, "participants_data")
# Ask for confirmation before writing the file
to.csv(sample_df, skip_prompt = FALSE)
# Save in a different directory
to.csv(sample_df, path = "path/to/project")
## End(Not run)
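The naming rule described for to.csv() (default the file name to the variable name, add ".csv", write into tmp/ under path) can be sketched with deparse(substitute()) and utils::write.csv(). This hypothetical helper omits the confirmation prompt and returns the file path instead of TRUE; it is an illustration, not the package's code.

```r
# Hypothetical helper, not to.csv(): name the file after the variable passed in
to_csv_sketch <- function(df, df_name = NULL, path = ".") {
  # deparse(substitute(df)) recovers the caller's expression, e.g. "sample_df"
  if (is.null(df_name)) df_name <- deparse(substitute(df))
  out_dir <- file.path(path, "tmp")
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  out_file <- file.path(out_dir, paste0(df_name, ".csv"))
  utils::write.csv(df, out_file, row.names = FALSE)
  message("Extract saved to ", out_file)
  invisible(out_file)
}

sample_df <- data.frame(id = 1:3, name = c("Alice", "Bob", "Charlie"))
f <- to_csv_sketch(sample_df, path = tempdir())
basename(f)  # "sample_df.csv"
```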
This function creates a CSV template file for NIMH Data Archive (NDA) submissions. It extracts the data from a specified data frame and formats it according to NDA requirements, writing the structure name, split into base name and suffix, on the first line. When skip_prompt = FALSE, the function prompts for confirmation before creating the file, with an option to remember the user's preference for future calls.
to.nda(
  df,
  path = ".",
  skip_prompt = TRUE,
  selected_fields = NULL,
  skip_prompts = FALSE,
  verbose = FALSE
)
df |
Data frame to be used as template or character string naming a data frame in the global environment. |
path |
Character string specifying the directory path where the "tmp" folder and template file should be created. Defaults to the current working directory. |
skip_prompt |
Logical. If TRUE (default), skips the confirmation prompt. If FALSE, will prompt for confirmation unless the user has previously chosen to remember their preference. |
selected_fields |
Character vector of field names to include in template. If NULL (default), uses all fields from data frame. Used by create_nda_files() for centralized field selection. |
skip_prompts |
Logical. If TRUE, skip ALL interactive prompts (used when called from create_nda_files() with pre-selected fields). Default: FALSE. |
verbose |
Logical. If TRUE, show detailed progress messages. Default: FALSE. |
The function will:
Create a 'tmp' directory if it doesn't exist
Parse the structure name into base and suffix components (e.g., "eefrt01" -> "eefrt" and "01")
Write the structure name components as the first line
Write column headers as the second line
Write the data rows below
Invisible TRUE if successful. Creates a CSV file at the specified path and prints a message with the file location.
## Not run:
# First create some sample data
eefrt01 <- data.frame(
  src_subject_id = c("SUB001", "SUB002"),
  interview_age = c(240, 360),
  interview_date = c("01/01/2023", "02/15/2023"),
  response_time = c(450, 520)
)
# Create the NDA template using the data frame directly
to.nda(eefrt01)
# Or using the name as a string
to.nda("eefrt01")
# Ask for confirmation before writing the file
to.nda(eefrt01, skip_prompt = FALSE)
## End(Not run)
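The template layout described above (structure name split into base and suffix on line 1, column headers on line 2, data rows below) can be sketched in base R by writing to one open connection. The helper name, the "_template.csv" file name, and write.csv's quoting are assumptions; the package's actual output may differ in these details.

```r
# Hypothetical helper, not to.nda(): write the three-part NDA-style layout
write_nda_template_sketch <- function(df, structure_name, path = ".") {
  # Trailing digits form the suffix: "eefrt01" -> "eefrt" + "01"
  m <- regexpr("[0-9]+$", structure_name)
  suffix <- if (m > 0) regmatches(structure_name, m) else ""
  base <- substr(structure_name, 1, nchar(structure_name) - nchar(suffix))

  out_dir <- file.path(path, "tmp")
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  out_file <- file.path(out_dir, paste0(structure_name, "_template.csv"))

  con <- file(out_file, open = "w")
  on.exit(close(con))
  writeLines(paste(base, suffix, sep = ","), con)  # line 1: structure name parts
  utils::write.csv(df, con, row.names = FALSE)     # line 2: headers, then data rows
  invisible(out_file)
}

eefrt01 <- data.frame(src_subject_id = c("SUB001", "SUB002"),
                      interview_age = c(240, 360))
f <- write_nda_template_sketch(eefrt01, "eefrt01", path = tempdir())
readLines(f)[1]  # "eefrt,01"
```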
This function exports a given R data frame to an RDS file. The resulting file is saved in the "tmp" directory under path. If a filename is not specified, the function uses the name of the data frame variable. The ".rds" extension is appended automatically to the filename. When skip_prompt = FALSE, the function prompts for confirmation before creating the file, with an option to remember the user's preference for future calls.
to.rds(df, df_name = NULL, path = ".", skip_prompt = TRUE)
df |
Data frame to be exported to RDS format. |
df_name |
Optional; a custom file name for the saved RDS file. If not provided, the name of the data frame variable is used. The function adds the ".rds" extension automatically. |
path |
Character string specifying the directory path where the "tmp" folder and RDS file should be created. Defaults to the current working directory. |
skip_prompt |
Logical. If TRUE (default), skips the confirmation prompt. If FALSE, will prompt for confirmation unless the user has previously chosen to remember their preference. |
Invisible TRUE if successful. The function writes an RDS file to the specified path and prints a message indicating the file's location.
## Not run:
# Create a sample data frame
sample_df <- data.frame(
  id = 1:3,
  name = c("Alice", "Bob", "Charlie")
)
# Basic usage (skip_prompt = TRUE by default, so no prompt is shown)
to.rds(sample_df)
# Custom filename
to.rds(sample_df, "participants_data")
# Ask for confirmation before writing the file
to.rds(sample_df, skip_prompt = FALSE)
# Save in a different directory
to.rds(sample_df, path = "path/to/project")
## End(Not run)
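A reason to prefer an RDS extract over CSV is that RDS stores the R object itself, so column classes survive a round trip. The base-R sketch below (saveRDS()/readRDS(), independent of the package) shows a Date column coming back intact from RDS but as character from CSV.

```r
sample_df <- data.frame(
  id = 1:3,
  visit_date = as.Date(c("2023-01-15", "2023-02-20", "2023-03-10"))
)

# RDS round trip: the object is restored exactly, classes included
rds_file <- tempfile(fileext = ".rds")
saveRDS(sample_df, rds_file)
restored <- readRDS(rds_file)
identical(restored, sample_df)  # TRUE

# CSV round trip: dates come back as plain strings
csv_file <- tempfile(fileext = ".csv")
utils::write.csv(sample_df, csv_file, row.names = FALSE)
class(utils::read.csv(csv_file, stringsAsFactors = FALSE)$visit_date)  # "character"
```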
This function takes an R data frame and writes it to an SPSS (.sav) file using the haven package. The resulting file is stored in the "tmp" directory with a default name derived from the data frame variable name, which can be customized via df_name. When skip_prompt = FALSE, the function prompts for confirmation before creating the file, with an option to remember the user's preference for future calls.
to.sav(df, df_name = NULL, path = ".", skip_prompt = TRUE)
df |
Data frame to be exported to SPSS format. |
df_name |
Optional; custom file name for the saved SPSS file. If not provided, the name of the data frame variable will be used. The ".sav" extension will be appended automatically. |
path |
Character string specifying the directory path where the "tmp" folder and SPSS file should be created. Defaults to the current working directory. |
skip_prompt |
Logical. If TRUE (default), skips the confirmation prompt. If FALSE, will prompt for confirmation unless the user has previously chosen to remember their preference. |
Invisible TRUE if successful. Writes an SPSS file to the designated path and prints a message indicating the file's location.
## Not run:
# Create a sample data frame
sample_df <- data.frame(
  id = 1:3,
  score = c(85, 92, 78),
  group = c("A", "B", "A")
)
# Basic usage (skip_prompt = TRUE by default, so no prompt is shown)
to.sav(sample_df)
# Custom filename
to.sav(sample_df, "participants_data")
# Ask for confirmation before writing the file
to.sav(sample_df, skip_prompt = FALSE)
# Save in a different directory
to.sav(sample_df, path = "path/to/project")
## End(Not run)
Deprecated functions in wizaRdry
These functions are deprecated and may be removed in a future release. Prefer the suggested replacements.
createCsv(...): use to.csv(...) instead.
createRds(...): use to.rds(...) instead.
createSpss(...): use to.sav(...) instead.
dataFilter(...): use sift(...) instead.
dataMerge(...): use meld(...) instead.
dataRequest(...): use clean(...) instead.
getRedcap(...): use redcap(...) instead.
getSurvey(...): use qualtrics(...) instead.
getTask(...): use mongo(...) instead.
ndaRequest(...): use nda(...) instead.
See help("Deprecated").
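Deprecated aliases like these are commonly implemented in R as thin wrappers that call base R's .Deprecated() and then forward to the replacement. A minimal sketch of that pattern with hypothetical function names; it is not taken from the package source, whose wrappers may differ.

```r
# Hypothetical replacement function standing in for the new API
to_csv_replacement <- function(df) invisible(TRUE)

# Hypothetical deprecated alias: warn, then forward to the replacement
createCsv_sketch <- function(df) {
  .Deprecated("to_csv_replacement")
  to_csv_replacement(df)
}

# Capture the deprecation warning without stopping the call
msg <- NULL
res <- withCallingHandlers(
  createCsv_sketch(data.frame(x = 1)),
  warning = function(w) {
    msg <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)
```

The warning text names the replacement, and the wrapped call still returns the replacement's result, so old scripts keep working while signalling the migration path.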