ProjectTemplate: Automates the Creation of New Statistical Analysis Projects
Description
Provides functions to automatically build a directory structure for a new R project. Using this structure, 'ProjectTemplate' automates data loading, preprocessing, library importing and unit testing.
Author(s)
Maintainer: Kenton White jkentonwhite@gmail.com [contributor]
Authors:
John Myles White [copyright holder]
Other contributors:
Aleksandar Blagotic [contributor]
Diego Valle-Jones [contributor]
Jeffrey Breen [contributor]
Joakim Lundborg [contributor]
Josh Bode [contributor]
Kirill Mueller [contributor]
Matteo Redaelli [contributor]
Noah Lorang [contributor]
Patrick Schalk [contributor]
Dominik Schneider [contributor]
Gerold Hepp [contributor]
Zunaira Jamil [contributor]
Glen Falk [contributor]
Associate a reader function with an extension.
Description
This function will associate an extension with a custom reader function.
Usage
.add.extension(extension, reader)
Arguments
extension
The extension of the new data file.
reader
The function to use when reading the data file. It should
accept three arguments: data.file, filename and
variable.name (in that order). The function should read the
contents of the file filename, and save it into the workspace
under the name variable.name. The data.file argument
is just a relative file name and can be ignored.
Value
No value is returned; this function is called for its side effects.
Warning
This interface should not be considered as stable and is likely to be replaced by a different mechanism in a forthcoming version of this package.
Examples
## Not run: .add.extension('foo', foo.reader)
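Below is a minimal sketch of a reader that follows the three-argument contract described above. The .foo extension and foo.reader are hypothetical. The sketch also assumes that the file holds comma-separated values and that the result goes into the global environment; the package itself works with .TargetEnv.

```r
# Hypothetical reader for '.foo' files holding comma-separated values.
# Signature follows the documented order: data.file, filename, variable.name.
foo.reader <- function(data.file, filename, variable.name) {
  # data.file is only the relative file name, so it is ignored here.
  contents <- utils::read.csv(filename, stringsAsFactors = FALSE)
  # Assumption: store the result in the global environment.
  assign(variable.name, contents, envir = globalenv())
  invisible(NULL)
}

# Registering it requires ProjectTemplate (not run here):
# .add.extension('foo', foo.reader)
```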
Attach a package or add a namespace
Description
Internal method to attach a package or only add the namespace.
Usage
.attach.or.add.namespace(package.name, attach)
Arguments
package.name
name of the package to load, as a character vector
attach
Logical indicating whether to attach the package to the search path; if FALSE, only the namespace is loaded
Value
Boolean indicating whether the package was successfully loaded
Construct the file names for the cache and hash
Description
Construct the file names for the cache and hash
Usage
.cache.filename(variable, cache_format)
Arguments
variable
Variable name for which to construct file names
cache_format
expression as returned by .cache.format
Details
The returned object is a list with two fields:
- data: The path to the file in which the variable contents will be saved;
- hash: The path to the file in which the cache metadata will be stored.
Value
A list with file names
Get configured cache file format strategy
Description
Get configured cache file format strategy
Usage
.cache.format()
Value
A named object of class expression.
Calculate the hash of the data stored in a variable
Description
Calculate the hash of the data stored in a variable
Usage
.cache.hash(variables, env = .TargetEnv)
Arguments
variables
character vector of variable names
env
environment from which to load the variable
Details
The hashes are calculated using the digest::digest function.
Value
data.frame with the variable names and the corresponding hashes
Print the current cache status
Description
Print the current cache status
Usage
.cache.status()
Value
No value is returned; this function is called for its side effects.
List all cached variables
Description
List all variables for which files are available in the cache. The info is
purely based on the files in the cache directory. There is no
guarantee the variable can actually be loaded from the cache.
Usage
.cached.variables()
Value
Character vector of cached variables
Compare the project version with the current ProjectTemplate version
Description
Compare the project version with the current ProjectTemplate version
Usage
.check.version(config, warn.migrate = TRUE)
Arguments
config
Project configuration
warn.migrate
Logical indicating whether a warning should be raised if the project version is older than the installed version of ProjectTemplate.
Value
0 if the numbers are equal, -1 if b is later
and 1 if a is later (analogous to the C function
strcmp).
Gives an R error on malformed inputs.
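Base R's utils::compareVersion() uses the same 0 / -1 / 1 convention, so it serves as an illustration here. This section does not say whether .check.version() calls it internally. Version components are compared numerically, which means 0.10.0 counts as later than 0.9.0.

```r
# utils::compareVersion(a, b) returns 0, -1 or 1, like strcmp.
utils::compareVersion("0.9.0", "0.9.0")   # 0: equal
utils::compareVersion("0.9.0", "0.10.0")  # -1: b is later
utils::compareVersion("0.10.0", "0.9.0")  # 1: a is later
```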
Convert one or more data sets to data.tables
Description
Converts all base::data.frames referred to in the input to
data.tables. The resulting data set is stored in the
.TargetEnv.
Usage
.convert.to.data.table(data.sets)
Arguments
data.sets
A character vector of variable names.
Value
No value is returned; this function is called for its side effects.
Convert one or more data sets to tibbles
Description
Converts all base::data.frames referred to in the input to
tibbles. The resulting data set is stored in the
.TargetEnv.
Usage
.convert.to.tibble(data.sets)
Arguments
data.sets
A character vector of variable names.
Value
No value is returned; this function is called for its side effects.
Create a data.frame with the cache metadata
Description
Create a data.frame with the cache metadata
Usage
.create.cache.hash(variable, depends, CODE)
Arguments
variable
Name of the variable to be cached
depends
Vector of variable names of dependencies for the variable to be cached, optional.
CODE
Code block to generate variable, registered as a dependency, optional.
Details
The hashes for the various objects are calculated using the .cache.hash
function.
Value
data.frame containing the variable name and its dependencies, with the
corresponding hashes appended.
Create a project structure
Description
.create.project.existing creates a project directory structure inside
an existing directory with the default files from a given template.
.create.project.new first creates a new directory and then passes
further control to .create.project.existing. In case the project
creation fails, the newly created directory is cleaned up.
Usage
.create.project.existing(
project.name,
merge.strategy,
template,
rstudio.project
)
.create.project.new(project.name, template, rstudio.project)
Arguments
project.name
Character vector with the name of the project directory
merge.strategy
Character vector determining whether the directory should be empty or is allowed to contain non-conflicting files
template
Name of the template from which the project should be created
rstudio.project
Logical indicating whether an .Rproj file
should be created
Value
No value is returned; this function is called for its side effects.
See Also
create.project, create.template
Check if a directory is empty
Description
Checks if the directory listing by .list.files.and.dirs is empty.
Usage
.dir.empty(path)
Arguments
path
Character vector containing the path to the directory to check.
Value
Logical indicating whether the passed directory was empty.
Run code and assign the results to variable
Description
Run code and assign the results to variable
Usage
.evaluate.code(variable, CODE)
Arguments
variable
variable name in which to store the result of CODE
CODE
code block that returns a result which can be stored in a variable
Details
No error handling is done on the executed code, nor is the
Get the location of a template from its name
Description
Checks the directory configured in options('ProjectTemplate.templatedir') for the
template. If no matching template is found there, the system templates are checked,
and finally the current directory is checked. If no template is found with
the given name, an error is raised.
Usage
.get.template(template)
Arguments
template
Character vector containing the name of the template
Value
Character vector containing the location of the template. If no template was found by the given name an error is raised.
Check if the project was loaded
Description
Currently does a very basic check to see if the variable project.info
exists in the .TargetEnv. No check is performed on the contents of the
variable.
Usage
.has.project()
Value
Logical indicating whether the project was loaded.
Initialize the logger for the project
Description
Creates a log4r::logger and provides a default log file
log/project.log.
Usage
.init.logger(config, my.project.info)
Arguments
config
Named list containing the project configuration
my.project.info
Named list containing the project information
Value
Returns my.project.info amended with the new information.
Test whether a given path is a ProjectTemplate project
Description
Test whether a given path is a ProjectTemplate project
Usage
.is.ProjectTemplate(path = getwd())
Arguments
path
Directory to check, defaults to the current working directory.
Value
Logical indicating whether the given path is a valid project.
Check whether the cache is empty
Description
Check whether the cache is empty
Usage
.is.cache.empty()
Value
Logical indicating whether the cache is empty
Check whether variables are cached
Description
Check whether variables are cached
Usage
.is.cached(varnames)
Arguments
varnames
Character vector of variable names
Value
Logical vector indicating whether the variable is in the cache.
Check if path is an existing directory
Description
Checks whether a given path exists and, if so, whether it is a directory.
Usage
.is.dir(path)
Arguments
path
Character vector containing the path to the directory to check.
Value
Logical indicating whether a valid directory was passed.
Build the list of data available for loading into memory
Description
This function produces a data.frame of all data files in the project, with
metadata on whether and how each file will be loaded by load.project.
Usage
.list.data(config)
Arguments
config
List containing the configuration to use.
Details
The returned data.frame contains the following variables, with one
observation per file in data/:
- filename: Character variable containing the file name relative to the data/ directory.
- varname: Character variable containing the name of the variable into which the file will be imported.*
- is_ignored: Logical variable that indicates whether the file is ignored through the data_ignore option in the configuration.
- is_directory: Logical variable that indicates whether the file is a directory.
- is_cached: Logical variable that indicates whether the file is already available in the cache/ directory.
- cached_only: Logical variable that indicates whether the variable is only available in the cache/ directory. This occurs when calling the cache function with a code fragment in a munge script.
- reader: Character variable containing the name of the reader function that will be used to load the data. Contains character(0) if no suitable reader was found.
* Note that some readers return more than one variable, usually with the
listed variable name as prefix. This is true, for example, of
xls.reader and xlsx.reader.
Value
A data.frame listing the available data, with relevant metadata
List all files and directories, excluding .. and .
Description
Creates a directory listing of a given path, including hidden files and subdirectories, but excluding the .. and . aliases.
Usage
.list.files.and.dirs(path)
Arguments
path
Character vector indicating the path to the parent folder of which the contents should be listed.
Value
Directory listing of path
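A listing with these properties can be sketched in base R with list.files(). The name list_files_and_dirs is illustrative, and the package's own implementation may differ.

```r
# Sketch: list the contents of `path`, keeping hidden entries but
# dropping the '.' and '..' aliases.
list_files_and_dirs <- function(path) {
  # all.files = TRUE keeps names starting with '.', such as '.gitignore';
  # no.. = TRUE removes '.' and '..'. Subdirectories are always included
  # in a non-recursive listing.
  list.files(path, all.files = TRUE, no.. = TRUE)
}

# An empty directory yields character(0), so an emptiness check
# (as in .dir.empty) can test length(...) == 0.
```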
Load the data from the cache and data directories
Description
Gets the list of available variables in cache/ and data/ and
loads the data into memory. Data from the cache is loaded first; data
files are then loaded in alphabetical order.
Usage
.load.data(config, my.project.info)
Arguments
config
Named list containing the project configuration
my.project.info
Named list containing the project information
Value
Returns my.project.info amended with the new information.
Load the helper functions
Description
Sources all helper scripts in lib. If lib/globals.R exists, it
is loaded first; all other scripts are sourced in alphabetical order.
Usage
.load.helpers(config, my.project.info)
Arguments
config
Named list containing the project configuration
my.project.info
Named list containing the project information
Value
Returns my.project.info amended with the new information.
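The load order can be expressed as a small sorting rule. This sketch only computes the order of the file names, and each script would then be passed to source(). helper_load_order is a hypothetical name.

```r
# Hypothetical sketch of the documented order: lib/globals.R first,
# then every other script in alphabetical order.
helper_load_order <- function(files) {
  files <- sort(files)
  c(intersect("globals.R", files), setdiff(files, "globals.R"))
}

helper_load_order(c("zz_plots.R", "globals.R", "helpers.R"))
# "globals.R" "helpers.R" "zz_plots.R"
```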
Load the libraries listed in the configuration into memory
Description
Load the libraries listed in the libraries entry in global.dcf and add the
library names to the project.info.
Usage
.load.libraries(config, my.project.info)
Arguments
config
Named list containing the project configuration
my.project.info
Named list containing the project information
Value
Returns my.project.info amended with the new information.
Source all munge scripts
Description
Sources all munge scripts in the munge directory in alphabetical
order.
Usage
.munge.data(config, my.project.info)
Arguments
config
Named list containing the project configuration
my.project.info
Named list containing the project information
Value
Returns my.project.info amended with the new information.
Get the current ProjectTemplate version
Description
Reads the installed version of ProjectTemplate from the DESCRIPTION
file.
Usage
.package.version()
Value
Version as a character vector.
Match readers to the extensions of the data files
Description
Match readers to the extensions of the data files
Usage
.parse.extensions(data.files, config)
Arguments
data.files
a vector of paths to data files
Value
A list of readers and varnames
Prepare a regular expression for matching files to be ignored
Description
Constructs a single regular expression for matching file names in data that should not be imported. It can detect literal names, globs with wildcards and regular expressions.
Usage
.prepare.data.ignore.regex(ignore_files)
Arguments
ignore_files
A comma separated character vector that lists all patterns to be matched for ignoring
Value
A chained regular expression that matches all patterns in the
ignore_files variable.
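The following base R sketch shows how such a chained expression can be built. It assumes that regular expressions are written between slashes (/.../). Every other pattern is treated as a literal file name or a glob and converted with utils::glob2rx(). build_ignore_regex is a hypothetical name.

```r
# Hypothetical sketch: turn "a.txt, *.bak, /^tmp_/" into one regex.
build_ignore_regex <- function(ignore_files) {
  patterns <- trimws(strsplit(ignore_files, ",", fixed = TRUE)[[1]])
  patterns <- patterns[nzchar(patterns)]
  to_regex <- function(p) {
    if (grepl("^/.*/$", p)) {
      substr(p, 2, nchar(p) - 1)  # assumed /.../ form: already a regex
    } else {
      utils::glob2rx(p)           # literal name or glob: escape and anchor
    }
  }
  # Wrap each alternative in parentheses and chain them with '|'
  paste0("(", vapply(patterns, to_regex, character(1)), ")", collapse = "|")
}

rx <- build_ignore_regex("notes.txt, *.bak, /^tmp_/")
grepl(rx, c("notes.txt", "old.bak", "tmp_data.csv", "data.csv"))
# TRUE TRUE TRUE FALSE
```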
Make sure a required directory exists before usage
Description
Checks whether the requested directory exists and, if not, creates it. In the latter case a warning is raised.
Usage
.provide.directory(name)
Arguments
name
Character vector containing the name of the required directory.
Value
No value is returned; this function is called for its side effects.
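Here is a minimal base R sketch of this behaviour. provide_directory is an illustrative name, and the exact warning text used by the package is not known.

```r
# Sketch: create a missing directory and warn that it had to be created.
provide_directory <- function(name) {
  if (!dir.exists(name)) {
    dir.create(name, recursive = TRUE)
    warning("Missing directory created: ", name)
  }
  invisible(NULL)
}
```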
Stop silently
Description
Temporarily disable the show.error.messages option (see options) and stop execution.
Usage
.quietstop()
Value
No value is returned; this function is called for its side effects.
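One way to implement a silent stop in base R is sketched below; it is not the package's code. It switches off show.error.messages, signals an error, and restores the previous setting through on.exit() as the call unwinds.

```r
# Sketch of a silent stop; quiet_stop is a hypothetical name.
quiet_stop <- function() {
  old <- options(show.error.messages = FALSE)
  # Restore the caller's setting once the error unwinds this frame
  on.exit(options(old))
  stop()
}
```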
Read metadata for a variable in the cache
Description
Read metadata for a variable in the cache
Usage
.read.cache.info(variable)
Arguments
variable
Variable name for which to look up the metadata
Details
The returned object is a list with two fields:
- in.cache: Logical indicating whether the requested variable was found in the cache;
- hash: A data.frame as created by .create.cache.hash.
Value
list with metadata, see Details for more info.
Remove variables to keep from a list of candidates for removal
Description
Remove variables to keep from a list of candidates for removal
Usage
.remove.sticky.vars(names, keep)
Arguments
names
character vector of variable names that are candidate for removal
keep
character vector of variable names that should not be removed
Details
If the sticky_variables option is part of the config
variable, the config variable itself is added to the list of variables
to keep. In addition, all variables listed (comma-separated) in
config$sticky_variables are added to keep.
Value
A character vector containing the variables to remove.
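The rule above can be sketched as follows. The real .remove.sticky.vars reads the project configuration itself. This hypothetical remove_sticky_vars instead takes the configuration as an explicit argument, so it can be tried in isolation.

```r
# Hypothetical sketch: config is passed in rather than read globally.
remove_sticky_vars <- function(names, keep, config = list()) {
  if (!is.null(config$sticky_variables)) {
    sticky <- trimws(strsplit(config$sticky_variables, ",")[[1]])
    # Keep the config object itself plus every listed sticky variable
    keep <- c(keep, "config", sticky)
  }
  setdiff(names, keep)
}

remove_sticky_vars(c("a", "b", "config", "x"), keep = "a",
                   config = list(sticky_variables = "x, y"))
# "b"
```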
Require internal package
Description
Internal method to require a package that is necessary for the internal functioning of ProjectTemplate. Never attaches the package unless configured to do so in global.dcf (which throws a warning).
Usage
.require.package(package.name)
Arguments
package.name
name of the package to load, as a character vector
Value
No value is returned; this function is called for its side effects.
Return an RStudio project file as character vector
Description
Return an RStudio project file as character vector
Usage
.rstudioprojectfile()
Value
Character vector with the contents of an empty RStudio project file
Raise an error if given path is not a valid project
Description
Stops processing if the path is not a ProjectTemplate project, and returns the project name if it is a ProjectTemplate directory.
Usage
.stopifnotproject(additional_message = "", path = getwd())
Arguments
additional_message
Optional message to show if the given path is not a valid project
path
Path to check if it is a valid project
Value
Project name if it is a valid Project.
Raise an error if given path is a valid project
Description
Function to stop processing if the path is a ProjectTemplate project.
Usage
.stopifproject(additional_message = "", path = getwd())
Arguments
additional_message
Optional message to show if the given path is a valid project
path
Path to check if it is a valid project
Value
No value is returned; this function is called for its side effects.
Unload the project variables keeping the data
Description
Removes the config, logger and project.info variables
from memory, leaving all data variables in place.
Usage
.unload.project()
Value
No value is returned; this function is called for its side effects.
Compare sets of variable names
Description
Compare the variables (excluding functions) in env, by default .TargetEnv, with a passed-in vector of names and return the set difference.
Usage
.var.diff.from(given.var.list = "", env = .TargetEnv)
Arguments
given.var.list
Character vector of variable names
env
Environment in which to compare the sets of variables
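The comparison can be sketched in base R. The sketch assumes that the result is the set of non-function variables in env that are not listed in given.var.list. var_diff_from is an illustrative name, and globalenv() stands in for .TargetEnv.

```r
# Sketch: non-function variables in `env` that are not in given.var.list.
var_diff_from <- function(given.var.list = "", env = globalenv()) {
  vars <- ls(env)
  is_fun <- vapply(vars, function(v) is.function(get(v, envir = env)),
                   logical(1))
  setdiff(vars[!is_fun], given.var.list)
}
```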
Write a variable and its metadata to cache
Description
Write a variable and its metadata to cache
Usage
.write.cache(cache.hash, ...)
Arguments
cache.hash
a data.frame with metadata about the variable, see details for more information.
...
Extra parameters passed to save.
Details
cache.hash is a data frame with two columns: variable and hash.
Row name VAR is the name of the variable to save.
Row name CODE is the hash value of the code used to compute the variable.
Row names DEPENDS.* are the variables that CODE depends on.
The helper function .create.cache.hash creates a suitable data frame.
Value
No value is returned; this function is called for its side effects.
Add project specific config to the global config
Description
Enables project specific configuration to be added to the global config object. The
allowable format is key value pairs which are appended to the end of the config
object, which is accessible from the global environment.
Usage
add.config(..., apply.override = FALSE)
Arguments
...
A series of key-value pairs containing the configuration. The key is the
name that gets added to the config object. These can be overridden at load
time through the ... argument to load.project.
apply.override
A boolean indicating whether overrides should be applied. This
can be used to add a setting disregarding arguments to load.project.
Details
Once defined, the value can be accessed from any ProjectTemplate script by
referencing config$my_project_var.
Examples
library('ProjectTemplate')
## Not run:
add.config(
keep_bigdata=TRUE, # Whether to keep the big data file in memory
parse=7 # number of fields to parse
)
if (config$keep_bigdata) message("Keeping the big data file in memory")
## End(Not run)
Cache a data set for faster loading.
Description
This function will store a copy of the named data set in the cache
directory. This cached copy of the data set will then be given precedence
at load time when calling load.project. Cached data sets are
stored as .RData or optionally as .qs files.
Usage
cache(variable = NULL, CODE = NULL, depends = NULL, tidyCODE = TRUE, ...)
Arguments
variable
A character string containing the name of the variable to be saved. If the CODE parameter is defined, it is evaluated and saved, otherwise the variable with that name in the global environment is used.
CODE
A sequence of R statements enclosed in {..} which produce the object to be
cached. Requires suggested package formatR.
depends
A character vector of other global environment objects that the CODE depends upon. Caching will be forced if those objects have changed since last caching.
tidyCODE
A logical scalar specifying whether CODE should be tidied with
the help of tidy_source. Because whitespace changes, for example,
do not change the meaning of the code and therefore should not
invalidate the cache, this is usually desirable. However, if
CODE contains complex SQL statements, for example, tidying might fail,
and skipping this step is preferable.
...
Additional arguments passed on to save or optionally
to qsave. See project.config for further
information.
Details
Usually you will want to cache datasets during munging. This can be the raw
data just loaded, or the result of further processing during munging. Either
way, caching large variables can take a while, so cache only writes a new copy
when the cached one is out of date, for example when the variable, CODE or one of its depends has changed.
The clear.cache("variable") command
can be run to flush individual items from the cache.
Calling cache() with no arguments returns the current status of the cache.
Value
No value is returned; this function is called for its side effects.
Examples
library('ProjectTemplate')
## Not run: create.project('tmp-project')
setwd('tmp-project')
dataset1 <- 1:5
cache('dataset1')
setwd('..')
unlink('tmp-project', recursive = TRUE)
## End(Not run)
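Skipping the cache when nothing has changed amounts to comparing a stored hash with a fresh one. ProjectTemplate computes its hashes with digest::digest (see .cache.hash). To stay dependency-free, this sketch swaps in an MD5 checksum of the object serialised with saveRDS(). object_hash and needs_caching are hypothetical names.

```r
# Hash an R object by serialising it to a temporary file (sketch only;
# the package itself uses digest::digest).
object_hash <- function(x) {
  f <- tempfile(fileext = ".rds")
  on.exit(unlink(f))
  saveRDS(x, f, compress = FALSE)
  unname(tools::md5sum(f))
}

# Re-cache only if nothing is stored yet or the hash has changed.
needs_caching <- function(value, stored_hash = NULL) {
  is.null(stored_hash) || !identical(object_hash(value), stored_hash)
}

x <- c(1, 2, 3)
h <- object_hash(x)
needs_caching(x, h)           # FALSE: unchanged
needs_caching(c(1, 2, 4), h)  # TRUE: contents changed
```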
Translate a variable name into a file name for caching.
Description
This function will translate a variable name into a form that is suitable as a filename on most operating systems.
Usage
cache.name(data.filename)
Arguments
data.filename
The variable name to be translated into a filename.
Value
A translated variable name.
Examples
library('ProjectTemplate')
## Not run: cache.name('example.1')
Cache a project's data sets in binary format.
Description
This function will cache all of the data sets that were loaded by
the load.project function in a binary format that is
easier to load quickly. This is particularly useful for data sets
that you've modified during a slow munging process that does not
need to be repeated.
Usage
cache.project()
Value
No value is returned; this function is called for its side effects.
See Also
create.project, load.project,
get.project, show.project
Examples
library('ProjectTemplate')
## Not run: load.project()
cache.project()
## End(Not run)
Translate a file name into a valid R variable name.
Description
This function will translate a file name into a name that is a valid variable name in R. Non-alphabetic characters on the boundaries of the file name will be stripped; non-alphabetic characters inside of the file name will be replaced with dots.
Usage
clean.variable.name(variable.name, config = .load.config())
Arguments
variable.name
A character vector containing a variable's proposed name that should be standardized.
config
A list of configuration variables. Defaults to those loaded by load.project
Value
A translated variable name.
Examples
library('ProjectTemplate')
## Not run: clean.variable.name('example_1')
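A literal reading of the rule above can be sketched with two regular expressions. This is not the package's implementation. The config argument is ignored, and the sketch follows the description strictly, so digits count as non-alphabetic. Collapsing each run of inner non-alphabetic characters into a single dot is also an assumption.

```r
# Hypothetical sketch of the described translation rule.
clean_name_sketch <- function(x) {
  # Strip non-alphabetic characters at both ends
  x <- gsub("^[^A-Za-z]+|[^A-Za-z]+$", "", x)
  # Replace each inner run of non-alphabetic characters with a dot
  gsub("[^A-Za-z]+", ".", x)
}

clean_name_sketch("_my data-file_")
# "my.data.file"
```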
Clear objects from the global environment
Description
This function removes specific (or all by default) named objects from the global
environment. If used within a ProjectTemplate project, then any variables
defined in the config$sticky_variables will remain.
Usage
clear(..., keep = c(), force = FALSE)
Arguments
...
A sequence of character strings of the objects to
be removed from the global environment. If none are given, then all items except
those in keep will be deleted. This includes items whose names begin with a dot (.).
keep
A character vector of variables that should remain in the global environment
force
If TRUE, then variables will be deleted even if
specified in keep or config$sticky_variables
Value
The variables kept and removed are reported
Examples
library('ProjectTemplate')
## Not run:
clear("x", "y", "z")
clear(keep="a")
clear()
## End(Not run)
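The selection logic of clear() can be sketched as follows. clear_sketch is hypothetical. It works on an explicit environment and takes the sticky variables as an argument instead of reading config$sticky_variables.

```r
# Hypothetical sketch of clear(): pick targets, protect keep/sticky, remove.
clear_sketch <- function(..., keep = c(), force = FALSE,
                         sticky = character(0), env = globalenv()) {
  targets <- c(...)
  # No names given: consider everything, including names starting with '.'
  if (length(targets) == 0) targets <- ls(env, all.names = TRUE)
  # Unless forced, protect both `keep` and the sticky variables
  if (!force) targets <- setdiff(targets, c(keep, sticky))
  rm(list = targets, envir = env)
  invisible(targets)
}
```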
Clear data sets from the cache
Description
This function removes specific (or, by default, all) named data sets from the cache
directory. This forces the data to be read in from the data directory
the next time load.project is called.
Usage
clear.cache(...)
Arguments
...
A sequence of character strings of the variables to be removed from the cache. If none given, then all items in the cache will be removed.
Value
Success or failure is reported
Examples
library('ProjectTemplate')
## Not run:
clear.cache("x", "y", "z")
## End(Not run)
Create a new project.
Description
This function will create all of the scaffolding for a new project.
It will set up all of the relevant directories and their initial
contents. For those who only want the minimal functionality, the
template argument can be set to minimal to create a subset of
ProjectTemplate's default directories. For those who want to dump
all of ProjectTemplate's functionality into a directory for extensive
customization, the dump argument can be set to TRUE.
Usage
create.project(
project.name = "new-project",
template = "full",
dump = FALSE,
merge.strategy = c("require.empty", "allow.non.conflict"),
rstudio.project = FALSE
)
Arguments
project.name
A character vector containing the name for this new project. Must be a valid directory name for your file system.
template
A character vector containing the name of the template to
use for this project. By default a full and minimal template
are provided, but custom templates can be created using
create.template.
dump
A boolean value indicating whether the entire functionality of ProjectTemplate should be written out to flat files in the current project.
merge.strategy
What should happen if the target directory exists and
is not empty?
If "require.empty", the target directory must be empty;
if "allow.non.conflict", the method succeeds if no files or
directories with the same name exist in the target directory.
rstudio.project
A boolean value indicating whether the project should
also be an 'RStudio Project'. Defaults to FALSE. If TRUE,
then a 'projectname.Rproj' with usable defaults is added to the ProjectTemplate
directory.
Details
If the target directory does not exist, it is created. Otherwise, it can only contain files and directories allowed by the merge strategy.
Value
No value is returned; this function is called for its side effects.
See Also
load.project, get.project,
cache.project, show.project
Examples
library('ProjectTemplate')
## Not run: create.project('MyProject')
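The merge.strategy rule can be sketched as a pre-flight check. can_merge and its template_entries argument, which holds the top-level names the template would create, are hypothetical. The package's own check may differ in detail.

```r
# Hypothetical pre-flight check mirroring the two merge strategies.
can_merge <- function(target, template_entries,
                      merge.strategy = c("require.empty", "allow.non.conflict")) {
  merge.strategy <- match.arg(merge.strategy)
  # A target directory that does not exist yet will simply be created.
  if (!dir.exists(target)) return(TRUE)
  existing <- list.files(target, all.files = TRUE, no.. = TRUE)
  switch(merge.strategy,
         require.empty = length(existing) == 0,
         allow.non.conflict = !any(existing %in% template_entries))
}
```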
Create a new template
Description
This function writes a skeleton directory structure for creating your own custom templates.
Usage
create.template(target, source = "minimal")
Arguments
target
Name of the new template. It is created under the directory
specified by options('ProjectTemplate.templatedir'), or, when
missing, in the current directory.
source
Name of an existing template to copy; defaults to the built-in 'minimal' template.
Show information about the current project.
Description
This function will return all of the information that ProjectTemplate has
about the current project. This information is gathered when
load.project is called. At present, ProjectTemplate keeps a
record of the project's configuration settings, all packages that were loaded
automatically and all of the data sets that were loaded automatically. The
information about autoloaded data sets is used by the
cache.project function.
Usage
get.project()
Details
In previous releases this information has been available through the
global variable project.info. Using this variable is now deprecated
and will result in a warning.
Value
A named list.
See Also
create.project, load.project,
cache.project, show.project
Examples
library('ProjectTemplate')
## Not run: load.project()
get.project()
## End(Not run)
Listing the data for the current project
Description
This function produces a data.frame of all data files in the project, with
metadata on whether and how each file will be loaded by load.project.
Usage
list.data(...)
Arguments
...
Named arguments to override configuration from
config/global.dcf and lib/global.R.
Details
The returned data.frame contains the following variables, with one
observation per file in data/:
- filename: Character variable containing the file name relative to the data/ directory.
- varname: Character variable containing the name of the variable into which the file will be imported.*
- is_ignored: Logical variable that indicates whether the file is ignored through the data_ignore option in the configuration.
- is_directory: Logical variable that indicates whether the file is a directory.
- is_cached: Logical variable that indicates whether the file is already available in the cache/ directory.
- cached_only: Logical variable that indicates whether the variable is only available in the cache/ directory. This occurs when calling the cache function with a code fragment in a munge script.
- reader: Character variable containing the name of the reader function that will be used to load the data. Contains character(0) if no suitable reader was found.
* Note that some readers return more than one variable, usually with the
listed variable name as prefix. This is true, for example, of
xls.reader and xlsx.reader.
Value
A data.frame listing the available data, with relevant metadata
See Also
load.project, show.project,
project.config
Examples
library('ProjectTemplate')
## Not run: list.data()
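The result of list.data() is an ordinary data.frame, so it can be filtered with base R. The table below is a hand-made stand-in with some of the documented columns and invented rows. In a real project it would come from list.data().

```r
# Toy stand-in for list.data() output; rows are invented for illustration.
# In a project: files <- list.data()
files <- data.frame(
  filename     = c("a.csv", "b.xlsx", "notes.txt", "raw"),
  varname      = c("a", "b", "notes", "raw"),
  is_ignored   = c(FALSE, FALSE, TRUE, FALSE),
  is_directory = c(FALSE, FALSE, FALSE, TRUE),
  is_cached    = c(TRUE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Files that would be read from data/ rather than the cache, assuming
# cached copies take precedence as described for cache().
to_read <- files[!files$is_ignored & !files$is_directory & !files$is_cached, ]
to_read$filename
# "b.xlsx"
```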
Automatically load data and packages for a project.
Description
This function automatically loads all of the data and packages used by
the project from which it is called. The behavior can be controlled by
adjusting the project.config configuration.
Usage
load.project(...)
Arguments
...
Named arguments to override configuration from config/global.dcf
and lib/global.R.
Details
... can take an argument override.config or a single named
list for backward compatibility. This cannot be mixed with the new style
override. When a named argument override.config is present it takes
precedence over the other options. If any of the provided arguments is
unnamed an error is raised.
Value
No value is returned; this function is called for its side effects.
See Also
create.project, get.project,
cache.project, show.project, project.config
Examples
library('ProjectTemplate')
## Not run: load.project()
Load Project
Description
Call this function as an RStudio addin to load the ProjectTemplate library and run load.project().
Usage
loadproject_addin()
Migrates a project from a previous version of ProjectTemplate
Description
This function automatically performs all necessary steps to migrate an existing project so that it is compatible with this version of ProjectTemplate
Usage
migrate.project()
Value
No value is returned; this function is called for its side effects.
Examples
library('ProjectTemplate')
## Not run: migrate.project()
Migrate a template to a new version of ProjectTemplate
Description
This function updates a skeleton project to the current version of ProjectTemplate.
Usage
migrate.template(template)
Arguments
template
Name of the template to upgrade.
Automatically read data into memory
Description
The preinstalled readers are automatically loaded in the list preinstalled.readers.
The reader functions will load a data set stored in the data directory into
the specified global variable binding. These functions are not meant to be called directly.
Usage
preinstalled.readers
arff.reader(data.file, filename, variable.name)
csv.reader(data.file, filename, variable.name)
csv2.reader(data.file, filename, variable.name)
db.reader(data.file, filename, variable.name)
dbf.reader(data.file, filename, variable.name)
epiinfo.reader(data.file, filename, variable.name)
feather.reader(data.file, filename, variable.name)
file.reader(data.file, filename, variable.name)
mp3.reader(data.file, filename, variable.name)
mtp.reader(data.file, filename, variable.name)
octave.reader(data.file, filename, variable.name)
ppm.reader(data.file, filename, variable.name)
r.reader(data.file, filename, variable.name)
rdata.reader(data.file, filename, variable.name)
rds.reader(data.file, filename, variable.name)
spss.reader(data.file, filename, variable.name)
sql.reader(data.file, filename, variable.name)
stata.reader(data.file, filename, variable.name)
systat.reader(data.file, filename, variable.name)
tsv.reader(data.file, filename, variable.name)
url.reader(data.file, filename, variable.name)
wsv.reader(data.file, filename, variable.name)
xls.reader(data.file, filename, workbook.name)
xlsx.reader(data.file, filename, workbook.name)
xport.reader(data.file, filename, variable.name)
Arguments
data.file
The name of the data file to be read.
filename
The path to the data set to be loaded.
variable.name
The name of the variable to which the data set is assigned in the global environment.
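A custom reader only has to follow the three-argument convention described above. The sketch below is a hypothetical reader for pipe-delimited files (the name pipe.reader and the .psv extension are assumptions, not part of ProjectTemplate). It reads filename with base R's read.table(), ignores data.file, and assigns the result under variable.name in the global environment.

```r
# Hypothetical reader for pipe-delimited files; NOT one of ProjectTemplate's
# preinstalled readers. It follows the (data.file, filename, variable.name)
# convention: read `filename`, then assign the result under `variable.name`.
pipe.reader <- function(data.file, filename, variable.name) {
  data <- read.table(filename, header = TRUE, sep = "|",
                     stringsAsFactors = FALSE)
  # data.file is only a relative file name and is not needed here
  assign(variable.name, data, envir = .GlobalEnv)
  invisible(NULL)
}

# Usage: write a small file and call the reader the way a loader would
tmp <- tempfile(fileext = ".psv")
writeLines(c("id|score", "1|0.5", "2|0.75"), tmp)
pipe.reader(basename(tmp), tmp, "scores")
print(scores)
```

In a project, such a function could be registered with .add.extension('psv', pipe.reader), bearing in mind that this interface is documented as unstable.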
Format
An object of class list of length 55.
Details
Some file formats can contain more than one dataset. In this case all datasets are loaded
into separate variables in the format <variable.name>.<subset.name>, where the
subset.name is determined by the reader automatically.
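The naming rule for multi-dataset files can be illustrated with a small helper. The helper assign.subsets is hypothetical; only the <variable.name>.<subset.name> pattern comes from the description above, and the readers themselves choose the subset names.

```r
# Hypothetical helper illustrating the <variable.name>.<subset.name> rule;
# it is not a ProjectTemplate function.
assign.subsets <- function(datasets, variable.name, envir = .GlobalEnv) {
  for (subset.name in names(datasets)) {
    assign(paste(variable.name, subset.name, sep = "."),
           datasets[[subset.name]], envir = envir)
  }
  # return the names that were created
  invisible(paste(variable.name, names(datasets), sep = "."))
}

# Usage: a workbook with two sheets, loaded under the name "sales"
sheets <- list(q1 = data.frame(x = 1:2), q2 = data.frame(x = 3:4))
created <- assign.subsets(sheets, "sales")
print(created)  # "sales.q1" "sales.q2"
```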
The sql.reader function will load data from a SQL database based on configuration
information found in the specified .sql file. The .sql file must specify
a database to be accessed. A data set can be generated from all tables in the
database, from a single table, or from a single query against any set of tables.
Queries can support string interpolation to execute code snippets using mustache syntax (http://mustache.github.io). This is used to create queries that depend on data from other sources. Code to be evaluated is delimited by {{...}}.
Example: query: SELECT * FROM my_table WHERE id IN ({{ids}}). Here ids is a vector previously loaded into the global environment through ProjectTemplate.
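The interpolation step can be sketched in base R. This is an illustrative stand-in, not ProjectTemplate's own implementation (which follows mustache syntax); the function name interpolate.query is hypothetical. Each {{expr}} is evaluated in the global environment and replaced by its values joined with commas.

```r
# Illustrative sketch of {{...}} interpolation; NOT ProjectTemplate's code.
# Each {{expr}} is evaluated in `envir` and replaced by its values, joined
# with commas so that a vector fits inside an SQL IN (...) clause.
interpolate.query <- function(query, envir = .GlobalEnv) {
  pattern <- "\\{\\{([^}]*)\\}\\}"
  placeholders <- regmatches(query, gregexpr(pattern, query))[[1]]
  for (placeholder in placeholders) {
    expr <- trimws(sub(pattern, "\\1", placeholder))
    value <- eval(parse(text = expr), envir = envir)
    query <- sub(placeholder, paste(value, collapse = ","), query,
                 fixed = TRUE)
  }
  query
}

ids <- c(3, 7, 42)
interpolate.query("SELECT * FROM my_table WHERE id IN ({{ids}})")
# "SELECT * FROM my_table WHERE id IN (3,7,42)"
```

Character values would additionally need quoting before they can be used in an IN clause; the sketch leaves that out.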
Examples of the DCF format and settings used in a .sql file are shown below:
Example 1
type: mysql
user: sample_user
password: sample_password
host: localhost
dbname: sample_database
table: sample_table
Example 2
type: mysql
user: sample_user
password: sample_password
host: localhost
port: 3306
socket: /Applications/MAMP/tmp/mysql/mysql.sock
dbname: sample_database
table: sample_table
Example 3
type: sqlite
dbname: /path/to/sample_database
table: sample_table
Example 4
type: sqlite
dbname: /path/to/sample_database
query: SELECT * FROM users WHERE user_active == 1
Example 5
type: sqlite
dbname: /path/to/sample_database
table: *
Example 6
type: postgres
user: sample_user
password: sample_password
host: localhost
dbname: sample_database
table: sample_table
Example 7
type: odbc
dsn: sample_dsn
user: sample_user
password: sample_password
dbname: sample_database
query: SELECT * FROM sample_table
Example 8
type: oracle
user: sample_user
password: sample_password
dbname: sample_database
table: sample_table
Example 9
type: jdbc
class: oracle.jdbc.OracleDriver
classpath: /path/to/ojdbc5.jar (or set in CLASSPATH)
user: scott
password: tiger
url: jdbc:oracle:thin:@myhost:1521:orcl
query: select * from emp
Example 10
type: heroku
classpath: /path/to/jdbc4.jar (or set in CLASSPATH)
user: scott
password: tiger
host: heroku.postgres.url
port: 1234
dbname: herokudb
query: select * from emp
Example 11
In this example RSQLite::initExtension() is automatically called on the established connection.
Liam Healy has written extension-functions.c, which is available on http://www.sqlite.org/contrib. It provides mathematical and string extension functions for SQL queries using the loadable extensions mechanism.
type: sqlite
dbname: /path/to/sample_database
plugin: extension
query: SELECT *,STDEV(value1) FROM example_table
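On disk, each field of a .sql file sits on its own line as name: value. As a sketch, Example 4 can be written to a temporary file and parsed with base R's read.dcf(), which understands the same format; the temporary path is only for illustration.

```r
# Write Example 4 as a .sql file (one field per line) and parse it with
# base R's read.dcf(). The temporary file stands in for data/<name>.sql.
sql.file <- tempfile(fileext = ".sql")
writeLines(c(
  "type: sqlite",
  "dbname: /path/to/sample_database",
  "query: SELECT * FROM users WHERE user_active == 1"
), sql.file)

settings <- read.dcf(sql.file)  # character matrix, one column per field
settings[1, "type"]   # "sqlite"
settings[1, "query"]  # "SELECT * FROM users WHERE user_active == 1"
```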
Value
No value is returned; the reader functions are called for their side effects.
Functions
- arff.reader(): Read the Weka file format from files with the .arff extension.
- csv.reader(): Read a comma separated values file with the .csv extension.
- csv2.reader(): Read a semicolon separated values file with the .csv2 extension. In May 2018, the default behavior of the reader for .csv2 files changed to use R's read.csv2(), where the field separator is assumed to be ';' and the decimal separator to be ','.
- db.reader(): Read a SQLite3 database with a .db file extension. If you want to specify a single table or query to execute against the database, move the database file elsewhere and use a .sql file interpreted by sql.reader.
- dbf.reader(): Read an XBASE file with a .dbf file extension.
- epiinfo.reader(): Read an Epi Info file with a .rec file extension.
- feather.reader(): Read a feather file in Apache Arrow format with a .feather file extension.
- file.reader(): Read an arbitrary file described in a .file file. A .file file must contain DCF that specifies the path to the data set and which extension should be used from the dispatch table to load the data set. An example of the DCF format and settings used in a .file file is shown below:
  path: http://www.johnmyleswhite.com/ProjectTemplate/sample_data.csv
  extension: csv
- mp3.reader(): Read an MP3 file with a .mp3 file extension. This function will load the specified MP3 file into memory using the tuneR package. This is useful for working with music files as a data set.
- mtp.reader(): Read a Minitab Portable Worksheet with a .mtp file extension.
- octave.reader(): Read an Octave file with a .m file extension. This function will load the specified Octave file into memory using the foreign::read.octave function.
- ppm.reader(): Read a PPM file with a .ppm file extension. Data is loaded using the pixmap::read.pnm function.
- r.reader(): Read an R source file with a .R file extension. This function will call source() on the specified R file, executing the code inside of it as a way of generating data sets dynamically, as in many Monte Carlo applications.
- rdata.reader(): Read an RData file with a .rdata or .rda file extension. This function will load the specified RData file into memory using the load function. This may generate many data sets simultaneously.
- rds.reader(): Read the RDS file format from files with the .rds extension.
- spss.reader(): Read an SPSS file with a .sav file extension. This function will load the specified SPSS file into memory. It will convert the resulting list object into a data frame before inserting the data set into the global environment.
- sql.reader(): Read a database described in a .sql file.
- stata.reader(): Read a Stata file with a .stata file extension.
- systat.reader(): Read a Systat file with a .sys or .syd file extension.
- tsv.reader(): Read a tab separated values file with the .tsv or .tab file extensions.
- url.reader(): Read a remote file described in a .url file. This function will load data from a remote source accessible through HTTP or FTP based on configuration information found in the specified .url file. The .url file must specify the URL of the remote data source and the type of data that is available remotely. Currently, only one data source per .url file is supported. An example of the DCF format and settings used in a .url file is shown below:
  url: http://www.johnmyleswhite.com/ProjectTemplate/sample_data.csv
  separator: ,
- wsv.reader(): Read a whitespace separated values file with the .wsv or .txt file extensions.
- xls.reader(): Read an Excel file with a .xls file extension. This function will load the specified Excel file into memory using the readxl package.
- xlsx.reader(): Read an Excel 2007 file with a .xlsx file extension. This function will load the specified Excel file into memory using the readxl package.
- xport.reader(): Read an XPort file with a .xport file extension.
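The separator convention mentioned for csv2.reader can be checked directly in base R. This example only exercises read.csv2() on inline text; it does not call the reader itself.

```r
# read.csv2() assumes ';' between fields and ',' as the decimal mark.
european <- "city;temperature\nOslo;3,5\nRome;18,25"
df <- read.csv2(text = european)
df$temperature  # 3.50 18.25

# read.csv2() behaves like read.csv() with sep = ";" and dec = ","
identical(df, read.csv(text = european, sep = ";", dec = ","))  # TRUE
```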
See Also
ProjectTemplate Configuration file
Description
Every ProjectTemplate project has a configuration file found at
config/global.dcf that contains various options that can be tweaked
to control runtime behavior. The valid options are shown below, and must
be encoded using the DCF format.
Usage
project.config()
Details
Calling the project.config() function will display the current project
configuration.
The options that can be configured in config/global.dcf are
shown below:
data_loading This can be set to TRUE or FALSE. If data_loading is on,
the system will load data from both the cache and data directories with
cache taking precedence in the case of name conflict.
data_loading_header This can be set to TRUE or FALSE. If data_loading_header is on,
the system will load text data files, such as CSV, TSV, or XLSX, treating the first row as header.
data_ignore A comma separated list of files to be ignored when importing
from the data/ directory. Regular expressions can be used but should be delimited
(on both sides) by /. Note that filenames and filepaths should never begin with
a /; entire directories under data/ can be ignored by adding a trailing /.
cache_loading This can be set to TRUE or FALSE. If cache_loading is on,
the system will load data from the cache directory before any attempt to load
from the data directory.
recursive_loading This can be set to TRUE or FALSE. If recursive_loading
is on, the system will load data from the data directory and all its
subdirectories recursively.
munging This can be set to TRUE or FALSE. If munging is on, the system
will execute the files in the munge directory sequentially using the order
implied by the sort() function. If munging is FALSE, none of the files in the
munge directory will be executed.
logging This can be set to TRUE or FALSE. If logging is on, a logger
object using the log4r package is automatically created when you run
load.project(). This logger will write to the logs directory.
logging_level The value of logging_level is passed to a logger object
using the log4r package during logging when you run load.project().
load_libraries This can be set to TRUE or FALSE. If load_libraries is on,
the system will load all of the R packages listed in the libraries field
described below.
libraries This is a comma separated list of all the R packages that the user
wants to automatically load when load.project() is called. These packages must
already be installed before calling load.project().
as_factors This can be set to TRUE or FALSE. If as_factors is on, the system
will convert every character vector into a factor when creating data frames; most
importantly, this automatic conversion occurs when reading in data automatically.
If FALSE, character vectors will remain character vectors.
tables_type This is the format for default tables. Values can be 'tibble' (default),
'data_table', or 'data_frame'.
attach_internal_libraries This can be set to TRUE or FALSE. If
attach_internal_libraries is on, then every time a new package is loaded into memory
during load.project() a warning will be displayed informing the user that this has happened.
cache_loaded_data This can be set to TRUE or FALSE. If cache_loaded_data is
on, then data loaded from the data directory during load.project() will be
automatically cached (so it won't need to be reloaded next time load.project()
is called).
sticky_variables This is a comma separated list of any project-specific
variables that should remain in the global environment after a clear() command.
This can be used to clear the global environment, but keep any large datasets in
place so they are not unnecessarily re-generated during load.project().
Note that this will be overridden if the force=TRUE parameter is passed
to clear().
underscore_variables This can be set to TRUE to use
underscores ('_') in variable names or FALSE to replace underscores
('_') with dots ('.'). The default is TRUE. When migrating old
projects, underscore_variables is set to FALSE.
cache_file_format The default file format for cached data is
'RData'. This can be set to 'qs' in order to benefit from the quick
serialization of R objects provided by qs.
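The data_ignore rules can be illustrated with a small matcher. This is a hypothetical sketch (the function is.ignored and its handling are assumptions, not ProjectTemplate's implementation): entries are split on commas, entries wrapped in / are treated as regular expressions, entries ending in / ignore a whole directory, and any other entry must equal a path relative to data/. Splitting on commas means a regular expression containing a comma would not survive; the sketch does not handle that case.

```r
# Hypothetical data_ignore matcher; ProjectTemplate's own logic may differ.
# `path` is relative to data/, e.g. "raw/2019.csv".
is.ignored <- function(path, data_ignore) {
  entries <- trimws(strsplit(data_ignore, ",")[[1]])
  entries <- entries[nzchar(entries)]
  for (entry in entries) {
    n <- nchar(entry)
    if (n > 2 && startsWith(entry, "/") && endsWith(entry, "/")) {
      # /.../ : the text between the slashes is a regular expression
      if (grepl(substr(entry, 2, n - 1), path)) return(TRUE)
    } else if (endsWith(entry, "/")) {
      # trailing / : ignore everything under this directory
      if (startsWith(path, entry)) return(TRUE)
    } else if (identical(path, entry)) {
      # plain entry : exact file name match
      return(TRUE)
    }
  }
  FALSE
}

# In global.dcf this would read:  data_ignore: raw/, /\.bak$/, notes.csv
ignore <- "raw/, /\\.bak$/, notes.csv"
files <- c("raw/2019.csv", "old.bak", "notes.csv", "sales.csv")
files[!vapply(files, is.ignored, logical(1), data_ignore = ignore)]
# "sales.csv"
```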
If config/global.dcf is missing some items (for example, because it was created under an
old version of ProjectTemplate), then the following configuration is used for any missing items
during load.project():
data_loading TRUE
data_loading_header TRUE
data_ignore
cache_loading TRUE
recursive_loading FALSE
munging TRUE
logging FALSE
logging_level INFO
load_libraries FALSE
libraries reshape2, plyr, tidyverse, stringr, lubridate
as_factors FALSE
tables_type tibble
attach_internal_libraries TRUE
cache_loaded_data FALSE
sticky_variables NONE
underscore_variables FALSE
cache_file_format RData
When a new project is created using create.project(), the following values are pre-populated:
version 0.11.1
data_loading TRUE
data_loading_header TRUE
data_ignore
cache_loading TRUE
recursive_loading FALSE
munging TRUE
logging FALSE
logging_level INFO
load_libraries FALSE
libraries reshape2, plyr, tidyverse, stringr, lubridate
as_factors FALSE
tables_type tibble
attach_internal_libraries FALSE
cache_loaded_data TRUE
sticky_variables NONE
underscore_variables TRUE
cache_file_format RData
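The fallback for missing items can be sketched by merging a partial configuration over the defaults listed above. This is an illustration only: config.defaults holds just a subset of those defaults, values are kept as the strings read.dcf() returns, and the merge uses base R's modifyList() rather than ProjectTemplate's own code.

```r
# Subset of the fallback defaults listed above, stored as strings
config.defaults <- list(
  data_loading = "TRUE", munging = "TRUE", logging = "FALSE",
  cache_loaded_data = "FALSE", underscore_variables = "FALSE",
  cache_file_format = "RData"
)

# A minimal global.dcf that sets only a few fields
dcf <- tempfile(fileext = ".dcf")
writeLines(c("version: 0.11.1", "logging: TRUE", "munging: FALSE"), dcf)
user.config <- as.list(read.dcf(dcf)[1, ])

# Fields present in the file win; missing fields fall back to defaults
config <- modifyList(config.defaults, user.config)
str(config)
```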
Value
The current project configuration is displayed.
See Also
Reload or reset a project
Description
This function will clear the global environment and reload a project. This is
useful when you've updated your data sets or changed your preprocessing scripts.
By default, any variables listed in the sticky_variables configuration parameter
(see project.config) will remain both in memory and (if present) in the cache.
If the reset parameter is TRUE, then all variables are cleared from both the
global environment and the cache.
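The in-memory part of this behavior can be sketched as removing every variable except the sticky ones. The helper clear.except is hypothetical; it works on any environment, does not touch the cache, and does not reload the project.

```r
# Hypothetical sketch: clear an environment but keep "sticky" variables,
# unless reset = TRUE. The cache handling of reload.project() is omitted.
clear.except <- function(sticky = character(0), envir = .GlobalEnv,
                         reset = FALSE) {
  keep <- if (reset) character(0) else sticky
  doomed <- setdiff(ls(envir = envir), keep)
  rm(list = doomed, envir = envir)
  invisible(doomed)
}

# Usage on a scratch environment instead of the global one
env <- new.env()
assign("big.data", 1:1e6, envir = env)
assign("scratch", "temp", envir = env)
clear.except(sticky = "big.data", envir = env)
ls(env)  # "big.data"
```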
Usage
reload.project(..., reset = FALSE)
Arguments
...
Optional parameters passed to load.project
reset
A boolean value; if set to TRUE, the cache and everything in the global
environment, including any sticky_variables, are cleared.
Value
No value is returned; this function is called for its side effects.
Examples
library('ProjectTemplate')
## Not run: load.project()
reload.project()
## End(Not run)
Reload Project
Description
Call this function as an RStudio addin to load the ProjectTemplate library and run reload.project().
Usage
reloadproject_addin()
Require a package for use in the project
Description
This function will require the given package. If the package is not installed, it will stop execution and print a message instructing the user which package to install and which function caused the error.
Usage
require.package(package.name, attach = TRUE)
Arguments
package.name
A character vector containing the package name. Must be a valid package name installed on the system.
attach
Should the package be attached to the search path (as with
library ) or not (as with loadNamespace )?
Defaults to TRUE. (Internal code will use FALSE by default
unless a compatibility switch is set, see below.)
Details
The function .require.package is called by internal code. It will
attach the package to the search path (with a warning) only if the
compatibility configuration attach_internal_libraries is set to
TRUE. Normally, packages used for loading data are not
needed on the search path, but not attaching them might break existing code.
In a forthcoming version this compatibility setting will be removed,
and no packages will be attached to the search path by internal code.
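The attach-or-load distinction can be sketched with base R's requireNamespace(), library() and loadNamespace(). The function require.package.sketch and its message text are hypothetical, and unlike require.package it does not report which function triggered the error.

```r
# Hypothetical sketch of the check described above; not ProjectTemplate code.
require.package.sketch <- function(package.name, attach = TRUE) {
  if (!requireNamespace(package.name, quietly = TRUE)) {
    stop(sprintf("This project requires the '%s' package. Please run install.packages('%s').",
                 package.name, package.name), call. = FALSE)
  }
  if (attach) {
    # attach to the search path, as library() does
    library(package.name, character.only = TRUE)
  } else {
    # only load the namespace; nothing is added to the search path
    loadNamespace(package.name)
  }
  invisible(NULL)
}

require.package.sketch("tools", attach = FALSE)
msg <- tryCatch(require.package.sketch("not.a.real.package"),
                error = conditionMessage)
```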
Value
No value is returned; this function is called for its side effects.
Examples
library('ProjectTemplate')
## Not run: require.package('PackageName')
Run all of the analyses in the src directory.
Description
This function will run each of the analyses in the src
directory in separate processes. At present, this is done serially, but
future versions of this function will provide a means of running
the analyses in parallel.
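Running each script serially in its own process can be sketched with Rscript and system2(). The function run.scripts is hypothetical; it assumes the analyses are plain .R files in one directory and does not reproduce whatever setup run.project() performs before each script.

```r
# Hypothetical sketch: run each .R script in `src.dir`, one after another,
# each in a separate R process started through Rscript.
run.scripts <- function(src.dir = "src") {
  rscript <- file.path(R.home("bin"), "Rscript")
  scripts <- sort(list.files(src.dir, pattern = "\\.[Rr]$",
                             full.names = TRUE))
  status <- vapply(scripts, function(script) {
    as.integer(system2(rscript, shQuote(script)))  # exit status, 0 = success
  }, integer(1))
  invisible(status)
}

# Usage: a throwaway src/ directory with one analysis that writes a file
src <- file.path(tempdir(), "src")
dir.create(src, showWarnings = FALSE)
marker <- file.path(tempdir(), "analysis-ran.txt")
writeLines(sprintf("writeLines('done', %s)", deparse(marker)),
           file.path(src, "01-analysis.R"))
status <- run.scripts(src)
```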
Usage
run.project()
Value
No value is returned; this function is called for its side effects.
Examples
library('ProjectTemplate')
## Not run: run.project()
Show information about the current project.
Description
This function will show the user all of the information that
ProjectTemplate has about the current project. This information is
gathered when load.project is called. At present,
ProjectTemplate keeps a record of the project's configuration settings,
all packages that were loaded automatically and all of the data sets that
were loaded automatically. The information about autoloaded data sets
is used by the cache.project function.
Usage
show.project()
Value
No value is returned; this function is called for its side effects.
See Also
create.project , load.project ,
get.project , cache.project
Examples
library('ProjectTemplate')
## Not run: load.project()
show.project()
## End(Not run)
Generate unit tests for your helper functions.
Description
This function will parse all of the functions defined in files inside
of the lib directory and will generate a trivial unit test for
each function. The resulting tests are stored in the file
tests/autogenerated.R. Every test is expected to fail by default,
so you should edit them before calling test.project.
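The idea can be sketched by parsing the files in lib/ and writing one deliberately failing testthat-style stub per function. The function stub.tests.sketch and the exact stub text are hypothetical; the real tests/autogenerated.R may look different.

```r
# Hypothetical sketch: find top-level `name <- function(...)` definitions
# in lib/*.R without running them, and write one failing stub per function.
stub.tests.sketch <- function(lib.dir = "lib",
                              out.file = file.path("tests", "autogenerated.R")) {
  fns <- character(0)
  files <- sort(list.files(lib.dir, pattern = "\\.[Rr]$", full.names = TRUE))
  for (f in files) {
    for (expr in parse(f)) {
      if (is.call(expr) && is.name(expr[[1]]) &&
          as.character(expr[[1]]) %in% c("<-", "=") &&
          is.name(expr[[2]]) && is.call(expr[[3]]) &&
          identical(expr[[3]][[1]], as.name("function"))) {
        fns <- c(fns, as.character(expr[[2]]))
      }
    }
  }
  stubs <- sprintf(
    "test_that('Example test for %s', {\n  expect_equal(1, 0)  # replace with a real test of %s()\n})\n",
    fns, fns)
  writeLines(stubs, out.file)
  invisible(fns)
}

# Usage on a throwaway lib/ directory
lib <- file.path(tempdir(), "lib")
dir.create(lib, showWarnings = FALSE)
writeLines("add.one <- function(x) x + 1\nconst <- 3", file.path(lib, "helpers.R"))
out <- tempfile(fileext = ".R")
fns <- stub.tests.sketch(lib, out)
```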
Usage
stub.tests()
Value
No value is returned; this function is called for its side effects.
Examples
library('ProjectTemplate')
## Not run: stub.tests()
Run all unit tests for this project.
Description
This function will run all of the testthat style unit tests
for the current project that are defined inside of the tests
directory. The tests will be run in the order defined by the filenames
for the tests: it is recommended that each test file name begin with a number
specifying its position in the sequence.
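Because filenames are ordered as text (the munge directory, for instance, uses the order implied by sort()), a file starting with "10-" is placed before one starting with "2-". Zero-padding the numeric prefixes to a common width makes text order agree with numeric order, as this base R check shows.

```r
# Text order: "10-..." sorts before "2-..." because '1' < '2'
unpadded <- c("2-cleaning.R", "10-models.R", "1-loading.R")
sorted <- sort(unpadded)
match("10-models.R", sorted) < match("2-cleaning.R", sorted)  # TRUE

# Zero-padded prefixes sort in the intended numeric order
padded <- c("02-cleaning.R", "10-models.R", "01-loading.R")
sort(padded)  # "01-loading.R" "02-cleaning.R" "10-models.R"
```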
Usage
test.project()
Value
No value is returned; this function is called for its side effects.
Examples
library('ProjectTemplate')
## Not run: load.project()
test.project()
## End(Not run)
Read a DCF file into an R list.
Description
This function will read a DCF file and translate the resulting data frame into a list. The DCF format is used throughout ProjectTemplate for configuration settings and ad hoc file format specifications.
Usage
translate.dcf(filename)
Arguments
filename
A character vector specifying the DCF file to be translated.
Details
The contents of the DCF file are stored as character strings. If a value is enclosed in backtick characters (`), it is evaluated as R code and the result is returned as a string.
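The behavior described above can be sketched with base R's read.dcf(). The function translate.dcf.sketch is a hypothetical re-implementation for illustration, not the package's source. It returns a named list and evaluates any value wrapped in backticks.

```r
# Hypothetical sketch: read a DCF file into a named list, evaluating any
# value enclosed in backticks as R code and keeping the result as a string.
translate.dcf.sketch <- function(filename) {
  settings <- as.list(read.dcf(filename)[1, ])
  lapply(settings, function(value) {
    if (grepl("^`.*`$", value)) {
      code <- substr(value, 2, nchar(value) - 1)
      as.character(eval(parse(text = code), envir = globalenv()))
    } else {
      value
    }
  })
}

# Usage: a .file-style DCF whose path is computed by R code
dcf <- tempfile(fileext = ".file")
writeLines(c("path: `file.path('data', 'raw', 'sample_data.csv')`",
             "extension: csv"), dcf)
str(translate.dcf.sketch(dcf))
```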
Value
Returns a list containing the entries from the DCF file.
Examples
library('ProjectTemplate')
## Not run: translate.dcf(file.path('config', 'global.dcf'))