Background
This routine is used in a package that calculates tree (as in Christmas tree) volumes for various species codes (spcd) and geographic regions. The equation forms and coefficients vary by species and region, so I have a dataframe of functions along with their respective species and region that calculate the volume based off of the height (ht) of the tree and diameter (dbh).
Data Setup
Note: In my package, this part is taken care of by other functions, this is just to create a reproducible example (please ignore the sloppyness)
I have a data frame that includes a column of functions, along with some information about "where" to apply those functions in another data frame.
The functions (in reality these are more complex):
func1 <- function(dbh,ht){dbh^2 + ht}
func2 <- function(dbh,ht){dbh^2 - ht}
The data frame (in reality this data frame is much longer):
spcd <- c(122, 122, 141, 141)
region <- c('OR_W', 'OR_E', 'OR_W', 'OR_E')
funcs_df <- data.frame(spcd, region, funcs)
funcs_df$funcs <- c("func1", "func2", "func1", "func2")
Then, I have another frame that has some information, including the spcd
and region
that should match the values in func_df
:
spcd <- c(122, 141, 141, 122, 141, 122)
region <- c('OR_W', 'OR_E', 'OR_W', 'OR_E', 'OR_W', 'OR_W')
dbh <- c(12, 13, 15, 11, 10, 21)
ht <- c(101, 121, 100, 99, 88, 76)
tree_df <- data.frame(spcd, region, dbh, ht)
Applying the Functions
This is the part I would prefer feedback on.
First, I split the tree_df
into distinct groups based on spcd and region so I can apply the functions that correspond to these distinct groups.
tree_split <- split(tree_df, list(tree_df$region, tree_df$spcd))
Then, I create an empty data frame to append to.
new_tree <- data.frame()
Next, (and this is where things get messy) I loop through each group, grab the top left cell that acts as a "key" to get the equation from the func_df
and use mapply
on each group (with some conditionals to handle NA values).
for (group in tree_split) {
# Get the 'group key'
region <- group$region[1]
spcd <- group$spcd[1]
# Get the equation from eqs
eq <- funcs_df$funcs[which((funcs_df$spcd == spcd & funcs_df$region ==
region))]
# Convert func string into actual function
eq <- eq[[1]]
eq <- eval(parse(text=eq))
# Apply the equation to each record in the group
group$cvts <- mapply(eq, group$dbh, group$ht)
# Append to new_tree
new_tree <- rbind(new_tree ,group)
}
Discussion
This results in the desired output with the new cvts outputs according to each function defined in the dataframe:
spcd region dbh ht cvts
4 122 OR_E 11 99 22
1 122 OR_W 12 101 245
6 122 OR_W 21 76 517
2 141 OR_E 13 121 48
3 141 OR_W 15 100 325
5 141 OR_W 10 88 188
I have a few concerns with this approach:
The old adage "if you write a for-loop you are doing it wrong" seems to apply here. Is there some way I could reduce this for-loop to some sort of
apply
ormapply
type function?Grabbing the key from a cell (see "# Get the 'group key'" comment above) seems sloppy. Is there a way to get this 'group key' in a more formal fashion?
Other advice is, of course, welcome.
-
\$\begingroup\$ Can you confirm if your functions are all vectorized, like it is the case in your reproducible example? \$\endgroup\$flodel– flodel2018年01月06日 12:01:52 +00:00Commented Jan 6, 2018 at 12:01
-
\$\begingroup\$ @flodel that is the case. \$\endgroup\$Bryce Frank– Bryce Frank2018年01月06日 16:35:53 +00:00Commented Jan 6, 2018 at 16:35
-
\$\begingroup\$ did you get a chance to review and test my answer? \$\endgroup\$flodel– flodel2018年01月09日 23:13:43 +00:00Commented Jan 9, 2018 at 23:13
-
\$\begingroup\$ @flodel I will try to get a chance to review it sometime this week \$\endgroup\$Bryce Frank– Bryce Frank2018年01月10日 17:10:24 +00:00Commented Jan 10, 2018 at 17:10
2 Answers 2
I would suggest you merge your data.frames so you can split the data per function:
mg <- merge(funcs_df, tree_df)
sp <- split(mg, mg$funcs)
print(sp)
# $func1
# spcd region funcs dbh ht
# 2 122 OR_W func1 12 101
# 3 122 OR_W func1 21 76
# 5 141 OR_W func1 15 100
# 6 141 OR_W func1 10 88
#
# $func2
# spcd region funcs dbh ht
# 1 122 OR_E func2 11 99
# 4 141 OR_E func2 13 121
Then, you just have to call each function once (since you confirmed the functions are vectorized):
cvts_list <- Map(function(fun, x) fun(x$dbh, x$ht),
fun = mget(names(sp)), x = sp)
print(cvts_list)
# $func1
# [1] 245 517 325 188
#
# $func2
# [1] 22 48
and stack the results into a new column, using unsplit
:
mg$cvts <- unsplit(cvts_list, mg$funcs)
print(mg)
# spcd region funcs dbh ht cvts
# 1 122 OR_E func2 11 99 22
# 2 122 OR_W func1 12 101 245
# 3 122 OR_W func1 21 76 517
# 4 141 OR_E func2 13 121 48
# 5 141 OR_W func1 15 100 325
# 6 141 OR_W func1 10 88 188
Consider merging the two dataframes then use by
, the method designed to split a dataframe by one or more factors. As the object-oriented wrapper to tapply
, by
tends to be a more streamlined handler than split...lapply
or split ... for
since you can attach a function directly to pass subsetted dataframes into it.
Then, take the list of dataframes returned from by
and bind them with a do.call
instead of initalizing an empty dataframe and iteratively expanding it in a loop.
merged_df <- merge(funcs_df, tree_df, by=c("spcd", "region"))
process_func <- function(df) {
# Get the equation from eqs
eq <- df$funcs[[1]]
# Convert func string into actual function
eq <- eval(parse(text=eq))
# Apply the equation to each record in the group
df$cvts <- mapply(eq, df$dbh, df$ht)
return(df)
}
df_list <- by(merged_df, list(merged_df$region, merged_df$spcd), FUN=process_func)
finaldf <- do.call(rbind, df_list)