2
\$\begingroup\$

Background

This routine is used in a package that calculates tree (as in Christmas tree) volumes for various species codes (spcd) and geographic regions. The equation forms and coefficients vary by species and region, so I have a dataframe of functions along with their respective species and region that calculate the volume based off of the height (ht) of the tree and diameter (dbh).

Data Setup

Note: In my package, this part is taken care of by other functions, this is just to create a reproducible example (please ignore the sloppyness)

I have a data frame that includes a column of functions, along with some information about "where" to apply those functions in another data frame.

The functions (in reality these are more complex):

func1 <- function(dbh,ht){dbh^2 + ht}
func2 <- function(dbh,ht){dbh^2 - ht} 

The data frame (in reality this data frame is much longer):

spcd <- c(122, 122, 141, 141)
region <- c('OR_W', 'OR_E', 'OR_W', 'OR_E')
funcs_df <- data.frame(spcd, region, funcs)
funcs_df$funcs <- c("func1", "func2", "func1", "func2")

Then, I have another frame that has some information, including the spcd and region that should match the values in func_df:

spcd <- c(122, 141, 141, 122, 141, 122)
region <- c('OR_W', 'OR_E', 'OR_W', 'OR_E', 'OR_W', 'OR_W')
dbh <- c(12, 13, 15, 11, 10, 21)
ht <- c(101, 121, 100, 99, 88, 76)
tree_df <- data.frame(spcd, region, dbh, ht)

Applying the Functions

This is the part I would prefer feedback on.

First, I split the tree_df into distinct groups based on spcd and region so I can apply the functions that correspond to these distinct groups.

tree_split <- split(tree_df, list(tree_df$region, tree_df$spcd))

Then, I create an empty data frame to append to.

new_tree <- data.frame()

Next, (and this is where things get messy) I loop through each group, grab the top left cell that acts as a "key" to get the equation from the func_df and use mapply on each group (with some conditionals to handle NA values).

for (group in tree_split) {
 # Get the 'group key'
 region <- group$region[1]
 spcd <- group$spcd[1]
 # Get the equation from eqs
 eq <- funcs_df$funcs[which((funcs_df$spcd == spcd & funcs_df$region == 
 region))]
 # Convert func string into actual function
 eq <- eq[[1]]
 eq <- eval(parse(text=eq))
 # Apply the equation to each record in the group
 group$cvts <- mapply(eq, group$dbh, group$ht)
 # Append to new_tree
 new_tree <- rbind(new_tree ,group)
}

Discussion

This results in the desired output with the new cvts outputs according to each function defined in the dataframe:

 spcd region dbh ht cvts
4 122 OR_E 11 99 22
1 122 OR_W 12 101 245
6 122 OR_W 21 76 517
2 141 OR_E 13 121 48
3 141 OR_W 15 100 325
5 141 OR_W 10 88 188

I have a few concerns with this approach:

  1. The old adage "if you write a for-loop you are doing it wrong" seems to apply here. Is there some way I could reduce this for-loop to some sort of apply or mapply type function?

  2. Grabbing the key from a cell (see "# Get the 'group key'" comment above) seems sloppy. Is there a way to get this 'group key' in a more formal fashion?

Other advice is, of course, welcome.

asked Dec 25, 2017 at 1:38
\$\endgroup\$
4
  • \$\begingroup\$ Can you confirm if your functions are all vectorized, like it is the case in your reproducible example? \$\endgroup\$ Commented Jan 6, 2018 at 12:01
  • \$\begingroup\$ @flodel that is the case. \$\endgroup\$ Commented Jan 6, 2018 at 16:35
  • \$\begingroup\$ did you get a chance to review and test my answer? \$\endgroup\$ Commented Jan 9, 2018 at 23:13
  • \$\begingroup\$ @flodel I will try to get a chance to review it sometime this week \$\endgroup\$ Commented Jan 10, 2018 at 17:10

2 Answers 2

3
\$\begingroup\$

I would suggest you merge your data.frames so you can split the data per function:

mg <- merge(funcs_df, tree_df)
sp <- split(mg, mg$funcs)
print(sp)
# $func1
# spcd region funcs dbh ht
# 2 122 OR_W func1 12 101
# 3 122 OR_W func1 21 76
# 5 141 OR_W func1 15 100
# 6 141 OR_W func1 10 88
# 
# $func2
# spcd region funcs dbh ht
# 1 122 OR_E func2 11 99
# 4 141 OR_E func2 13 121

Then, you just have to call each function once (since you confirmed the functions are vectorized):

cvts_list <- Map(function(fun, x) fun(x$dbh, x$ht),
 fun = mget(names(sp)), x = sp)
print(cvts_list)
# $func1
# [1] 245 517 325 188
# 
# $func2
# [1] 22 48

and stack the results into a new column, using unsplit:

mg$cvts <- unsplit(cvts_list, mg$funcs)
print(mg)
# spcd region funcs dbh ht cvts
# 1 122 OR_E func2 11 99 22
# 2 122 OR_W func1 12 101 245
# 3 122 OR_W func1 21 76 517
# 4 141 OR_E func2 13 121 48
# 5 141 OR_W func1 15 100 325
# 6 141 OR_W func1 10 88 188
answered Jan 6, 2018 at 18:36
\$\endgroup\$
0
\$\begingroup\$

Consider merging the two dataframes then use by, the method designed to split a dataframe by one or more factors. As the object-oriented wrapper to tapply, by tends to be a more streamlined handler than split...lapply or split ... for since you can attach a function directly to pass subsetted dataframes into it.

Then, take the list of dataframes returned from by and bind them with a do.call instead of initalizing an empty dataframe and iteratively expanding it in a loop.

merged_df <- merge(funcs_df, tree_df, by=c("spcd", "region"))
process_func <- function(df) {
 # Get the equation from eqs
 eq <- df$funcs[[1]]
 # Convert func string into actual function
 eq <- eval(parse(text=eq))
 # Apply the equation to each record in the group
 df$cvts <- mapply(eq, df$dbh, df$ht)
 return(df)
}
df_list <- by(merged_df, list(merged_df$region, merged_df$spcd), FUN=process_func)
finaldf <- do.call(rbind, df_list)
answered Jan 24, 2018 at 21:55
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.