Is there a one-liner that can accomplish this same thing? Essentially I have two dataframes: one that has my raw data (data), and one that has values that I want to look up (lookup). I want to find the percent of values in data (by 2 factors, Site and variable) that are less than the lookup value for those same 2 factors.
I'd love to just be able to add the result to the lookup dataframe as a column, instead of creating two new objects.
combine <- merge(data, lookup, by = c("Site", "variable"), allow.cartesian = TRUE)
test <- ddply(combine, .(Site, variable), summarize, percent = sum(data.values < lookup.value) / length(data.values))
Any advice is appreciated!
1 Answer
If you want to encapsulate the ddply logic, you can use the lapply(split(data, data[ , c("Site", "variable")]), function) approach. I'm assuming that lookup has only one "value" for each combination of "Site" and "variable". Something like:
lapply(split(data, data[ , c("Site", "variable")], drop = TRUE),
       function(d) {
         # lookup value for this group's Site/variable combination
         ref <- lookup[lookup$Site == d$Site[1] & lookup$variable == d$variable[1], "value"]
         100 * sum(d$value) / ref
       })
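Note that the snippet above divides each group's sum by the lookup value. If the goal is the share of data values that fall below the lookup value, as the question describes, a minimal base R sketch might look like the following. The toy data frames are invented for illustration, and the column name value follows the assumption made in this answer; adjust it to your real column names.

```r
# Toy data, invented purely for illustration
data <- data.frame(
  Site     = c("A", "A", "A", "A", "B", "B", "B", "B"),
  variable = c("x", "x", "y", "y", "x", "x", "y", "y"),
  value    = c(1, 5, 2, 8, 3, 4, 6, 7),
  stringsAsFactors = FALSE
)
lookup <- data.frame(
  Site     = c("A", "A", "B", "B"),
  variable = c("x", "y", "x", "y"),
  value    = c(3, 10, 3, 6.5),
  stringsAsFactors = FALSE
)

# For each lookup row, compute the percent of matching data values that are
# strictly below that row's lookup value. A combination with no matching
# data rows yields NaN.
lookup$percent <- mapply(
  function(site, var, ref) {
    vals <- data$value[data$Site == site & data$variable == var]
    100 * mean(vals < ref)
  },
  lookup$Site, lookup$variable, lookup$value,
  USE.NAMES = FALSE
)

lookup$percent
# 50 100 0 50
```

This writes the result straight into lookup as a new column, so no intermediate merged object is kept around.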
Another approach (only successful if the factor levels in both dataframes match) would be to take the quotient of the two tapply (matrix) results, one from each dataframe:
percent <- 100 * tapply(data$value, data[ , c("Site", "variable")], sum, na.rm = TRUE) /
  tapply(lookup$value, lookup[ , c("Site", "variable")], sum, na.rm = TRUE)
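The quotient above is a ratio of sums. For the share of values below the lookup value in the same Site-by-variable matrix layout, one possible sketch is to merge first and then let tapply average the logical comparison. The toy data is invented for illustration, and the suffixes are my own choice to keep the two value columns apart.

```r
# Toy data, invented purely for illustration
data <- data.frame(
  Site     = c("A", "A", "A", "A", "B", "B", "B", "B"),
  variable = c("x", "x", "y", "y", "x", "x", "y", "y"),
  value    = c(1, 5, 2, 8, 3, 4, 6, 7),
  stringsAsFactors = FALSE
)
lookup <- data.frame(
  Site     = c("A", "A", "B", "B"),
  variable = c("x", "y", "x", "y"),
  value    = c(3, 10, 3, 6.5),
  stringsAsFactors = FALSE
)

# Attach the lookup value to every data row; both frames have a "value"
# column, so the suffixes produce value.data and value.lookup
combined <- merge(data, lookup, by = c("Site", "variable"),
                  suffixes = c(".data", ".lookup"))

# The mean of a logical vector is the proportion of TRUEs, so this gives
# the percent of data values below the lookup value per Site x variable cell
percent <- 100 * tapply(combined$value.data < combined$value.lookup,
                        combined[ , c("Site", "variable")], mean)

percent["A", "x"]  # 50
percent["B", "y"]  # 50
```

The result is a matrix with Site as rows and variable as columns, which is convenient for inspection but would need reshaping before being added to lookup as a column.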