Value matching in R

\$\begingroup\$

I have three data frames A, B and C:

set.seed(0)
N <- 5
A<-data.frame(cbind(date=c(2,3,5,1), x=NA, id=sample(letters[1:2], 4, replace=T)), stringsAsFactors = F)
B<-data.frame(cbind(date=1:N, y=runif(N)), stringsAsFactors = F)
C<-data.frame(cbind(date=1:N, z=100+sample(N), id=rep(letters[1:2], N, replace=T)), stringsAsFactors = F)
C$z<-as.numeric(C$z)

and they look like this:

> A
  date    x id
1    2 <NA>  b
2    3 <NA>  a
3    5 <NA>  a
4    1 <NA>  b
> B
  date      y
1    1 0.9082
2    2 0.2017
3    3 0.8984
4    4 0.9447
5    5 0.6608
> C
   date   z id
1     1 104  a
2     2 101  b
3     3 105  a
4     4 103  b
5     5 102  a
6     1 104  b
7     2 101  a
8     3 105  b
9     4 103  a
10    5 102  b

I would like to fill in A$x with a function of y and z, for instance the product B$y * C$z for the matching date and id, like this:

for (i in 1:length(A$x)){
 A$x[i] <- B$y[A$date[i] == B$date] * C$z[A$date[i] == C$date & A$id[i] == C$id]
}
> A
  date                x id
1    2  20.369875034783  b
2    3 94.3309169216082  a
3    5 67.4013748336583  a
4    1 94.4536101594567  b

This is obviously a bad idea for a data set with many elements, as it is slow. I also tried match() and which(), but I don't believe they give any significant speedup. Maybe I could use dcast() after merging everything into one data frame, but I would prefer not to merge the data frames at all, if that can be avoided.

Is it possible to do it more efficiently?

edited Jun 15, 2015 at 20:53 by Jamal

asked Jun 15, 2015 at 19:56 by Per

\$\endgroup\$


1 Answer

\$\begingroup\$

Although you mention that you don't want to merge the data frames, your for loop could be replaced with this single line using merge:

with(merge(A, merge(B, C)), data.frame(date, x=y * z, id))

Given your example of A, B and C, this returns a data frame:

  date        x id
1    1 94.45361  b
2    2 20.36988  b
3    3 94.33092  a
4    5 67.40137  a
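
To make the one-liner easier to follow, here is a sketch that unpacks it into explicit steps. It is an illustration under assumptions, not the original code: the data frames are re-typed from the printed output in the question (so y is rounded to four decimals), they are built with data.frame() directly so the columns keep their numeric types, A omits the placeholder x column, and the intermediate names BC, ABC and result are made up for this example.

```r
# Data re-typed from the printed output in the question (rounded values).
A <- data.frame(date = c(2, 3, 5, 1), id = c("b", "a", "a", "b"),
                stringsAsFactors = FALSE)
B <- data.frame(date = 1:5, y = c(0.9082, 0.2017, 0.8984, 0.9447, 0.6608))
C <- data.frame(date = rep(1:5, 2),
                z = rep(c(104, 101, 105, 103, 102), 2),
                id = rep(c("a", "b"), 5),
                stringsAsFactors = FALSE)

# Step 1: attach y to every row of C by joining on date.
BC <- merge(B, C, by = "date")

# Step 2: keep only the (date, id) pairs that occur in A.
ABC <- merge(A, BC, by = c("date", "id"))

# Step 3: compute the product column.
result <- data.frame(date = ABC$date, x = ABC$y * ABC$z, id = ABC$id,
                     stringsAsFactors = FALSE)
print(result)
```

Note that merge sorts its result by the key columns by default, which is why the rows come out ordered by date rather than in the original row order of A.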

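The problem with the for loop is that it rescans B and C in full for every row of A, so the work grows with the number of rows in A times the number of rows in B and C. That is why row-by-row loops like this are discouraged in R. merge matches all rows in one call and should be fast. I don't think you can get around merging, as the logic you describe is in fact a merge.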
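
If the row order of A has to be preserved, the same join can also be written as a keyed lookup with match(). This is still a merge in spirit, only expressed as index lookups. The sketch below is an assumption-laden illustration rather than a drop-in replacement: the data is re-typed from the printed output in the question (rounded values), the helper names key_A and key_C are invented here, and it assumes every date appears once in B and every (date, id) pair appears once in C. Rows of A without a match would get NA.

```r
# Data re-typed from the printed output in the question (rounded values).
A <- data.frame(date = c(2, 3, 5, 1), id = c("b", "a", "a", "b"),
                stringsAsFactors = FALSE)
B <- data.frame(date = 1:5, y = c(0.9082, 0.2017, 0.8984, 0.9447, 0.6608))
C <- data.frame(date = rep(1:5, 2),
                z = rep(c(104, 101, 105, 103, 102), 2),
                id = rep(c("a", "b"), 5),
                stringsAsFactors = FALSE)

# For each row of A, find the row of B with the same date and take its y.
y <- B$y[match(A$date, B$date)]

# Build a composite key from date and id, then look up z in C the same way.
key_A <- paste(A$date, A$id, sep = "_")
key_C <- paste(C$date, C$id, sep = "_")
z <- C$z[match(key_A, key_C)]

# Fill in x for all rows at once, in the original row order of A.
A$x <- y * z
print(A)
```

Each match() call handles all rows of A at once, so there is no explicit loop, and A keeps its original order (dates 2, 3, 5, 1).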

answered Jun 15, 2015 at 22:54 by janos

\$\endgroup\$
