I'm trying to do something very simple in R that I can do in Stata, but I can't quite get it right.
Here is a sample of my data:
data<-data.frame(
C1=c(rep(2,5), rep(20,5), rep(70,5)),
C2=c(rep(20,5), rep(70,5), rep(80,5)),
year=rep(1990:1994, 3),
VAR1=NA,
VAR2=NA,
VAR3=NA
)
In Stata I can do this:
replace VAR1 = 1 if C1 == 2 & C2 == 20 & year == 1990
replace VAR2 = 60 if C1 == 2 & C2 == 20 & year == 1990
replace VAR3 = 70 if C1 == 2 & C2 == 20 & year == 1990
Annoyingly, Stata syntax does not allow:
replace VAR1 = 1 & VAR2 = 60 & VAR3 = 70 if C1 == 2 & C2 == 20 & year == 1990
Using the first Stata code,
this
data1<-data.frame(C1=c(2),C2=c(20),year=c(1990),VAR1=NA,VAR2=NA,VAR3=NA)
becomes this
data2<-data.frame(C1=c(2),C2=c(20),year=c(1990),VAR1=c(1),VAR2=c(60),VAR3=c(70))
I can't find anything similar to this problem (it's very likely that I'm not searching for the right phrase).
I'd like to do the equivalent of either the first or, preferably, the second Stata command in R.
2 Answers
If the condition is the same for all the columns, you can compute it once as a logical row index and assign all the values in one step.
inds <- with(data, C1 == 2 & C2 == 20 & year == 1990)
data[inds, paste0("VAR", 1:3)] <- as.list(c(1, 60, 70))
data
# C1 C2 year VAR1 VAR2 VAR3
#1 2 20 1990 1 60 70
#2 2 20 1991 NA NA NA
#3 2 20 1992 NA NA NA
#4 2 20 1993 NA NA NA
#5 2 20 1994 NA NA NA
#6 20 70 1990 NA NA NA
#7 20 70 1991 NA NA NA
#8 20 70 1992 NA NA NA
#9 20 70 1993 NA NA NA
#10 20 70 1994 NA NA NA
#11 70 80 1990 NA NA NA
#12 70 80 1991 NA NA NA
#13 70 80 1992 NA NA NA
#14 70 80 1993 NA NA NA
#15 70 80 1994 NA NA NA
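The as.list() call matters once more than one row matches. As a minimal sketch (the two-year condition and the names demo, cols and bad are made up for illustration), each list element is assigned to one column and recycled down the matching rows, whereas a bare vector is laid out column by column:

```r
# Rebuild the sample data so the sketch is self-contained
demo <- data.frame(
  C1 = c(rep(2, 5), rep(20, 5), rep(70, 5)),
  C2 = c(rep(20, 5), rep(70, 5), rep(80, 5)),
  year = rep(1990:1994, 3),
  VAR1 = NA, VAR2 = NA, VAR3 = NA
)
cols <- paste0("VAR", 1:3)

# Hypothetical condition that matches two rows instead of one
inds <- with(demo, C1 == 2 & C2 == 20 & year %in% c(1990, 1991))

# List value: one element per column, recycled over both matching rows
demo[inds, cols] <- as.list(c(1, 60, 70))
demo[inds, ]

# Bare vector: recycled column-wise, so values land in the wrong cells
bad <- demo
bad[inds, cols] <- c(1, 60, 70)
bad[inds, ]
```

With a single matching row the two forms happen to give the same result, which is why the list form is the safer habit.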
If you might have different conditions for different columns, have a look at the dplyr package, which makes such replacements easier using pipes:
library(dplyr)
data %>%
  mutate(VAR1 = replace(VAR1, C1 == 2 & C2 == 20 & year == 1990, 1),
         VAR2 = replace(VAR2, C1 == 2 & C2 == 20 & year == 1990, 60),
         VAR3 = replace(VAR3, C1 == 2 & C2 == 20 & year == 1990, 70))
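Note that the pipeline above returns a modified copy, so the result has to be assigned to keep it. The same replace() calls also work without dplyr through base R's transform(). A minimal sketch with a different condition per column (the second and third conditions, and the names dat and updated, are invented for illustration):

```r
# Rebuild the sample data so the sketch is self-contained
dat <- data.frame(
  C1 = c(rep(2, 5), rep(20, 5), rep(70, 5)),
  C2 = c(rep(20, 5), rep(70, 5), rep(80, 5)),
  year = rep(1990:1994, 3),
  VAR1 = NA, VAR2 = NA, VAR3 = NA
)

# transform() evaluates each expression against the columns of dat;
# it returns a new data frame, so assign the result to keep the changes
updated <- transform(
  dat,
  VAR1 = replace(VAR1, C1 == 2 & C2 == 20 & year == 1990, 1),
  VAR2 = replace(VAR2, C1 == 20 & year >= 1993, 60),  # hypothetical condition
  VAR3 = replace(VAR3, C2 == 80, 70)                  # hypothetical condition
)
updated[updated$year == 1990, ]
```

The original dat is left untouched, which mirrors how the mutate() version behaves when its result is not assigned.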
Here is one option using data.table
library(data.table)
nm1 <- grep("VAR", names(data))
setDT(data)[C1 == 2 & C2 == 20 & year == 1990, (nm1) := .(1, 60, 70)]
data
# C1 C2 year VAR1 VAR2 VAR3
# 1: 2 20 1990 1 60 70
# 2: 2 20 1991 NA NA NA
# 3: 2 20 1992 NA NA NA
# 4: 2 20 1993 NA NA NA
# 5: 2 20 1994 NA NA NA
# 6: 20 70 1990 NA NA NA
# 7: 20 70 1991 NA NA NA
# 8: 20 70 1992 NA NA NA
# 9: 20 70 1993 NA NA NA
#10: 20 70 1994 NA NA NA
#11: 70 80 1990 NA NA NA
#12: 70 80 1991 NA NA NA
#13: 70 80 1992 NA NA NA
#14: 70 80 1993 NA NA NA
#15: 70 80 1994 NA NA NA
Or another option is to set the key while converting to a data.table and then pass the key values in i:
setDT(data, key = c("C1", "C2", "year"))
data[.(2, 20, 1990), (nm1) := .(1, 60, 70)]
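If re-sorting the table by a key is not wanted, data.table also accepts the join columns ad hoc through the on argument. A minimal sketch, assuming the data.table package is installed; the table name dt, the name cols, and the choice of NA_real_ columns are made up for this example:

```r
library(data.table)

# Sample data as a data.table; the VAR columns start as numeric NA
# so that numbers can be stored in them by reference
dt <- data.table(
  C1 = c(rep(2, 5), rep(20, 5), rep(70, 5)),
  C2 = c(rep(20, 5), rep(70, 5), rep(80, 5)),
  year = rep(1990:1994, 3),
  VAR1 = NA_real_, VAR2 = NA_real_, VAR3 = NA_real_
)
cols <- paste0("VAR", 1:3)

# Join on the three columns without setting a key; := updates dt in place
dt[.(2, 20, 1990L), on = c("C1", "C2", "year"), (cols) := .(1, 60, 70)]
dt[1:2]
```

Because := works by reference, no assignment back to dt is needed.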
Or using tidyverse
library(tidyverse)
i1 <- with(data, C1 == 2 & C2 == 20 & year == 1990)
data %>%
  select(starts_with("VAR")) %>%
  map2_df(., c(1, 60, 70), ~ replace(.x, i1, .y)) %>%
  bind_cols(data %>%
              select(1:3), .)
Data (note that the VAR columns here are NA_integer_ rather than logical NA):
data <- structure(list(C1 = c(2, 2, 2, 2, 2, 20, 20, 20, 20, 20, 70,
70, 70, 70, 70), C2 = c(20, 20, 20, 20, 20, 70, 70, 70, 70, 70,
80, 80, 80, 80, 80), year = c(1990L, 1991L, 1992L, 1993L, 1994L,
1990L, 1991L, 1992L, 1993L, 1994L, 1990L, 1991L, 1992L, 1993L,
1994L), VAR1 = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_), VAR2 = c(NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_), VAR3 = c(NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_)),
class = "data.frame", row.names = c(NA,
-15L))
The question uses & in two quite different senses: & is a logical operator, not punctuation in a list of things to be done.