Stata to R Replace Values based on Condition

Question 1

I'm trying to do something very simple in R that I can do in Stata but I can't quite get it right.

Here is a sample of my data:

data<-data.frame(
 C1=c(rep(2,5), rep(20,5), rep(70,5)),
 C2=c(rep(20,5), rep(70,5), rep(80,5)),
 year=rep(1990:1994, 3), 
 VAR1=NA,
 VAR2=NA,
 VAR3=NA
)

In Stata I can do this:

replace VAR1 = 1 if C1 == 2 & C2 == 20 & year == 1990
replace VAR2 = 60 if C1 == 2 & C2 == 20 & year == 1990
replace VAR3 = 70 if C1 == 2 & C2 == 20 & year == 1990

Annoyingly, Stata syntax does not allow combining these into one command:

replace VAR1=1 & VAR2=60 & VAR3=70 if C1 == 2 & C2 == 20 & year == 1990

Using the first Stata code, this

data1<-data.frame(C1=c(2),C2=c(20),year=c(1990),VAR1=NA,VAR2=NA,VAR3=NA)

becomes this

data2<-data.frame(C1=c(2),C2=c(20),year=c(1990),VAR1=c(1),VAR2=c(60),VAR3=c(70))

I can't find anything similar to this problem (very likely because I'm not searching for the right phrase).

I'd like to do the equivalent of the first, or preferably the second, Stata command in R.

Question 2

The Stata syntax you want uses & in two quite different senses. In Stata, & is a logical operator, not punctuation in a list of things to be done.
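The same holds on the R side, which a small sketch may make concrete: & combines conditions element by element into a single TRUE/FALSE vector, and it cannot chain assignments. The vectors below are made-up illustration values, not the asker's data.

```r
# Hypothetical toy vectors, for illustration only
C1   <- c(2, 2, 20)
year <- c(1990, 1991, 1990)

# `&` combines two logical vectors element by element
cond <- C1 == 2 & year == 1990
cond
# [1]  TRUE FALSE FALSE

# The result is an ordinary logical vector that can be used to pick rows
which(cond)
# [1] 1
```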

Question 3

If the condition is the same for all the columns, you can compute the row index once and assign all the values together.

inds <- with(data, C1 == 2 & C2 == 20 & year == 1990)
data[inds, paste0("VAR", 1:3)] <- as.list(c(1, 60, 70))
data
#   C1 C2 year VAR1 VAR2 VAR3
#1   2 20 1990    1   60   70
#2   2 20 1991   NA   NA   NA
#3   2 20 1992   NA   NA   NA
#4   2 20 1993   NA   NA   NA
#5   2 20 1994   NA   NA   NA
#6  20 70 1990   NA   NA   NA
#7  20 70 1991   NA   NA   NA
#8  20 70 1992   NA   NA   NA
#9  20 70 1993   NA   NA   NA
#10 20 70 1994   NA   NA   NA
#11 70 80 1990   NA   NA   NA
#12 70 80 1991   NA   NA   NA
#13 70 80 1992   NA   NA   NA
#14 70 80 1993   NA   NA   NA
#15 70 80 1994   NA   NA   NA
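For comparison, the first Stata form (one replace per variable) can also be written almost line for line in base R. This is a minimal sketch that rebuilds the sample data from the question. Wrapping the condition in which() is an added precaution, not part of the answer above, so that rows where the condition evaluates to NA are explicitly left out.

```r
# Sample data as defined in the question
data <- data.frame(
  C1 = c(rep(2, 5), rep(20, 5), rep(70, 5)),
  C2 = c(rep(20, 5), rep(70, 5), rep(80, 5)),
  year = rep(1990:1994, 3),
  VAR1 = NA, VAR2 = NA, VAR3 = NA
)

# Row positions where the condition holds; which() drops any NA results
rows <- which(data$C1 == 2 & data$C2 == 20 & data$year == 1990)

# One assignment per Stata `replace` line
data$VAR1[rows] <- 1
data$VAR2[rows] <- 60
data$VAR3[rows] <- 70

# Show only the rows that were changed
data[rows, ]
```

Computing rows once and reusing it plays the role of repeating the same if qualifier on every Stata line.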

If you might have different conditions for different columns, have a look at the dplyr package, which makes such replacements easier using pipes.

library(dplyr)
data %>%
  mutate(VAR1 = replace(VAR1, C1 == 2 & C2 == 20 & year == 1990, 1),
         VAR2 = replace(VAR2, C1 == 2 & C2 == 20 & year == 1990, 60),
         VAR3 = replace(VAR3, C1 == 2 & C2 == 20 & year == 1990, 70))
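Note that the pipeline above returns a new data frame, so the result has to be assigned back to keep it. As a rough sketch of the "different conditions for different columns" case without extra packages, base R's transform() can be combined with replace() in the same way. The two conditions below are hypothetical and chosen only to show that each column can have its own.

```r
# Sample data as defined in the question
data <- data.frame(
  C1 = c(rep(2, 5), rep(20, 5), rep(70, 5)),
  C2 = c(rep(20, 5), rep(70, 5), rep(80, 5)),
  year = rep(1990:1994, 3),
  VAR1 = NA, VAR2 = NA, VAR3 = NA
)

# Each column gets its own (hypothetical) condition; the result is assigned back
data <- transform(
  data,
  VAR1 = replace(VAR1, C1 == 2 & year == 1990, 1),
  VAR2 = replace(VAR2, C2 == 70 & year >= 1993, 60)
)

# Rows touched by either replacement
data[!is.na(data$VAR1) | !is.na(data$VAR2), ]
```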

Question 4

Here is one option using data.table

library(data.table)
nm1 <- grep("VAR", names(data))
setDT(data)[C1 == 2 & C2 == 20 & year == 1990, (nm1) := .(1, 60, 70)]
data
#    C1 C2 year VAR1 VAR2 VAR3
# 1:  2 20 1990    1   60   70
# 2:  2 20 1991   NA   NA   NA
# 3:  2 20 1992   NA   NA   NA
# 4:  2 20 1993   NA   NA   NA
# 5:  2 20 1994   NA   NA   NA
# 6: 20 70 1990   NA   NA   NA
# 7: 20 70 1991   NA   NA   NA
# 8: 20 70 1992   NA   NA   NA
# 9: 20 70 1993   NA   NA   NA
#10: 20 70 1994   NA   NA   NA
#11: 70 80 1990   NA   NA   NA
#12: 70 80 1991   NA   NA   NA
#13: 70 80 1992   NA   NA   NA
#14: 70 80 1993   NA   NA   NA
#15: 70 80 1994   NA   NA   NA

Another option is to set the key when converting to a data.table and then pass the key values in i:

setDT(data, key = c("C1", "C2", "year"))
data[.(2, 20, 1990), (nm1) := .(1, 60, 70)]
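The keyed lookup idea extends naturally to several (C1, C2, year) combinations at once. The following is only a sketch in base R rather than data.table, under the assumption that the replacement values live in a small lookup table. The updates table, its second row, and the helper names key_data, key_updates and hit are all hypothetical.

```r
# Sample data as defined in the question
data <- data.frame(
  C1 = c(rep(2, 5), rep(20, 5), rep(70, 5)),
  C2 = c(rep(20, 5), rep(70, 5), rep(80, 5)),
  year = rep(1990:1994, 3),
  VAR1 = NA, VAR2 = NA, VAR3 = NA
)

# Hypothetical lookup table: one row per key combination to update
updates <- data.frame(
  C1 = c(2, 20), C2 = c(20, 70), year = c(1990, 1992),
  VAR1 = c(1, 5), VAR2 = c(60, 6), VAR3 = c(70, 7)
)

# Build a combined key per row and look each data row up in the table
key_data    <- paste(data$C1, data$C2, data$year)
key_updates <- paste(updates$C1, updates$C2, updates$year)
hit <- match(key_data, key_updates)   # NA where no update applies

# Copy the values over for the rows that found a match
vars <- c("VAR1", "VAR2", "VAR3")
rows <- which(!is.na(hit))
data[rows, vars] <- updates[hit[rows], vars]

data[rows, ]
```

Pasting the key columns together is a simple way to match on several columns; it assumes the key values print identically in both tables.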

Or using tidyverse

library(tidyverse)
i1 <- with(data, C1 == 2 & C2 == 20 & year == 1990)
data %>%
  select(starts_with("VAR")) %>%
  map2_df(., c(1, 60, 70), ~ replace(.x, i1, .y)) %>%
  bind_cols(data %>%
              select(1:3), .)

Data used in this answer (the VAR columns are stored as integer NA):

data <- structure(list(C1 = c(2, 2, 2, 2, 2, 20, 20, 20, 20, 20, 70, 
70, 70, 70, 70), C2 = c(20, 20, 20, 20, 20, 70, 70, 70, 70, 70, 
80, 80, 80, 80, 80), year = c(1990L, 1991L, 1992L, 1993L, 1994L, 
1990L, 1991L, 1992L, 1993L, 1994L, 1990L, 1991L, 1992L, 1993L, 
1994L), VAR1 = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_), VAR2 = c(NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_), VAR3 = c(NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_)), 
class = "data.frame", row.names = c(NA, 
-15L))

Accepted answer by Ronak Shah (the base R / dplyr answer), 2019-07-21 02:26:47Z

