Assign 0 or 1 with different probabilities conditional on another column in R

\$\begingroup\$

I am trying to assign a 0 or 1 with some stochasticity based on another column (outcome) in a data frame. If outcome == 1, the new column exposure should equal 1 about 90% of the time. Conversely, if outcome == 0, it should equal 1 about 20% of the time.

I am currently doing this with a for loop, but I am wondering whether there is a more efficient or elegant way to accomplish this (e.g. via vectorization).

To be clear, although the data frame is named example_data, this is not an example: it's the test data I am generating to test a series of functions related to GEE models.

set.seed(05062020)
# 50 records with two rows each; outcome is 1 with probability 0.2 per record
example_data <- data.frame(
  id = as.factor(rep(sprintf("Record %s", 1:50), each = 2)),
  outcome = as.factor(rep(sample(0:1, 50, prob = c(0.8, 0.2), replace = TRUE), each = 2)))
# Draw exposure row by row: P(exposure = 1) is 0.9 if outcome == 1, else 0.2
for (i in 1:nrow(example_data)) {
  example_data$exposure[i] <- ifelse(example_data$outcome[i] == 1,
    sample(0:1, 1, prob = c(0.1, 0.9)),
    sample(0:1, 1, prob = c(0.8, 0.2)))
}

asked Dec 13, 2021 at 16:26 by jpsmith (edited Dec 13, 2021 at 16:59)

\$\endgroup\$

2 Answers

\$\begingroup\$

example_data$exposure <- ifelse(example_data$outcome == 1,
  sample(0:1, nrow(example_data), prob = c(0.1, 0.9), replace = TRUE),
  sample(0:1, nrow(example_data), prob = c(0.8, 0.2), replace = TRUE))

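ifelse is vectorized, so the loop collapses into a single call. Each sample() call draws a full-length vector (hence replace = TRUE), and ifelse picks element-wise from the first where outcome == 1 and from the second otherwise.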
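A closely related variant, not part of the answer above, is to build the per-row probability with ifelse and make a single Bernoulli draw per row with rbinom, which avoids drawing two full vectors. The sketch below uses toy data (the names n, outcome, p, exposure and rates are assumptions for illustration) and checks the empirical rates per outcome level.

```r
# Sketch only: toy data, not the asker's data frame.
set.seed(1)
n <- 10000
outcome <- factor(sample(0:1, n, replace = TRUE, prob = c(0.8, 0.2)))

# Per-row success probability: 0.9 where outcome == 1, 0.2 otherwise.
p <- ifelse(outcome == 1, 0.9, 0.2)

# One Bernoulli draw per row; rbinom() is vectorized over `prob`.
exposure <- rbinom(n, size = 1, prob = p)

# Empirical share of exposure == 1 within each outcome level.
rates <- tapply(exposure, outcome, mean)
print(round(rates, 2))
```

With a sample this large, the two printed rates should land close to 0.2 and 0.9 respectively.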

answered Dec 13, 2021 at 19:38 by minem

\$\endgroup\$

Ack, I tried this but forgot to add replace = TRUE, so it didn't work. Thank you! – jpsmith, commented Dec 14, 2021 at 1:24

\$\begingroup\$

Sampling coin flips with data-dependent probabilities can often be done elegantly by thresholding a uniform random variable:

example_data$exposure <-
 as.numeric(runif(nrow(example_data)) <= 0.2 + 0.7*(example_data$outcome==1))

The threshold is 0.2 when example_data$outcome == 0 and 0.9 when example_data$outcome == 1. I used 0.7*(example_data$outcome==1) rather than 0.7*example_data$outcome because example_data$outcome is a factor in your data frame; the outer as.numeric converts TRUE/FALSE into 1/0.
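The thresholding works because a uniform draw on [0, 1] falls at or below p with probability p. As a rough sanity check, the sketch below applies the same idea to toy data (the names n, outcome, codes, is_case, threshold, exposure and rates are assumptions for illustration). It also shows why the factor needs the == 1 comparison: as.numeric on a factor returns its internal level codes rather than the labels.

```r
# Sketch only: toy data standing in for example_data$outcome.
set.seed(42)
n <- 20000
outcome <- factor(rep(c(0, 1), each = n / 2))

# A factor stores integer level codes, so as.numeric() gives 1/2 here, not 0/1.
codes <- as.numeric(outcome)
print(range(codes))

# The logical comparison gives the intended TRUE/FALSE indicator.
is_case <- outcome == 1

# Threshold is 0.2 for outcome == 0 and 0.2 + 0.7 = 0.9 for outcome == 1.
threshold <- 0.2 + 0.7 * is_case
exposure <- as.numeric(runif(n) <= threshold)

# Empirical share of exposure == 1 within each outcome level.
rates <- tapply(exposure, outcome, mean)
print(round(rates, 2))
```

The printed rates should again be close to 0.2 and 0.9, up to sampling noise.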

answered Dec 13, 2021 at 20:16 by josliber

\$\endgroup\$
