How to rewrite this Stata code in R?

Question 1

One of the things Stata does well is the way it constructs new variables (see the example below). How can I do this in R?

foreach i in A B C D {
    forval n = 1990/2000 {
        local m = `n' - 1
        // create new columns from existing ones on the fly
        generate pop`i'`n' = pop`i'`m' * (1 + trend`n')
    }
}

Question 2

For those who don't speak Stata, maybe add what the final output should look like? And the input data, for that matter...

Question 3

I'm wondering what idiot designer of a statistical package decided that 1990/2000 was a range rather than a division facepalm

Question 4

@Spacedman: You don't know the half of it. I used Stata for 3 years. Worst. Programming. Language. Ever.

Question 5

@Joshua : May I kindly agree :-) But it has to be said, it is quite a powerful statistical package. You just shouldn't be dreaming about anything else but scripting your analysis.

Question 6

@Joris: Though I didn't explicitly say so, I agree that Stata has a lot of statistical capability. That's why I was careful to specifically say programming in Stata is terrible. ;-)

Question 7

DON'T do it in R. The reason it's messy is because it's UGLY code. Constructing lots of variables with programmatic names is a BAD THING. Names are names. They have no structure, so do not try to impose one on them. Decent programming languages have structures for this - rubbishy programming languages have tacked-on 'Macro' features and end up with this awful pattern of constructing variable names by pasting strings together. This is a practice from the 1970s that should have died out by now. Don't be a programming dinosaur.

For example, how do you know how many popXXXX variables you have? How do you know if you have a complete sequence of pop1990 to pop2000? What if you want to save the variables to a file to give to someone? Yuck, yuck, yuck.

Use a data structure that the language gives you. In this case probably a list.
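A minimal sketch of what that could look like, assuming one 1989 starting value per population and one growth rate per year. The names `start`, `trend`, and `pop` and all the numbers are made up for illustration; they are not from the question.

```r
# Hypothetical inputs: one 1989 value per population, one growth rate per year
start <- c(A = 100, B = 200, C = 300, D = 400)
years <- 1990:2000
trend <- setNames(rep(0.1, length(years)), years)

# One list element per population, each a numeric vector named by year
pop <- lapply(start, function(p0) {
  setNames(p0 * cumprod(1 + trend), years)
})

length(pop)            # how many populations there are
names(pop$A)           # which years are present for population A
pop[["B"]][["1995"]]   # value for population B in 1995
```

With the data in one object, the questions above (how many series, which years are present, how to save everything) each become a single call on `pop`, for example `saveRDS(pop, "pop.rds")`.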

Question 8

Both Spacedman and Joshua have very valid points. As Stata has only one dataset in memory at any given time, I'd suggest adding the variables to a data frame (which is also a kind of list) instead of to the global environment (see below).

But honestly, the more R-ish way is to keep your factors as factors instead of encoding them in variable names.

First, I create some data as I believe it looks in your R session now (at least, I hope so...):

Data <- data.frame(
 popA1989 = 1:10,
 popB1989 = 10:1,
 popC1989 = 11:20,
 popD1989 = 20:11
)
Trend <- replicate(11,runif(10,-0.1,0.1))

You can then use the stack() function to obtain a data frame with a pop column and a numeric year column:

newData <- stack(Data)
newData$pop <- substr(newData$ind,4,4)
newData$year <- as.numeric(substr(newData$ind,5,8))
newData$ind <- NULL

Filling up the dataframe is then quite easy :

for (i in 1:11) {
  tmp <- newData[newData$year == (1988 + i), ]
  # grow last year's values by (1 + trend), matching the Stata code
  newData <- rbind(newData,
                   data.frame(values = tmp$values * (1 + Trend[, i]),
                              pop    = tmp$pop,
                              year   = tmp$year + 1))
}

In this format, you'll find most R commands (selections of some years, of a single population, modelling effects of either or both, ...) a whole lot easier to perform later on.
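To illustrate, here is a rough sketch of a few such operations on a small long-format data frame with the same columns as newData above. The data frame `long` and its values are invented for the example.

```r
# Small long-format data frame with the same columns as newData
# (values are made up for illustration)
long <- data.frame(
  values = c(10, 20, 11, 22, 12, 24),
  pop    = c("A", "B", "A", "B", "A", "B"),
  year   = c(1989, 1989, 1990, 1990, 1991, 1991)
)

# Select a range of years
recent <- long[long$year >= 1990, ]

# Select a single population
onlyA <- subset(long, pop == "A")

# Summarise: mean value per population
means <- aggregate(values ~ pop, data = long, FUN = mean)
```

None of these steps needs to build or parse a column name; population and year are ordinary columns that can be filtered, grouped, or used as model terms.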

And if you insist, you can still create a wide format with unstack()

unstack(newData,values~paste("pop",pop,year,sep=""))

Adaptation of Joshua's answer to add the columns to the dataframe :

for (L in LETTERS[1:4]) {
  for (i in 1990:2000) {
    new <- paste("pop", L, i, sep = "")        # name of the new column
    old <- paste("pop", L, i - 1, sep = "")    # name of the previous year's column
    trend <- Trend[, i - 1989]                 # trend for year i
    Data[[new]] <- Data[[old]] * (1 + trend)   # add the new column to Data
  }
}

Question 9

Can you explain what you mean by "keep your factors factors instead of variable names"?

Question 10

@KevinM That's the difference between "long format" and "wide format". You put all data in a single column, and use a factor or categorical variable to describe which data is from which population and year. If you use your variable names to indicate which year and population we're talking about, you'll have more difficulty using that information. Both population and year are categorical variables in terms of statistical analysis. So I keep them as a categorical variable (factor) instead of combining them to construct variable names.
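A small sketch of the contrast, using an invented two-population, two-year data set (the object names `wide`, `long`, and the totals are assumptions for illustration only):

```r
# Wide format: population and year are encoded in the column names
wide <- data.frame(popA1989 = c(1, 2), popB1989 = c(3, 4),
                   popA1990 = c(5, 6), popB1990 = c(7, 8))

# To total one year you first have to parse the names
cols1990 <- grep("1990$", names(wide), value = TRUE)
total1990_wide <- sum(wide[cols1990])

# Long format: population and year are ordinary columns
long <- stack(wide)
long$pop  <- factor(substr(long$ind, 4, 4))
long$year <- as.numeric(substr(long$ind, 5, 8))

# The same total, plus every other year, in one call
totals_long <- tapply(long$values, long$year, sum)
```

In the wide version the year only exists as part of a string; in the long version it is data, so grouping by it needs no string handling.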

Question 11

Assuming popA1989, popB1989, popC1989, popD1989 and the trend variables trend1990 through trend2000 already exist in your global environment, the code below should work. There are certainly more "R-like" ways to do this, but I wanted to give you something similar to your Stata code.

for (L in LETTERS[1:4]) {
  for (i in 1990:2000) {
    new <- paste("pop", L, i, sep = "")           # create name for new variable
    old <- get(paste("pop", L, i - 1, sep = ""))  # get old variable
    trend <- get(paste("trend", i, sep = ""))     # get trend variable
    assign(new, old * (1 + trend))
  }
}

Question 12

Assuming you have the population data in a vector pop1989 and the trend data in a vector trend:

require(stringr) # because str_c has a better default for the sep parameter
dta <- kronecker(pop1989,cumprod(1+trend))
names(dta) <- kronecker(str_c("pop",LETTERS[1:4]),1990:2000,str_c)
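The one-liner works because applying (1 + trend) year after year is the same as multiplying the 1989 value by the cumulative product of the growth factors. A rough sketch of that equivalence, assuming one scalar starting value per population and one scalar trend per year; it uses outer() rather than kronecker() so the result stays a matrix, and all values are made up:

```r
pop1989 <- c(A = 100, B = 200, C = 300, D = 400)  # assumed starting values
years   <- 1990:2000
trend   <- seq_along(years) / 100                 # assumed yearly growth rates

# Cumulative growth factor for each year relative to 1989
growth <- cumprod(1 + trend)

# Rows are populations, columns are years
pop <- outer(pop1989, growth)
dimnames(pop) <- list(names(pop1989), years)

# Check against the explicit year-by-year recursion for population A
check <- numeric(length(years))
prev <- pop1989[["A"]]
for (k in seq_along(years)) {
  prev <- prev * (1 + trend[k])
  check[k] <- prev
}
all.equal(unname(pop["A", ]), check)
```

A value is then looked up by population and year, for example `pop["B", "1995"]`, instead of by a pasted-together name.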

Spacedman · Accepted Answer · 2011-02-17 08:05:48Z
