
I have a dataset with over 170 variables that looks as follows:

df <- data.frame(var1 = 1:3, var2 = 2:4, var3 = 2:4, var4 = 2:4, var5 = 2:4)

I have manually added variable labels using Hmisc:

library(Hmisc)
# Labels keyed by variable name; columns not listed end up with an NA label
var.labels <- c(var1 = "label 1",
                var3 = "label 2",
                var4 = "label 4")
label(df) <- as.list(var.labels[match(names(df), names(var.labels))])

Note that not all variables have labels, and with this many columns it is much easier for me to specify labels by variable name (var*) than by position.

The problem is that when I save my dataset with write.csv(df, "df.csv") or with write.dta(df, "df.dta"), my variable labels are lost.

How can I save the data in a way that can be re-imported into R and Stata with the labels restored?

asked Oct 8, 2019 at 14:34
    Standard CSV files don't have any way to save labels. You can save an object in R in a binary format with saveRDS which would preserve labels but you can't read that into Stata. Perhaps you can just save the labels in a separate file and merge within each program. Commented Oct 8, 2019 at 14:48
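A minimal base-R sketch of the two workarounds from the comment above. It assumes the labels live where Hmisc's label() puts them, in a per-column "label" attribute (set here directly with attr() so Hmisc is not needed). The labels-file layout (columns name and label) and the temporary file names are illustrative choices, not a standard format.

```r
# Sketch only: labels stored as per-column "label" attributes, as Hmisc does
df <- data.frame(var1 = 1:3, var2 = 2:4, var3 = 2:4, var4 = 2:4, var5 = 2:4)
var.labels <- c(var1 = "label 1", var3 = "label 2", var4 = "label 4")
for (v in names(var.labels)) attr(df[[v]], "label") <- var.labels[[v]]

# Option 1: saveRDS() keeps every attribute, but only R can read the file
rds_file <- tempfile(fileext = ".rds")
saveRDS(df, rds_file)
df_rds <- readRDS(rds_file)

# Option 2: plain CSV for the data plus a separate (hypothetical) labels file
# with one row per labelled variable: name, label
data_file  <- tempfile(fileext = ".csv")
label_file <- tempfile(fileext = ".csv")
write.csv(df, data_file, row.names = FALSE)
write.csv(data.frame(name = names(var.labels), label = unname(var.labels)),
          label_file, row.names = FALSE)

# Re-import in R: the CSV data comes back without labels,
# so re-attach them by variable name from the labels file
df_csv <- read.csv(data_file)
labs <- read.csv(label_file, stringsAsFactors = FALSE)
for (i in seq_len(nrow(labs))) {
  attr(df_csv[[labs$name[i]]], "label") <- labs$label[i]
}
```

Because the labels file is keyed by name, unlabelled variables can simply be left out of it. On the Stata side the same file would still need to be applied, for example with a loop over `label variable`; that step is not shown here.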

2 Answers


It is a little tricky: you need to supply a label for every variable. If you do not want to label a variable, you may still need to give it an empty label (variable = ""). Otherwise write.dta will ignore all variable labels.

In R:

df <- data.frame(var1 = 1:3, var2 = 2:4, var3 = 2:4, var4 = 2:4, var5 = 2:4)
attr(df, "var.labels") <- c(var1 = "label 1", var2 = "label 2", 
 var3 = "", var4="label 4", var5 = "")
foreign::write.dta(df, "dat_stata.dta")
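The question also asks about reading the file back into R. A minimal sketch, assuming the same labels as above and writing to a temporary file instead of "dat_stata.dta": foreign::read.dta() returns the labels in the same data-frame-level "var.labels" attribute, one entry per column.

```r
library(foreign)

df <- data.frame(var1 = 1:3, var2 = 2:4, var3 = 2:4, var4 = 2:4, var5 = 2:4)
# One label per column, in column order; "" marks an unlabelled variable
attr(df, "var.labels") <- c("label 1", "label 2", "", "label 4", "")

dta_file <- tempfile(fileext = ".dta")
write.dta(df, dta_file)

# Read it back: the labels come back as the "var.labels" attribute
back <- read.dta(dta_file)
attr(back, "var.labels")
```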

In Stata, you get:

. des

Contains data from C:...dat_stata.dta
  obs:             3                          Written by R.
 vars:             5
 size:            60
-------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
var1            long    %9.0g                 label 1
var2            long    %9.0g                 label 2
var3            long    %9.0g
var4            long    %9.0g                 label 4
var5            long    %9.0g
-------------------------------------------------------------------------------
Sorted by:

Note: I used Stata 14 and the R package foreign.
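In the question the labels were set with Hmisc, which stores them per column, while the code in this answer gives write.dta a single "var.labels" attribute on the data frame. A sketch of copying one into the other, assuming the question's setup where unlabelled columns carry an NA label (simulated here with attr() so Hmisc is not required; real Hmisc columns also gain a "labelled" class, which this sketch does not reproduce):

```r
library(foreign)

df <- data.frame(var1 = 1:3, var2 = 2:4, var3 = 2:4, var4 = 2:4, var5 = 2:4)
var.labels <- c(var1 = "label 1", var3 = "label 2", var4 = "label 4")
# Stand-in for the question's Hmisc call: per-column "label" attributes,
# NA for columns that were not given a label
for (v in names(df)) attr(df[[v]], "label") <- unname(var.labels[v])

# Collect one label per column, turning missing (NULL or NA) labels into ""
dta_labels <- vapply(df, function(col) {
  lab <- attr(col, "label", exact = TRUE)
  if (is.null(lab) || is.na(lab)) "" else as.character(lab)
}, character(1), USE.NAMES = FALSE)

attr(df, "var.labels") <- dta_labels
dta_file <- tempfile(fileext = ".dta")
write.dta(df, dta_file)
```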

answered Oct 10, 2019 at 11:40

2 Comments

I was wondering: is there a way to automatically assign empty labels to the variables I do not label explicitly? My dataset has over 700 variables and I only need to label the most relevant ones. Something like attr(df, "var.labels") <- c(var1 = "label 1", var2 = "label 2", var4 = "label 4"), and then tell R to automatically assign "" to the rest of the variables.
It works, but you have to make sure to ADD A LABEL TO EACH VARIABLE, otherwise, it will not work!
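One way to do what the first comment asks, sketched under the assumption that the labels are given as a named character vector (my.labels is a hypothetical name): look up every column name in that vector, then replace the NA entries for unlisted columns with "" so write.dta receives exactly one label per variable, in column order.

```r
library(foreign)

df <- data.frame(var1 = 1:3, var2 = 2:4, var3 = 2:4, var4 = 2:4, var5 = 2:4)
# Only the variables you care about, keyed by name; order does not matter
my.labels <- c(var4 = "label 4", var1 = "label 1", var2 = "label 2")

# Misspelled names would otherwise be dropped silently, so flag them
unknown <- setdiff(names(my.labels), names(df))
if (length(unknown) > 0) {
  warning("Labels given for unknown columns: ", paste(unknown, collapse = ", "))
}

# Look up a label for every column by name; unlisted columns get NA ...
all.labels <- unname(my.labels[names(df)])
# ... which is replaced by "" so every variable has a label
all.labels[is.na(all.labels)] <- ""

attr(df, "var.labels") <- all.labels
dta_file <- tempfile(fileext = ".dta")
write.dta(df, dta_file)
```

Indexing by names(df) also means the labels end up in column order even if they were written in a different order.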

You can set the column names of the data frame to the labels that are available:

labs <- label(df)  # Hmisc labels; NA for the columns left unlabelled in the question
colnames(df)[!is.na(labs)] <- as.character(labs[!is.na(labs)])
answered Oct 8, 2019 at 14:47

1 Comment

Thanks for this, but it is not really what I want. I would like to keep my variable names, which are easy to work with, and have the labels saved so that when I have doubts about the data I know what each variable is about.
