How to load .dta files (preserving labels) most comfortably in R?

Question 1

I work with .dta files and try to make loading data as comfortable as possible. In my view, I need a combination of haven and readstata13.

haven looks perfect. It provides the best "sub-labels". But it does not provide a column-selector function, so I cannot use read_dta for large files (~1 GB, on a machine with 64 GB RAM and an Intel Xeon E5). Question: Is there a way to select/load a subset of the data?

read.dta13 is my best workaround. It has select.cols. But I have to get the attr later, then save and merge them (for about 10 files).

Question: How can I manually add these second labels which the haven package creates? (What are they called?)

Here is the MWE:

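library(foreign)
write.dta(mtcars, "mtcars.dta")  # write a small example file in Stata format
library(haven)
mtcars <- read_dta("mtcars.dta")  # haven reads every column
library(readstata13)
# readstata13 can read only the selected columns
mtcars2 <- read.dta13("mtcars.dta", convert.factors = FALSE,
                      select.cols = c("mpg", "cyl", "vs"))
# Variable labels are stored as an attribute of the data frame
var.labels <- attr(mtcars2, "var.labels")
data.key.mtcars2 <- data.frame(var.name = names(mtcars2), var.labels)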

Answer

haven's development version supports selecting columns with the col_select argument:

library(haven) # devtools::install_github("tidyverse/haven")
mtcars <- read_dta("mtcars.dta", col_select = c(mpg, cyl, vs))
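
To check that the labels survive a column selection, a round trip through a temporary file can help. The following is only a sketch, assuming a haven version whose read_dta() has both col_select and n_max. The two labels and the temporary file are there purely for illustration and are not part of the original example.

```r
library(haven)

# Illustration only: attach variable labels to two columns of mtcars.
cars <- mtcars
attr(cars$mpg, "label") <- "Miles/(US) gallon"
attr(cars$cyl, "label") <- "Number of cylinders"

# write_dta() stores each column's "label" attribute as the Stata variable label.
path <- tempfile(fileext = ".dta")
write_dta(cars, path)

# Read only three columns and the first ten rows.
cars_subset <- read_dta(path, col_select = c(mpg, cyl, vs), n_max = 10)

# Collect the variable labels of the loaded columns (NA where a column has none).
subset_labels <- vapply(
  cars_subset,
  function(x) {
    lbl <- attr(x, "label", exact = TRUE)
    if (is.null(lbl)) NA_character_ else lbl
  },
  character(1)
)
print(subset_labels)
```

For a 1 GB file, n_max is handy for a quick look at the first rows before deciding which columns to load in full.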

Alternatively, the column labels in RStudio's viewer are taken from the "label" attribute of each column of the data frame. You can use a simple loop to assign them from the labels read by readstata13:

for (i in seq_along(mtcars2)) {
 attr(mtcars2[[i]], "label") <- var.labels[i]
}
View(mtcars2)
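
If the same labelling has to be repeated for several files (the question mentions about 10), the loop can be wrapped in a small function. This is only a sketch: apply_var_labels is a hypothetical helper name, and the data frame below is a stand-in that mimics what read.dta13 returns (labels in a "var.labels" attribute), so that it runs without a .dta file.

```r
# Stand-in for the result of read.dta13(): a data frame whose variable labels
# sit in a "var.labels" attribute. The labels are taken from ?mtcars.
mtcars2 <- mtcars[, c("mpg", "cyl", "vs")]
attr(mtcars2, "var.labels") <- c(
  "Miles/(US) gallon",
  "Number of cylinders",
  "Engine (0 = V-shaped, 1 = straight)"
)

# Hypothetical helper: copy the data-frame-level labels onto each column's
# "label" attribute, which is where RStudio's viewer looks for them.
apply_var_labels <- function(df, labels = attr(df, "var.labels")) {
  stopifnot(length(labels) == ncol(df))
  for (i in seq_along(df)) {
    # Leave columns with a missing or empty label untouched.
    if (!is.na(labels[i]) && nzchar(labels[i])) {
      attr(df[[i]], "label") <- labels[i]
    }
  }
  df
}

mtcars2 <- apply_var_labels(mtcars2)

# Read the labels back, one per column.
labels_by_column <- vapply(
  mtcars2,
  function(x) attr(x, "label", exact = TRUE),
  character(1)
)
print(labels_by_column)
```

With real files, the same function could be applied to each data frame returned by read.dta13, for example via lapply over a list of them.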

Comment 1

Thanks so much for the development. The col_select feature makes haven perfect; there is no more reason to use other packages. But installing the dev version didn't work for me. It says: "Error: Failed to install 'haven' from GitHub: (converted from warning) installation of package ‘XX/haven_2.1.1.9000.tar.gz’ had non-zero exit status"

Comment 2

OK, I had to delete a 00LOCK-haven folder in my library manually. Then it worked. Looking forward to the new release of haven.

Mikko Marttila · Accepted Answer · 2019-08-29 08:02:47Z

