I work with .dta files and try to make loading data as comfortable as possible. In my view, I need a combination of haven and readstata13.
haven looks perfect. It provides the best "sub-labels". But it does not provide a column-selector function, and I cannot use read_dta for large files (~1 GB, on 64 GB RAM, Intel Xeon E5).
Question: Is there a way to select/load a subset of data?
read.dta13 is my best workaround. It has select.cols. But I have to get the attr later, then save and merge them (for about 10 files).
Question: How can I manually add these second labels which the haven package creates? (What are they called?)
Here is the MWE:
library(foreign)
write.dta(mtcars, "mtcars.dta")
library(haven)
mtcars <- read_dta("mtcars.dta")
library(readstata13)
mtcars2 <- read.dta13("mtcars.dta", convert.factors = FALSE, select.cols=(c("mpg", "cyl", "vs")))
var.labels <- attr(mtcars2,"var.labels")
data.key.mtcars2 <- data.frame(var.name=names(mtcars2),var.labels)
1 Answer
haven's development version supports selecting columns with the col_select argument:
library(haven) # devtools::install_github("tidyverse/haven")
mtcars <- read_dta("mtcars.dta", col_select = c(mpg, cyl, vs))
Alternatively, the column labels in RStudio's viewer are taken from the "label" attribute of each column of the data frame. You can use a simple loop to assign them from the labels read by readstata13:
for (i in seq_along(mtcars2)) {
  attr(mtcars2[[i]], "label") <- var.labels[i]
}
View(mtcars2)
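If you want to check the result without opening the viewer, here is a minimal base-R sketch of the same idea. It does not read a .dta file: the data frame df and the label strings below are made-up stand-ins for mtcars2 and attr(mtcars2, "var.labels"), so treat them as assumptions. It sets the "label" attribute on each column and then reads the attributes back into a lookup table like the data.key.mtcars2 from the question.

```r
# Stand-in for the data read by read.dta13 (assumption: three columns of mtcars)
df <- mtcars[, c("mpg", "cyl", "vs")]

# Hypothetical labels; in the real workflow these come from attr(mtcars2, "var.labels")
var.labels <- c("Miles per gallon", "Number of cylinders", "Engine shape")

# Pair each column with its label and set the "label" attribute;
# assigning into df[] keeps df a data frame
df[] <- Map(function(col, lab) {
  attr(col, "label") <- lab
  col
}, df, var.labels)

# Read the labels back from the columns into a lookup table
data.key <- data.frame(
  var.name  = names(df),
  var.label = vapply(df, attr, character(1), which = "label", exact = TRUE),
  row.names = NULL
)
print(data.key)
```

With about 10 files, the Map() step could be wrapped in a small function and applied to each data frame after loading, so the labels travel with the columns instead of living in a separate key table.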
2 Comments
The col_select feature makes haven perfect; there is no more reason to use other packages. But installing the dev version didn't work for me at first. It said "Error: Failed to install 'haven' from GitHub: (converted from warning) installation of package ‘XX/haven_2.1.1.9000.tar.gz’ had non-zero exit status". I had to remove the 00LOCK-haven folder in my lib manually. Then it worked. Looking forward to the new release of haven.