How to handle encoding when reading .dta files from Stata versions prior to 14 into R?

Question 1

How can one dodge the encoding problems when reading Stata-data into R?

The dataset I wish to read is a .dta file in either Stata 12 or Stata 13 format (from before Stata introduced support for UTF-8 in version 14). Text variables with Swedish and German letters such as å, ä, ö, ß, as well as other characters, do not import well.

I have tried these answers, read.dta in foreign, the haven package (with no encoding parameters), and now readstata13, whose documentation tells me that it expects Stata files to be encoded in CP1252. But alas, the encoding doesn't work. Should I give up and use a .csv export as a bridge instead, or is it actually possible to read .dta files in R?

Minimal example:
This code downloads the first few lines of my dataset and illustrates the problem, for example in the variable vocation, which contains Swedish text.

setwd("~/Downloads/")
# Download the example file (the first few rows of the dataset)
system("curl -O http://www.lilljegren.com/stackoverflow/example.stata13.dta", intern = FALSE)

# read_dta() is provided by haven, not foreign
library(haven)
?read_dta
df1 <- read_dta('example.stata13.dta', encoding = "latin1")
df2 <- read_dta('example.stata13.dta', encoding = "CP1252")

library(readstata13)
df3 <- read.dta13('example.stata13.dta', fromEncoding = "latin1")
df4 <- read.dta13('example.stata13.dta', fromEncoding = "CP1252")
df5 <- read.dta13('example.stata13.dta', fromEncoding = "utf-8")

# Expected values, typed in by hand; every entry with a non-ASCII letter fails to match
vocation <- c("Brandkorpral","Sömmerska","Jungfru","Timmerman","Skomakare","Skräddare","Föreståndare","Platsförsäljare","Sömmerska")
df4$vocation == vocation
# [1] TRUE FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE

Question 2

Exporting to CSV is probably the best option. Or, if you have Stata 14, convert the files to Unicode first and save them again.

Question 3

This is what I feared. I have been inspecting the files Stata produces with enca, but it is not able to guess their encoding, and I also have encoding problems reading the CSV files that Stata generates. Uhhh. Stata really isn't awesome :/ 21st century software without support for UTF-8 :(

Question 4

Stata's current version is 15, and it has supported Unicode since version 14. Not sure why you are complaining about features that are missing from software that is two versions behind and no longer supported or maintained. Upgrade?

Question 5

I am poor, and Stata is licensed software; an upgrade would be expensive just to resolve an encoding problem that, one could argue, shouldn't belong to our decade. But duly noted: I was grumpy. :) Besides, the correct encoding turned out to be "macroman", which I found out by going the CSV route, as you suggested, so thank you.
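When a detection tool cannot guess the encoding, one option is plain trial and error against a value whose correct spelling is known. The following is a minimal base R sketch of that idea, not something from the original thread: `guess_encoding` is a hypothetical helper, the candidate list is an assumption, and the encoding names accepted by `iconv` vary by platform (check `iconvlist()`; "macintosh" is used here as a common alias for Mac OS Roman).

```r
# Hypothetical helper: decode raw bytes under each candidate encoding and
# report which candidates reproduce a string we already know to be correct.
guess_encoding <- function(bytes, expected,
                           candidates = c("latin1", "CP1252", "macintosh", "UTF-8")) {
  hits <- vapply(candidates, function(enc) {
    # iconv() accepts a list of raw vectors and returns NA when the bytes
    # are not valid in the source encoding.
    decoded <- iconv(list(bytes), from = enc, to = "UTF-8")
    !is.na(decoded) && identical(decoded, enc2utf8(expected))
  }, logical(1))
  names(hits)[hits]
}

# The bytes of "Sömmerska" with 0x9A in place of the "ö",
# which is how Mac OS Roman stores that letter.
bytes <- as.raw(c(0x53, 0x9A, 0x6D, 0x6D, 0x65, 0x72, 0x73, 0x6B, 0x61))
matches <- guess_encoding(bytes, "S\u00f6mmerska")
print(matches)
```

In practice the bytes would come from the file itself, for example via `readBin()`, rather than being typed in by hand.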

Question 6

The correct encoding to read files generated by Stata prior to version 14 on Macs is "macroman"

df <- read.dta13('example.stata13.dta', fromEncoding="macroman")

On my Mac, both .dta-files in stata13 and stata12 formats (saved by saveold in Stata 13) imported nicely like this.

The readstata13 manual is presumably right to assume "CP1252" on other platforms. For me, however, "macroman" did the trick (also for the .csv files that Stata 13 generated with export delimited).
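To see why the choice matters, the same byte can be decoded under both encodings. This is an illustrative base R sketch with hand-written bytes (an assumption, not data taken from the example file): byte 0x9A is "ö" in Mac OS Roman but "š" in CP1252, which is why the comparison against hand-typed Swedish words fails for every entry with a non-ASCII letter. The `iconv` encoding names are platform-dependent; "macintosh" is used here as a common alias for Mac OS Roman.

```r
# "Sömmerska" as a Mac would have stored it before Stata 14:
# every byte is plain ASCII except 0x9A, which stands for "ö".
raw_bytes <- as.raw(c(0x53, 0x9A, 0x6D, 0x6D, 0x65, 0x72, 0x73, 0x6B, 0x61))
x <- rawToChar(raw_bytes)

# Decode the same bytes under two different assumptions.
as_mac    <- iconv(x, from = "macintosh", to = "UTF-8")
as_cp1252 <- iconv(x, from = "CP1252", to = "UTF-8")

print(as_mac)     # the intended word
print(as_cp1252)  # same bytes, wrong letter

# Only the Mac OS Roman decoding matches the hand-typed value.
print(as_mac == "S\u00f6mmerska")
print(as_cp1252 == "S\u00f6mmerska")
```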

Question 7

Note that you make no mention whatsoever in your question that you are using a Mac. Which is probably why nobody answered.

nJGL · Accepted Answer · 2018-11-07 08:52:03Z

