How to import .dta via pandas and describe data?

Question 1

I am new to Python and have a simple problem. First, I want to load some sample data I created in Stata. Second, I would like to describe the data in Python; that is, I'd like a list of the imported variable names. So far I've done this:

from pandas.io.stata import StataReader
reader = StataReader('sample_data.dta')
data = reader.data()
dir()

I get the following warning:

anaconda/lib/python3.5/site-packages/pandas/io/stata.py:1375: UserWarning: 'data' is deprecated, use 'read' instead
 warnings.warn("'data' is deprecated, use 'read' instead")

What does it mean and how can I resolve the issue? And, is dir() the right way to get an understanding of what variables I have in the data?

Answer 1 (accepted)

Reading a Stata file with pandas.io.stata.StataReader.data was deprecated in pandas 0.18.1, which is why you get that warning.

Instead, use pandas.read_stata to read the file, as shown:

import pandas as pd

df = pd.read_stata('sample_data.dta')
df.dtypes  # column (variable) names and their dtypes
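
To cover the second part of the question (a list of the imported variable names), here is a minimal sketch. The three-column frame and the temporary file path are hypothetical stand-ins for the real sample_data.dta, written with DataFrame.to_stata only so that the example runs on its own.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the asker's data; the real file would come from Stata.
sample = pd.DataFrame(
    {"age": [23, 35, 41], "income": [1200.5, 2300.0, 1850.25], "married": [0, 1, 1]}
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample_data.dta")
    sample.to_stata(path, write_index=False)  # keep the pandas index out of the file
    df = pd.read_stata(path)                  # reads the whole file into a DataFrame

variable_names = list(df.columns)  # the imported variable names
print(variable_names)
print(df.dtypes)       # storage type of each variable
print(df.describe())   # summary statistics for the numeric variables
```

Note that dir() with no arguments lists the names defined in the current Python scope, not the columns of a DataFrame, so df.columns is the closer match to what Stata's describe shows.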

Comment

Thanks, I used import pandas and the command you suggested. However, df.dtypes does not return the data types. Any hints on why?

Comment

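When you run this as a script rather than in an interactive session, a bare expression is not displayed, so wrap it in print: print(df.dtypes).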

Comment

Perfect, works! Thank you! I hope I can now use the variables. Can I simply call the variables by their names, or do I have to specify them first?

Comment

If you mean the columns, you could access them via df['column name'].
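
A short sketch of what column access looks like in practice. The frame and its column names (age, income) are made up for illustration and are not from the asker's file.

```python
import pandas as pd

# Hypothetical data standing in for the imported Stata file.
df = pd.DataFrame({"age": [23, 35, 41], "income": [1200.5, 2300.0, 1850.25]})

ages = df["age"]                 # one column, returned as a Series
subset = df[["age", "income"]]   # several columns, returned as a DataFrame
mean_income = df["income"].mean()

print(list(df.columns))
print(ages.tolist())
print(mean_income)
```

Unlike in Stata, a bare variable name such as age is not defined on its own in Python; the column always has to be reached through the DataFrame, as in df["age"].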

Answer 2

Reading the .dta file directly sometimes did not work for me, especially when the dataset was large. So I propose a two-step workaround (Stata, then Python).

In Stata, run the following command:

export excel Cevdet.xlsx, firstrow(variables)

To copy the variable labels as well, run the following:

preserve
describe, replace
list
export excel using myfile.xlsx, replace first(var)
restore

This generates two files, Cevdet.xlsx and myfile.xlsx.

Now go to your Jupyter notebook:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_excel('Cevdet.xlsx')      # the data
labels = pd.read_excel('myfile.xlsx')  # variable names and labels from describe

This reads both files into Jupyter (Python 3).
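
Once the label table from myfile.xlsx is loaded, it can be turned into a lookup from variable name to label. The sketch below assumes that table has columns called name and varlab, which is my understanding of what Stata's describe, replace produces; check the actual headers in your file. The small frames are built inline as hypothetical stand-ins so the example runs without the Excel files.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel('myfile.xlsx');
# the column names "name" and "varlab" are an assumption.
labels = pd.DataFrame(
    {"name": ["age", "income"], "varlab": ["Age in years", "Monthly income"]}
)

# Hypothetical stand-in for pd.read_excel('Cevdet.xlsx').
df = pd.DataFrame({"age": [23, 35], "income": [1200.5, 2300.0]})

# Map each variable name to its label.
label_of = dict(zip(labels["name"], labels["varlab"]))

for column in df.columns:
    # Fall back to an empty label for variables missing from the table.
    print(f"{column}: {label_of.get(column, '')}")
```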

My advice is to save the DataFrame as a pickle (especially if it is big), so you do not have to re-read the Excel file each time:

df.to_pickle('Cevdet')

The next time you open Jupyter, you can simply run

df=pd.read_pickle("Cevdet")
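
If the Excel detour is only needed because the file is large or because the variable labels are wanted, it may be worth trying pandas directly first. The sketch below is not the answerer's method: it uses the chunksize argument of pd.read_stata to read the file in pieces, and the reader's variable_labels() method to fetch the labels. The file name, data and labels are hypothetical, written with DataFrame.to_stata so the example is self-contained.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data and labels standing in for a large Stata file.
sample = pd.DataFrame(
    {"age": [23, 35, 41, 52, 29], "income": [1200.5, 2300.0, 1850.25, 3100.0, 990.0]}
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "large_sample.dta")
    sample.to_stata(
        path,
        write_index=False,
        variable_labels={"age": "Age in years", "income": "Monthly income"},
    )

    total_rows = 0
    # With chunksize set, read_stata returns a reader instead of a DataFrame.
    with pd.read_stata(path, chunksize=2) as reader:
        labels = reader.variable_labels()  # dict: variable name -> label
        for chunk in reader:               # each chunk is a small DataFrame
            total_rows += len(chunk)

print(labels)
print(total_rows)
```

A small chunk size is used here only to make the chunking visible; for a real file a much larger value would be more sensible.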

The accepted answer (the one recommending pd.read_stata) was posted by Nickil Maveli on 2016-08-21 14:18:16Z.


Comments on the accepted answer (all 2016-08-21, UTC), in order: Rachel 14:25:21, Nickil Maveli 14:25:58, Rachel 14:27:57, Nickil Maveli 14:34:01.
