I am new to Python and have a simple problem. First, I want to load some sample data I created in Stata. Second, I would like to describe the data in Python; that is, I'd like a list of the imported variable names. So far I've done this:

from pandas.io.stata import StataReader
reader = StataReader('sample_data.dta')
data = reader.data()
dir()

I get the following error:

anaconda/lib/python3.5/site-packages/pandas/io/stata.py:1375: UserWarning: 'data' is deprecated, use 'read' instead
 warnings.warn("'data' is deprecated, use 'read' instead")

What does it mean and how can I resolve the issue? And, is dir() the right way to get an understanding of what variables I have in the data?

asked Aug 21, 2016 at 13:28

2 Answers

Reading a Stata file with pandas.io.stata.StataReader.data was deprecated in pandas 0.18.1, which is why you are getting that warning. It is a warning rather than an error, so the data is still loaded.

Instead, use pandas.read_stata to read the file, as shown:

import pandas as pd

df = pd.read_stata('sample_data.dta')
df.dtypes  # returns the dtype of each column, indexed by column name
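To answer the second part of the question: dir() lists the names defined in the current Python scope, not the variables inside the dataset. The column names live on the DataFrame itself. The following is a minimal sketch, not the only way to do it; the columns id and wage and their labels are made-up stand-ins for sample_data.dta, written to a temporary file so the example runs on its own.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for sample_data.dta (the real file is not available here)
sample = pd.DataFrame({"id": [1, 2, 3], "wage": [10.5, 12.0, 9.75]})
path = os.path.join(tempfile.mkdtemp(), "sample_data.dta")
sample.to_stata(
    path,
    write_index=False,
    variable_labels={"id": "Person ID", "wage": "Hourly wage"},
)

# Read the file and list the imported variable (column) names
df = pd.read_stata(path)
column_names = list(df.columns)
print(column_names)
print(df.dtypes)

# Stata variable labels are available from the reader object
with pd.read_stata(path, iterator=True) as reader:
    variable_labels = reader.variable_labels()
print(variable_labels)
```

With a real file, only the pd.read_stata call and the lines after it are needed.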
answered Aug 21, 2016 at 14:18

4 Comments

Thanks, I used import pandas and the command you suggested. However, df.dtypes does not return the data types. Any hints on why?
You need to wrap it in a print call: print(df.dtypes).
Perfect, works! Thank you! I hope I can now use the vars. Can I simply call the variables by their names or do I have to specify them first?
If you mean the columns, you could access them via df['column name'].
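A short sketch of that column access, assuming a DataFrame with hypothetical columns id and wage in place of the imported Stata data:

```python
import pandas as pd

# Hypothetical data standing in for the imported Stata file
df = pd.DataFrame({"id": [1, 2, 3], "wage": [10.5, 12.0, 9.75]})

wage = df["wage"]              # one column, returned as a Series
subset = df[["id", "wage"]]    # several columns, returned as a DataFrame
mean_wage = df["wage"].mean()  # columns can be used directly in computations

print(list(df.columns))
print(mean_wage)
```

The columns do not need to be declared first; they are addressed through the DataFrame rather than as standalone Python names.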
Sometimes this did not work for me, especially when the dataset is large. So I propose a two-step approach here (Stata first, then Python).

In Stata write the following commands:

export excel Cevdet.xlsx, firstrow(variables)

To also export the variable names and labels, write the following:

preserve
describe, replace
list
export excel using myfile.xlsx, replace first(var)
restore

This generates two files, Cevdet.xlsx and myfile.xlsx.

Now go to your Jupyter notebook:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_excel('Cevdet.xlsx')      # the data
labels = pd.read_excel('myfile.xlsx')  # the variable names and labels

This reads both files into Jupyter (Python 3).

My advice is to save the DataFrame to a pickle file (especially if it is big):

df.to_pickle('Cevdet')

The next time you open Jupyter, you can simply run:

df=pd.read_pickle("Cevdet")
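As a rough check that nothing is lost along the way, the save-and-reload step can be sketched as below. The data and the temporary path with a .pkl extension are assumptions made for illustration; the answer itself uses the file name 'Cevdet'.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data standing in for the DataFrame read from Cevdet.xlsx
df = pd.DataFrame({"id": [1, 2, 3], "wage": [10.5, 12.0, 9.75]})

# Save to a pickle file in a temporary directory, then load it back
path = os.path.join(tempfile.mkdtemp(), "Cevdet.pkl")
df.to_pickle(path)
restored = pd.read_pickle(path)

# The reloaded DataFrame should match the original, dtypes included
print(restored.equals(df))
```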
answered Mar 31, 2019 at 15:03
