
I have a CSV file of around 800MB that I'm trying to load into a DataFrame with pandas, but I keep getting a memory error. I need to load it so I can join it to another, smaller DataFrame.

Why am I getting a memory error even though I'm running 64-bit Windows and 64-bit Python 3.4, and have over 8GB of RAM and plenty of hard disk space? Is this a bug in Pandas? How can I solve this memory issue?
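
For reference, this is roughly what I'm running; the file names and the join column below are just placeholders for the real ones:

import pandas as pd

big = pd.read_csv("big_file.csv")      # the ~800MB file -- this read raises the MemoryError
small = pd.read_csv("small_file.csv")  # the smaller table I want to join against
merged = big.merge(small, on="key", how="left")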

asked Jun 15, 2016 at 13:03
  • Possible duplicate of Memory error when using pandas read_csv. Commented Jun 15, 2016 at 13:36
  • I knew the answer, but I forgot. Commented Jun 15, 2016 at 15:44
  • You already have two questions about this: here and here. Stop reposting. Commented Jun 16, 2016 at 16:00

1 Answer


Reading your CSV in chunks might help:

import pandas as pd

chunk_size = 10**5  # rows per chunk
# read the file in chunks, then concatenate the pieces into one DataFrame
df = pd.concat([chunk for chunk in pd.read_csv(filename, chunksize=chunk_size)],
               ignore_index=False)
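
read_csv with chunksize returns an iterator of smaller DataFrames instead of parsing the whole file in one pass, which lowers the parser's peak memory use. Note that the final concat still has to hold the complete frame in memory, so if even the assembled DataFrame doesn't fit, you may also need to shrink it, for example with the usecols or dtype arguments of read_csv.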
answered Jun 15, 2016 at 20:51

4 Comments

That might help, but it won't solve the problem that the merge will kill it. Pandas is incredibly wasteful with memory.
@nickpick, so what is your problem: reading the 800MB CSV file, or merging your DF with another, smaller one?
Both are causing problems. The fact that reading in chunks and concatenating makes any difference at all points to issues in pandas.
@nickpick, did you try to read your CSV file in chunks? If yes, what does df.info() show after it's done?
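
Following up on the merge concern above, a possible variant (an untested sketch; filename, small_df and the join column "key" are stand-ins for the real names) is to join each chunk against the smaller, in-memory frame as it is read and concatenate only the merged pieces, which keeps the peak footprint lower whenever the join result is smaller than the raw CSV:

import pandas as pd

chunk_size = 10**5
merged_parts = []
for chunk in pd.read_csv(filename, chunksize=chunk_size):
    # join each chunk against the smaller frame as soon as it is read
    merged_parts.append(chunk.merge(small_df, on="key", how="inner"))
merged = pd.concat(merged_parts, ignore_index=True)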
