I have a CSV file of around 800 MB which I'm trying to load into a dataframe via pandas, but I keep getting a memory error. I need to load it so I can join it to another, smaller dataframe.
Why am I getting a memory error even though I'm running 64-bit Windows and 64-bit Python 3.4, and have over 8 GB of RAM and plenty of hard disk space? Is this a bug in pandas? How can I solve this memory issue?
asked Jun 15, 2016 at 13:03
Nickpick
- Possible duplicate of Memory error when using pandas read_csv – hashcode55, Jun 15, 2016 at 13:36
- I knew the answer, but I forgot. – piRSquared, Jun 15, 2016 at 15:44
- You already have two questions about this: here and here. Stop reposting. – Noelkd, Jun 16, 2016 at 16:00
1 Answer
Reading your CSV in chunks might help:
import pandas as pd

# Read the file 100,000 rows at a time and stitch the chunks back together
chunk_size = 10**5
df = pd.concat([chunk for chunk in pd.read_csv(filename, chunksize=chunk_size)],
               ignore_index=False)
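If memory is still tight after chunking, telling read_csv which columns to load and which dtypes to use usually shrinks the result considerably. A minimal sketch, assuming hypothetical column names ('key', 'value') and placeholder dtypes that you would replace with your actual schema:

import pandas as pd

# Load only the columns needed for the join, with explicit, smaller dtypes.
# 'key', 'value' and the dtypes below are placeholders for your real schema.
reader = pd.read_csv(filename,
                     usecols=['key', 'value'],
                     dtype={'key': 'int32', 'value': 'float32'},
                     chunksize=10**5)
df = pd.concat(reader, ignore_index=False)

# Check how much memory the resulting dataframe actually uses
df.info(memory_usage='deep')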
answered Jun 15, 2016 at 20:51
MaxU - stand with Ukraine
4 Comments
Nickpick
That might help, but it won't solve the problem that the merge will kill it. Pandas is incredibly wasteful with memory.
MaxU - stand with Ukraine
@nickpick, so what is your problem: reading the 800 MB CSV file or merging your DF with another, smaller one?
Nickpick
Both are causing problems. The fact that reading in chunks and concatenating makes a difference at all points to issues in pandas.
MaxU - stand with Ukraine
@nickpick, did you try to read your CSV file in chunks? If yes, what does df.info() show after it's done?
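Since the thread above is really about the merge rather than the read, here is a minimal sketch of merging chunk by chunk so the full 800 MB frame never has to exist on its own. It assumes the smaller dataframe (small_df) already fits in memory and that the join column is named 'key'; both names are placeholders:

import pandas as pd

# small_df is assumed to already be loaded; 'key' is a placeholder column name.
pieces = []
for chunk in pd.read_csv(filename, chunksize=10**5):
    # Merge each chunk against the small dataframe and keep only the joined rows.
    pieces.append(chunk.merge(small_df, on='key', how='inner'))

# Combine the per-chunk results; this is usually much smaller than the raw CSV
# if the inner join discards non-matching rows.
result = pd.concat(pieces, ignore_index=True)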