2
\$\begingroup\$

I have a dataframe (read from a CSV exported from MySQL) with several columns, one of which consists of a string that is the representation of a JSON object. The data looks like:

id email_id provider raw_data ts
1 [email protected] A {'a':'A', 2019-23-08 00:00:00
 'b':'B',
 'c':'C'}

My desired output is:

email_id a b c 
[email protected] A B C 

What I have coded so far is the following:

import pandas as pd
import ast
df = pd.read_csv('data.csv')
df1 = pd.DataFrame()
for i in range(len(df)):
 dict_1 = ast.literal_eval(df['raw_content'][i])
 df1 = df1.append(pd.Series(dict_1),ignore_index=True)
pd.concat([df['email_id'],df1])

This works, but it has a very big problem: it is extremely slow (it takes hours for 100k rows). How could I make this operation faster?

asked Aug 26, 2019 at 10:43
\$\endgroup\$
3
  • 2
    \$\begingroup\$ Would json.loads be any faster? With a more limited structure it might. Another thing to consider is doing a string join on all those raw_data strings, and calling loads once. I haven't worked with json enough to know what's fast or slow in its parsing. \$\endgroup\$ Commented Aug 26, 2019 at 20:32
  • 1
    \$\begingroup\$ Is the above exactly how your file looks? \$\endgroup\$ Commented Aug 27, 2019 at 20:53
  • \$\begingroup\$ I forgot to close the json, sorry \$\endgroup\$ Commented Aug 28, 2019 at 9:05

1 Answer

1
\$\begingroup\$

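I finally got an amazing improvement thanks to two Stack Overflow questions. The first covers collecting the rows in a list and building the DataFrame once instead of appending row by row: https://stackoverflow.com/questions/10715965/add-one-row-to-pandas-dataframe The second covers faster access to a single cell: https://stackoverflow.com/questions/37757844/pandas-df-locz-x-y-how-to-improve-speed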

Also, as hpaulj pointed out, switching to json.loads slightly improves performance.
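
One caveat on that switch, shown as a small sketch: json.loads only accepts valid JSON, which requires double-quoted keys and strings, while ast.literal_eval also accepts Python-style single quotes. The sample strings below are hypothetical and are not taken from the real file, so check which form your own data uses.

```python
import ast
import json

# Hypothetical sample; assumes the stored text is valid JSON (double quotes).
raw = '{"a": "A", "b": "B", "c": "C"}'

# Both parsers return the same dict for a valid JSON object of strings.
from_json = json.loads(raw)
from_literal = ast.literal_eval(raw)
same = from_json == from_literal

# A Python-style dict with single quotes is not valid JSON,
# so json.loads rejects it while ast.literal_eval still parses it.
python_style = "{'a': 'A'}"
try:
    json.loads(python_style)
    json_accepts_single_quotes = True
except json.JSONDecodeError:
    json_accepts_single_quotes = False
```

If the column really holds single-quoted text, ast.literal_eval may still be needed for the parsing step, and the list-based construction below applies either way.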

The run time went from 16 hours to 30 seconds:

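import json
import pandas as pd

# df is the DataFrame read from data.csv in the question.
# Parse each JSON string into a dict and collect the dicts in a plain list.
row_list = []
for i in range(len(df)):
    dict1 = {}
    dict1.update(json.loads(df.at[i, 'raw_content']))
    row_list.append(dict1)
# Build the DataFrame once, then place it next to the email column.
df1 = pd.DataFrame(row_list)
df2 = pd.concat([df['email_id'], df1], axis=1)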
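
The same idea can be written more compactly with a list comprehension. This is only a sketch under assumptions: the small DataFrame and its example.com addresses are made up for illustration, and the column is assumed to be named raw_content and to hold valid JSON strings.

```python
import json

import pandas as pd

# Hypothetical stand-in for the real data; values are invented for the example.
df = pd.DataFrame({
    "email_id": ["first@example.com", "second@example.com"],
    "raw_content": [
        '{"a": "A", "b": "B", "c": "C"}',
        '{"a": "X", "b": "Y", "c": "Z"}',
    ],
})

# Parse every string up front, then build the DataFrame in a single call.
# Reusing df.index keeps the rows aligned even if the index is not 0..n-1.
parsed = pd.DataFrame(
    [json.loads(s) for s in df["raw_content"]],
    index=df.index,
)

# Put the parsed columns next to the email column.
result = pd.concat([df["email_id"], parsed], axis=1)
```

Iterating over the column directly avoids the per-row df.at lookup, and passing index=df.index matters when the frame has been filtered or re-indexed before this step.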
answered Aug 28, 2019 at 9:08
\$\endgroup\$
2
  • \$\begingroup\$ "got an amazing improvement thanks to stack overflow" Do you have a link to that answer? \$\endgroup\$ Commented Aug 28, 2019 at 12:56
  • 1
    \$\begingroup\$ Yes, I have edited the answer to add those links \$\endgroup\$ Commented Aug 28, 2019 at 13:53
