\$\begingroup\$

I want to populate columns of a dataframe (df) by looping over a list (A_list). Each iteration generates a dictionary whose keys are the names of the new columns of df (in the example below, the new columns are 'C', 'D', and 'E').

import pandas

def gen_data(key):
    # This function is just an example; the real columns are not
    # necessarily related to each other or derived from the key.
    data_dict = {'C': key + key, 'D': key, 'E': key + key + key}
    return data_dict

A_list = ['a', 'b', 'c', 'd', 'f']
df = pandas.DataFrame({'A': ['a', 'b', 'c', 'd', 'f'], 'B': [1, 2, 3, 3, 2]})
for A_value in A_list:
    data_dict = gen_data(A_value)
    for data_key in data_dict:
        # Write one generated value into the rows whose A equals A_value.
        df.loc[df.A == A_value, data_key] = data_dict[data_key]

The result should be:

df = pandas.DataFrame({'A': ['a', 'b', 'c', 'd', 'e', 'f'],
                       'B': [1, 2, 3, 3, 2, 1],
                       'C': ['aa', 'bb', 'cc', 'dd', float('nan'), 'ff'],
                       'D': ['a', 'b', 'c', 'd', float('nan'), 'f'],
                       'E': ['aaa', 'bbb', 'ccc', 'ddd', float('nan'), 'fff']})

I feel that

for data_key in data_dict:
    df.loc[df.A == A_value, data_key] = data_dict[data_key]

is really inefficient when df has a lot of rows, and I feel that there should be a way to remove the inner for loop from this code:

for A_value in A_list:
    data_dict = gen_data(A_value)
    for data_key in data_dict:
        df.loc[df.A == A_value, data_key] = data_dict[data_key]
Sᴀᴍ Onᴇᴌᴀ
asked Jul 13, 2019 at 23:19
\$\endgroup\$
  • \$\begingroup\$ Since you're looking for a specific improvement in your code it belongs on Stack Overflow instead. \$\endgroup\$ Commented Jul 13, 2019 at 23:44
  • \$\begingroup\$ Welcome to Code Review! Please see What to do when someone answers. I have rolled back Rev 3 → 2 \$\endgroup\$ Commented Jul 16, 2019 at 16:49

1 Answer

\$\begingroup\$

Since 'e' is missing from column A of the input dataframe you provided, I have added it:

#input
import pandas as pd

A_list = ['a', 'b', 'c', 'd', 'f']
df = pd.DataFrame({'A': ['a', 'b', 'c', 'd', 'e', 'f'], 'B': [1, 2, 3, 3, 2, 1]})

You can start by joining the list you have:

pat='({})'.format('|'.join(A_list))
#pat --> '(a|b|c|d|f)'
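One caveat with joining the keys directly: the result is interpreted as a regular expression, so a key containing characters such as '.' or '+' would not be matched literally. The following is a minimal sketch, assuming the keys may contain such characters; `build_pattern` and the `tricky` keys are hypothetical names introduced here for illustration and are not part of the original answer.

```python
import re

def build_pattern(keys):
    # Hypothetical helper: escape each key so it is matched literally,
    # then join the escaped keys into one capturing alternation group.
    return '({})'.format('|'.join(re.escape(k) for k in keys))

A_list = ['a', 'b', 'c', 'd', 'f']
pat = build_pattern(A_list)      # same pattern as above for plain keys
print(pat)

# Assumed example keys containing regex metacharacters.
tricky = build_pattern(['a.b', 'c+'])
print(tricky)
```

For the single-letter keys in this question the escaped pattern is identical to the unescaped one, so the extra step only matters for less regular keys.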

Then Series.str.extract() pulls the matching key out of each value in the column, based on the pattern created above. Values that match none of the keys become NaN.

s=df.A.str.extract(pat,expand=False) #expand=False returns a series for further assignment
print(s)

0 a
1 b
2 c
3 d
4 NaN
5 f
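Note that str.extract() searches inside each string, so a value that merely contains a key also produces a match. If the values must equal a key exactly, one possible alternative (a sketch, not what the answer above uses) is to combine Series.isin() with Series.where(). The extra row 'ab' below is a hypothetical value added only to show the difference.

```python
import pandas as pd

A_list = ['a', 'b', 'c', 'd', 'f']
# 'ab' is an assumed extra value that contains a key without being one.
df = pd.DataFrame({'A': ['a', 'b', 'c', 'd', 'e', 'f', 'ab'],
                   'B': [1, 2, 3, 3, 2, 1, 0]})

# Keep the value where it is exactly one of the keys, NaN elsewhere.
s_exact = df['A'].where(df['A'].isin(A_list))

# For comparison: the regex search finds 'a' inside 'ab'.
s_regex = df['A'].str.extract('(a|b|c|d|f)', expand=False)

print(pd.DataFrame({'exact': s_exact, 'regex': s_regex}))
```

Which behaviour is wanted depends on the data; for the single-letter values in the question both give the same series.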

Once you have this series, you can decide what you want to do with it. For example, if I take your function:

def gen_data(key):
    # This function is just an example; the real columns are not
    # necessarily related to each other or derived from the key.
    data_dict = {'C': key * 2, 'D': key, 'E': key * 3}
    return data_dict

And do the below:

df.join(pd.DataFrame(s.apply(gen_data).values.tolist()))

We get the desired output:

   A  B    C    D    E
0  a  1   aa    a  aaa
1  b  2   bb    b  bbb
2  c  3   cc    c  ccc
3  d  3   dd    d  ddd
4  e  2  NaN  NaN  NaN
5  f  1   ff    f  fff
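The join above lines rows up by index label, and a DataFrame built from a plain list gets a fresh 0..n-1 index. That matches here because df uses the default index. As a hedged sketch, assuming df might carry some other index (the labels 10, 20, 30 are made up for illustration), passing the index of the series explicitly keeps the rows aligned:

```python
import pandas as pd

def gen_data(key):
    return {'C': key * 2, 'D': key, 'E': key * 3}

# Hypothetical frame with a non-default index.
df = pd.DataFrame({'A': ['a', 'b', 'f'], 'B': [1, 2, 3]},
                  index=[10, 20, 30])
s = df['A']

# Build the new columns and reuse the index of `s`,
# so that join() pairs each generated row with its source row.
new_cols = pd.DataFrame(s.apply(gen_data).tolist(), index=s.index)
result = df.join(new_cols)
print(result)
```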

However, I personally wouldn't use apply unless it is unavoidable, so here is another way using df.assign(), where you pass a dictionary of columns derived from the extracted series and assign them to the dataframe:

df=df.assign(**{'C':s*2,'D':s,'E':s*3})

   A  B    C    D    E
0  a  1   aa    a  aaa
1  b  2   bb    b  bbb
2  c  3   cc    c  ccc
3  d  3   dd    d  ddd
4  e  2  NaN  NaN  NaN
5  f  1   ff    f  fff
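The assign() version relies on the column expressions being vectorizable. When gen_data cannot be vectorized, one option is to call it once per key, collect the results into a small lookup table, and merge that table onto df in a single step. This is a sketch under that assumption rather than a definitive solution; `lookup` and `result` are names introduced here, and gen_data is the example function from the question standing in for a more expensive one.

```python
import pandas as pd

def gen_data(key):
    # Stand-in for an expensive, non-vectorizable function.
    return {'C': key * 2, 'D': key, 'E': key * 3}

A_list = ['a', 'b', 'c', 'd', 'f']
df = pd.DataFrame({'A': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'B': [1, 2, 3, 3, 2, 1]})

# Call gen_data once per key; each result becomes one row of the lookup table.
lookup = pd.DataFrame([{'A': key, **gen_data(key)} for key in A_list])

# A left merge keeps every row of df; keys missing from A_list get NaN.
result = df.merge(lookup, on='A', how='left')
print(result)
```

This writes all new columns at once instead of issuing one df.loc assignment per key and column, and gen_data still runs exactly once per entry of A_list.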
answered Jul 14, 2019 at 6:37
\$\endgroup\$
  • \$\begingroup\$ Hey anky_91, thank you for your reply. I really like the df.assign example you showed. However, my "gen_data" is a bit complex and requires file I/O, so I won't be able to do any vectorization (i.e. {'C':s*2,'D':s,'E':s*3}) as in your example. I have instead used assign iteratively with df.loc[df.A == key] = df.loc[df.A == key].assign(**metric_dict), and it now takes only 1/3 of the time. Is there a more efficient way of using assign? \$\endgroup\$ Commented Jul 16, 2019 at 0:54
  • \$\begingroup\$ @kkawabat if vectorization isn't possible, you're doing it right IMO. \$\endgroup\$ Commented Jul 16, 2019 at 2:27
  • \$\begingroup\$ I've edited the submission to use assign(), which seems to finish a bit faster. TY \$\endgroup\$ Commented Jul 16, 2019 at 2:31
