As the code below shows, I use data from one table as the basis for modifying another table and adding information to it. When the tables are large, this brute-force traversal is very inefficient. How should I change it? I also need to compare multiple sheets, which makes it even slower.

import pandas as pd
from collections import namedtuple

df = pd.DataFrame({'id': [123, 321, 456, 543], 'name': ['xxx', 'yyy', 'zzz', 'www']})
df.set_index('id', inplace=True)
df_1 = pd.DataFrame({'id': [123, 321, 456, 543], 'name': ['xxx', 'yyy', 'zzz', 'www'], 'complete': ['yes', 'yes', 'yes', 'yes'], 'course_name': ['AA', 'BB', 'AA', 'DD'], 'complete_date': ['1.1', '1.2', '1.1', '1.5']})
df_1.set_index('id', inplace=True)

# Collect, per course, the (name, complete_date) of everyone who completed it
Subscriber = namedtuple('Subscriber', ['name', 'complete_date'])
group_df = df_1.groupby('course_name')
info = dict()
for course_name, course_df in group_df:
    info[course_name] = []
    def process(row):
        info[course_name].append(Subscriber(*row.tolist()))
    get_info = course_df.loc[course_df["complete"] == "yes"]
    get_columns = ['name', 'complete_date']
    finish_df = get_info[get_columns]
    finish_df.apply(process, axis=1)
# print(info)
# {'AA': [Subscriber(name='xxx', complete_date='1.1'), Subscriber(name='zzz', complete_date='1.1')], 'BB': [Subscriber(name='yyy', complete_date='1.2')], 'DD': [Subscriber(name='www', complete_date='1.5')]}

# Modify df: add one column per course, filled row by row
names = set(df['name'])
for course in info.keys():
    for name, date in info[course]:
        if name in names:
            df.loc[df['name'] == name, course] = date + 'yes'
# print(df)
Alez
asked May 24, 2024 at 1:50
  • Welcome! Given the current output, it looks like a pivot would do the job, something like df_1.reset_index().assign(val=lambda x: x['complete_date']+'yes').pivot(index=['id', 'name'], columns='course_name', values='val') plus a bit of cosmetic cleanup. I assume your real case is more complex, though; could you elaborate if this solution does not meet your needs? Commented May 24, 2024 at 13:13
  • Thank you very much, and sorry for the late reply. Yes, the requirements are a little more complicated than this: the table is used to maintain personnel information, and later information has to be added to the original table instead of overwriting it. I'll think about how to accomplish that. Commented May 27, 2024 at 1:29
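
For reference, the one-liner from the first comment can be spelled out roughly as below. This is only a sketch on the sample data from the question; the variable name wide is an illustrative assumption, and passing a list to the index argument of pivot assumes pandas 1.1 or later.

```python
import pandas as pd

# Sample data from the question, with 'id' as the index as in the original code
df_1 = pd.DataFrame({'id': [123, 321, 456, 543],
                     'name': ['xxx', 'yyy', 'zzz', 'www'],
                     'complete': ['yes', 'yes', 'yes', 'yes'],
                     'course_name': ['AA', 'BB', 'AA', 'DD'],
                     'complete_date': ['1.1', '1.2', '1.1', '1.5']})
df_1.set_index('id', inplace=True)

wide = (
    df_1.reset_index()
        # Build the cell value the question writes: date followed by 'yes'
        .assign(val=lambda x: x['complete_date'] + 'yes')
        # One row per (id, name), one column per course
        .pivot(index=['id', 'name'], columns='course_name', values='val')
        .reset_index()
)
# Cosmetic: drop the 'course_name' label left on the column axis
wide.columns.name = None
print(wide)
```

Note that this sketch does not filter on the complete column; in the sample data every row is 'yes', so the result is the same either way.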

1 Answer

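As Ben.T points out in the comments, a pivot would do the job. Here is an example of how you can do this (note that it writes a plain 'yes' marker into each course column rather than the date + 'yes' string from the question):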

import pandas as pd

df = pd.DataFrame({'id': [123, 321, 456, 543], 'name': ['xxx', 'yyy', 'zzz', 'www']})
df_1 = pd.DataFrame({'id': [123, 321, 456, 543], 'name': ['xxx', 'yyy', 'zzz', 'www'], 'complete': ['yes', 'yes', 'yes', 'yes'], 'course_name': ['AA', 'BB', 'AA', 'DD'], 'complete_date': ['1.1', '1.2', '1.1', '1.5']})

# Keep only the rows for completed courses
completed_courses = df_1[df_1['complete'] == 'yes']
# Reshape to one row per id and one column per course
pivot_df = completed_courses.pivot_table(index='id', columns='course_name', values='complete_date', aggfunc='first')
# Attach the course columns to the original table (index-on-index join)
result_df = df.set_index('id').join(pivot_df)
result_df = result_df.fillna('')
# Replace each completion date with a 'yes' marker
for col in pivot_df.columns:
    result_df[col] = result_df[col].apply(lambda x: 'yes' if x != '' else x)
result_df.reset_index(inplace=True)
print(result_df)

which gives you

    id  name   AA   BB   DD
0  123   xxx  yes
1  321   yyy       yes
2  456   zzz  yes
3  543   www            yes
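
If you need the date + 'yes' string from the question, several sheets, and existing values left untouched (as mentioned in the comments), one possible variant is sketched below. The sheets sheet_a and sheet_b and the helper add_courses are hypothetical names made up for illustration, and the sketch assumes every sheet has the same columns as df_1. It relies on DataFrame.combine_first, which keeps non-null values already present in the base table and only fills gaps from the other frame.

```python
import pandas as pd

df = pd.DataFrame({'id': [123, 321, 456, 543],
                   'name': ['xxx', 'yyy', 'zzz', 'www']}).set_index('id')

# Hypothetical sheets, each with the same columns as df_1 in the question
sheet_a = pd.DataFrame({'id': [123, 321], 'name': ['xxx', 'yyy'],
                        'complete': ['yes', 'yes'],
                        'course_name': ['AA', 'BB'],
                        'complete_date': ['1.1', '1.2']})
sheet_b = pd.DataFrame({'id': [456, 543], 'name': ['zzz', 'www'],
                        'complete': ['yes', 'no'],
                        'course_name': ['AA', 'DD'],
                        'complete_date': ['1.1', '1.5']})

def add_courses(base, sheets):
    """Add one column per completed course to base without overwriting it."""
    # Stack all sheets so they are processed in a single pass
    records = pd.concat(sheets, ignore_index=True)
    done = records[records['complete'] == 'yes']
    # Same cell value as in the question: date followed by 'yes'
    done = done.assign(val=done['complete_date'] + 'yes')
    wide = done.pivot_table(index='id', columns='course_name',
                            values='val', aggfunc='first')
    # Values already in base win; only missing cells are taken from wide
    return base.combine_first(wide)

result = add_courses(df, [sheet_a, sheet_b])
print(result)
```

combine_first works on the union of both frames' columns and can reorder them, so reselect the columns afterwards if the order matters. Cells with no completion stay NaN; call fillna('') on the result if you prefer empty strings.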
answered May 25, 2024 at 5:38