I have a pandas DataFrame and a (MySQL) database with the same columns. The database is not managed by me.
I want to update the values in the database in an "UPDATE... WHERE" style, updating only some columns wherever some other columns match.
Here's my code:
import sqlalchemy as sqla
from sqlalchemy.orm import sessionmaker

def save_to_db(final_df, passwd):
    engine_str = 'mysql+mysqldb://username:{}@localhost/mydb'.format(passwd)
    engine = sqla.create_engine(engine_str)
    sm = sessionmaker(bind=engine)
    session = sm()
    metadata = sqla.MetaData(bind=engine)
    datatable = sqla.Table('AdcsLogForProduct', metadata, autoload=True)

    for ind, row in final_df.iterrows():
        u = sqla.sql.update(datatable) \
            .values({"q_ECI_B_x": row.q_ECI_B_x,
                     "q_ECI_B_y": row.q_ECI_B_y,
                     "q_ECI_B_z": row.q_ECI_B_z,
                     "q_ECI_B_s": row.q_ECI_B_s}) \
            .where(sqla.and_(datatable.c.year == row.year,
                             datatable.c.month == row.month,
                             datatable.c.day == row.day,
                             datatable.c.hours == row.hours,
                             datatable.c.minutes == row.minutes,
                             datatable.c.seconds == row.seconds,
                             datatable.c.milliseconds == row.milliseconds,
                             datatable.c.microseconds == row.microseconds))
        session.execute(u)

    session.flush()
    session.commit()
I'm doing this with plain SQLAlchemy because apparently pandas' built-in SQL functions can't handle "UPDATE... WHERE" scenarios. However, this is really slow.
Isn't there a more efficient way to do this?
Comment (IGRSR, Feb 18, 2015 at 3:30): I've now shown the full function, with only minimal editing. Can the function inputs be taken as given? Sorry if this takes a few iterations.
1 Answer
You have eight conditions to match for every UPDATE. A typical solution would store timestamps in a single DATETIME or TIMESTAMP column, so that there is only one value to match. For reasonable performance, ensure that the timestamp column is indexed.
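For illustration, here is a minimal sketch of what each update could look like if the table had a single indexed DATETIME(6) column. The column name log_time is an assumption for the example only; the real schema is not under the asker's control.

    import datetime
    import sqlalchemy as sqla

    # Hypothetical variant: the table is assumed to have one indexed
    # DATETIME(6) column `log_time` instead of eight split date/time columns.
    def update_row(session, datatable, row):
        ts = datetime.datetime(int(row.year), int(row.month), int(row.day),
                               int(row.hours), int(row.minutes), int(row.seconds),
                               # fold milliseconds + microseconds into microseconds
                               int(row.milliseconds) * 1000 + int(row.microseconds))
        u = sqla.sql.update(datatable) \
            .values({"q_ECI_B_x": row.q_ECI_B_x,
                     "q_ECI_B_y": row.q_ECI_B_y,
                     "q_ECI_B_z": row.q_ECI_B_z,
                     "q_ECI_B_s": row.q_ECI_B_s}) \
            .where(datatable.c.log_time == ts)  # single indexed comparison
        session.execute(u)

With an index on that one column, each WHERE clause becomes a single indexed lookup instead of an eight-column comparison.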
Comment (IGRSR, Feb 18, 2015 at 7:37): Thank you. As I've now noted in my question, I'm not in control of the database. Do you think this multiple matching could be the main bottleneck, rather than the pandas side (iterrows)?
Comment (200_success, Feb 18, 2015 at 7:51): Ensure that there is an index on (year, ..., microseconds). Otherwise, dump final_df to a table using .to_sql() and do one UPDATE AdcsLogForProduct log JOIN tmp ON log.year = tmp.year AND ... AND log.microseconds = tmp.microseconds SET log.q_ECI_B_x = tmp.q_ECI_B_x, log.q_ECI_B_y = tmp.q_ECI_B_y, ... . If that giant update is slow, then make whoever is in charge of the database deal with it; you can't blame pandas anymore. Design decisions have consequences, and I think this performance problem is one of them.