delete rows based on references to other tables in a mysql database?

Question 1

I have a database with the following hierarchy.

A dataset can have multiple scans (foreign key scan.id_dataset -> dataset.id)
A scan can have multiple labels (foreign key label.id_scan -> scan.id)
A label has a labeltype (foreign key label.id_labeltype-> labeltype.id)
The dataset, scan and labeltype tables all have a name column

I want to delete a label based on the names of the dataset, scan and labeltype. Here is my working code using the python mysql-connector:

def delete_by_labeltype(
 self, dataset_name: str, scan_name, labeltype_name: str
) -> int:
 sql = (
 "DELETE l.* FROM label l "
 "INNER JOIN labeltype lt "
 "ON l.id_labeltype = lt.id "
 "WHERE lt.name = %s AND l.id_scan = ("
 " SELECT scan.id FROM scan "
 " INNER JOIN dataset "
 " ON dataset.id = scan.id_dataset "
 " WHERE scan.name = %s AND dataset.name = %s"
 ");"
 )
 with self.connection.cursor() as cursor:
 cursor.execute(
 sql,
 (
 labeltype_name,
 scan_name,
 dataset_name,
 ),
 )
 self.connection.commit()
 return cursor.rowcount

The function is within a class handling access to the label table so connection is already established (and later closed correctly). I am simply wondering if the query is the best way to do this, since I am not working with SQL too often.

Question 2

nit: For consistency with scan.id I would choose the column name scan_id instead of id_scan, meh, whatever.

Kudos on correctly using bind variables, keeping Little Bobby Tables out of your hair. The %s, %s, %s in the query don't seem especially convenient, for matching up against {label, scan, ds} names in a tuple. Prefer to use three distinct names, and pass in a dict. If you do that habitually, then when a query grows to mention half a dozen parameters, and some newly hired maintenance engineer adds a seventh one, you won't be sorry.

And I like the explicit COMMIT.

best way to do this?

Looks good to me. I didn't hear you complaining about "too slow!" or reliability / flakiness. You didn't say that EXPLAIN PLAN showed some index was being ignored.

Including an ERD entity-relationship diagram would have been helpful, or at least the CREATE TABLE DDL including FK definitions. Let me try to organize your prose. Here's what I heard.

label --> labeltype
label --> scan --> dataset

It's not obvious to me why scan and dataset should be in a subquery rather than at same level -- it just arbitrarily came out that way, no biggie.

The l.id_scan = ( fragment suggests to me that there's enough UNIQUE indexes on {scan, ds} name to ensure we get no more than one result row from the subquery, else you would have mentioned l.id_scan IN ( .... It's perfectly fine the way it is. Maybe JOIN label against labeltype and also against scan? (And then scan against ds.)

If I was chicken, I would do all the JOINing in SELECT statements, obtain an id that I could do a sanity check on, and then send a simple DELETE single row command. But you're braver than me, and it works, so that's cool.

When reasoning about complex JOINs I often find it convenient to bury some of the complexity behind a VIEW. Minimally you would want to define a scan_dataset_v view so you never have to type the name of the "dataset" table again.

CREATE VIEW scan_dataset_v AS
 SELECT s.*, ds.*
 FROM scan s
 JOIN dataset ds ON ds.id = s.id_dataset;

Now label has just two relationships to worry about. (And, the * stars probably have at least one conflicting column name -- just write them all out. Absent the DDL, I don't know what all the columns are.)

You might possibly find it convenient to produce a view that JOINs them both to label, and another that JOINs labeltype to label. It just saves some typing during interactive queries. There's no efficiency hit -- think of it as the backend doing a macro expansion before issuing the query.

Summary: Any improvements? Nope, not really, LGTM.

J_H J_HJ_H 41.4k3 gold badges38 silver badges157 bronze badges · Accepted Answer · 2023-11-14 17:51:41Z

nit: For consistency with scan.id I would choose the column name scan_id instead of id_scan, meh, whatever.

Kudos on correctly using bind variables, keeping Little Bobby Tables out of your hair. The %s, %s, %s in the query don't seem especially convenient, for matching up against {label, scan, ds} names in a tuple. Prefer to use three distinct names, and pass in a dict. If you do that habitually, then when a query grows to mention half a dozen parameters, and some newly hired maintenance engineer adds a seventh one, you won't be sorry.

And I like the explicit COMMIT.

best way to do this?

Looks good to me. I didn't hear you complaining about "too slow!" or reliability / flakiness. You didn't say that EXPLAIN PLAN showed some index was being ignored.

Including an ERD entity-relationship diagram would have been helpful, or at least the CREATE TABLE DDL including FK definitions. Let me try to organize your prose. Here's what I heard.

label --> labeltype
label --> scan --> dataset

It's not obvious to me why scan and dataset should be in a subquery rather than at same level -- it just arbitrarily came out that way, no biggie.

The l.id_scan = ( fragment suggests to me that there's enough UNIQUE indexes on {scan, ds} name to ensure we get no more than one result row from the subquery, else you would have mentioned l.id_scan IN ( .... It's perfectly fine the way it is. Maybe JOIN label against labeltype and also against scan? (And then scan against ds.)

If I was chicken, I would do all the JOINing in SELECT statements, obtain an id that I could do a sanity check on, and then send a simple DELETE single row command. But you're braver than me, and it works, so that's cool.

When reasoning about complex JOINs I often find it convenient to bury some of the complexity behind a VIEW. Minimally you would want to define a scan_dataset_v view so you never have to type the name of the "dataset" table again.

CREATE VIEW scan_dataset_v AS
 SELECT s.*, ds.*
 FROM scan s
 JOIN dataset ds ON ds.id = s.id_dataset;

Now label has just two relationships to worry about. (And, the * stars probably have at least one conflicting column name -- just write them all out. Absent the DDL, I don't know what all the columns are.)

You might possibly find it convenient to produce a view that JOINs them both to label, and another that JOINs labeltype to label. It just saves some typing during interactive queries. There's no efficiency hit -- think of it as the backend doing a macro expansion before issuing the query.

Summary: Any improvements? Nope, not really, LGTM.

Stack Exchange Network

delete rows based on references to other tables in a mysql database?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

delete rows based on references to other tables in a mysql database?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions