MySQL interval search query suggestions

Question 1

My database structure is:

assets
- id (int) 
- name (text)
- stock (int)
booking:
- id (int)
- asset_id (int) Refers to the assets table
- quantity (int) How many assets are planned in this timeframe
- start_date (datetime)
- end_date (datetime)

I need to find the amount of assets that are short within the given timeframe.

I wrote the following query:

SELECT a.id, 
 (a.stock - COALESCE(b.total_planned, 0)) AS shortage
FROM assets a
LEFT JOIN (
 SELECT asset_id, SUM(quantity) AS total_planned
 FROM booking
 WHERE start_date <= :start_date AND end_date >= :end_date
 GROUP BY asset_id
) b ON a.id = b.asset_id
WHERE a.stock - COALESCE(b.total_planned, 0) < 0
ORDER BY a.id DESC

I need to return only negative shortages.

I want to validate this query based on performances because the real database contains thousands of thousands of records.

Question 2

Rather than showing what you've currently shown for your database structure, you'd be better off showing your actual DDL.

Question 3

This submission is about performance. It does not contain any EXPLAIN PLAN output, nor any timing measurements.

This submission is about SQL. Code Review site guidelines for that tag explain that when requesting SQL reviews you should:

Provide context,
Include the schema,
If asking about performance, include indexes and the output of EXPLAIN SELECT.

I will assume we have PK on id in both tables, FK exploiting the assets primary key, and that's it.

query plan

Looks good.

There's some implicit assumptions which you really ought to write down: assets stores stocking levels current as of "now", and the future booking projections (A.) exclusively cover intervals which start after "now", and (B.) are non-overlapping.

That nested query might conveniently be captured with CREATE VIEW projected_demand AS SELECT ...

I like that an ORDER BY gives us deterministic results. That makes it easy to diff the output of one run against another, or to write reproducible unit tests.

The problem forces us to read all of the (presumably smallish?) booking table. But fortunately we could have a billion assets and this is still a winning query if just a handful of distinct assets are booked -- most asset rows will idly sit, untouched, on disk.

DRY

That repeated stock - COALESCE(...) expression is perhaps slightly unfortunate. If it bugs you, a nested query could tidy it up, leaving us with WHERE shortfall < 0.

I am fond of named queries: VIEWs. Perhaps too fond. Let's see what applying that technique here would look like. I conventionally suffix VIEW names with _v.

CREATE VIEW asset_total_v AS
SELECT asset_id, SUM(quantity) AS total_planned
 FROM booking
 WHERE start_date <= :start_date AND end_date >= :end_date
 GROUP BY asset_id;

CREATE VIEW asset_shortfall_v AS
SELECT a.id, 
 (a.stock - COALESCE(b.total_planned, 0)) AS shortage
FROM assets a
LEFT JOIN asset_total_v b ON a.id = b.asset_id
ORDER BY a.id DESC;

SELECT *
FROM asset_shortfall_v
WHERE shortfall < 0;

On the one hand we managed to DRY up that expression. On the other, the code became slightly longer. Looks like a tossup.

A query that lives on for months, and is edited by maintenance engineers, will tend to pick up additional COALESCE( ... ) and CASE WHEN clauses. It won't take too many of those to make this "name the column shortfall!" approach look winning. I'm just saying that maybe we've not arrived at that point yet; the refactor might be a little premature now.

I haven't tested this -- maybe the :{start,end}_date parameters are trouble here? Usually I'll leave out such filters from a VIEW, and tack them onto the final SELECT at the last minute. Here, they are perhaps nested inconveniently deeply within. If it doesn't work out, you could bury the JOIN portion of all this in a nested subquery, and then a final WHERE shortfall < 0 would definitely work.

Question 4

The second sentence could be modified to start with something along the lines of "bearing in mind the sql tag was recently added its wiki states: Structured Query Language is a language for interacting with relational databases. Read the tag wiki's guidelines for requesting SQL reviews: 1) Provide context, 2) Include the schema, 3) If asking about performance, include indexes and the output of EXPLAIN SELECT."

J_H J_H 41.8k3 gold badges38 silver badges157 bronze badges · Answer 1 · 2024-03-01 15:13:50Z

This submission is about performance. It does not contain any EXPLAIN PLAN output, nor any timing measurements.

This submission is about SQL. Code Review site guidelines for that tag explain that when requesting SQL reviews you should:

Provide context,
Include the schema,
If asking about performance, include indexes and the output of EXPLAIN SELECT.

I will assume we have PK on id in both tables, FK exploiting the assets primary key, and that's it.

query plan

Looks good.

There's some implicit assumptions which you really ought to write down: assets stores stocking levels current as of "now", and the future booking projections (A.) exclusively cover intervals which start after "now", and (B.) are non-overlapping.

That nested query might conveniently be captured with CREATE VIEW projected_demand AS SELECT ...

I like that an ORDER BY gives us deterministic results. That makes it easy to diff the output of one run against another, or to write reproducible unit tests.

The problem forces us to read all of the (presumably smallish?) booking table. But fortunately we could have a billion assets and this is still a winning query if just a handful of distinct assets are booked -- most asset rows will idly sit, untouched, on disk.

DRY

That repeated stock - COALESCE(...) expression is perhaps slightly unfortunate. If it bugs you, a nested query could tidy it up, leaving us with WHERE shortfall < 0.

I am fond of named queries: VIEWs. Perhaps too fond. Let's see what applying that technique here would look like. I conventionally suffix VIEW names with _v.

CREATE VIEW asset_total_v AS
SELECT asset_id, SUM(quantity) AS total_planned
 FROM booking
 WHERE start_date <= :start_date AND end_date >= :end_date
 GROUP BY asset_id;

CREATE VIEW asset_shortfall_v AS
SELECT a.id, 
 (a.stock - COALESCE(b.total_planned, 0)) AS shortage
FROM assets a
LEFT JOIN asset_total_v b ON a.id = b.asset_id
ORDER BY a.id DESC;

SELECT *
FROM asset_shortfall_v
WHERE shortfall < 0;

On the one hand we managed to DRY up that expression. On the other, the code became slightly longer. Looks like a tossup.

A query that lives on for months, and is edited by maintenance engineers, will tend to pick up additional COALESCE( ... ) and CASE WHEN clauses. It won't take too many of those to make this "name the column shortfall!" approach look winning. I'm just saying that maybe we've not arrived at that point yet; the refactor might be a little premature now.

I haven't tested this -- maybe the :{start,end}_date parameters are trouble here? Usually I'll leave out such filters from a VIEW, and tack them onto the final SELECT at the last minute. Here, they are perhaps nested inconveniently deeply within. If it doesn't work out, you could bury the JOIN portion of all this in a nested subquery, and then a final WHERE shortfall < 0 would definitely work.

The second sentence could be modified to start with something along the lines of "bearing in mind the sql tag was recently added its wiki states: Structured Query Language is a language for interacting with relational databases. Read the tag wiki's guidelines for requesting SQL reviews: 1) Provide context, 2) Include the schema, 3) If asking about performance, include indexes and the output of EXPLAIN SELECT."

Stack Exchange Network

MySQL interval search query suggestions

1 Answer 1

query plan

DRY

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

MySQL interval search query suggestions

1 Answer 1

query plan

DRY

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions