My database structure is:
assets
- id (int)
- name (text)
- stock (int)
booking:
- id (int)
- asset_id (int) Refers to the assets table
- quantity (int) How many assets are planned in this timeframe
- start_date (datetime)
- end_date (datetime)
I need to find the amount of assets that are short within the given timeframe.
I wrote the following query:
SELECT a.id,
(a.stock - COALESCE(b.total_planned, 0)) AS shortage
FROM assets a
LEFT JOIN (
SELECT asset_id, SUM(quantity) AS total_planned
FROM booking
WHERE start_date <= :start_date AND end_date >= :end_date
GROUP BY asset_id
) b ON a.id = b.asset_id
WHERE a.stock - COALESCE(b.total_planned, 0) < 0
ORDER BY a.id DESC
I need to return only negative shortages.
I want to validate this query based on performances because the real database contains thousands of thousands of records.
-
\$\begingroup\$ Rather than showing what you've currently shown for your database structure, you'd be better off showing your actual DDL. \$\endgroup\$Reinderien– Reinderien2024年03月04日 13:05:35 +00:00Commented Mar 4, 2024 at 13:05
1 Answer 1
This submission is about performance. It does not contain any EXPLAIN PLAN output, nor any timing measurements.
This submission is about SQL. Code Review site guidelines for that tag explain that when requesting SQL reviews you should:
- Provide context,
- Include the schema,
- If asking about performance, include indexes and the output of EXPLAIN SELECT.
I will assume we have
PK
on id
in both tables,
FK
exploiting the assets
primary key, and that's it.
query plan
Looks good.
There's some implicit assumptions which you really ought to write down:
assets
stores stocking levels current as of "now",
and the future booking
projections (A.) exclusively cover
intervals which start after "now", and (B.) are non-overlapping.
That nested query might conveniently be captured with CREATE VIEW projected_demand AS SELECT ...
I like that an ORDER BY gives us deterministic results.
That makes it easy to diff
the output of one run against another,
or to write reproducible unit tests.
The problem forces us to read all of the (presumably smallish?)
booking
table.
But fortunately we could have a billion assets
and
this is still a winning query if just a handful of distinct
assets are booked -- most asset rows will idly sit, untouched, on disk.
DRY
That
repeated
stock - COALESCE(...)
expression is perhaps slightly unfortunate.
If it bugs you, a nested query could tidy it up,
leaving us with WHERE shortfall < 0
.
I am fond of named queries: VIEWs.
Perhaps too fond.
Let's see what applying that technique here would look like.
I conventionally suffix VIEW names with _v
.
CREATE VIEW asset_total_v AS
SELECT asset_id, SUM(quantity) AS total_planned
FROM booking
WHERE start_date <= :start_date AND end_date >= :end_date
GROUP BY asset_id;
CREATE VIEW asset_shortfall_v AS
SELECT a.id,
(a.stock - COALESCE(b.total_planned, 0)) AS shortage
FROM assets a
LEFT JOIN asset_total_v b ON a.id = b.asset_id
ORDER BY a.id DESC;
SELECT *
FROM asset_shortfall_v
WHERE shortfall < 0;
On the one hand we managed to DRY up that expression. On the other, the code became slightly longer. Looks like a tossup.
A query that lives on for months,
and is edited by maintenance engineers,
will tend to pick up additional COALESCE( ... )
and CASE WHEN clauses.
It won't take too many of those
to make this "name the column shortfall
!" approach
look winning. I'm just saying that maybe we've not
arrived at that point yet; the refactor might
be a little premature now.
I haven't tested this -- maybe the :{start,end}_date
parameters are trouble here?
Usually I'll leave out such filters from a VIEW,
and tack them onto the final SELECT at the last minute.
Here, they are perhaps nested inconveniently deeply within.
If it doesn't work out, you could
bury the JOIN portion of all this in a nested subquery,
and then a final WHERE shortfall < 0
would definitely work.
-
\$\begingroup\$ The second sentence could be modified to start with something along the lines of "bearing in mind the sql tag was recently added its wiki states: Structured Query Language is a language for interacting with relational databases. Read the tag wiki's guidelines for requesting SQL reviews: 1) Provide context, 2) Include the schema, 3) If asking about performance, include indexes and the output of EXPLAIN SELECT." \$\endgroup\$2024年03月03日 07:49:49 +00:00Commented Mar 3, 2024 at 7:49