The purpose of this SQL query would be to extract the record 10192 only, because the end_time
is both in the past and its foreign counterpart in the Node table still has the status
1. I don't want to select the record 10111 because it hasn't expired.
I've just finished writing this SQL query, it works! Although I'm sure it can be written better and I'm interested to here what you have to say.
SELECT MAX(l.id) AS highestID, l.nid, n.status
FROM mbl AS l
INNER JOIN node AS n
ON l.nid = n.nid
WHERE n.status = 1
AND ( SELECT end_time
FROM mbl
WHERE id = ( SELECT MAX(id)
FROM mbl
WHERE nid = l.nid
GROUP BY nid
)
) <= NOW()
GROUP BY l.nid
Mlb table data:
id | nid | start_time | end_time 105 | 5 | 20/07/2014 01:00:00 | 24/07/2014 01:00:00 589 | 52 | 20/07/2014 01:00:00 | 31/07/2015 01:32:00 10108 | 874 | 20/07/2014 01:00:00 | 24/07/2014 01:00:00 10111 | 874 | 20/07/2014 01:00:00 | 24/07/2015 01:00:00 10156 | 967 | 20/07/2012 01:00:00 | 24/07/2013 01:00:00 10161 | 967 | 20/07/2013 01:00:00 | 20/07/2014 01:00:00 10192 | 967 | 20/07/2014 01:00:00 | 24/07/2014 01:00:00
Node table data:
nid | status 5 | 0 52 | 1 874 | 1 967 | 1
3 Answers 3
First, let's go over some problems in your original query:
SELECT MAX(l.id) AS highestID, l.nid, n.status
..............
GROUP BY l.nid
... if there's more than one n.status
for a given l.nid
, you get a "random" row, decided by the optimizer (and not controllable by you). I'm presuming that you only have one row here (we'll exploit this in a bit).
(SELECT end_time
FROM mbl
WHERE id = (SELECT MAX(id)
FROM mbl
WHERE nid = l.nid
GROUP BY nid)) <= NOW()
I'm presuming that id
is still some sort of autogenerated value. The problem is, outside of some very specific contexts, the only thing a db-generated id should be used for is its uniqueness. Especially in situations where rows may be updated, max/min/ordering by ids rapidly becomes irrelevant. You seem to be attempting to make sure the most-recently-added row has an end_time
value in the past - if you have some sort of inserted_at
or similar timestamp, that would be a better column to use. It would also be helpful to know why you're looking for the greatest id in this case. (However, as I don't know the other columns in you table, I'm stuck with it for the moment).
It turns out that your statement can be reduced to this:
SELECT Most_Recent.highestId, Most_Recent.nid, Node.status
FROM (SELECT nid, MAX(id) AS highestId
FROM Mbl
GROUP BY nid) Most_Recent
JOIN Node
ON Node.nid = Most_Recent.nid
JOIN Mbl
ON Mbl.nid = Most_Recent.nid
AND Mbl.id = Most_Recent.highestId
AND Mbl.end_time <= NOW()
.... which not only eliminates a table reference, but your original version had issues with potential RBAR (row-by-agonizing-row)/Cartesian product plans. Note that we're still required to join to Mbl
twice, with the second time being solely to make sure that the most-recent row also has an end_time
value in the past (this can't be done in the subquery, because that would collect the most-recent-id of rows with end_time
in the past)
-
\$\begingroup\$ The Mbl table has the columns
id
,nid
,start_time
andend_time
. The Mbl table may have multiple rows with the samenid
but they may have expired (theend_time
is in the past). I'm joining the Node table to access thestatus
column because I'm only interested in the rows with astatus
of 1. I don't have aninserted_at
column in any of my tables. \$\endgroup\$thebubbles– thebubbles2014年07月31日 13:51:44 +00:00Commented Jul 31, 2014 at 13:51 -
2\$\begingroup\$ @thebubbles - then perhaps we should get some sample starting data and desired results? I have a feeling your query isn't matching your expectations, or what you really want/need. For instance, you could add
AND Node.status = 1
as a condition to the join... \$\endgroup\$Clockwork-Muse– Clockwork-Muse2014年07月31日 14:00:52 +00:00Commented Jul 31, 2014 at 14:00 -
\$\begingroup\$ I've added more detail to my question. \$\endgroup\$thebubbles– thebubbles2014年07月31日 15:19:13 +00:00Commented Jul 31, 2014 at 15:19
First thing's first. The indentation is all jacked up. There's not really a standard for formatting SQL, but as a rule of thumb, Select
, From
, and Where
for any (sub)query should be at the same indentation level.
SELECT MAX(l.id) AS highestID, l.nid, n.status
FROM mbl AS l
INNER JOIN node AS n
ON l.nid = n.nid
WHERE n.status = 1
AND (
SELECT end_time
FROM mbl
WHERE id = (
SELECT MAX(id)
FROM mbl
WHERE nid = l.nid
GROUP BY nid
)
) <= NOW()
GROUP BY l.nid
-
1\$\begingroup\$ Thanks, I just copied and pasted it from my code. It's cleaned up as per your answer! \$\endgroup\$thebubbles– thebubbles2014年07月31日 15:21:16 +00:00Commented Jul 31, 2014 at 15:21
-
\$\begingroup\$ And that's exactly the right thing to do here @thebubbles. We want to see your real code. We make bad code better and good code great. =) \$\endgroup\$RubberDuck– RubberDuck2014年07月31日 15:28:31 +00:00Commented Jul 31, 2014 at 15:28
-
\$\begingroup\$
FROM
,WHERE
, andGROUP BY
should definitely be at the same indentation level. Personally, I would choose to indentINNER JOIN
relative to that, and outdentSELECT
. \$\endgroup\$200_success– 200_success2014年07月31日 19:16:02 +00:00Commented Jul 31, 2014 at 19:16
Aliases
This may seem like nitpicking, but since you already got two good answers, I figured I would point out this only.
FROM mbl AS l
INNER JOIN node AS n
l
and n
mean nothing of themselves. Think of Mr. Maintainer looking back at your script, scratching his head. It may not seem like a big deal in this small of a query, but I have seen this sort of notation used in large scripts and it becomes very confusing. With these table names being only 3-4 letters long, I don't think an alias is even relevant.
-
1\$\begingroup\$ I completely agree with you. I've altered a few things in the SQL to preserve privacy of the project it's being used in. The original naming conventions may allow someone to match it to the project it's being used on. \$\endgroup\$thebubbles– thebubbles2014年07月31日 15:55:09 +00:00Commented Jul 31, 2014 at 15:55
id
referencing? Remember, auto-generated ids are essentially a shortcut/convenience; you're almost always going to have some collection of data that functions as a unique value anyways (...which might be temporal or otherwise updateable, so they're really handy, but in most cases are essentially derived connectors). Are you trying to get the most recently inserted row (id tricks are hacky for this)? The most recently completed row (maxend_time
in the past)? The most recently started row? Etc... \$\endgroup\$end_time
is in the past and thestatus
is 1 in the Node table. \$\endgroup\$