Extracting certain records

Question 1

The purpose of this SQL query would be to extract the record 10192 only, because the end_time is both in the past and its foreign counterpart in the Node table still has the status 1. I don't want to select the record 10111 because it hasn't expired.

I've just finished writing this SQL query, it works! Although I'm sure it can be written better and I'm interested to here what you have to say.

SELECT MAX(l.id) AS highestID, l.nid, n.status
 FROM mbl AS l
 INNER JOIN node AS n
 ON l.nid = n.nid
 WHERE n.status = 1
 AND ( SELECT end_time 
 FROM mbl
 WHERE id = ( SELECT MAX(id) 
 FROM mbl
 WHERE nid = l.nid
 GROUP BY nid
 )
 ) <= NOW()
 GROUP BY l.nid

Mlb table data:

id | nid | start_time | end_time
105 | 5 | 20/07/2014 01:00:00 | 24/07/2014 01:00:00
589 | 52 | 20/07/2014 01:00:00 | 31/07/2015 01:32:00
10108 | 874 | 20/07/2014 01:00:00 | 24/07/2014 01:00:00
10111 | 874 | 20/07/2014 01:00:00 | 24/07/2015 01:00:00 
10156 | 967 | 20/07/2012 01:00:00 | 24/07/2013 01:00:00
10161 | 967 | 20/07/2013 01:00:00 | 20/07/2014 01:00:00
10192 | 967 | 20/07/2014 01:00:00 | 24/07/2014 01:00:00

Node table data:

nid | status
5 | 0
52 | 1
874 | 1
967 | 1

Question 2

Welcome to Code Review!

Question 3

And the results you want are? What does this actually tell you? What is id referencing? Remember, auto-generated ids are essentially a shortcut/convenience; you're almost always going to have some collection of data that functions as a unique value anyways (...which might be temporal or otherwise updateable, so they're really handy, but in most cases are essentially derived connectors). Are you trying to get the most recently inserted row (id tricks are hacky for this)? The most recently completed row (max end_time in the past)? The most recently started row? Etc...

Question 4

I want the most recent row where an end_time is in the past and the status is 1 in the Node table.

Question 5

First, let's go over some problems in your original query:

SELECT MAX(l.id) AS highestID, l.nid, n.status
..............
GROUP BY l.nid

... if there's more than one n.status for a given l.nid, you get a "random" row, decided by the optimizer (and not controllable by you). I'm presuming that you only have one row here (we'll exploit this in a bit).

(SELECT end_time 
 FROM mbl
 WHERE id = (SELECT MAX(id) 
 FROM mbl
 WHERE nid = l.nid
 GROUP BY nid)) <= NOW()

I'm presuming that id is still some sort of autogenerated value. The problem is, outside of some very specific contexts, the only thing a db-generated id should be used for is its uniqueness. Especially in situations where rows may be updated, max/min/ordering by ids rapidly becomes irrelevant. You seem to be attempting to make sure the most-recently-added row has an end_time value in the past - if you have some sort of inserted_at or similar timestamp, that would be a better column to use. It would also be helpful to know why you're looking for the greatest id in this case. (However, as I don't know the other columns in you table, I'm stuck with it for the moment).

It turns out that your statement can be reduced to this:

SELECT Most_Recent.highestId, Most_Recent.nid, Node.status
FROM (SELECT nid, MAX(id) AS highestId
 FROM Mbl
 GROUP BY nid) Most_Recent
JOIN Node
 ON Node.nid = Most_Recent.nid
JOIN Mbl
 ON Mbl.nid = Most_Recent.nid
 AND Mbl.id = Most_Recent.highestId
 AND Mbl.end_time <= NOW()

.... which not only eliminates a table reference, but your original version had issues with potential RBAR (row-by-agonizing-row)/Cartesian product plans. Note that we're still required to join to Mbl twice, with the second time being solely to make sure that the most-recent row also has an end_time value in the past (this can't be done in the subquery, because that would collect the most-recent-id of rows with end_time in the past)

Question 6

The Mbl table has the columns id, nid, start_time and end_time. The Mbl table may have multiple rows with the same nid but they may have expired (the end_time is in the past). I'm joining the Node table to access the status column because I'm only interested in the rows with a status of 1. I don't have an inserted_at column in any of my tables.

Question 7

@thebubbles - then perhaps we should get some sample starting data and desired results? I have a feeling your query isn't matching your expectations, or what you really want/need. For instance, you could add AND Node.status = 1 as a condition to the join...

Question 8

I've added more detail to my question.

Question 9

First thing's first. The indentation is all jacked up. There's not really a standard for formatting SQL, but as a rule of thumb, Select, From, and Where for any (sub)query should be at the same indentation level.

SELECT MAX(l.id) AS highestID, l.nid, n.status
FROM mbl AS l
INNER JOIN node AS n
 ON l.nid = n.nid
WHERE n.status = 1
 AND (
 SELECT end_time 
 FROM mbl
 WHERE id = (
 SELECT MAX(id) 
 FROM mbl
 WHERE nid = l.nid
 GROUP BY nid
 )
 ) <= NOW()
GROUP BY l.nid

Question 10

Thanks, I just copied and pasted it from my code. It's cleaned up as per your answer!

Question 11

And that's exactly the right thing to do here @thebubbles. We want to see your real code. We make bad code better and good code great. =)

Question 12

FROM, WHERE, and GROUP BY should definitely be at the same indentation level. Personally, I would choose to indent INNER JOIN relative to that, and outdent SELECT.

Question 13

Aliases

This may seem like nitpicking, but since you already got two good answers, I figured I would point out this only.

FROM mbl AS l
INNER JOIN node AS n

l and n mean nothing of themselves. Think of Mr. Maintainer looking back at your script, scratching his head. It may not seem like a big deal in this small of a query, but I have seen this sort of notation used in large scripts and it becomes very confusing. With these table names being only 3-4 letters long, I don't think an alias is even relevant.

Question 14

I completely agree with you. I've altered a few things in the SQL to preserve privacy of the project it's being used in. The original naming conventions may allow someone to match it to the project it's being used on.

Clockwork-Muse Clockwork-Muse 1,55510 silver badges10 bronze badges · Answer 1 · 2014-07-31 12:50:26Z

First, let's go over some problems in your original query:

SELECT MAX(l.id) AS highestID, l.nid, n.status
..............
GROUP BY l.nid

... if there's more than one n.status for a given l.nid, you get a "random" row, decided by the optimizer (and not controllable by you). I'm presuming that you only have one row here (we'll exploit this in a bit).

(SELECT end_time 
 FROM mbl
 WHERE id = (SELECT MAX(id) 
 FROM mbl
 WHERE nid = l.nid
 GROUP BY nid)) <= NOW()

I'm presuming that id is still some sort of autogenerated value. The problem is, outside of some very specific contexts, the only thing a db-generated id should be used for is its uniqueness. Especially in situations where rows may be updated, max/min/ordering by ids rapidly becomes irrelevant. You seem to be attempting to make sure the most-recently-added row has an end_time value in the past - if you have some sort of inserted_at or similar timestamp, that would be a better column to use. It would also be helpful to know why you're looking for the greatest id in this case. (However, as I don't know the other columns in you table, I'm stuck with it for the moment).

It turns out that your statement can be reduced to this:

SELECT Most_Recent.highestId, Most_Recent.nid, Node.status
FROM (SELECT nid, MAX(id) AS highestId
 FROM Mbl
 GROUP BY nid) Most_Recent
JOIN Node
 ON Node.nid = Most_Recent.nid
JOIN Mbl
 ON Mbl.nid = Most_Recent.nid
 AND Mbl.id = Most_Recent.highestId
 AND Mbl.end_time <= NOW()

.... which not only eliminates a table reference, but your original version had issues with potential RBAR (row-by-agonizing-row)/Cartesian product plans. Note that we're still required to join to Mbl twice, with the second time being solely to make sure that the most-recent row also has an end_time value in the past (this can't be done in the subquery, because that would collect the most-recent-id of rows with end_time in the past)

The Mbl table has the columns id, nid, start_time and end_time. The Mbl table may have multiple rows with the same nid but they may have expired (the end_time is in the past). I'm joining the Node table to access the status column because I'm only interested in the rows with a status of 1. I don't have an inserted_at column in any of my tables.
@thebubbles - then perhaps we should get some sample starting data and desired results? I have a feeling your query isn't matching your expectations, or what you really want/need. For instance, you could add AND Node.status = 1 as a condition to the join...

RubberDuck RubberDuck 31.1k6 gold badges73 silver badges176 bronze badges · Answer 2 · 2014-07-31 11:52:25Z

4

\$\begingroup\$

First thing's first. The indentation is all jacked up. There's not really a standard for formatting SQL, but as a rule of thumb, Select, From, and Where for any (sub)query should be at the same indentation level.

SELECT MAX(l.id) AS highestID, l.nid, n.status
FROM mbl AS l
INNER JOIN node AS n
 ON l.nid = n.nid
WHERE n.status = 1
 AND (
 SELECT end_time 
 FROM mbl
 WHERE id = (
 SELECT MAX(id) 
 FROM mbl
 WHERE nid = l.nid
 GROUP BY nid
 )
 ) <= NOW()
GROUP BY l.nid

Share

edited Jul 31, 2014 at 12:45

answered Jul 31, 2014 at 11:52

RubberDuck's user avatar

RubberDuck RubberDuck

31.1k6 gold badges73 silver badges176 bronze badges

\$\endgroup\$

3

1

\$\begingroup\$ Thanks, I just copied and pasted it from my code. It's cleaned up as per your answer! \$\endgroup\$

thebubbles
– thebubbles

2014年07月31日 15:21:16 +00:00
Commented Jul 31, 2014 at 15:21
\$\begingroup\$ And that's exactly the right thing to do here @thebubbles. We want to see your real code. We make bad code better and good code great. =) \$\endgroup\$

RubberDuck
– RubberDuck

2014年07月31日 15:28:31 +00:00
Commented Jul 31, 2014 at 15:28
\$\begingroup\$ FROM, WHERE, and GROUP BY should definitely be at the same indentation level. Personally, I would choose to indent INNER JOIN relative to that, and outdent SELECT. \$\endgroup\$

200_success
– 200_success

2014年07月31日 19:16:02 +00:00
Commented Jul 31, 2014 at 19:16

Add a comment |

Phrancis Phrancis 20.5k6 gold badges69 silver badges155 bronze badges · Answer 3 · 2014-07-31 15:48:05Z

Aliases

This may seem like nitpicking, but since you already got two good answers, I figured I would point out this only.

FROM mbl AS l
INNER JOIN node AS n

l and n mean nothing of themselves. Think of Mr. Maintainer looking back at your script, scratching his head. It may not seem like a big deal in this small of a query, but I have seen this sort of notation used in large scripts and it becomes very confusing. With these table names being only 3-4 letters long, I don't think an alias is even relevant.

I completely agree with you. I've altered a few things in the SQL to preserve privacy of the project it's being used in. The original naming conventions may allow someone to match it to the project it's being used on.

Stack Exchange Network

Extracting certain records

Mlb table data:

Node table data:

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Extracting certain records

Mlb table data:

Node table data:

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions