Query with multiple foreign keys to the same table

Question 1

I have a primary table, REP_ASSIST.ES_LOG with multiple columns that are foreign keys to another table, REP_ASSIST.ES_DOCUMENT_STATUSES. Here is an example of what my main table looks like:

+-------------------------+-----------+-------------------------+-----------------------+
| created_at | filing_id | prior_dd_rcvd_status_id | new_dd_rcvd_status_id |
+-------------------------+-----------+-------------------------+-----------------------+
| 2019年09月04日 10:58:48.000 | 988 | 2 | 2 |
| 2019年09月04日 10:47:03.000 | 988 | 1 | 2 |
| 2019年08月28日 23:56:47.000 | 988 | null | 1 |
+-------------------------+-----------+-------------------------+-----------------------+

Both the prior_dd_rcvd_status_id and new_dd_rcvd_status_id are foreign keys to the other table, a sample of which is here:

+------------------+--------------------+
| dd_doc_status_id | dd_doc_status_name |
+------------------+--------------------+
| 1 | RECEIVED |
| 2 | MISSING |
| 3 | NOT_NEEDED |
| 4 | UNKNOWN |
+------------------+--------------------+

I need to pull in the values from the lookup table for a printable report; right now I'm using a subquery to get the dd_doc_status_name as a user-friendly name instead of the dd_doc_status_id.

SELECT created_at,
 filing_id,
 (
 SELECT dd_doc_status_name
 FROM REP_ASSIST.ES_DOCUMENT_STATUSES
 WHERE ES_DOCUMENT_STATUSES.dd_doc_status_id = prior_dd_rcvd_status_id
 ) AS 'prior_dd_status',
 (
 SELECT dd_doc_status_name
 FROM REP_ASSIST.ES_DOCUMENT_STATUSES
 WHERE ES_DOCUMENT_STATUSES.dd_doc_status_id = new_dd_rcvd_status_id
 ) AS 'new_dd_status'
FROM REP_ASSIST.ES_LOG
WHERE filing_id = 988;

However, I'm concerned that using subqueries like this may be a bit of a hack and affect performance (especially if I end up adding additional columns that need this information).

I'm getting the correct results (see below), but I'd like to know if there is a more efficient and more standard way.

+-------------------------+-----------+----------------------+--------------------+
| created_at | filing_id | prior_dd_rcvd_status | new_dd_rcvd_status |
+-------------------------+-----------+----------------------+--------------------+
| 2019年09月04日 10:58:48.000 | 988 | MISSING | MISSING |
| 2019年09月04日 10:47:03.000 | 988 | RECEIVED | MISSING |
| 2019年08月28日 23:56:47.000 | 988 | null | RECEIVED |
+-------------------------+-----------+----------------------+--------------------+

Question 2

This is SQL will not work on MySQL as the columns in the subqueries are unknown. Regarding the suggestions, just 2 left joins on your ES_DOCUMENT_STATUSES table instead of the subqueries will be fine, I don't know about sql-server so forgive me from putting a complete answer.

Question 3

Here is that fiddle for sql server: sqlfiddle.com/#!18/bdf74/5

Question 4

@Accountantم the query in question does not work, OP should edit his field names. I'm voting to close this question for not working SQL.

Question 5

When you edit your question, please update your title to describe the purpose of the code, rather than the mechanism. We really need to understand the motivational context to give good reviews. Thanks!

Question 6

@dfhwze - Thank you for correcting the fiddle for SQL Server.

Question 7

I've updated the Fiddle with your initial query and 2 alternatives.

This is the initial query plan:

enter image description here

Using OUTER APPLY instead of the nested SELECT statements.

select created_at, prior_dd_rcvd_status_id, 
 new_dd_rcvd_status_id, prior_dd_status, new_dd_status, filing_id
from ES_LOG
outer apply (
 select dd_doc_status_name as prior_dd_status 
 from ES_DOCUMENT_STATUSES where dd_doc_status_id = prior_dd_rcvd_status_id
) a
outer apply (
 select dd_doc_status_name as new_dd_status 
 from ES_DOCUMENT_STATUSES where dd_doc_status_id = new_dd_rcvd_status_id
) b
where filing_id = 988;

The query plan gets simplified:

enter image description here

And using LEFT OUTER JOIN instead of the OUTER APPLY.

select created_at, prior_dd_rcvd_status_id, 
 new_dd_rcvd_status_id, 
 a.dd_doc_status_name as prior_dd_status, 
 b.dd_doc_status_name as new_dd_status, 
 filing_id
from ES_LOG
left outer join ES_DOCUMENT_STATUSES a 
 on a.dd_doc_status_id = prior_dd_rcvd_status_id
left outer join ES_DOCUMENT_STATUSES b 
 on b.dd_doc_status_id = new_dd_rcvd_status_id
where filing_id = 988;

With an even more simplified query plan:

enter image description here

However, as you can see, the table scans take most of the time. So I'm not sure how much performance you could gain by picking either of the alternatives.

Further improvements require the use of indexes (Table Scan vs Index Scan). Deciding which index to provide depends not only on this query, but also the general design of how these tables will be used for other queries and CRUD operations. I would advise you learn about them.

Question 8

Great answer! One thing that is probably worth calling out is that with the volumes of these tables, SQL Server is pretty much guaranteed to nested loop and not try to do anything fancy with them. If production table sizes are different then this might be very different, and is at least a grain of salt with looking at the plans.

Question 9

However, I'm concerned that using subqueries like this may be a bit of a hack and affect performance (especially if I end up adding additional columns that need this information).

Whenever you find yourself including a subquery as part of your SELECT list, you're right to question whether or not you're doing the correct thing. These can often suffer from performance issues, and are much harder to read and troubleshoot.

I wouldn't worry as much about the second half of your question:

especially if I end up adding additional columns that need this information

Unless you know that you're likely to do this in the future, you're likely micro- and pre-optimizing.

That being said, ultimately your solution is very straightforward. All you want is a join from table A to table B on two different columns. You might recognize that what you actually need is two joins from table A to table B. That would just look like this:

SELECT created_at,
 filing_id,
 PriorStatuses.dd_doc_status_name prior_dd_status,
 NewStatuses.dd_doc_status_name new_dd_status
 FROM REP_ASSIST.ES_LOG
 LEFT OUTER JOIN REP_ASSIST.ES_DOCUMENT_STATUSES PriorStatuses
 ON ES_LOG.prior_dd_rcvd_status_id = PriorStatuses.dd_doc_status_id
 LEFT OUTER JOIN REP_ASSIST.ES_DOCUMENT_STATUSES NewStatuses
 ON ES_LOG.new_dd_rcvd_status_id = NewStatuses.dd_doc_status_id
 WHERE filing_id = 988;

I did my best to come up with meaningful names based on context, but you should just pick some domain-relevant alias as appropriate.

If you do find yourself with a really large number of these columns, you might want to consider a new table design. For example, it might then be easier to have a table like this:

CREATE TABLE REP_ASSIST.ES_LOG_STATUSES
(
 filing_id bigint NOT NULL,
 doc_status_type nvarchar(50) NOT NULL,
 status_id bigint NOT NULL
);

Then you can add new status types (or whatever the appropriate domain terminology would be) ad-hoc to this table, and your query then becomes something like this

SELECT ES_LOG.created_at,
 ES_LOG.filing_id,
 ES_LOG_STATUSES.doc_status_type,
 ES_DOCUMENT_STATUSES.dd_doc_status_name
 FROM REP_ASSIST.ES_LOG
 INNER JOIN REP_ASSIST.ES_LOG_STATUSES
 ON ES_LOG.filing_id = ES_LOG_STATUSES.filing_id
 LEFT OUTER JOIN REP_ASSIST.ES_DOCUMENT_STATUSES 
 ON ES_LOG_STATUSES.status_id = ES_DOCUMENT_STATUSES.status_id
 WHERE ES_LOG.filing_id = 988;

If its important to have the output be one row per filing_id instead of one row per filing_id and doc_status_type combination, you could always PIVOT your result set. If the set of possible status types is unknown or changes rapidly, there are plenty of ways to do that dynamically

dfhwze dfhwze 14.1k3 gold badges40 silver badges101 bronze badges · Answer 1 · 2019-09-06 14:28:57Z

I've updated the Fiddle with your initial query and 2 alternatives.

This is the initial query plan:

enter image description here

Using OUTER APPLY instead of the nested SELECT statements.

select created_at, prior_dd_rcvd_status_id, 
 new_dd_rcvd_status_id, prior_dd_status, new_dd_status, filing_id
from ES_LOG
outer apply (
 select dd_doc_status_name as prior_dd_status 
 from ES_DOCUMENT_STATUSES where dd_doc_status_id = prior_dd_rcvd_status_id
) a
outer apply (
 select dd_doc_status_name as new_dd_status 
 from ES_DOCUMENT_STATUSES where dd_doc_status_id = new_dd_rcvd_status_id
) b
where filing_id = 988;

The query plan gets simplified:

enter image description here

And using LEFT OUTER JOIN instead of the OUTER APPLY.

select created_at, prior_dd_rcvd_status_id, 
 new_dd_rcvd_status_id, 
 a.dd_doc_status_name as prior_dd_status, 
 b.dd_doc_status_name as new_dd_status, 
 filing_id
from ES_LOG
left outer join ES_DOCUMENT_STATUSES a 
 on a.dd_doc_status_id = prior_dd_rcvd_status_id
left outer join ES_DOCUMENT_STATUSES b 
 on b.dd_doc_status_id = new_dd_rcvd_status_id
where filing_id = 988;

With an even more simplified query plan:

enter image description here

However, as you can see, the table scans take most of the time. So I'm not sure how much performance you could gain by picking either of the alternatives.

Further improvements require the use of indexes (Table Scan vs Index Scan). Deciding which index to provide depends not only on this query, but also the general design of how these tables will be used for other queries and CRUD operations. I would advise you learn about them.

Great answer! One thing that is probably worth calling out is that with the volumes of these tables, SQL Server is pretty much guaranteed to nested loop and not try to do anything fancy with them. If production table sizes are different then this might be very different, and is at least a grain of salt with looking at the plans.

Dan Oberlam Dan Oberlam 8,0492 gold badges33 silver badges74 bronze badges · Answer 2 · 2019-09-06 14:29:30Z

However, I'm concerned that using subqueries like this may be a bit of a hack and affect performance (especially if I end up adding additional columns that need this information).

Whenever you find yourself including a subquery as part of your SELECT list, you're right to question whether or not you're doing the correct thing. These can often suffer from performance issues, and are much harder to read and troubleshoot.

I wouldn't worry as much about the second half of your question:

especially if I end up adding additional columns that need this information

Unless you know that you're likely to do this in the future, you're likely micro- and pre-optimizing.

That being said, ultimately your solution is very straightforward. All you want is a join from table A to table B on two different columns. You might recognize that what you actually need is two joins from table A to table B. That would just look like this:

SELECT created_at,
 filing_id,
 PriorStatuses.dd_doc_status_name prior_dd_status,
 NewStatuses.dd_doc_status_name new_dd_status
 FROM REP_ASSIST.ES_LOG
 LEFT OUTER JOIN REP_ASSIST.ES_DOCUMENT_STATUSES PriorStatuses
 ON ES_LOG.prior_dd_rcvd_status_id = PriorStatuses.dd_doc_status_id
 LEFT OUTER JOIN REP_ASSIST.ES_DOCUMENT_STATUSES NewStatuses
 ON ES_LOG.new_dd_rcvd_status_id = NewStatuses.dd_doc_status_id
 WHERE filing_id = 988;

I did my best to come up with meaningful names based on context, but you should just pick some domain-relevant alias as appropriate.

If you do find yourself with a really large number of these columns, you might want to consider a new table design. For example, it might then be easier to have a table like this:

CREATE TABLE REP_ASSIST.ES_LOG_STATUSES
(
 filing_id bigint NOT NULL,
 doc_status_type nvarchar(50) NOT NULL,
 status_id bigint NOT NULL
);

Then you can add new status types (or whatever the appropriate domain terminology would be) ad-hoc to this table, and your query then becomes something like this

SELECT ES_LOG.created_at,
 ES_LOG.filing_id,
 ES_LOG_STATUSES.doc_status_type,
 ES_DOCUMENT_STATUSES.dd_doc_status_name
 FROM REP_ASSIST.ES_LOG
 INNER JOIN REP_ASSIST.ES_LOG_STATUSES
 ON ES_LOG.filing_id = ES_LOG_STATUSES.filing_id
 LEFT OUTER JOIN REP_ASSIST.ES_DOCUMENT_STATUSES 
 ON ES_LOG_STATUSES.status_id = ES_DOCUMENT_STATUSES.status_id
 WHERE ES_LOG.filing_id = 988;

If its important to have the output be one row per filing_id instead of one row per filing_id and doc_status_type combination, you could always PIVOT your result set. If the set of possible status types is unknown or changes rapidly, there are plenty of ways to do that dynamically

Stack Exchange Network

Query with multiple foreign keys to the same table

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Query with multiple foreign keys to the same table

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions