I have a complex date-and-time-matching query in the middle of another huge query. I'm matching two tables, the second one is LEFT JOIN
ed to the first. If the time span is equal or greater in the second table, it should use those values. If the second table doesn't match, it should use the values from the first table.
With the provided test data, the values are returned from the following tables [1: X, 2: Y, 3: Y, 4: X].
The first query has the same logic repeated again (in the nested SELECT
). If it needs to change, the programmer has to make the same change in both places or the results could be wrong. I also feel it's not obvious what all that logic is doing.
The second query uses two CROSS APPLY
s to accomplish the same result. The first CROSS APPLY
is just for the condition (which of the two tables to select from) and the second CROSS APPLY
returns the values from the chosen table.
I like the separation of the second query because I think it's much clearer what the query is supposed to do, and it also runs faster! When the temporary tables already exist, the execution plan shows 82% for the first query and 18% for the second query.
What I don't like is that I have to repeat CASE [condition].[TableToUse] ... END
for each column I want to return, and if I knew a way to simplify it, I would.
What else can I do to improve this query in terms of comprehension, future maintenance and speed?
Create test data:
IF OBJECT_ID('tempdb..#X') IS NULL
BEGIN
CREATE TABLE #X(
[MatchID] INT
,[StartDateTime] DATETIME
,[EndDateTime] DATETIME
,[Value] INT
)
INSERT INTO #X
VALUES (1, '2017-07-01 12:00:00', '2017-07-01 13:00:00', 11)
,(2, '2017-07-01 12:00:00', '2017-07-01 13:00:00', 13)
,(3, '2017-07-01 12:00:00', '2017-07-01 13:00:00', 17)
,(4, '2017-07-01 12:00:00', '2017-07-01 13:00:00', 19)
END
IF OBJECT_ID('tempdb..#Y') IS NULL
BEGIN
CREATE TABLE #Y(
[MatchID] INT
,[StartDateTime] DATETIME
,[EndDateTime] DATETIME
,[Comment] VARCHAR(MAX)
)
INSERT INTO #Y
VALUES (1, '2017-07-01 12:00:00', '2017-07-01 12:30:00', 'Foo')
,(2, '2017-07-01 12:00:00', '2017-07-01 14:00:00', 'Bar')
,(3, '2017-07-01 13:00:00', '2017-07-01 14:00:00', 'Baz')
END
First query:
SELECT [x].[MatchID]
,[result].[StartDateTime]
,[result].[EndDateTime]
,'2017-07-01 13:00:00' - [result].[StartDateTime] AS [TimeBefore1300]
,[result].[EndDateTime] - '2017-07-01 13:00:00' AS [TimeAfter1300]
,[x].[Value]
,[y].[Comment]
FROM #X [x]
LEFT JOIN #Y [y]
ON [y].[MatchID] = [x].[MatchID]
JOIN (
SELECT [x].[MatchID]
,CASE WHEN [y].[MatchID] IS NULL
THEN [x].[StartDateTime]
ELSE
CASE WHEN [x].[EndDateTime] - [x].[StartDateTime]
> [y].[EndDateTime] - [y].[StartDateTime]
THEN [x].[StartDateTime]
ELSE [y].[StartDateTime]
END
END AS [StartDateTime]
,CASE WHEN [y].[MatchID] IS NULL
THEN [x].[EndDateTime]
ELSE
CASE WHEN [x].[EndDateTime] - [x].[StartDateTime]
> [y].[EndDateTime] - [y].[StartDateTime]
THEN [x].[EndDateTime]
ELSE [y].[EndDateTime]
END
END AS [EndDateTime]
FROM #X [x]
LEFT JOIN #Y [y]
ON [y].[MatchID] = [x].[MatchID]
) [result]
ON [result].[MatchID] = [x].[MatchID]
Second query:
SELECT [x].[MatchID]
,[result].[StartDateTime]
,[result].[EndDateTime]
,'2017-07-01 13:00:00' - [result].[StartDateTime] AS [TimeBefore1300]
,[result].[EndDateTime] - '2017-07-01 13:00:00' AS [TimeAfter1300]
,[x].[Value]
,[y].[Comment]
FROM #X [x]
LEFT JOIN #Y [y]
ON [y].[MatchID] = [x].[MatchID]
CROSS APPLY (
SELECT CASE WHEN [y].[MatchID] IS NULL
THEN 'x'
ELSE
CASE WHEN [x].[EndDateTime] - [x].[StartDateTime]
> [y].[EndDateTime] - [y].[StartDateTime]
THEN 'x'
ELSE 'y'
END
END AS [TableToUse]
) [condition]
CROSS APPLY (
SELECT CASE [condition].[TableToUse]
WHEN 'x' THEN [x].[StartDateTime]
WHEN 'y' THEN [y].[StartDateTime]
END AS [StartDateTime]
,CASE [condition].[TableToUse]
WHEN 'x' THEN [x].[EndDateTime]
WHEN 'y' THEN [y].[EndDateTime]
END AS [EndDateTime]
) [result]
1 Answer 1
The 1st query is overly complicated, no need join the tables twice.
Additionally this
,CASE WHEN [y].[MatchID] IS NULL
THEN [x].[StartDateTime]
ELSE
CASE WHEN [x].[EndDateTime] - [x].[StartDateTime]
> [y].[EndDateTime] - [y].[StartDateTime]
THEN [x].[StartDateTime]
ELSE [y].[StartDateTime]
END
END AS [StartDateTime]
can be simplified, because any calculation involving NULLs evaluates to unknown
(branching to ELSE
) anyway:
,CASE WHEN y.EndDateTime - y.StartDateTime
>= x.EndDateTime - x.StartDateTime
THEN y.StartDateTime
ELSE x.StartDateTime
END AS StartDateTime
And you can add comments to help explaining the logic or add hints for the programmer.
Finally:
SELECT MatchID
,StartDateTime
,EndDateTime
,'2017-07-01 13:00:00' - StartDateTime AS TimeBefore1300
,EndDateTime - '2017-07-01 13:00:00' AS TimeAfter1300
,Value
,Comment
FROM
(
SELECT x.MatchID
,x.Value
,y.Comment
-- assign date of longer duration, same as next CASE
,CASE WHEN y.EndDateTime - y.StartDateTime
>= x.EndDateTime - x.StartDateTime
THEN y.StartDateTime
ELSE x.StartDateTime
END AS StartDateTime
-- assign date of longer duration, same as previous CASE
,CASE WHEN y.EndDateTime - y.StartDateTime
>= x.EndDateTime - x.StartDateTime
THEN y.EndDateTime
ELSE x.EndDateTime
END AS EndDateTime
FROM #X AS x
LEFT JOIN #Y AS y
ON y.MatchID = x.MatchID
) AS Result
I removed the []
because
- they're not needed
- it's much easier to write
- it's better readable (at least for me)
- it's Standard SQL
In your 2nd query you still have a repeated CASE, thus the logic must be changed in two places, too:
CASE [condition].[TableToUse]
WHEN 'x' THEN [x].[StartDateTime]
WHEN 'y' THEN [y].[StartDateTime]
END AS [StartDateTime]
,CASE [condition].[TableToUse]
WHEN 'x' THEN [x].[EndDateTime]
WHEN 'y' THEN [y].[EndDateTime]
END AS [EndDateTime]
For me the 2nd query is less readable/understandable because of the nested CROSS APPLY
(not CROSS JOIN
, btw). You can simply highlight the Derived Table in #1 and execute it to check the results, but you can't in #2 (that's a reason why I sometimes still use Derived Tables instead of Common Table Expressons).
-
\$\begingroup\$ Why do you believe that using
CROSS APPLY
makes the queries less readable? \$\endgroup\$CJ Dennis– CJ Dennis2017年08月03日 02:09:46 +00:00Commented Aug 3, 2017 at 2:09 -
\$\begingroup\$ @CJDennis: Probably only habit. My main DBMS is Teradata which doesn't support Microsoft's
CROSS APPLY
(but an assigned column alias can be used directly, which is quite similar) , so i'm not very used to it. And I prefer that highlight a part of the query and run it :-) \$\endgroup\$dnoeth– dnoeth2017年08月03日 09:29:43 +00:00Commented Aug 3, 2017 at 9:29