2
\$\begingroup\$

I have a complex date-and-time-matching query in the middle of another huge query. I'm matching two tables, the second one is LEFT JOINed to the first. If the time span is equal or greater in the second table, it should use those values. If the second table doesn't match, it should use the values from the first table.

With the provided test data, the values are returned from the following tables [1: X, 2: Y, 3: Y, 4: X].

The first query has the same logic repeated again (in the nested SELECT). If it needs to change, the programmer has to make the same change in both places or the results could be wrong. I also feel it's not obvious what all that logic is doing.

The second query uses two CROSS APPLYs to accomplish the same result. The first CROSS APPLY is just for the condition (which of the two tables to select from) and the second CROSS APPLY returns the values from the chosen table.

I like the separation of the second query because I think it's much clearer what the query is supposed to do, and it also runs faster! When the temporary tables already exist, the execution plan shows 82% for the first query and 18% for the second query.

What I don't like is that I have to repeat CASE [condition].[TableToUse] ... END for each column I want to return, and if I knew a way to simplify it, I would.

What else can I do to improve this query in terms of comprehension, future maintenance and speed?

Create test data:

IF OBJECT_ID('tempdb..#X') IS NULL
 BEGIN
 CREATE TABLE #X(
 [MatchID] INT
 ,[StartDateTime] DATETIME
 ,[EndDateTime] DATETIME
 ,[Value] INT
 )
 INSERT INTO #X
 VALUES (1, '2017-07-01 12:00:00', '2017-07-01 13:00:00', 11)
 ,(2, '2017-07-01 12:00:00', '2017-07-01 13:00:00', 13)
 ,(3, '2017-07-01 12:00:00', '2017-07-01 13:00:00', 17)
 ,(4, '2017-07-01 12:00:00', '2017-07-01 13:00:00', 19)
 END
IF OBJECT_ID('tempdb..#Y') IS NULL
 BEGIN
 CREATE TABLE #Y(
 [MatchID] INT
 ,[StartDateTime] DATETIME
 ,[EndDateTime] DATETIME
 ,[Comment] VARCHAR(MAX)
 )
 INSERT INTO #Y
 VALUES (1, '2017-07-01 12:00:00', '2017-07-01 12:30:00', 'Foo')
 ,(2, '2017-07-01 12:00:00', '2017-07-01 14:00:00', 'Bar')
 ,(3, '2017-07-01 13:00:00', '2017-07-01 14:00:00', 'Baz')
 END

First query:

SELECT [x].[MatchID]
 ,[result].[StartDateTime]
 ,[result].[EndDateTime]
 ,'2017-07-01 13:00:00' - [result].[StartDateTime] AS [TimeBefore1300]
 ,[result].[EndDateTime] - '2017-07-01 13:00:00' AS [TimeAfter1300]
 ,[x].[Value]
 ,[y].[Comment]
FROM #X [x]
LEFT JOIN #Y [y]
 ON [y].[MatchID] = [x].[MatchID]
JOIN (
 SELECT [x].[MatchID]
 ,CASE WHEN [y].[MatchID] IS NULL
 THEN [x].[StartDateTime]
 ELSE
 CASE WHEN [x].[EndDateTime] - [x].[StartDateTime]
 > [y].[EndDateTime] - [y].[StartDateTime]
 THEN [x].[StartDateTime]
 ELSE [y].[StartDateTime]
 END
 END AS [StartDateTime]
 ,CASE WHEN [y].[MatchID] IS NULL
 THEN [x].[EndDateTime]
 ELSE
 CASE WHEN [x].[EndDateTime] - [x].[StartDateTime]
 > [y].[EndDateTime] - [y].[StartDateTime]
 THEN [x].[EndDateTime]
 ELSE [y].[EndDateTime]
 END
 END AS [EndDateTime]
 FROM #X [x]
 LEFT JOIN #Y [y]
 ON [y].[MatchID] = [x].[MatchID]
) [result]
 ON [result].[MatchID] = [x].[MatchID]

Second query:

SELECT [x].[MatchID]
 ,[result].[StartDateTime]
 ,[result].[EndDateTime]
 ,'2017-07-01 13:00:00' - [result].[StartDateTime] AS [TimeBefore1300]
 ,[result].[EndDateTime] - '2017-07-01 13:00:00' AS [TimeAfter1300]
 ,[x].[Value]
 ,[y].[Comment]
FROM #X [x]
LEFT JOIN #Y [y]
 ON [y].[MatchID] = [x].[MatchID]
CROSS APPLY (
 SELECT CASE WHEN [y].[MatchID] IS NULL
 THEN 'x'
 ELSE
 CASE WHEN [x].[EndDateTime] - [x].[StartDateTime]
 > [y].[EndDateTime] - [y].[StartDateTime]
 THEN 'x'
 ELSE 'y'
 END
 END AS [TableToUse]
) [condition]
CROSS APPLY (
 SELECT CASE [condition].[TableToUse]
 WHEN 'x' THEN [x].[StartDateTime]
 WHEN 'y' THEN [y].[StartDateTime]
 END AS [StartDateTime]
 ,CASE [condition].[TableToUse]
 WHEN 'x' THEN [x].[EndDateTime]
 WHEN 'y' THEN [y].[EndDateTime]
 END AS [EndDateTime]
) [result]
asked Jul 28, 2017 at 4:01
\$\endgroup\$

1 Answer 1

3
\$\begingroup\$

The 1st query is overly complicated, no need join the tables twice.

Additionally this

 ,CASE WHEN [y].[MatchID] IS NULL
 THEN [x].[StartDateTime]
 ELSE
 CASE WHEN [x].[EndDateTime] - [x].[StartDateTime]
 > [y].[EndDateTime] - [y].[StartDateTime]
 THEN [x].[StartDateTime]
 ELSE [y].[StartDateTime]
 END
 END AS [StartDateTime]

can be simplified, because any calculation involving NULLs evaluates to unknown (branching to ELSE) anyway:

 ,CASE WHEN y.EndDateTime - y.StartDateTime
 >= x.EndDateTime - x.StartDateTime
 THEN y.StartDateTime
 ELSE x.StartDateTime
 END AS StartDateTime

And you can add comments to help explaining the logic or add hints for the programmer.

Finally:

SELECT MatchID
 ,StartDateTime
 ,EndDateTime
 ,'2017-07-01 13:00:00' - StartDateTime AS TimeBefore1300
 ,EndDateTime - '2017-07-01 13:00:00' AS TimeAfter1300
 ,Value
 ,Comment
FROM
 (
 SELECT x.MatchID
 ,x.Value
 ,y.Comment
 -- assign date of longer duration, same as next CASE
 ,CASE WHEN y.EndDateTime - y.StartDateTime
 >= x.EndDateTime - x.StartDateTime
 THEN y.StartDateTime
 ELSE x.StartDateTime
 END AS StartDateTime
 -- assign date of longer duration, same as previous CASE
 ,CASE WHEN y.EndDateTime - y.StartDateTime
 >= x.EndDateTime - x.StartDateTime
 THEN y.EndDateTime
 ELSE x.EndDateTime
 END AS EndDateTime
 FROM #X AS x
 LEFT JOIN #Y AS y
 ON y.MatchID = x.MatchID
 ) AS Result

I removed the [] because

  • they're not needed
  • it's much easier to write
  • it's better readable (at least for me)
  • it's Standard SQL

In your 2nd query you still have a repeated CASE, thus the logic must be changed in two places, too:

 CASE [condition].[TableToUse]
 WHEN 'x' THEN [x].[StartDateTime]
 WHEN 'y' THEN [y].[StartDateTime]
 END AS [StartDateTime]
 ,CASE [condition].[TableToUse]
 WHEN 'x' THEN [x].[EndDateTime]
 WHEN 'y' THEN [y].[EndDateTime]
 END AS [EndDateTime]

For me the 2nd query is less readable/understandable because of the nested CROSS APPLY (not CROSS JOIN, btw). You can simply highlight the Derived Table in #1 and execute it to check the results, but you can't in #2 (that's a reason why I sometimes still use Derived Tables instead of Common Table Expressons).

answered Jul 28, 2017 at 10:19
\$\endgroup\$
2
  • \$\begingroup\$ Why do you believe that using CROSS APPLY makes the queries less readable? \$\endgroup\$ Commented Aug 3, 2017 at 2:09
  • \$\begingroup\$ @CJDennis: Probably only habit. My main DBMS is Teradata which doesn't support Microsoft's CROSS APPLY (but an assigned column alias can be used directly, which is quite similar) , so i'm not very used to it. And I prefer that highlight a part of the query and run it :-) \$\endgroup\$ Commented Aug 3, 2017 at 9:29

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.