0

I have a table with 150 million rows with a non-clustered index on userId. I have to write a query to filter on userId and return all the rows if the productType is not SIG and return only rows greater than the last 90 days if the productType is SIG. All the columns from the table shoule be returned (unfortunately).

I created a similar table with about 10 million rows. Took me about 10 min to create it.

Execution plan: https://www.brentozar.com/pastetheplan/?id=HJM-XOq0j

The optimizer does an index seek on the userId and then does a key lookup to get the rest of the columns and also applying additional criteria. Is there a better way to do this?

set nocount on
Declare @userId UniqueIdentifier, @i Int = 0, @j Int = 1, @k Int = 1, @productType Varchar(100)
Declare @date datetime, @userId1 varchar(100)
Declare @type1 varchar(100),
@type2 varchar(100),
@type3 varchar(100),
@type4 varchar(100),
@type5 varchar(100),
@type6 varchar(100),
@type7 varchar(100),
@type8 varchar(100),
@type9 varchar(100),
@type10 varchar(100),
@type11 varchar(100),
@type12 varchar(100)
Create Table #tmpProducts
(
productId Int primary key Identity(1,1),
userId UniqueIdentifier,
productType varchar(100),
createdDate Datetime,
type1 varchar(100),
type2 varchar(100),
type3 varchar(100),
type4 varchar(100),
type5 varchar(100),
type6 varchar(100),
type7 varchar(100),
type8 varchar(100),
type9 varchar(100),
type10 varchar(100),
type11 varchar(100),
type12 varchar(100)
)
While @i<=100
begin
 --declare @msg varchar(100) = 'step' + convert(varchar(10),@i)
 -- RAISERROR (@msg, 0, 1) WITH NOWAIT
 set @userId = newId()
 set @userId1 = convert(varchar(100), @userId)
 set @j = 1
 set @k = 1
 while @j <= 5
 begin
 Set @productType = Case When @j = 1 Then 'SIG' 
 When @j = 2 Then 'FIG'
 When @j = 3 Then 'MKT'
 When @j = 4 Then 'LPP'
 When @j = 5 Then 'POC'
 End
 while @k < 100000
 begin
 print @k
 set @date = case when @k < 1000 Then DateAdd(day,-180,getdate())
 When @k > 1000 and @k < 99500 Then DateAdd(day,-91,getdate())
 else getdate()
 end
 set @type1 = @userId1 + convert(varchar(10),@k) + '1'
 set @type2 = @userId1 + convert(varchar(10),@k) + '2'
 set @type3 = @userId1 + convert(varchar(10),@k) + '3'
 set @type4 = @userId1 + convert(varchar(10),@k) + '4'
 set @type5 = @userId1 + convert(varchar(10),@k) + '5'
 set @type6 = @userId1 + convert(varchar(10),@k) + '6'
 set @type7 = @userId1 + convert(varchar(10),@k) + '7'
 set @type8 = @userId1 + convert(varchar(10),@k) + '8'
 set @type9 = @userId1 + convert(varchar(10),@k) + '9'
 set @type10 = @userId1 + convert(varchar(10),@k) + '10'
 set @type11 = @userId1 + convert(varchar(10),@k) + '11'
 set @type12 = @userId1 + convert(varchar(10),@k) + '12'
 Insert Into #tmpProducts values (@userId, @productType, @date,@type1, @type2,@type3,@type4,@type5,@type6,@type7,@type8,@type9,@type10,@type11,@type12)
 set @k = @k + 1
 end
 set @j = @j + 1
 end
 set @i = @i + 1
End
Create nonclustered index idx_ptype on #tmpProducts(userId)
Select * 
 from #tmpProducts
 Where userId = '78EAC797-4399-4D4A-A8E3-54F2732EE0E0' --replace the memberId with the id from your dataset
 And (productType != 'SIG'
 or (productType = 'SIG'
 and createdDate >= Convert(date, dateadd(day,-90,getdate()))
 )
 )
--drop table if exists #tmpProducts
asked Feb 27, 2023 at 18:20
10
  • This sounds like a good job for the index change you asked about. Did that not get what you'd hoped for? Commented Feb 27, 2023 at 19:37
  • The table has 150 million rows - what is the maximum number of rows that any one userId has? And also for this "per user" number how many rows will meet the =SIG condition and how many not? And of the rows meeting the =SIG condition how many meet the " last 90 day" condition? Commented Feb 27, 2023 at 19:44
  • Even with the index change, i'm still getting a key lookup. Do I need to live with it? Can I use a CTE to use the narrow index and then join it back to the table on the clustered index to get all the other columns? Commented Feb 27, 2023 at 19:46
  • 2
    " use a CTE to use the narrow index and then join it back to the table on the clustered index to get all the other columns" - you are just doing exactly what the key lookup is doing then. Commented Feb 27, 2023 at 19:48
  • @MartinSmith The maximum number of rows a user has is 20k currently. SIG for the user that has 20k rows is 5k and of those 5k rows, 1k is in the last 90 days. The average number of rows per user is around 1k rows and the average SIG rows are 500 and average SIG rows in the last 90 days are 100. Commented Feb 27, 2023 at 19:49

0

Know someone who can answer? Share a link to this question via email, Twitter, or Facebook.

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.