
I have an SQL query that fetches some stats, and those stats are displayed on a dashboard. The dashboard runs this query every 15 seconds to fetch the most up-to-date data.

The query is very CPU-heavy for the database server (according to the stats provided by Azure SQL Server) and eats 15% of the server's available CPU capacity.

The query runs on a log (which records messages received from users and the answers returned by a chatbot). It returns four pieces of information, namely conversations, messages, licenses and failed:

SELECT
    -- Total unique `conversationId`s:
    COUNT(DISTINCT conversationId) AS conversations,
    -- Total records in table:
    COUNT(*) AS messages,
    -- Total unique `licenseId`s:
    COUNT(DISTINCT licenseId) AS licenses,
    -- Total messages where `intent` meets certain criteria:
    (
        SELECT COUNT(*)
        FROM MessageLog
        WHERE intent LIKE 'None' OR intent IS NULL OR intent LIKE ''
    ) AS failed
FROM MessageLog

What do you think is the most CPU intensive part of that query, and what are the possible ways to consume less CPU?

Update: Here's the DB schema:

CREATE TABLE [dbo].[MessageLog](
 [id] [int] IDENTITY(1,1) NOT NULL,
 [licenseId] [int] NOT NULL,
 [message] [nvarchar](1000) NOT NULL,
 [timestamp] [datetime] NOT NULL,
 [intent] [nvarchar](70) NULL,
 [entities] [nvarchar](3000) NULL,
 [conversationId] [nvarchar](100) NULL,
 [confidence] [float] NULL,
 [failed] [bit] NOT NULL,
 [ipAddress] [varchar](25) NULL,
 [userAgent] [varchar](256) NULL,
 [nlpExecutionTime] [int] NULL,
 [answer] [nvarchar](1000) NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[MessageLog] ADD CONSTRAINT [DF_MessageLog_failed] DEFAULT ((0)) FOR [failed]
GO

There are non-clustered indexes on id, intent, conversationId and licenseId, but they don't seem to improve performance.

asked Apr 9, 2018 at 8:03
\$\endgroup\$
7
  • 3
    \$\begingroup\$ If you don't use any patterns/wildcards, what's the purpose of LIKE then? Why don't use =? \$\endgroup\$ Commented Apr 9, 2018 at 8:52
  • \$\begingroup\$ Also without knowing the exact table schema I don't think there much anyone can say about this query. Please add it too. \$\endgroup\$ Commented Apr 9, 2018 at 8:55
  • 2
    \$\begingroup\$ Did you try to execute each statement separatly to determine if there's one that consume more than the other ? \$\endgroup\$ Commented Apr 9, 2018 at 10:28
  • 1
    \$\begingroup\$ is that the actual query that you are using in your production code? have you changed the query in order to post it here? because your like statement may be slow if you are not calling it as you have posted here. it is important that we have the actual code. \$\endgroup\$ Commented Apr 9, 2018 at 14:14
  • 1
    \$\begingroup\$ @K48 - the LIKE will definitely impact the query even without a wildcard in the value. Show the query plan and compare each option. \$\endgroup\$ Commented Apr 16, 2018 at 17:12

3 Answers


This seems problematic because there isn't any correlation between the outer MessageLog dataset and the one in the inner query. The consequence, I believe, is that the inner query (the one in the SELECT clause) is executed once for every row in MessageLog, hence the high CPU:

(
    SELECT COUNT(*)
    FROM MessageLog
    WHERE intent LIKE 'None' OR intent IS NULL OR intent LIKE ''
) AS failed

If you try the following, your query will be working with only one dataset (MessageLog in the outer query), effectively counting the data once instead of multiple times.

COUNT(
    CASE
        WHEN intent = 'None' THEN 1
        WHEN intent = '' THEN 1
        WHEN intent IS NULL THEN 1
        ELSE NULL
    END
) AS failed

The CASE statement could be written more elegantly.
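For instance, the three WHEN branches can be collapsed into a single predicate, and the whole query then reads the table only once. A sketch of the combined rewrite (not tested against the original data):

```sql
SELECT
    COUNT(DISTINCT conversationId) AS conversations,
    COUNT(*) AS messages,
    COUNT(DISTINCT licenseId) AS licenses,
    COUNT(CASE WHEN intent = 'None' OR intent = '' OR intent IS NULL
               THEN 1 END) AS failed
FROM MessageLog;
```

COUNT ignores NULLs, so rows that don't match the predicate simply aren't counted.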

answered Apr 9, 2018 at 23:28
  • Thanks! That seems to have improved performance by about 40%. I made an edit to your code (changed 0 to NULL and added END). Commented Apr 10, 2018 at 3:22

Guessing:

What do you think is the most CPU intensive part of that query, and what are the possible ways to consume less CPU?

We don't have to guess; you can test it using SSMS and execution plans.

Capacity:

The query is very CPU-heavy for the database server (according to the stats provided by Azure SQL Server), and eats 15% of the available CPU capacity of the server.

15% of 0 is still 0. How many CPUs do you have for this instance? If you only have 1 CPU, 15% could be considered significant; however, the issue is most likely not this query but the amount of CPU you've allocated to the instance.

Fundamentals:

When you intend a surrogate key to act as the primary key, you still have to declare a primary key on the table. Without one (or any clustered index), you are currently querying a heap.

Please start by adding a primary key to the table. IDENTITY(1,1) is not a primary key; it is a candidate for one. You still have to tell SQL Server that you want it to be the primary key.
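Assuming id is meant to be the key, a minimal sketch (the constraint name is illustrative):

```sql
ALTER TABLE [dbo].[MessageLog]
    ADD CONSTRAINT [PK_MessageLog] PRIMARY KEY CLUSTERED ([id]);
```

This also turns the heap into a clustered index, which the nonclustered indexes then reference by key instead of by RID.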

Predicate:

How large is this table? It could be the case that you have 4 billion records in the table and you're asking this question without a filter on date range. Do you intend for this to be a question for all time? Is there a process that truncates this table, or starts it over in some fashion?
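If the dashboard only needs recent figures, a date-bounded variant would look something like this (the 24-hour window is an assumption; use whatever range the dashboard actually needs):

```sql
SELECT COUNT(*) AS messages
FROM MessageLog
WHERE [timestamp] >= DATEADD(HOUR, -24, GETUTCDATE());
```

With a supporting index on timestamp, this turns a full scan into a range seek.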

Indexes:

I would create a nonclustered index on intent; you could even try a filtered index on the column.
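A sketch of both options (index names are illustrative; note that a filtered index predicate cannot contain OR, so the IS NULL case would need separate handling):

```sql
CREATE NONCLUSTERED INDEX [IX_MessageLog_intent]
    ON [dbo].[MessageLog] ([intent]);

-- Filtered variant covering only the "failed" values:
CREATE NONCLUSTERED INDEX [IX_MessageLog_intent_failed]
    ON [dbo].[MessageLog] ([intent])
    WHERE [intent] IN ('None', '');
```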

Performance tuning

You will want to compare the execution plans; find the Azure equivalents of DBCC DROPCLEANBUFFERS and DBCC FREEPROCCACHE so each test run starts cold. Subtree cost is what you're looking to improve; the time it takes to return is a non-sequitur. You might need to provide table hints to force the use of the indexes, to validate whether an index is useful or not.
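A sketch of the measurement setup; in Azure SQL Database the plan-cache equivalent is a database-scoped configuration (DBCC DROPCLEANBUFFERS is only available on certain Azure tiers, so treat this as an assumption to verify for your service level):

```sql
-- Report CPU and elapsed time per statement:
SET STATISTICS TIME ON;

-- Clear the plan cache between test runs (Azure SQL Database):
ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE;
```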

Tempdb:

What does your tempdb look like? I only ask because of the multiple DISTINCTs you use in the query. If there is a lot of data in the table, it is spilling to tempdb and disk. What does the I/O look like for tempdb, and what does the space look like?
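You can check tempdb space usage from the DMVs, for example (the counts are in 8 KB pages):

```sql
SELECT
    SUM(unallocated_extent_page_count) * 8 / 1024 AS free_mb,
    SUM(internal_object_reserved_page_count) * 8 / 1024 AS internal_objects_mb
FROM tempdb.sys.dm_db_file_space_usage;
```

Internal objects are where sorts and hash spills from the DISTINCT aggregates would show up.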

Code Changes

I would also consider changing the code; @Malachi offers a sound suggestion. SQL Server window functions are worth a look as well.

answered Apr 9, 2018 at 17:55

I am not sure at the moment how this would affect performance, but it would be simpler to use an IN instead of two OR conditions in this subquery:

SELECT COUNT(*)
FROM MessageLog
WHERE intent LIKE 'None' OR intent IS NULL OR intent LIKE ''

With the IN statement, if you have other values you can add them to the list much more easily.

SELECT COUNT(*)
FROM MessageLog
WHERE intent IN ('None', '') OR intent IS NULL
answered Apr 9, 2018 at 14:18
  • Thanks! I've tested this and the other solution here from Mauricio Morales, and it seems that his solution eats up much less CPU. I am not sure why this is happening, but that's what the test shows. Commented Apr 10, 2018 at 3:47
