I have 4 development SQL Server 2016 instances on a virtual server (128 GB, 12 virtual processors). Each instance is configured the same (16 GB memory, automatic processor and I/O affinity masks). All instance parameters are the same. Each instance has a database called "operations" with a table called "ErrLog" with a clustered PK index; each instance's ErrLog currently has 600-1000 records. Each operations database has one datafile on E: drive, one tx log file on F: drive. Querying the ErrLog table in instances (b), (c), and (d) is sub-second. In instance (a), it consistently takes 2-4 seconds.
I do typical maintenance (Ola Hallengren's index and statistic maintenance once a week), but this is consistently done in each instance.
I have used sp_BlitzFirst and sp_Blitz, which return issues, but consistent issues between the instances. And so far I'm not noticing any issues returned by various checkdb operations.
Can anyone suggest other troubleshooting steps to determine why instance (a) is having different performance than others? (Querying ErrLog is just a proxy for honing in on the differences; other queries are also performing differently.)
Also, I restored a copy of a.Operations into b.Operations_test, and query b.Operations_test.ErrLog also takes 2-4 seconds. I restored a copy of b.Operations into a.Operations_test, and querying a.Operations_test is sub-second.
EDIT: Thanks for your replies...
Here is the structure of the table (same in each of the instances); just the one, clustered primary key index:
CREATE TABLE [dbo].[ErrLog](
[ErrorID] [int] IDENTITY(1,1) NOT NULL,
[TimestampCreate] [dbo].[TimestampCreate] NOT NULL,
[ErrorNumber] [int] NOT NULL,
[ErrType] [int] NOT NULL,
[ErrLocation] [varchar](2000) NULL,
[ErrContext] [text] NULL,
[ErrSeverity] [int] NOT NULL,
[MessageLong] [varchar](1024) NOT NULL,
[ServerName] [dbo].[DescriptionLong] NULL,
CONSTRAINT [PK_ErrLog] PRIMARY KEY CLUSTERED
(
[ErrorID] ASC
)
PasteAPlan results:
This might be a clue to the different performances: I find that in NP02, some of the records have much longer values in the ErrContext text field:
NP02: 668 rows; maximum length of "ErrContext" text field is 3,090,039
NP03: 648 rows; maximum length of "ErrContext" text field is 154,013
NP04: 411 rows; max length of "ErrContext" text field is 28,519
NP05: 46 rows; max length of "ErrContext" text field is 55,445
Top 10 Wait Statistics (pulled using Pinal Dave's query):
NP02:
Wait_Type | Wait_Time_Seconds | Waiting_Tasks_Count | Percentage_WaitTime |
---|---|---|---|
OLEDB | 17951.078 | 1006510364 | 34.22289012 |
BACKUPBUFFER | 5775.054 | 6016680 | 11.00987018 |
ASYNC_IO_COMPLETION | 5491.216 | 3688 | 10.46874632 |
BACKUPIO | 5441.588 | 5592972 | 10.37413286 |
IO_COMPLETION | 3451.687 | 1192286 | 6.58047973 |
SOS_SCHEDULER_YIELD | 3271.515 | 6502869 | 6.236990244 |
PAGEIOLATCH_SH | 3191.304 | 2482244 | 6.084071726 |
BACKUPTHREAD | 2224.763 | 229830 | 4.241406543 |
TRACEWRITE | 1680.446 | 864 | 3.203691656 |
NP03:
Wait_Type | Wait_Time_Seconds | Waiting_Tasks_Count | Percentage_WaitTime |
---|---|---|---|
CXPACKET | 60954.838000 | 24948490 | 20.506848003464389 |
BACKUPBUFFER | 58801.396000 | 27540692 | 19.782372158277558 |
ASYNC_IO_COMPLETION | 48562.138000 | 1528 | 16.337610194112273 |
BACKUPIO | 41613.771000 | 28454440 | 13.999992531322523 |
OLEDB | 27122.218000 | 1397826908 | 9.124644085557670 |
BACKUPTHREAD | 22593.140000 | 1078602 | 7.600940353594105 |
ASYNC_NETWORK_IO | 12203.635000 | 1545004 | 4.105631254975333 |
PAGEIOLATCH_SH | 8875.883000 | 5384558 | 2.986085921145972 |
SOS_SCHEDULER_YIELD | 5326.097000 | 43211882 | 1.791842374032848 |
IO_COMPLETION | 3153.986000 | 1708478 | 1.061085774800265 |
-
1Please provide the slow query, any index definitions, and the actual execution plan which can be uploaded to Paste The Plan.J.D.– J.D.2023年01月28日 03:47:16 +00:00Commented Jan 28, 2023 at 3:47
-
1Also compare the session wait stats. learn.microsoft.com/en-us/sql/relational-databases/…David Browne - Microsoft– David Browne - Microsoft2023年01月28日 19:19:06 +00:00Commented Jan 28, 2023 at 19:19
1 Answer 1
From the execution plans, we can see that the important difference is the time spent doing ASYNC_NETWORK_IO waits. For your slow scenario, you do over 14,000 of them but only 159 in your fast case.
This is saying that your slower execution is having to do a lot more back and forth trips to your database in order to send all the data to your client. The most obvious explanation is the text
column. The bigger the size of this lob column, the more read requests the client is going to have to do to get to the end of it. Lobs aren’t as simple as other data types, the contents of them won’t necessarily fit in the same pages as the rest of your columns so a pointer is used instead. Your client receives this pointer and does it’s own lookups. This can lead to multiple lookups per row (as in your case) as further pointers must be followed. You should have a read of this post for further information on how lobs are handled https://aboutsqlserver.com/2013/11/05/sql-server-storage-engine-lob-storage/
-
Agreed: when I exclude the ErrContext field (the text column) from the query, it returns sub-second as in the other instances.pvallero– pvallero2023年01月30日 17:29:21 +00:00Commented Jan 30, 2023 at 17:29
Explore related questions
See similar questions with these tags.