I'm working on a system that has a resource intenstive houskeeping job. It hits two different data files on the same LUN. One is for relational data, one for blobs. During the job, read latency on the relational data file averages over 250ms, while on the LOB data it is only 5ms. This is measured using sys.dm_io_virtual_file_stats
(two samples 12 hours apart to encompass the houskeeping - during the other 12 hours, latency is 20ms).
I'm sampling wait stats using "who is active". In this I don't see any PAGEIOLATCH
waits with large enough wait times to explain the 250ms. sys.dm_os_wait_stats
backs this up with an average of 5ms for PAGEIOLATCH_EX
and 15ms for PAGEIOLATCH_SH
. What I do see is long waits for LCK_M
(which is explainable, due to the nature of the housekeeping).
My question is: can the LCK_M
waits contribute to the io_stall_read_ms
times in the virtual file stats?
In case the details of the housekeeping matter, it is removing smallish batches of data using multiple cascading deltes, including ~1MB of LOB data per item. The system is using AlwaysOn with sync commit.
-
check out this post sqlperformance.com/2013/10/t-sql-queries/io-latency Processing more data will lead to more latency, if your throughput goes up then this will be normal behavior.Stijn Wynants– Stijn Wynants2017年04月05日 09:50:17 +00:00Commented Apr 5, 2017 at 9:50
-
@laurence answer to your question will by yes. io_stall_read_ms means Total time, in milliseconds, that the users waited for reads issued on the file and not the time it took to read the page from disk. For that you can use wmi counter avg. disk sec/read.SqlWorldWide– SqlWorldWide2017年04月05日 11:07:41 +00:00Commented Apr 5, 2017 at 11:07
2 Answers 2
The waits that are seen in sys.dm_io_virtual_stats
are aggregates instead of point values. If you are doing a lot of transactions in a certain period, it will show you an higher aggregate --> higher average latency. More information on how this dmv works can be found on https://sqlperformance.com/2013/10/t-sql-queries/io-latency
Virtual file stats are a great starting point when you want to understand I/O performance for a SQL Server instance. If you see I/O-related waits when looking at wait statistics, looking at sys.dm_io_virtual_file_stats is a logical next step. However, understand that the data you're viewing is an aggregate since the stats last cleared by one of the associated events (instance restart, offline of database, etc). If you see low latencies, then the I/O subsystem is keeping up with the performance load. However, if you see high latencies, it's not a foregone conclusion that storage is a problem. To really know what's going on you can start to snapshot file stats, as shown here, or you can simply use Performance Monitor to look at latency in real time. It's very easy to create a Data Collector Set in PerfMon that captures the Physical Disk counters Avg. Disk Sec/Read and Avg. Disk Sec/Read for all disks that host database files.
The LCK_M
will not contribute to the IO_STALL_READ_MS
as these counters are only IO related
-
This is very interesting but it doesn't answer the question. Can LCK_M waits contribute to the io_stall_read_ms. I've updated the question to make it clear the 250ms was measured as the difference between two 12hr samples. Should I expect the io_stall to correlate with IO type waits?Laurence– Laurence2017年04月05日 10:59:52 +00:00Commented Apr 5, 2017 at 10:59
-
Yes, it will correlate only with the IO type waitsStijn Wynants– Stijn Wynants2017年04月05日 11:14:25 +00:00Commented Apr 5, 2017 at 11:14
-
1Thanks, Stijn. I've realised what the numbers indicate and answered myself. However, I'll mark yours as the answer as you answered the question I had correctly.Laurence– Laurence2017年04月05日 11:56:04 +00:00Commented Apr 5, 2017 at 11:56
I've missed the obvious explanation.
io_stall
measures difference between io being submitted and fulfilled. Wait stats only measure when a query doesn't have something useful to do.
The numbers I am seeing indicate:
- IO latency has increased markedly as a result of the housekeeping load.
- The housekeeping query can mostly find other useful work to do instead of waiting for IO to complete. (Else I would see high
PAGEIOLATCH
waits)