1

I'm working on a system that has a resource intenstive houskeeping job. It hits two different data files on the same LUN. One is for relational data, one for blobs. During the job, read latency on the relational data file averages over 250ms, while on the LOB data it is only 5ms. This is measured using sys.dm_io_virtual_file_stats (two samples 12 hours apart to encompass the houskeeping - during the other 12 hours, latency is 20ms).

I'm sampling wait stats using "who is active". In this I don't see any PAGEIOLATCH waits with large enough wait times to explain the 250ms. sys.dm_os_wait_stats backs this up with an average of 5ms for PAGEIOLATCH_EX and 15ms for PAGEIOLATCH_SH. What I do see is long waits for LCK_M (which is explainable, due to the nature of the housekeeping).

My question is: can the LCK_M waits contribute to the io_stall_read_ms times in the virtual file stats?

In case the details of the housekeeping matter, it is removing smallish batches of data using multiple cascading deltes, including ~1MB of LOB data per item. The system is using AlwaysOn with sync commit.

marc_s
9,0626 gold badges46 silver badges52 bronze badges
asked Apr 5, 2017 at 9:31
2
  • check out this post sqlperformance.com/2013/10/t-sql-queries/io-latency Processing more data will lead to more latency, if your throughput goes up then this will be normal behavior. Commented Apr 5, 2017 at 9:50
  • @laurence answer to your question will by yes. io_stall_read_ms means Total time, in milliseconds, that the users waited for reads issued on the file and not the time it took to read the page from disk. For that you can use wmi counter avg. disk sec/read. Commented Apr 5, 2017 at 11:07

2 Answers 2

0

The waits that are seen in sys.dm_io_virtual_stats are aggregates instead of point values. If you are doing a lot of transactions in a certain period, it will show you an higher aggregate --> higher average latency. More information on how this dmv works can be found on https://sqlperformance.com/2013/10/t-sql-queries/io-latency

Virtual file stats are a great starting point when you want to understand I/O performance for a SQL Server instance. If you see I/O-related waits when looking at wait statistics, looking at sys.dm_io_virtual_file_stats is a logical next step. However, understand that the data you're viewing is an aggregate since the stats last cleared by one of the associated events (instance restart, offline of database, etc). If you see low latencies, then the I/O subsystem is keeping up with the performance load. However, if you see high latencies, it's not a foregone conclusion that storage is a problem. To really know what's going on you can start to snapshot file stats, as shown here, or you can simply use Performance Monitor to look at latency in real time. It's very easy to create a Data Collector Set in PerfMon that captures the Physical Disk counters Avg. Disk Sec/Read and Avg. Disk Sec/Read for all disks that host database files.

The LCK_M will not contribute to the IO_STALL_READ_MS as these counters are only IO related

answered Apr 5, 2017 at 10:06
3
  • This is very interesting but it doesn't answer the question. Can LCK_M waits contribute to the io_stall_read_ms. I've updated the question to make it clear the 250ms was measured as the difference between two 12hr samples. Should I expect the io_stall to correlate with IO type waits? Commented Apr 5, 2017 at 10:59
  • Yes, it will correlate only with the IO type waits Commented Apr 5, 2017 at 11:14
  • 1
    Thanks, Stijn. I've realised what the numbers indicate and answered myself. However, I'll mark yours as the answer as you answered the question I had correctly. Commented Apr 5, 2017 at 11:56
0

I've missed the obvious explanation.

io_stall measures difference between io being submitted and fulfilled. Wait stats only measure when a query doesn't have something useful to do.

The numbers I am seeing indicate:

  1. IO latency has increased markedly as a result of the housekeeping load.
  2. The housekeeping query can mostly find other useful work to do instead of waiting for IO to complete. (Else I would see high PAGEIOLATCH waits)
answered Apr 5, 2017 at 11:55

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.