SQL: What is slowing down INSERTs if not CPU or IO?

Question 1

We have a database for a product that is write-heavy. We just bought a new server machine with a SSD to help. To our surprise, the insertions were not faster than on our old machine with much slower storage. During benchmarking we noticed that the IO rate exhibited by the SQL Server process was very low.

For example, I ran the script found on this page, except that I added a BEGIN TRAN and COMMIT around the loop. At best I could see the disk usage reach 7Mb/s, while CPU barely touched 5%. The server has 64Gb installed and is using 10. The total run time was 2 minutes 15 seconds for the first call down to around 1 minute for subsequent calls. The database is on simple recovery and was idle during the test. I dropped the table between each call.

Why is such a simple script so slow? The hardware is barely being used at all. Both dedicated disk benchmarking tools and SQLIO indicate that the SSD performs correctly with speeds upward of 500Mb/s for both reading and writing. I understand that random writes are slower than sequential writes, but I would expect a simple insert like this, to a table without clustered indexing, to be much faster.

Ultimately our scenario is much more complex, but I feel that I need to understand a simple case first. In a nutshell our application deletes old data, then uses SqlBulkCopy to copy new data to staging tables, performs some filtering, and finally uses MERGE and/or INSERT INTO depending on cases to copy the data to the final tables.

--> EDIT 1: I followed the procedure linked by Martin Smith, and I got the following result:

[Wait Type] [Wait Count] [Total Wait (ms)] [T. Resource Wait (ms)] [T. Signal Wait (ms)]
NETWORK_IO 5008 46735 46587 148
LOGBUFFER 901 5994 5977 17
PAGELATCH_UP 40 866 865 1
SOS_SCHEDULER_YIELD 53279 219 121 98
WRITELOG 5 145 145 0
PAGEIOLATCH_UP 4 58 58 0
LATCH_SH 5 0 0 0

I find it weird NETWORK_IO takes most of the time, considering there are no result to display and no data to transfer anywhere other than to the SQL files. Does the NETWORK_IO type includes all IO?

--> EDIT 2: I created a 20Gb RAM disk and mounted a database from there. The best time I had on the SSD is 48s, with the RAM disk it went down to 37 seconds. NETWORK_IO is still the biggest wait. The maximum write speed to the RAM disk was about 250Mb/s while it's able to do multi gigabytes per second. It still wasn't using much CPU, so what's holding up SQL?

Question 2

You can use the script here to capture waitstats for a specific spid. I would add SET NOCOUNT ON to it too.

Question 3

the NETWORK_IO might be from the 3 million "1 row(s) affected" messages being sent back. Did you try adding SET NOCOUNT ON to the script?

Question 4

Yes, I added the NOCOUNT.

Question 5

Strange. I wouldn't expect much in the way of network activity at all then. Did you delete the old extended events files between runs? The script that reads them uses a wild card EE_WaitStats*.xel so old ones will contaminate your results.

Question 6

Good call, I will update the results tomorow.

Question 7

I know it's an old Question but this might still help searchers and it's a problem that pops up every now and then.

The main reason why you are hitting a performance ceiling without you seeing any resource bottleneck is because you've reached the limit of what is possible to process within one session single thread. The loop isn't processed in parallel, but all inserts are done serially.

In my case, it takes 36 seconds to insert 3 million rows. That means 36/30000000 = 0.000012 seconds per row. That's pretty fast. On my system, it simply takes 0.000012 to go through all the steps that are necessary.

The only way to get it done faster is start up a second session in parallel.

If I start 2 sessions in parallel both doing 15 million inserts. Both of them finish in 18 seconds. I could scale out more, but my current test setup is hitting 95% cpu with two parallel session, so doing 3 would skew the results since I would hit a CPU bottleneck.

If I start 2 parallel session both inserting 3 million rows, they both finish in 39 seconds. so that is now 6 million rows in 39 seconds.

Okay, that still leaves us with the NETWORK_IO wait showing up.

The NETWORK_IO waits are added by the fact that you are using extended events to trace them. In my case the insert takes 36 seconds (on avg). When using the extended event way (from the link above in the very first comment) this is what is registered:

Wait Type Wait Count Total Wait Time (ms) Total Resource Wait Time (ms) Total Signal Wait Time (ms)
NETWORK_IO 3455 68808 68802 6
PAGEIOLATCH_SH 3 64 64 0
PAGEIOLATCH_UP 12 58 58 0
WRITE_COMPLETION 8 15 15 0
WRITELOG 3 9 9 0
PAGELATCH_UP 2 4 4 0
SOS_SCHEDULER_YIELD 32277 1 0 1
IO_COMPLETION 8 0 0 0
LATCH_SH 3 0 0 0
LOGBUFFER 1 0 0 0

You can see that 68 seconds of NETWORK_IO is registered. But since the insert loop is a single threaded action that took 36 seconds, this can't be. (Yes, multiple threads are used, but the operations are serial, never in parallel, so you can't acummulate more wait time than the total duration of the query)

If I don't use extended events but just the wait stats DMVs on a quiet instance (with just me running the insert) I get this:

Wait Type Wait Count Total Wait Time (ms) Total Resource Wait Time (ms) Signal Resource Wait Time (ms)
SOS_SCHEDULER_YIELD 8873 0.21 0.01 0.20
PAGEIOLATCH_UP 3 0.02 0.02 0.00
PREEMPTIVE_OS_AUTHENTICATIONOPS 17 0.02 0.02 0.00
PAGEIOLATCH_SH 1 0.00 0.00 0.00

So the NETWORK_IO you were seeing in the extended events log, wasn't related to your insert loop. (If you wouldn't turn nocount on, you would have massive async network IO waits, +1 Martin)

However I don't know why the NETWORK_IO show up in the extended event trace. Sure the writing out to a async file target of the events accumulates ASYNC_NETWORK_IO, but surely this is all done on a differenent SPID then the one we are filtering on. I might ask this as a new question myself)

Question 8

" you are hitting a performance ceiling without you seeing any resource bottleneck is because you've reached the limit of what is possible to process within one session single thread": you are describing a 100% CPU bottleneck (on one core). If there is no bottleneck, then the system will go faster, so something else must be at play.

Question 9

Your answer is very informative Edward. It does look like parallelism is the solution to our problem we are already working on that, though it requires changes to our database layout. Like Remus however, I'm still curious why the machine didn't seem to use all (of one) CPU or disk resources.

Question 10

Typically you start by looking at sys.dm_exec_requests, specifically at the wait_time, wait_type and wait_resource for your INSERT request(s). This will give a clear indication what is blocking your INSERT. The results will indicate whether is lock contention, file growth events, log flush waits, allocation contention (manifests as PFS page latch contention) etc etc etc. Once you measure, update your question accordingly. I strongly urge you to stop now and read the Waits and Queues troubleshooting methodology before you proceed.

Question 11

I ran the test script at the page linked in the OP with the BEGIN TRAN / COMMIT around the loop. On my machine, it took 1:28 to complete the first time.

Then I moved these two commands outside the loop:

SELECT @Random = ROUND(((@Upper - @Lower -1) * RAND() + @Lower), 0)
SET @InsertDate = DATEADD(dd, @Random, GETDATE())

It completed in 28 seconds after that.

I don't know for sure what's happening, but I'm guessing there might be a sleep of some kind in the RAND() code, perhaps as part of the algorithm they're using to generate entropy (better random numbers).

FWIW, SSDs are not always the best technology for write-heavy apps. For best performance, make sure your DB log is on a different drive letter from the DB data, the log file is pre-grown to its maximum size, and never truncate the log.

Question 12

Thanks for your input RickNZ. I did not get faster results by moving the code out of the loop. Wait I did observe is that if you run it multiple times it gets faster, that might be what you experienced. I know SSDs aren't silver bullets, but I still feel like the performance isn't what it could be.

Question 13

Another DMV that I use to identify slowness is sys.dm_os_waiting_tasks. If your query is not CPU intensive, then you can find more information about the waits from this DMV.

Question 14

I'm checking the list of wait events for sql 2008 and i don't see NETWORK_IO listed: http://technet.microsoft.com/en-us/library/ms179984(v=sql.100).aspx

I thought that NETWORK_IO was now just listed as ASYNC_NETWORK_IO, so I wanted to ask if you could check your version of SQL again, because I am simply curious as to how/why that wait event is appearing for that version.

As for the network wait appearing at all, yes that can happen even if you are working on a standalone server. Have you checked the settings for your network cards? I'm wondering if they are an issue.

At the end of the day there are only a few resource bottlenecks possible: memory, CPU, disk I/O, network, and locking. You have indicated that CPU and I/O is not the issue, and you have a wait event of NETWORK_IO, so I suggest you look at those NIC cards first.

Question 15

The NETWORK_IO is shown because the OP is using extended events. It never got updated in sys.dm_xe_map_values

Question 16

I'm thinking the same SQLRockstar, just what could be going on. I did try to disable the network cards entirely. Martin pointed out that some old files might be still there I will update the results tomorow to see if it changes anything.

Question 17

also, it might help if we could see the execution plans for the statements.

Edward Dortland Edward Dortland 2,87520 silver badges24 bronze badges · Accepted Answer · 2012-09-05 12:45:46Z