How is IF NOT EXISTS SELECT THEN INSERT faster than UNIQUE index?

Question 1

In SQL Server how is...

Sp:

CREATE PROCEDURE insertToTable
 @field1 VARCHAR(256), @field2 varchar(256), @field3 varchar(256)
AS
BEGIN
 SET NOCOUNT ON
 IF NOT EXISTS (SELECT * FROM my_table WHERE field1 = @field1)
 INSERT INTO my_table
 (field1, field2, field3)
 VALUES (@field1, @field2, @field3);
 ELSE
 THROW 50000, 'xxxxxx', 1;
 END
GO

Table:

CREATE TABLE my_table (
 field1 VARCHAR(256) NOT NULL,
 field2 VARCHAR(256) NOT NULL,
 field3 VARCHAR(256) NOT NULL
);
CREATE INDEX idx_field1 ON my_table(field1);

the above faster than the below?

Sp:

CREATE PROCEDURE insertToTable
 @field1 VARCHAR(256), @field2 varchar(256), @field3 varchar(256)
AS
BEGIN
 SET NOCOUNT ON
 INSERT INTO my_table
 (field1, field2, field3)
 VALUES (@field1, @field2, @field3);
GO

Table:

CREATE TABLE my_table (
 field1 VARCHAR(256) NOT NULL,
 field2 VARCHAR(256) NOT NULL,
 field3 VARCHAR(256) NOT NULL
);
CREATE UNIQUE INDEX idx_field1 ON my_table(field1);

Sample input:

field1: F56yCgZ9AEm9aFpTyjwhERtqNeglYEow
field2: BD84CE2A514316164B7448C804B178AD8F6F597E8EC6F25F4D6E36287259C65F67E7206E82A4F8EFD2389C0821C0C70E8278DC5F166D220356B5A15A091A6C170ドル
field3: A18E9049117A77E6A4D41C6CA3FFDEA65D842BF1F57705405B4E66969531D93D

The input is generated on the fly by the web application and using prepared statements. I use Jmeter to generate requests to my web app.

With UNIQUE index, insert performance degrades after 100K inserts and gets worse.

With NON UNIQUE index and a manual check with IF NOT EXISTS SELECT, performance is constant even with millions of records inserted.

The values are unique enough that there never is a duplicate generated. Even after a few million values inserted.

Question 2

Might have to do wit not getting exclusive locks and not rolling back the transaction.

Question 3

I have difficulties to get the whole picture here. Can you post the full SQL code? For both cases...

Question 4

How are you measuring speed? Are you using SET STATISTICS TIME ON? The faster query may be going parallel (using more CPU time) and the slower query may be running single threaded (less CPU time).

Question 5

Please post a full SQL script that demonstrates the phenomenon you are asking about

Question 6

so I have created a script and run it against a newly created SQL Server 2017 database with log and data file both presized to 4GB. Non unique index and home made (and non thread safe) unique check took 231.101 seconds. One with unque index took 194.940 seconds. Script used pastebin.com/nCB7QFt3. Script output pastebin.com/eHDfdHhc

Question 7

FINAL UPDATE:

It's the INSERT that's really slowing things down.

When a Unique index is in place, with every new record you add SQL has to check if the value already exists. As the table grows, the amount of cross-referencing increases. A non-unique index will require no cross-referencing so performance is constant.

Unique indexes are usually faster for SELECT statements but that comes at a cost when updating a table.

Below is why the SELECT can sometimes be slower on a Unique Index

I've partially recreated your situation to the point where I think it is down to combination of Parameter sniffing and SQL prefers using the NON-UNIQUE index on a HEAP.

Set up 2 test tables, one of them is a heap (just like your table).

CREATE TABLE dbo.TEST1(ID VARCHAR(255) NOT NULL,TXT1 VARCHAR(255) NOT NULL,TXT2 VARCHAR(255) NOT NULL)
CREATE TABLE dbo.TEST2(ID VARCHAR(255) NOT NULL,TXT1 VARCHAR(255) NOT NULL,TXT2 VARCHAR(255) NOT NULL)
GO
INSERT INTO dbo.TEST1 VALUES(NEWID(),NEWID(),NEWID())
GO 30000
INSERT INTO dbo.TEST2
SELECT * FROM dbo.TEST1
GO
CREATE CLUSTERED INDEX cidx ON dbo.TEST1 (ID)
CREATE INDEX idx_nu ON dbo.TEST1 (ID)
CREATE UNIQUE INDEX idx_u ON dbo.TEST1 (ID)
CREATE INDEX idx_nu ON dbo.TEST2 (ID)
CREATE UNIQUE INDEX idx_u ON dbo.TEST2 (ID)

Review the footprint of the Indexes, on the HEAP the UNIQUE index has a smaller footprint than the NON-UNIQUE index. (perhaps the pages of the NON-UNIQUE index contain extra - possibly useful - information)(note: after running the above code multiple times the page count doesn't differ, probably due to caching, alter the "GO 30000" to rectify the issue.)

SELECT
 s.name AS SchemaName,
 t.name AS TableName,
 i.name AS IndexName,
 p.row_count,
 SUM (p.used_page_count) as used_pages_count,
 SUM (CASE
 WHEN (i.index_id < 2) THEN (in_row_data_page_count + lob_used_page_count + row_overflow_used_page_count)
 ELSE lob_used_page_count + row_overflow_used_page_count
 END) as pages
FROM 
 sys.dm_db_partition_stats AS p 
 JOIN sys.tables AS t 
 ON 
 p.object_id = t.object_id
 JOIN sys.indexes AS i 
 ON 
 i.[object_id] = t.[object_id] 
 AND 
 p.index_id = i.index_id
 JOIN sys.schemas AS s 
 ON
 t.schema_id = s.schema_id
WHERE
 t.name IN ('TEST1','TEST2')
GROUP BY 
 s.name
 ,t.name
 ,i.name
 ,p.row_count

Now query the tables with literals and variables.

--SCAN of the UNIQUE index
DECLARE @account_id VARCHAR(255) = (SELECT TOP 1 ID FROM dbo.TEST2 WHERE ID like '%A%') 
--Parameter Sniffing kicks in --The optimiser doesn't know the value of @account_id
--SEEK of the CLUSTERED index 
DECLARE @ID1 VARCHAR(255) = (SELECT TOP 1 ID FROM dbo.TEST1 WHERE ID = @account_id)
--SEEK of the NON UNIQUE index
DECLARE @ID2 VARCHAR(255) = (SELECT TOP 1 ID FROM dbo.TEST2 WHERE ID = @account_id)

For some reason SQL prefers the NON UNIQUE index on a HEAP when performing SEEK operations.

Here's what I think is going on. When the Non-Unique index has more Pages then the corresponding Histogram in the STATS has more STEPS, run below code.

DBCC SHOW_STATISTICS ( 'TEST2' , 'idx_nu' )
DBCC SHOW_STATISTICS ( 'TEST2' , 'idx_u' )

The additional STEPS create a more granular view of the underlying Index, so the Optimiser (knowing that the EQ_ROWS is always 1) is getting a better Cardinality Estimate from the Non-Unique index.

Question 8

So why bother with UNIQUE index when IF NOT EXISTS SELECT does a better job?

Question 9

@user547 a unique index should be faster when doing Select statements on tables, but this comes at a cost when performing updates. It depends on the kind of activity your table will mostly be subject to. If possible I'd disable the Unique Index during the INSERTS then rebuild the Index when the inserts are complete. My select example from above is a very rare case that I just found interesting. ALTER INDEX idx ON dbo.Mytable DISABLE; ALTER INDEX idx ON dbo.Mytable REBUILD;

Question 10

Ah can't disable. It's a "real time" app. Meaning "users" can register themselves as needed. You know like a normal Signup page.

Question 11

Your argument doesn't really make sense because they are manually doing the same "cross referencing" as the index does. But when done as part of an insert this should be trivial as it needs to locate the right place in the index to insert the row anyway.

Question 12

They haven't supplied any repro that we can actually run that demonstrates this phenomenon even exists though. It could be an error with their benchmarking methodology

pacreely pacreely 1,0981 gold badge7 silver badges15 bronze badges · Answer 1 · 2018-06-01 09:37:56Z

FINAL UPDATE:

It's the INSERT that's really slowing things down.

When a Unique index is in place, with every new record you add SQL has to check if the value already exists. As the table grows, the amount of cross-referencing increases. A non-unique index will require no cross-referencing so performance is constant.

Unique indexes are usually faster for SELECT statements but that comes at a cost when updating a table.

Below is why the SELECT can sometimes be slower on a Unique Index

I've partially recreated your situation to the point where I think it is down to combination of Parameter sniffing and SQL prefers using the NON-UNIQUE index on a HEAP.

Set up 2 test tables, one of them is a heap (just like your table).

CREATE TABLE dbo.TEST1(ID VARCHAR(255) NOT NULL,TXT1 VARCHAR(255) NOT NULL,TXT2 VARCHAR(255) NOT NULL)
CREATE TABLE dbo.TEST2(ID VARCHAR(255) NOT NULL,TXT1 VARCHAR(255) NOT NULL,TXT2 VARCHAR(255) NOT NULL)
GO
INSERT INTO dbo.TEST1 VALUES(NEWID(),NEWID(),NEWID())
GO 30000
INSERT INTO dbo.TEST2
SELECT * FROM dbo.TEST1
GO
CREATE CLUSTERED INDEX cidx ON dbo.TEST1 (ID)
CREATE INDEX idx_nu ON dbo.TEST1 (ID)
CREATE UNIQUE INDEX idx_u ON dbo.TEST1 (ID)
CREATE INDEX idx_nu ON dbo.TEST2 (ID)
CREATE UNIQUE INDEX idx_u ON dbo.TEST2 (ID)

Review the footprint of the Indexes, on the HEAP the UNIQUE index has a smaller footprint than the NON-UNIQUE index. (perhaps the pages of the NON-UNIQUE index contain extra - possibly useful - information)(note: after running the above code multiple times the page count doesn't differ, probably due to caching, alter the "GO 30000" to rectify the issue.)

SELECT
 s.name AS SchemaName,
 t.name AS TableName,
 i.name AS IndexName,
 p.row_count,
 SUM (p.used_page_count) as used_pages_count,
 SUM (CASE
 WHEN (i.index_id < 2) THEN (in_row_data_page_count + lob_used_page_count + row_overflow_used_page_count)
 ELSE lob_used_page_count + row_overflow_used_page_count
 END) as pages
FROM 
 sys.dm_db_partition_stats AS p 
 JOIN sys.tables AS t 
 ON 
 p.object_id = t.object_id
 JOIN sys.indexes AS i 
 ON 
 i.[object_id] = t.[object_id] 
 AND 
 p.index_id = i.index_id
 JOIN sys.schemas AS s 
 ON
 t.schema_id = s.schema_id
WHERE
 t.name IN ('TEST1','TEST2')
GROUP BY 
 s.name
 ,t.name
 ,i.name
 ,p.row_count

Now query the tables with literals and variables.

--SCAN of the UNIQUE index
DECLARE @account_id VARCHAR(255) = (SELECT TOP 1 ID FROM dbo.TEST2 WHERE ID like '%A%') 
--Parameter Sniffing kicks in --The optimiser doesn't know the value of @account_id
--SEEK of the CLUSTERED index 
DECLARE @ID1 VARCHAR(255) = (SELECT TOP 1 ID FROM dbo.TEST1 WHERE ID = @account_id)
--SEEK of the NON UNIQUE index
DECLARE @ID2 VARCHAR(255) = (SELECT TOP 1 ID FROM dbo.TEST2 WHERE ID = @account_id)

For some reason SQL prefers the NON UNIQUE index on a HEAP when performing SEEK operations.

Here's what I think is going on. When the Non-Unique index has more Pages then the corresponding Histogram in the STATS has more STEPS, run below code.

DBCC SHOW_STATISTICS ( 'TEST2' , 'idx_nu' )
DBCC SHOW_STATISTICS ( 'TEST2' , 'idx_u' )

The additional STEPS create a more granular view of the underlying Index, so the Optimiser (knowing that the EQ_ROWS is always 1) is getting a better Cardinality Estimate from the Non-Unique index.

So why bother with UNIQUE index when IF NOT EXISTS SELECT does a better job?
@user547 a unique index should be faster when doing Select statements on tables, but this comes at a cost when performing updates. It depends on the kind of activity your table will mostly be subject to. If possible I'd disable the Unique Index during the INSERTS then rebuild the Index when the inserts are complete. My select example from above is a very rare case that I just found interesting. ALTER INDEX idx ON dbo.Mytable DISABLE; ALTER INDEX idx ON dbo.Mytable REBUILD;
Ah can't disable. It's a "real time" app. Meaning "users" can register themselves as needed. You know like a normal Signup page.
Your argument doesn't really make sense because they are manually doing the same "cross referencing" as the index does. But when done as part of an insert this should be trivial as it needs to locate the right place in the index to insert the row anyway.
They haven't supplied any repro that we can actually run that demonstrates this phenomenon even exists though. It could be an error with their benchmarking methodology

Stack Exchange Network

How is IF NOT EXISTS SELECT THEN INSERT faster than UNIQUE index?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

How is IF NOT EXISTS SELECT THEN INSERT faster than UNIQUE index?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions