I have a master product table like the one below:
CREATE TABLE dbo.[products](
[id] INT NOT NULL IDENTITY(1, 1),
[product_code] VARCHAR(100) NOT NULL UNIQUE,
[price] FLOAT NOT NULL,
[brand] VARCHAR(100) NOT NULL,
[colour] VARCHAR(100) NOT NULL
);
So, if I create this table, a clustered index will be created on the id column and a non-clustered index on the product_code column.
I am using this table on a website to display the products, with SQL queries like the ones below.
Query 1
SELECT * FROM dbo.[products]
WHERE [brand] IN ('brand1', 'brand2', 'brand3', 'brand4', 'brand5')
AND [colour] IN ('colour1', 'colour2', 'colour3')
ORDER BY [product_code];
There is also an option to search for a batch of product codes, like below.
Query 2
SELECT * FROM dbo.[products]
WHERE [product_code] IN ('product_code1', 'product_code2', 'product_code3');
Query 1 may have more conditions, and Query 2 may have more product codes.
Have I created the indexes properly?
Or is there a better way to improve performance?
2 Answers
Short answer: No, you need more indexes.
If you're going to run a significant number of queries involving [brand] and [colour], you should index both columns:
CREATE INDEX products_colour_idx ON dbo.products (colour);
CREATE INDEX products_brand_idx ON dbo.products (brand);
If you mostly query using both columns, and there are no queries involving colour that do not also involve brand, a multi-column index would be better, because the database can retrieve all the relevant rows using just one index. You would create it this way:
CREATE INDEX products_brand_colours_idx ON dbo.products (brand, colour);
The order of the columns, (brand, colour) vs. (colour, brand), should normally be chosen so that the most selective column comes first (basically, the one with more distinct values).
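One rough way to compare the two columns under this rule of thumb is to count their distinct values. This is only a sketch against the table from the question; the column aliases are illustrative.

```sql
-- Rough selectivity check: under the rule of thumb above, the column
-- with more distinct values is the candidate for the leading key column.
SELECT
    COUNT(*)               AS total_rows,
    COUNT(DISTINCT brand)  AS distinct_brands,
    COUNT(DISTINCT colour) AS distinct_colours
FROM dbo.products;
```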
If you happen to have a significant number of queries on brand alone, on colour alone, and on both brand and colour, you should have at least the first two indexes; if you need the fastest possible speed, have one index on (brand, colour) and another on (colour). You don't need a dedicated index on (brand): the (brand, colour) index also serves lookups by brand.
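To illustrate the reasoning, the sketch below shows which predicates can use an index keyed on (brand, colour). It assumes products_brand_colours_idx exists as defined above; whether the optimizer actually picks the index depends on row counts and statistics, so checking the execution plan is the safer guide.

```sql
-- Assumes products_brand_colours_idx ON dbo.products (brand, colour) exists.

-- Eligible for an index seek: filters on the leading column (brand).
SELECT product_code FROM dbo.products
WHERE brand = 'brand1';

-- Eligible for an index seek: filters on both key columns.
SELECT product_code FROM dbo.products
WHERE brand = 'brand1' AND colour = 'colour1';

-- Not eligible for a seek on (brand, colour): colour is not the leading
-- column, so this would need a scan or a separate index on (colour).
SELECT product_code FROM dbo.products
WHERE colour = 'colour1';
```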
[product_code] will be indexed by the database automatically to enforce the UNIQUE constraint on it. See "Create Unique Constraints" and "Unique Constraints and Unique Indexes". You don't need to create an index explicitly, even if Query 2 is frequent.
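Rather than assuming which indexes exist, they can be listed from the catalog. A minimal check against the table from the question might look like the following. Note that the DDL as posted declares no PRIMARY KEY and no explicit clustered index, and IDENTITY by itself does not create an index, so this may report a HEAP row alongside the unique nonclustered index on product_code.

```sql
-- List the indexes SQL Server actually holds for dbo.products.
-- A row with type_desc = 'HEAP' means the table has no clustered index.
SELECT i.name,
       i.type_desc,
       i.is_unique,
       i.is_primary_key,
       i.is_unique_constraint
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'dbo.products');
```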
For the multi-column index, see joanolo's post. However, I'd like to draw attention to your clustered index.
A clustered index is the most efficient index a table can have. Since your product code is already unique, you might want to consider making it clustered instead. But there are considerations.
It's usually advised to create the clustered index on an identity column for two main reasons. a) A clustered index controls the order of the data on disk, so you want it on a column with an ever-increasing value to avoid fragmentation. b) This identity column is usually also the primary key, which means other tables may contain foreign key references to it. If it is clustered, such relational queries perform much faster, because the clustered index is not only the fastest but also contains all the other columns of the same row.
The full picture is difficult to explain briefly here; I suggest you read up on it more. But basically the point is this:
If you know that you will NOT get many new rows, OR that you can handle the fragmentation (for example, by choosing an appropriate index fill factor or running reasonably regular index maintenance), OR that, from a business logic point of view, product codes will be added in ever-increasing numerical or alphabetical order, so there will be no fragmentation in the first place; AND if you know that there aren't too many references to the identity column, or that the queries using those references can handle it; then it might be optimal to create the clustered index on product_code instead and, if required, place a nonclustered index (primary key or otherwise) on the identity column.
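As a sketch only, the alternative layout described here could look like the following. The table name products_alt and both constraint names are hypothetical, and the fill factor of 90 is an arbitrary example value, not a recommendation.

```sql
-- Hypothetical alternative layout: clustered on product_code,
-- nonclustered primary key on the identity column.
CREATE TABLE dbo.products_alt (
    id           INT          NOT NULL IDENTITY(1, 1),
    product_code VARCHAR(100) NOT NULL,
    price        FLOAT        NOT NULL,
    brand        VARCHAR(100) NOT NULL,
    colour       VARCHAR(100) NOT NULL,
    CONSTRAINT pk_products_alt PRIMARY KEY NONCLUSTERED (id),
    CONSTRAINT uq_products_alt_code UNIQUE CLUSTERED (product_code)
);

-- Rebuild the clustered index leaving free space in each page, so that
-- out-of-order product codes cause fewer page splits (example value).
ALTER INDEX uq_products_alt_code ON dbo.products_alt
    REBUILD WITH (FILLFACTOR = 90);
```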
Note: do NOT do any of this unless you know what you're doing. But from an optimization point of view, I figured it would be good to mention the options. Your current setup, with joanolo's suggestions, matches the default recommendations. Anything more, including everything I've said in this post, assumes you have a far more detailed understanding of what's going on and of how the data is structured and used.
- Thanks a lot for your answer. I will be moving forward with a primary key on the identity column, a unique key on the product_code column, and a multi-column index on the other columns that are used frequently. – Ullas, Jan 18, 2017 at 10:21
- Avoid select * from... Why? Because of performance. Why transport data that you don't require? See "Why is "Select * from table" considered bad practice", "Why is SELECT * considered harmful?" and "What is the reason not to use select *?".
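As a sketch of the advice against select *, Query 1 could name only the columns the page needs. The column list here is an assumption about what the website displays.

```sql
-- Query 1 with an explicit column list instead of SELECT *.
-- The chosen columns are an assumption; list whatever the page actually uses.
SELECT [product_code], [price], [brand], [colour]
FROM dbo.[products]
WHERE [brand] IN ('brand1', 'brand2')
  AND [colour] IN ('colour1', 'colour2')
ORDER BY [product_code];
```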