Non-clustered indexes for columns defined within a composite Primary Key

Question 1

I have a multi-tenant database where I am utilizing an AccountId column within all tables as a part of the composite Primary Key for tenant isolation. Is it beneficial to create additional non-clustered indexes on each column that is part of the composite Primary Key to assist SQL Server with maintaining accurate statistics and improving query performance when joining to lookup tables?

For instance, within an associative table that defines one-to-many relationships between an Account and the American State in which they have offices, in theory which of the two options is preferable given the following structure and sample query?

Create the Account and State tables and populate with sample data.

DROP TABLE IF EXISTS [dbo].[Account];
DROP TABLE IF EXISTS [dbo].[State];
-- [Account] table and sample values.
IF OBJECT_ID('[dbo].[Account]', 'U') IS NULL
BEGIN
 CREATE TABLE [dbo].[Account] (
 [AccountId] [int] IDENTITY(1,1) NOT NULL
 ,[AccountAlias] [varchar](3) NOT NULL
 ,[AccountName] [varchar](128) NOT NULL
 ,CONSTRAINT [PK_Account] PRIMARY KEY CLUSTERED ([AccountId] ASC)
 ,CONSTRAINT [UQ_Account_Alias] UNIQUE NONCLUSTERED ([AccountAlias] ASC)
 ,CONSTRAINT [UQ_Account_Name] UNIQUE NONCLUSTERED ([AccountName] ASC)
 );
 SET IDENTITY_INSERT [dbo].[Account] ON;
 INSERT INTO [dbo].[Account] ([AccountId], [AccountAlias], [AccountName])
 VALUES (1, 'SA1', 'Sample Account 1'), (2, 'SA2', 'Sample Account 2'), (3, 'SA3', 'Sample Account 3')
 SET IDENTITY_INSERT [dbo].[Account] OFF;
END;
GO
-- [State] table and sample values.
IF OBJECT_ID('[dbo].[State]', 'U') IS NULL
BEGIN
 CREATE TABLE [dbo].[State] (
 [StateId] [tinyint] IDENTITY(1,1) NOT NULL
 ,[StateCode] [varchar](2) NOT NULL
 ,[StateName] [varchar](32) NOT NULL
 ,CONSTRAINT [PK_State] PRIMARY KEY CLUSTERED ([StateId] ASC)
 ,CONSTRAINT [UQ_State_Code] UNIQUE NONCLUSTERED ([StateCode] ASC)
 ,CONSTRAINT [UQ_State_Name] UNIQUE NONCLUSTERED ([StateName] ASC)
 );
 SET IDENTITY_INSERT [dbo].[State] ON;
 INSERT INTO [dbo].[State] ([StateId], [StateCode], [StateName])
 VALUES (1, 'AL', 'Alabama'), (2, 'AK', 'Alaska'), (3, 'AZ', 'Arizona'), (4, 'AR', 'Arkansas'), (5, 'CA', 'California')
 SET IDENTITY_INSERT [dbo].[State] OFF;
END;
GO

Create AccountState OPTION 1 - composite Primary Key only

DROP TABLE IF EXISTS [dbo].[AccountState];
IF OBJECT_ID('[dbo].[AccountState]', 'U') IS NULL
BEGIN
 CREATE TABLE [dbo].[AccountState] (
 [AccountId] [int] NOT NULL
 ,[StateId] [tinyint] NOT NULL
 ,CONSTRAINT [PK_AccountState] PRIMARY KEY CLUSTERED ([AccountId] ASC, [StateId] ASC)
 ,CONSTRAINT [FK_AccountState_Account] FOREIGN KEY ([AccountId]) REFERENCES [dbo].[Account]([AccountId])
 ,CONSTRAINT [FK_AccountState_State] FOREIGN KEY ([StateId]) REFERENCES [dbo].[State]([StateId])
 );
 INSERT INTO [dbo].[AccountState] ([AccountId], [StateId])
 SELECT A.[AccountId], S.[StateId]
 FROM [dbo].[Account] A CROSS JOIN [dbo].[State] S
 WHERE A.[AccountId] = 1 AND S.[StateId] IN (1, 2, 3)
 UNION
 SELECT A.[AccountId], S.[StateId]
 FROM [dbo].[Account] A CROSS JOIN [dbo].[State] S
 WHERE A.[AccountId] = 2 AND S.[StateId] IN (3, 4, 5)
 UNION
 SELECT A.[AccountId], S.[StateId]
 FROM [dbo].[Account] A CROSS JOIN [dbo].[State] S
 WHERE A.[AccountId] = 3 AND S.[StateId] IN (1, 3, 5)
END;
GO

Create AccountState OPTION 2 - composite Primary Key + non-clustered indexes

DROP TABLE IF EXISTS [dbo].[AccountState];
IF OBJECT_ID('[dbo].[AccountState]', 'U') IS NULL
BEGIN
 CREATE TABLE [dbo].[AccountState] (
 [AccountId] [int] NOT NULL
 ,[StateId] [tinyint] NOT NULL
 ,CONSTRAINT [PK_AccountState] PRIMARY KEY CLUSTERED ([AccountId] ASC, [StateId] ASC)
 ,CONSTRAINT [FK_AccountState_Account] FOREIGN KEY ([AccountId]) REFERENCES [dbo].[Account]([AccountId])
 ,CONSTRAINT [FK_AccountState_State] FOREIGN KEY ([StateId]) REFERENCES [dbo].[State]([StateId])
 ,INDEX [IX_AccountState_Account] NONCLUSTERED ([AccountId])
 ,INDEX [IX_AccountState_State] NONCLUSTERED ([StateId])
 );
 INSERT INTO [dbo].[AccountState] ([AccountId], [StateId])
 SELECT A.[AccountId], S.[StateId]
 FROM [dbo].[Account] A CROSS JOIN [dbo].[State] S
 WHERE A.[AccountId] = 1 AND S.[StateId] IN (1, 2, 3)
 UNION
 SELECT A.[AccountId], S.[StateId]
 FROM [dbo].[Account] A CROSS JOIN [dbo].[State] S
 WHERE A.[AccountId] = 2 AND S.[StateId] IN (3, 4, 5)
 UNION
 SELECT A.[AccountId], S.[StateId]
 FROM [dbo].[Account] A CROSS JOIN [dbo].[State] S
 WHERE A.[AccountId] = 3 AND S.[StateId] IN (1, 3, 5)
END;
GO

Sample query

SELECT
 A.[AccountName]
 ,S.[StateName]
FROM
 [dbo].[AccountState] A_S
 JOIN [dbo].[Account] A
 ON A_S.[AccountId] = A.[AccountId]
 JOIN [dbo].[State] S
 ON A_S.[StateId] = S.[StateId]
WHERE
 S.[StateCode] = 'CA'

Of the two options, what type of index combination would be seen as most suitable to scale? Composite primary key only or composite primary key plus additional non-clustered indexes? Or is there another option that is more viable?

Question 2

I'm not sure that the sample query is a fair example to base a recommendation on. The sample query is not a typical multi-tenant application query since it is not specific to a particular client. It is more of a support or management query looking to gain insight about all (or at least multiple) customers. Of course, it could also be maintenance related (e.g. garbage collection would look for oldest dates and not care about AccountId). So let's separate this out:

General

I don't see any benefit to ever having a non-clustered index that is solely the leading / left-most key column in the clustered index. The clustered index is already in that order and so statistics exist for it. So, the IX_AccountState_Account index in Option 2 is purely waste, and a drag on the system.
Support / Management / Maintenance queries

These queries, especially the maintenance ones, can work across AccountId values. As such, some queries will certainly benefit from non-clustered indexes on clustered index key columns that are not AccountId (or to put it more generally: that are not the left-most / leading key column). This assumes that you have queries that filter / sort on only the entity ID. Not having any queries like that means you probably don't need this index.
Application queries

These queries should always include AccountId, so I don't see how they would benefit from an index that is solely on the entity ID.

While I have worked on multi-tenant systems similar to what you are describing, I just thought of something that never occurred to me before: since the clustered index statistics are based on the left-most / leading column and not the combination of key columns, it might be beneficial to manually create statistics for those objects. I seem to recall that it is just the density portion that will account for all key columns (not just the leading column), but that might help the query optimizing. This needs to be tested as I have not tried it (and I won't be able to do such a test anytime soon).

Question 3

I agree that the sample query isn't the best example as I shared a stripped down table set to be as concise as possible with a hopeful emphasis on the JOINs to the lookup tables. My uncertainty was whether the leading key column would be utilized when it is the only column of the ON clause. It appears it will use it per point 1. Following that, it seems that StateId would need an NCI to assist with the JOIN. Point 2 will be something to dig into on a query-by-query basis per the execution plans. Point 3 has been ingrained into all my developers - always include AccountId.

Question 4

@PicoDeGallo Whether or not to add a NCI on StateId is part of Point 2. Only do it if you have cross-customer queries that filter / order based on StateId. But it shouldn't be needed for queries like the sample query because it is the given value that will be looked-up in the parent table. I don't believe it would be used in the context of the sample query, but you should test on your system with your data (actuall execution plan, not estimated) to see if it is used. I am also curious if the manual stats on the same composite keys as the clustered index helps.

Question 5

Indeed, I should have clarified. I was leaning towards needing the NCI based off of existing queries within the system that do perform such a JOIN as opposed to the sample query. We have several associative tables where we have a composite Primary Key where the secondary column is a lookup table (such as State). At present we do not have any NCI on these table columns, but I will be carrying out a deeper performance analysis to see if SQL Server will positively leverage them. I will try to report back with my findings when I've reached that scope of work.

score 2 · Accepted Answer · 2019-03-06 17:05:02Z

I'm not sure that the sample query is a fair example to base a recommendation on. The sample query is not a typical multi-tenant application query since it is not specific to a particular client. It is more of a support or management query looking to gain insight about all (or at least multiple) customers. Of course, it could also be maintenance related (e.g. garbage collection would look for oldest dates and not care about AccountId). So let's separate this out:

General

I don't see any benefit to ever having a non-clustered index that is solely the leading / left-most key column in the clustered index. The clustered index is already in that order and so statistics exist for it. So, the IX_AccountState_Account index in Option 2 is purely waste, and a drag on the system.
Support / Management / Maintenance queries

These queries, especially the maintenance ones, can work across AccountId values. As such, some queries will certainly benefit from non-clustered indexes on clustered index key columns that are not AccountId (or to put it more generally: that are not the left-most / leading key column). This assumes that you have queries that filter / sort on only the entity ID. Not having any queries like that means you probably don't need this index.
Application queries

These queries should always include AccountId, so I don't see how they would benefit from an index that is solely on the entity ID.

While I have worked on multi-tenant systems similar to what you are describing, I just thought of something that never occurred to me before: since the clustered index statistics are based on the left-most / leading column and not the combination of key columns, it might be beneficial to manually create statistics for those objects. I seem to recall that it is just the density portion that will account for all key columns (not just the leading column), but that might help the query optimizing. This needs to be tested as I have not tried it (and I won't be able to do such a test anytime soon).

I agree that the sample query isn't the best example as I shared a stripped down table set to be as concise as possible with a hopeful emphasis on the JOINs to the lookup tables. My uncertainty was whether the leading key column would be utilized when it is the only column of the ON clause. It appears it will use it per point 1. Following that, it seems that StateId would need an NCI to assist with the JOIN. Point 2 will be something to dig into on a query-by-query basis per the execution plans. Point 3 has been ingrained into all my developers - always include AccountId.
@PicoDeGallo Whether or not to add a NCI on StateId is part of Point 2. Only do it if you have cross-customer queries that filter / order based on StateId. But it shouldn't be needed for queries like the sample query because it is the given value that will be looked-up in the parent table. I don't believe it would be used in the context of the sample query, but you should test on your system with your data (actuall execution plan, not estimated) to see if it is used. I am also curious if the manual stats on the same composite keys as the clustered index helps.
Indeed, I should have clarified. I was leaning towards needing the NCI based off of existing queries within the system that do perform such a JOIN as opposed to the sample query. We have several associative tables where we have a composite Primary Key where the secondary column is a lookup table (such as State). At present we do not have any NCI on these table columns, but I will be carrying out a deeper performance analysis to see if SQL Server will positively leverage them. I will try to report back with my findings when I've reached that scope of work.

Stack Exchange Network

Non-clustered indexes for columns defined within a composite Primary Key

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Non-clustered indexes for columns defined within a composite Primary Key

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions