I have a heap table that takes about 104 GB of disk space with almost 3 billion rows. I am trying to create a clustered index on this table on the [WeekEndingDate] column. I have about 200 GB free in the data file and about 280 GB free in tempdb.
I have tried two different methods. First was to create the index directly on the table with the following command:
CREATE CLUSTERED INDEX CX_WT_FOLD_HISTORY
    ON WT_FOLD_HISTORY (WeekEndingDate ASC)
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = ON,
          IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON,
          ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = PAGE)
I tried it both with SORT_IN_TEMPDB = ON and OFF. When using ON it filled up tempdb, and with OFF it filled the data drive.
The other method was to create a new blank table with the needed index and then insert the records from the heap into the new table. This failed as well after filling up the data drive.
Are there any other suggestions on what to do? Most things I've read stated that I would need about 1.2 times the size of the table as workspace while creating the index. I have well over that and it still fails. Any suggestions would be appreciated.
Here is my original heap table structure:
CREATE TABLE [dbo].[WT_FOLD_HISTORY](
[WeekEndingDate] [varchar](50) NULL,
[Division] [varchar](50) NULL,
[Store] [varchar](50) NULL,
[SKUNumber] [varchar](50) NULL,
[UPC] [varchar](50) NULL,
[SalesUnits] [varchar](50) NULL,
[SalesCost] [varchar](50) NULL,
[SalesRetail] [varchar](50) NULL,
[InventoryUnits] [varchar](50) NULL,
[InventoryCost] [varchar](50) NULL,
[InventoryRetail] [varchar](50) NULL,
[OnOrderUnits] [varchar](50) NULL,
[OnOrderCost] [varchar](50) NULL,
[OnOrderRetail] [varchar](50) NULL,
[ReceiptUnits] [varchar](50) NULL,
[ReceiptCost] [varchar](50) NULL,
[ReceiptRetail] [varchar](50) NULL,
[PermanentMarkdowns] [varchar](50) NULL,
[ReturnsToVendor] [varchar](50) NULL,
[POSMarkdowns] [varchar](50) NULL,
[TimeFK] [smallint] NULL,
[LocationFK] [int] NULL,
[ItemFK] [int] NULL
) ON [AcademySports_DataFG1]
3 Answers
If you've got a short-term need for disk space, one option would be to:
- Shrink tempdb temporarily, freeing up as much space on that drive as seems safe.
- Create a secondary data file for the DB the table is in on the tempdb drive.
- Add the clustered index to the table.
- Shrink the secondary file by migrating all data out of it.
- Remove the secondary file.
- Make sure the tempdb file is allowed to grow to its former size.
- Rebuild indexes in the table's DB (the removal of the secondary file will have caused some fragmentation).
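In T-SQL the sequence might look roughly like this (a sketch only: the database name AcademySports, the drive letter T:, the logical file names TempWork and tempdev, and all sizes are assumptions you would adapt to your environment):
USE tempdb;
-- 1. Temporarily shrink tempdb's data file (target size in MB) to free space on its drive.
DBCC SHRINKFILE (tempdev, 100000);

-- 2. Add a secondary data file for the user database on the tempdb drive,
--    in the filegroup that holds the heap.
ALTER DATABASE AcademySports
    ADD FILE (NAME = N'TempWork', FILENAME = N'T:\TempWork.ndf', SIZE = 150GB)
    TO FILEGROUP [AcademySports_DataFG1];

USE AcademySports;
-- 3. Build the clustered index; SORT_IN_TEMPDB = OFF so the sort uses the
--    enlarged filegroup rather than the just-shrunken tempdb.
CREATE CLUSTERED INDEX CX_WT_FOLD_HISTORY
    ON dbo.WT_FOLD_HISTORY (WeekEndingDate ASC)
    WITH (SORT_IN_TEMPDB = OFF, DATA_COMPRESSION = PAGE);

-- 4. Migrate all data out of the secondary file, then remove it.
DBCC SHRINKFILE (N'TempWork', EMPTYFILE);
ALTER DATABASE AcademySports REMOVE FILE TempWork;

-- 5. Let tempdb's data file grow back to its former size.
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 280GB);

-- 6. Rebuild indexes to clean up the fragmentation the file removal caused.
ALTER INDEX ALL ON dbo.WT_FOLD_HISTORY REBUILD;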
NOTE: as others have suggested, I'd only do this after things like temporarily removing non-clustered indexes from the table in question. This in particular will allow the addition of the clustered index to go faster, as the non-clustered indexes would all have to be rebuilt anyway (with a clustered index in place, the index key is used to locate the rows in the table itself).
That's actually another point - how wide is the key on the clustered index? If you do have non-clustered indexes, and the key on the clustered index is significantly wider than the pointer into the heap was, then the non-clustered indexes will consume more space after the clustered index is created.
If the cluster key consists of several columns, or even one large column (say, a varchar column with an average length of 25 or more), you may want to consider a surrogate key instead (usually a monotonically increasing value, for best INSERT performance).
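For illustration only (the FoldHistoryID column name is invented here, and adding an IDENTITY column to a ~3-billion-row heap is itself an expensive, size-of-data operation), a surrogate-key version might look like:
-- Add a narrow, ever-increasing surrogate key and cluster on it instead of
-- the varchar(50) WeekEndingDate column.
ALTER TABLE dbo.WT_FOLD_HISTORY
    ADD FoldHistoryID bigint IDENTITY(1,1) NOT NULL;

CREATE UNIQUE CLUSTERED INDEX CX_WT_FOLD_HISTORY
    ON dbo.WT_FOLD_HISTORY (FoldHistoryID)
    WITH (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
Each non-clustered index row would then carry an 8-byte bigint key instead of up to 50 bytes of date text.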
What is filling up your space is the mega-sort (you are trying to sort all 104 GB in one go), so I think it can be solved by sorting smaller portions. I suggest you create the new clustered table and insert the data in small chunks like this:
declare @rowcount int = 1;

while @rowcount > 0
begin
    -- Delete the next 5000 rows from the heap and, via OUTPUT ... INTO,
    -- insert them into the new clustered table in the same statement.
    -- The tablock hint lets the emptied heap pages be deallocated as we go.
    delete top (5000)
    from your_heap with (tablock)
    output deleted.field1, ..., deleted.fieldN
    into new_clustered_table;

    -- @@rowcount is 0 once the heap is empty, which ends the loop.
    set @rowcount = @@rowcount;
end;
This way you only sort 5000 rows at a time. The only remaining problem is page splits, which cannot be avoided since the inserts are not in sorted order; when finished, new_clustered_table will be fragmented, but you can rebuild it afterwards.
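For example, the final rebuild could be as simple as the following (the table name is the same placeholder used in the snippet above):
-- Rebuild once the chunked copy is done, to remove the fragmentation
-- left behind by the unsorted inserts.
ALTER INDEX ALL ON dbo.new_clustered_table REBUILD WITH (SORT_IN_TEMPDB = ON);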
Comment (sepupic, Aug 18, 2017): Yes, you are right, I updated my answer, but it was just an idea.
Just a quick tip: consider dropping all the non-clustered indexes (if any) on this heap before attempting to create the clustered index. You can script them out, along with their included-column details, and recreate them from those definitions after the clustered index is successfully created.
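A hedged illustration of that round trip (the non-clustered index name and its columns below are invented, not taken from the real table):
-- Script out and drop an existing non-clustered index first...
DROP INDEX IX_WT_FOLD_HISTORY_SKU ON dbo.WT_FOLD_HISTORY;

-- ...build the clustered index while only the heap has to be sorted...
CREATE CLUSTERED INDEX CX_WT_FOLD_HISTORY
    ON dbo.WT_FOLD_HISTORY (WeekEndingDate ASC);

-- ...then recreate the non-clustered index from the saved definition.
CREATE NONCLUSTERED INDEX IX_WT_FOLD_HISTORY_SKU
    ON dbo.WT_FOLD_HISTORY (SKUNumber)
    INCLUDE (SalesUnits, SalesRetail);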
Comment: What about DATA_COMPRESSION = NONE? If that works, you could compress afterwards.
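If I'm reading that comment right, the idea is something like this sketch: build the index without compression first, then compress in a separate step.
-- Build the clustered index uncompressed first...
CREATE CLUSTERED INDEX CX_WT_FOLD_HISTORY
    ON dbo.WT_FOLD_HISTORY (WeekEndingDate ASC)
    WITH (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = NONE);

-- ...then apply page compression afterwards in its own rebuild.
ALTER INDEX CX_WT_FOLD_HISTORY ON dbo.WT_FOLD_HISTORY
    REBUILD WITH (DATA_COMPRESSION = PAGE);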