I have a heap table that takes about 104 GB of disk space with almost 3 billion rows. I am trying to create a clustered index on this table on the [WeekEndingDate] column. I have about 200 GB free in the data file and about 280 GB free in tempdb.
I have tried two different methods. First was to create the index directly on the table with the following command:
CREATE CLUSTERED INDEX CX_WT_FOLD_HISTORY
    ON WT_FOLD_HISTORY (WeekEndingDate ASC)
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = ON,
          IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON,
          ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = PAGE)
I tried it both with SORT_IN_TEMPDB = ON and OFF. When using ON it filled up tempdb, and with OFF it filled the data drive.
The other method was to create a new blank table with the needed index and then insert the records from the heap into the new table. This failed as well after filling up the data drive.
Are there any other suggestions on what to do? Most things I've read stated that I would need about 1.2 times the size of the table as workspace while creating the index. I have well over that and it still fails. Any suggestions would be appreciated.
Here is my original heap table structure:
CREATE TABLE [dbo].[WT_FOLD_HISTORY](
[WeekEndingDate] [varchar](50) NULL,
[Division] [varchar](50) NULL,
[Store] [varchar](50) NULL,
[SKUNumber] [varchar](50) NULL,
[UPC] [varchar](50) NULL,
[SalesUnits] [varchar](50) NULL,
[SalesCost] [varchar](50) NULL,
[SalesRetail] [varchar](50) NULL,
[InventoryUnits] [varchar](50) NULL,
[InventoryCost] [varchar](50) NULL,
[InventoryRetail] [varchar](50) NULL,
[OnOrderUnits] [varchar](50) NULL,
[OnOrderCost] [varchar](50) NULL,
[OnOrderRetail] [varchar](50) NULL,
[ReceiptUnits] [varchar](50) NULL,
[ReceiptCost] [varchar](50) NULL,
[ReceiptRetail] [varchar](50) NULL,
[PermanentMarkdowns] [varchar](50) NULL,
[ReturnsToVendor] [varchar](50) NULL,
[POSMarkdowns] [varchar](50) NULL,
[TimeFK] [smallint] NULL,
[LocationFK] [int] NULL,
[ItemFK] [int] NULL
) ON [AcademySports_DataFG1]
3 Answers
If you've got a short-term need for disk space, one option would be to:
- Shrink tempdb temporarily, freeing up as much space on that drive as seems safe.
- Create a secondary data file for the DB the table is in on the tempdb drive.
- Add the clustered index to the table.
- Shrink the secondary file by migrating all data out of it.
- Remove the secondary file.
- Make sure the tempdb file is allowed to grow to its former size.
- Rebuild indexes in the table's DB (the removal of the secondary file will have caused some fragmentation).
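In T-SQL the sequence might look roughly like this (a sketch only: the database name AcademySports, the drive letter T:, the logical file names TempWork and tempdev, and all sizes are assumptions you would adapt to your environment):
USE tempdb;
-- 1. Temporarily shrink tempdb's data file (target size in MB) to free space on its drive.
DBCC SHRINKFILE (tempdev, 100000);

-- 2. Add a secondary data file for the user database on the tempdb drive,
--    in the filegroup that holds the heap.
ALTER DATABASE AcademySports
    ADD FILE (NAME = N'TempWork', FILENAME = N'T:\TempWork.ndf', SIZE = 150GB)
    TO FILEGROUP [AcademySports_DataFG1];

USE AcademySports;
-- 3. Build the clustered index; SORT_IN_TEMPDB = OFF so the sort uses the
--    enlarged filegroup rather than the just-shrunken tempdb.
CREATE CLUSTERED INDEX CX_WT_FOLD_HISTORY
    ON dbo.WT_FOLD_HISTORY (WeekEndingDate ASC)
    WITH (SORT_IN_TEMPDB = OFF, DATA_COMPRESSION = PAGE);

-- 4. Migrate all data out of the secondary file, then remove it.
DBCC SHRINKFILE (N'TempWork', EMPTYFILE);
ALTER DATABASE AcademySports REMOVE FILE TempWork;

-- 5. Let tempdb's data file grow back to its former size.
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 280GB);

-- 6. Rebuild indexes to clean up the fragmentation the file removal caused.
ALTER INDEX ALL ON dbo.WT_FOLD_HISTORY REBUILD;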
NOTE: as others have suggested, I'd only do this after things like temporarily removing non-clustered indexes from the table in question. This in particular will allow the addition of the clustered index to go faster, as the non-clustered indexes would all have to be rebuilt anyway (with a clustered index in place, the index key is used to locate the rows in the table itself).
That's actually another point - how wide is the key on the clustered index? If you do have non-clustered indexes, and the key on the clustered index is significantly wider than the pointer into the heap was, then the non-clustered indexes will consume more space after the clustered index is created.
If the cluster key consists of several columns, or even one large column (say, a varchar column with an average length of 25 or more), you may want to consider a surrogate key instead (usually a monotonically increasing value, for best INSERT performance).
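For illustration only (the FoldHistoryID column name is invented here, and adding an IDENTITY column to a ~3-billion-row heap is itself an expensive, size-of-data operation), a surrogate-key version might look like:
-- Add a narrow, ever-increasing surrogate key and cluster on it instead of
-- the varchar(50) WeekEndingDate column.
ALTER TABLE dbo.WT_FOLD_HISTORY
    ADD FoldHistoryID bigint IDENTITY(1,1) NOT NULL;

CREATE UNIQUE CLUSTERED INDEX CX_WT_FOLD_HISTORY
    ON dbo.WT_FOLD_HISTORY (FoldHistoryID)
    WITH (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
Each non-clustered index row would then carry an 8-byte bigint key instead of up to 50 bytes of date text.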
What is filling up your space is the mega-sort (you are trying to sort all 104 GB in one go), so I think it can be solved by sorting smaller portions. I suggest you create the new clustered table and insert the data in small chunks like this:
declare @rowcount int = 1;

while @rowcount > 0
begin
    -- Delete the next 5000 rows from the heap and, via OUTPUT ... INTO,
    -- insert them into the new clustered table in the same statement.
    -- The tablock hint lets the emptied heap pages be deallocated as we go.
    delete top (5000)
    from your_heap with (tablock)
    output deleted.field1, ..., deleted.fieldN
    into new_clustered_table;

    -- @@rowcount is 0 once the heap is empty, which ends the loop.
    set @rowcount = @@rowcount;
end;
This way you only sort 5000 rows at a time. The only remaining problem is page splits, which cannot be avoided since the inserts are not in sorted order; when finished, new_clustered_table will be fragmented, but you can rebuild it afterwards.
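For example, the final rebuild could be as simple as the following (the table name is the same placeholder used in the snippet above):
-- Rebuild once the chunked copy is done, to remove the fragmentation
-- left behind by the unsorted inserts.
ALTER INDEX ALL ON dbo.new_clustered_table REBUILD WITH (SORT_IN_TEMPDB = ON);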
Comment (sepupic, Aug 18, 2017): Yes, you are right, I updated my answer, but it was just an idea.
Just a quick tip: consider dropping all the non-clustered indexes (if any) on this heap before attempting to create the clustered index. You can script them out, along with their included-column details, and recreate them from those definitions after the clustered index is successfully created.
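A hedged illustration of that round trip (the non-clustered index name and its columns below are invented, not taken from the real table):
-- Script out and drop an existing non-clustered index first...
DROP INDEX IX_WT_FOLD_HISTORY_SKU ON dbo.WT_FOLD_HISTORY;

-- ...build the clustered index while only the heap has to be sorted...
CREATE CLUSTERED INDEX CX_WT_FOLD_HISTORY
    ON dbo.WT_FOLD_HISTORY (WeekEndingDate ASC);

-- ...then recreate the non-clustered index from the saved definition.
CREATE NONCLUSTERED INDEX IX_WT_FOLD_HISTORY_SKU
    ON dbo.WT_FOLD_HISTORY (SKUNumber)
    INCLUDE (SalesUnits, SalesRetail);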
Comment: What about DATA_COMPRESSION = NONE? If that works, you could compress afterwards.
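If I'm reading that comment right, the idea is something like this sketch: build the index without compression first, then compress in a separate step.
-- Build the clustered index uncompressed first...
CREATE CLUSTERED INDEX CX_WT_FOLD_HISTORY
    ON dbo.WT_FOLD_HISTORY (WeekEndingDate ASC)
    WITH (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = NONE);

-- ...then apply page compression afterwards in its own rebuild.
ALTER INDEX CX_WT_FOLD_HISTORY ON dbo.WT_FOLD_HISTORY
    REBUILD WITH (DATA_COMPRESSION = PAGE);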