1

I have a wide table (80 columns) with 70 million rows in it. I can currently trying to set a foreign key across all rows within the table.

Approach one An update statement over the entire table such as:

BEGIN TRAN
UPDATE Cx
SET Cx.CohortId = Chx.CohortId
FROM Customers Cx
INNER JOIN Cohorts Chx ON Chx.NameField = Cx.NameField
COMMIT TRAN

I was informed this has the potential to fill up the transaction log (please note recovery mode is currently set to simple) and it would be better to batch the updates:

Approach two Update in batches

DECLARE @Rows INT,
 @BatchSize INT; 
SET @BatchSize = 20000;
SET @Rows = @BatchSize; 
WHILE (@Rows = @BatchSize)
BEGIN
UPDATE TOP(@BatchSize) Cx
SET Cx.CohortId = Chx.CohortId
FROM Customers Cx
INNER JOIN Cohorts Chx ON Chx.NameField = Cx.NameField
WHERE Cx.CohortId IS NULL -- Do not update records that have already been uplifted to include the FK
SET @Rows = @@ROWCOUNT
END

Does the second approach yield any benefit in terms of the transaction log on large table updates?

asked Jul 24, 2018 at 12:11
2
  • There are temporary indexes in place to speed up my joins and there is an index on the column I am trying to populate. The execution plan actually tends to do non-clustered index scans rather than looking at the clustered index which is fine for an integer column for my needs. Commented Jul 24, 2018 at 12:30
  • This directly answers your question about the transaction log: Break large delete operations into chunks. Commented Jul 24, 2018 at 13:26

1 Answer 1

1

I was informed this has the potential to fill up the transaction log . . . and it would be better to batch the updates

It would be better if it didn't fill up the transaction log. But the proposed remedy is the wrong one.

You should size your transaction log to support your transactions, not the other way around.

Does the second approach yield any benefit in terms of the transaction log on large table updates?

Yes. The transaction log can be reused between batches, so long as all the batches aren't in a transaction.

increasing the disk space to support that would be troublesome

Yes, sometimes you do have to code around space issues, but coding isn't free. Spending money on infrastructure to reduce coding time is usually a wise move.

Also if you are trying to minimize log space, consider loading an empty table with your query, and then performing an ALTER TABLE ... SWITCH. Loading the new table can be minimally logged, and the ALTER TABLE is a metadata operation.

This will also enable you to perform the operation in a transaction, and not commit intermediate results.

answered Jul 24, 2018 at 13:18
4
  • Your point about the transaction log being appropriate is very true I feel. The issue would be that the table is 25gb and I will be doing three updates across the entire table. That would result in a 75gb transaction log wouldn't it? Unfortunately increasing the disk space to support that would be troublesome to say the least. Commented Jul 24, 2018 at 13:21
  • I'll take a look at the options. This is a one time operation which needs to be completed within the span of hours as it stands. I'll review the options with the relevant people tomorrow and see what would work - but your point about space being cheaper to add than dev time is probably apt. Commented Jul 24, 2018 at 13:37
  • @AndrewSecondary Also, have you considered using a Clustured Columnstore instead of a rowstore table? Commented Jul 24, 2018 at 13:40
  • @DavidBrowne-Microsoft yes we did in-fact use a clustered columnstore for a short while, however the system is heavily transactional and it caused some deadlocking issues. The system was performing incredibly quickly though with the clustered columnstore. Commented Jul 24, 2018 at 14:13

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.