We have a SQL Server 2014 enterprise cluster. Server 1 and Server 2. SQL Server is limited to 64GB of memory.
We are using Ola Hallengren scripts for index rebuilding and optimizations once per week for all our databases.
Every time script triggers, it sometimes hangs on the same index
ALTER INDEX [SecretId] ON [someDB].[dbo].[Articles] REBUILD WITH (SORT_IN_TEMPDB = OFF, ONLINE = ON)
This is unique identifier field in the database. Table has approx 1.5 mil records.
When index rebuild hangs, we do a failover to another SQL server, restart service on first one, and failover back to instance one. After that, we re-run scripts for index rebuilding, and it passes well in a matter of minutes.
Why do we need to restart service in order to rebuild index successfully?
-
3What type of wait do you see when it "hangs"? Is it being blocked? What makes you determine that it's hung and not doing work? Also, can you clarify if this is a Failover Cluster Instance or an Availability Group? Your "Failover-restart-failback" description makes me think it's an AG...anon– anon2018年11月04日 12:10:07 +00:00Commented Nov 4, 2018 at 12:10
-
Is the column a UUID type column? That could potentially take a long time to rebuild. What does the index definition look like? Could you add that to the question? Thanks.John K. N.– John K. N.2020年11月09日 10:46:33 +00:00Commented Nov 9, 2020 at 10:46
2 Answers 2
I've come across this issue recently. In my case, Paul Williams' comment was spot on and it's to do with a complex nvarchar(max)
column I had that stores JSON and HTML:
Is this a table with a large
varbinary
ornvarchar(max)
column? I've seen unusual maintenance behaviors with these kinds tables.
I use the following query to narrow it down to the offending table, which is a slight variation to Ola's script at https://ola.hallengren.com/scripts/misc/CommandLogSelect.sql
The index with a null EndTime
is where the crash is occurring, most likely in the last row returned.
use [master];
SELECT DatabaseName,
SchemaName,
ObjectName,
CASE WHEN ObjectType = 'U' THEN 'USER_TABLE' WHEN ObjectType = 'V' THEN 'VIEW' END AS ObjectType,
IndexName,
CASE WHEN IndexType = 1 THEN 'CLUSTERED' WHEN IndexType = 2 THEN 'NONCLUSTERED' WHEN IndexType = 3 THEN 'XML' WHEN IndexType = 4 THEN 'SPATIAL' END AS IndexType,
PartitionNumber,
ExtendedInfo.value('(ExtendedInfo/PageCount)[1]','int') AS [PageCount],
ExtendedInfo.value('(ExtendedInfo/Fragmentation)[1]','float') AS Fragmentation,
CommandType,
Command,
StartTime,
EndTime,
DATEDIFF(ss,StartTime, (CASE WHEN EndTime IS NULL THEN StartTime ELSE EndTime END)) AS Duration,
ErrorNumber,
ErrorMessage
FROM dbo.CommandLog
WHERE CommandType = 'ALTER_INDEX'
ORDER BY StartTime ASC;
Check the dbo.CommandLog table to see what the error is (for IndexOptimize)?
If it's lock timeout (from blocking), then obviously failover would resolve it as there shouldn't be blocking after a restart
You can also run sp_whoisactive to find the Head blocker
In my work's cases, sometimes even retry after 1 minute or 10 minutes would work (no need to failover) as the BLOCKING session has cleared/released the lock
But sometimes it'd still fail after retrying for an hour (inside SQL Job)
Let us know how it turns out
Good luck
Explore related questions
See similar questions with these tags.