3

I am responsible for a poorly designed legacy application that has 700+ MSSQL tables and plenty of columns that are completely unused. I would like to run a script that returns the table and column name of every column that contains only NULLs. Does anyone have any suggestions on how I could accomplish this?

Results from select * from sys.schemas

asked Mar 26, 2021 at 23:33
2
  • Dynamic SQL or procedural code (in an application as opposed to T-SQL) as you'll essentially need to query every table. It's not a super easy task, nor will it likely be quick to run either. You'll likely want to use the system DMVs like sys.tables and sys.columns to help you get a list of your Tables and their nullable columns. Commented Mar 26, 2021 at 23:41
  • Once you get the list of NULL columns be careful with what you're gonna do with them. Unfortunately the poorly designed legacy app might reference those columns (to insert NULL values) so just removing them can cause you problems. Commented Mar 27, 2021 at 10:54
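
As a rough way to act on the caution above, the sketch below searches the definitions of stored procedures, views, functions and triggers for a column name before anything is dropped. The column name `SomeColumn` is a hypothetical placeholder, and the check is a plain text match: it can return false positives, and it only sees code stored in the database, not SQL embedded in the application itself.

```sql
-- Hypothetical column name; replace with a column you are considering dropping.
DECLARE @ColumnName SYSNAME = N'SomeColumn';

-- List database-side modules whose source text mentions the column name.
-- LIKE treats _ and [ as wildcards, so names containing them may over-match.
SELECT OBJECT_SCHEMA_NAME(M.[object_id]) AS SchemaName,
       OBJECT_NAME(M.[object_id])        AS ObjectName,
       O.type_desc                       AS ObjectType
FROM sys.sql_modules AS M
INNER JOIN sys.objects AS O
    ON O.[object_id] = M.[object_id]
WHERE M.[definition] LIKE N'%' + @ColumnName + N'%'
ORDER BY SchemaName, ObjectName;
```

An empty result here does not prove the column is safe to remove; the application code would still need to be checked separately.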

2 Answers

2

As per my comment, the actual querying of the tables to check which nullable columns are completely NULL is the more difficult part to this problem.

To get you started, you can use Dynamic SQL and the system catalog views (sys.tables, sys.schemas, sys.columns) to build the queries you'll need, like this:

-- Temp table to store the results for later
DROP TABLE IF EXISTS #Results;
CREATE TABLE #Results (TableName NVARCHAR(1000), ColumnName NVARCHAR(1000), RowsCount INT, NonNullCount INT);
-- Dynamic SQL variable to execute after we build it out by concatenation
DECLARE @DynamicSQL NVARCHAR(MAX) = N'';
-- Uses the system catalog views to append one INSERT statement per nullable column
-- The table and column names are emitted as string literals (embedded quotes doubled),
-- and QUOTENAME brackets the identifiers that are actually queried
SELECT @DynamicSQL = @DynamicSQL + 
N'
 INSERT INTO #Results
 SELECT ''' + REPLACE(S.[name] + '.' + T.[name], '''', '''''') + ''' AS TableName, ''' + REPLACE(C.[name], '''', '''''') + ''' AS ColumnName, COUNT(*) AS RowsCount, COUNT(' + QUOTENAME(C.[name]) + ') AS NonNullCount 
 FROM ' + QUOTENAME(S.[name]) + '.' + QUOTENAME(T.[name]) + ';
'
FROM sys.tables AS T
INNER JOIN sys.schemas AS S
 ON T.[schema_id] = S.[schema_id]
INNER JOIN sys.columns AS C
 ON T.[object_id] = C.[object_id] -- Join on the id directly instead of resolving the name through OBJECT_ID()
WHERE C.is_nullable = 1;
-- PRINT @DynamicSQL -- Used for debugging
-- Executes the above dynamic SQL
EXEC sp_executesql @DynamicSQL; 
-- Get the final results of which columns only contain NULLs
SELECT TableName, ColumnName
FROM #Results
WHERE RowsCount > 0 -- If the RowsCount is 0 then the table is empty so can't determine if the column only contains NULLs when there's no rows to begin with (but you can remove this predicate if you want to include empty tables too)
 AND NonNullCount = 0 -- If the non null count is 0 then the column only contains NULLs
ORDER BY TableName, ColumnName;

The above theoretically should answer the question. I had to type it on my phone and from memory, so if there's any minor syntactical errors, let me know and I can easily fix. You can also switch the comments around on the line that prints the dynamic SQL with the one that executes it, to help debug any syntactical issues with the dynamic SQL.

answered Mar 27, 2021 at 0:01
14
  • You typed this on your phone? My hat is off to you, sir. I do suggest that instead of counting the non-null rows, a simple EXISTS check would be more efficient and fit the requirements without needing a full table scan per column across 700+ tables, although if the tables are mostly empty it is probably a wash. Commented Mar 27, 2021 at 2:20
  • @JonathanFite Yea haha, most times I only have my phone at hand, and will write all my answers while the website is in mobile mode / without the desktop version of the website's editor. Yea there's definitely a lot of performance improvements I'm sure can be made; I just decided to type what first came to mind lol. For example, it may be more performant to UNION ALL the individual queries of the dynamic SQL as opposed to doing 700+ INSERT INTOs but would need to be tested. I've found too many UNION ALL operators can even cause the SQL optimizer to falter. Commented Mar 27, 2021 at 2:28
  • @JonathanFite Also we share the same first name, so my hat is off to you as well, cheers. 🙂 Commented Mar 27, 2021 at 2:35
  • To improve performance, check for existence rather than counting rows, i.e. IF EXISTS(SELECT 1 FROM mytable WHERE mycolumn IS NOT NULL) SELECT 'mytable.mycolumn' Commented Mar 29, 2021 at 10:21
  • This was very helpful, but the INNER JOIN on sys.Columns does not retrieve all of my tables. We have some tables using the dbo schema, and the script above gets those, but we also have a lot of old tables using an 'informix' schema. Is there a way I can include those as well? Commented Mar 29, 2021 at 16:26
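
Pulling the comments above together, here is a minimal sketch of the EXISTS-based approach, assuming SQL Server 2016 or later (for DROP TABLE IF EXISTS). The temp table `#AllNullColumns` and the variable `@sql` are names made up for this illustration. It joins sys.columns to sys.tables on object_id directly rather than resolving names through OBJECT_ID(), which may be what left out the tables in other schemas, though that has not been verified against the commenter's database.

```sql
-- Hypothetical result table: one row per column that contains only NULLs.
DROP TABLE IF EXISTS #AllNullColumns;
CREATE TABLE #AllNullColumns (SchemaName SYSNAME, TableName SYSNAME, ColumnName SYSNAME);

DECLARE @sql NVARCHAR(MAX) = N'';

-- Build one probe per nullable column. Each probe records the column only if
-- the table has at least one row and no row has a non-NULL value in that column.
-- NOT EXISTS can stop at the first non-NULL value instead of counting every row.
SELECT @sql = @sql + N'
IF EXISTS (SELECT 1 FROM ' + QUOTENAME(S.[name]) + N'.' + QUOTENAME(T.[name]) + N')
   AND NOT EXISTS (SELECT 1 FROM ' + QUOTENAME(S.[name]) + N'.' + QUOTENAME(T.[name])
       + N' WHERE ' + QUOTENAME(C.[name]) + N' IS NOT NULL)
    INSERT INTO #AllNullColumns VALUES (N''' + REPLACE(S.[name], '''', '''''')
       + N''', N''' + REPLACE(T.[name], '''', '''''')
       + N''', N''' + REPLACE(C.[name], '''', '''''') + N''');
'
FROM sys.tables AS T
INNER JOIN sys.schemas AS S
    ON S.[schema_id] = T.[schema_id]
INNER JOIN sys.columns AS C
    ON C.[object_id] = T.[object_id]
WHERE C.is_nullable = 1;

-- PRINT @sql; -- Uncomment to inspect the generated statements
EXEC sys.sp_executesql @sql;

SELECT SchemaName, TableName, ColumnName
FROM #AllNullColumns
ORDER BY SchemaName, TableName, ColumnName;
```

A column that is entirely NULL still forces a full scan of its table, so the gain comes from the columns that do hold data, where the probe can stop early. Whether this actually beats the COUNT version on a given database would need to be tested.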
0

I made a few small changes and tested them on my DB:

-- Temp table to store the results for later
DROP TABLE IF EXISTS #Results;
CREATE TABLE #Results (TableName NVARCHAR(1000), ColumnName NVARCHAR(1000), RowsCount INT, NonNullCount INT);

-- Holds one generated INSERT statement at a time
DECLARE @DynamicSQL NVARCHAR(MAX) = N'';
DECLARE @MyCursor CURSOR;
BEGIN
    SET @MyCursor = CURSOR FOR
    -- Uses the system catalog views to generate one statement per nullable column
    SELECT 
    N' INSERT INTO #Results
 SELECT ''' + REPLACE(S.[name] + '.' + T.[name], '''', '''''') + ''' AS TableName, ''' + REPLACE(C.[name], '''', '''''') + ''' AS ColumnName, COUNT(*) AS RowsCount, COUNT(' + QUOTENAME(C.[name]) + ') AS NonNullCount 
 FROM ' + QUOTENAME(S.[name]) + '.' + QUOTENAME(T.[name]) + ';'
    FROM sys.tables AS T
    INNER JOIN sys.schemas AS S
        ON T.[schema_id] = S.[schema_id]
    INNER JOIN sys.columns AS C
        ON T.[object_id] = C.[object_id] -- Join on the id directly instead of resolving the name through OBJECT_ID()
    WHERE C.is_nullable = 1
    ORDER BY T.[name];

    OPEN @MyCursor;
    FETCH NEXT FROM @MyCursor 
    INTO @DynamicSQL;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        PRINT @DynamicSQL; -- Shows each statement as it runs
        EXEC sp_executesql @DynamicSQL; 
        /*
        YOUR ALGORITHM GOES HERE 
        */
        FETCH NEXT FROM @MyCursor 
        INTO @DynamicSQL;
    END; 

    CLOSE @MyCursor;
    DEALLOCATE @MyCursor;
END;

-- Get the final results of which columns only contain NULLs
SELECT TableName, ColumnName
FROM #Results
WHERE RowsCount > 0 -- If the RowsCount is 0 then the table is empty so can't determine if the column only contains NULLs when there's no rows to begin with (but you can remove this predicate if you want to include empty tables too)
 AND NonNullCount = 0 -- If the non null count is 0 then the column only contains NULLs
ORDER BY TableName, ColumnName;
answered Oct 6, 2024 at 11:36
1
  • It would have been better to edit the previous answer to include the little changes. Commented Oct 6, 2024 at 12:19
