I am responsible for a poorly designed legacy application that has 700+ MSSQL tables and plenty of columns that are completely unused. I would like to run a script to return the column and table name of every column that has only null records. Does anyone have any suggestions on how I could accomplish this?
2 Answers
As per my comment, actually querying the tables to check which nullable columns are completely NULL is the more difficult part of this problem.
To get you started, you can use dynamic SQL and the system catalog views to build the queries you'll need, like this:
-- Temp table to store the results for later
DROP TABLE IF EXISTS #Results;
CREATE TABLE #Results (TableName VARCHAR(1000), ColumnName VARCHAR(1000), RowsCount INT, NonNullCount INT);
-- Dynamic SQL variable to execute after we build it up by concatenation
DECLARE @DynamicSQL NVARCHAR(MAX) = '';
-- Uses the sys catalog views to build one query per nullable column
SELECT @DynamicSQL = @DynamicSQL +
'
INSERT INTO #Results
SELECT ''' + REPLACE(S.[name] + '.' + T.[name], '''', '''''') + ''' AS TableName, ''' + REPLACE(C.[name], '''', '''''') + ''' AS ColumnName, COUNT(*) AS RowsCount, COUNT(' + QUOTENAME(C.[name]) + ') AS NonNullCount
FROM ' + QUOTENAME(S.[name]) + '.' + QUOTENAME(T.[name]) + ';
'
FROM sys.tables AS T
INNER JOIN sys.schemas AS S
ON T.[schema_id] = S.[schema_id]
INNER JOIN sys.columns AS C
ON T.[object_id] = C.[object_id] -- Join on object_id directly instead of resolving names through OBJECT_ID()
WHERE C.is_nullable = 1;
-- PRINT @DynamicSQL -- Used for debugging
-- Executes the above dynamic SQL
EXEC sp_executesql @DynamicSQL;
-- Get the final results of which columns only contain NULLs
SELECT TableName, ColumnName
FROM #Results
WHERE RowsCount > 0 -- If the RowsCount is 0 then the table is empty, so we can't determine whether the column only contains NULLs (remove this predicate if you want to include empty tables too)
AND NonNullCount = 0 -- If the non-null count is 0 then the column only contains NULLs
ORDER BY TableName, ColumnName;
The above should, in theory, answer the question. I had to type it on my phone and from memory, so if there are any minor syntax errors, let me know and I can easily fix them. You can also swap the comments on the line that prints the dynamic SQL and the line that executes it, to help debug any syntax issues in the dynamic SQL.
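One caveat on the PRINT debugging step: PRINT truncates Unicode strings longer than 4,000 characters, and a script generated for 700+ tables will likely exceed that. A minimal sketch of printing the script in chunks follows; the REPLICATE line is only a stand-in for the real @DynamicSQL (an assumption for illustration), and @Pos and @Len are hypothetical helper variables.

```sql
-- Stand-in for the generated script; in practice @DynamicSQL is built as in the answer above
DECLARE @DynamicSQL NVARCHAR(MAX) =
    REPLICATE(CAST(N'SELECT 1;' + NCHAR(13) + NCHAR(10) AS NVARCHAR(MAX)), 1000);

DECLARE @Pos BIGINT = 1;
DECLARE @Len BIGINT = DATALENGTH(@DynamicSQL) / 2; -- character count (NVARCHAR uses 2 bytes per character here)

-- Print the script 4,000 characters at a time so nothing is silently cut off
WHILE @Pos <= @Len
BEGIN
    PRINT SUBSTRING(@DynamicSQL, @Pos, 4000);
    SET @Pos += 4000;
END;
```

Each PRINT starts a new output line, so a chunk boundary can split a statement in two. The output is meant for reading, not for pasting back and running unchanged.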
- You typed this on your phone? My hat is off to you, sir. I do suggest that, instead of counting the non-null rows, a simple EXISTS would be more efficient and fit the requirements without the need to do a full table scan per column across 700+ tables... although if the tables are mostly empty it's probably a wash. – Jonathan Fite, Mar 27, 2021 at 2:20
- @JonathanFite Yea haha, most times I only have my phone at hand, and will write all my answers while the website is in mobile mode / without the desktop version of the website's editor. Yea, there are definitely a lot of performance improvements that can be made; I just decided to type what first came to mind lol. For example, it may be more performant to UNION ALL the individual queries of the dynamic SQL as opposed to doing 700+ INSERT INTOs, but that would need to be tested. I've found too many UNION ALL operators can even cause the SQL optimizer to falter. – J.D., Mar 27, 2021 at 2:28
- @JonathanFite Also we share the same first name, so my hat is off to you as well, cheers. 🙂 – J.D., Mar 27, 2021 at 2:35
- To improve performance, check for existence rather than counting rows, i.e. IF EXISTS(SELECT 1 FROM mytable WHERE mycolumn IS NOT NULL) SELECT 'mytable.mycolumn' – Sean Pearce, Mar 29, 2021 at 10:21
- This was very helpful, but the INNER JOIN on sys.Columns does not retrieve all of my tables. We have some tables using the dbo schema and the script above will get those, but we also have a lot of old tables using an 'informix' schema. Is there a way I can include those as well? – shaunteezie, Mar 29, 2021 at 16:26
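The EXISTS suggestion from the comments can be sketched roughly as follows. This is only an illustration under assumptions, not a tested replacement for the answer: the temp table name #AllNullColumns is hypothetical, and the script joins sys.columns to sys.tables on object_id directly, which does not depend on name resolution and so may also help with tables in other schemas such as 'informix'.

```sql
-- Sketch only: one EXISTS probe per nullable column instead of COUNT aggregates
DROP TABLE IF EXISTS #AllNullColumns; -- hypothetical temp table name
CREATE TABLE #AllNullColumns (TableName NVARCHAR(300), ColumnName SYSNAME);

DECLARE @DynamicSQL NVARCHAR(MAX) = N'';

-- Builds one IF ... INSERT statement per nullable column of every user table
SELECT @DynamicSQL = @DynamicSQL + N'
IF EXISTS (SELECT 1 FROM ' + QUOTENAME(S.[name]) + N'.' + QUOTENAME(T.[name]) + N')
   AND NOT EXISTS (SELECT 1 FROM ' + QUOTENAME(S.[name]) + N'.' + QUOTENAME(T.[name])
        + N' WHERE ' + QUOTENAME(C.[name]) + N' IS NOT NULL)
    INSERT INTO #AllNullColumns (TableName, ColumnName)
    VALUES (N''' + REPLACE(S.[name] + N'.' + T.[name], N'''', N'''''') + N''', N''' + REPLACE(C.[name], N'''', N'''''') + N''');
'
FROM sys.tables AS T
INNER JOIN sys.schemas AS S ON S.[schema_id] = T.[schema_id]
INNER JOIN sys.columns AS C ON C.[object_id] = T.[object_id] -- join on object_id, not on names
WHERE C.is_nullable = 1;

EXEC sys.sp_executesql @DynamicSQL;

-- Columns that contain only NULLs in non-empty tables
SELECT TableName, ColumnName
FROM #AllNullColumns
ORDER BY TableName, ColumnName;
```

The first EXISTS skips empty tables, mirroring the RowsCount > 0 predicate in the answer. The NOT EXISTS probe can stop at the first non-NULL value it finds, but a column that really is all NULL still has to be read in full unless an index covers it, so the gain depends on how many columns are actually in use.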
I made a few small changes and tested it on my database:
-- Temp table to store the results for later
DROP TABLE IF EXISTS #Results;
CREATE TABLE #Results (TableName VARCHAR(1000), ColumnName VARCHAR(1000), RowsCount INT, NonNullCount INT);
-- Dynamic SQL variable that holds one generated statement at a time
DECLARE @DynamicSQL NVARCHAR(MAX) = '';
DECLARE @MyCursor CURSOR;
BEGIN
    SET @MyCursor = CURSOR FOR
    -- Uses the sys catalog views to build one query per nullable column
    SELECT
    ' INSERT INTO #Results
    SELECT ''' + S.[name] + '.' + T.[name] + ''' AS TableName, ''' + C.[name] + ''' AS ColumnName, COUNT(*) AS RowsCount, COUNT(' + QUOTENAME(C.[name]) + ') AS NonNullCount
    FROM ' + QUOTENAME(S.[name]) + '.' + QUOTENAME(T.[name]) + ';'
    FROM sys.tables AS T
    INNER JOIN sys.schemas AS S
    ON T.[schema_id] = S.[schema_id]
    INNER JOIN sys.columns AS C
    ON T.[object_id] = C.[object_id] -- Join on object_id directly instead of resolving names through OBJECT_ID()
    WHERE C.is_nullable = 1
    ORDER BY T.[name];
    OPEN @MyCursor;
    FETCH NEXT FROM @MyCursor
    INTO @DynamicSQL;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        PRINT @DynamicSQL;
        EXEC sp_executesql @DynamicSQL;
        /*
        YOUR ALGORITHM GOES HERE
        */
        FETCH NEXT FROM @MyCursor
        INTO @DynamicSQL;
    END;
    CLOSE @MyCursor;
    DEALLOCATE @MyCursor;
END;
-- Get the final results of which columns only contain NULLs
SELECT TableName, ColumnName
FROM #Results
WHERE RowsCount > 0 -- If the RowsCount is 0 then the table is empty, so we can't determine whether the column only contains NULLs (remove this predicate if you want to include empty tables too)
AND NonNullCount = 0 -- If the non-null count is 0 then the column only contains NULLs
ORDER BY TableName, ColumnName;
- It would have been better to edit the previous answer to include the little changes. – Michael Green, Oct 6, 2024 at 12:19
You can use sys.tables and sys.columns to help you get a list of your tables and their nullable columns. Once you have found the all-NULL columns, be careful with what you do with them. Unfortunately, the poorly designed legacy app might still reference those columns (to insert NULL values), so just removing them can cause you problems.
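As a partial check before dropping anything, you could search the database's own code for a column name. The sketch below assumes a hypothetical column called SomeUnusedColumn and only does a text match against stored procedures, views, functions, and triggers. It can return false positives, and it cannot see references in the application's own code or in ad hoc queries.

```sql
-- Hypothetical column name, used purely for illustration
DECLARE @ColumnName SYSNAME = N'SomeUnusedColumn';

-- Text search over module definitions (procedures, views, functions, triggers)
SELECT OBJECT_SCHEMA_NAME(M.[object_id]) AS SchemaName,
       OBJECT_NAME(M.[object_id]) AS ObjectName,
       O.type_desc AS ObjectType
FROM sys.sql_modules AS M
INNER JOIN sys.objects AS O ON O.[object_id] = M.[object_id]
-- Escape LIKE wildcards in the name so characters such as "_" are matched literally
WHERE M.[definition] LIKE N'%'
      + REPLACE(REPLACE(REPLACE(@ColumnName, N'[', N'[[]'), N'%', N'[%]'), N'_', N'[_]')
      + N'%'
ORDER BY SchemaName, ObjectName;
```

An empty result here does not prove the column is safe to remove. Indexes, constraints, and the application itself may still depend on it, so testing the change in a non-production copy first is the safer route.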