0

I am looking for a way to run frequencies for a number of different columns. I'm used to working in SAS and SQL is still new. In SAS I'd run a proc freq; table var1 var2 var3; run; command; however, it appears as though this requires a bit more code in SQL. Because of the number of variables/columns I'm obtaining frequencies for, I really don't want to have to write out the same code over and over again. Eg:

select
 language,
 count(*) as freq
from #pop
group by language
order by language

I've been looking at cursor commands to loop through column names, so I start with generating a new temporary table with all the column names that I'd like to loop through. Here's what I've been able to put together:

drop table if exists #dem_table;
create table #dem_table (dem_vars varchar(50) not null);
insert into #dem_table values ('age_cat'),('race'),('ethnicity'),('language'),('sex'),('orientation'),('income'),('urban'),('poverty'),('children'),('marital_status'),('state');
declare @column varchar(50);
declare cursor_dem cursor for
 select * from #dem_table;
open cursor_dem;
 fetch next from cursor_dem into @column;
 while @@fetch_status = 0
 begin
 select @column, count(*) as freq
 from #pop
 group by @column
 order by @column;
 fetch next from cursor_dem into @column;
 end;
close cursor_dem;
deallocate cursor_dem;

It works until I reach the "group by" and "order by" syntax, and then I receive the following errors:

Each GROUP BY expression must contain at least one column that is not an outer reference. The SELECT item identified by the ORDER BY number 1 contains a variable as part of the expression identifying a column position. Variables are only allowed when ordering by an expression referencing a column name.

Wondering if dynamic sql might be helpful? But I can't get it to iterate through columns that way either.

Dale K
28.1k15 gold badges60 silver badges86 bronze badges
asked Dec 13, 2023 at 17:14
8
  • 1
    If you find yourself needing to loop through columns, it's time to reconsider the schema design. Commented Dec 13, 2023 at 17:19
  • do you want just to output the results to the client in a loop? In that case, you should do the actual dynamic part of the cursor: EXEC(' select ' + @column + N', count(*) as freq from #pop group by ' + @column + N' order by ' + @column + N';') Commented Dec 13, 2023 at 17:20
  • Do you want a single result set at the end? or do you want n different results, one for each column? Commented Dec 13, 2023 at 17:30
  • What would your desired output be? One resultset or multiple? In columns or rows? Please show. Commented Dec 13, 2023 at 17:31
  • @Xedni - great question! Desired output would be n results for each column - ideally not in new or temp tables, just the results viewer works! Commented Dec 13, 2023 at 19:16

2 Answers 2

-1

This could probably be pared down a bit, or adjusted to accept a specific list of columns if you are interested only in a subset (For instance, you probably don't care about the distribution of a surrogate key, so you could exclude that. Or maybe you want to exclude float columns), but here is a snippet I use when I want to find the distinct values in a table, and the counts thereof.

-- Field-by-Field with loop
drop table if exists #DistVals
create table #DistVals
(
 TableName nvarchar(384),
 OrdinalPosition int,
 Field nvarchar(128),
 Type nvarchar(128),
 Value nvarchar(max),
 Ct int
)
go
declare 
 @TableName nvarchar(128), -- table here
 @SchemaName nvarchar(128) -- schema here
declare @sql nvarchar(max)
declare c cursor local fast_forward for
 select concat('insert into #DistVals (TableName, OrdinalPosition, Field, Type, Value, Ct) select TableName = ', quotename(concat(table_catalog, '.', table_schema, '.', table_name), ''''), ', OrdinalPosition = ', ordinal_position, ', Field = ', quotename(column_name, ''''), ', Type = ', quotename(data_type, ''''), ', Value = convert(nvarchar(max), ', quotename(column_name), '), Ct = count(1) from ', quotename(table_catalog), '.', quotename(table_schema), '.', quotename(table_name), ' group by ', quotename(column_name))
 from information_schema.columns
 where table_name = @TableName
 and table_schema = @SchemaName
 -- and column_name <> 'MySurrogateKey'
open c
fetch next from c into @sql
while @@fetch_status = 0
begin
 exec sp_executesql @sql
 
 fetch next from c into @sql
end
select
 TableName,
 OrdinalPosition,
 Field,
 Type,
 Value,
 Ct,
 TotalValueCt = sum(ct) over (partition by TableName, Field)
from #DistVals
order by TableName, OrdinalPosition, Ct desc

I find this approach the easiest to reason about, and since I'm typically running this in an ad-hoc fashion, I usually don't care that I have to re-hit the table n times for each column.

EDIT If you really want as many result sets as you have columns to analyze, you could replace the insert into #DistVals with a simple select. I don't know your exact use case, but to me, having a single dataset containing all columns is more useful.

That said, if you want to get everything in one go, You could also do that by doing a dynamic unpivot.

-- all at once with dynamic unpivot
declare 
 @TableName nvarchar(128),
 @SchemaName nvarchar(128)
declare 
 @Columns nvarchar(max),
 @ConvertedColumns nvarchar(max),
 @SQL nvarchar(max)
select 
 @Columns = string_agg(quotename(name), ','),
 @ConvertedColumns = string_agg(concat(quotename(name), ' = convert(nvarchar(max), ', quotename(name), ')'), ',')
from sys.columns
where object_id = object_id(@SchemaName + '.' + @TableName)
select @SQL = concat
('
 select
 Name,
 ValueCt = count(1),
 DistinctValueCt = count(distinct Value)
 from
 (
 select ', @ConvertedColumns, '
 from ', quotename(@SchemaName), '.', quotename(@TableName), '
 ) s
 unpivot(Value for Name in(', @Columns, ')) u
 group by name
')
exec sp_executesql @SQL
answered Dec 13, 2023 at 17:35
Sign up to request clarification or add additional context in comments.

Comments

-1

Here is another option leveraging the sys.columns to build the dynamic sql for this. It is a bit simpler than using a dynamic unpivot for something like this.

If you want the distinct counts in a single row per column it would be something like this.

declare @sql nvarchar(max) = 'select '
 , @TableName sysname = 'YourTableName'
select @sql = @sql + quotename(c.name) + ' = count(distinct ' + quotename(c.name) + '), '
from sys.columns c
where object_id = object_id(@TableName)
set @sql = left(@sql, len(@sql) - 1) + ' from ' + QUOTENAME(@TableName)
exec sp_executesql @sql

If however you want individual rows you could do something similar like this.

declare @sql nvarchar(max) = ''
 , @TableName sysname = 'YourTableName'
select @sql = @sql + 'select Name = ''' + c.name + ''', ValueCount = count(' + quotename(c.name) + '), ' + quotename(c.name) + ' = count(distinct ' + quotename(c.name) + ') from ' + quotename(@TableName) + ' union all '
from sys.columns c
where object_id = object_id(@TableName)
set @sql = left(@sql, len(@sql) - 10)
exec sp_executesql @sql

And last but not least if you really do want a new result set for each value as you indicated you can modify to something like this.

declare @sql nvarchar(max) = ''
 , @TableName sysname = 'YourTableName'
select @sql = @sql + 'select Name = ''' + c.name + ''', ValueCount = count(' + quotename(c.name) + '), ' + quotename(c.name) + ' = count(distinct ' + quotename(c.name) + ') from ' + quotename(@TableName) + ';'
from sys.columns c
where object_id = object_id(@TableName)
exec sp_executesql @sql
answered Dec 13, 2023 at 20:38

Comments

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.