In my work, I often have to take a naive look at tables and then never see them again, and in large databases with many columns it can be difficult to see which columns line up across tables. I wrote the query below, and while it works, it seems to require an unnecessary number of joins. I'm wondering if there is a better way to handle this issue or a way to cut down the code.
select A.COLUMN_NAME as 'Similar Column', y.TABLE_NAME as 'First Table', z.TABLE_NAME as 'Second Table'
from INFORMATION_SCHEMA.COLUMNS A
full outer join INFORMATION_SCHEMA.COLUMNS B
on A.COLUMN_NAME = B.COLUMN_NAME
cross apply (
SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_TYPE='BASE TABLE') as y
cross apply (
SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_TYPE='BASE TABLE') as z
where y.TABLE_NAME <> z.TABLE_NAME
and A.TABLE_NAME = y.TABLE_NAME
and B.TABLE_NAME = z.TABLE_NAME
order by y.TABLE_NAME, z.TABLE_NAME, A.COLUMN_NAME
I also wonder if there is a way to prevent duplicate entries. For example, the pair Activity, RMA is the same as RMA, Activity, so it should only be listed once.
2 Answers
A common table expression (CTE) will help you here quite nicely:
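with columns as (
-- keep only columns that belong to base tables (this excludes views)
select C.COLUMN_NAME,
C.TABLE_NAME
from INFORMATION_SCHEMA.COLUMNS C
inner join INFORMATION_SCHEMA.TABLES T on T.TABLE_NAME = C.TABLE_NAME
and T.TABLE_SCHEMA = C.TABLE_SCHEMA -- avoid matching a same-named table in another schema
where T.TABLE_TYPE = 'BASE TABLE')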
select C.COLUMN_NAME as 'Similar Column',
C.TABLE_NAME as 'First Table',
D.TABLE_NAME as 'Second Table'
from columns C
inner join columns D
on C.COLUMN_NAME = D.COLUMN_NAME
and C.TABLE_NAME < D.TABLE_NAME
order by C.TABLE_NAME, D.TABLE_NAME, C.COLUMN_NAME
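The above query simplifies the column/table joins and reuses the result. Further, by using < instead of <> in the table-name join condition, it removes your duplicates: each pair of tables is returned only in the order where the first name sorts before the second, so Activity, RMA appears but RMA, Activity does not.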
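To see the effect of < in isolation, here is a minimal sketch that swaps INFORMATION_SCHEMA.COLUMNS for three made-up rows. The CTE name cols, the Orders table, and the CustomerID column are hypothetical and only there for illustration.

```sql
-- Hypothetical sample data standing in for INFORMATION_SCHEMA.COLUMNS:
-- three tables that all share a column named CustomerID.
with cols as (
    select v.TABLE_NAME, v.COLUMN_NAME
    from (values
        ('Activity', 'CustomerID'),
        ('RMA',      'CustomerID'),
        ('Orders',   'CustomerID')
    ) as v (TABLE_NAME, COLUMN_NAME)
)
select a.COLUMN_NAME,
       a.TABLE_NAME as first_table,
       b.TABLE_NAME as second_table
from cols a
join cols b
  on a.COLUMN_NAME = b.COLUMN_NAME
 and a.TABLE_NAME < b.TABLE_NAME   -- keep each unordered pair once
order by a.TABLE_NAME, b.TABLE_NAME;
```

With < this should return three rows (Activity/Orders, Activity/RMA, Orders/RMA). Changing the condition to <> would return six, because every pair then shows up in both orders.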
Very clever with the less than sign. – jfa, Mar 10, 2017 at 21:34
I asked the database architect/administrator for our group, and in reviewing INFORMATION_SCHEMA.COLUMNS we arrived at the join-golf low score when we noticed that TABLE_NAME is a column in INFORMATION_SCHEMA.COLUMNS, which means two joins can be taken out in favor of a select. Here is the new code:
select x.COLUMN_NAME, x.TABLE_NAME as 'Table 1', y.TABLE_NAME as 'Table 2'
from INFORMATION_SCHEMA.COLUMNS x
join INFORMATION_SCHEMA.COLUMNS y
on x.COLUMN_NAME = y.COLUMN_NAME
and x.TABLE_NAME < y.TABLE_NAME
order by x.TABLE_NAME, y.TABLE_NAME, x.COLUMN_NAME
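One thing to be aware of: INFORMATION_SCHEMA.COLUMNS also lists the columns of views, so dropping the TABLE_TYPE = 'BASE TABLE' filter from the original question means views can show up in the result. If that matters, a possible sketch that keeps the single self-join and restores the filter with exists is shown below. The alias names and the choice of exists are my own assumptions, not part of the answer above.

```sql
-- Sketch: same self-join as above, limited to base tables.
-- The exists subqueries stand in for the original TABLE_TYPE filter.
select x.COLUMN_NAME,
       x.TABLE_NAME as table_1,
       y.TABLE_NAME as table_2
from INFORMATION_SCHEMA.COLUMNS x
join INFORMATION_SCHEMA.COLUMNS y
  on x.COLUMN_NAME = y.COLUMN_NAME
 and x.TABLE_NAME < y.TABLE_NAME
where exists (select 1
              from INFORMATION_SCHEMA.TABLES t
              where t.TABLE_SCHEMA = x.TABLE_SCHEMA
                and t.TABLE_NAME = x.TABLE_NAME
                and t.TABLE_TYPE = 'BASE TABLE')
  and exists (select 1
              from INFORMATION_SCHEMA.TABLES t
              where t.TABLE_SCHEMA = y.TABLE_SCHEMA
                and t.TABLE_NAME = y.TABLE_NAME
                and t.TABLE_TYPE = 'BASE TABLE')
order by x.TABLE_NAME, y.TABLE_NAME, x.COLUMN_NAME;
```

Like the queries above, this compares tables by name only, so two tables with the same name in different schemas are never paired with each other.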