More efficient SQL for different databases with same schema

Question 1

I have data stored in different monthly databases that have the exact same name schema for the tables and columns. Each monthly database has ~10 million rows. I'm looking for a more efficient way to select the rows I want from each database based on the same variable. There are only 5 to 10 rows from each month for that variable.

I also need to JOIN this different monthly data to the same separate database. I am currently using UNION to combine the data. So:

 SELECT a.col1, a.col2, a.col3, b.colX, b.colY, b.colZ
 FROM db_JAN.tab1 a
 INNER JOIN db_OTHER.tabX b
 ON (a.DATE = b.DATE AND a.someID = b.someID)
 WHERE a.col1='variable1'
 UNION
 SELECT a.col1, a.col2, a.col3, b.colX, b.colY, b.colZ
 FROM db_FEB.tab1 a
 INNER JOIN db_OTHER.tabX b
 ON (a.DATE = b.DATE AND a.someID = b.someID)
 WHERE a.col1='variable1'
 UNION
 SELECT a.col1, a.col2, a.col3, b.colX, b.colY, b.colZ
 FROM db_MARCH.tab1 a
 INNER JOIN db_OTHER.tabX b
 ON (a.DATE = b.DATE AND a.someID = b.someID)
 WHERE a.col1='variable1'
 .
 .
 .

If the SELECT, JOIN and WHERE clauses are the same across different databases with the same naming schema, is there a more efficient way of writing this query? Something like:

 SELECT a.col1, a.col2, a.col3, b.colX, b.colY, b.colZ
 FROM (db_JAN,db_FEB,db_MARCH...).tab1 a
 INNER JOIN db_OTHER.tabX b
 ON (a.DATE = b.DATE AND a.someID = b.someID)
 WHERE a.col1='variable1'

This question might be applicable but I'm not sure as I'm relatively new to SQL and am not sure about sequential table scans or indexing:

"How to make a union view execute more efficiently?"

I don't believe it is a hierarchical query and this question doesn't help me either.

NOTE: I am using SAS to run the query connecting to Teradata SQL database and have used a MACRO VARIABLE for 'variable1' in my PROC SQL statement. I am not sure if I can use a MACRO VARIABLE for the databases but that could be an option if I can't do it in SQL itself.

Question 2

Do you really need the UNION or UNION ALL would be ok? (it's usually more efficient as it doesn't try to remove any duplicates)

Question 3

I could use 'UNION ALL'. I'll change to that. I was hoping to shorten the query itself if possible.

Question 4

The query could be rewritten shorter. But I'm not confident it would help efficiency. It's more probable it wouldn't.

Question 5

Teradata supports predicates pushdown

select a.col1, a.col2, a.col3, b.colx, b.coly, b.colz 
from ( select col1, col2, col3, date, someid from db_jan.tab1 
 union all select col1, col2, col3, date, someid from db_feb.tab1 
 union all ...
 ) a 
 join db_other.tabx b 
 on a.date = b.date 
 and a.someid = b.someid
where a.col1 = 'variable1'

Question 6

It's actually shorter, only one WHERE. Mine repeats for each one of the 12 tables

Question 7

yep. sorry. just saw the WHERE at the end.

Question 8

Ran my original SQL, Dudu and TypoCube's multiple times with UNION & UNION ALL. Times from SAS log and queries ran on server. Times for same query varied depending on server activity. I know it was a very amateur test but was good enough for me: - My original query was consistently slower for all. - Using the WHERE clause inside the derived table didn't make noticeable difference so I left it outside for a shorter query per Dudu. - UNION ALL was noticeable faster on all tests than UNION. Unfortunately I found duplicates that need to be deleted so have to use UNION.

Question 9

There are various ways to write the query in shorter form. I'm not sure if they could improve performance (or make it worse!)

The best lines of attack if the query isn't efficient is to change it into UNION ALL (provided that the results are equivalent) and examine/optimize the parts separately, make sure indexes are used, etc.

One way to rewrite is to move the UNION or UNION ALL inside a derived table and then join:

SELECT a.col1, a.col2, a.col3, b.colX, b.colY, b.colZ
FROM 
 ( SELECT col1, col2, col3, DATE, someID
 FROM db_JAN.tab1 WHERE col1 = 'variable1' 
 UNION ALL
 SELECT col1, col2, col3, DATE, someID
 FROM db_FEB.tab1 WHERE col1 = 'variable1' 
 UNION ALL
 ---
 ) a
INNER JOIN db_OTHER.tabX b
ON (a.DATE = b.DATE AND a.someID = b.someID) ;

Question 10

Thanks @TypoCube. I'll amend and see if it improves or hinders performance. At least using the derived table will make the query a lot shorter since the initial SELECT is 60 lines of code which is what I wanted to avoid repeating.

Question 11

My advice is still to test. If the original query with UNION ALL preforms well, I don texpect my version to perform better. It might perform a bit better, it might perform worse. test them (and Dudu's query as well, it's a bit shorter than mine)

Question 12

see above for test results/comments.

score 1 · Accepted Answer · 2016-12-30 17:49:36Z

1

Teradata supports predicates pushdown

select a.col1, a.col2, a.col3, b.colx, b.coly, b.colz 
from ( select col1, col2, col3, date, someid from db_jan.tab1 
 union all select col1, col2, col3, date, someid from db_feb.tab1 
 union all ...
 ) a 
 join db_other.tabx b 
 on a.date = b.date 
 and a.someid = b.someid
where a.col1 = 'variable1'

Share

Improve this answer

edited Dec 30, 2016 at 20:08

answered Dec 30, 2016 at 17:49

David דודו Markovitz's user avatar

David דודו Markovitz David דודו Markovitz

3,27414 silver badges21 bronze badges

3

It's actually shorter, only one WHERE. Mine repeats for each one of the 12 tables

ypercubeᵀᴹ
– ypercubeᵀᴹ

2016年12月30日 19:57:58 +00:00
Commented Dec 30, 2016 at 19:57
yep. sorry. just saw the WHERE at the end.

xxCDxx
– xxCDxx

2016年12月30日 19:59:28 +00:00
Commented Dec 30, 2016 at 19:59
Ran my original SQL, Dudu and TypoCube's multiple times with UNION & UNION ALL. Times from SAS log and queries ran on server. Times for same query varied depending on server activity. I know it was a very amateur test but was good enough for me: - My original query was consistently slower for all. - Using the WHERE clause inside the derived table didn't make noticeable difference so I left it outside for a shorter query per Dudu. - UNION ALL was noticeable faster on all tests than UNION. Unfortunately I found duplicates that need to be deleted so have to use UNION.

xxCDxx
– xxCDxx

2016年12月30日 22:28:09 +00:00
Commented Dec 30, 2016 at 22:28

Add a comment |

Stack Exchange Network

More efficient SQL for different databases with same schema

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

More efficient SQL for different databases with same schema

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions