1

I have data stored in different monthly databases that have the exact same name schema for the tables and columns. Each monthly database has ~10 million rows. I'm looking for a more efficient way to select the rows I want from each database based on the same variable. There are only 5 to 10 rows from each month for that variable.

I also need to JOIN this different monthly data to the same separate database. I am currently using UNION to combine the data. So:

 SELECT a.col1, a.col2, a.col3, b.colX, b.colY, b.colZ
 FROM db_JAN.tab1 a
 INNER JOIN db_OTHER.tabX b
 ON (a.DATE = b.DATE AND a.someID = b.someID)
 WHERE a.col1='variable1'
 UNION
 SELECT a.col1, a.col2, a.col3, b.colX, b.colY, b.colZ
 FROM db_FEB.tab1 a
 INNER JOIN db_OTHER.tabX b
 ON (a.DATE = b.DATE AND a.someID = b.someID)
 WHERE a.col1='variable1'
 UNION
 SELECT a.col1, a.col2, a.col3, b.colX, b.colY, b.colZ
 FROM db_MARCH.tab1 a
 INNER JOIN db_OTHER.tabX b
 ON (a.DATE = b.DATE AND a.someID = b.someID)
 WHERE a.col1='variable1'
 .
 .
 .

If the SELECT, JOIN and WHERE clauses are the same across different databases with the same naming schema, is there a more efficient way of writing this query? Something like:

 SELECT a.col1, a.col2, a.col3, b.colX, b.colY, b.colZ
 FROM (db_JAN,db_FEB,db_MARCH...).tab1 a
 INNER JOIN db_OTHER.tabX b
 ON (a.DATE = b.DATE AND a.someID = b.someID)
 WHERE a.col1='variable1'

This question might be applicable but I'm not sure as I'm relatively new to SQL and am not sure about sequential table scans or indexing:

"How to make a union view execute more efficiently?"

I don't believe it is a hierarchical query and this question doesn't help me either.

NOTE: I am using SAS to run the query connecting to Teradata SQL database and have used a MACRO VARIABLE for 'variable1' in my PROC SQL statement. I am not sure if I can use a MACRO VARIABLE for the databases but that could be an option if I can't do it in SQL itself.

asked Dec 30, 2016 at 16:45
3
  • Do you really need the UNION or UNION ALL would be ok? (it's usually more efficient as it doesn't try to remove any duplicates) Commented Dec 30, 2016 at 16:50
  • I could use 'UNION ALL'. I'll change to that. I was hoping to shorten the query itself if possible. Commented Dec 30, 2016 at 16:54
  • The query could be rewritten shorter. But I'm not confident it would help efficiency. It's more probable it wouldn't. Commented Dec 30, 2016 at 16:58

2 Answers 2

1

Teradata supports predicates pushdown

select a.col1, a.col2, a.col3, b.colx, b.coly, b.colz 
from ( select col1, col2, col3, date, someid from db_jan.tab1 
 union all select col1, col2, col3, date, someid from db_feb.tab1 
 union all ...
 ) a 
 join db_other.tabx b 
 on a.date = b.date 
 and a.someid = b.someid
where a.col1 = 'variable1'
answered Dec 30, 2016 at 17:49
3
  • It's actually shorter, only one WHERE. Mine repeats for each one of the 12 tables Commented Dec 30, 2016 at 19:57
  • yep. sorry. just saw the WHERE at the end. Commented Dec 30, 2016 at 19:59
  • Ran my original SQL, Dudu and TypoCube's multiple times with UNION & UNION ALL. Times from SAS log and queries ran on server. Times for same query varied depending on server activity. I know it was a very amateur test but was good enough for me: - My original query was consistently slower for all. - Using the WHERE clause inside the derived table didn't make noticeable difference so I left it outside for a shorter query per Dudu. - UNION ALL was noticeable faster on all tests than UNION. Unfortunately I found duplicates that need to be deleted so have to use UNION. Commented Dec 30, 2016 at 22:28
2

There are various ways to write the query in shorter form. I'm not sure if they could improve performance (or make it worse!)

The best lines of attack if the query isn't efficient is to change it into UNION ALL (provided that the results are equivalent) and examine/optimize the parts separately, make sure indexes are used, etc.

One way to rewrite is to move the UNION or UNION ALL inside a derived table and then join:

SELECT a.col1, a.col2, a.col3, b.colX, b.colY, b.colZ
FROM 
 ( SELECT col1, col2, col3, DATE, someID
 FROM db_JAN.tab1 WHERE col1 = 'variable1' 
 UNION ALL
 SELECT col1, col2, col3, DATE, someID
 FROM db_FEB.tab1 WHERE col1 = 'variable1' 
 UNION ALL
 ---
 ) a
INNER JOIN db_OTHER.tabX b
ON (a.DATE = b.DATE AND a.someID = b.someID) ;
answered Dec 30, 2016 at 17:08
3
  • Thanks @TypoCube. I'll amend and see if it improves or hinders performance. At least using the derived table will make the query a lot shorter since the initial SELECT is 60 lines of code which is what I wanted to avoid repeating. Commented Dec 30, 2016 at 19:54
  • My advice is still to test. If the original query with UNION ALL preforms well, I don texpect my version to perform better. It might perform a bit better, it might perform worse. test them (and Dudu's query as well, it's a bit shorter than mine) Commented Dec 30, 2016 at 19:57
  • see above for test results/comments. Commented Dec 30, 2016 at 22:11

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.