I am working in SQL Server 2008. I have a stored procedure with 5 queries inside of it that select rows from very large fact tables (on the order of millions of rows) into another table (call it table_B). (Actually, the 5 queries used to be many more queries, but I consolidated them down to 5 because each of the original queries did a LEFT JOIN from a very large fact table against a very small dimension table, i.e., each very large table was being scanned many times.) I need this other table (table_B) to persist, i.e., every time the stored procedure is executed, rows just keep getting added to it, so table_B can become very large itself.
Currently, for each of the 5 queries, I insert rows into table_B via the INSERT INTO table_B SELECT ... method. A colleague recommended that I decouple my inserts and selects to gain some performance: first insert the rows into separate temp tables, export the temp tables as flat files, and then load the flat files into table_B via SSIS. That way, we get the bulk-insert performance of SSIS.
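For illustration, here is roughly what one of the 5 queries looks like today (all table and column names below are made up; the real schemas differ):

```sql
INSERT INTO dbo.table_B (fact_key, dim1_name, dim2_name)
SELECT f.fact_key, d1.dim1_name, d2.dim2_name
FROM dbo.big_fact_table AS f
LEFT JOIN dbo.dim_table_1 AS d1
       ON d1.dim1_key = f.dim1_key
LEFT JOIN dbo.dim_table_2 AS d2
       ON d2.dim2_key = f.dim2_key;
```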
What is the best way to insert rows into these temp tables? (I can either pre-define them or create them on the fly, so I'm open to either the INSERT INTO or SELECT INTO method; both are sketched below.) The number of rows returned by each SELECT query will generally be large, so I need this to be as fast as possible.
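Using the same hypothetical names as above, the two options I'm weighing look like this:

```sql
-- Option 1: pre-defined temp table, loaded with INSERT ... SELECT
CREATE TABLE #stage1 (fact_key INT, dim1_name VARCHAR(100));

INSERT INTO #stage1 (fact_key, dim1_name)
SELECT f.fact_key, d1.dim1_name
FROM dbo.big_fact_table AS f
LEFT JOIN dbo.dim_table_1 AS d1
       ON d1.dim1_key = f.dim1_key;

-- Option 2: temp table created on the fly with SELECT ... INTO
SELECT f.fact_key, d1.dim1_name
INTO #stage2
FROM dbo.big_fact_table AS f
LEFT JOIN dbo.dim_table_1 AS d1
       ON d1.dim1_key = f.dim1_key;
```

(One thing I do know: tempdb always uses the simple recovery model, so the SELECT ... INTO in option 2 is a minimally logged operation.)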
The tables are very simple, i.e., no indexes, constraints, or primary keys. (Those may get implemented in the future, but I am not designing for them.)
Also, is there a way to automate the end-to-end process? I don't want users to have to execute the stored procedure, query the temp tables, export them, and then run SSIS to import them into the final table_B by hand.
1 Answer
Since this seems to be a data warehouse, is (or can) the recovery model be bulk-logged or simple? That, in addition to using the TABLOCK hint on table_B, could help a lot, since the inserts could then be minimally logged. I found a blog post by Itzik Ben-Gan on minimally logged inserts that could help.
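As a rough sketch of what I mean (the database name here is a placeholder, and the table/column names follow your example; substitute your own):

```sql
-- One-time change; note that bulk-logged recovery limits point-in-time
-- restore for log backups containing bulk operations, so check your
-- backup requirements first.
ALTER DATABASE ValidationDB SET RECOVERY BULK_LOGGED;

-- With bulk-logged (or simple) recovery and table_B being a heap with no
-- indexes, an INSERT ... SELECT with the TABLOCK hint can be minimally
-- logged on SQL Server 2008.
INSERT INTO dbo.table_B WITH (TABLOCK) (fact_key, dim1_name)
SELECT f.fact_key, d1.dim1_name
FROM dbo.big_fact_table AS f
LEFT JOIN dbo.dim_table_1 AS d1
       ON d1.dim1_key = f.dim1_key;
```

If the inserts into table_B are minimally logged, the entire temp-table-to-flat-file-to-SSIS round trip may be unnecessary.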
Actually, this isn't a true data warehouse. It's a data validation DB (in a test environment) for testing external data before we move the data into the data warehouse. I call the test data "fact" tables because that is where they will eventually move to. But I do LEFT JOIN against actual dimension tables in the test environment. – skyline01, Jul 18, 2015 at 20:11
@user3100444 To me that sounds like a bulk-logged or simple recovery model might be feasible, and a minimally logged insert into the final table should be a lot better than trying to export something to disk. – James Z, Jul 18, 2015 at 20:42
SSIS only spares you the SELECT overhead when it is sourced from flat files, and you'll still need to incur the SELECT overhead to create those files, so I'm not sure the proposed method would improve performance. Are you saying you have no indexes on any of the tables, or just on the proposed temp tables?