I am working in SQL Server 2008. I have a stored procedure with 5 queries inside of it that select rows from very large fact tables (on the order of millions of rows) into another table (call it table_B). (Actually, the 5 queries used to be many more queries, but I consolidated them down to 5 because each of the original queries did a LEFT JOIN from a very large fact table against a very small dimension table, i.e., each very large table was being scanned many times.) I need this other table (table_B) to persist, i.e., every time the stored procedure is executed, rows just keep getting added to it, so table_B can become very large itself.
Currently, for each of the 5 queries, I insert rows into table_B via the INSERT INTO table_B SELECT ... method. A colleague recommended that I decouple my inserts and selects to gain some performance: first insert the rows into separate temp tables, export the temp tables as flat files, and then load the flat files into table_B via SSIS. That way, we get the bulk-insert performance of SSIS.
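For illustration, here is roughly what one of the 5 queries looks like today (all table and column names below are made up; the real schemas differ):

```sql
INSERT INTO dbo.table_B (fact_key, dim1_name, dim2_name)
SELECT f.fact_key, d1.dim1_name, d2.dim2_name
FROM dbo.big_fact_table AS f
LEFT JOIN dbo.dim_table_1 AS d1
       ON d1.dim1_key = f.dim1_key
LEFT JOIN dbo.dim_table_2 AS d2
       ON d2.dim2_key = f.dim2_key;
```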
What is the best way to insert rows into these temp tables? (I can either pre-define them or create them on the fly, so I'm open to either the INSERT INTO or SELECT INTO method; both are sketched below.) The number of rows returned by each SELECT query will generally be large, so I need this to be as fast as possible.
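Using the same hypothetical names as above, the two options I'm weighing look like this:

```sql
-- Option 1: pre-defined temp table, loaded with INSERT ... SELECT
CREATE TABLE #stage1 (fact_key INT, dim1_name VARCHAR(100));

INSERT INTO #stage1 (fact_key, dim1_name)
SELECT f.fact_key, d1.dim1_name
FROM dbo.big_fact_table AS f
LEFT JOIN dbo.dim_table_1 AS d1
       ON d1.dim1_key = f.dim1_key;

-- Option 2: temp table created on the fly with SELECT ... INTO
SELECT f.fact_key, d1.dim1_name
INTO #stage2
FROM dbo.big_fact_table AS f
LEFT JOIN dbo.dim_table_1 AS d1
       ON d1.dim1_key = f.dim1_key;
```

(One thing I do know: tempdb always uses the simple recovery model, so the SELECT ... INTO in option 2 is a minimally logged operation.)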
The tables are very simple, i.e., no indexes, constraints, or primary keys. (Those may get implemented in the future, but I am not designing for them.)
Also, is there a way to automate the end-to-end process? I don't want users to have to execute the stored procedure, query the temp tables, export them, and then run SSIS to import them into the final table_B by hand.
1 Answer
Since this seems to be a data warehouse, is (or can) the recovery model be bulk-logged or simple? That, in addition to using the TABLOCK hint on table_B, could help a lot, since the inserts could then be minimally logged. I found a blog post by Itzik Ben-Gan on minimally logged inserts that could help.
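As a rough sketch of what I mean (the database name here is a placeholder, and the table/column names follow your example; substitute your own):

```sql
-- One-time change; note that bulk-logged recovery limits point-in-time
-- restore for log backups containing bulk operations, so check your
-- backup requirements first.
ALTER DATABASE ValidationDB SET RECOVERY BULK_LOGGED;

-- With bulk-logged (or simple) recovery and table_B being a heap with no
-- indexes, an INSERT ... SELECT with the TABLOCK hint can be minimally
-- logged on SQL Server 2008.
INSERT INTO dbo.table_B WITH (TABLOCK) (fact_key, dim1_name)
SELECT f.fact_key, d1.dim1_name
FROM dbo.big_fact_table AS f
LEFT JOIN dbo.dim_table_1 AS d1
       ON d1.dim1_key = f.dim1_key;
```

If the inserts into table_B are minimally logged, the entire temp-table-to-flat-file-to-SSIS round trip may be unnecessary.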
Actually, this isn't a true data warehouse. It's a data validation DB (in a test environment) for testing external data before we move the data into the data warehouse. I call the test data "fact" tables because that is where they will eventually move to. But I do LEFT JOIN against actual dimension tables in the test environment. – skyline01, Jul 18, 2015 at 20:11
@user3100444 To me that sounds like a bulk-logged or simple recovery model might be feasible, and a minimally logged insert into the final table should be a lot better than trying to export something to disk. – James Z, Jul 18, 2015 at 20:42
SSIS only spares you the SELECT overhead when it is sourced from flat files, and you'll still need to incur the SELECT overhead to create those files, so I'm not sure the proposed method would improve performance. Are you saying you have no indexes on any of the tables, or just on the proposed temp tables?