I have a table with 12 million rows and a task I need to perform frequently:
- Get search results from somewhere (50 rows). Each result has a key that looks like an MD5 hash, and the table's primary key is built on this field.
- Check which of these rows are already stored.
- Store all the other rows.
The question is what the best way to perform steps 2-3 is. I use PHP and Doctrine, so not every tricky query is available to me. For example, I can't use bulk inserts, so I need to run INSERT up to 50 times in a row. I see two possible ways:
- Run SELECT ... WHERE id IN(...) with all 50 IDs, see what is returned, then run as many inserts as needed.
- Run 50 inserts and catch the duplicate-ID error.
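For reference, the first option could be sketched roughly like this with plain PDO. The table name `search_results`, the column name `id`, and the helper `findMissingIds` are assumptions made for illustration, not names from my actual schema.

```php
<?php
// Sketch only: `search_results` and `id` are hypothetical names,
// and $pdo is assumed to be an already-open PDO connection.
function findMissingIds(PDO $pdo, array $ids): array
{
    if ($ids === []) {
        return [];
    }
    // One "?" placeholder per ID, e.g. "?,?,?" for three IDs.
    $placeholders = implode(',', array_fill(0, count($ids), '?'));
    $stmt = $pdo->prepare(
        "SELECT id FROM search_results WHERE id IN ($placeholders)"
    );
    $stmt->execute(array_values($ids));
    $existing = $stmt->fetchAll(PDO::FETCH_COLUMN);
    // IDs that were requested but not returned still need an INSERT.
    return array_values(array_diff($ids, $existing));
}
```

Only the IDs returned by this helper would then go through the separate INSERT statements.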
1 Answer
I think what you're looking for is simply INSERT IGNORE. Forget about steps 1 and 2, just insert and ignore :)
If you use the IGNORE keyword, errors that occur while executing the INSERT statement are ignored. For example, without IGNORE, a row that duplicates an existing UNIQUE index or PRIMARY KEY value in the table causes a duplicate-key error and the statement is aborted. With IGNORE, the row is discarded and no error occurs. Ignored errors may generate warnings instead, although duplicate-key errors do not.
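As a minimal sketch of that behaviour from PHP, assuming a MySQL connection through PDO and a hypothetical table `search_results` with columns `id` (primary key) and `payload`, a single insert could look like the following. The function name `insertIfAbsent` is made up for this example.

```php
<?php
// Sketch only: table and column names are hypothetical.
// Returns true if the row was inserted, false if MySQL discarded it
// because a row with the same primary key already exists.
function insertIfAbsent(PDO $pdo, string $id, string $payload): bool
{
    $stmt = $pdo->prepare(
        'INSERT IGNORE INTO search_results (id, payload) VALUES (?, ?)'
    );
    $stmt->execute([$id, $payload]);
    // For a duplicate key, INSERT IGNORE reports 0 affected rows.
    return $stmt->rowCount() === 1;
}
```

No exception handling for duplicate keys is needed here, since the duplicate is dropped silently rather than raised as an error.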
- As I stated, I am using Doctrine, which does not support bulk inserts, so I use many separate ones. Of course I can run queries directly, bypassing the ORM. It's not clean, but I already do it in another place. So two questions again: is INSERT IGNORE better than a simple INSERT with catching the duplicate error? And is a bulk insert with IGNORE MUCH better than what I have now? – yefrem, Oct 22, 2014 at 14:54
- Yes, it is better: you save yourself a SELECT statement on a 12-million-row table. Bulk inserts are much better, yes. I'm not sure we're talking about the same thing, though. If you mean LOAD DATA INFILE, it too lets you specify IGNORE. If you mean specifying multiple rows in one INSERT statement, yes, that's better. Every INSERT statement pays a fixed overhead (network round trip, parsing and, with autocommit, a commit), and the table's indexes are updated as part of each statement. If you insert multiple rows in one statement, that fixed overhead is paid once instead of once per row. That's significant for good performance. – tombom, Oct 22, 2014 at 15:04
- I mean an insert with multiple rows. Thanks for your explanation; it looks like I should run direct queries, bypassing the ORM again, even though it's dirty. – yefrem, Oct 22, 2014 at 15:12
- You were right: my SELECT for step 2 took about 35 ms, and my new INSERT IGNORE takes about 2-3 ms. – yefrem, Oct 24, 2014 at 8:17
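The multi-row INSERT IGNORE discussed in the comments might be built from PHP along these lines. This is a sketch under assumptions, not the asker's actual code: the table `search_results`, its columns `id` and `payload`, and the helper `buildInsertIgnore` are all hypothetical.

```php
<?php
// Sketch only: builds one multi-row INSERT IGNORE statement plus its
// flat parameter list. $rows is a list of [id, payload] pairs.
// Table and column names are hypothetical.
function buildInsertIgnore(array $rows): array
{
    if ($rows === []) {
        throw new InvalidArgumentException('At least one row is required.');
    }
    // One "(?, ?)" tuple per row, e.g. "(?, ?), (?, ?)" for two rows.
    $tuples = implode(', ', array_fill(0, count($rows), '(?, ?)'));
    $sql = "INSERT IGNORE INTO search_results (id, payload) VALUES $tuples";

    // Flatten the rows in the same order as the placeholders.
    $params = [];
    foreach ($rows as $row) {
        $params[] = $row[0];
        $params[] = $row[1];
    }
    return [$sql, $params];
}

[$sql, $params] = buildInsertIgnore([
    [md5('a'), 'first result'],
    [md5('b'), 'second result'],
]);
// With plain PDO: $pdo->prepare($sql)->execute($params);
// With Doctrine, the statement can bypass the ORM through the DBAL
// connection, e.g. $entityManager->getConnection()->executeStatement($sql, $params);
// (older DBAL releases used executeUpdate() for this instead).
```

All 50 rows then travel in a single statement, and rows whose key already exists are skipped without an error.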