I need to sync a large table (~500 million rows) without a primary key between SQL Server and MySQL. The table has only a clustered, non-unique composite index.
I do have an ODBC connection between the servers, but an import of ~8 million rows took around 45 minutes, so I believe a single larger import would be unreasonable, as interruptions may occur at any point. I can't change the existing table structure, but I can add other tables. After further reading, OFFSET/FETCH is not an option for large tables, and "SELECT ... WHERE x BETWEEN ... AND ..." is not an option since I don't have a unique key.
How can I export the table in batches that are guaranteed to contain all rows? My problem is that since the clustered key is not unique, ordering by it would not guarantee that the rows keep the same order between consecutive queries, and ordering by all columns would take too long. And how would you recommend migrating the batches, through ODBC or CSV files?
- Will this be a repeating (usual) operation or a one-time operation? – Bogdan Bogdanov, Jan 29, 2016 at 12:45
- The initial export will be a one-time operation; syncing changes like new records or updates should be repetitive. CDC is not an option, but I will investigate further after the initial migration. – no one, Jan 29, 2016 at 13:02
- I think that to receive help on this you have to explain the whole process in more detail (it looks like you have a very complex problem). – Bogdan Bogdanov, Jan 29, 2016 at 13:42
- You note "since the clustered key is not unique, ordering after it would not guarantee the physical rows have the same order between consecutive queries". Since row order is not preserved (unless you have some sequence data), you cannot rely on getting the same physical row order. The order of rows does not default to insertion order or index order; it is defined by the ORDER BY clause. – RLF, Jan 29, 2016 at 13:56
- Yes, RLF, I agree. The columns are all ints: A, B, C, D, E. The clustered key is on ABC. The combination ABC is not unique, and neither is ABCD. Would ORDER BY on non-unique column(s) allow me to export the entire table in batches? And Bogdan Bogdanov, the Stack platform discourages overly broad problems; it's better just to address the question: how to export the complete large table as fast as possible, in batches, without loss of rows? – no one, Jan 29, 2016 at 15:47
1 Answer
Assuming you don't have updates or deletes against the source table, you can try the following:
1. Make a copy of the existing table using CTAS-style syntax (for SQL Server it's SELECT * INTO source_table_copy FROM source_table). Such an operation is very fast even for huge tables.
2. Add an AFTER INSERT trigger on source_table that copies new records to source_table_copy (a sketch follows this list).
3. Now all new records in source_table go to source_table_copy as well, and you can move data from the copied table to MySQL in batches.
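A minimal sketch of the step 2 trigger, assuming the column layout described in the comments (int columns A, B, C, D, E) and the source_table / source_table_copy names used above; the trigger name is just illustrative:

CREATE TRIGGER trg_copy_new_rows
ON source_table
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- copy every newly inserted row into the staging copy
    INSERT INTO source_table_copy (A, B, C, D, E)
    SELECT A, B, C, D, E
    FROM inserted;
END;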
For instance, if you have a link between the two servers, everything can be done within the body of a T-SQL stored procedure. E.g., a piece of code that moves up to 20 records to the new server might look like:
-- declare a table variable to keep deleted records until they are delivered to the target host
-- (column list matches the int columns A..E described in the comments)
DECLARE @Table_Var TABLE (A int, B int, C int, D int, E int);

BEGIN TRANSACTION;
DELETE TOP (20) FROM source_table_copy OUTPUT DELETED.* INTO @Table_Var;
-- insert data into the linked server, or into a csv file
COMMIT;
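One possible way to fill in that commented-out step, placed inside the transaction before the COMMIT, is an INSERT through OPENQUERY; the linked server name MYSQL_LINK and the remote table target_db.target_table are placeholders for however the MySQL ODBC linked server is actually configured:

-- push the batch held in @Table_Var to the MySQL side via the linked server
-- (MYSQL_LINK and target_db.target_table are hypothetical names)
INSERT INTO OPENQUERY(MYSQL_LINK, 'SELECT A, B, C, D, E FROM target_db.target_table')
SELECT A, B, C, D, E
FROM @Table_Var;

Keeping this inside the transaction means a failed push rolls back the DELETE, so the batch stays in source_table_copy and can be retried.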
It's also possible to use a CURSOR to read data and then delete with the WHERE CURRENT OF clause.
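A rough sketch of that cursor variant, under the same assumed column names (and assuming a default updatable cursor, which WHERE CURRENT OF requires):

DECLARE @A int, @B int, @C int, @D int, @E int;

DECLARE batch_cur CURSOR LOCAL FORWARD_ONLY FOR
SELECT A, B, C, D, E FROM source_table_copy;

OPEN batch_cur;
FETCH NEXT FROM batch_cur INTO @A, @B, @C, @D, @E;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- send the current row to the linked server or append it to a csv file here
    DELETE FROM source_table_copy WHERE CURRENT OF batch_cur;
    FETCH NEXT FROM batch_cur INTO @A, @B, @C, @D, @E;
END;

CLOSE batch_cur;
DEALLOCATE batch_cur;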
Ideally you need to prevent applications from inserting data into source_table during step 1. If that's absolutely impossible, I'd go with an AFTER INSERT trigger, added right before step 1 and removed right after it's done, which copies data to some other table that I can later merge with source_table_copy.
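That temporary trigger would have the same shape as the one sketched above, just writing to a separate holding table; the table and trigger names below are hypothetical:

-- holding table for rows inserted while the SELECT ... INTO copy is running
CREATE TABLE source_table_inserts_during_copy (A int, B int, C int, D int, E int);
GO

CREATE TRIGGER trg_capture_during_copy
ON source_table
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO source_table_inserts_during_copy (A, B, C, D, E)
    SELECT A, B, C, D, E FROM inserted;
END;
GO

-- once step 1 has finished: drop the temporary trigger and fold the captured
-- rows into the copy (a plain INSERT ... SELECT, since there is no unique key to merge on)
DROP TRIGGER trg_capture_during_copy;
INSERT INTO source_table_copy (A, B, C, D, E)
SELECT A, B, C, D, E FROM source_table_inserts_during_copy;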
- Thank you for the solution; I was trying something similar, but with a normal insert. I'll try the CTAS syntax to see if it speeds things up. A follow-up question, if you don't mind: would the AFTER INSERT trigger affect performance? – no one, Jan 30, 2016 at 12:25
- Since the trigger body is very simple (it just inserts data into another table), the performance impact will be minimal. – a1ex07, Jan 30, 2016 at 22:00