Best practice/Patterns for two way data synchronisation

Question 1

Quite often in my work the idea of 2-way data synchronisation between database systems crops up. The classic example is two slightly different CRM systems (say, Raiser's Edge and Salesforce) and the need to have a two-way sync of Contact data between them.

API considerations aside, assuming you have a shared key to sync on, and purely thinking of the algorithm/pattern to be employed, this is a task that is often underestimated by non-techies.

For example, you have to watch out for:

Can you easily detect which records have changed in both systems (or will you have to compare all records between the systems to detect changes)
If you're going for a once-every-N-hours sync, how to deal with conflicts where the same record changes at more-or-less the same time in both systems
If you're going for a real-time sync (ie an update in one system immediately triggers an update to the other system) how to handle divergence over time due to bugs or system crashes

Personally I can think of ways to tackle all this but I'm wondering if there are any well known patterns, literature or best practices that I could refer to.

Question 2

what you describe sounds pretty close to Federated database system - is that correct?

Question 3

@gnat: Thanks for the link, some of the concerns are similar (eg dealing with heterogeneity), but I'm talking about syncing a subset of data from two autonomous databases whereas that seems to be more about creating a fully integrated view of everything across multiple dbs.

Question 4

7 years later, 50 upvotes but only 1 decent answer. There must be some syncronisation patterns or best practices out there?

Question 5

Seems to me that, in addition to the shared key, you need at least a last time updated stamp on each system’s records. That or you could hash the shared fields and compare hashes. It is a pretty low catch for a good Q (at least if MS’s stack cant be applied).

Question 6

Yes, a hard problem, easily underestimated. And could be a lot of work. If you are on Microsoft technologies, you may want to have a look at Microsoft Sync Framework here and here.

Question 7

Thanks, thats interesting. I'd heard of Ms Sync Framework but hadn't realised it was so generalised. It basically is a pattern for handling sync issues in general.

Question 8

Microsoft Sync Framework was replaced by Microsoft Sync Framework Toolkit.

Question 9

I am frustrated with the docs, which not that clear, specially for Non SQL-Server ADO.NET data providers, which is my case. Besides, my workplace is looking for something that does not require adding infrastructure tables/making changes in the Production environment. So I am about to discard this one.

Question 10

There are many theories about remote site DB synchronization. First start with INSERT. handling this one is easy - as you can create a unique ID for every site (for example an initial of the site name + ID (number): site_a_177 vs. site_b_53)

So insert should not create any conflicts. the problem is the update. I don't believe that there's a 100% failure proof method, but you can start an update by "locking" the record in the remote DB, and only after you got the handle - continue with the update, and finish by syncing the update and only then release the lock.

Question 11

Thanks, I think you're talking about distributed dbs with the same schema and dealing with distributed transactions. I'm thinking more of scenarios where the two DBs are completely autonomous (e.g. they assign unique ids in completely different ways and the schemas differ) but you want to sync a subset of the data in them.

Question 12

It sounds like there shouldn't be any conflicts. In that case it should be very simple - just save the "last record-id" that was synced for each table and continue from there.

codingoutloud codingoutloud 7265 silver badges9 bronze badges · Answer 1 · 2012-02-19 14:26:34Z

10

Yes, a hard problem, easily underestimated. And could be a lot of work. If you are on Microsoft technologies, you may want to have a look at Microsoft Sync Framework here and here.

Share

Improve this answer

answered Feb 19, 2012 at 14:26

codingoutloud's user avatar

codingoutloud codingoutloud

7265 silver badges9 bronze badges

3

1

Thanks, thats interesting. I'd heard of Ms Sync Framework but hadn't realised it was so generalised. It basically is a pattern for handling sync issues in general.

codeulike
– codeulike

02/20/2012 11:35:06
Commented Feb 20, 2012 at 11:35
2

Microsoft Sync Framework was replaced by Microsoft Sync Framework Toolkit.

Tomas Kubes
– Tomas Kubes

12/09/2014 20:56:28
Commented Dec 9, 2014 at 20:56
I am frustrated with the docs, which not that clear, specially for Non SQL-Server ADO.NET data providers, which is my case. Besides, my workplace is looking for something that does not require adding infrastructure tables/making changes in the Production environment. So I am about to discard this one.

Veverke
– Veverke

05/14/2017 13:24:18
Commented May 14, 2017 at 13:24

Add a comment |

Nir Alfasi Nir Alfasi 1932 silver badges8 bronze badges · Answer 2 · 2012-02-18 10:41:36Z

There are many theories about remote site DB synchronization. First start with INSERT. handling this one is easy - as you can create a unique ID for every site (for example an initial of the site name + ID (number): site_a_177 vs. site_b_53)

So insert should not create any conflicts. the problem is the update. I don't believe that there's a 100% failure proof method, but you can start an update by "locking" the record in the remote DB, and only after you got the handle - continue with the update, and finish by syncing the update and only then release the lock.

Thanks, I think you're talking about distributed dbs with the same schema and dealing with distributed transactions. I'm thinking more of scenarios where the two DBs are completely autonomous (e.g. they assign unique ids in completely different ways and the schemas differ) but you want to sync a subset of the data in them.
It sounds like there shouldn't be any conflicts. In that case it should be very simple - just save the "last record-id" that was synced for each table and continue from there.

Stack Exchange Network

Best practice/Patterns for two way data synchronisation

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Best practice/Patterns for two way data synchronisation

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions