Quite often in my work the idea of 2-way data synchronisation between database systems crops up. The classic example is two slightly different CRM systems (say, Raiser's Edge and Salesforce) and the need to have a two-way sync of Contact data between them.
API considerations aside, assuming you have a shared key to sync on, and purely thinking of the algorithm/pattern to be employed, this is a task that is often underestimated by non-techies.
For example, you have to watch out for:
- Can you easily detect which records have changed in both systems (or will you have to compare all records between the systems to detect changes)
- If you're going for a once-every-N-hours sync, how to deal with conflicts where the same record changes at more-or-less the same time in both systems
- If you're going for a real-time sync (ie an update in one system immediately triggers an update to the other system) how to handle divergence over time due to bugs or system crashes
Personally I can think of ways to tackle all this but I'm wondering if there are any well known patterns, literature or best practices that I could refer to.
-
what you describe sounds pretty close to Federated database system - is that correct?gnat– gnat02/17/2012 08:03:31Commented Feb 17, 2012 at 8:03
-
@gnat: Thanks for the link, some of the concerns are similar (eg dealing with heterogeneity), but I'm talking about syncing a subset of data from two autonomous databases whereas that seems to be more about creating a fully integrated view of everything across multiple dbs.codeulike– codeulike02/20/2012 11:48:12Commented Feb 20, 2012 at 11:48
-
57 years later, 50 upvotes but only 1 decent answer. There must be some syncronisation patterns or best practices out there?codeulike– codeulike02/14/2019 19:56:07Commented Feb 14, 2019 at 19:56
-
Seems to me that, in addition to the shared key, you need at least a last time updated stamp on each system’s records. That or you could hash the shared fields and compare hashes. It is a pretty low catch for a good Q (at least if MS’s stack cant be applied).JL Peyret– JL Peyret06/05/2023 01:43:09Commented Jun 5, 2023 at 1:43
2 Answers 2
Yes, a hard problem, easily underestimated. And could be a lot of work. If you are on Microsoft technologies, you may want to have a look at Microsoft Sync Framework here and here.
-
1Thanks, thats interesting. I'd heard of Ms Sync Framework but hadn't realised it was so generalised. It basically is a pattern for handling sync issues in general.codeulike– codeulike02/20/2012 11:35:06Commented Feb 20, 2012 at 11:35
-
2Microsoft Sync Framework was replaced by Microsoft Sync Framework Toolkit.Tomas Kubes– Tomas Kubes12/09/2014 20:56:28Commented Dec 9, 2014 at 20:56
-
I am frustrated with the docs, which not that clear, specially for Non SQL-Server ADO.NET data providers, which is my case. Besides, my workplace is looking for something that does not require adding infrastructure tables/making changes in the Production environment. So I am about to discard this one.Veverke– Veverke05/14/2017 13:24:18Commented May 14, 2017 at 13:24
There are many theories about remote site DB synchronization. First start with INSERT. handling this one is easy - as you can create a unique ID for every site (for example an initial of the site name + ID (number): site_a_177 vs. site_b_53)
So insert should not create any conflicts. the problem is the update. I don't believe that there's a 100% failure proof method, but you can start an update by "locking" the record in the remote DB, and only after you got the handle - continue with the update, and finish by syncing the update and only then release the lock.
-
1Thanks, I think you're talking about distributed dbs with the same schema and dealing with distributed transactions. I'm thinking more of scenarios where the two DBs are completely autonomous (e.g. they assign unique ids in completely different ways and the schemas differ) but you want to sync a subset of the data in them.codeulike– codeulike02/20/2012 11:51:37Commented Feb 20, 2012 at 11:51
-
It sounds like there shouldn't be any conflicts. In that case it should be very simple - just save the "last record-id" that was synced for each table and continue from there.Nir Alfasi– Nir Alfasi02/20/2012 16:19:52Commented Feb 20, 2012 at 16:19
Explore related questions
See similar questions with these tags.