SQL Server: Database integration and synchronization

Question 1

I have n SQL Server databases with the same data schema.

On the other hand i have a web application that needs to retrieve, for some entities, the union of the information stored into the n databases.

A possible solution is to apply the ETL operations (Extract, transform and load) using SSIS (SQL Server Integration Services) and BIDS (Business Intelligence Development Studio).

With the application of the ETL operations I obtain a database (warehouse) that can be used from my web application. See the picture below for a basic sketch.

schema1

The problem is that the web application not only access the warehouse for read, but also for write and update information.

If an entity is updating into the warehouse, the same update need to be applied into the original source.

NB. the information into the source and warehouse can be updating frequently.

Can replication be a good approach to sync the update into warehouse to the right source or there is another tools that I can use?

Maybe there is another strategy instead of ETL and warehouse that I can consider?

Question 2

If your update/insert always happens on a single one of the n databases AND you read a low number of rows on each request, you can use Linked Servers and Distributed Partitioned Views (DPV) to create the illusion of a single database: http://technet.microsoft.com/en-us/library/ms188299(v=sql.105).aspx. This can be done without introducing the complexity of replication.

DPV have several limitations and may give you poor performance if you are not careful. Without knowing more details of the queries you want to run - I can only advise that you try it and see if solves your problem.

Question 3

One option would be to replace your n identical SQL Server databases with writable views against your data warehouse.

In this way, you would keep all of your data in one place, but you could use views to filter the data such that each subset is viewable as a distinct entity. Since you really only have one copy of the data, it won't matter whether the insert/update/delete is done to the filtered view or to the unfiltered table. Both sides will see it at the same time and there will never be a synchronization problem.

Note that in order for a view to be writable, certain conditions must be enforced. See here.

Question 4

Sorry, but i don't understand very well what you say. Each Database needs to have updatable views in it? My web application needs to access in read/write to those views?

Question 5

My suggestion is that you create views in the main database instead of using tables in multiple databases. This would allow many users to have distinct subsets of your master data without having two data sources to get out of sync.

Question 6

so, if i have 2 database, for example A and B, i can create a View in B linked to a table in A (by linked server?) and i can read and write over that View? One of the pre-requisites of the writeble view isn't that needs to stay in the same database (A in this case)?

Question 7

@ced - I'm suggesting you do away with database B and just keep one database (A - your data warehouse).

Question 8

As far as I know, the update of the Warehouse data is not a good practice. But, if you really need to perform this action, you can make the following trick:

Replication is a good idea in case when you can consent a delay between data changing and receiving the changed data in "select" query.
The updates made by your application must be performed against the needed Data Source and, by replication mechanism, will be transferred to the Warehouse Data. This can be made by using mediator, main task of which will be the needed Data Source for change making definition.
Transactional replication need to be used from the Data Sources to the Warehouse Data, which will synchronize you data in the Warehouse Data database.

In this case you can avoid the lock during the Warehouse Data updating.

Question 9

The warehouse is only a possibile strategy. Any other possibile approaches is accepted!

Question 10

Warehouse data is the name of your database. The main idea was to separate input and output threads. All input (changes) need to be followed to the Data Source database and the output need to be taken from the Warehouse Data. Transfer from the Data Source databases to the Warehouse Data will be made by replication mechanism.

Question 11

Another option would be to leverage Change Data Capture to look for specific writes to the Data Warehouse that need to be written back to your source DB's. However, if you are using CDC on your source DB's as well, this can be tricky since each update will essentially be round tripped.

This approach would allow you to use SSIS as the method to do your updates both directions.

Thomas Kejser Thomas Kejser 6,2182 gold badges24 silver badges46 bronze badges · Answer 1 · 2013-12-28 15:06:57Z

If your update/insert always happens on a single one of the n databases AND you read a low number of rows on each request, you can use Linked Servers and Distributed Partitioned Views (DPV) to create the illusion of a single database: http://technet.microsoft.com/en-us/library/ms188299(v=sql.105).aspx. This can be done without introducing the complexity of replication.

DPV have several limitations and may give you poor performance if you are not careful. Without knowing more details of the queries you want to run - I can only advise that you try it and see if solves your problem.

Joel Brown Joel Brown 12.6k2 gold badges33 silver badges46 bronze badges · Answer 2 · 2013-10-26 12:50:35Z

0

One option would be to replace your n identical SQL Server databases with writable views against your data warehouse.

In this way, you would keep all of your data in one place, but you could use views to filter the data such that each subset is viewable as a distinct entity. Since you really only have one copy of the data, it won't matter whether the insert/update/delete is done to the filtered view or to the unfiltered table. Both sides will see it at the same time and there will never be a synchronization problem.

Note that in order for a view to be writable, certain conditions must be enforced. See here.

Share

Improve this answer

answered Oct 26, 2013 at 12:50

Joel Brown's user avatar

Joel Brown Joel Brown

12.6k2 gold badges33 silver badges46 bronze badges

4

Sorry, but i don't understand very well what you say. Each Database needs to have updatable views in it? My web application needs to access in read/write to those views?

ced
– ced

2013年10月28日 13:05:54 +00:00
Commented Oct 28, 2013 at 13:05
My suggestion is that you create views in the main database instead of using tables in multiple databases. This would allow many users to have distinct subsets of your master data without having two data sources to get out of sync.

Joel Brown
– Joel Brown

2013年10月28日 22:30:25 +00:00
Commented Oct 28, 2013 at 22:30
so, if i have 2 database, for example A and B, i can create a View in B linked to a table in A (by linked server?) and i can read and write over that View? One of the pre-requisites of the writeble view isn't that needs to stay in the same database (A in this case)?

ced
– ced

2013年10月29日 08:28:56 +00:00
Commented Oct 29, 2013 at 8:28
@ced - I'm suggesting you do away with database B and just keep one database (A - your data warehouse).

Joel Brown
– Joel Brown

2013年10月29日 10:47:27 +00:00
Commented Oct 29, 2013 at 10:47

Add a comment |

Sandr Sandr 2761 gold badge2 silver badges5 bronze badges · Answer 3 · 2013-10-27 15:08:01Z

As far as I know, the update of the Warehouse data is not a good practice. But, if you really need to perform this action, you can make the following trick:

Replication is a good idea in case when you can consent a delay between data changing and receiving the changed data in "select" query.
The updates made by your application must be performed against the needed Data Source and, by replication mechanism, will be transferred to the Warehouse Data. This can be made by using mediator, main task of which will be the needed Data Source for change making definition.
Transactional replication need to be used from the Data Sources to the Warehouse Data, which will synchronize you data in the Warehouse Data database.

In this case you can avoid the lock during the Warehouse Data updating.

The warehouse is only a possibile strategy. Any other possibile approaches is accepted!
Warehouse data is the name of your database. The main idea was to separate input and output threads. All input (changes) need to be followed to the Data Source database and the output need to be taken from the Warehouse Data. Transfer from the Data Source databases to the Warehouse Data will be made by replication mechanism.

njkroes njkroes 6554 gold badges7 silver badges15 bronze badges · Answer 4 · 2013-11-01 19:41:50Z

Another option would be to leverage Change Data Capture to look for specific writes to the Data Warehouse that need to be written back to your source DB's. However, if you are using CDC on your source DB's as well, this can be tricky since each update will essentially be round tripped.

This approach would allow you to use SSIS as the method to do your updates both directions.

Stack Exchange Network

SQL Server: Database integration and synchronization

4 Answers 4

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

SQL Server: Database integration and synchronization

4 Answers 4

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions