Knowledge is no fun, unless you share it!!! :)

Tuesday, 20 August 2013

SQLSERVER Replication Implementation

There are various types of replication:

Transactional Replication.
Transactional Replication with Updatable Subscriptions
Snapshot Replication
Merge replication

There are a lot of people who subscribe for magazines like readers digest or India today or get organizational news update every month in their mail box. Information about the latest organizational/national development is communicated over these email/magazines. The information is directly dropped in mail box or at the door step. We simply need to subscriber for the desired magazine or mailer. Similarly the replication feature in MS SQL Server moves the data from a remote server to our local server boxes via publications and subscriptions mechanism. There are various reasons and scenario where replications can be considered a very strong tool for data relay. We could consider replication for,

Getting the data closer to the user, consider a server stationed in Germany and since the business also operates at Bangalore in India we need the data quite frequently. Now every time we need the data from the Products or Sales table we need to use a linked server, connect to the German Server and pull the data. This will have impacts like:
We need to rely heavily on the network connectivity each time we pull the data.
Secondly the source server will have to bear the load of data reads.
Also these ad-hoc queries might create conflicts if there is any exclusive lock on a record on account of any transaction taking place on the source data. Further to this, the ad-hoc pull queries will run for a really long time in this case eating up network bandwidth and also causing unnecessary load on the source.
And if the data is to be pulled on a regular basis this ad-hoc queries seem really out of place option.
Consider replication for removing impacts of heavy read intensive operations like report generation etc. Replication is a very good option when the desired data is simply read only and the update to the source is not intended.
Consider replication when server pulling the data intends to own the pulled data i.e. make changes to the pulled version without impacting the source. Replication provides the desired autonomy to the subscriber.

Replication traditionally takes the Publisher/Subscriber analogy. It’s quite similar to the magazine example. For any magazine there is a publisher who publishes information in the form of articles. Once the magazine (which is collection of article and is called publication) is published there needs to be a distributor who will distribute it to people like you and me who are actually the subscribers. This forms the standard of the entire Publisher/Subscriber cycle. But there could be changes in the setup like there is a publisher who also acts as distributor or there could be a distributor who is also a subscriber. The key terms are:

Article: The article is the information that is going to be replicated. It could be a table, a procedure or a filtered table etc.

Publisher: The publisher is the database on the source server which is actually replicating the data. The publication which is collection of articles (various objects in the database) is published by the publisher.

Distributor: The distributor can be considered as the delivery boy who brings the publications to the subscriber. The distributor could himself be a publisher or a subscriber.

Subscriber: Subscriber is the end receiver of the publication who gets the data in the form of subscriptions. The changes published are propagated to all the subscribers of the publications through the distributor. The subscriber simply has to create a subscription on the publication from its end to receive the data.


Transactional Replication:

In the transactional replication the transactions occurring on the published articles from the publisher are forwarded on to the distributor who in turn replicates the same and commits them on the subscribers. Subscribers can use this data for read only purposes. As transactions are small items to publish the latency for transactional replication is very low. As far as the autonomy is concerned as the data is read only type each of the subscribers cannot update the data and hence there is absolutely no autonomy in this type of replication.

Suppose there is a ticket booking web site, all the tickets booked are centrally stored in the database hosted at New Delhi. There are distribution centers in every city in the country where the bookings are received and the booked ticket shipped at the addresses provided. All the tickets booked from Hyderabad needs to be shipped to the respective customers. The Hyderabad distribution center could setup a filtered (get bookings for Hyderabad only) transactional replication so that every new booking (transaction) is replicated to their center with minimal delay (almost immediately). They need a read only access to the replicated data so transactional replication fits the bill. They could dispatch the booked ticket ASAP with transactional replication setup.

Key facts of transactional replication:

As replication happens on a transaction, the latency of replication is very low.
The subscription is read only, hence there almost no autonomy for the subscribers.

Transactional Replication with Updatable Subscriptions:

This is similar to the transactional replication with the additional capability for the subscribers to be able to update the published articles. This way there is a gain in the autonomy as the subscribers can update the existing data, the latency is maintained at the same level. In most of the scenarios this considered to be the best solution. This setup uses the two-phase commit process to keep the Publisher/Subscriber in sync. The two phase commit process uses the MSDTC so that the changes are made simultaneously at the publisher-subscribers end. So before setting up this kind of replication it’s important to have the MSTDC up and running.

Snapshot Replication:
Snapshot replications as the distribution method moves the entire copy of published articles through the distributor to the subscriber. This type of replication method provides the subscribers with a very high autonomy. That means that they can update or changes their local copy of subscribed data without causing any changes on the publishers data. The next time the data gets syncronized the entire snapshot gets overwritten. Latency for such configuration setup is also high because as the entire publication gets syncronized. This kind of replications mainly finds use in OLAP servers. The data for OLAP might be pulled every week or fortnight and would be readonly for reporting and analytical processing purposes.

Merge Replication:
Merge replication allows each of the subscribers to edit their piece of subscriptions independently and at some point these changes are merged together and synced amongst all the subscribers and publisher on the whole. This seems quite complex isn’t it? It is, but with wise configuration, this setup could be a real asset as this solution provides the highest level of autonomy to each of the subscribers. This setup requires a very wise conflict resolution. Suppose a record is updated with value 10 by a subsA and the same record gets updated to 5 by subsB and also the publisher updates it to 15 now changes from both the subscribers will be merged followed by the publisher’s change. In this case the publisher’s change wins. For changes involving only the subscribers we need to have a conflict resolution in place, example it has been configured that for only subscriber’s changes SubsA wins.