Yesterday we hit the dreaded conflict in our production/DR SQL peer-to-peer replication setup. My first reaction is usually “oh <expletive>, we’re going to need to rebuild replication”, but we happened to have the best possible scenario where we had an update-update conflict.
I’d like to note that I personally hate working with SQL Server peer-to-peer replication (we’re running SQL 2008 R2 version), and it looks like some of the limitations of the solution are still an issue in SQL 2012.
- We have to quiesce (i.e. stop the application) in order to add new articles (tables, views etc), to the replication topology. Big pain + off hours work.
- Plus, unlike with MySQL replication when setup in multi-master, I can’t just not log my correction statements so they don’t replicate back.
- This is why I’m excited for the Always On Availability groups in SQL 2012/2014, with a multi-subnet setup. I’m still evaluating our environment options but this helps address some of the limitations in our peer-to-peer replication environment.
- 2 node setup with peer-to-peer replication, across two different domains. (I’m planning on documenting how this was setup in the future, testing with 2012)
- Both servers running SQL 2008 R2 Enterprise & Windows 2008 R2 Enterprise
Microsoft has the tasks outlined for reference: Conflict Detection in Peer-to-Peer Replication
Note that the recommended approach is to reinitialize the slave – this requires downtime of the application in order to get a quiesced system.
In our case, we knew our conflict was most likely due to bringing up the same data in both sites as part of our DR setup work. The application does an update of the data, which caused this conflict to occur. This requires knowledge of the data to determine if the conflicts can be accepted.
|1||Notification received||NOTE: If using Peer-to-peer replication, make sure that alerts have been enabled.See How to: Configure Predefined Replication Alerts (SQL Server Management Studio) for details.
Database Mail will need to be configured.For example, this was the alert we were receiving, similar to both Master & Slave:
DESCRIPTION:A conflict of type 'Update-Update' was detected at peer 1 between peer 2 (incoming), transaction id 0x000000000ce6b386 and peer 1 (on disk), transaction id 0x000000000dcad30d
|2||Allow peer to peer continue on conflict||Now that we have been notified of this conflict, on my DR site I wanted to continue replication on conflict so I could identify my errors.p2p_continue_onconflict needs to be set to true
|3||Restart the Distribution Agent||
|4||View the conflicts||See View Conflicts for details – How to: Validate Data at the Subscriber (SQL Server Management Studio)
|5||Disable continue on conflict detection check||p2p_continue_onconflict needs to be set to FALSE, so we will be notified of future conflicts.
DECLARE @publication AS sysname
|6||Repeat on Primary node||If Resolved conflict was found (step 4 part 4), then repeat steps 2-5 on the Primary node|
|7||Run Validation Check||Run the validation check to see if there are any differences between the Primary & Secondary nodes
This can be run through the Replication Monitor for the publication, or as T-SQL:
— Validate Publication
Once finished (depends on the time of your system, this seems to run in background – this is about 10 minutes on my 90 GB system), you can select to look for differences. Note that if you have timestamp datatypes, you will not get a consistent checksum on your validation checks.
select * from msdb..sysreplicationalerts