FIX:Replicated transactions are waiting for next Log backup or for mirroring partner to catch up   2 comments


Recently I have come across one challenging issue and thought of sharing with you all.

We have a SQL Server instance with replication setup. This replication setup is little different than normal. Customer is using script based replication setup wherein for each table they have separate publisher and subscriber. There were around 100 such publishers.

The issue started when log reader agent at subscriber failed with the error message below:

"Replicated transactions are waiting for next Log backup or for mirroring partner to catch up."

Now this is a very common error message and as it says – either the TLOG backups need to take place else some latency with db mirroring setup. We had no mirroring setup and TLOG backups were taking place fine as per the schedule yet the error message.

So, the troubleshooting starts now. The first thing we did was to research about the possible causes/resolution for this error on internet. Search engines pointed towards the two reasons/resolutions as below:

  1. EXEC sp_replicationdboption ‘PublisherDB/DistributerDB’,’sync with backup’,false

If it is set on publisher means, all transactions must be backed up before being delivered to the distribution database.

If it is set on distributer means, the transaction log for the publication database is not truncated until transactions that have been propagated to the Distributor have been backed up.

  1. Enable trace flag 1448

This trace flag allows the Log Reader Agent to continue replicating changes regardless of the mirroring state.

Apparently none of the above was applicable in our case.

Now what’s next?

Now, this is the time to connect all the dots and figure out what actually happened and why did it break at first place so that we can get some more clue.

We started checking ERRROLOG on the servers found that following is the sequence of events which led to the error noted above:

ERRORLOG at PUBLISHER instance

1- Database was restored on publisher while transaction replication was running on this database.

2014-09-29 04:27:30.12 Backup      Restore is complete on database 'PROBLEM_PUB'.  The database is now available.
2014-09-29 04:27:30.14 Backup      Database was restored: Database: PROBLEM_PUB, creation date(time): 2013/08/24(01:29:26), first 
LSN: 43420:1203835:1, last LSN: 43423:5501:1, number of dump devices: 1, device information:(FILE=1, TYPE=DISK: 
{'F:\BACKUP\PROBLEM_PUB_28Sep2014_01.bak'}). Informational message.No user action required.

 

2- After couple of hours it started throwing LSN mismatch error – which is expected, because there is an LSN mismatch between distribution db metadata and publisher db TLOG file after restore

2014-09-29 06:00:56.79 spid57      The log scan number (44453:245054:1) passed to log scan in database 'PROBLEM_PUB' is not valid. 
This error may indicate data corruption or that the log file (.ldf)does not match the data file (.mdf). 
If this error occurred during replication, re-create the publication. Otherwise, restore from backup if the problem results in a failure 
during startup.

3- Database was again backed up at publisher – This is to reconfigure the replication at subscriber.

2014-09-29 17:50:22.20 Backup      Database backed up. Database: PROBLEM_PUB, creation date(time): 2013/08/24(01:29:26), 
pages dumped: 25463599, first LSN: 43423:52007:1, last LSN: 43427:1064843:1, number of dump devices: 1,
 device information: (FILE=1, TYPE=DISK: {'F:\BACKUP\\PROBLEM_PUB_29Sep2014_01.bak'}). 
This is an informational message only. No user action is required. 

ERRORLOG on SUBSCRIBER database

4- Database was restored on Subscriber

2014-09-29 18:22:38.98 Backup      Restore is complete on database 'PROBLEM_SUB'.  The database is now available.
2014-09-29 18:22:38.99 Backup      Database was restored: Database: PROBLEM_SUB, creation date(time): 2013/08/24(01:29:26), 
first LSN: 43423:52007:1, last LSN: 43427:1064843:1, number of dump devices: 1, device information: 
(FILE=1, TYPE=DISK: {'F:\BACKUP\PROBLEM_PUB_29Sep2014_01.bak'}). Informational message. No user action required.

5- Replication was again configured and started throwing messages “Replicated transactions are waiting for next Log backup or for mirroring partner to catch up”

Now to troubleshoot further we enabled verbose log on logreader agent.

2014-09-30 03:10:18.601 Publisher: {call sp_replcmds (500, 0, 0, , 0, 500000)}
2014-09-30 03:10:18.603 OLE DB DISTOLE 'PROBELMINSTANCE\PROBLEM_PUB': sp_MSget_last_transaction @publisher_id = 2
, @publisher_db = N'PROBLEM_PUB', @for_truncate = 0x1
2014-09-30 03:10:18.604 Publisher: exec sp_replcounters N'PROBLEM_PUB'
2014-09-30 03:10:18.605 OLE DB Publisher 'PROBELMINSTANCE\PROBLEM_SUB': exec sp_replcounters N'PROBLEM_PUB'

2014-09-30 03:10:18.606 Status: 16384, code: 22522, text: 'Replicated transactions are waiting for next Log 
backup or for mirroring partner to catch up.'.

 

From the verbose log we can see logreader agent is executing 2 main stored procedures

We tried to execute both the stored procedures manually. sp_replcmds returns no result but replcounts returns 23059 commands to be replicated however replbegnlsn and replnextlsn value is 0

repl

As we can see replcmds is not showing any commands to replicate but sp_replcounteds shows commands and replbegnlsn and replnextlsn shows 0 which clearly indicate there is LSN mismatch in ldf files. Also we ran following query on publisher database to see if there are any transactions marked as “replicate” and it also showed some results.


select top 10 [transaction id], * from ::fn_dblog(null, null)

where [description] = 'REPLICATE'

order by [transaction id]

This confirms that there are stale entries in T-Log of the Published DB from previous replication drop and creation.

We contacted Microsoft as well to understand the behaviour, also to find out on what condition this error is thrown.

As per MS, when the sp_replcmds doesn’t return any result, but sp_replcounters returned the non-zero number in the “replicated transaction” column, the log reader agent code prints the error “Replicated transactions are waiting for next Log backup or for mirroring partner to catch up” directly.

We have below 2 options available to resolve this issue however there was no surety that both will resolve the issue, since no other option available we decided to try both one by one.

Option 1: run sp_repldone

Command:

SP_REPLDONE NULL,NULL,0,0,1 

Bypass all old transactions from the publication database and let the log reader agent start to the next new transaction.

Option 2: Create a new ldf file which will remove all the transaction and a fresh new empty file will clear all the LSN.

Command:

ALTER DATABASE PROBLEM_PUB REBUILD LOG ON (NAME=PUB_log, FILENAME='D:\PUB_log.ldf') 

We tried option 1 first, ran the command and then remove the replication and recreated a dummy publication and tried again but it did not fix the issue we still have the same error. This confirmed that TLOG is the issue and rebuild log is the final resort.

We then tried Option 2 and rebuild the log file. After rebuilding the log file we reconfigure the replication and replication started working fine.

Hope this will help you in troubleshooting such issues.

2 responses to “FIX:Replicated transactions are waiting for next Log backup or for mirroring partner to catch up

Subscribe to comments with RSS.

  1. Hey Manish,

    Nice article and very well explained. I guess you meant the log reader failed at the publisher at the beginning of the blog. Rebuilding the log sometimes creates logical consistency errors unless all the log is clear without any open transactions and all dirty pages are flushed. I would always hope there would be a better solution. But restoring on publisher database without disabling the replication is where is the issue should be mitigated. 🙂

    Regards,
    Manu

  2. Thanks Manu,

    This is a pull subscription that’s why I mention at subscriber. Yes, rebuilding log is not advisable but this is a special scenario which has caused issue and MS also recommended to rebuild the log as no choice…and yes..rebuild should be done with precautions 🙂

Leave a Comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: