SQL Litespeed Backup hung with wait type PREEMPTIVE_OS_REPORTEVENT


Today I came across another issue that was happening frequently in a customer's environment. Here is some brief background.

The DBAs complained that Litespeed backups were frequently going into a hung state and that they had to restart SQL Server to clear the condition. Since the problem kept recurring on critical servers, we decided to take a close look at what was happening and why the issue had recently become so frequent.

When we started troubleshooting, the first question that came to mind was: how did the DBAs conclude that the backup was hung? The customer hosts a critical application on this server and takes regular log backups there. They suddenly noticed blocking on the server and also received an alert for a TLOG backup failure.

Below is the summary of our investigation and troubleshooting approach to resolve this issue.

We ran the query select * from sysprocesses where cmd like '%backup%'

The output showed that the backup was being blocked by SPID 61.

We ran the below command to see why it was blocked and what the head blocker was doing:

select * from sysprocesses where spid =61

spid   kpid   waittype   waittime     lastwaittype                status      program_name    cmd
-----  -----  --------   ----------   -------------------------   ---------   -------------   ---------------
61     3192   0x0088     1610326653   BACKUPTHREAD                suspended   SQL Litespeed   KILLED/ROLLBACK
61     7340   0x0000     0            PREEMPTIVE_OS_REPORTEVENT   runnable    SQL Litespeed   KILLED/ROLLBACK

(2 row(s) affected)
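As an aside, the same check can be written against the newer DMVs (SQL Server 2005 and later) rather than the legacy sysprocesses view; a minimal sketch, assuming VIEW SERVER STATE permission:

```sql
-- Find backup requests and see who, if anyone, is blocking them.
SELECT r.session_id,
       r.blocking_session_id,   -- 0 means the request is not blocked
       r.command,               -- e.g. BACKUP DATABASE
       r.status,
       r.wait_type,
       r.wait_time              -- milliseconds spent in the current wait
FROM   sys.dm_exec_requests AS r
WHERE  r.command LIKE '%BACKUP%';
```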

Here is the input buffer for SPID 61.

DBCC INPUTBUFFER(61)

BACKUP database [Backuphung] TO VIRTUAL_DEVICE='VDI_201AFBBD-4C

56-4A1A-92BF-72BC88DD4C5A_0', VIRTUAL_DEVICE='VDI_201AFBBD-4C56-4A1A-92B
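Note that DBCC INPUTBUFFER truncates long statements, which is why the VIRTUAL_DEVICE list above is cut off. As a sketch of an alternative (assuming SQL Server 2005 or later), the full batch text can be pulled from the DMVs instead:

```sql
-- Retrieve the complete batch text for session 61, without the
-- truncation imposed by DBCC INPUTBUFFER.
SELECT t.text
FROM   sys.dm_exec_connections AS c
CROSS APPLY sys.dm_exec_sql_text(c.most_recent_sql_handle) AS t
WHERE  c.session_id = 61;
```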

Have a look at the wait time in the sysprocesses output; the session had been in the KILLED/ROLLBACK state for a long time.
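One quick way to tell whether a session stuck in KILLED/ROLLBACK is making any progress is KILL ... WITH STATUSONLY, which only reports a rollback estimate and does not attempt a new kill:

```sql
-- Report rollback progress for the already-killed session 61.
-- In a genuine hang, the estimated completion percentage will not
-- change between successive runs of this command.
KILL 61 WITH STATUSONLY;
```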

So we went through the ERRORLOG from the time this query started executing, to see if we could find any error messages or other useful information.

In the ERRORLOG we saw stack dumps being generated for the same command shown in the input buffer output.

***Stack Dump being sent to c:\MSSQL10.MYINSTANCE\MSSQL\LOG\SQLDump0001.txt
* *******************************************************************************
*
* BEGIN STACK DUMP:
*   12/23/12 02:01:16 spid
*
* Location:  bckvd.cpp:
* Expression:  req.m_Command.bytesTransferred == sizeOfThisChunk || req.m_Command.completionCode != 0
* SPID:   62
* Process ID:  7214
*
* Input Buffer 510 bytes -
*  BACKUP database [Backuphung] TO VIRTUAL_DEVICE='VDI_201AFBBD-4C
*  56-4A1A-92BF-72BC88DD4C5A_0', VIRTUAL_DEVICE='VDI_201AFBBD-4C56-4A1A-92B

***Stack Dump being sent to c:\MSSQL10.MYINSTANCE\MSSQL\LOG\SQLDump0002.txt
* *******************************************************************************
*
* BEGIN STACK DUMP:
*   01/12/11 02:01:21 spid
*
* Location:  bckfio.cpp:1351
* Expression:  0
* SPID:   62
* Process ID:  7214

SQL Server Assertion: File: <bckfio.cpp>, line=1351 Failed Assertion = '0'. This error may be timing-related. If the error persists after rerunning the statement, use DBCC CHECKDB to check the database for structural integrity, or restart the server to ensure in-memory data structures are not corrupted.

Before looking into the dumps, we decided to take a close look at the ERRORLOG once again.

We saw the errors below as well.

2012-12-23 02:01:56.76 spid61      Error: 17066, Severity: 16, State: 1.

2012-12-23 02:01:56.76 spid61      SQL Server Assertion: File: <bckvd.cpp>, line=2046 Failed Assertion = ‘req.m_Command.bytesTransferred == sizeOfThisChunk || req.m_Command.completionCode != 0’. This error may be timing-related. If the error persists after rerunning the statement, use DBCC CHECKDB to check the database for structural integrity, or restart the server to ensure in-memory data structures are not corrupted.

2012-12-23 02:01:56.78 spid61      Error: 3624, Severity: 20, State: 1.

2012-12-23 02:01:56.78 spid61      A system assertion check has failed. Check the SQL Server error log for details. Typically, an assertion failure is caused by a software bug or data corruption. To check for database corruption, consider running DBCC CHECKDB. If you agreed to send dumps to Microsoft during setup, a mini dump will be sent to Microsoft. An update might be available from Microsoft in the latest Service Pack or in a QFE from Technical Support.

2012-12-23 02:02:56.79 Backup      Error: 18210, Severity: 16, State: 1.

2012-12-23 02:02:56.79 Backup      BackupIoRequest::ReportIoError: write failure on backup device 'VDI_F73C77A5-D7C0-44F2-876C-2FD7875B87E9_13'. Operating system error 112(failed to retrieve text for this error. Reason: 1815).

2012-12-23 02:02:56.80 spid61      Error: 18210, Severity: 16, State: 1.

2012-12-23 02:02:56.80 spid61      BackupVirtualDeviceFile::RequestDurableMedia: Flush failure on backup device 'VDI_F73C77A5-D7C0-44F2-876C-2FD7875B87E9_17'. Operating system error 995(failed to retrieve text for this error. Reason: 1815).

In the ERRORLOG we found that the backup first hit OS error 112 and then went into the KILLED/ROLLBACK state.

We checked what OS error 112 means; below is the output of net helpmsg:

C:\> net helpmsg 112
There is not enough space on the disk.

As per the error message this happens when there is not enough space on the disk, but on the contrary there was plenty of free space available on the disk.

We looked into the dump as well, in case it gave us more details.

Here are the two threads for SPID 61 that were doing the backup:

Parent Thread:
 ==============
 0:90> kc
  Call Site
 ntdll!ZwSignalAndWaitForSingleObject
 kernel32!SignalObjectAndWait
 sqlservr!SOS_Scheduler::SwitchContext
 sqlservr!SOS_Scheduler::SuspendNonPreemptive
 sqlservr!EventInternal<Spinlock<149,1,0> >::Wait
 sqlservr!BackupThread::Wait
 sqlservr!BackupStreamList::WaitUntilDone
 sqlservr!BackupOperation::CloseMedia
 sqlservr!BackupOperation::Close
 sqlservr!BackupOperation::~BackupOperation
 sqlservr!BackupOperation::`scalar deleting destructor'
 sqlservr!CPtr<BackupOperation>::Delete
 sqlservr!`BackupEntry::BackupDatabase'::`1'::catch$1
 msvcr80!_CallSettingFrame
 msvcr80!__CxxCallCatchBlock
 ntdll!zzz_AsmCodeRange_End
 sqlservr!BackupEntry::BackupDatabase
 sqlservr!CStmtDumpDb::XretExecute
 sqlservr!CMsqlExecContext::ExecuteStmts<1,1>
 sqlservr!CMsqlExecContext::FExecute
 sqlservr!CSQLSource::Execute
 sqlservr!process_request
 sqlservr!process_commands

 Child Thread:
 ==============
0:061> kc
 Call Site
 kernel32!RaiseException
 msvcr80!_CxxThrowException
 sqlservr!TurnUnwindAndThrowImpl
 sqlservr!ex_raise2
 sqlservr!ex_raise
 sqlservr!utassert_fail
 sqlservr!BackupFile::Quiesce
 sqlservr!BackupMediaIo::Discard
 sqlservr!BackupMedium::Discard
 sqlservr!BackupStream::DoClose
 sqlservr!BackupStream::ThreadMainRoutine
 sqlservr!BackupThread::ThreadBase
 sqlservr!SubprocEntrypoint

From the dump it does not look like the issue is inside SQL Server. We suspect the VDI application (Litespeed) is throwing an error that SQL Server does not know how to handle, resulting in the assertion.

We suspect that the issue is triggered by the following errors:

2012-12-23 02:02:56.79 Backup      Error: 18210, Severity: 16, State: 1.

2012-12-23 02:02:56.79 Backup      BackupIoRequest::ReportIoError: write failure on backup device 'VDI_F73C77A5-D7C0-44F2-876C-2FD7875B87E9_13'. Operating system error 112(failed to retrieve text for this error. Reason: 1815).

At the same time we thought: since SQL Server does not seem to be the culprit here, could it be a problem with Litespeed?

We tried to understand how Litespeed takes its backups, and we found the KB article below, which mentions this error message:

https://support.quest.com/SolutionDetail.aspx?id=SOL51287

This may occur depending on how Litespeed extends the backup files. Litespeed initially tries to extend the file by 2146435072 bytes (roughly 2 GB). If that fails, it drops to half that amount and keeps halving it; if it keeps failing, it eventually fails completely with the 112 out-of-space error.

Below is the workaround available in the KB:

Script your Litespeed backup job and add the @ioflag parameter.

For example: @ioflag=OVERLAPPED, @ioflag=SEQUENTIAL_SCAN, @ioflag=NO_BUFFERING, @ioflag=FILE_EXTENSION=-1
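For illustration only, a scripted Litespeed backup carrying the flag might look like the sketch below. The extended procedure name xp_backup_database is part of Litespeed, but the file path and exact parameter syntax here are assumptions; verify them against the Litespeed documentation for your version:

```sql
-- Hypothetical sketch: FILE_EXTENSION=-1 stops Litespeed from
-- pre-extending the backup file (per the Quest KB referenced above).
EXEC master.dbo.xp_backup_database
     @database = 'Backuphung',
     @filename = 'D:\Backups\Backuphung.bak',   -- assumed path
     @ioflag   = 'FILE_EXTENSION=-1';
```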

Also, as per the KB below, this issue has been permanently resolved in a later version of SQL Litespeed.

https://support.quest.com/SolutionDetail.aspx?id=SOL68482

Hope this helps you resolve another backup hang issue.

3 responses to “SQL Litespeed Backup hung with wait type PREEMPTIVE_OS_REPORTEVENT”


  1. Nice blog and very useful. We have encountered this error many times and had to restart SQL Server to clear the hung thread, causing an outage on a critical application. Now we know that a Litespeed upgrade (from version 5.0 to 6.5) is required on all the servers to resolve the problem.

