SQL Agent Job reports error “SQLServerAgent is not currently running ” Though Agent Service is running   5 comments


Hello Friends,

Hope you are enjoying our blog.

Today i am going to share one more examples of real troubleshooting where customer comes with an issue but when we actually found the root cause it turns out to be completely different all together from what we expected. To know more, look at how we worked on this issue and most amazing thing was how we resolved this one.

According to the customer when he was starting SQL Server Agent service it started successfully with status “service started successfully” but when we are running any jobs it was throwing below error.

SQLServerAgent is not currently running so it cannot be notified of this action. (Microsoft SQL Server, Error: 22022)

For help, click: http://go.microsoft.com/fwlink?ProdName=Microsoft+SQL+Server&ProdVer=10.50.2500&EvtSrc=MSSQLServer&EvtID=22022&LinkId=20476

 ——————————

 BUTTONS:

 OK

 ——————————

Surprising part was if SQL Server Agent is running successfully then why this error message.
First thing for troubleshooting is to start with SQLAGENTLOG.OUT to see if any error message reported there. I opened SQLAGENTLOG.OUT file and it was completely blank!!! Phew!!

It was difficult to believe this, so I decided to start SQLAgent service from command prompt. I started it from Command prompt and here is the result.

C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn>SQLAGENT90.EXE -I Instancename -c
Microsoft (R) SQLServerAgent 10.50.2500.0
Copyright (C) Microsoft Corporation.

Nothing is getting reported in the command prompt as well. This is interesting, no clue at all!! We got stuck here and most of the time we give up in such situations, but in this case we decided to try something else.
I had 2 ideas as a next action plan.

1- Take a Procmon while running the service.
2- Take a Dump of SQLAGENT90.EXE.

We tried first action plan of collecting Procmon while running the service. It did not help much then we moved to next step and took dump of SQLAGENT90.EXE.

Since this is not an error message we can not directly take a crash dump so challenge here is, how should we take dump on the service? Luckily this was Windows Server 2008. Windows Server 2008 we have a facility where we can directly take dumps from the task manager for a process.
Here is the screen shot on how to take dump. Check the below screen shot.

Once dump is generated it will give you the location where it got generated.

Go to the dump location and get the DMP file.

Here we go with our typical Dump analysis
First step is to set the symbol path to Microsoft symbols server. On the Windbg command window type below command.
.sympath srv*c:\ publicsymbols*http://msdl.microsoft.com/download/symbols;

Now let’s load the symbols from Microsoft symbols server:

Again type .reload /f and hit enter.

Let’s confirm whether we have symbols loaded or not, remember this time we took dump on SQLAGENT.EXE.

0:002> lmvm SQLAGENT
 start end module name
 00000000`00b40000 00000000`00bbd000 SQLAGENT (pdb symbols) c:\publicsymbols\SQLAGENT.pdb\D6244891FD4647DBBA866D05ED4825891\SQLAGENT.pdb
 Loaded symbol image file: SQLAGENT.EXE
 Image path: C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\Binn\SQLAGENT.EXE
 Image name: SQLAGENT.EXE
 Timestamp: Fri Jun 17 01:28:00 2011 (4DFB1010)
 CheckSum: 0007337E
 ImageSize: 0007D000
 File version: 2009.100.2500.0
 Product version: 10.50.2500.0
 File flags: 8 (Mask 3F) Private
 File OS: 40000 NT Base
 File type: 1.0 App
 File date: 00000000.00000000
 Translations: 0409.04b0
 CompanyName: Microsoft Corporation
 ProductName: Microsoft SQL Server
 InternalName: SQLAGENT
 OriginalFilename: SQLAGENT.DLL
 ProductVersion: 10.50.2500.0
 FileVersion: 2009.0100.2500.00 ((KJ_PCU_Main).110617-0038 )
 FileDescription: SQLAGENT - SQL Server Agent
 LegalCopyright: Microsoft Corp. All rights reserved.
 LegalTrademarks: Microsoft SQL Server is a registered trademark of Microsoft Corporation.
 Comments: SQL

Since this is full dump there can we multiple threads and we have no idea which thread is of our interest. To know that run below command on windbg.

0:000> ~
 # 0 Id: 155c.137c Suspend: 0 Teb: 000007ff`fffde000 Unfrozen
   1 Id: 155c.15f4 Suspend: 0 Teb: 000007ff`fffda000 Unfrozen
 . 2 Id: 155c.1a68 Suspend: 0 Teb: 000007ff`fffd4000 Unfrozen
   3 Id: 155c.1200 Suspend: 0 Teb: 000007ff`fffae000 Unfrozen
   4 Id: 155c.158c Suspend: 0 Teb: 000007ff`fffd8000 Unfrozen
   5 Id: 155c.128c Suspend: 0 Teb: 000007ff`fffd6000 Unfrozen

Now if you look at all the threads carefully you will find a “.” dot just before the thread 2. That is how we came to know this is the thread of our interest. Currently we are on thread 0 so we need to switch to thread 2. Below is the command to switch to a thread.

0:000> ~[2]s
 ntdll!NtWaitForSingleObject+0xa:
 00000000`7721135a c3 ret
 Run below command to dump the stack.
0:002> kc
 Call Site
 ntdll!NtWaitForSingleObject
 KERNELBASE!WaitForSingleObjectEx
 KERNELBASE!GetOverlappedResult
 sqlncli10!Np::ReadSync
 sqlncli10!BATCHCTX::SNIRead
 sqlncli10!BATCHCTX::ReadPacket
 sqlncli10!CConnection::PreLogin
 sqlncli10!CTdsParser::DoConnect
 sqlncli10!CTdsParser::OpenServerConnection
 sqlncli10!ConnectIt
 sqlncli10!DoDlgConnection
 sqlncli10!SQLDriverConnectW
 odbc32!SQLInternalDriverConnectW
 odbc32!SQLDriverConnectW
 sqlsvc!QSQLLogonExImpl
 sqlsvc!QSQLLogonExWithErrorHandling
 SQLAGENT!ConnVerifyConnectionOnStart
 SQLAGENT!DumpAndCheckServerVersion
 SQLAGENT!ServiceMain
 sechost!ScSvcctrlThreadA
 kernel32!BaseThreadInitThunk
 ntdll!RtlUserThreadStart

Looking at the highlighted part in the stack, looks like we got stuck while login to SQL Server.
There could be 2 possibilities.

1- We have some permission issues while login to SQL Server
2- We have some login issue with over all SQL Server itself.

We checked and confirmed that Agent account had sysadmin rights on the SQL Server. This signifies there is no permission issue with Agent.
So we decided to go ahead with other option.

We were not confident here that login to SQL Server could be an actual problem because customer is connected to SQL Server from his laptop and showing us the job failure status .This means Login is working fine from the client location but, what about login from the server itself?? Because SQL Agent is connecting to SQL Server using client tools of Server itself.

I asked customer to connect to SQL Server from Management Studio of server. What I see here was quite interesting Management Studio login got hung. This is interesting, isn’t it??

I tried to connect to SQL Server from Command prompt of server that also got hung. Now I was confident we are going in right direction.
It leads us to the conclusion that the actual problem is with client tools of Server. To make it 100% sure I created UDL (to know how to create udl, check this blog).

UDL worked in the first attempt because UDL by default goes with OLEDB Provider for SQL Server, but we are facing issue with native client that’s why I changed the provider of SQL Server to Native Client to check it is getting hung or not, check the blow screen on how to change it.

I changed it to Native Client 10.0 and then tried to connect and this also got hung.

Come on Manish! We are going in right direction and yes yes….We can crack this now.
Now I went back to the dump stack which we took earlier and checked it carefully, looked at below 2 frames.

 sqlncli10!BATCHCTX::ReadPacket
 sqlncli10!CConnection::PreLogin
 sqlncli10!CTdsParser::DoConnect

As per the above call looks like we got stuck at PreLogin that means we are not even going inside the SQL Server, problem is before that.
I consulted few of my friends who work on the connectivity side to check what checks we do during PreLogin.
As per them for the security reasons during the PreLogin we use SSL Encryption using Schannel.dll and this is registered at below registry location
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL
So one of the reasons for this problem could be the encryption cipher used during the SSL Encrypted Pre-Login to SQL Server is corrupt/disabled. Since the cipher is corrupt/disabled the pre-login cannot complete as the data cannot be encrypted/decrypted correctly.

To resolve this issue follow these steps:

1. Open regedit.
2. Rename the SCHANNEL key to old from below location:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL === > make it to SCHANNEL_Old
3. When you do this you might receive an error. (This can be ignored).
4. Refresh the branch in the registry and the SCHANNEL key should now be recreated with default settings.
NOTE: No Reboot is required on Windows 2003/2008 but a reboot is required on Windows 2000.

After making these changes if we try to login to SQL Server this key will be re-populated.
Now any attempt of login to SQL Server from local server will be successful. We started the SQL Server Agent now and then tried to start a job. Now all the jobs are working fine!!!

NOTE: As you now know this was the issue with Client tools of that local machine, so this can happen with any client machine not necessarily with local server were SQL Server is installed, but resolution steps will remain same for that machine where you face this issue.

As you can see from this blog we started troubleshooting SQL Agent job failing with error and ending up troubleshooting Login issue.
Hope this will help you in troubleshooting not only this issue, infact you can follow the similar approach to troubleshoot other login issues as well.

5 responses to “SQL Agent Job reports error “SQLServerAgent is not currently running ” Though Agent Service is running

Subscribe to comments with RSS.

  1. Restart of SQL server fixes the issue…

    • This is the first thing any customer would try before coming to us my friend 🙂 No!! Not only SQL Server if you re-start the whole server it will not fix the issue.

  2. excellent this worked in my case

  3. Pingback: We didn’t know that… | SQLactions.com

Leave a Comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: