SQL Server Agent Service fails to come online on one of the nodes in a failover cluster

I have come across this issue many times, so I thought I would share the resolution with you. Hopefully it saves you some time.

Here is the scenario:

You have a SQL Server instance on a two-node cluster. The SQL Agent resource runs fine on one node, but after a failover it fails to come online on the other node.

Below are the troubleshooting steps.

As a first troubleshooting step, we checked the event log, which showed the following error messages:

[sqagtres] OnlineThread: ResUtilsStartResourceService failed (status 435)

[sqagtres] StartResourceService: Failed to start SQLSERVERAGENT service.  CurrentState: 1

[sqagtres] OnlineThread: Error 435 bringing resource online.

The above error messages gave no clue, so we checked the SQL Agent error log (SQLAGENT.OUT) for more details about the service, but there was no SQLAGENT.OUT file in the Log folder.

Since the file itself was not being generated, we started the service from a command prompt to get more information. This also helps to find out whether there is a permission issue.

Below is the result when the service is started from the command prompt:

Microsoft (R) SQLServerAgent 10.50.4000.0
Copyright (C) Microsoft Corporation.
StartServiceCtrlDispatcher failed (error 6)

Error 6 indicates “The handle is invalid”.

Next, we ran Process Monitor (procmon). SQLAGENT.OUT was not being generated, and the error did not point to a permission issue because the command prompt was launched under a local administrator account, so in procmon we analyzed the following entry carefully:

"1:01:48.8321460 PM","SQLAGENT.EXE","12245","RegQueryValue","HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.50.MSSQLSERVER\SQLServerAgent\ErrorLogFile","SUCCESS","Type: REG_SZ, Length: 84, Data: F:\Microsoft SQL Server\MSSQL\Log\"
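If you export the procmon capture to CSV, a few lines of Python can pull out every read of ErrorLogFile instead of scanning the capture by eye. This is a minimal sketch that assumes the column order seen in the entry above (time, process, PID, operation, path, result, detail); the function name `error_log_reads` is hypothetical.

```python
import csv
import io

# The procmon row quoted above, as it would appear in a CSV export.
SAMPLE = (
    '"1:01:48.8321460 PM","SQLAGENT.EXE","12245","RegQueryValue",'
    '"HKLM\\SOFTWARE\\Microsoft\\Microsoft SQL Server\\MSSQL10.50.MSSQLSERVER'
    '\\SQLServerAgent\\ErrorLogFile","SUCCESS",'
    '"Type: REG_SZ, Length: 84, Data: F:\\Microsoft SQL Server\\MSSQL\\Log\\"\n'
)


def error_log_reads(csv_text):
    """Return (pid, registry path, data) for every RegQueryValue of ErrorLogFile.

    Assumes the column order: Time, Process, PID, Operation, Path, Result, Detail.
    """
    hits = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 7:
            continue  # skip blank or truncated lines
        _time, _process, pid, operation, path, _result, detail = row[:7]
        if operation == "RegQueryValue" and path.endswith("\\ErrorLogFile"):
            # The registry data itself follows "Data: " in the Detail column.
            data = detail.split("Data: ", 1)[1] if "Data: " in detail else ""
            hits.append((pid, path, data))
    return hits


for pid, path, data in error_log_reads(SAMPLE):
    print(pid, data)
```

A header row in a real export does not match "RegQueryValue", so it is skipped without special handling.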

We noticed that the file name SQLAGENT.OUT was missing from the end of the value. To be 100% sure, we compared this value with the one on the active node and found that the working node had the following entry:

Value on the working node: F:\Microsoft SQL Server\MSSQL\Log\SQLAGENT.OUT

Value on the problem node: F:\Microsoft SQL Server\MSSQL\Log\
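The difference is easy to miss by eye: the problem node's value names a folder, not a file. A minimal check, using Python's `ntpath` module so Windows paths are parsed the same way on any OS (the helper name is hypothetical):

```python
import ntpath

WORKING = "F:\\Microsoft SQL Server\\MSSQL\\Log\\SQLAGENT.OUT"
PROBLEM = "F:\\Microsoft SQL Server\\MSSQL\\Log\\"


def names_a_file(value):
    """True if a Windows path ends in a file name rather than a backslash.

    ntpath.basename returns "" when the path ends with a separator. A folder
    written without a trailing backslash would still look like a file name,
    so this only catches the case seen in this post.
    """
    return ntpath.basename(value) != ""


for node, value in (("working", WORKING), ("problem", PROBLEM)):
    print(node, names_a_file(value))
```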

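We added the file name (SQLAGENT.OUT) to the ErrorLogFile registry value. The instance ID segment of the key (MSSQL10.MSSQLSERVER below) depends on your SQL Server version and instance name, so check it against the key procmon reports: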

HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLSERVER\SQLServerAgent\ErrorLogFile

New value:

“F:\Microsoft SQL Server\MSSQL\Log\SQLAGENT.OUT”

After that, we started the service. It came up fine, and the SQL Agent resource also came online.
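The registry change described above was made by hand. If you need to check or repeat it on several nodes, it can be scripted with Python's standard `winreg` module. This is a minimal sketch, not the method used in this post: it assumes Python 3 running from an elevated prompt on the failing node, the function names are hypothetical, and the subkey is only the example from this post, so replace the instance ID with the one procmon shows for your instance.

```python
try:
    import winreg  # standard library, Windows only
except ImportError:  # lets the pure helper below be imported and tested elsewhere
    winreg = None

# Example key from this post; the instance ID segment (MSSQL10.MSSQLSERVER)
# is an assumption you must replace with the one procmon shows for your instance.
SUBKEY = r"SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLSERVER\SQLServerAgent"
VALUE_NAME = "ErrorLogFile"


def with_file_name(current, file_name="SQLAGENT.OUT"):
    """Append file_name when the stored value ends in a backslash (a folder)."""
    if current.endswith("\\"):
        return current + file_name
    return current


def fix_error_log_file(subkey=SUBKEY, dry_run=True):
    """Read ErrorLogFile and, unless dry_run is set, write the corrected value.

    Returns (old_value, new_value).
    """
    if winreg is None:
        raise RuntimeError("winreg is only available on Windows")
    # KEY_WOW64_64KEY reads the 64-bit registry view even from 32-bit Python.
    access = winreg.KEY_READ | winreg.KEY_WOW64_64KEY
    if not dry_run:
        access |= winreg.KEY_SET_VALUE
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, subkey, 0, access) as key:
        current, value_type = winreg.QueryValueEx(key, VALUE_NAME)
        new = with_file_name(current)
        if new != current and not dry_run:
            # Keep the original value type (REG_SZ in the procmon entry).
            winreg.SetValueEx(key, VALUE_NAME, 0, value_type, new)
    return current, new
```

On the failing node, `print(fix_error_log_file())` would show the current and proposed values without changing anything; `fix_error_log_file(dry_run=False)` writes the change, after which you would start the service and bring the SQL Agent resource online again.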

Hope this helps you in the future!

9 responses to “SQL Server Agent Service fails to come online on one of the nodes in a failover cluster”


  1. Yes, it fixed my issue, thanks.

  2. Thanks !!

  3. Perfect, thank you so much, mate. It helped me fix my issue as well.

  5. Thanks very much, this fixed my problem!

  6. Thanks, this fixed the issue. However, once I move the file (SQLAGENT.OUT) to a new location and change the error log location in the SQL Server Agent properties, the service still does not restart. If possible, can you help me?

  7. A permission issue on the folder that contains SQLAGENT.OUT will also throw these errors. In my case procmon captured ‘access denied’, so I located the log folder containing SQLAGENT.OUT, granted full permission to the SQL Agent service account, and brought the service online.

    Smiju A. Antony
