SQL 2008 setup rule “Not clustered or the cluster service is up and online” Failed


Hi all,

I am back again with another interesting issue.

Last week we had to uninstall a service pack from one of our production environments as part of a change back-out plan.

The plan looked very simple, since Microsoft provides a way to remove a service pack from Add or Remove Programs.

Just go to Add or Remove Programs, select the “Show updates” check box (or go to “Installed Updates”), select the service pack you want to remove, and you are done!!

Unfortunately I was not lucky this time 😦

Let's understand what happened and what went wrong.

Environment:

SQL Server 2008

Windows Server 2003, 2 node failover cluster.

Note: You may hit the same issue while installing a service pack on SQL Server 2012 or on Windows Server 2008.

As soon as we started the uninstallation from Add or Remove Programs, setup ran a few rules, and one of them failed with the error below.

---------------------------
Rule Check Result
---------------------------
Rule "Not clustered or the cluster service is up and online." failed.
 
The machine is clustered, but the cluster is not online or cannot be accessed from one of its nodes. To continue determine why the cluster is not online and rerun setup instead of rerunning the rule since the rule can no longer detect a cluster environment correctly.
---------------------------
OK  

—————————

It looked like an easy one: going by the error message, check the Cluster service on both nodes and start it wherever it is not running. I checked both nodes, but the service was running fine on both of them!!

Thinking it might be in a hung state, I restarted the service on both nodes and tried the uninstallation again, but still no luck.

In such a scenario the best help is the setup log “Detail.txt”, located under C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Log\<dateandtime>

Search for the term “at Microsoft” to find the exception stack traces:

2013-04-05 23:23:35 Slp: ----------------------------------------------------------------------
2013-04-05 23:23:35 Slp: Running Action: InitializeConfigAction
2013-04-05 23:23:35 Slp: Completed Action: InitializeConfigAction, returned True
2013-04-05 23:23:36 Slp: ----------------------------------------------------------------------
2013-04-05 23:23:36 Slp: Running Action: RunDiscoveryAction
2013-04-05 23:23:36 Slp: Unable to retrieve the Cluster Service
2013-04-05 23:23:36 Slp: System.InvalidOperationException: The given key was not present in the dictionary. ---> System.Collections.Generic.KeyNotFoundException: The given key was not present in the dictionary.
   at System.ThrowHelper.ThrowKeyNotFoundException()
   at System.Collections.Generic.SortedList`2.get_Item(TKey key)
   at Microsoft.SqlServer.Chainer.Infrastructure.ServiceContainer.GetService(Type serviceType)
   --- End of inner exception stack trace ---
   at Microsoft.SqlServer.Chainer.Infrastructure.ServiceContainer.GetService(Type serviceType)
   at Microsoft.SqlServer.Chainer.Infrastructure.ServiceContainer.GetService[T]()
   at Microsoft.SqlServer.Chainer.Infrastructure.ServiceContainer.get_Cluster()
   at Microsoft.SqlServer.Configuration.SetupExtension.RunDiscoveryAction.ExecuteAction(String actionId)
2013-04-05 23:23:36 Slp: Running discovery on local machine
2013-04-05 23:23:38 Slp: Discovery on local machine is complete
2013-04-05 23:23:38 Slp: Completed Action: RunDiscoveryAction, returned True
2013-04-05 23:23:39 Slp: ----------------------------------------------------------------------
2013-04-05 23:23:39 Slp: Running Action: SetInstanceInstallStateAction
2013-04-05 23:23:39 Slp: Skipping SetInstanceInstallStateAction in UI mode because UI will let users select the patch behavior.
2013-04-05 23:23:39 Slp: Completed Action: SetInstanceInstallStateAction, returned True
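Searching a large Detail.txt by hand gets tedious, so here is a minimal Python sketch of the same search. It assumes the log layout shown in the excerpt above (timestamped “Slp:” lines, with stack frames on their own lines starting with “at ”). The function name `find_stack_traces` is hypothetical and not part of any SQL Server tooling.

```python
import re
from typing import Iterable, List

# Regular setup log lines start with a timestamp followed by "Slp:".
LOG_LINE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} Slp: ")


def find_stack_traces(lines: Iterable[str]) -> List[List[str]]:
    """Return one block per stack trace found in the log.

    Each block holds the two timestamped lines seen just before the trace
    (they usually carry the error message), followed by the trace lines.
    A trace line is any non-timestamped line that starts with "at " or
    "---" once leading whitespace is stripped.
    """
    blocks: List[List[str]] = []
    recent: List[str] = []   # last two timestamped lines seen so far
    current: List[str] = []  # trace being collected; empty when outside one
    for raw in lines:
        line = raw.rstrip("\r\n")
        if LOG_LINE.match(line):
            # A timestamped line ends any trace that was in progress.
            if current:
                blocks.append(current)
                current = []
            recent = (recent + [line])[-2:]
        else:
            text = line.strip()
            if text.startswith("at ") or text.startswith("---"):
                if not current:
                    current = list(recent)
                current.append(text)
    if current:
        blocks.append(current)
    return blocks
```

Pass it an open file handle for Detail.txt (or any list of lines) and print each block. For the excerpt above it returns a single block that starts with the “Unable to retrieve the Cluster Service” line.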

When I started troubleshooting this error, I found many suggestions on the internet. I also ran Process Monitor (procmon) to make sure there were no permission-related issues, and no errors were reported in the application or system event logs.

As a best practice I also verified the policies below and confirmed that the account was added to each of them.

Log on as a service

Act as part of the operating system

Back up files and directories

Adjust memory quotas for a process

Increase scheduling priority

Restore files and directories
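If you are checking this on several nodes, the list above can be turned into a small checklist. The sketch below maps each policy name to its Windows privilege constant and reports which ones an account is missing. It assumes you have already collected the account's granted rights yourself (for example from the Local Security Policy console); the helper name `missing_rights` is hypothetical.

```python
# Policy display name (as listed above) -> Windows privilege constant.
REQUIRED_RIGHTS = {
    "Log on as a service": "SeServiceLogonRight",
    "Act as part of the operating system": "SeTcbPrivilege",
    "Back up files and directories": "SeBackupPrivilege",
    "Adjust memory quotas for a process": "SeIncreaseQuotaPrivilege",
    "Increase scheduling priority": "SeIncreaseBasePriorityPrivilege",
    "Restore files and directories": "SeRestorePrivilege",
}


def missing_rights(granted):
    """Return the display names of required policies not present in `granted`.

    `granted` is an iterable of privilege constants the account already has.
    The comparison ignores case, and the result follows the order of the
    list above.
    """
    granted_lower = {name.lower() for name in granted}
    return [
        display
        for display, constant in REQUIRED_RIGHTS.items()
        if constant.lower() not in granted_lower
    ]
```

For example, `missing_rights(["SeServiceLogonRight", "SeBackupPrivilege"])` lists the four remaining policies that still need the account added.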

Finally, a friend suggested making sure that the password of the account I was logged in with was at least 15 characters long. I cross-checked my password and it was 13 characters. Initially I did not believe him, but I had no other option left, so I changed my password to a 15-character one (e.g. Password@123456), logged in to the server again with the new password and retried the uninstallation. It is hard to believe, but it actually worked!!

I could not understand why this worked, or why I was not hitting the issue on the other servers, so I did some more research and found that I was actually hitting the problem described in KB 828861 (linked at the end of this post). The article talks about the cluster service account password, but it is still valid in our case because we had the “NoLMHash” policy enabled. The solution is Method 1 from that article:

Method 1: Use a password that is at least 15 characters long

When the NoLMHash policy is set in Active Directory and cannot be disabled because of security considerations, use a password that is at least 15 characters long to prevent the cluster setup wizard from using a LMHash for authentication.
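The length rule in Method 1 comes from the LM hash format, which cannot represent passwords longer than 14 characters. A tiny sketch of that check, with a hypothetical helper name, might look like this:

```python
# LM hashes cannot represent passwords longer than 14 characters,
# so a password of 15 or more characters never has one.
LM_HASH_MAX_LENGTH = 14


def satisfies_method_1(password: str) -> bool:
    """Return True if the password is at least 15 characters long."""
    return len(password) > LM_HASH_MAX_LENGTH
```

With the values from this post, a 13-character password fails the check, while the 15-character example `Password@123456` passes.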

There are other possible solutions you can try:

1. Make sure that the Cluster service (running under the cluster service account) starts and stops successfully on both nodes.
2. Log on to the node with the cluster service account and attempt the upgrade.
3. Make sure that the cluster service account has all the privileges listed below.

Log on as a service

Act as part of the operating system

Back up files and directories

Adjust memory quotas for a process

Increase scheduling priority

Restore files and directories

Ref: http://support.microsoft.com//kb/307532

4. Review the application, system and cluster logs for any errors related to the Cluster service.
5. Check the other solutions mentioned in this KB:

http://support.microsoft.com/kb/828861

Cluster service account password must be set to 15 or more characters if the NoLMHash policy is enabled

Hope this will help you in resolving such issues.

6 responses to “SQL 2008 setup rule “Not clustered or the cluster service is up and online” Failed

  1. thanks for sharing quite unexpected error, hard to guess out at first sight

  2. Thanks Manu!! Yeah, it was kind of misleading…

  3. Hi Manish,

    Thanks for sharing the same

    I got the error “Rule “Not clustered or the cluster service is up and online.” failed” while upgrading SQL 2008 R2 SP1 to SP2 (on Windows 2008).

    A restart of the node fixed the issue for me (and my password is less than 15 characters).

  4. thank you so much! this helped a lot. We were trying for long to solve this issue and now is working!
