SQL Server 2008 R2 Cluster instance missing post uninstallation of other instance   Leave a comment


Today I will  talk about a strange problem i came across in SQL Server 2008 R2 instance un-installation on Windows Server 2008/R2 Failover Cluster. This can cause an un-necessary outage, in case if production environment is very critical.

The best way to understand this issue is to take a scenario into consideration, which will help us in understanding the problem better.

Environment:

SQL Server 2008 R2 (RTM/SP1)

Windows Server 2008

Faiover Cluster: 1 or multiple node cluster

SQL Server Instances: 2 or more cluster instances.

Issue Description

Say You have two SQL instances on cluster, SQL1 and SQL2 and are planning to uninstall SQL 2. During the instance selection page if we initially select SQL1(Click Next), comeback and select SQL2.

During the un-installation,

• It deletes “IP Address” and “SQL Group Name” for SQL1 (Ideally it should have done this for SQL2) Hence instance resources/groups will disappear from Failover cluster leaving the SQL instance SQLN1\SQL1 in an un-usable state.

On 2008 servers this results in all the resources of this group (SQL1) to go hidden. We can see them from command prompt using “cluster res”

• At the same time it uninstalls SQLN2\SQL2.  But doesn’t remove the “SQL Group” for this instance from the cluster. Somehow we only remove the SQL agent resource.

The problem starts with setup incorrectly determining the network name and this leads to series of other errors and wrong actions during the un-installation.

Note: you may hit into this situation in normal scenarios as well where by mistake you delete the Cluster service group or IP address resource, since group is missing and all the other resources goes in to offline/hidden state.

Troubleshooting, findings and resolution

This is one of the strange issue we have noticed and the only way to understand this issue is with log files. You can understand this problem with the help of few setup files mentioned below.

1)      From configuration files of the installation

INSTANCEID="SQL1"
ACTION="InstallFailoverCluster"
INSTANCENAME="SQL1"
FAILOVERCLUSTERDISKS="Cluster Disk 3"
FAILOVERCLUSTERGROUP="SQL Server (SQL1)"
FAILOVERCLUSTERNETWORKNAME="SQLN1"
INSTANCEID="SQL2"
ACTION="InstallFailoverCluster"
INSTANCENAME="SQL2"
FAILOVERCLUSTERDISKS="Cluster Disk 5"
FAILOVERCLUSTERGROUP="SQL Server (SQL2)"
FAILOVERCLUSTERNETWORKNAME="SQLN2"

2)      From Summary.txt

User Input Settings:
ACTION:                        RemoveNode
CONFIGURATIONFILE:             C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Log\20121105_1715009\ConfigurationFile.ini
FAILOVERCLUSTERNETWORKNAME:    SQLN1    ==> this should have been SQLN2
INSTANCENAME:                  SQL2

3)      From Configuration file:

;SQLSERVER2008 Configuration File
ACTION="RemoveNode"
INSTANCENAME="SQL2"
FAILOVERCLUSTERNETWORKNAME="SQLN1"   ==> This should have been SQLN2

It is cleared  in the following logs, though we are trying to un-install instance SQL2 (Actual Network name SQLN2), But it incorrectly identify and delete the IP address that belongs to the instance SQL1. As Network name picked up was wrong.

4)  Detail.txt

2012-11-05 17:15:09 Slp: Running Action: as_cluster_ip_address_cluster_unconfig_Cpu64
2012-11-05 17:15:09 Slp: Action Data:
2012-11-05 17:15:09 Slp:   Feature = Analysis_Server_Full_sql_as_Cpu64
2012-11-05 17:15:09 Slp:   Scenario = uninstall
2012-11-05 17:15:09 Slp:   Timing = UnConfigNonRC
2012-11-05 17:15:09 Slp:   ConfigObjectType = Microsoft.SqlServer.Configuration.ClusterConfiguration.ClusterIPAddressPrivateConfigObject
2012-11-05 17:15:09 Slp:   FeatureName = Analysis_Server_Full
2012-11-05 17:15:09 Slp:   FeatureCpuType = Cpu64
2012-11-05 17:15:09 Slp:   FeaturePackageId = sql_as
2012-11-05 17:15:09 Slp:   FeatureClusterState = None
2012-11-05 17:15:09 Slp:   InstallId = Machine[@ID='MANITESTVM']/Product[@ID='SQL2008']/Instance[@ID='SQL2']/Feature[@ID='AS_CTP5' and @RegistryPath='SOFTWARE\Microsoft\Microsoft SQL Server\MSAS10_50.SQL2\Setup' and @RootPath='SOFTWARE\Microsoft\Microsoft SQL Server\MSAS10_50.SQL2']/Install[@ProductCode='0BC5AB24-056C-405D-5682-FDC9CB50408’]
2012-11-05 17:15:10 Slp: Taking the network name resource 'SQL Network Name (SQLN1)' offline
2012-11-05 17:15:10 Slp: Taking cluster resource 'SQL Network Name (SQLN1)' offline
2012-11-05 17:15:20 Slp: Deleting cluster IP resource 'SQL IP Address 1 (SQLN1)'.
2012-11-05 17:15:20 Slp: Removing dependency on resource 'SQL IP Address 1 (SQLN1)' from resource 'SQL Network Name (SQLN1)'
2012-11-05 17:15:20 Slp: Taking cluster resource 'SQL IP Address 1 (SQLN1)' offline
2012-11-05 17:15:20 Slp: Deleting cluster resource 'SQL IP Address 1 (SQLN1)'
2012-11-05 17:15:20 Slp: SlpConfigAction.ExecuteAction(actionId: as_cluster_ip_address_cluster_unconfig_Cpu64) completed
2012-11-05 17:15:20 Slp: Checkpoint: UNINSTALL:UNCONFIGNONRC:ANALYSIS_SERVER_FULL
2012-11-05 17:15:20 Slp: Completed Action: as_cluster_ip_address_cluster_unconfig_Cpu64, returned True
2012-11-05 17:15:20 Slp: Running Action: as_cluster_failover_instance_name_cluster_unconfig_Cpu64
2012-11-05 17:15:20 Slp: Action Data:
2012-11-05 17:15:20 Slp:   Feature = Analysis_Server_Full_sql_as_Cpu64
2012-11-05 17:15:20 Slp:   Scenario = uninstall
2012-11-05 17:15:20 Slp:   Timing = UnConfigNonRC
2012-11-05 17:15:20 Slp:   ConfigObjectType = Microsoft.SqlServer.Configuration.ClusterConfiguration.FailoverClusterNamePrivateConfigObject
2012-11-05 17:15:20 Slp:   FeatureName = Analysis_Server_Full
2012-11-05 17:15:20 Slp:   FeatureCpuType = Cpu64
2012-11-05 17:15:20 Slp:   FeaturePackageId = sql_as
2012-11-05 17:15:20 Slp:   FeatureClusterState = None
2012-11-05 17:15:20 Slp:   InstallId = Machine[@ID='MANITESTVM']/Product[@ID='SQL2008']/Instance[@ID='SQL2']/Feature[@ID='AS_CTP5' and @RegistryPath='SOFTWARE\Microsoft\Microsoft SQL Server\MSAS10_50.SQL2\Setup' and @RootPath='SOFTWARE\Microsoft\Microsoft SQL Server\MSAS10_50.SQL2']/Install[@ProductCode='0BC5AB24-056C-405D-5682-FDC9CB50408’]
2012-11-05 17:15:21 Slp: There are still dependencies on the network name 'SQL Network Name (SQLN1)' so it cannot be deleted.
2012-11-05 17:15:21 Slp: SlpConfigAction.ExecuteAction(actionId: as_cluster_failover_instance_name_cluster_unconfig_Cpu64) completed
2012-11-05 17:15:21 Slp: Checkpoint: UNINSTALL:UNCONFIGNONRC:ANALYSIS_SERVER_FULL
2012-11-05 17:15:21 Slp: Completed Action: as_cluster_failover_instance_name_cluster_unconfig_Cpu64, returned True
2012-11-05 17:15:21 Slp: ---- Flush datastore cache before calling configuration action 'as_cluster_disk_cluster_unconfig_Cpu64'
2012-11-05 17:15:21 Slp: ---- Flush datastore cache before calling configuration action 'as_cluster_disk_cluster_unconfig_Cpu64'
2012-11-05 17:15:21 Slp: ---- Flush datastore cache before calling configuration action 'as_cluster_disk_cluster_unconfig_Cpu64'
2012-11-05 17:15:21 SQLEngine: : The SQL Service resource is in cluster group 'SQL Server (SQL2)' and group 'SQL Server (SQL1)' is being unclustered.  SQL resources will not be unclustered.
Then it removes the incorrect SQL group (for SQL1)
2012-11-05 18:11:10 Slp: Running Action: as_cluster_group_cluster_finalunconfig_Cpu64
2012-11-05 18:11:10 Slp: Action Data:
2012-11-05 18:11:10 Slp:   Feature = Analysis_Server_Full_sql_as_Cpu64
2012-11-05 18:11:10 Slp:   Scenario = uninstall
2012-11-05 18:11:10 Slp:   Timing = Finalize
2012-11-05 18:11:10 Slp:   ConfigObjectType = Microsoft.SqlServer.Configuration.ClusterConfiguration.ClusterGroupPrivateConfigObject
2012-11-05 18:11:10 Slp:   FeatureName = Analysis_Server_Full
2012-11-05 18:11:10 Slp:   FeatureCpuType = Cpu64
2012-11-05 18:11:10 Slp:   FeaturePackageId = sql_as
2012-11-05 18:11:10 Slp:   FeatureClusterState = None
2012-11-05 18:11:10 Slp:   InstallId = Machine[@ID='MANITESTVM']/Product[@ID='SQL2008']/Instance[@ID='SQL2']/Feature[@ID='AS_CTP5' and @RegistryPath='SOFTWARE\Microsoft\Microsoft SQL Server\MSAS10_50.SQL2\Setup' and @RootPath='SOFTWARE\Microsoft\Microsoft SQL Server\MSAS10_50.SQL2']/Install[@ProductCode='0BC5AB24-056C-405D-5682-FDC9CB50408’]
2012-11-05 18:11:10 Slp: Running cluster group 'SQL Server (SQL1)' finalize timing.
2012-11-05 18:11:10 Slp: The cluster group 'SQL Server (SQL1)' is being deleted.
2012-11-05 18:11:10 Slp: Deleting cluster group 'SQL Server (SQL1)'

Above logs shows  group SQL Server (SQL1)  is getting deleted and SQL IP address 1 (SQLN1)resource for instance SQL1 instead of SQL2

Also above mentioned  you can check the status of all the resources by running below command to make sure your other resources are still present and in offline status.

b13

Since we know  now that all the services of this instance are still present and all the installation files and database file (mdf and ldf ) are still on the same location as well,  that confirms this instance is still there only thing is the group is missing from failover cluster and ip address resource got deleted.

This instance is still alive and can be bring back in normal working shape by following below 6 easy steps.

1. Create and empty services with the same name e.g SQL Server (SQL1)

2. Create IP Address resource.

3. Run following command

cluster . res “SQL Network Name (SQL1)” /move:”SQL Server (SQLN1)”

4. All the resources will be moved under this group.

5. Create dependency of IP Address with Network name

6. Bring the resources online.

If you are still not 100% confident, you can follow below steps with screenshots.

Let’s take an example. I have one instance called MSSQLSERVER2K8 for which resources are missing.

You can follow below steps to bring the instance in working condition from Failover Cluster Manager.

1)      Go to Services and application and right click >> More Action >> Create Empty service or Appliaction

b1

2)      Give the name to service exact same as your SQL Server cluster virtual name. in your case it was “SQL Server (MSSQLSERVER2K8)

b2

3)      Now as a next step we would need to create an IP address resource as follwos.

Right click on the services “SQL Server (MSSQLSERVER2K8)”

->Add resource -> More resource ->  3-Add IP Resource

b3

4)      Click on Static IP and enter the IP address for your SQL Server.

b4

Click on Apply ok.

5)      Once the IP resource in place we need to move the hidden resource in the Services and application.

6)      Open the command prompt run as administrator mode. And type following command.

“cluster res  “SQL Network Name (SQLVS4DOM2KCLU) Move: SQL Server (MSSQLSERVER2K8)”

7)      This command will move the hidden resource to SQL Server (ITSQL) group. And the structure on will look like below

b5

8)      Now, Since IP address is always dependent on Network name we need to add the dependency on Network name

Right Click on Network name -> Go to dependency -> Click on insert

b6

9)      Select the IP address resource.

b7

10)   Once dependency is added the structure will look like below.

b8

11)   Now, Bring the Network resource online

Right click on Network name -> Take the resource online

b9

12)   Once network name is online IP will also come online.

b10

13)   Now bring the other remaining resource online and you should be able to connect to SQL Server.

b11

Hope this post help you in resolving such high priority issues which can cause production down situation because of a small error.

Posted December 29, 2012 by Manish Upadhyay in Setup and Upgrade

Tagged with , , ,

Leave a Comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: