Archive for category Windows clustering

Cluster resource change events

When setting up availability groups in SQL Server 2012 we have a new set of Windows cluster resources that appear within our Cluster Management console. These new resources are the availability group name, the availability group listener name and the availability group listener IP address. The last two are present if you make use of the availability group listener functionality. Within SQL Server 2012 Management Studio you have the availability group dashboard to show you the state of the system, which database is the primary replica, which is the secondary, and any transitions in state between primary and secondary. In your own environment you’re probably looking after conventional Windows failover cluster instances of sql server. You’d, in effect, be monitoring two types of dashboard – the AG dashboard and the Windows Server Manager. Given that an availability group is represented as just another clustered resource it’s possible to monitor at the level of the common denominator. That is, just use the Windows Server Manager.

The way you can do generic monitoring for AG events and FCI events is by registering an interest in cluster resource change events. Any change in state of a resource, be it an availability group or a FCI sql server, will result in an event being raised. You can register to such events via WMI and Powershell. Here’s a script that will output a message whenever the state of any clustered resource changes:


$Query = 'SELECT * FROM MSCluster_Event where EventTypeMinor = 256'
$Action = {
$e = $Event.SourceEventArgs.NewEvent
$fmt = '{0}: (EventObjectName="{1}", EventNewState="{2}")'
$msg = $fmt -f $event.TimeGenerated, $e.EventObjectName, $e.EventNewState
Write-host $msg
}
Register-WmiEvent -Namespace root\mscluster -Query $Query -SourceIdentifier ClusterResourceStateChange -Action $Action

# Unregister when you're finished with this script
Unregister-Event -SourceIdentifier ClusterResourceStateChange

In my test lab I have a four node windows cluster housing two failover cluster instances of sql server:

FCI SQL Servers

FCI INSTANCE1 is configured to have nodes 1 and 2 as its failover nodes and FCI INSTANCE2 is allowed to have nodes 3 and 4 as its failover nodes. You can see that I have clustered resource named MyHAGroup. I have created defined an availability group on INSTANCE1 that is configured to have INSTANCE2 as its secondary replica. You can see that from this screenshot:

AG SQL Servers

Now with that Powershell script running let’s see the events that arise purely at the windows cluster resource level when I failover the availability group. The output of the script when I do this is as follows:

13/03/2013 23:58:41: (EventObjectName=”MyHAGroup”, EventNewState=”130″)
13/03/2013 23:58:41: (EventObjectName=”MyHAGroup”, EventNewState=”3″)
13/03/2013 23:58:41: (EventObjectName=”MyHAGroup_SQLHA1″, EventNewState=”130″)
13/03/2013 23:58:41: (EventObjectName=”MyHAGroup_SQLHA1″, EventNewState=”3″)
13/03/2013 23:58:41: (EventObjectName=”MyHAGroup_192.168.81.10″, EventNewState=”129″)
13/03/2013 23:58:41: (EventObjectName=”MyHAGroup_192.168.81.10″, EventNewState=”129″)
13/03/2013 23:58:41: (EventObjectName=”MyHAGroup_192.168.81.10″, EventNewState=”129″)
13/03/2013 23:58:44: (EventObjectName=”MyHAGroup_192.168.81.10″, EventNewState=”2″)
13/03/2013 23:58:44: (EventObjectName=”MyHAGroup_SQLHA1″, EventNewState=”129″)
13/03/2013 23:58:50: (EventObjectName=”MyHAGroup_SQLHA1″, EventNewState=”2″)
13/03/2013 23:58:50: (EventObjectName=”MyHAGroup”, EventNewState=”129″)
13/03/2013 23:58:50: (EventObjectName=”MyHAGroup”, EventNewState=”2″)

The EventNewState values are from the enumeration defined here: http://msdn.microsoft.com/en-us/library/windows/desktop/bb309168%28v=vs.85%29.aspx

ie:

typedef enum CLUSTER_RESOURCE_STATE {
ClusterResourceStateUnknown    = -1,
ClusterResourceInherited       = 0,
ClusterResourceInitializing    = 1,
ClusterResourceOnline          = 2,
ClusterResourceOffline         = 3,
ClusterResourceFailed          = 4,
ClusterResourcePending         = 128, // 0x80
ClusterResourceOnlinePending   = 129, // 0x81
ClusterResourceOfflinePending  = 130 // 0x82
} CLUSTER_RESOURCE_STATE;

So you can see the transition of just the availability group resources from offline pending to online as the database on INSTANCE1 changes to become the secondary and the database on INSTANCE2 changes to become the primary. Notice that there are no change events for any of the FCI instance resources for the INSTANCE1 and INSTANCE2 sql servers. It’s just the availability group that’s failed over.

I’ll now reset so that INSTANCE1 is back to being the primary and then show what happens when instead of doing an AG failover I do an FCI failover of the INSTANCE2 sql server (the secondary replica). The output of the script is as follows:

14/03/2013 00:09:34: (EventObjectName=”SQL Server Agent (INSTANCE2)”, EventNewState=”130″)
14/03/2013 00:09:36: (EventObjectName=”SQL Server Agent (INSTANCE2)”, EventNewState=”3″)
14/03/2013 00:09:36: (EventObjectName=”SQL Server (INSTANCE2)”, EventNewState=”130″)
14/03/2013 00:09:39: (EventObjectName=”SQL Server (INSTANCE2)”, EventNewState=”3″)
14/03/2013 00:09:40: (EventObjectName=”Disk INSTANCE2″, EventNewState=”130″)
14/03/2013 00:09:40: (EventObjectName=”SQL Network Name (SQLVSERVER2)”, EventNewState=”130″)
14/03/2013 00:09:40: (EventObjectName=”SQL Network Name (SQLVSERVER2)”, EventNewState=”3″)
14/03/2013 00:09:40: (EventObjectName=”SQL IP Address 1 (SQLVSERVER2)”, EventNewState=”3″)
14/03/2013 00:09:40: (EventObjectName=”SQL IP Address 1 (SQLVSERVER2)”, EventNewState=”3″)
14/03/2013 00:09:40: (EventObjectName=”Disk INSTANCE2″, EventNewState=”129″)
14/03/2013 00:09:44: (EventObjectName=”SQL IP Address 1 (SQLVSERVER2)”, EventNewState=”2″)
14/03/2013 00:09:44: (EventObjectName=”SQL Network Name (SQLVSERVER2)”, EventNewState=”129″)
14/03/2013 00:09:44: (EventObjectName=”Disk INSTANCE2″, EventNewState=”2″)
14/03/2013 00:09:50: (EventObjectName=”SQL Network Name (SQLVSERVER2)”, EventNewState=”2″)
14/03/2013 00:09:50: (EventObjectName=”SQL Server (INSTANCE2)”, EventNewState=”129″)
14/03/2013 00:09:53: (EventObjectName=”SQL Server (INSTANCE2)”, EventNewState=”2″)
14/03/2013 00:09:53: (EventObjectName=”SQL Server Agent (INSTANCE2)”, EventNewState=”129″)
14/03/2013 00:09:53: (EventObjectName=”SQL Server Agent (INSTANCE2)”, EventNewState=”2″)

I’ve moved the FCI INSTANCE2 from node 2 to node 3. The output confirms this is just a regular failover for a FCI sql server instance. The availability group clustered resources do not change.

Now to finish off the picture I’ll show the events that are raised when FCI INSTANCE1 fails over from node 1 to node 2. This is the instance that houses the primary replica:

14/03/2013 00:14:36: (EventObjectName=”SQL Server Agent (INSTANCE1)”, EventNewState=”130″)
14/03/2013 00:14:37: (EventObjectName=”SQL Server Agent (INSTANCE1)”, EventNewState=”3″)
14/03/2013 00:14:37: (EventObjectName=”SQL Server (INSTANCE1)”, EventNewState=”130″)
14/03/2013 00:14:38: (EventObjectName=”MyHAGroup”, EventNewState=”4″)
14/03/2013 00:14:39: (EventObjectName=”MyHAGroup”, EventNewState=”4″)
14/03/2013 00:14:39: (EventObjectName=”MyHAGroup”, EventNewState=”4″)
14/03/2013 00:14:40: (EventObjectName=”SQL Server (INSTANCE1)”, EventNewState=”3″)
14/03/2013 00:14:40: (EventObjectName=”SQL Network Name (SQLVSERVER1)”, EventNewState=”130″)
14/03/2013 00:14:40: (EventObjectName=”Disk INSTANCE1″, EventNewState=”130″)
14/03/2013 00:14:40: (EventObjectName=”SQL Network Name (SQLVSERVER1)”, EventNewState=”3″)
14/03/2013 00:14:40: (EventObjectName=”SQL IP Address 1 (SQLVSERVER1)”, EventNewState=”3″)
14/03/2013 00:14:40: (EventObjectName=”SQL IP Address 1 (SQLVSERVER1)”, EventNewState=”3″)
14/03/2013 00:14:40: (EventObjectName=”Disk INSTANCE1″, EventNewState=”129″)
14/03/2013 00:14:40: (EventObjectName=”SQL IP Address 1 (SQLVSERVER1)”, EventNewState=”129″)
14/03/2013 00:14:40: (EventObjectName=”Disk INSTANCE1″, EventNewState=”129″)
14/03/2013 00:14:43: (EventObjectName=”SQL IP Address 1 (SQLVSERVER1)”, EventNewState=”2″)
14/03/2013 00:14:43: (EventObjectName=”SQL Network Name (SQLVSERVER1)”, EventNewState=”129″)
14/03/2013 00:14:44: (EventObjectName=”Disk INSTANCE1″, EventNewState=”2″)
14/03/2013 00:14:50: (EventObjectName=”SQL Network Name (SQLVSERVER1)”, EventNewState=”2″)
14/03/2013 00:14:50: (EventObjectName=”SQL Server (INSTANCE1)”, EventNewState=”129″)
14/03/2013 00:14:51: (EventObjectName=”SQL Server (INSTANCE1)”, EventNewState=”2″)
14/03/2013 00:14:51: (EventObjectName=”SQL Server Agent (INSTANCE1)”, EventNewState=”129″)
14/03/2013 00:14:51: (EventObjectName=”MyHAGroup_SQLHA1″, EventNewState=”130″)
14/03/2013 00:14:51: (EventObjectName=”MyHAGroup_SQLHA1″, EventNewState=”3″)
14/03/2013 00:14:51: (EventObjectName=”MyHAGroup_192.168.81.10″, EventNewState=”129″)
14/03/2013 00:14:51: (EventObjectName=”MyHAGroup_192.168.81.10″, EventNewState=”129″)
14/03/2013 00:14:51: (EventObjectName=”MyHAGroup_192.168.81.10″, EventNewState=”129″)
14/03/2013 00:14:52: (EventObjectName=”SQL Server Agent (INSTANCE1)”, EventNewState=”2″)
14/03/2013 00:14:55: (EventObjectName=”MyHAGroup_192.168.81.10″, EventNewState=”2″)
14/03/2013 00:14:55: (EventObjectName=”MyHAGroup_SQLHA1″, EventNewState=”129″)
14/03/2013 00:15:01: (EventObjectName=”MyHAGroup_SQLHA1″, EventNewState=”2″)
14/03/2013 00:15:01: (EventObjectName=”MyHAGroup”, EventNewState=”129″)
14/03/2013 00:15:01: (EventObjectName=”MyHAGroup”, EventNewState=”2″)

From the output you can see that there are state changes for both the AG cluster resources and the FCI cluster resources as they make their way to node 2.

It’s interesting to note these transitions for the complex case of an availability group defined across two failover cluster instances, all by necessity defined on the same windows cluster. The point of this post is that from an alerting point of view if the key aspect you’re looking out for is a failover then you can use the above listed Powershell script as the basis for generic monitoring for both failover cluster instances and availability groups. Register-WMIEvent does take -ComputerName as a parameter so it’s possible to run the monitoring from another machine. The script is not SQL Server specific. It can be used to monitor changes in state for any of your clustered resources. You can modify the action clause of the commandlet to alert in any number of ways.

Leave a comment

Collection of Powershell clustering commands

Here are a collection of Powershell Windows clustering commands that I’ve found to be quite useful:

Start by opening a Powershell console under Administrator rights and running

import-module failoverclusters

And then to see the cluster resources on a cluster named SQLCLUSTER run

get-clusterresource -cluster sqlcluster

pscluster1

With the file share witness we have a node and file share majority quorum configuration. Let’s see where the file share is located:

get-clusterresource -cluster sqlcluster | where-object {$_.ResourceType -like “File Share Witness”} | get-clusterparameter

pscluster2

Visually correlating the virtual server and instance names to form a server\instance name pair is quite difficult. Let’s use this to see if we can do this with script:

get-clusterresource -cluster sqlcluster | where-object {$_.ResourceType -like “SQL Server”} | get-clusterparameter -cluster sqlcluster VirtualServerName,InstanceName

pscluster3

This is better but we can get the name pair by grouping on the first column:

get-clusterresource -cluster sqlcluster | where-object {$_.ResourceType -like “SQL Server”} | get-clusterparameter -cluster sqlcluster VirtualServerName,InstanceName | group-object ClusterObject | select-object @{Name = “SQLInstance”;Expression = {[string]::join(“\”,($_.Group | select-object -expandproperty Value))}}

pscluster4

We’d like to see the nodes that the SQL Servers are on and their current state:

get-clusterresource -cluster sqlcluster | where-object {$_.ResourceType -like “SQL Server”} | foreach-object {($ownernode,$state) = ($_.OwnerNode,$_.State); $_ | get-clusterparameter -cluster sqlcluster -Name VirtualServerName,InstanceName | group-object ClusterObject | select @{Name=”SQLInstance”;Expression = {[string]::join(“\”,($_.Group | select-object -expandproperty Value))}},@{Name=”OwnerNode”;Expression={$ownernode}},@{Name=”State”;Expression={$state}}}

pscluster5

Turning to the availability groups in the first screenshot it seems each one gets its own resource group. Let’s filter just the availability groups and their resources (some may have availability group listeners defined, others may not):

get-clusterresource -cluster sqlcluster -inputobject (get-clusterresource -cluster sqlcluster | where-object {$_.ResourceType -like “SQL Server Availability Group”}).OwnerGroup

pscluster6

Here in the first column you can see the availability group name (MyHAGroup), the listener’s IP address (192.168.81.10) and the listener name (SQLHA1). This matches all the details you would see within the SQL Management Studio under the Availability Group configuration.

These quick one liners may be useful if you’re dealing with multiple SQL instances consolidated onto the same Windows cluster.

3 Comments