Windows Failover cluster using shared disks in Azure

Creating SQL FC on shared disks

Create new resource group and create 2 Ultra Disks in supported region. First disk is for storing data and second one is quorum drive

$rg = New-AzResourceGroup -Name SharedDisk-RG -Location 'East US'
$datadiskconfig = New-AzDiskConfig -Location $rg.Location -DiskSizeGB 1024 -AccountType UltraSSD_LRS -CreateOption Empty -DiskIOPSReadWrite 2000 -DiskMBpsReadWrite 200 -DiskIOPSReadOnly 100 -DiskMBpsReadOnly 1 -MaxSharesCount 5 -Zone 1
New-AzDisk -ResourceGroupName $rg.ResourceGroupName -DiskName SharedDisk -Disk $datadiskConfig
$datadiskconfig = New-AzDiskConfig -Location $rg.Location -DiskSizeGB 8 -AccountType UltraSSD_LRS -CreateOption Empty -DiskIOPSReadWrite 2000 -DiskMBpsReadWrite 200 -DiskIOPSReadOnly 100 -DiskMBpsReadOnly 1 -MaxSharesCount 5 -Zone 1
New-AzDisk -ResourceGroupName $rg.ResourceGroupName -DiskName QuorumDrive -Disk $datadiskConfig

Create 2 VMs using image SQL Server 2019 on Windows 2019. I used SQL 2019 free Developer license for demo purposes. Compatible VM sizes with ultra disk are available here. I used D8s_v3. VM shall be deployed to availability zone which supports UltraDisk. You can find zone number by executing following. VMs needs to be joined to AD once they are provisioned. It’s requirements for all nodes to be in AD for FCI to work

PS /home/azureuser> $region = "eastus"
PS /home/azureuser> $vmSize = "Standard_D8s_v3"
PS /home/azureuser> (Get-AzComputeResourceSku | where {$_.Locations.Contains($region) -and ($_.Name -eq $vmSize) -and $_.LocationInfo[0].ZoneDetails.Count -gt 0})[0].LocationInfo[0].ZoneDetails

Name      Capabilities
----      ------------
{1, 3, 2} {UltraSSDAvailable}

Attach 2 ultradisks as data disk and quorum drive to both created VMs by stopping it and enabling Ultra disk functionality and adding both shared disks.

Login to VMs and install failover clustering and administration tools on both nodes

PS C:\Users\cloudadmin> Install-WindowsFeature Failover-Clustering, RSAT-Clustering

Success Restart Needed Exit Code      Feature Result
------- -------------- ---------      --------------
True    Yes            SuccessRest... {Remote Server Administration Tools, Failo...
WARNING: You must restart this server to finish the installation process.

Create failover cluster by adding 2 nodes to it

Open disk management and create volume/format your shareddisk and quorum drive. Add disks to Storage/Disks in cluster manager.

Configure disk quorum for cluster and choose quorum disk as cluster setting

Launch SQL 2019 installation from installation media and choose “New SQL server failover cluster installation”

Go through wizard installation and choose appropriate resources

Configure cluster IP to be any static IP address from VNET

Configure data directories to be on shareddisk

Go to SQL Server Configuration manager and enable IP connectivity on both dedicated and cluster server IP addresses, restart SQL service

SQL server can be accessed by either VNN (virtual network name) or DNN (distributed network name). Difference between those are documented here (link). DNN is new offer which available only on certain versions of OS and SQL server. It essentially registers IP addresses of both nodes in DNS and informs client to choose currently active node. No need for internal load balancer. To create DNN for my example I used powershell below. DNN resource name is sqldnnresource and DNS name is sqldnn.

PS C:\Users\cloudadmin.SQL> Add-ClusterResource -Name sqldnnresource -ResourceType "Distributed Network Name" -Group "SQL Server"

Name           State   OwnerGroup ResourceType
----           -----   ---------- ------------
sqldnnresource Offline SQL Server Distributed Network Name


PS C:\Users\cloudadmin.SQL> Get-ClusterResource -Name sqldnnresource | Set-ClusterParameter -Name DNSName -Value sqldnn
PS C:\Users\cloudadmin.SQL> Start-ClusterResource -Name sqldnnresource

Name           State  OwnerGroup ResourceType
----           -----  ---------- ------------
sqldnnresource Online SQL Server Distributed Network Name

Once resource is started you can verify that resource in fact registered DNS name and it shall return IP addresses of both nodes of the cluster

PS C:\Users\cloudadmin.SQL> resolve-dnsname sqldnn

Name                                           Type   TTL   Section    IPAddress
----                                           ----   ---   -------    ---------
sqldnn.sql.local                               A      1200  Answer     10.0.0.5
sqldnn.sql.local                               A      1200  Answer     10.0.0.4

Go to second server and choose “Add a node to SQL Server Failover Cluster” as installation procedure and finish adding second node to SQL cluster created earlier.

At this point you shall have a following:

  1. Windows Failover Cluster
  2. SQL Server failover cluster
  3. 2 shared disks (one used as a data drive and another one as a quorum)
  4. DNN (distributed network name) which points to all nodes in your cluster

Testing SQL FC

Following powershell script will help to test functionality of failover. Run the script and it will output current owner of SQL Cluster. You can test failover in Failover cluster manager by moving cluster role to a different node.

import-module sqlserver
function QueryDB{
$mirrorConnString = "Server=sqldnn;MultiSubnetFailover=True;user id=sa;password=A1234567890!;Connection Timeout=3"
try {
$record = (Invoke-Sqlcmd -Query "SELECT SERVERPROPERTY('ComputerNamePhysicalNetBIOS')" -ConnectionString $mirrorConnString -AbortOnError:$false -ErrorVariable err -ErrorAction SilentlyContinue).column1
}
catch
{
$record = "No response"
}

"{0} {1}" -f $(Get-Date), $record
}
while ($true)
{
QueryDB
start-sleep 1
}

Output would be similar to below. You can see downtime usually less then 30 seconds.

PS C:\Users\cloudadmin> script.ps1
8/13/2020 5:53:45 PM SQL1
8/13/2020 5:53:46 PM SQL1
8/13/2020 5:53:47 PM SQL1
8/13/2020 5:53:53 PM No response
8/13/2020 5:53:54 PM No response
8/13/2020 5:53:55 PM No response
8/13/2020 5:53:56 PM No response
8/13/2020 5:53:57 PM No response
8/13/2020 5:54:01 PM SQL2
8/13/2020 5:54:02 PM SQL2
8/13/2020 5:54:03 PM SQL2
8/13/2020 5:54:04 PM SQL2
8/13/2020 5:54:05 PM SQL2

Ultimate test though is to abruptly end cluster resource instead of scheduled failover. You can use notmyfault.exe to crash your server.

Downtime for unexpected shutdown is slightly longer (around 1 min)

One thought on “Windows Failover cluster using shared disks in Azure

Leave a comment