Migrating from docker swarm to Service Fabric for windows container orchestration

As of July 2018 Microsoft official support statement for windows containers designating ONLY Azure Service Fabric as supported platform for operation of windows containers. My company has been running windows containers in docker swarm for more then a year now and have to migrate to Service Fabric as a result of this. Below some observations about this process and caveats and limitations I encountered along the way.

What is Azure Service Fabric

Azure Service Fabric despite the name actually downloadable and installable application which can be run anywhere and on any OS (not only Azure). Bigger part of Azure itself runs on Service Fabric (SQL PaaS for example is one of those). Recently Service Fabric also added support for container hosting and orchestration (this is bolted on feature and not something Service Fabric was designed from ground up to support so it’s a little rough around the edges but fully functional and supported nonetheless).

As far as container support in comparison with docker swarm following things are missing or done differently

  1. No ingress routing mesh, so all ports shall be published directly on a host and hence you have to rely on external load balancer for high availability
  2. As result of number 1 you can have only 1 type of container per host. This usually is not a big issue since per microsoft ingress routing mesh is good only for development environment and for production environment direct port mapping on host is prefered
  3. No built in support for secrets, instead configuration files shall be used
  4. No way to have host variables directly passed to container

Things which Azure Service Fabric have which docker swarm does not

  1. Support for Windows Active Directory credentials to authenticate to cluster
  2. UI for administration is built in and support Windows accounts and hence you can use your corporate RBAC for access to cluster
  3. Multitude of environment variables by default piped into container which provides better visibility into underlying host and application environment (for example container hostname is one of those)
  4. Ability to map certificate deployed to container hosts via environment variables passed to a container.
  5. Ability to specify as a part of service fabric docker deployment additional parameters

Azure Service Fabric Deployment

Instructions below are for deploying SF to virtual machines. I use Azure DevTest labs to deploy single domain controller for domain sf.local and have 3 container hosts named containerhost00-02

All container hosts have data drive mapped as F drive where docker-root will be located. This will rely on option provided by option above (custom arguments passed to dockerd) which is not available in docker swarm.

Download and install Azure Service Fabric SDK on your workstation. This will provide neccessary modules for SF administration from your workstation.

To deploy service fabric download Service Fabric package and extract somewhere. I put it on c:\sf, contents of package is described here.

Azure Service Fabric can be secured both with using certificates (similar to docker swarm) as well as Windows Authentication in Active Directory environment (no workgroup scenarios). Steps below targeting service fabric security based off Windows Authentication.

Template JSON for this scenario is located in your package folder and called  ClusterConfig.Windows.MultiMachine.json. Details about all settings are found here.

Settings to modify in template are below

  • name for cluster name
  • nodename which will correspond to UI how each node is represented in cluster
  • ipaddresswhich will be either IP or hostname
  • properties\diagnosticstorecan be either Azure storage account or file share accessible from all SF machines
  • properties\security\windowsidentities\clusteridentity provide option to specify which computer group will be used for intercluster communication
  • properties\security\windowsidentities\clientIdentities provides option to specify which AD groups are used for cluster administration. IsAdmin property specify if this group can make changes to Service Fabric
  • nodetypes specifies which ports are use for communication with a cluster

For my specific scenario I created a computer group called SF-HOSTS and added by SF computer accounts to it (you need to reboot hosts for this to take effect). Created a folder for diagnostics on DC1 and shared it as diagstore. All my container hosts also have F drive and that’s where I’d want to host by Service Fabric binaries as well as docker images. Have to create folders on root of F drive called images for docker images as well as SF to host Service Fabric binaries. Complete template file for this scenario is below

To deploy service fabric first run template verification script and pass path to your template path as a parameter


PS C:\Users\cloudadmin\Documents\sf\artisticcheesecontainer\sf> C:\sf\TestConfiguration.ps1 .\ClusterConfig.Windows.MultiMachine.json
Trace folder already exists. Traces will be written to existing trace folder: C:\Users\cloudadmin\Documents\sf\artisticcheesecontainer\sf\DeploymentTraces
Running Best Practices Analyzer...
Best Practices Analyzer completed successfully.

LocalAdminPrivilege : True
IsJsonValid : True
IsCabValid :
RequiredPortsOpen : True
RemoteRegistryAvailable : True
FirewallAvailable : True
RpcCheckPassed : True
NoConflictingInstallations : True
FabricInstallable : True
DataDrivesAvailable : True
NoDomainController : True
Passed : True

Actual installation can not be started with C:\sf\CreateServiceFabricCluster.ps1 .\ClusterConfig.Windows.MultiMachine.json

If no errors has been thrown then you shall be able to connect to UI for Service Fabric but navigating to any cluster member default port 19080

RemoteDesktopManager64_2018-07-12_09-46-43

To deploy your first container from UI go to Applications and right click on “Actions” button left and choose “compose” application.

You can also deploy from command line using powershell and compose file.

Connect to service fabric endpoint with


PS C:\sf> Connect-ServiceFabricCluster -ConnectionEndpoint localhost:19000 -WindowsCredential
True

 

ConnectionEndpoint : {localhost:19000}
FabricClientSettings : {
ClientFriendlyName : PowerShell-69620b90-99d6-4d85-a4c6-ef8c7a604c88
PartitionLocationCacheLimit : 100000
PartitionLocationCacheBucketCount : 1024
ServiceChangePollInterval : 00:02:00
ConnectionInitializationTimeout : 00:00:02
KeepAliveInterval : 00:00:20
ConnectionIdleTimeout : 00:00:00
HealthOperationTimeout : 00:02:00
HealthReportSendInterval : 00:00:00
HealthReportRetrySendInterval : 00:00:30
NotificationGatewayConnectionTimeout : 00:00:30
NotificationCacheUpdateTimeout : 00:00:30
AuthTokenBufferSize : 4096
}
GatewayInformation : {
NodeAddress : containerhost00.sf.local:19000
NodeId : 85772935593a0315f92e3293832c5fe9
NodeInstanceId : 131758801394347228
NodeName : vm0
}

You can find a status of deployment which was done via UI


PS C:\sf> Get-ServiceFabricComposeDeploymentStatus

DeploymentName : whoami
ApplicationName : fabric:/whoami
ComposeDeploymentStatus : Ready
StatusDetails :

You can start application upgrade via following powershell


PS C:\sf> Start-ServiceFabricComposeDeploymentUpgrade -DeploymentName whoami -Compose C:\users\cloudadmin\Documents\sf\artisticcheesecontainer\sf\docker-stack.yml -Monitored -FailureAction Rollback

DeploymentName : whoami
ComposeFilePaths : {C:\users\cloudadmin\Documents\sf\artisticcheesecontainer\sf\docker-stack.yml}
UpgradeKind : Rolling
ForceRestart : False
UpgradeMode : Monitored
UpgradeReplicaSetCheckTimeout : 49710.06:28:15
FailureAction : Rollback
HealthCheckRetryTimeout : 00:10:00
HealthCheckWaitDuration : 00:00:00
UpgradeDomainTimeout : 10675199.02:48:05.4775807
UpgradeTimeout : 10675199.02:48:05.4775807
ConsiderWarningAsError :
MaxPercentUnhealthyPartitionsPerService :
MaxPercentUnhealthyReplicasPerPartition :
MaxPercentUnhealthyServices :
MaxPercentUnhealthyDeployedApplications :
ServiceTypeHealthPolicyMap :

You can check status of upgrade by executing powershell below


PS C:\sf> Get-ServiceFabricComposeDeploymentUpgrade -DeploymentName whoami

 

DeploymentName : whoami
ApplicationName : fabric:/whoami
TargetApplicationTypeVersion : v8
StartTimestampUtc : 7/12/2018 4:55:11 PM
UpgradeState : RollingForwardCompleted
UpgradeStatusDetails : Deployment upgraded to version: v8.
UpgradeDuration : 00:03:00
CurrentUpgradeDomainDuration : 00:00:00
NextUpgradeDomain :
UpgradeDomainsStatus : { "UD0" = "Completed";
"UD1" = "Completed";
"UD2" = "Completed" }
UpgradeKind : Rolling
UpgradeMode : Monitored
FailureAction : Rollback
ForceRestart : False
UpgradeReplicaSetCheckTimeout : 49710.06:28:15
HealthCheckWaitDuration : 00:00:00
HealthCheckStableDuration : 00:02:00
HealthCheckRetryTimeout : 00:10:00
UpgradeDomainTimeout : 10675199.02:48:05.4775807
UpgradeTimeout : 10675199.02:48:05.4775807
ConsiderWarningAsError :
MaxPercentUnhealthyPartitionsPerService :
MaxPercentUnhealthyReplicasPerPartition :
MaxPercentUnhealthyServices :
MaxPercentUnhealthyDeployedApplications :
ServiceTypeHealthPolicyMap :

It also can be seen in UI

RemoteDesktopManager64_2018-07-12_12-51-17

Advertisements