Onboarding Windows nodes to Kubernetes cluster

Below are step-by-step instructions for onboarding Windows nodes to a Kubernetes cluster. For the cluster master I used Ubuntu 18.04 (the Kubernetes control plane is still Linux-only and will probably stay that way). For the Windows worker nodes I used Windows Server 1909 images, but any version from Windows Server 2019 up can be used instead. I run my cluster in Azure but did not use Azure CNI, so the steps can be replicated on on-prem clusters as well.

Install single control-plane cluster

  1. Create an Ubuntu VM in Azure (master1 server) and download the Kubernetes binaries required to install the control plane. I will use the kubeadm tool both for setting up the cluster and for onboarding Windows nodes to it.
  2. Install Docker on the master1 server
  3. The Flannel pod network plugin will be used for pods, so an additional parameter has to be passed to kubeadm (--pod-network-cidr=10.244.0.0/16). On master1, run sysctl net.bridge.bridge-nf-call-iptables=1
  4. Initialize the single control-plane cluster by running kubeadm init --pod-network-cidr=10.244.0.0/16 on the master1 node
  5. Copy the last line of the installation output, which is the command for joining nodes to the cluster. In my case it’s (kubeadm join 10.0.0.4:6443 --token k54f1t.5rr385g1upol2njr --discovery-token-ca-cert-hash sha256:a4994328cc8b51386101983a4f860cbd08de95c56e7714b252b6ea7d13cf6d9d)
  6. Execute the following to copy the config file that kubectl needs to access your cluster
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
  7. Install the Flannel pod network plugin (kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml)
  8. Verify that your cluster is healthy by running (kubectl get nodes). Your master node should show as Ready
  9. Follow instructions here to configure Flannel to allow Windows nodes to join
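
The token and CA hash from the join line in step 5 are needed again later on the Windows side, so it can help to pull them out programmatically rather than by copy and paste. Below is a minimal Python sketch of that idea. The function name parse_join_command and the regular expression are my own illustration, not part of kubeadm, and the sketch assumes the join line has the same shape as the one shown in step 5 (possibly wrapped with a trailing backslash).

```python
import re

# Assumed shape of the join line printed at the end of `kubeadm init`.
JOIN_RE = re.compile(
    r"kubeadm join\s+(?P<endpoint>\S+)"
    r"\s+--token\s+(?P<token>\S+)"
    r"\s+--discovery-token-ca-cert-hash\s+(?P<ca_hash>sha256:[0-9a-f]+)"
)


def parse_join_command(text):
    """Return the endpoint, token and CA hash found in kubeadm init output."""
    # The command may be wrapped with a trailing backslash; join the lines first.
    flat = text.replace("\\\n", " ")
    match = JOIN_RE.search(flat)
    if match is None:
        raise ValueError("no 'kubeadm join' command found")
    return match.groupdict()


if __name__ == "__main__":
    sample = (
        "kubeadm join 10.0.0.4:6443 --token k54f1t.5rr385g1upol2njr \\\n"
        "    --discovery-token-ca-cert-hash "
        "sha256:a4994328cc8b51386101983a4f860cbd08de95c56e7714b252b6ea7d13cf6d9d"
    )
    for key, value in parse_join_command(sample).items():
        print(key, "=", value)
```

The returned token and ca_hash values correspond to the KubeadmToken and KubeadmCAHash entries of the Windows configuration file used later.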

Add Windows nodes

  1. Nodes need to be able to talk to each other by name, so make sure DNS works. If you are in Azure, you can set up a private DNS zone, associate it with the Virtual Network, and enable auto-registration.
PS C:\Users\cloudadmin> resolve-dnsname master1.kubernetes.my

Name                                           Type   TTL   Section    IPAddress
----                                           ----   ---   -------    ---------
master1.kubernetes.my                          A      10    Answer     10.0.0.4
Azure private DNS registration
  2. Install Windows Server 2019 or later. I use the Windows Server 1909 with Containers image from the Azure Marketplace. It should automatically register its name with the private zone.
    Set the default DNS suffix to your private zone name (kubernetes.my for me)
    Set-DnsClientGlobalSetting -SuffixSearchList "kubernetes.my"
  3. Download the Windows Kubernetes tools and expand them to a local folder.
Invoke-WebRequest https://github.com/kubernetes-sigs/sig-windows-tools/archive/master.zip -OutFile master.zip
Expand-Archive .\master.zip -DestinationPath .
  4. Modify the file called Kubeclustervxlan.json under sig-windows-tools-master\kubeadm\v1.15.0. Change the values in the ControlPlane object to point to your master1 server and to use the token which was copied earlier. Change Username to the username you use on the master1 node as well. Also make sure your default Ethernet adapter is in fact called Ethernet (check with Get-NetAdapter). If it’s not, change the line "InterfaceName":"Ethernet" to whatever the adapter is called. Modify the Source object to point to the same version of Kubernetes that the master1 node is running. Modify the Cri item to change the Pause image to the multi-arch image shown below, since the default pause image does not support the 1909 base OS. My complete file is below; modify it with your relevant entries
 {
    "Cri" : {
        "Name" : "dockerd",
        "Images" : {
            "Pause" : "mcr.microsoft.com/oss/kubernetes/pause:1.3.0",
            "Nanoserver" : "mcr.microsoft.com/windows/nanoserver:1809",
            "ServerCore" : "mcr.microsoft.com/windows/servercore:ltsc2019"
        }
    },
    "Cni" : {
        "Name" : "flannel",
        "Source" : [{ 
            "Name" : "flanneld",
            "Url" : "https://github.com/coreos/flannel/releases/download/v0.11.0/flanneld.exe"
            }
        ],
        "Plugin" : {
            "Name": "vxlan"
        },
        "InterfaceName" : "Ethernet 2"
    },
    "Kubernetes" : {
        "Source" : {
            "Release" : "1.17.4",
            "Url" : "https://dl.k8s.io/v1.17.4/kubernetes-node-windows-amd64.tar.gz"
        },
        "ControlPlane" : {
            "IpAddress" : "master1",
            "Username" : "gregory",
            "KubeadmToken" : "c5pi79.39te6ro1fnufx5jt",
            "KubeadmCAHash" : "sha256:a4994328cc8b51386101983a4f860cbd08de95c56e7714b252b6ea7d13cf6d9d"
        },
        "KubeProxy" : {
            "Gates" : "WinOverlay=true"
        },
        "Network" : {
            "ServiceCidr" : "10.96.0.0/12",
            "ClusterCidr" : "10.244.0.0/16"
        }
    },
    "Install" : {
        "Destination" : "C:\\ProgramData\\Kubernetes"
    }
}
  5. Execute the PowerShell script under the kubeadm folder and pass it the location of the modified configuration file: .\KubeCluster.ps1 -ConfigFile .\v1.15.0\Kubeclustervxlan.json -install
  6. Open the generated SSH public key (id_rsa.pub under the .ssh folder) and copy its contents. Append them to the .ssh/authorized_keys file on the master1 node.
  7. Reboot the computer after a successful install
  8. Once the computer comes back, run the script again, this time with the -join parameter, to join the node to the cluster: .\KubeCluster.ps1 -ConfigFile .\v1.15.0\Kubeclustervxlan.json -join
  9. If everything completed without errors, you should see the node joined to the Kubernetes cluster and in the Ready state
root@master1:~# k get nodes -o wide
NAME         STATUS   ROLES    AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                    KERNEL-VERSION     CONTAINER-RUNTIME
master1      Ready    master   142m    v1.17.4   10.0.0.4      <none>        Ubuntu 18.04.4 LTS          5.0.0-1032-azure   docker://19.3.6
winworker1   Ready    <none>   2m20s   v1.17.4   10.0.0.5      <none>        Windows Server Datacenter   10.0.18363.720     docker://19.3.5


10. You can now schedule Windows containers and verify that they work. The example below creates a deployment with 2 pods, each of which writes random numbers to STDOUT

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: win-webserver
  name: win-webserver
spec:
  replicas: 2
  selector:
    matchLabels:
      app: win-webserver
  template:
    metadata:
      labels:
        app: win-webserver
      name: win-webserver
    spec:
      containers:
      - command:
        - powershell.exe
        - -command
        - while ($true) { "[{0}] [{2}] {1}" -f (Get-Date),(Get-Random),$env:COMPUTERNAME;
          Start-Sleep 5}
        image: mcr.microsoft.com/windows/servercore:1909
        imagePullPolicy: IfNotPresent
        name: windowswebserver
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      nodeSelector:
        beta.kubernetes.io/os: windows
      restartPolicy: Always
status: {}
PS C:\Users\cloudadmin> kubectl get pods
NAME                            READY   STATUS    RESTARTS   AGE
win-webserver-fffd4486f-cmdgx   1/1     Running   0          34m
win-webserver-fffd4486f-rp96t   1/1     Running   0          34m
PS C:\Users\cloudadmin> kubectl logs win-webserver-fffd4486f-cmdgx
[3/25/2020 12:48:07 AM] [WIN-WEBSERVER-F] 1105704259
[3/25/2020 12:48:12 AM] [WIN-WEBSERVER-F] 356015894
[3/25/2020 12:48:17 AM] [WIN-WEBSERVER-F] 1136900039
[3/25/2020 12:48:22 AM] [WIN-WEBSERVER-F] 111352898
[3/25/2020 12:48:27 AM] [WIN-WEBSERVER-F] 593146587
[3/25/2020 12:48:32 AM] [WIN-WEBSERVER-F] 1438304716
[3/25/2020 12:48:37 AM] [WIN-WEBSERVER-F] 1357778278
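
When more than one Windows node has to be onboarded, the hand edits to Kubeclustervxlan.json from step 4 can be scripted. Below is a minimal Python sketch, assuming the file keeps the layout shown in step 4. The helper name set_control_plane is hypothetical and is not part of sig-windows-tools; the placeholder token and hash are deliberately not real values.

```python
import copy
import json


def set_control_plane(config, host, username, token, ca_hash, interface="Ethernet"):
    """Return a copy of the config with the join settings filled in."""
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    control_plane = updated["Kubernetes"]["ControlPlane"]
    control_plane["IpAddress"] = host
    control_plane["Username"] = username
    control_plane["KubeadmToken"] = token
    control_plane["KubeadmCAHash"] = ca_hash
    updated["Cni"]["InterfaceName"] = interface
    return updated


if __name__ == "__main__":
    # Trimmed-down stand-in for Kubeclustervxlan.json, for illustration only.
    config = {
        "Cni": {"Name": "flannel", "InterfaceName": "Ethernet"},
        "Kubernetes": {"ControlPlane": {}},
    }
    result = set_control_plane(
        config, "master1", "gregory", "<token>", "sha256:<hash>", "Ethernet 2"
    )
    print(json.dumps(result, indent=4))
```

In practice the config would be read with json.load from the real file and written back with json.dump before running KubeCluster.ps1.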

Azure DevOps as workflow automation for service management

Azure DevOps is a good fit for situations where you need a workflow management service for common tasks required by a service management process. The example below shows how to set up a workflow for a hypothetical Rename VM task requested by a service management tool.

The scenario being automated is a request to rename a VM in Azure, which is currently unsupported by the native control plane and requires a set of manual or semi-automated steps executed by personnel.

The entire process is documented in detail below. The basic steps are

  • Run powershell to export current VM to a file
  • Delete original VM
  • Verify validity of generated template
  • Deploy template

Powershell

Traditionally, rename VM tasks are accomplished by removing the original VM while preserving its disks and NIC, and then recreating a new VM as close as possible to the original one. This approach is suboptimal, since a lot of metadata about the original VM is lost (for example host caching for disks, tags, extensions, etc.). The approach taken below instead pulls the current resource schema for the VM (its ARM template) and redeploys it with a new name. The highlighted lines below are required to account for situations where the VM was created from a Marketplace image. The output of the PowerShell script is a template file with sanitized inputs, ready to be redeployed under a custom name

[CmdletBinding()]
param (
      [Parameter(Mandatory = $true)] [string] $vmName,
      [Parameter(Mandatory = $true)] [string] $resourceGroupName,
      [Parameter(Mandatory = $true)] [string] $newVMName
)
$ErrorActionPreference = "Stop"
# Look up the VM and export its current ARM template to .\template.json
$resource = Get-AzVM -ResourceGroupName $resourceGroupName -VMName $vmName
Export-AzResourceGroup -ResourceGroupName $resource.ResourceGroupName -Resource $resource.Id -IncludeParameterDefaultValue -IncludeComments -Path .\template.json -Force
# Stop and delete the original VM
$resource | Stop-AzVM -Force
$resource | Remove-AzVM -Force
# Load the exported template as a hashtable so that keys can be changed and removed
$templateTextFile = [System.IO.File]::ReadAllText(".\template.json")
$TemplateObject = ConvertFrom-Json $templateTextFile -AsHashtable
# Attach the existing OS disk and drop the image, disk name and OS profile entries
$TemplateObject.resources.properties.storageProfile.osDisk.createOption = "Attach"
$TemplateObject.resources.properties.storageProfile.Remove("imageReference")
$TemplateObject.resources.properties.storageProfile.osDisk.Remove("name")
$TemplateObject.resources.properties.Remove("osProfile")
# Write the sanitized template back to disk
$TemplateObject | ConvertTo-Json -Depth 50 | Out-File (".\template.json")
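
To see what the four template edits do without touching a real VM, the same transformation can be tried on a small sample document. The Python sketch below is an illustration under my own assumptions: the function name sanitize_vm_template is hypothetical, the sample template is heavily trimmed, and unlike the PowerShell above it only touches resources of type Microsoft.Compute/virtualMachines.

```python
import json


def sanitize_vm_template(template):
    """Apply the script's four edits to every VM resource in the template."""
    for resource in template.get("resources", []):
        if resource.get("type") != "Microsoft.Compute/virtualMachines":
            continue  # assumption: leave any non-VM resources alone
        properties = resource["properties"]
        storage = properties["storageProfile"]
        storage["osDisk"]["createOption"] = "Attach"  # reuse the existing OS disk
        storage.pop("imageReference", None)           # no image when attaching
        storage["osDisk"].pop("name", None)
        properties.pop("osProfile", None)
    return template


if __name__ == "__main__":
    # Heavily trimmed sample, not a complete exported template.
    sample = {
        "resources": [
            {
                "type": "Microsoft.Compute/virtualMachines",
                "properties": {
                    "osProfile": {"computerName": "VM1"},
                    "storageProfile": {
                        "imageReference": {"publisher": "example"},
                        "osDisk": {"name": "VM1_disk", "createOption": "FromImage"},
                    },
                },
            }
        ]
    }
    print(json.dumps(sanitize_vm_template(sample), indent=2))
```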

Azure DevOps

Create a classic build pipeline (until YAML build pipelines allow UI editing, I would personally stay away from them).

  • Add the following variables (vmName, newVMName, resourceGroupName) to the build pipeline. They identify the VM name, the new VM name, and the resource group of the VM being worked on. Allow these variables to be set at queue time.
  • Add an Azure PowerShell task that executes the PowerShell script above, pass it the variables set above, and make sure it is set to run as PowerShell Core (the script uses ConvertFrom-Json -AsHashtable, which is not available in Windows PowerShell)
  • Add an Azure Resource Group Deployment task to verify the validity of the generated template. Please note the highlighted parameters below.
  • Add another Azure Resource Group Deployment task to perform the actual rename. The settings are the same as in the previous step, except that the deployment mode should be set to Incremental

This completes the build pipeline. You can test it manually by providing values for the 3 variables directly from the Azure DevOps UI.

Integration with service management

Azure DevOps provides a REST API to perform actions against the service. Documentation is available here.

To call the API you first need to generate a personal access token (PAT) for your own or a service account by going to Azure DevOps and choosing PAT. The only permission needed is Build - Read & Execute

To invoke a build via the API, send a POST request to a URI similar to the following (https://dev.azure.com/artisticcheese/blog/_apis/build/builds?api-version=5.1). Below is the body of the request, which identifies the build definition by its ID and carries the parameters that will be passed to the build at queue time.

{
    "definition": {
        "id": 16
    },
    "parameters": "{\"vmName\": \"VM1\", \"newVMName\": \"VM2\", \"resourceGroupName\": \"temp\"}"
}

The response to the build request contains a link to the build, which the front-end service can call to get the status of the build
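
Putting the pieces above together, a front-end service could assemble the request as in the Python sketch below. This is an illustration rather than a definitive client: the function name build_queue_request and the PAT value are hypothetical, and the sketch only builds the request object without sending it. It relies on two details: a PAT is sent as HTTP Basic authentication with an empty user name, and the parameters field is itself a JSON-encoded string, as in the request body shown above.

```python
import base64
import json
import urllib.request


def build_queue_request(organization, project, definition_id, parameters, pat):
    """Build (but do not send) the POST request that queues a build."""
    url = (
        f"https://dev.azure.com/{organization}/{project}"
        "/_apis/build/builds?api-version=5.1"
    )
    body = {
        "definition": {"id": definition_id},
        # Queue-time parameters travel as a JSON string, not as a nested object.
        "parameters": json.dumps(parameters),
    }
    # Basic auth with an empty user name and the PAT as the password.
    credentials = base64.b64encode(f":{pat}".encode("ascii")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_queue_request(
        "artisticcheese",
        "blog",
        16,
        {"vmName": "VM1", "newVMName": "VM2", "resourceGroupName": "temp"},
        "hypothetical-pat-value",
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the returned object to urllib.request.urlopen would actually queue the build, so the sketch stops short of that.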

Azure Private Link in action

The Azure networking team just introduced a preview of Azure Private Link (https://azure.microsoft.com/en-us/blog/announcing-azure-private-link/). It promises to bring previously unavailable functionality for bridging the networking gap between PaaS and VNETs, as well as between VNETs in different tenants/subscriptions.

There are 2 distinct use cases for Private Link:

  1. Private Link for accessing Azure PaaS Services
  2. Private Link to Private Link Service connection for connectivity across tenants and subscriptions, even with overlapping IP address spaces across VNETs

Private Link for accessing Azure PaaS Services

Traditionally, if you wanted to access PaaS services securely from within a VNET, you’d need to enable a VNET service endpoint, which in turn routes requests from within your VNET directly to your PaaS service. The PaaS service then sees your requests coming from the private IP range of your VNET, as opposed to the public IP address it saw before the endpoint was enabled. You still go through the public IP of the PaaS service, though; the traffic just does not route through the edge.

The Private Link solution creates an endpoint with a local IP address on your subnet through which you can access your PaaS service. You will in fact see a Network Interface resource being created with an associated IP address once you enable this resource.

From a networking point of view it is similar to reverse NAT.

In the example below I created a storage account called privatelinkMSDN which has no integration into VNETs, so by default it denies all connections to blobs, external or internal.

Accessing a blob externally produces an HTTP error, as expected, due to IP filtering on the storage account.

Trying to resolve the name externally returns the external IP address of the service

PS C:\Users\174181> resolve-dnsname privatelinkmsdn.privatelink.blob.core.windows.net -Type A

Name                           Type   TTL   Section    NameHost
----                           ----   ---   -------    --------
privatelinkmsdn.privatelink.blob.core.windows.net CNAME  53    Answer     blob.bl5prdstr09a.store.core.windows.net

Name       : blob.bl5prdstr09a.store.core.windows.net
QueryType  : A
TTL        : 52
Section    : Answer
IP4Address : 40.71.240.16

Creation of the Private Endpoint is not covered here since it’s well documented by Microsoft. The end result is shown below. The following resources are created as a result of creating the Private Endpoint:

  1. DNS zone named as privatelink.blob.core.windows.net with record pointing to your Private Endpoint
  2. Private Endpoint itself
  3. Network Interface resource associated with Private Endpoint
  4. Private IP address associated with Network Interface

While externally this URL resolves to an external IP address, resolving the same name within the VNET delegates resolution to the private DNS zone and returns the internal IP address of the NIC, which provides access to the image in the blob as expected.

PS C:\Users\cloudadmin> resolve-dnsname privatelinkmsdn.privatelink.blob.core.windows.net -Type A                                                                                             
Name                                           Type   TTL   Section    IPAddress
----                                           ----   ---   -------    ---------
privatelinkmsdn.privatelink.blob.core.windows. A      1800  Answer     10.1.0.4
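
The difference between the two lookups can also be checked from a script, for example as a smoke test run from inside and outside the VNET. Below is a minimal Python sketch using only the standard library. The function names are my own, and resolves_privately simply uses whatever resolver the machine it runs on is configured with.

```python
import ipaddress
import socket


def is_private_address(ip):
    """True for addresses in private ranges such as 10.0.0.0/8."""
    return ipaddress.ip_address(ip).is_private


def resolves_privately(hostname):
    """Resolve hostname with the local resolver and classify the answer."""
    return is_private_address(socket.gethostbyname(hostname))


if __name__ == "__main__":
    # The two answers captured in the outputs above.
    for ip in ("40.71.240.16", "10.1.0.4"):
        print(ip, "private" if is_private_address(ip) else "public")
```

Run from a VM inside the VNET, resolves_privately("privatelinkmsdn.privatelink.blob.core.windows.net") would be expected to return True only once the Private Endpoint and its DNS zone are in place.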

Private Link Service connection

The initial configuration I’m working with is described below

  1. Azure tenant 1 (suvalian.com), which is associated with Subscription 1. This is a hypothetical ISV which provides services (VDI, for example) to tenant 2 below. Subscription 1 contains a VNET called MSDN-VNET with the 10.1.0.0/16 address space.
  2. Azure tenant 2 (nttdata.com), which is associated with Subscription 2. This is the customer who would like to connect privately to your services. Subscription 2 contains a VNET called NTT-VNET with the 10.1.0.0/16 address space (please note it’s the same address space as the VNET in Subscription 1)

There is no trust between the 2 tenants (that is, neither directory contains guest accounts from the other), so essentially they are completely separate Azure environments.

Traditionally, to connect from Azure 2 to Azure 1 you’d have to either:

  1. Expose your services via public IP address with restrictive NSG rules on it (poor security and additional cost due to ingress traffic charges)
  2. Create VNET to VNET connectivity via VPN gateway (costly, can not have overlapped IP address space, cumbersome to setup and administer)
  3. Create VNET peering between VNETS (can not have overlapped IP address space)
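
The overlap restriction on options 2 and 3 is easy to check up front. Below is a minimal Python sketch using the standard ipaddress module. The function name can_peer is my own, and it only tests the address-space condition, not any other peering prerequisite.

```python
import ipaddress


def can_peer(vnet_a, vnet_b):
    """Peering and VNET-to-VNET VPN need non-overlapping address spaces."""
    a = ipaddress.ip_network(vnet_a)
    b = ipaddress.ip_network(vnet_b)
    return not a.overlaps(b)


if __name__ == "__main__":
    # MSDN-VNET and NTT-VNET both use 10.1.0.0/16 in this setup.
    print(can_peer("10.1.0.0/16", "10.1.0.0/16"))  # False
    # A hypothetical second range that does not overlap.
    print(can_peer("10.1.0.0/16", "10.2.0.0/16"))  # True
```

For the two VNETs in this post the check fails, which is why neither peering nor a VPN gateway is an option here.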

Solution consists of parts depicted on image below:

In Subscription 2 you create:

  • Private Link Service (PLS) which will be used as endpoint connection target for your customers
  • Network Interface resource with IP addresses which will be used for NAT (10.1.2.5)
  • Standard Load balancer with load balancing rule
  • Backend pool with IIS (10.1.1.4) which you want to provide access to your customer

In Subscription 1 you create

  • Private Endpoint which will connect to PLS in Subscription 2
  • Network Interface with IP (10.1.0.4) which will be used for connectivity to PLS

Client 1, living in Subscription 1, can connect to the IIS resource in Subscription 2 via the IP 10.1.0.4. IIS is configured to respond with information about the client connecting to it. Opening a web page on 10.1.0.4 serves a page from the IIS web server showing that the HTTP connection originates from 10.1.2.5

PS C:\Users\cloudadmin> (Invoke-WebRequest http://10.1.0.4/).Content
REMOTE_ADDR 10.1.2.5

Azure Lighthouse vs guest tenant management

Traditionally, if you had to manage a customer’s environment you had 2 choices:

  1. Ask the customer to add your account from your tenant as a guest user to their Azure Active Directory and then assign specific RBAC roles on resources
  2. Have the customer create an account for you in their tenant. You’d have to maintain 2 different username/password pairs as a result and log on and off for each tenant you manage

Traditional approach

For demo purposes, the initial input parameters are as follows:

  • MSDN subscription called “Customer Subscription” (8211cd03-4f97-4ee6-af42-38cad1387992) in the “suvalian.com” tenant (c0de79f3-23e2-4f18-989e-d173e1d403d6).
  • I want to manage this subscription from my main tenant nttdata.com with the account 174181@nttdata.com
  • Add your account ID to a role in the customer’s subscription
  • An email is dispatched with an invitation and requires me to accept it via the following link
  • Once the invitation is accepted, I can see that a new tenant is available for me to switch to in the portal
  • Switching to that tenant allows me to view the managed subscription

Problems with the traditional approach:

  1. Requires end-user interaction to accept the invitation to manage the customer’s environment
  2. Can only invite individual team members, not groups
  3. The partner has to switch between tenants to manage their environments, so they cannot, for example, see all VMs from all managed tenants or execute a single Azure Automation runbook across all tenants
  4. The customer has to deal with user lifecycle management, that is, remove or add users any time something changes on the partner side

Lighthouse approach

The new way of managing this process is outlined below.

You can onboard a customer either through Azure Marketplace or through an ARM deployment. I will be using an ARM deployment below, since one has to be an Azure MSP partner to publish to the Marketplace.

The JSON files for this post are located here.

You need to gather the following information before onboarding a customer

  1. Tenant ID of your MSP Azure AD
  2. Principal ID of your MSP Azure AD group
  3. Role Definition ID which is set by Azure and available here

For my specific requirements the values are as follows: the role definition is Contributor, which has the ID b24988ac-6180-42a0-ab88-20f7382dd24c, and the group ID is e361eaed-1a02-4b06-9e12-04417f6e2a46 from tenant 65e4e06f-f263-4c1f-becb-90deb8c2d9ff

{
      "$schema": "https://schema.management.azure.com/schemas/2018-05-01/subscriptionDeploymentParameters.json#",
      "contentVersion": "1.0.0.0",
      "parameters": {
            "mspName": {
                  "value": "NTTData Consulting"
            },
            "mspOfferDescription": {
                  "value": "Managed Services"
            },
            "managedByTenantId": {
                  "value": "65e4e06f-f263-4c1f-becb-90deb8c2d9ff"
            },
            "authorizations": {
                  "value": [
                        {
                              "principalId": "e361eaed-1a02-4b06-9e12-04417f6e2a46",
                              "principalIdDisplayName": "Hyperscale Team",
                              "roleDefinitionId": "b24988ac-6180-42a0-ab88-20f7382dd24c"
                        }
                  ]
            }
      }
}
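
Since the parameters file is mostly GUIDs pasted from three different places, generating it from a script and validating the IDs first can catch typos before the deployment does. Below is a minimal Python sketch. The function name build_parameters and the GUID check are my own additions for illustration and are not part of the Lighthouse tooling; the schema URL and parameter names are the ones from the file above.

```python
import json
import uuid

SCHEMA = (
    "https://schema.management.azure.com/schemas/2018-05-01/"
    "subscriptionDeploymentParameters.json#"
)


def build_parameters(msp_name, offer_description, tenant_id, authorizations):
    """Build the parameters document, rejecting malformed GUIDs early."""
    uuid.UUID(tenant_id)  # raises ValueError if this is not a GUID
    for entry in authorizations:
        uuid.UUID(entry["principalId"])
        uuid.UUID(entry["roleDefinitionId"])
    return {
        "$schema": SCHEMA,
        "contentVersion": "1.0.0.0",
        "parameters": {
            "mspName": {"value": msp_name},
            "mspOfferDescription": {"value": offer_description},
            "managedByTenantId": {"value": tenant_id},
            "authorizations": {"value": authorizations},
        },
    }


if __name__ == "__main__":
    document = build_parameters(
        "NTTData Consulting",
        "Managed Services",
        "65e4e06f-f263-4c1f-becb-90deb8c2d9ff",
        [
            {
                "principalId": "e361eaed-1a02-4b06-9e12-04417f6e2a46",
                "principalIdDisplayName": "Hyperscale Team",
                "roleDefinitionId": "b24988ac-6180-42a0-ab88-20f7382dd24c",
            }
        ],
    )
    print(json.dumps(document, indent=6))
```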

I deploy from Cloud Shell since it already logs me in to the correct tenant. Switch to the correct subscription before running ARM deployments

PS /home/gregory> Select-AzSubscription -SubscriptionId 8211cd03-4f97-4ee6-af42-38cad1387992

Name                                     Account                                         SubscriptionName                               Environment                                    TenantId
----                                     -------                                         ----------------                               -----------                                    --------
Customer Subscription (8211cd03-4f97-4e… MSI@50342                                       Customer Subscription                          AzureCloud                                     fb172512-c74c-4f0d-bb83-3e70586312d5

PS /home/gregory> New-AzDeployment -Name "MSP" -Location 'Central US' -TemplateFile ./template.json -TemplateParameterFile ./template.parameters.json
DeploymentName          : MSP
Location                : centralus
ProvisioningState       : Succeeded
Timestamp               : 9/3/19 3:24:26 PM
Mode                    : Incremental
TemplateLink            :
Parameters              :
                          Name                   Type                       Value
                          =====================  =========================  ==========
                          mspName                String                     NTTData Consulting
                          mspOfferDescription    String                     Managed Services
                          managedByTenantId      String                     65e4e06f-f263-4c1f-becb-90deb8c2d9ff
                          authorizations         Array                      [
                            {
                              "principalId": "e361eaed-1a02-4b06-9e12-04417f6e2a46",
                              "principalIdDisplayName": "Hyperscale Team",
                              "roleDefinitionId": "b24988ac-6180-42a0-ab88-20f7382dd24c"
                            }
                          ]

Outputs                 :
                          Name              Type                       Value
                          ================  =========================  ==========
                          mspName           String                     Managed by NTTData Consulting
                          authorizations    Array                      [
                            {
                              "principalId": "e361eaed-1a02-4b06-9e12-04417f6e2a46",
                              "principalIdDisplayName": "Hyperscale Team",
                              "roleDefinitionId": "b24988ac-6180-42a0-ab88-20f7382dd24c"
                            }
                          ]

DeploymentDebugLogLevel :

Log in to your customer environment and check that you now see “NTTData Consulting” under service providers

Now if you want to add additional access (to a second subscription, for example), you can do it right from the portal without the need for an ARM deployment. In the example below I’m adding access to a specific resource group in a separate subscription to be managed by the MSP.

In my MSP panel I can now see both the access to the entire subscription and the access to a specific resource group in another

You should be able to see resources in the portal just as if your account were part of the customer’s tenant

For example, I added tags to an existing storage account, and it appears as if I were a guest account in the customer’s AD.