Automation at scale in Azure with PowerShell Azure Functions

The code for this article is located at https://github.com/artisticcheese/artisticcheesecontainer/tree/master/MetadataFunction

My task was to run a script on a large number of VMs (700+) on a periodic schedule to pull metadata from the Azure Instance Metadata Service ( https://docs.microsoft.com/en-us/azure/virtual-machines/windows/instance-metadata-service ). This data is available only from within a running VM; there is no other way to access it. Specifically, I needed the Scheduled Events data ( https://docs.microsoft.com/en-us/azure/virtual-machines/windows/scheduled-events ), which tells a VM whether an Azure-initiated reboot or similar action is pending (detailed info at https://docs.microsoft.com/en-us/azure/virtual-machines/windows/scheduled-events#query-for-events).
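To make the data concrete, here is a minimal sketch of what a query against the Scheduled Events endpoint looks like from inside a VM. It is written in Python purely for illustration (the functions in this article are PowerShell). The function names and the api-version value are my assumptions; the link-local address and the Metadata header come from the Microsoft documentation linked above.

```python
import json
import urllib.request

# Link-local Instance Metadata Service endpoint; reachable only from inside the VM.
# The api-version here is an assumption; use whichever version your VMs support.
SCHEDULED_EVENTS_URL = (
    "http://169.254.169.254/metadata/scheduledevents?api-version=2019-01-01"
)


def fetch_scheduled_events(timeout=10):
    """Query the metadata service and return the parsed JSON document."""
    request = urllib.request.Request(
        SCHEDULED_EVENTS_URL, headers={"Metadata": "true"}
    )
    # An empty ProxyHandler bypasses any configured proxy, which the
    # link-local metadata address cannot be reached through.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def disruptive_events(document, event_types=("Reboot", "Redeploy")):
    """Return only the events whose EventType is in event_types."""
    return [
        event
        for event in document.get("Events", [])
        if event.get("EventType") in event_types
    ]
```

A document with no pending events looks like {"DocumentIncarnation":0,"Events":[]}, which is what you will see in the function logs later in this article.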

Microsoft provides a solution called “Azure Scheduled Events Service” ( https://github.com/microsoft/AzureScheduledEventsService ), which has severe drawbacks:

  1. You have to download and install the service on every machine.
  2. It relies on the Invoke-RestMethod cmdlet to query the metadata service, so it does not support PowerShell 2.0 and by default will not run on Windows 2008.
  3. It runs only on Windows, so none of the UNIX machines are covered.
  4. It logs data to the local Application log, which is of little use on its own because you still have to figure out how to centralize and query that information.
  5. As a result of point 4, there is no centralized alerting on those events.

My solution, outlined below, relies on Azure resources to install, maintain, query, and alert on health events without the need for dedicated agents.

The solution consists of the following moving parts:

  1. Azure PowerShell function
  2. Azure Storage queue
  3. Azure Log Analytics workspace
  4. Azure Monitor

The general flow is as follows:

  1. An Azure PowerShell function runs on a timer or via an HTTP request and populates the storage queue with the name, resource group, and power state of all VMs in the subscription.
  2. The App Service plan hosting the function app has a scale-out rule that jumps to 8 instances when the storage queue is populated, which in turn provides around 160 concurrently executing workers.
  3. A second Azure PowerShell function is bound to the storage queue and spins up when queue messages are present. It reads a queue message, looks up the VM, checks its operating system, and based on that runs either a shell or a PowerShell script via Invoke-AzVMRunCommand to query the metadata service.
  4. On success or error, the function writes the returned data to the Log Analytics workspace.
  5. Azure Monitor is set up to act on an Azure Log Analytics query.

Details

Create a Function App to host the 2 functions mentioned above. Don’t use the Consumption plan, since it does not scale well with PowerShell. Choose at least the S2 size so that you can use multiple cores to scale locally, in addition to scaling the App Service plan out based on the queue.

Go to the storage account that was created and create 2 queues: one to hold messages and one for rejected (poison) messages.

Copy the connection string from this storage account; it will be required for the function setup.

Create a Log Analytics workspace to hold the results.

Record the Workspace ID and the primary key for later use in the function.

Update local.settings.json in your function folder with the settings you copied earlier. My example is below.

{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "DefaultEndpointsProtocol=https;AccountName=mymetadatafuncta57d;AccountKey=9/jxdL3jdsrKED+ddQHByebGkzozxiLHrNeRUrvGWhO8//dzGm9m184n0VymQBTBlkfzIPkbx1+nTSXA/6HlZQ==",
    "FUNCTIONS_WORKER_RUNTIME": "powershell",
    "LogAnalyticsWorkspaceID": "02f2eb14-85d2-4069-9a1a-6b8cd91d783c",
    "LogAnalyticsSharedKey": "D0P2Z9D4U3k8xJFLzBnLg/Ns3oyEsEj4ivVxq5buGQN5BtYND/nleWGfrsc5SD6wajW/SbtqpvvgWCjQCfPdlw==",
    "QueueName": "metadataservicequeue",
    "FUNCTIONS_EXTENSION_VERSION": "~2",
    "WEBSITE_NODE_DEFAULT_VERSION": "10.14.1"
  }
}

Deploy the function to Azure from VS Code.

Once the function is deployed, try to execute PopulateQueueWithVMNamesHTTP. Expect a failure at this point, since the function does not yet have the permissions needed to access Azure resources.

2019-08-20T21:07:27.528 [Information] INFORMATION: getting Queue Account info
2019-08-20T21:07:28.062 [Information] INFORMATION: getting all VM Account info
2019-08-20T21:07:29.804 [Error] ERROR: No account found in the context. Please login using Connect-AzAccount.
Microsoft.Azure.WebJobs.Script.Rpc.RpcException : Result: ERROR: No account found in the context. Please login using Connect-AzAccount.
Exception: No account found in the context. Please login using Connect-AzAccount.

Assign a system-assigned identity to your Function App via the Identity option under Platform features.

Add the identity to the Reader and Virtual Machine Contributor roles on the subscription. The Reader role is needed to list all VMs in the subscription, and the Virtual Machine Contributor role is needed to execute scripts on the VMs.

You should now see successful output with details of the queue messages that were created.

2019-08-20T21:25:46  Welcome, you are now connected to log-streaming service. The default timeout is 2 hours. Change the timeout with the App Setting SCM_LOGSTREAM_TIMEOUT (in seconds). 
2019-08-20T21:25:49.448 [Information] Executing 'Functions.PopulateQueueWithVMNamesHTTP' (Reason='This function was programmatically called via the host APIs.', Id=3d49429c-63c9-4b8e-998b-d05514863f09)
2019-08-20T21:25:55.744 [Information] INFORMATION: PowerShell HTTP trigger function processed a request.
2019-08-20T21:25:55.761 [Information] INFORMATION: getting Storage Account info
2019-08-20T21:25:57.910 [Information] INFORMATION: getting Queue Account info
2019-08-20T21:25:58.183 [Information] INFORMATION: getting all VM Account info
2019-08-20T21:26:01.662 [Information] INFORMATION: Generating queue messages
2019-08-20T21:26:01.766 [Information] INFORMATION: Loop finished
2019-08-20T21:26:01.770 [Information] INFORMATION: Added 1 count {
"VMName" : "GregDesktop",
"ResourceGroup": "DEVTESTLAB-RG",
"State" : "VM running"
} to queue 1 records process
2019-08-20T21:26:01.920 [Information] Executed 'Functions.PopulateQueueWithVMNamesHTTP' (Succeeded, Id=3d49429c-63c9-4b8e-998b-d05514863f09)

You should also see this queue message in your storage account.
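The message body has the three fields shown in the log above: VMName, ResourceGroup, and State. As a rough illustration of the populate step, the sketch below turns a VM inventory into one JSON message per VM. It is in Python for brevity (the actual function is PowerShell), and the input keys name, resource_group, and power_state are hypothetical names of my own.

```python
import json


def build_queue_messages(vms):
    """Return one JSON message body per VM.

    vms is an iterable of dicts with the hypothetical keys
    name, resource_group, and power_state.
    """
    return [
        json.dumps(
            {
                "VMName": vm["name"],
                "ResourceGroup": vm["resource_group"],
                "State": vm["power_state"],
            }
        )
        for vm in vms
    ]


messages = build_queue_messages(
    [
        {
            "name": "GregDesktop",
            "resource_group": "DEVTESTLAB-RG",
            "power_state": "VM running",
        }
    ]
)
```

Carrying the power state in the message lets the consumer decide what to do with a stopped VM without another lookup.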

If you monitor the logs for MetadataFunction, you’ll see it wake up and process the messages posted to the queue.

2019-08-20T23:12:07.244 [Information] INFORMATION: Finished executing Invoke-AzureRMCommand with parameters GregDesktop, DEVTESTLAB-RG, VM running, return is {"DocumentIncarnation":0,"Events":[]} )
2019-08-20T23:12:07.255 [Information] INFORMATION: Outputing following to Log Analytics [
    {
        "Return" : "{\"DocumentIncarnation\":0,\"Events\":[]}",
        "VMName" : "GregDesktop",
        "ResourceGroup" : "DEVTESTLAB-RG"

    }
]
2019-08-20T23:12:07.588 [Trace] PROGRESS: Reading response stream... (Number of bytes read: 0)
2019-08-20T23:12:07.589 [Trace] PROGRESS: Reading web response completed. (Number of bytes read: 0)
2019-08-20T23:12:07.596 [Information] OUTPUT: 200
2019-08-20T23:12:07.644 [Information] Executed 'Functions.MetadataFunction' (Succeeded, Id=21111100-7a23-4374-93f1-9dfa5df76011)
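The per-VM decision the queue-triggered function makes can be sketched as follows. This is an illustration in Python, not the author's PowerShell. RunPowerShellScript and RunShellScript are the command IDs that Invoke-AzVMRunCommand accepts; the in-VM script bodies, the api-version, and the choice to skip VMs that are not running are assumptions of mine. The Windows script uses System.Net.WebClient rather than Invoke-RestMethod so that it would also work on PowerShell 2.0.

```python
# The api-version is an assumption; use whichever version your VMs support.
IMDS_URL = "http://169.254.169.254/metadata/scheduledevents?api-version=2019-01-01"

# Hypothetical in-VM scripts. WebClient exists in PowerShell 2.0, unlike Invoke-RestMethod.
WINDOWS_SCRIPT = (
    "$wc = New-Object System.Net.WebClient; "
    "$wc.Headers.Add('Metadata', 'true'); "
    "$wc.DownloadString('" + IMDS_URL + "')"
)
LINUX_SCRIPT = "curl -s -H Metadata:true '" + IMDS_URL + "'"


def plan_run_command(os_type, power_state):
    """Return (command_id, script) for a VM, or None if it should be skipped."""
    # Assumption: run commands only make sense on VMs that are running.
    if power_state != "VM running":
        return None
    if os_type.lower() == "windows":
        return ("RunPowerShellScript", WINDOWS_SCRIPT)
    return ("RunShellScript", LINUX_SCRIPT)
```

Either script returns the raw Scheduled Events JSON, which is the value that appears as "return is ..." in the log above.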

You’ll also see the output posted to the Log Analytics workspace in a custom log called MetaDataLog (queried as MetaDataLog_CL).
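The Workspace ID and shared key recorded earlier suggest the records are posted through the Log Analytics HTTP Data Collector API, which signs each request with HMAC-SHA256. That is my reading of the settings, so treat the sketch below as an illustration of that API's signing scheme rather than the code in the repository. It is in Python for brevity, and the function name is hypothetical.

```python
import base64
import hashlib
import hmac
import json
from email.utils import formatdate


def build_log_analytics_request(workspace_id, shared_key, records,
                                log_type="MetaDataLog", date=None):
    """Build the URL, headers, and body for a Data Collector API POST."""
    body = json.dumps(records).encode("utf-8")
    date = date or formatdate(usegmt=True)  # RFC 1123 date, e.g. "Tue, 20 Aug 2019 23:12:07 GMT"
    content_type = "application/json"
    resource = "/api/logs"

    # The signature covers the method, body length, content type, date, and resource path.
    string_to_sign = "POST\n{}\n{}\nx-ms-date:{}\n{}".format(
        len(body), content_type, date, resource
    )
    digest = hmac.new(
        base64.b64decode(shared_key),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    signature = base64.b64encode(digest).decode("ascii")

    headers = {
        "Content-Type": content_type,
        "Authorization": "SharedKey {}:{}".format(workspace_id, signature),
        "Log-Type": log_type,  # becomes the custom table name with a _CL suffix
        "x-ms-date": date,
    }
    url = "https://{}.ods.opinsights.azure.com{}?api-version=2016-04-01".format(
        workspace_id, resource
    )
    return url, headers, body
```

String fields posted this way get an _s suffix in the workspace, which is why the query further down refers to VMName_s and Return_s.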

You can then set up alerting on scheduled redeploy events by running the Kusto query below and tying an Azure Monitor action to it.

MetaDataLog_CL
| project VMName_s, TimeGenerated,  ResourceGroup, Return_s
| summarize arg_max(TimeGenerated, *) by VMName_s
| where Return_s contains "Redeploy"
| order by TimeGenerated desc 

Notes:

  1. The Consumption plan proved unusable because PowerShell scales poorly on the single-core instances it provides. I could not get it to work in any form until I switched to an App Service plan. (https://docs.microsoft.com/en-us/azure/azure-functions/functions-reference-powershell#concurrency)
  2. Increase the PSWorkerInProcConcurrencyUpperBound setting to raise concurrency, since the function is not CPU or IO bound. Mine is set to 20.
  3. In the App Service plan, also configure a scale-out/in rule to scale the number of instances based on the size of the queue. Mine scales to 8 instances, so once the application is triggered you get 160 PowerShell workers executing in parallel.
  4. The project includes 2 functions that populate the queue: one is HTTP triggered and the other runs on a timer.
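The 160 figure is simply instances multiplied by per-instance concurrency. A short Python sketch of that arithmetic, using the numbers from notes 2 and 3, also shows how many full rounds of work a 700 VM estate needs. The function names are mine, and the rounds figure is a back-of-the-envelope estimate that assumes every run takes about the same time.

```python
import math


def total_workers(instances, per_instance_concurrency):
    """Concurrent PowerShell workers across all scaled-out instances."""
    return instances * per_instance_concurrency


def rounds_needed(vm_count, workers):
    """Number of full rounds needed if every worker handles one VM per round."""
    return math.ceil(vm_count / workers)


workers = total_workers(8, 20)        # 8 instances x 20 concurrent runspaces = 160
rounds = rounds_needed(700, workers)  # ceil(700 / 160) = 5
```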
