Azure Automation runbook logging redux

October 29, 2018, by Morten Lerudjordet

This time I wanted to show a more comprehensive example from a real-life need and how I went about solving it.

The challenge we faced was that we had previously set our Azure AD Connector to sync the e-mail field in AD to Azure AD as the UPN. This would have been fine had our needs not changed. As we matured in our use of cloud services, we also needed to add some of our contractors to Azure AD. The problem was that in the local AD the email field was populated with each contractor’s actual email address, which would make the sync fail if those accounts were added.

We therefore decided to revert the connector back to the default mapping of UPN –> UPN (Azure AD). This did not, however, remove the need for the username in Azure AD to be the email address.

There are probably many ways to solve this, but we decided to leverage our investment in Azure Automation and our on-prem hybrid workers. I wrote a Runbook that iterates through all the users that need to be synced to Azure AD and makes sure the UPN is the same as the mail attribute, changing it if it is not.

Another thing I wanted to tackle was informing helpdesk about the changes made and about any changes that failed, so they could go in and do a manual fix if needed.

As I had already integrated AA with Azure Log Analytics, I just needed to make sure I could identify and group all the logs into one logical unit. I solved this on the Log Analytics side by writing a query and an alert that called a webhook. The webhook was tied to another Runbook that took all the logs and did some basic formatting to make them more readable. I had to do this because, at the time, Log Analytics did not have any formatting controls to leverage (and maybe it still doesn’t). The Runbook groups errors, warnings, and information into different packages and creates a ticket for them in the helpdesk system, so helpdesk personnel can follow up on them.

How I did these things is not the focus of this post; here I will only discuss how I laid out the UPN change Runbook.

So let’s jump into some code.

First, I always try to externalise as many of the variables that could potentially be used as input as possible. I do not like to set these in code, so that people other than the code maintainers can change the behaviour of the logic. This is why I load the external configuration parameters from AA here, notably the AD OU structure and the credentials needed to change information in AD. As these values are critical for the execution of the logic, one will want to fail the Runbook if there are any problems loading them.

Here the pattern discussed in the previous post is used.
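
As a minimal sketch of that pattern, assuming hypothetical asset names ('AD-UserSearchBase' and 'AD-UserAdminCred'; the real ones will differ), each asset is loaded with an error variable and the Runbook fails hard if loading does not succeed:

```powershell
# Asset names are illustrative; load the OU search base and AD admin credentials from AA,
# and fail the Runbook immediately if either cannot be retrieved
$ADuserSearchBase = Get-AutomationVariable -Name 'AD-UserSearchBase' -ErrorAction Continue -ErrorVariable oErr
if($oErr)
{
    Write-Error -Message "Failed to load OU search base from AA variable asset" -ErrorAction Stop
}
$ADcred = Get-AutomationPSCredential -Name 'AD-UserAdminCred' -ErrorAction Continue -ErrorVariable oErr
if($oErr)
{
    Write-Error -Message "Failed to load AD credentials from AA credential asset" -ErrorAction Stop
}
```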

In addition, I mix the use of Write-Verbose and Write-Output: Write-Output contains the information I want to give helpdesk and the like, while Write-Verbose is reserved for information I might need for debugging, or for checking that the code flow behaves as expected. Knowing when to use which can be a bit challenging to begin with, but is usually solved by iterating on the code.

One might also notice the use of $VerbosePreference in different parts of the code. Usually this is to forcefully silence some output from reaching the logs. I mostly use it before running the Import-Module command, because the verbose output of this command usually contains a lot of unwanted log information. This is why I set $VerbosePreference = “silentlycontinue”. Then, just before the first Write-Verbose command, I add $VerbosePreference = “continue” so that the verbose information once again flows to the logs. Note that you will still have to activate verbose logging on the Runbook for it to actually be captured.
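
Put together, the import section looks roughly like this (assuming the ActiveDirectory module, which the rest of the Runbook needs anyway):

```powershell
# Silence the noisy verbose output Import-Module produces, so it does not flood the job logs
$VerbosePreference = "silentlycontinue"
Import-Module -Name ActiveDirectory -ErrorAction Continue -ErrorVariable oErr
if($oErr)
{
    Write-Error -Message "Failed to load the ActiveDirectory module" -ErrorAction Stop
}
# Let verbose information flow to the logs again from this point on
$VerbosePreference = "continue"
Write-Verbose -Message "ActiveDirectory module loaded, starting processing"
```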

For the first big-ticket item I do a couple of things differently from most scripts found on the web.

First, I set the starting point for my search in AD; the default is the root. This avoids having to iterate through all users each time the code runs. Next, I use the Filter parameter to make sure only users that are enabled and whose accounts are not expired are returned. I also limit the number of properties returned, again with performance in mind. Earlier in the code I retrieve a DC that is responsive and healthy, and target it for the user retrieval logic. I call this in the context of a user that has the security rights needed to change users in AD. The users that are retrieved are then cut down further using an include and an exclude filter. This is useful if the AD structure has users spread out a bit, with some unwanted users mixed in with the desired ones.
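
A simplified sketch of that retrieval logic might look like the following. The variable names are illustrative, $IncludeFilter and $ExcludeFilter are assumed to be loaded from AA like the other configuration values, and the expiration check is done client side here for brevity, while the real code pushed it into the Filter parameter:

```powershell
# Find a responsive, writable DC and target it for all subsequent AD operations
$DCInfo = Get-ADDomainController -Discover -ForceDiscover -Writable -ErrorAction Continue -ErrorVariable oErr
if($oErr)
{
    Write-Error -Message "Failed to discover a healthy domain controller" -ErrorAction Stop
}
$DC = $DCInfo.HostName[0]

# Start the search at the OU loaded from AA, return only enabled users,
# and trim the property set down to what the logic actually needs
$AllADusers = Get-ADUser -Server $DC -Credential $ADcred -SearchBase $ADuserSearchBase `
    -Filter {Enabled -eq $true} -Properties mail, accountExpirationDate `
    -ErrorAction Continue -ErrorVariable oErr
if($oErr)
{
    Write-Error -Message "Failed to retrieve users from AD" -ErrorAction Stop
}

# Expiration check plus include/exclude filtering, done client side here for brevity
$Users = $AllADusers | Where-Object {
    ($Null -eq $_.AccountExpirationDate -or $_.AccountExpirationDate -gt (Get-Date)) -and
    $_.DistinguishedName -like $IncludeFilter -and
    $_.DistinguishedName -notlike $ExcludeFilter
}
```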

This method is not perfect, though it was good enough for us. If one wants a 100% correct method, one could first fetch the information from the Azure AD Connect server, then use it to build the filter logic that retrieves the OUs where the users set up for cloud sync are located. I wrote a blog post about how to get these a while back.

We now have the users we want to work on, and in situations like this I like to lead with a ForEach loop. This lets me layer in a lot of tests to check the validity of the data that is most important to me. As we are working with email addresses, we need to be reasonably confident that these are correct. There is also no need to do unnecessary processing, so if a user already has mail = UPN we do not need to do anything.
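
A sketch of that leading loop with its validity checks, using a deliberately simple, illustrative regex for the mail format:

```powershell
$ErrorCount = 0
foreach($User in $Users)
{
    # Guard against missing or malformed mail attributes before using them as UPN
    if([string]::IsNullOrEmpty($User.mail) -or ($User.mail -notmatch '^\S+@\S+\.\S+$'))
    {
        Write-Warning -Message "User: $($User.SamAccountName) has a missing or malformed mail attribute, skipping"
        continue
    }
    # No unnecessary processing: skip users already in the desired state
    if($User.UserPrincipalName -eq $User.mail)
    {
        Write-Verbose -Message "User: $($User.SamAccountName) already has UPN equal to mail"
        continue
    }
    # ... update logic, shown in the next excerpt, goes here ...
}
```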

Now let’s look at the reason for this post, the “Continue on Error” pattern.
Below is an excerpt from the greater code where this pattern can be observed. Set-ADObject is a good example of a function written by a third party that does not always follow the normal PS error flow. In this instance, -ErrorAction does not behave as expected, and we need to compensate for this. This is why I wrap it in a try/catch block, but it still follows the same global pattern.

Now, as there are potentially thousands of users to iterate through, we do not want the code to fail and terminate the run just because a few user objects produce some kind of error during processing. We instead want to continue even if an error occurs, but we want to know that an error happened and which user it happened to. To do this I added two helpers: first, I write the name of the user and the error into the AA logs; second, I keep a count of how many errors happen throughout the run. This helps with the readability of the ticket created for helpdesk to process at the end.

During testing I also found another issue that might occur: a UPN could already be taken by another user object. This usually happened where an error had occurred during a manual update. As this is not what I would call a typical execution error, I wanted to add some logic I could use to separate out this information for the 3rd-party callout. I therefore chose to treat these cases as warnings, so I could keep the information out of the error stream and inform helpdesk through the separate warning information flow.

Next, I found through testing that Set-ADObject was not 100% consistent when it came to errors. Sometimes the $oErr variable would get populated with an error without the catch block being triggered. This is why I added a separate test for when the variable was populated but the catch block had not been entered ($NoException).
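
Pulling the last three points together, an approximation of the excerpt, which sits inside the ForEach loop shown earlier, could look like this. Note that the error message matched in the catch block is an assumption; the actual text for a duplicate UPN will vary by environment:

```powershell
try
{
    $NoException = $true
    # Set-ADObject does not always honour -ErrorAction, so use both try/catch and an error variable
    Set-ADObject -Identity $User.DistinguishedName -Server $DC -Credential $ADcred `
        -Replace @{userPrincipalName = $User.mail} -ErrorAction Stop -ErrorVariable oErr
}
catch
{
    $NoException = $false
    # Illustrative match; the real error text for a duplicate UPN will vary by environment
    if($_.Exception.Message -match 'already in use')
    {
        # Not a typical execution error, route it to the warning stream for helpdesk
        Write-Warning -Message "UPN: $($User.mail) is already taken by another object, user: $($User.SamAccountName) was not updated"
    }
    else
    {
        # Log the user and the error, then continue with the next user
        Write-Error -Message "Failed to update UPN for user: $($User.SamAccountName). Error: $_" -ErrorAction Continue
        $ErrorCount++
    }
}
# Set-ADObject can populate the error variable without triggering the catch block
if($oErr -and $NoException)
{
    Write-Error -Message "Failed to update UPN for user: $($User.SamAccountName). Error: $($oErr)" -ErrorAction Continue
    $ErrorCount++
    $oErr = $Null
}
elseif($NoException)
{
    # Success is summary information for stakeholders, so it goes to the output stream
    Write-Output -InputObject "Changed UPN for user: $($User.SamAccountName) from: $($User.UserPrincipalName) to: $($User.mail)"
}
```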

The last thing to handle is a successful update. In this situation I chose to use Write-Output, as this is important summary information for the 3rd parties. Since Write-Verbose carries a lot of other, code-execution-related information, that stream is not a good candidate for the important information one will want to inform stakeholders with.

I therefore try to think about what type of information the different stakeholders are interested in, and use all the information streams available to reduce the noise in each of them as much as possible. Stakeholders might be helpdesk, managers, code maintainers, and the automation code itself.

By combining all these techniques, one is able both to let the automation go through all the users and to inform someone else if some kind of intervention is needed.

The last thing I do for readability is to give some indication of how each run of the code went. I do this by utilizing the finally part of the try/catch block.

Here I blend the use of Write-Output and Write-Verbose to separate the information I think will be more easily readable and consumable by stakeholders from the information that might be needed to get deeper insight into how the run went.
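
A minimal sketch of that finally block, building on the $ErrorCount counter from the earlier excerpts:

```powershell
try
{
    # ... main Runbook logic from the earlier excerpts ...
}
finally
{
    # Easily consumable summary for stakeholders on the output stream
    if($ErrorCount -gt 0)
    {
        Write-Output -InputObject "Runbook finished with: $ErrorCount error(s), manual follow-up may be needed"
    }
    else
    {
        Write-Output -InputObject "Runbook finished processing all users successfully"
    }
    # Deeper insight into the run goes to the verbose stream
    Write-Verbose -Message "Runbook finished at: $(Get-Date)"
}
```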

That’s it, the journey is complete. In closing, there are probably many ways to solve this, but I hope my way of doing it helps further the discussion of how one can think about running robust code in Azure Automation.

Lastly, the example code in its entirety.

Happy tinkering!