ADFS 3.0 MEX Monitor Fix

November 6, 2016 By Morten Lerudjordet

I recently did some work on publishing internal legacy applications using WAP (Web Application Proxy) and ADFS for pre-authentication. While wrapping up the production part of these components, I wanted full visibility into how they performed over time, so I added the SCOM management packs for both products. Once monitoring kicked in, I noticed that the health of the overall ADFS farm was always in a warning state. Digging a bit deeper, I found that the “MEX Endpoint Is Unreachable” monitor was keeping the farm in this state.


A quick recap of the setup I was using: two ADFS 3.0 (Windows Server 2012 R2) nodes using WID (Windows Internal Database), sitting behind a Kemp load balancer. What made the install a bit more difficult was that there was only one LB, with multiple legs in different subnets. The LB IP was always a public IP, even when it was only used internally, which made routing a bit tricky. But enough about that; back to the monitor.

Looking at the error message, I could see that something inside the monitor's logic was not working as expected.

[Screenshot: the monitor's error on node 2]

I therefore exported the MP to take a closer look, and found that this monitor executes a PowerShell script. Combing through the code, I discovered a line that calls Get-ADFSEndpoint.

Normally there is no problem running Get-ADFSEndpoint. However, in a WID farm Microsoft has implemented the ADFS cmdlets so that they can only be executed on the primary node. Running any of the ADFS cmdlets on any other node results in the error shown above. This means that a monitor that needs to run on both nodes cannot use any of these cmdlets to retrieve configuration data.

Let’s take a step back and consider what this monitor is supposed to achieve. It is meant to warn about problems accessing the MEX endpoint, and since every node hosts its own MEX endpoint, the monitoring code needs to test each node separately. This is where we hit the second problem with this monitor: it targets the URL of the MEX endpoint as published on the LB front end. When the monitoring logic running on a node requests that URL (e.g. sts.mydomain.com), the request first hits the LB (the Kemp, in my case) and is then forwarded to whichever backend node the LB picks. That is not necessarily the node the monitor is running on. This defeats the purpose of the monitor: you want to test the MEX endpoint on that specific node, not on one of the other nodes, since they have their own instance of the monitor running.

I searched for another way of getting the location of the MEX endpoint without using the ADFS cmdlets, but I could not find anywhere else it was stored short of querying the WID database, and I did not have time to write the code for that. So, as a compromise, I elected to hardcode the MEX endpoint path in the script instead.
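Below is a minimal Python sketch of this kind of compromise. The monitor itself is a PowerShell script, and its code is not reproduced here. The sketch builds the MEX URL from the name of the machine the check runs on, combined with the assumed ADFS default path /adfs/services/trust/mex, instead of reading the path with the ADFS cmdlets. The helper name `node_mex_url` is hypothetical, and the hardcoded path is an assumption that breaks if the endpoint has been moved.

```python
import socket
from typing import Optional

# Assumption: the MEX endpoint is at the ADFS default path. If it has been
# changed in the ADFS configuration, this hardcoded value will be wrong.
DEFAULT_MEX_PATH = "/adfs/services/trust/mex"


def node_mex_url(fqdn: Optional[str] = None, path: str = DEFAULT_MEX_PATH) -> str:
    """Build the MEX URL for this node instead of the load-balanced name."""
    if fqdn is None:
        # Name of the machine the check runs on, e.g. adfsnode01.mydomain.com
        fqdn = socket.getfqdn()
    return f"https://{fqdn}{path}"
```

On a node named adfsnode01 in mydomain.com, `node_mex_url()` would return https://adfsnode01.mydomain.com/adfs/services/trust/mex, provided `socket.getfqdn()` resolves the full DNS name.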

The obvious weakness of this approach is that the MEX endpoint path can be changed, and in that case the monitor will break. I also elected to target the node the code is running on, so that I actually check the MEX endpoint on that node. This comes with another problem: by default, ADFS 3.0 only answers requests that come in on the federation service URL (i.e. https://sts.mydomain.com/adfs/services/trust/mex) and not on a URL containing the node name (e.g. https://adfsnode01.mydomain.com/adfs/services/trust/mex). To allow this, you need to add a binding for the node name using the following command:

To check that this worked, open IE on the node and enter the MEX URL (e.g. https://adfsnode01/adfs/services/trust/mex). If XML gets downloaded, everything is working as expected.
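The manual IE check can also be scripted. The following is a hypothetical Python sketch, not code from the monitor. `check_mex` downloads the node's MEX URL and reports whether the response is well-formed XML, the same success criterion as the browser test. It assumes the certificate served for the node name is trusted by the machine running the check, because `urllib` verifies certificates and host names by default. Connection and HTTP errors are raised to the caller.

```python
import urllib.request
import xml.etree.ElementTree as ET


def is_xml(body: bytes) -> bool:
    """Return True if body is a well-formed XML document."""
    try:
        ET.fromstring(body)
    except ET.ParseError:
        return False
    return True


def check_mex(url: str, timeout: float = 10.0) -> bool:
    """Download url (e.g. the node's MEX URL) and report whether it is XML."""
    # urlopen verifies the server certificate and host name; connection and
    # HTTP errors (urllib.error.URLError and subclasses) propagate to the caller.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return is_xml(resp.read())
```

For example, `check_mex("https://adfsnode01/adfs/services/trust/mex")` run on the node should return True once the binding is in place.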

The next challenge I had to solve was getting the validation logic to work:
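The original validation code is not shown here. As a hypothetical illustration of what validating a MEX document can involve, the Python sketch below collects the endpoint URLs the document advertises and flags any that are not HTTPS URLs on the expected federation service name. It assumes these URLs appear as WS-Addressing `Address` elements and as `location` attributes on WSDL SOAP `address` elements, as in WCF-style metadata. A real document may contain other address forms that a production check would have to allow for.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlsplit


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree adds to qualified tags."""
    return tag.rsplit("}", 1)[-1]


def endpoint_urls(mex_xml: bytes) -> List[str]:
    """Collect URLs from <Address> element text and <address location=...>."""
    urls = []
    for elem in ET.fromstring(mex_xml).iter():
        name = local_name(elem.tag)
        if name == "Address" and elem.text:
            # WS-Addressing endpoint reference, e.g. <wsa10:Address>
            urls.append(elem.text.strip())
        elif name == "address" and "location" in elem.attrib:
            # WSDL SOAP binding, e.g. <soap12:address location="..."/>
            urls.append(elem.attrib["location"])
    return urls


def unexpected_urls(mex_xml: bytes, expected_host: str) -> List[str]:
    """Return advertised URLs that are not HTTPS on expected_host."""
    bad = []
    for url in endpoint_urls(mex_xml):
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
        if parts.scheme != "https" or host != expected_host.lower():
            bad.append(url)
    return bad
```

For example, `unexpected_urls(mex_bytes, "sts.mydomain.com")` returns an empty list when every advertised endpoint uses the federation service name. Note that this sketch only inspects the URLs and does not try to reach them.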

For me, the last part was still failing when I ran the code on the node. The problem was the way the Kemp was configured in my environment: from the nodes, I could not reach the public ADFS URL published on the Kemp LB. You can verify this the same way using IE, entering the public URL instead of the node one (e.g. https://sts.mydomain.com/adfs/services/trust/mex). If nothing is returned, you probably have the same problem as I did. As I saw it, I had two options: remove the MetadataExchange part of the code, which would remove some of the monitoring functionality, or add an entry to the hosts file on each node pointing sts.mydomain.com to the IP of that node. Doing the latter fixed the problem of not reaching through the LB and let the logic evaluate the URLs in the metadata exchange.

Edit: I found a problem with changing the hosts record as described above: over time, it generates another ADFS error related to publishing federationmetadata.xml. I have therefore found another way of working around this problem, adding the following line of code instead:

This skips validation of the URLs in the downloaded MEX document (e.g. https://sts.mydomain.com/.…). The monitor will still go into a warning state if the document itself is unavailable.
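The availability-only behaviour can be sketched in Python as follows. This is a hypothetical helper, not the code from the published monitor. The document is fetched and parsed, but the URLs inside it are neither followed nor validated, and any failure maps to a warning. The fetch function is injectable so the state logic can be exercised without a live ADFS node. The state names "Healthy" and "Warning" are only labels for this sketch.

```python
import http.client
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable

Fetcher = Callable[[str], bytes]


def http_fetch(url: str) -> bytes:
    """Download url; raises OSError (including URLError) on failure."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


def mex_state(url: str, fetch: Fetcher = http_fetch) -> str:
    """Return 'Healthy' if the MEX document downloads and parses as XML.

    URLs inside the document are deliberately not resolved or validated;
    only unavailability (network/HTTP error or non-XML body) is reported.
    """
    try:
        ET.fromstring(fetch(url))
    except (OSError, http.client.HTTPException, ET.ParseError):
        return "Warning"
    return "Healthy"
```

Keeping the fetch step separate from the state decision mirrors the trade-off described above: the check still catches an unreachable MEX endpoint, at the cost of no longer checking the URLs advertised inside the document.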

Not perfect, but better. If anybody wants the new monitor, head on over to the TechNet Gallery and get it there.

The new monitor is called “MEX Endpoint Is Unreachable WID Farm”. Just disable the old monitor and enable this one to get a green status on your farm health rollup.

That is it for now, happy tinkering!