Troubleshooting Issues with SSO and Kerberos Domain Controllers
Hello everyone. This post describes a problem I encountered when using Single Sign-On (SSO) via Kerberos authentication with the SAP portal, how we tracked it down, and what the root cause turned out to be. Let me set the scene. SSO began to fail intermittently for random users, who were presented with the NetWeaver login screen instead of being signed in. At first only a couple of users per week were affected, if that. By the end of the month there were many more occurrences, all intermittent and difficult to replicate remotely with the end user. The changes made to the system during this time were deemed unable to cause such problems. The issue had become a rather large problem, and the investigation needed to be stepped up a notch.
The issue was hard to troubleshoot mainly because it was hard to replicate: we had to log on and off the server repeatedly until the error appeared and we could work with it. Once we had replicated it, we used several analysis tools, namely Kerbtray, HttpWatch, and Wireshark, to identify the cause.
Kerbtray is a tool that displays Kerberos ticket information for the computer it runs on. It showed us that the authentication ticket the client machine held for the portal server was not a Kerberos ticket, which prompted us to look at the Kerberos authentication mechanism. (http://www.microsoft.com/downloads/en/details.aspx?FamilyID=4E3A58BE-29F6-49F6-85BE-E866AF8E7A88)
HttpWatch is a tool that monitors HTTP traffic between a client machine and any web page it connects to. In this case we used it to monitor the HTTP requests and responses between our browser and the portal server. The product also allows the cookies to be inspected as they were at the moment a given request was made. This feature showed that the cookie was storing an NTLM (Windows NT LAN Manager) ticket, which indicated that Kerberos authentication was broken. (http://www.httpwatch.com/)
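A similar NTLM-versus-Kerberos check can be made on the token a browser sends in the HTTP Authorization header. The following is only a minimal sketch, not what HttpWatch does internally: the function name `classify_negotiate_token` is hypothetical, and it assumes you have copied the header value out of a capture by hand. It relies on two well-known markers: NTLM messages start with the ASCII signature `NTLMSSP`, while GSS-API framed tokens (which is how Kerberos normally arrives via SPNEGO) start with the byte 0x60.

```python
import base64

# Every NTLM message begins with this 8-byte signature.
NTLM_SIGNATURE = b"NTLMSSP\x00"


def classify_negotiate_token(header_value: str) -> str:
    """Classify an Authorization header value (hypothetical helper).

    Returns "ntlm", "gssapi" (typically Kerberos via SPNEGO, though a
    GSS-API frame is not a guarantee of Kerberos), or "unknown".
    """
    parts = header_value.strip().split(None, 1)
    if len(parts) != 2:
        return "unknown"
    scheme, token_b64 = parts
    if scheme.lower() not in ("negotiate", "ntlm"):
        return "unknown"
    try:
        token = base64.b64decode(token_b64, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return "unknown"
    if token.startswith(NTLM_SIGNATURE):
        return "ntlm"
    # 0x60 is the ASN.1 tag that opens a GSS-API initial context token.
    if token[:1] == b"\x60":
        return "gssapi"
    return "unknown"
```

If a header that should carry a Kerberos ticket comes back as "ntlm", the client has fallen back to NTLM, which matches the symptom described above.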
Wireshark is a fantastic network protocol analyser that provides very detailed information about the traffic passing over a network interface. We used Wireshark to tie everything together and identify that a domain controller was failing to issue the Kerberos ticket. Some people might ask why we did not simply start with Wireshark instead of bothering with the other two tools. The answer is that Wireshark gives you a lot of information about many different protocols, and when there is no clear cause for an error that volume can be overwhelming. The other tools narrowed the problem domain so that we could concentrate on the relevant traffic when analysing the issue with Wireshark. (http://www.wireshark.org/)
To finish the story, we began to review changes to the surrounding systems, and one caught our eye: a domain controller upgrade. The change involved upgrading a single domain controller, one of a clustered group of Windows Server 2003 domain controllers, to Windows Server 2008 R2.
With some research and digging around we found that upgrading this domain controller had caused the issue, and that Microsoft had documented solutions for the problems that can occur when upgrading a domain controller within a cluster. Below is a screen capture of the Wireshark trace, where sensitive information has been replaced with generic descriptions.
There are two key issues at play here. The first occurs because the portal uses DES encryption, which is disabled by default on Windows Server 2008 R2 and Windows 7 client machines, as described in the following Microsoft KB article: http://support.microsoft.com/kb/977321. The second issue is a bit more technical but, in short, occurs because Windows Server 2003 domain controllers and Windows Server 2008 R2 domain controllers use different data structures to store encryption type information about a user. For more detailed information on this issue and its solution, please refer to the following Microsoft KB article: http://support.microsoft.com/kb/978055.
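To make the "encryption type information" a little more concrete, here is a small sketch that decodes the bitmask newer domain controllers keep in the msDS-SupportedEncryptionTypes attribute of an account. Treating this attribute as the data structure referred to above is an assumption here, so check the KB article for the authoritative description. The bit values are the ones Microsoft documents for that attribute; the function names are hypothetical, and the integer is assumed to have been read from the directory by some other means.

```python
# Bit flags documented for the msDS-SupportedEncryptionTypes attribute.
ENCRYPTION_TYPE_FLAGS = {
    0x01: "DES-CBC-CRC",
    0x02: "DES-CBC-MD5",
    0x04: "RC4-HMAC",
    0x08: "AES128-CTS-HMAC-SHA1-96",
    0x10: "AES256-CTS-HMAC-SHA1-96",
}

# Both DES variants together; these are the types disabled by default
# on Windows Server 2008 R2 and Windows 7.
DES_MASK = 0x01 | 0x02


def decode_encryption_types(value: int) -> list[str]:
    """Return the names of the encryption types set in the bitmask."""
    return [name for bit, name in ENCRYPTION_TYPE_FLAGS.items() if value & bit]


def allows_des(value: int) -> bool:
    """True if the bitmask includes either DES encryption type."""
    return bool(value & DES_MASK)
```

An account whose value only has the two DES bits set would be a candidate for the first issue described above, since a domain controller with DES disabled has no common encryption type to offer it.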
This is also a known issue with SAP detailed in the following SAP notes:
I hope this post helps anyone out there who is encountering similar issues.