Within three months of my joining the University of Minnesota to work on its virtualization platform, the certificates on our primary production vCenter 6 were about to expire. So we set out to replace the machine SSL certificate, following the procedure documented in this VMware KB: Replacing a vSphere 6.0 Machine SSL certificate with a Custom Certificate Authority Signed Certificate (2112277)
Upon completing this process, we quickly discovered that other solutions hooked into vCenter had broken, which led us to the next series of KBs needed to clean up the broken SSL trust relationships.
- Our first clue that something was wrong was the vSphere Web Client reporting the following error to all users upon login: “Error occurred while processing request. Check vSphere WebClient logs for details.” This is documented in the following KB: After installing or upgrading to vCenter Server 6.0 Update 1 the Customer Experience Improvement Program displays the error: Error occurred while processing request. Check vSphere WebClient logs for details (2129053)
- This KB has a link to a larger KB that captures the various problems that affect SRM, vSphere Replication, the vCenter Support Assistant, NSX-v, VMware Integrated OpenStack (VIO), and the Customer Experience Improvement Program: vCenter Server or Platform Services Controller certificate validation error for external VMware Solutions in vSphere 6.0 (2109074). The above KB makes the following comment: “If you replace the Machine SSL certificate on the vCenter Server or the Platform Services Controller, a connection error results if the solution attempts to connect to the vCenter Server or Platform Services Controller. The reason is that the vCenter Server system and the Platform Services Controller use the new certificate, but the corresponding service registrations with the VMware Lookup Service are not updated. When solutions connect to vCenter Server or Platform Services Controller, they look at the service registration, which includes the service URL and the sslTrust string. By default, the sslTrust string is the Base 64 encoded old certificate even if you replaced the certificate successfully.”
- The above KB led to the procedures needed to correct the broken sslTrust strings in the Lookup Service, which differ depending on whether your Platform Services Controller (PSC) is embedded or external:
- Embedded: vCenter Server certificate validation error for external solutions in environments with Embedded Platform Services Controller (2121689)
- External: vCenter Server or Platform Services Controller certificate validation error messages for external solutions in environments with a External Platform Services Controller (2121701)
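Since the KB describes the sslTrust string as the Base64-encoded certificate, one way to spot a stale registration is to compare thumbprints. Here is a minimal Python sketch, assuming you have copied an sslTrust value out of the Lookup Service and have the currently served certificate in DER form. The function names are my own and are not part of any VMware tooling.

```python
import base64
import hashlib


def thumbprint_from_ssl_trust(ssl_trust_b64: str) -> str:
    """Return the colon-separated SHA-1 thumbprint of a Base64-encoded DER certificate."""
    # Strip any whitespace or line breaks picked up when copying the value.
    der = base64.b64decode("".join(ssl_trust_b64.split()))
    digest = hashlib.sha1(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def trust_is_stale(ssl_trust_b64: str, current_cert_der: bytes) -> bool:
    """True when the registered sslTrust value is not the certificate now being served."""
    current = hashlib.sha1(current_cert_der).hexdigest().upper()
    return thumbprint_from_ssl_trust(ssl_trust_b64).replace(":", "") != current
```

If `trust_is_stale` returns True for a registration, that registration still carries the old certificate and is a candidate for the fix described in the two KBs above.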
It was at this step (fixing the sslTrust strings) that we ran into trouble. Both KBs essentially have you log in to the Managed Object Browser (MOB) of the Lookup Service, which is a component of the Platform Services Controller. When we tried to log in to the MOB with the firstname.lastname@example.org account, it repeatedly prompted for credentials as if we were failing authentication.
To verify, we logged in to the associated vCenter with this account and succeeded on the first try, which ruled out bad credentials or a locked account.
Another odd symptom we found while investigating was that the Platform Services Controller Client failed to display, returning the following error: “PSC Client HTTP 400 Error – NULL.”
So what was really wrong here? The PSC client logs provided the best clue…
The PSC client stores its logs in a separate runtime directory from the other vCenter/PSC logs. For a Windows-based vCenter 6 installation, I found the logs located here:
Looking at the psc-client.log (or the wrapper.log), I found the following error that indicated the problem:
java.lang.RuntimeException: javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: Server certificate chain is not trusted and thumbprint doesn’t match
Caused by: sun.security.validator.ValidatorException: PKIX path validation failed: java.security.cert.CertPathValidatorException: timestamp check failed
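If you would rather not scroll through the log by hand, a few lines of Python can pull out the certificate-related entries. This is only a sketch: the exception names in the pattern are the ones from the excerpt above, and `certificate_errors` is a hypothetical helper name.

```python
import re

# Exception names taken from the log excerpt; extend the pattern as needed.
CERT_ERROR = re.compile(
    r"SSLHandshakeException|CertificateException|CertPathValidatorException"
)


def certificate_errors(lines):
    """Yield (line_number, text) for each log line that mentions a certificate failure."""
    for number, line in enumerate(lines, start=1):
        if CERT_ERROR.search(line):
            yield number, line.rstrip()
```

To use it, open psc-client.log (or wrapper.log) from wherever your installation keeps it and pass the file object to `certificate_errors`. A “timestamp check failed” cause, as seen here, points at a certificate outside its validity period.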
Ultimately we determined that this vCenter 6 installation was upgraded from 5.5, and during the time it was running under version 5.5 the self-signed certificates were replaced with CA signed equivalents, which included the “ssoserver” certificate. Then when vCenter was upgraded to 6.0, the “ssoserver” CA signed certificate was retained, but had now expired.
This problem wasn’t obvious because we were connecting to the Lookup Service and the PSC client through the reverse HTTP (RHTTP) proxy, which was presenting the newly installed CA signed machine SSL certificate.
However, when I connected to the MOB interface of the Lookup Service directly via port 7444, the expired “ssoserver” certificate was presented.
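The same comparison can be scripted. The sketch below, which uses only the Python standard library, fetches whatever certificate each port serves and prints a SHA-1 thumbprint for it; two different thumbprints mean the proxy and the direct port are presenting different certificates. The function names are hypothetical, port 443 for the reverse proxy is an assumption about a default installation, and an older endpoint may require a client that still permits its TLS version.

```python
import hashlib
import ssl


def pem_thumbprint(pem: str) -> str:
    """Return the SHA-1 thumbprint (uppercase hex) of a PEM-encoded certificate."""
    der = ssl.PEM_cert_to_DER_cert(pem)
    return hashlib.sha1(der).hexdigest().upper()


def served_thumbprints(host: str, ports=(443, 7444)) -> dict:
    """Map each port to the thumbprint of the certificate the server presents there."""
    result = {}
    for port in ports:
        # With no ca_certs argument the chain is not validated, so an
        # expired certificate can still be retrieved for inspection.
        pem = ssl.get_server_certificate((host, port))
        result[port] = pem_thumbprint(pem)
    return result
```

Calling `served_thumbprints("psc.example.com")` (a placeholder hostname) returns a dictionary keyed by port. In our situation the two values would have differed, with the 7444 entry belonging to the old “ssoserver” certificate.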
With vSphere 6, the “ssoserver” certificate is effectively an internal certificate, as your connections can be brokered through the RHTTP proxy service going forward. The reason port 7444 may remain exposed in your vSphere 6 installation is for backward-compatibility with vCenter 5.5, as the PSC can support both vCenter versions during the upgrade process.
With an expired “ssoserver” certificate, access to the Lookup Service MOB and PSC-Client will not work.
Considering that this certificate is now internal, and the machine SSL certificate is presented through the RHTTP proxy service, it didn’t make sense for us to continue maintaining a CA signed certificate for this component. Therefore we decided to have the VMCA issue a new certificate for this component, following the steps documented in the following VMware KB: Replacing the Lookup Service SSL certificate on a Platform Services Controller 6.0 (2118939)
Some notes concerning KB 2118939:
- Plan for down time. This process will require you to restart your PSC and vCenter to take effect.
- Take a snapshot / backup of your vCenter / PSC before you attempt these procedures.
- Follow the instructions exactly! You will ultimately generate a new .p12 (PKCS #12) certificate file and replace only that file under the VMware Secure Token Service (STS) directory. In that directory you’ll also find related files such as ssoserver.crt and ssoserver.key. Do not be tempted to update these other files (or files with the same names under vCenterServer\cfg\sso\keys)! As VMware clearly documented here near the bottom of the page, modifying other certificate files directly, outside of what’s documented in a KB or as directed by VMware GSS, may result in unpredictable behavior. We initially did not heed this warning and had to revert to our snapshot to recover, as the entire vCenter + Embedded PSC failed to come back online.
- If you still wish to use a CA signed certificate for ssoserver, note the KB states at the bottom under the “Additional Information” section: “If you do not want to use the VMware Certificate Authority to generate the certificate, you can manually generate the Certificate Signing Request and provide it to your desired Certificate Authority. For more information, follow the steps for VMware vCenter Single Sign-On 5.5 in Creating certificate requests and certificates for vCenter Server 5.5 components (2061934) to generate new certificate files for the Lookup Service.”
After applying KB 2118939 to our installation, both the Lookup Service MOB and the PSC Client were working again! We were then able to move on and correct the sslTrust strings and clear that issue.
Finally, we had to update SSL trust for the ESX Agent Manager (EAM), Auto Deploy, and, for our vCenter 6 appliances with VSAN clusters, the VSAN Health Check Plugin, based on the following three KBs:
- After replacing the vCenter Server certificates in VMware vSphere 6.0, the ESX Agent Manager solution user fails to log in (2112577)
- If you also have VSAN running atop the vCenter 6 Server Appliance, then this KB will likely also be applicable to you to resolve issues with the VSAN Health Check plugin: Replacing VMware vCenter Server Certificates with SSL Certificate Automation Tool causes the vSAN health check plugin to stop functioning with the error: Error HTTP status 503 (2128353)
- After replacing the VMware vCenter Server certificates in VMware vSphere 6.0, the VMware vSphere Auto Deploy solution user fails to log in (2123631)
While the certificate management process has improved significantly since vSphere 5.5, the number of KBs above confirms that more needs to be done in vSphere 6 to ensure that replacing a certificate doesn’t create so much fallout. To VMware’s credit, however, the issues are at least well documented. Hopefully this article helps you navigate the certificate issues in vSphere 6 more effectively!