Troubleshooting Expired PSC Certificates with vSphere 6

Within three months of my joining the University of Minnesota to work on its virtualization platform, the certificates on our primary production vCenter 6 were about to expire.  We set out to replace the machine SSL certificate, following the procedure documented in this VMware KB: Replacing a vSphere 6.0 Machine SSL certificate with a Custom Certificate Authority Signed Certificate (2112277)

Upon completing this process, we quickly discovered that other solutions hooked into vCenter had broken, which led us to the next series of KBs needed to clean up the broken SSL trust relationships.

It was at the KB for the sslTrust strings that we ran into trouble correcting the issue.  Both KBs essentially have you log in to the Managed Object Browser (MOB) of the Lookup Service, which is a component of the Platform Services Controller (PSC).  When we tried to log in to the MOB with the administrator@vsphere.local account, it repeatedly prompted for credentials as if we were failing authentication.

Lookup Service MOB Repeatedly Prompts for Credentials

To verify the credentials, we logged in to our associated vCenter with this account.  That worked on the first try, which ruled out bad credentials or a locked account.

Another odd symptom we found while investigating the problem was that the Platform Services Controller Client failed to display, returning the following error: “PSC Client HTTP 400 Error – NULL.”

PSC Client HTTP 400 Error - NULL

So what was really wrong here?  The PSC client logs provided the best clue…

The PSC client stores its logs in a separate runtime directory from the other vCenter/PSC logs.  For a Windows-based vCenter 6 installation, I found the logs located here:

<Drive Letter>:\ProgramData\VMware\vCenterServer\runtime\vmware-psc-client\logs

Looking at the psc-client.log (or the wrapper.log), I found the following error that indicated the problem:

java.lang.RuntimeException: javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: Server certificate chain is not trusted and thumbprint doesn’t match

Caused by: sun.security.validator.ValidatorException: PKIX path validation failed: java.security.cert.CertPathValidatorException: timestamp check failed

PSC Client – Mismatch SSL Thumbprint and Expired Certificate
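If you would rather search the logs than read them by eye, here is a minimal Python sketch that flags lines containing the errors quoted above.  The function names are my own, and the patterns are nothing more than substrings of those two messages, so treat it as a starting point rather than a complete list of certificate errors.

```python
import re
from pathlib import Path

# Substrings taken from the two errors quoted above; extend as needed.
CERT_ERROR_PATTERNS = [
    re.compile(r"SSLHandshakeException"),
    re.compile(r"certificate chain is not trusted", re.IGNORECASE),
    re.compile(r"timestamp check failed"),
]


def find_cert_errors(lines):
    """Return (line_number, text) pairs for lines matching any pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in CERT_ERROR_PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log(path):
    """Scan one log file, replacing undecodable bytes instead of failing."""
    with Path(path).open(encoding="utf-8", errors="replace") as handle:
        return find_cert_errors(handle)
```

Point `scan_log` at psc-client.log or wrapper.log in the runtime directory mentioned earlier.  A hit on "timestamp check failed" is the hint that an expired certificate is involved, not just a trust mismatch.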

Ultimately we determined that this vCenter 6 installation had been upgraded from 5.5, and while it was running version 5.5 the self-signed certificates were replaced with CA signed equivalents, including the “ssoserver” certificate.  When vCenter was upgraded to 6.0, the CA signed “ssoserver” certificate was retained, and it had since expired.

This problem wasn’t obvious because we were connecting to the Lookup Service and the PSC client through the reverse HTTP (RHTTP) proxy, which was presenting the newly installed CA signed machine SSL certificate:

Lookup Service Connection Through RHTTP Proxy

However, if I tried connecting to the MOB interface of the lookup service directly via port 7444, then the expired “ssoserver” certificate was presented:

Lookup Service Direct Connection via Port 7444

With vSphere 6, the “ssoserver” certificate is effectively an internal certificate, as your connections can be brokered through the RHTTP proxy service going forward.  The reason port 7444 may remain exposed in your vSphere 6 installation is for backward-compatibility with vCenter 5.5, as the PSC can support both vCenter versions during the upgrade process.

With an expired “ssoserver” certificate, access to the Lookup Service MOB and PSC-Client will not work.
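To compare what each port presents without clicking through browser certificate dialogs, a small script can pull the certificate from both endpoints and report whether it has expired.  The following is a minimal Python sketch, not something taken from the KBs.  It assumes the RHTTP proxy listens on the standard HTTPS port 443 and that the `openssl` command-line tool is on the PATH; the function names are made up for illustration.

```python
import ssl
import subprocess
import time


def fetch_pem(host, port):
    """Fetch the server certificate as PEM without validating it."""
    return ssl.get_server_certificate((host, port))


def parse_enddate(output):
    """Turn 'notAfter=Jan  5 09:34:43 2018 GMT' into epoch seconds."""
    label, _, value = output.strip().partition("=")
    if label != "notAfter" or not value:
        raise ValueError(f"unexpected openssl output: {output!r}")
    return ssl.cert_time_to_seconds(value)


def not_after_from_pem(pem):
    """Ask the openssl CLI for the notAfter field of a PEM certificate."""
    result = subprocess.run(
        ["openssl", "x509", "-noout", "-enddate"],
        input=pem, capture_output=True, text=True, check=True,
    )
    return parse_enddate(result.stdout)


def is_expired(not_after_epoch, now=None):
    """True once the current time has reached the notAfter timestamp."""
    current = time.time() if now is None else now
    return current >= not_after_epoch


def check_endpoints(host, ports=(443, 7444)):
    """Return {port: (not_after_epoch, expired)} for each port on host.

    Port 443 for the RHTTP proxy is an assumption; 7444 is the direct
    Lookup Service port discussed above.
    """
    report = {}
    for port in ports:
        not_after = not_after_from_pem(fetch_pem(host, port))
        report[port] = (not_after, is_expired(not_after))
    return report
```

Calling `check_endpoints("vcenter.example.edu")` (a placeholder host name) against an installation in the state described here should report a valid certificate on 443 and an expired one on 7444.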

Considering that this certificate is now internal, and the machine SSL certificate is presented through the RHTTP proxy service, it didn’t make sense for us to continue maintaining a CA signed certificate for this component.  Therefore we decided to have the VMCA issue a new certificate for this component, following the steps documented in the following VMware KB: Replacing the Lookup Service SSL certificate on a Platform Services Controller 6.0 (2118939)

Some notes concerning KB 2118939:

  • Plan for downtime.  This process requires you to restart your PSC and vCenter for the change to take effect.
  • Take a snapshot / backup of your vCenter / PSC before you attempt these procedures.
  • Follow the instructions exactly!  You will ultimately be generating a new .p12 (PKCS #12) certificate file and replacing only that file under the VMware Secure Token Service (STS) directory.  In that directory you’ll find other related files, such as ssoserver.crt and ssoserver.key.  Do not be tempted to update these other files (or files with the same names under vCenterServer\cfg\sso\keys)!  As VMware clearly documented here near the bottom of the page, attempting to modify other certificate files directly, outside of what’s documented in a KB or as directed by VMware GSS, may result in unpredictable behavior.  We initially did not heed this warning and had to revert to our snapshot to recover, as the entire vCenter + Embedded PSC failed to come back online.
  • If you still wish to use a CA signed certificate for ssoserver, note the KB states at the bottom under the “Additional Information” section: “If you do not want to use the VMware Certificate Authority to generate the certificate, you can manually generate the Certificate Signing Request and provide it to your desired Certificate Authority. For more information, follow the steps for VMware vCenter Single Sign-On 5.5 in Creating certificate requests and certificates for vCenter Server 5.5 components (2061934) to generate new certificate files for the Lookup Service.”
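
One way to double-check that only the .p12 file changed is to fingerprint the directory before and after the procedure.  This is a sketch under my own assumptions, not part of KB 2118939: the function names are hypothetical, and you would pass in the STS directory path from the KB yourself.

```python
import hashlib
from pathlib import Path


def fingerprint_dir(directory):
    """Map each file name in directory to the SHA-256 of its contents."""
    return {
        path.name: hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(Path(directory).iterdir())
        if path.is_file()
    }


def changed_files(before, after):
    """Names added, removed, or modified between two fingerprints."""
    return sorted(
        name
        for name in before.keys() | after.keys()
        if before.get(name) != after.get(name)
    )
```

Run `fingerprint_dir` once before you start and once after you finish; `changed_files` should then list only the .p12 file.  This does not replace the snapshot, it only makes an accidental edit to a neighboring file visible.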

After applying KB 2118939 to our installation, both the Lookup Service MOB and the PSC Client were working again!  We were then able to move on and correct the sslTrust strings and clear that issue.

Finally, we had to update SSL trust for the ESX Agent Manager (EAM), for Auto Deploy, and, on our vCenter 6 Appliances with VSAN clusters, for the VSAN Health Check Plugin, based on the following three KBs:

While the certificate management process has improved significantly from vSphere 5.5, the number of KBs above confirms that more needs to be done under vSphere 6 to ensure replacing a certificate doesn’t create so much fallout.  However, to VMware’s credit, at least the issues are well documented.  Hopefully this article helps you navigate the certificate issues in vSphere 6 more effectively!

Aaron Smith
Architect: Virtualization Platform, IT@UMN
Twitter: @awsmith99
