Within 3 months of joining the University of Minnesota to work on their virtualization platform, our primary production vCenter 6 had expiring certificates.  So we set out to replace the machine SSL certificate, following the procedures documented in this VMware KB: Replacing a vSphere 6.0 Machine SSL certificate with a Custom Certificate Authority Signed Certificate (2112277)

Upon completing this process, we quickly discovered other solutions hooked into vCenter broke, which led us to discover the next series of KBs necessary to clean up the broken SSL trust relationships.

It was at this KB (for the sslTrust strings) that we ran into trouble correcting the issue.  Both KBs essentially have you log in to the Managed Object Browser (MOB) of the Lookup Service (which is a component of the Platform Services Controller.)  When we tried to log in to the MOB with the administrator@vsphere.local account, it repeatedly prompted for the credentials as if we were failing authentication.

Lookup Service MOB Repeatedly Prompts for Credentials

To verify, we were able to log in to our associated vCenter with this account on the first try, which ruled out bad credentials or a locked account.

Another odd symptom we found while investigating the problem was that the Platform Services Controller (PSC) Client failed to display, returning the following error: “PSC Client HTTP 400 Error – NULL.”

PSC Client HTTP 400 Error – NULL

So what was really wrong here?  The PSC client logs provided the best clue…

The PSC client stores its logs in a separate runtime directory from the other vCenter/PSC logs.  For a Windows-based vCenter 6 installation, I found the logs located here:

<Drive Letter>:\ProgramData\VMware\vCenterServer\runtime\vmware-psc-client\logs

Looking at the psc-client.log (or the wrapper.log), I found the following error that indicated the problem:

java.lang.RuntimeException: javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: Server certificate chain is not trusted and thumbprint doesn’t match

Caused by: sun.security.validator.ValidatorException: PKIX path validation failed: java.security.cert.CertPathValidatorException: timestamp check failed

PSC Client – Mismatch SSL Thumbprint and Expired Certificate

Ultimately we determined that this vCenter 6 installation was upgraded from 5.5, and during the time it was running under version 5.5 the self-signed certificates were replaced with CA signed equivalents, which included the “ssoserver” certificate.  Then when vCenter was upgraded to 6.0, the “ssoserver” CA signed certificate was retained, but had now expired.

This problem wasn’t obvious because we were connecting to the lookup service and the PSC client through the Reverse HTTP proxy, which was presenting the newly installed CA signed machine SSL certificate:

Lookup Service Connection Through RHTTP Proxy

However, if I tried connecting to the MOB interface of the lookup service directly via port 7444, then the expired “ssoserver” certificate was presented:

Lookup Service Direct Connection via Port 7444

With vSphere 6, the “ssoserver” certificate is effectively an internal certificate, as your connections can be brokered through the RHTTP proxy service going forward.  The reason port 7444 may remain exposed in your vSphere 6 installation is for backward-compatibility with vCenter 5.5, as the PSC can support both vCenter versions during the upgrade process.

With an expired “ssoserver” certificate, access to the Lookup Service MOB and PSC-Client will not work.
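
If you want to check which certificate an endpoint is presenting without opening a browser, a small PowerShell helper like the sketch below works; the PSC host name is only an example, and the validation callback deliberately accepts any certificate so you can inspect even an expired or untrusted one:

function Get-RemoteSslCertificate {
    param([string]$HostName, [int]$Port)
    # Open a TCP connection and negotiate SSL, accepting whatever certificate is presented
    $tcpClient = New-Object System.Net.Sockets.TcpClient($HostName, $Port)
    try {
        $sslStream = New-Object System.Net.Security.SslStream($tcpClient.GetStream(), $false, { param($s, $cert, $chain, $errors) $true })
        $sslStream.AuthenticateAsClient($HostName)
        # Wrap the remote certificate so properties like NotAfter are easy to read
        New-Object System.Security.Cryptography.X509Certificates.X509Certificate2($sslStream.RemoteCertificate)
    }
    finally { $tcpClient.Close() }
}

# Machine SSL certificate from the reverse HTTP proxy (443) vs. the legacy SSO port (7444)
Get-RemoteSslCertificate -HostName 'psc.example.edu' -Port 443 | Select-Object Subject, NotAfter
Get-RemoteSslCertificate -HostName 'psc.example.edu' -Port 7444 | Select-Object Subject, NotAfter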

Considering that this certificate is now internal, and the machine SSL certificate is presented through the RHTTP proxy service, it didn’t make sense for us to continue maintaining a CA signed certificate for this component.  Therefore we decided to have the VMCA issue a new certificate for this component, following the steps documented in the following VMware KB: Replacing the Lookup Service SSL certificate on a Platform Services Controller 6.0 (2118939)

Some notes concerning KB 2118939:

  • Plan for down time.  This process will require you to restart your PSC and vCenter to take effect.
  • Take a snapshot / backup of your vCenter / PSC before you attempt these procedures.
  • Follow the instructions exactly!  You will ultimately be generating a new .p12 (PKCS #12) certificate file and will replace that file only under the VMware Secure Token Service (STS) directory.  But in that directory you’ll find other related files such as the ssoserver.crt and ssoserver.key files.  Do not be tempted to try updating these other files (or files with the same names under vCenterServer\cfg\sso\keys)!  As VMware clearly documented here near the bottom of the page, attempting to modify other certificate files directly outside of what’s documented in a KB or as directed by VMware GSS may result in unpredictable behaviors.  We initially did not heed this warning and had to revert our snapshot to recover, as the entire vCenter + Embedded PSC failed to come back online.
  • If you still wish to use a CA signed certificate for ssoserver, note the KB states at the bottom under the “Additional Information” section: “If you do not want to use the VMware Certificate Authority to generate the certificate, you can manually generate the Certificate Signing Request and provide it to your desired Certificate Authority. For more information, follow the steps for VMware vCenter Single Sign-On 5.5 in Creating certificate requests and certificates for vCenter Server 5.5 components (2061934) to generate new certificate files for the Lookup Service.”

After applying KB 2118939 to our installation, both the Lookup Service MOB and the PSC Client were working again!  We were then able to move on and correct the sslTrust strings and clear that issue.

Finally, we had to update SSL trust for the ESX Agent Manager (EAM), Auto Deploy, and, for our vCenter 6 Appliances with VSAN clusters, the VSAN Health Check Plugin, each covered by its own VMware KB.

While the certificate management process has improved significantly from vSphere 5.5, the number of KBs above confirms that more needs to be done in vSphere 6 to ensure replacing a certificate doesn’t create this much fallout.  However, to VMware’s credit, at least the issues are well documented.  Hopefully this article helps you navigate the certificate issues in vSphere 6 more effectively!

Aaron Smith
Architect: Virtualization Platform, IT@UMN
Twitter: @awsmith99

I thought I would share a PowerCLI script I recently wrote that acts as a wrapper around plink.exe to enable executing commands against one or many ESXi hosts as securely as possible.  In case you’re unaware, plink.exe needs the password in plain text in order to execute in batch mode, and this script takes multiple steps to mask/protect that password during execution.  I’ve already found this useful to complete configuration tasks via shell commands that PowerCLI does not currently expose (or cannot access), or to perform operational tasks such as restarting the management agents on all hosts in a cluster to clear the deprecated VMFS warning we see in our environment @UMN every time new storage is added.

You’ll need to download a copy of plink.exe (which is part of the PuTTY suite) and place it somewhere on your Windows computer (I keep my PuTTY binaries in C:\Putty for simplicity.)  This script is designed with the following considerations in mind:

  • Scale to one or more ESXi hosts.
  • Automatically start the SSH service on the ESXi host, if not running.
  • Automatically stop the SSH service on the ESXi host, but only if this script had to start it.
  • Use the PowerShell “PSCredential” object to securely store the user ID/password used to authenticate to the ESXi host via plink.exe/SSH.  The credentials are decrypted during execution since plink.exe needs the password in plain text, but best-effort measures are in place to keep the plain text password as protected as possible, especially with respect to preventing the password from displaying in the output / on screen (a rough sketch of this pattern follows this list).
  • Enable/disable the plink.exe verbose switch.
  • Enable a switch to auto-accept the ESXi host’s SSH key.
  • Some parsing/cleanup of the output and verbose/error output to make it easier to consume the results.
  • Echo back the command, user ID, and date/time of execution in case you want to pipe the output into another source such as a CSV file for future reference.
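
The script itself is linked further down, so rather than reproduce it here, the following is only a rough sketch of the credential-handling pattern described above; the host name, command, and variable names are illustrative and are not the script’s actual code:

# Gather the ESXi credentials once as a PSCredential object
$esxiCredential = Get-Credential -Message 'ESXi root credentials'

# Example target and command (illustrative only)
$esxiHostName = 'esxi01.example.edu'
$commandText  = 'vmware -lv'

# Decrypt the password only at the moment plink.exe needs it
$plainPassword = $esxiCredential.GetNetworkCredential().Password

# Build the plink.exe argument list; -batch prevents interactive prompts
$plinkArgs = @('-ssh', '-batch', '-l', $esxiCredential.UserName, '-pw', $plainPassword, $esxiHostName, $commandText)

# Capture stdout and stderr together so the verbose/error output can be scrubbed afterward
$rawOutput = & 'C:\Putty\plink.exe' @plinkArgs 2>&1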

This script makes the following assumptions:

  • You have PowerCLI loaded into your current PowerShell session.
  • You’re using a supported combination of PowerShell, PowerCLI and Windows.
  • You’re already connected to your target vCenter(s) via PowerCLI.

Here are links to my script in the following locations:

  • GitHub
  • VMware Developer Center (which points to GitHub)
  • VMware {code} … forthcoming; code repository is not up and running at the time of publishing this post.

Below are some examples of running this script against ESXi 5.5 hosts using PowerCLI 5.8 Release 1.  However, this script was originally written and tested against ESXi 6.0 hosts using PowerCLI 6.3 Release 1, so it should work for multiple versions of PowerCLI and vSphere.  I’ll provide some examples against ESXi 6.0 as well for reference.

Example 1: Simple execution running the ‘vmware -lv’ command against all ESXi 5.5 hosts in a target cluster using PowerCLI 5.8 Release 1, also specifying the plink.exe verbose switch and the “auto accept host key” switch. I captured the output to a variable to make it easier to display the output.

Running ‘vmware -lv’ using the script against all ESXi hosts in a single cluster.

If I expand out just the error/verbose output from a single ESXi host, we can see the verbose output from plink.exe, which is useful to reference when troubleshooting issues with running this script.  During development, this is one place where PowerShell would capture the password in the output (because of how plink.exe pipes its verbose output to the error output stream, and therefore how we have to capture that in PowerShell), so the script scrubs that data to remove the password and empty lines, improving readability and security.

Displaying verbose/error output from plink.exe from one ESXi host.

Notice that neither the script’s output nor the verbose/error output displays the password that was passed to plink.exe in plain text!  This improves the overall security of using plink.exe because the password is never displayed on screen, as would happen if you tried running plink.exe by hand.  If you examine the script’s code, you’ll see how it takes as many best-effort measures as possible to protect the password after it decrypts it for use with plink.exe.
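
Continuing the illustrative sketch from earlier (again, not the script’s actual code): assuming $rawOutput holds the merged output from plink.exe and $plainPassword still holds the decrypted password, the scrubbing step looks roughly like this:

# stderr lines from a native command come back as ErrorRecord objects when redirected with 2>&1
$verboseLines = $rawOutput |
    Where-Object { $_ -is [System.Management.Automation.ErrorRecord] } |
    ForEach-Object { $_.ToString() }

# Drop empty lines and any line containing the decrypted password before displaying or storing the output
$verboseLines = $verboseLines | Where-Object { $_ -and ($_ -notmatch [regex]::Escape($plainPassword)) }

# Discard the plain text copy of the password as soon as it is no longer needed
Remove-Variable plainPassword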

Example 2: Restarting the management agents against all ESXi 6.0 hosts in a single cluster using PowerCLI 6.3 Release 1 atop Windows 10 + PowerShell 5.0.  Expanding on the prior example, I’ll first exclude the “auto accept host key” switch and demonstrate how the error/verbose output indicates the host key is not yet trusted and does not complete the connection.  Then I’ll add the auto accept switch, but exclude the verbose switch and show how the error output can help identify the problem.  Finally, I’ll have everything correct and successfully restart the management agents.  These ESXi hosts also do not have SSH service started, so I’ll display the start/stop of the SSH service from the vSphere Web Client.

2.a: Connection aborts because the ESXi host key is not cached and plink.exe won’t auto-connect to an untrusted host in batch mode.  Notice the output from each ESXi host is empty.  Expanding the error/verbose output from the first ESXi host reveals the connection failed because the host key was not known.

ESXi Host SSH Key Not Trusted, plink.exe Does Not Connect Without “Auto Accept Key” Switch

2.b: Adding in the “auto accept host key” switch, but providing bad credentials for the ESXi root account.  I’m also excluding the plink.exe verbose switch.  Again the output from each ESXi host is empty and expanding the error output will reveal the problem.

SSH connection auto-accepted, but bad credentials for “root” provided.

2.c: Everything is correct this time and the management agents are successfully restarted on all ESXi hosts in the cluster.

Successful restart of ESXi management agents.

2.d: Looking at the Tasks within the vSphere Web Client, we can see multiple start/stop actions initiated against these ESXi hosts as I executed the above examples.  Note that restarting the management agents causes the ESXi hosts to disconnect from vCenter briefly.  As such, the script failed to stop SSH after completing its work.  This is expected behavior, as PowerCLI / vCenter cannot manage the host’s services while the host remains disconnected from vCenter.  This issue could be addressed in a future version by having the script check the connectivity status of the host and wait “x” seconds for connectivity to return before attempting to shut down the SSH service.  Always opportunities for improvement!

vSphere Web Client shows script stopping/starting the SSH service.

I hope you find the script useful, and I welcome feedback, comments, (suggestions for) improvements, or success stories on how this script saved you time and effort in your daily professional life!

Aaron Smith
Architect: Virtualization Platform, IT@UMN
Twitter: @awsmith99

It’s been too long since I’ve posted anything new!  Time goes by faster than expected, with lots of changes for both Kirk Marty (@kmarty009) and myself with respect to our professional careers.  Both of us have left VMware to pursue other opportunities.  I decided after being a TAM for a year that I ultimately missed being hands-on with technology too much.  I had no issues with VMware as a company or my role as a TAM.  It was a great opportunity and the experience working for a world-class technology company was both amazing and overwhelming.  It wasn’t an easy decision to make leaving VMware.

But ultimately I couldn’t pass up being a part of the next big wave of the IT landscape evolving through automation and cloud.  I was given an incredible opportunity to take over architecture for the virtualization platform at the University of Minnesota within the Office of Information Technology (OIT.)  I’ve been in the role for about 3 months now and have been enjoying it immensely.  The openness of the education and research community and being in a position to help advance the pursuit of knowledge is very rewarding!

Being back on the customer side (and in the public sector for the first time in my professional career), I plan to publish new blog posts that share new knowledge I gain during this next stage of my virtualization journey.  I love sharing information in hopes of educating and saving you time.

My personal thanks to Kirk Marty for continuing to keep our site up and running!  I won’t speak for him on his new digs other than to say he remains on the vendor side and continues to be very well respected and successful in his endeavors.  Between the two of us, we will continue to contribute knowledge to this site in hopes that others such as yourself find it valuable.

Thanks for reading this quick update, and be on the lookout for more (technical) content to come soon!

Aaron Smith, Architect – Virtualization Platform at IT@UMN, Twitter: @awsmith99

Hi Everyone!

Been a long time since I’ve posted, partly due to the fact that I’ve switched from being a VMware Sr. Systems Engineer (SE) in pre-sales to a VMware Sr. Technical Account Manager (TAM) under professional services and have been focused on the new role!  But I am eager to write some new posts, and have a few topics in mind, so keep your eyes open!

An item that recently came up for one of my TAM customers is supportability of Multi-NIC vMotion beyond 2 physical uplinks (commonly referred to in this context as vmnic adapters.)  There may be some confusion around the fact that the reference KB below for configuring Multi-NIC vMotion walks you specifically through a 2 vmnic setup, and whether that implies the supported configuration maximum for this feature:

http://kb.vmware.com/kb/2007467

Does this imply that the supported configuration maximum is 2 adapters?  Not at all, but this even had me initially confused and unsure, so I started checking internally.

Ultimately, our customer pointed out the vSphere documentation section on vMotion Networking Requirements, which clearly states that you can support more than 2 physical adapters for this feature.  Example from vSphere 5.5 documentation:

https://pubs.vmware.com/vsphere-55/topic/com.vmware.vsphere.vcenterhost.doc/GUID-3B41119A-1276-404B-8BFB-A32409052449.html

You can configure multiple NICs for vMotion by adding two or more NICs to the required standard or distributed switch. For details, see the VMware knowledge base article at http://kb.vmware.com/kb/2007467.

I’ve made the suggestion that this KB article be updated to call out the same statement written in our vSphere documentation to help clarify this fact to those who may reference it in support situations or when determining their options for implementing this feature.  Our customer in question configures 4 physical uplinks for Multi-NIC vMotion to support effectively moving VMs with very large amounts of RAM assigned between ESXi hosts.
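
If you want to see how many vMotion-enabled VMkernel adapters a host currently has, a quick PowerCLI check along these lines should do it (the host name is just an example):

# List the VMkernel adapters on a host that are enabled for vMotion
Get-VMHost 'esxi01.example.edu' |
    Get-VMHostNetworkAdapter -VMKernel |
    Where-Object { $_.VMotionEnabled } |
    Select-Object Name, IP, PortGroupName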

Aaron Smith
Sr. Technical Account Manager | VCP5-DCV
@awsmith99

 

Well, I’m finally writing my first post!  I decided to start this blog about a year ago, and it’s a bit embarrassing that I’m only now committing to Virtually Understood.  You know how it is…like the Nationwide commercial that says “life comes at you fast”.  Big thanks to my colleague and friend Aaron for being a contributor.  We look forward to more of his articles in the future.  Off the soapbox now.

VMware just recently released the long-awaited vSphere 6.0, the successor to vSphere 5.5.  With numerous enhancements and features, I can envision many beginning to make the transition to vSphere 6.0.  I think one of the biggest announcements/improvements is the VMware vCenter Server Appliance capabilities.  In past versions of vSphere, we were required to install vCenter onto a Windows-based OS.  In vSphere 5.0, VMware announced the first Linux-based appliance version, but it had numerous limitations and was deemed not enterprise ready.  We figured that the next few releases would address those limitations and increase scalability.  However, it took until vSphere 6.0 to finally match the “full capabilities” of the Windows-based version, which now gives large enterprises the ability to begin utilizing the appliance-based vCenter – and many customers have been asking for it!  Let’s review some of the new capabilities, and keep in mind that these maximums apply to both Windows and the VCSA.

  • Hosts per Virtual Center = 1,000
  • Powered-on VMs per Virtual Center = 10,000
  • Hosts per Cluster = 64
  • VMs per Cluster = 8,000
  • Linked Mode = Yes

The only part you need to be cognizant of is database support.  Most customers are using Microsoft SQL for Windows as the primary ODBC source.  The appliance does not have the capability to use Microsoft SQL databases; therefore, you must utilize the embedded Postgres database or use Oracle.  One of the more common questions I get about this is the migration path from vCenter on Windows to the VCSA.  Unfortunately, there is no direct migration; however, the new “Cross-vCenter VMotion” gives you the flexibility to migrate to the new appliance without any downtime for virtual machines and hosts, making it much easier than previous methods (cold migration).

One thing I noticed is how simple the new upgrade process is for the vCenter Appliance.  After extracting the vCSA media to a directory on your computer, you execute a setup file, which brings you to a web page that walks you through upgrading to the new appliance.

1.  Click the “Upgrade” Button.

2.  This will then initiate the “vCenter Server Appliance Deployment”

– Enter an ESXi host that you’d like to deploy the new vCenter Appliance to.

3.  Next you will need to provide a name for the new vCenter Appliance instance.  This is the only part that I didn’t like.  My vCenter Server is named “vc01”, and I could not deploy the new appliance with the same name…  Therefore, I had to give it a different display name.  Once the upgrade completed and I deleted the old vCenter 5.5 instance, I renamed the new appliance “vc01” in the GUI and performed a Storage vMotion to rename the files associated with it.  Just be aware that this is the case.

4.  Next you will enter all your relevant information for the vCenter Upgrade.

5.  Next choose your deployment option.  Since this is my home lab, I want the smallest resource configuration model.

6.  Next you will select a Datastore for the new appliance

7.  Next you will choose the network that you want to assign.  I chose a DHCP network because, as part of the migration, the actual IP and DNS name will move to the new appliance.

8.  You are ready to deploy your new instance of the vCenter Appliance.  Remember, your 5.5 instance is still operational and running.  The data will then begin transferring to the new 6.0 appliance.

9.  The Migration process begins!

10.  Once the migration completes, your new vCenter is online.  Do remember that it will assume the “vc01” DNS name and IP information.

Overall, I think VMware has done a great job with the new upgrade process for the VCSA!  It’s definitely better than previous versions.  I do recommend that you try this in your lab first, before a real production environment, just to familiarize yourself with the process.  It seems pretty seamless and easy!

By: Kirk Marty | vCloud Air Specialist

@vCaptain_Kirk

October marks my one-year anniversary at VMware (already?!)  Since I joined in 2013, I’ve been asked by friends and family what I do and what VMware is.  My wife (a former software developer turned project manager) always chuckled at my attempts to describe virtualization because I would, true to engineer form, try to explain it purely in technical terms, taking my unfortunate victim through what virtualization is, how it helps consolidate hardware, etc.

Recently I came up with a new idea for how to describe virtualization to a non-technical audience that I thought I would share.  I believe this explanation is framed in a way that many people can relate to in this age of digital everything.  So here goes…

What is virtualization?  Remember when the only way you could watch a movie was to grab the single DVD from your shelf, put it in the player, sit down and enjoy?  But if you decided to watch something different, you had to get up (ugh), grab the case, and put in the new DVD.  Even worse, what if you wanted to watch the movie in your bedroom vs. the living room?  Then you have to buy another TV and DVD player?!

Now, you can buy and store movies on your computer or tablet.  No more DVDs.  No more buying multiple devices to play them in different locations.  You can store multiple movies in one place, easily switch between them and take them with you wherever you want.  And we can do the same thing with books!

So virtualization makes possible for computers what we already do with digital movies and books … it makes it possible to store and use multiple computers in one.

Simple.  Relating virtualization to something that’s being done or is possible in many peoples’ lives today seems like a great way to help them understand our core business at VMware.

So how do you explain virtualization to the masses in a way that resonates well?  Thanks for reading!

Aaron Smith (Sr. Systems Engineer, VMware), @awsmith99 on Twitter

The following procedures can be used to create local administrative IDs on the vCenter Server Appliance (vCSA) under version 5.5.  Such accounts are useful as either service accounts for third party applications that need access to the operating system of the appliance or as a means of eliminating use of the root account for improved security and non-repudiation purposes.  However, please note the following procedures were specifically created with service accounts in mind.  As such, some of the following steps (which will be noted) should not be applied if you wish to create local admin IDs to replace use of the root account on the appliance for non-repudiation purposes.  At the end of this post, I will also provide links to reference material I used in my research.

Important: Sections of these procedures may need to be completed again after upgrades to the vCSA.  For example, in my testing, after setting up a local administrative ID on the appliance under version 5.5 Update 1b, then upgrading it to 5.5 Update 2, one command had to be executed again to enable the account to function as a service account (specifically, to protect against a DoS account lockout attack.)

Disclaimer: These procedures are provided as-is for information sharing purposes with no expressed warranty or guarantee that they will work in your environment.  I created this process through personal research and testing, but execute them in your environment at your own risk and test thoroughly!  I strongly advise you review this information with your security team to validate alignment with your organization’s policies and best practices.

For the purposes of demonstrating the procedures, we will create a service account named “viadmin” on a greenfield vCSA at version 5.5 Update 2.

1. Login to the vCSA appliance using the root account.

2. Create a new user account.

# useradd -g users -G wheel -d /home/viadmin -m -s /bin/bash viadmin

Running the “useradd” command.

The switches for this command are as follows:

  • -g: Sets the default group.
  • -G: List of additional group memberships.
  • -d: Sets the location of the user’s home directory.
  • -m: Creates the user’s home directory with the appropriate permissions.
  • -s: Sets the default shell for the user account.

In this example, the user account “viadmin” is created with a home directory at /home/viadmin and the default shell set to bash.  Additionally, the account is added to the “wheel” group (though the default group for the account is set to “users”), which will enable SSH and console login access, and enable the ability (among other things) to switch to the root account via the “sudo su -” command.  However, switching to the root account context requires the root password to do so.

If your security policies and best practices do not agree with adding the user account to the “wheel” privileged group, then additional steps will be required to enable SSH access to the group(s) specified and you may need to create those additional groups.

Note: Additional default settings for new user accounts are referenced from the /etc/login.defs and /etc/default/useradd files.  Some example properties from these files, such as those that specify the minimum/maximum password age are shown in the pictures below.

Example Properties from /etc/login.defs

Example Properties from /etc/default/useradd

3. (Optional) Verify the home directory exists for the new user account and the permissions are set correctly.

# ls -l /home | grep "viadmin"

# ls -la /home/viadmin

The picture below shows the default permissions and contents for the new home directory.

Initial Home Directory Permissions and Contents.

4. Set a temporary password for the account, such as “vmware.”  Doing so makes it easier to test the initial login process for the new account and validate functionality before increasing security measures.  You can opt to set a more secure password at this point as well, but be advised that password complexity requirements are not enforced under the root account (as shown in the picture below, where passwd reports a bad password but allows it to be changed regardless.)  Therefore, the best practice is to set a temporary password, log in with the new account, and change it to meet the enforced complexity requirements.

# passwd viadmin

Set Temporary Password.

5. (Service Accounts Only) Disable password expiration for the new user account and remove the minimum password age requirement (which otherwise limits password changes to once per day.)  Verify the changes before and after.

# chage -l viadmin

# chage -m 0 -M -1 -E -1 viadmin

# chage -l viadmin

Remove Minimum Password Age and Password Expiration.

The switches for this command are as follows:

  • -l: Displays the password age and expiration settings for the specified user account.
  • -m: Adjusts the minimum password age.  Setting to 0 removes the restriction and allows for multiple password changes in a single day.
  • -M: Adjusts the maximum password age before a new password is required.  Setting to -1 eliminates the password maximum age.  Only advisable for service accounts, and typically does not align with security policies and best practices for an organization, which often require a password change for service accounts at least once per year (e.g. -M 365) or anytime an administrator departs from the team.
  • -E: Adjusts the number of days before the user password expires and the account becomes locked.  Setting to -1 eliminates the password/account expiration.

6. Attempt to SSH into the vCSA using the new account with the (temporary) password.  Keep this session open.

Verified Successful Login with New Account via SSH.

7. Attempt to login to the vCSA console using the new account with the (temporary) password.  Logout when verified.

Verified Successful Login with New Account via Console.

8. Returning to the SSH session left running from step 6 (or login again via SSH with the new account if you closed it), update the password to meet the vCSA complexity requirements or your organization’s complexity requirements, whichever is greater.

# passwd

Updated Password for New Account.

9. (Service Accounts Only) Via the root account, disable account lockout after multiple failed password attempts, which protects service accounts from a DoS attack, and verify the settings before and after the change.  Note this step will only disable account lockout for the new (service) account; lockout will still be enforced for additional local user accounts unless the same process is also followed for each.  By default, 3 failed login attempts will lock an account on the vCSA 5.5 (the root account is an exception and is protected from being locked by such DoS attacks.)

# faillog -u viadmin

# faillog -u viadmin -m -1

# faillog -u viadmin

Account Lockout Disabled.

Edit the following two files to modify the behavior of the pam_tally.so module to enforce account lockouts on a per-user basis and ignore any accounts where the failed login attempt maximum is explicitly set (as done above.)  These files are used during login via SSH and the console of the vCSA by the PAM modules to control authentication flows and behaviors.

  • /etc/pam.d/sshd
  • /etc/pam.d/login

The following steps use the “vi” text editor and assume you have experience with it.  If not, use your favorite text editor if available on the appliance.  Otherwise spend some time learning the vi editor via a tutorial:

http://www.tutorialspoint.com/unix/unix-vi-editor.htm

# vi /etc/pam.d/sshd

Modify the following line for the pam_tally.so module, adding the “per_user” setting as shown in the picture below.

Add “per_user” Setting to pam_tally.so Module.
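
For reference, the pam_tally.so entry generally takes a form similar to the line below; the exact options on your appliance may differ, and the only change being made is appending per_user to the end of the existing line:

auth     required     pam_tally.so     deny=3 onerr=fail per_user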

# vi /etc/pam.d/login

Make the same adjustment to the login configuration file as was done for the sshd configuration file in the picture above, adding “per_user” to the pam_tally.so module line.

10. Via SSH, attempt 4-5 failed logins, then complete a successful login.  The login should ultimately succeed without locking the account.

4 Failed Login Attempts via SSH, 5th Successful.

Additionally, attempt 4-5 failed logins with the new account via the vCSA console, then complete a successful login.  Again, the login should ultimately succeed without locking the account.  Note the console will not report the number of failed logins, and after 3 failed login attempts it returns to the default blue login screen.  The example below shows the 4th failed login attempt followed by the 5th successful login with no indication of an account lockout.

4 Failed Logins via Console, 5th Successful.

11. (Optional) Test the ability to switch to the root account context via the new (service) account.  Note this step requires knowledge of the root account password.

# sudo su -

(Enter root account password.)

# id

(Output should indicate you are now running under the root account context.)

Switch to root Account Context.

These procedures were successfully tested against the following versions of the vCSA:

  • 5.5 Update 1b
  • 5.5 Update 1c
  • 5.5 Update 2
  • Upgrade from 5.5 Update 1b -> 5.5 Update 2*

* From my testing, after an in-place upgrade of the vCSA appliance, I had to execute the “faillog” command from step 9 again to disable the account lockout setting on the service account.

References:

I hope you find this information of value!  Please leave feedback or suggestions if you have thoughts on security implications of these procedures or changes to improve this process.

Aaron Smith (Sr. Systems Engineer, VMware), @awsmith99 on Twitter

I recently encountered an issue in my home lab that I’ve not seen before and wanted to share my experiences, as what I learned was really interesting and highlights a value-add of ESXi that isn’t possible with other hypervisors today.

When I logged into my lab to do some work I noticed ESXi03 had a warning icon displayed.  Upon checking the summary tab I found the following warning:

Lost connectivity to device backing the boot filesystem.

Lost connectivity to device mpx.vmhba32:C0:T0:L0 backing the boot filesystem /vmfs/devices/disks/mpx.vmhba32:C0:T0:L0.  As a result, host configuration changes will not be saved to persistent storage.

My ESXi hosts are booting from USB sticks to maximize drive capacity for the VSAN cluster, and this warning message was indicating the USB stick was no longer available to the ESXi host.  I confirmed this further by accessing the host via SSH and checking the status of the storage devices:

# esxcli storage core device list

USB Disk Status “Dead Timeout”

The USB stick was showing a status of “dead timeout.”  Additionally, nothing was listed for the “Devfs Path.”  Contrast this to what ESXi02 was showing for its USB disk:

ESXi02 Disk Status “On”

Additionally, I did a side-by-side comparison of the root filesystem between ESXi02 and ESXi03, finding the following on ESXi03:

ESXi02 vs. ESXi03 Root FS

On ESXi03, the /altbootbank symbolic link was broken, /bootbank was now pointing to /tmp, and /productLocker and /store symbolic links were also broken.  Broken links are denoted in red text in this case.

This was a puzzle to me … how could the hypervisor still be functioning if the USB stick backing the filesystem was no longer available?  Upon digging, I first came across VMware KB 2014558 that helps you determine the installation type of ESXi:

http://kb.vmware.com/kb/2014558

# esxcfg-info -e

boot type: visor-usb

ESXi Embedded Confirmation

Per the KB, this indicates I am running ESXi embedded (USB.)  There’s an interesting side note as well in our installation documentation that I wanted to highlight pertaining to running ESXi from a USB drive, which I chose to follow:

http://pubs.vmware.com/vsphere-55/index.jsp#com.vmware.vsphere.install.doc/GUID-DEB8086A-306B-4239-BF76-E354679202FC.html

VMware strongly recommends using a 4GB or larger USB/SD device. The extra space will be used for an expanded coredump partition on the USB/SD device.   VMware recommends using a high quality USB flash drive of 16GB or larger so that the extra flash cells can prolong the life of the boot media.

Finally, this older VMware blog from 2011 by Kyle Gleed highlights the fact that booting ESXi from a USB device causes it to load the entire hypervisor into RAM to minimize I/O load to the USB device:

http://blogs.vmware.com/vsphere/2011/09/booting-esxi-off-usbsd.html

Unlike a local disk or SAN LUN, USB/SD devices are sensitive to excessive amounts of I/O as they tend to wear over time.  This naturally raises a concern about the life span of the USB/SD device.  When booting from USB/SD keep in mind that once ESXi is loaded, it runs from memory and there is very little ongoing I/O to the boot device. The only reoccurring I/O is when the host configuration is routinely saved to the USB/SD device.

Circling back to the warning message from ESXi03, because my hosts are operating in embedded mode (since the hypervisor is installed on a USB device), the entire hypervisor and its filesystem are loaded into RAM to avoid accelerating the I/O wear of the USB device.  So the warning indicated further changes to the ESXi configuration would not persist after a reboot because the USB disk was no longer reachable by the host.

However, the hypervisor itself could continue to function normally in supporting virtualized workloads, since it was running entirely in RAM!  From reading blog posts on the same warning message, along with posts to the VMware Communities, I found other people who encountered this same issue waited days/weeks until they had time to visit the data center to replace the defective USB device, and had no issues or concerns with the host continuing to run production virtual machines.  I’m not aware of any other hypervisor available today that can sustain the complete loss of its physical disk backing.

Now, in the event that you do encounter this warning message and are running ESXi Embedded, there is a PowerCLI cmdlet you can run to capture the host configuration before you replace the potentially defective USB device, which by extension means reinstalling ESXi.  I write “potentially” because it’s possible the USB device is still healthy and can be recovered by a simple reboot or reseating of the USB device connection.

Before I rebooted my ESXi host to see if the USB device was truly dead, I leveraged PowerCLI to login to vCenter and download the hypervisor configuration in case I would need to rebuild to a new USB stick:

PS C:\…> Get-VMHostFirmware -VMHost esxi03.helios.local -BackupConfiguration -DestinationPath $HOME\Desktop | Format-List

Get-VMHostFirmware To Backup ESXi Configuration

Next, I placed the host into maintenance mode and selected the VSAN “Ensure accessibility” (default) option, which completed successfully.

VSAN Maintenance Mode

After rebooting, the BIOS failed to find the USB device.  In my case, the USB stick was plugged into an internal port on my Dell PowerEdge R610 server.  I shut it down, removed the power cables, opened the chassis, and reseated the USB stick within the internal port.  After a second boot, the BIOS was able to locate the USB stick to start ESXi.  So fortunately I did not have to replace it, reinstall ESXi, perform enough configuration to get it on the network, and use Set-VMHostFirmware to complete the restore.
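
For reference, had the USB stick truly been dead, the restore side would have looked roughly like the command below after reinstalling ESXi, giving the host basic network connectivity, and placing it in maintenance mode.  The bundle file name is illustrative and the parameter names are from memory, so confirm them with Get-Help Set-VMHostFirmware:

# Restore the previously saved configuration bundle to the rebuilt host
Set-VMHostFirmware -VMHost esxi03.helios.local -Restore -SourcePath "$HOME\Desktop\configBundle-esxi03.helios.local.tgz" -HostUser root -HostPassword 'ESXiRootPassword'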

This experience left me with a new nugget of technical information on how ESXi Embedded behaves, and highlights another value-add for using ESXi to virtualize workloads.  You could install ESXi or any hypervisor onto magnetic disk atop a logical volume backed by a RAID-1 set (for example), but comparably that is much more expensive and complex to achieve the same level of protection from disk failure.  And ultimately RAID solutions won’t protect the hypervisor from complete disk failure (i.e. loss of all disks.)

Summary of my recommendations for running ESXi embedded:

  • Use a 4GB+ high quality USB device to install ESXi.  This enables use of a coredump partition on the USB device.
  • Ideally, use a 16GB high quality USB device for the same reasons above, but also to enable the additional cells to prolong the life of the USB device.
  • Offload the ESXi host logs to a syslog server, such as Log Insight (a quick PowerCLI example follows this list).
  • Leverage Host Profiles to ensure the host configurations are consistent.  This is especially helpful when rebuilding a host onto a new USB device because you can use the host profile to automatically reconfigure it properly once it’s joined to vCenter.
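
For the syslog recommendation above, pointing a host at a remote syslog server can be a PowerCLI one-liner; a minimal example is below (the destination name and port are placeholders, and you may also need to open the outbound syslog firewall rule on the host):

# Direct the host's logs to a remote syslog server
Get-VMHost 'esxi03.helios.local' | Set-VMHostSysLogServer -SysLogServer 'loginsight.example.edu' -SysLogServerPort 514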

Aaron Smith (Sr. Systems Engineer, VMware)

Aaron Smith here again with my second blog post (I’m sure I will soon lose count!)

Anyhow, today I wanted to share my testing of the automatic upgrade of VMware Tools within a Windows VM, as I had a recent conversation with one of my customers on this feature.  For those not familiar, here’s some information published about the capability to enable a VM to automatically check for and install updates for VMware Tools upon power on/reboot:

https://pubs.vmware.com/vsphere-55/index.jsp?topic=%2Fcom.vmware.vsphere.vm_admin.doc%2FGUID-A2491004-1C67-4E14-B47B-807E20C19108.html

“The guest operating system checks the version of VMware Tools when you power on a virtual machine. The status bar of the virtual machine displays a message when a new version is available … For Windows and Linux guest operating systems, you can configure the virtual machine to automatically upgrade VMware Tools. Although the version check is performed when you power on the virtual machine, on Windows guest operating systems, the automatic upgrade occurs when you power off or restart the virtual machine. The status bar displays the message Installing VMware Tools … when an upgrade is in progress.”

In other words, the automatic upgrade requires a second reboot of the VM to complete the installation on Windows.  For Linux, the following applies:

“When you upgrade VMware Tools on Linux guest operating systems, new network modules are available but are not used until you either restart the guest operating system or stop networking, unload and reload the VMware networking kernel modules, and restart networking. This behavior means that even if VMware Tools is set to automatically upgrade, you must restart or reload network modules to make new features available.”

Getting back to the conversation I had with one of my customers, it had been a while since the feature was last tested, and the last time it was reviewed they encountered issues, which possibly included VMware Tool upgrades getting initiated after a vMotion (which to the ESXi host receiving the VM was possibly being interpreted as a new VM power on action.)

I decided to test the automatic upgrade on a Windows 7 VM, and focused on two test cases:

1. Does a vMotion trigger the automatic upgrade of the VMware Tools (and subsequent reboot) on a Windows VM?

2. Does a reboot of the guest OS trigger the automatic upgrade of the VMware Tools (vs. having to power off, then back on?)

For the first test, I downloaded an older version of the VMware Tools from the following site. This is a handy repository for obtaining different versions of the VMware Tools for the guest operating systems we support:

http://packages.vmware.com/tools/esx/index.html

There’s a useful document that helps correlate the VMware Tools version to the ESXi version:

http://packages.vmware.com/tools/versions

Test #1: Does a vMotion trigger the automatic upgrade of the VMware Tools (and subsequent reboot) on a Windows VM?

For the test, I downloaded tools 9.0.10 (build 1479193), which is paired with ESXi 5.1 Update 2 (or ESXi 5.1 Express Patch 04.)

Next, I removed the original installation of VMware Tools from my Windows 7 VM and replaced it with the older copy I downloaded from the above site.  Then I confirmed the VM reported its tools were outdated.  Also note the VM is currently running on ESXi03.

VMware Tools Reports itself as out of date.

Next, I modified the VM settings to enable automatic tool upgrades (VM Options -> Tools Upgrades.)

Enable Automatic Tool Upgrades.

Next, I completed a combined datastore + host vMotion (at the time of testing I only had local storage implemented before getting VSAN off the ground.)  I chose to vMotion from ESXi03 -> ESXi01.

Datastore + Host vMotion.

While vMotion was in progress, I logged into the guest OS and opened the about box to display the tools version currently installed.

Outdated tools version 9.0.10 (build 1479193).

After the vMotion completed, the VM resided on ESXi01.  After waiting a few minutes, nothing happened.  No tool upgrade was initiated.  That’s good news and expected behavior!

No change to VM after vMotion from ESXi03 -> ESXi01.

For additional confidence, I initiated the same vMotion back to ESXi03 and observed no tool upgrade/reboot of the Windows VM.

Test #2: Does a reboot of the guest OS trigger the automatic upgrade of the VMware Tools (vs. having to power off, then back on?)

This test is simple to execute.  With the automatic tool upgrade feature still enabled on the VM, I initiated a reboot and observed what would happen from the console of the guest OS.

Shortly after Windows 7 reached the Control-Alt-Delete screen, a prompt displayed indicating the VMware Tools upgrade would automatically reboot the OS to complete the process unless I intervened.

Automatic tool upgrade initiated.

After the guest OS completed its second reboot, the tools version was current to the version provided by the underlying ESXi host!

Tools version is current from Web Client.

Tools version is current as seen within guest OS.

This is a great solution for including VMware Tools upgrades during routine patching of your guest operating systems!  And you can control this VM setting via PowerCLI, so as to ensure it is only enabled ahead of the patch process.  A valid concern is that if this setting is left enabled at all times and a VM reboots due to an issue while a new version of tools is available, it will automatically attempt the upgrade, introducing another variable into the troubleshooting process.

This post leaves me with a couple of planned follow-on items to write about in the near future: (1) testing automatic tool upgrades on a Linux VM, and (2) writing a script to automate enabling/disabling the automatic VMware Tools upgrades.
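
As a preview of item (2), here is a minimal sketch (not a finished script) of how the setting can be toggled through the vSphere API from PowerCLI; the VM name is only an example:

# Build a reconfiguration spec that sets the VMware Tools upgrade policy
$spec = New-Object VMware.Vim.VirtualMachineConfigSpec
$spec.Tools = New-Object VMware.Vim.ToolsConfigInfo
$spec.Tools.ToolsUpgradePolicy = 'upgradeAtPowerCycle'   # use 'manual' to disable automatic upgrades

# Apply the spec to a target VM
(Get-VM -Name 'Win7-Test').ExtensionData.ReconfigVM($spec)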

Hello, I am Aaron Smith, a core Systems Engineer for VMware located in the Twin Cities of Minnesota, and this is my first post on our local SE site.  Our goal is to provide the community at large with valuable technical information and news as we expand our knowledge.

As I’ve been building up my home lab atop 3x Dell PowerEdge R610 servers, I decided to try implementing VSAN to make use of the local storage I have on these nodes.  So to ensure I could maximize use of all disk slots, I switched my hosts to boot from internal USB keys.  Because I had already built my lab using the GA of vSphere 5.5, I reinstalled ESXi 5.5 GA onto the USB keys initially.  Next, to prepare for VSAN I needed to upgrade my Virtual Center and ESXi hosts to 5.5 Update 1.

For ESXi, booting from a USB key means the host will not create a scratch partition, even if the drive has ample capacity to support it.  This behavior is called out in the installation and setup guide on page 15:

vsphere-esxi-vcenter-server-55-installation-setup-guide.pdf

“Due to the I/O sensitivity of USB and SD devices the installer does not create a scratch partition on these devices. As such, there is no tangible benefit to using large USB/SD devices as ESXi uses only the first 1GB. When installing on USB or SD devices, the installer attempts to allocate a scratch region on an available local disk or datastore. If no local disk or datastore is found, /scratch is placed on the ramdisk.”

So while I can direct logs to a remote syslog server, patch installation can be problematic because the RAMDisk is not large enough to stage and install a rollup as large as ESXi 5.5 Update 1.

The good news is you can create additional RAMDisk partitions (if your host has sufficient memory available) and use it for various purposes, such as manually staging and installing patches for the hypervisor.

To that end, I executed the following procedures to create a 2GB RAMDisk partition mounted to a new directory /patch.  From there, I was able to successfully upload the ~650MB ESXi 5.5 Update 1 rollup and install it.  After the host rebooted, the RAMDisk was deleted which automated cleanup for me.

Disclaimer: Follow these procedures at your own risk and responsibility!  There are multiple ways to update ESXi hosts. I am simply providing a method I came up with on my own.

1. Enable SSH on your ESXi host.

2. Login to the ESXi host as an administrative user (e.g. root.)

3. Execute the following commands to create a new /patch directory and verify its existence.

~ # mkdir /patch

~ # ls -l /

Create /patch and verify.

4. Create a new 2GB RAMDisk (setting permissions to Owner=All, Group=Read/Execute, Others=Read/Execute) and verify:

~ # esxcli system visorfs ramdisk add -m 1024 -M 2048 -n patch -t /patch -p 0755

~ # esxcli system visorfs ramdisk list

Create 2GB RAMDisk.

5. Stage patch(es) to the /patch directory via SCP (e.g. WinSCP.)

6. Install patches from /patch directory using esxcli command.  In my case:

~ # cd /patch

/patch # ls -l

(Verify the depot ZIP is in place…)

/patch # esxcli software sources profile list -d /patch/update-from-esxi5.5-5.5_update01.zip

(The output from the above command is used to select the desired profile to install from the depot.  In my case, I selected “ESXi-5.5.0-20140302001-standard”)

/patch # esxcli software profile install -d /patch/update-from-esxi5.5-5.5_update01.zip -p "ESXi-5.5.0-20140302001-standard"

(Verify installation is successful, and reboot the ESXi host if necessary.)

Get image profiles, select the one to install.

7. Verify the host’s build level.  You can correlate the build level to the patch level at the following KB:

http://kb.vmware.com/kb/1014508

~ # vmware -v

VMware ESXi 5.5.0 build-1623387 

8. Verify the RAMDisk no longer exists, which validates the cleanup was automatic since they do not persist across reboots.

~ # esxcli system visorfs ramdisk list

Verify RAMDisk no longer exists after reboot.

Given the esxcli command can be executed via the vCLI (e.g. PowerCLI), it should be possible to script these procedures to automate the process and make it more scalable!
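
For example, here is a rough sketch of the RAMDisk creation step using Get-EsxCli with the V2 interface found in newer PowerCLI releases.  The argument names below mirror the esxcli long option names and are my best recollection, so verify them with the .Help() method on your system; also note the /patch directory still needs to be created on the host as in step 3:

# Get an esxcli handle for the target host
$esxcli = Get-EsxCli -VMHost (Get-VMHost 'esxi03.helios.local') -V2

# Create the 2GB RAMDisk mounted at /patch
$ramdiskArgs = @{ name = 'patch'; target = '/patch'; minsize = 1024; maxsize = 2048; permissions = '0755' }
$esxcli.system.visorfs.ramdisk.add.Invoke($ramdiskArgs)

# Confirm the RAMDisk exists
$esxcli.system.visorfs.ramdisk.list.Invoke()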