The other day I was troubleshooting a vCenter Server where the VMware VirtualCenter Server service started, ran for a few seconds, stopped, and then restarted again.
Symptoms
These symptoms started one night. No one had changed anything; things just started to happen.
Symptom 1
The vCenter Server service stops. When you start it again, it runs for a few seconds and stops again. You either cannot log in at all, stay logged in for only a few seconds, or the connection intermittently disconnects and reconnects. If you have configured the service recovery options to restart the VMware VirtualCenter Server service, you will most likely see the last behaviour.
Symptom 2
The hard drive where VMware vCenter Server is installed is filling up, and eventually all the space will be used, causing other strange behaviour. This is caused by the VMware VirtualCenter Server service writing a crash dump every time it crashes. If, as described above, the service is configured to restart itself, it will keep doing so until the drive is filled up…with crash dumps!
By default, the crash dumps are located at:
“C:\Windows\System32\config\systemprofile\AppData\Local\CrashDumps”
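To get a feel for how much space the crash dumps are taking, a small script can total the folder and compare it with the free space left on the drive. This is only a sketch: the function name crashdump_report is made up for illustration, and it assumes Python is available on the server and that the dumps sit directly in the default folder above.

```python
import shutil
from pathlib import Path

# Default crash dump folder named in this post; adjust if yours differs.
CRASHDUMP_DIR = r"C:\Windows\System32\config\systemprofile\AppData\Local\CrashDumps"


def crashdump_report(folder):
    """Return (file_count, dump_bytes, free_bytes) for a crash dump folder."""
    # Only files directly inside the folder are counted (no recursion).
    files = [p for p in Path(folder).iterdir() if p.is_file()]
    dump_bytes = sum(p.stat().st_size for p in files)
    # Free space on the drive that holds the folder.
    free_bytes = shutil.disk_usage(folder).free
    return len(files), dump_bytes, free_bytes
```

Calling crashdump_report(CRASHDUMP_DIR) from an elevated prompt should be enough to tell whether the dumps are what is eating the disk.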
Symptom 3
The vCenter server itself is slow or unresponsive. Maybe you can’t log in with your domain user, and the logon just hangs at “Applying user configuration…”. This is most likely because the system hard drive is full, so profile data and logs cannot be written to disk.
Error messages
Windows logs
The Windows System log was full of messages like this:
The Windows Application logs contained this message:
And a little bit more information regarding this error message:
Faulting application name: vpxd.exe, version: 5.5.0.43013, time stamp: 0x542efe72
Faulting module name: ntdll.dll, version: 6.2.9200.17313, time stamp: 0x5507a832
Exception code: 0xc0000005
Fault offset: 0x000000000002bcf6
Faulting process id: 0x1ba0
Faulting application start time: 0x01d0c2ebd3b1ed8c
Faulting application path: C:\Program Files\VMware\Infrastructure\VirtualCenter Server\vpxd.exe
Faulting module path: C:\Windows\SYSTEM32\ntdll.dll
Report Id: 52556a7c-2edf-11e5-9413-5cf9dd245545
Faulting package full name:
Faulting package-relative application ID:
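A few of these fields are easier to work with once decoded. Exception code 0xc0000005 is the Windows access violation status. The process id is hexadecimal, so it needs converting before you compare it with Task Manager. The application start time is, as far as I know, a Windows FILETIME value (100-nanosecond ticks since 1601-01-01 UTC). Here is a minimal sketch of the conversions; the helper name filetime_to_datetime is my own.

```python
from datetime import datetime, timedelta, timezone


def filetime_to_datetime(hex_value):
    """Convert a hex FILETIME string (100 ns ticks since 1601-01-01 UTC) to a datetime."""
    ticks = int(hex_value, 16)
    epoch = datetime(1601, 1, 1, tzinfo=timezone.utc)
    # 10 ticks make one microsecond; sub-microsecond precision is dropped.
    return epoch + timedelta(microseconds=ticks // 10)


# Values copied from the event above.
faulting_pid = int("0x1ba0", 16)  # decimal PID, as Task Manager would show it
started = filetime_to_datetime("0x01d0c2ebd3b1ed8c")
```

The decoded start time is in UTC, so allow for your time zone when matching it against the timestamps in vpxd.log.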
VMware logs
When logged in to vCenter with the C# client, you get this pop-up when the VMware VirtualCenter Server service stops:
vpxd.log
You will get a lot of error messages in vpxd.log while the service is starting. In my logs I got a lot of entries regarding the distributed switches and updating information in the database.
A lot of these:
2015-07-21T10:13:55.265+02:00 [10628 error ‘vpxdvpxdMoHost’] Response from host host-584842 is null, even though no error received
And these:
2015-07-21T10:06:37.667+02:00 [12112 error ‘vpxdvpxdMoHost’] [MoDVSwitch::HandleAsyncQueryPerfResults]InstanceId: host-234844 2307
And these:
2015-07-21T10:12:41.920+02:00 [10288 error ‘Default’ opID=task-internal-2-60ddd85c-5a] Alert:capacityMB <= 1@ d:/build/ob/bora-2183111/bora/vpx/drs/interface/drmInterfacePrivate.cpp:11161
–> Backtrace:
–> backtrace[00] rip 000000018018b7fa
–> backtrace[01] rip 0000000180104c78
–> backtrace[02] rip 0000000180105f6e
…
–> backtrace[25] rip 0000000062f53080
–> backtrace[26] rip 000007fde2aa1842
–> backtrace[27] rip 000007fde52002a9
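With this many entries, it can help to count the error lines per component before reading them one by one. The sketch below assumes every entry starts with a timestamp followed by [thread level 'component' ...], as in the excerpts above. The regular expression and the name count_errors are mine, not anything VMware ships. It accepts both straight and typographic quotes, because the quotes in this post got prettified.

```python
import re
from collections import Counter

# Matches e.g.: 2015-07-21T10:13:55.265+02:00 [10628 error 'vpxdvpxdMoHost'] ...
LINE_RE = re.compile(
    r"^\S+ \[(?P<thread>\d+) (?P<level>\w+) ['‘](?P<tag>[^'’]*)['’]"
)


def count_errors(lines):
    """Count log entries with level 'error', grouped by component tag."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        # Continuation lines (backtraces and the like) do not match and are skipped.
        if match and match.group("level") == "error":
            counts[match.group("tag")] += 1
    return counts
```

You can pass it an open file object, for example one opened with open(path, encoding="utf-8", errors="replace"), and then look at counts.most_common(10).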
Steps to fix this issue
Now we have an idea of what’s happening.
1. First, we want to stop the VMware VirtualCenter Server service from trying to restart. Find the service and set the recovery options to “Take no action”, as the picture shows.
If the service has the status “Starting” and never finishes, try to force it to stop. To do this, start a command prompt with Administrator privileges (right-click cmd.exe and choose “Run as Administrator…”). Find the process ID (PID) of the service and then kill that process:
C:\>taskkill /f /pid 3700
/f => Force
/pid => specifies the Process ID of the VMware VirtualCenter Server service.
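One common way to look up the PID is the built-in "sc queryex" command, which prints a "PID" line for a running service. The sketch below wraps that lookup and the taskkill call. Note the assumptions: it takes "vpxd" to be the short service name of the VMware VirtualCenter Server service (check the service properties if unsure), it expects the English "PID : nnnn" output layout, and the helper names are made up.

```python
import re
import subprocess


def parse_pid(sc_output):
    """Extract the PID from `sc queryex` output; None if missing or 0 (not running)."""
    match = re.search(r"^\s*PID\s*:\s*(\d+)", sc_output, re.MULTILINE)
    pid = int(match.group(1)) if match else 0
    return pid or None


def force_stop(service="vpxd"):
    """Look up the service PID and kill it with taskkill /f. Windows only."""
    # "vpxd" is an assumed service name; pass your own if it differs.
    result = subprocess.run(
        ["sc", "queryex", service], capture_output=True, text=True
    )
    pid = parse_pid(result.stdout)
    if pid is not None:
        subprocess.run(["taskkill", "/f", "/pid", str(pid)], check=True)
    return pid
```

Run force_stop() from an elevated prompt. It returns the PID it killed, or None if the service had no running process.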
2. The second task is to make the server responsive again after the disk filled up. Navigate to the crash dump folder “C:\Windows\System32\config\systemprofile\AppData\Local\CrashDumps” and delete all the files there. This will most likely free up a lot of space!
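If you would rather script the cleanup than do it in Explorer, something like the following would work. It is a sketch with a hypothetical helper name, clear_crashdumps. It defaults to a dry run so you can see how much would be freed before anything is deleted, and it only touches files directly inside the folder you give it.

```python
from pathlib import Path


def clear_crashdumps(folder, dry_run=True):
    """Delete files directly inside folder; return bytes freed (or that would be freed)."""
    freed = 0
    for path in Path(folder).iterdir():
        if path.is_file():
            freed += path.stat().st_size
            if not dry_run:
                path.unlink()  # permanent delete, no recycle bin
    return freed
```

Call it once with the default dry_run=True to check the number, then again with dry_run=False to actually delete. If you want to analyse the crash, or VMware support asks for a dump, copy one of the files somewhere else first.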
3. Third, what was actually causing this behaviour? After some troubleshooting and snooping around I found VMware KB2076054, which is spot on. However, the ThreadStackSizeKb value it refers to may not be present in the vpxd.cfg file, in which case it most likely defaults to 256 KB (kilobytes), which is the default stack size for Java threads.
Now that we know why, we can go ahead and fix the problem. Navigate to “C:\ProgramData\VMware\VMware VirtualCenter”, open “vpxd.cfg” in Notepad (or your favourite alternative editor), add or modify the ThreadStackSizeKb value, then save and quit. When done, it could look something like this:
<vmacore>
  <threadPool>
    <TaskMax>90</TaskMax>
    <ThreadStackSizeKb>1024</ThreadStackSizeKb>
    <threadNamePrefix>vpxd</threadNamePrefix>
  </threadPool>
  <ssl>
    <useCompression>true</useCompression>
  </ssl>
</vmacore>
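If you have several vCenter servers to fix, the same edit can be scripted. This is a rough sketch, not a supported tool: set_thread_stack_size is a made-up name, and it assumes the vmacore element sits directly under the root element of vpxd.cfg (or is the root itself). It adds ThreadStackSizeKb if it is missing and overwrites it otherwise.

```python
import xml.etree.ElementTree as ET


def set_thread_stack_size(cfg_xml, size_kb=1024):
    """Return cfg_xml with vmacore/threadPool/ThreadStackSizeKb set to size_kb."""
    root = ET.fromstring(cfg_xml)
    # Assumption: <vmacore> is the root or a direct child of the root.
    vmacore = root if root.tag == "vmacore" else root.find("vmacore")
    if vmacore is None:
        vmacore = ET.SubElement(root, "vmacore")
    pool = vmacore.find("threadPool")
    if pool is None:
        pool = ET.SubElement(vmacore, "threadPool")
    node = pool.find("ThreadStackSizeKb")
    if node is None:
        node = ET.SubElement(pool, "ThreadStackSizeKb")
    node.text = str(size_kb)
    return ET.tostring(root, encoding="unicode")
```

Read vpxd.cfg into a string, pass it through the function and write the result back, but keep a backup copy of the original file first. ElementTree drops XML comments and may change the whitespace, so compare the result with the original before you start the service.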
4. Start the services again. Remember to reconfigure the recovery options for the “VMware VirtualCenter Server” service. Keep an eye on the services to make sure everything is back to normal.