Monday, March 29, 2010

How to troubleshoot the Pink Screen of Death (PSOD) in ESX

Sources: VMware KB, VMworld PSOD Guide, TechTarget

VMware apparently didn't want to get the blues like Microsoft's much-famed BSOD (Blue Screen of Death), so they named theirs the PSOD (Pink Screen of Death). LOL.

As you know, there is normally no shortcut to finding the source of a PSOD crash. VMware has done its best to narrow it down and lists the possible PSOD causes as follows:

1) Hardware Faults
2) Host Faults
3) VMM Faults
4) Guest Operating System Faults
5) Application Faults

  How to capture the PSOD crash dump (by default, the PSOD dump is saved in /root)

1) Type "vm-support" at the # prompt without any options. The utility runs and creates a single tar file named "esx-xxxxxx..tgz".

2) Alternatively, you can generate the same file with the VMware Infrastructure Client (VI Client). Select Administration, then Export Diagnostic Data, then select your host (VirtualCenter data is optional) and a directory on your local PC in which to store the file.
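If you end up with several vm-support bundles (for example, one per crash), a small helper can pick the newest one and list its contents before you hand it to VMware support. Below is a minimal Python 3 sketch, not a VMware tool. It assumes the bundles have been copied into one directory and that their names match esx-*.tgz (an assumption based on the "esx-xxxxxx..tgz" name above). newest_support_bundle and list_bundle_members are hypothetical helper names.

```python
import tarfile
from pathlib import Path
from typing import List, Optional

# ASSUMPTION: vm-support bundles follow the "esx-xxxxxx..tgz" naming seen above.
BUNDLE_PATTERN = "esx-*.tgz"


def newest_support_bundle(directory: str) -> Optional[Path]:
    """Return the most recently modified bundle in `directory`, or None if there is none."""
    bundles = [p for p in Path(directory).glob(BUNDLE_PATTERN) if p.is_file()]
    if not bundles:
        return None
    # Pick by modification time instead of parsing the (elided) file name format.
    return max(bundles, key=lambda p: p.stat().st_mtime)


def list_bundle_members(bundle: Path) -> List[str]:
    """List the member names stored inside a gzip-compressed tar bundle."""
    with tarfile.open(str(bundle), mode="r:gz") as archive:
        return archive.getnames()
```

For example, newest_support_bundle(".") returns None when the current directory has no matching bundle; otherwise, pass the returned path to list_bundle_members to see which logs were collected. Choosing by modification time keeps the helper independent of the exact naming scheme.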


You can follow these steps to resolve the issue:


1) Check the ESX host version, patch level, agent version and VMware Tools version, and make sure all of them are up to date.

  • Type vmware -v to check the ESX Server version, e.g., VMware ESX Server 3.0.1 build-32039
  • Type esxupdate -l query to see which patches are installed.
  • Type vpxa -v to check the ESX Server management agent version, e.g., VMware VirtualCenter Agent Daemon 2.0.1 build-40644
  • Type rpm -qa | grep VMware-esx-tools to check the installed version of the ESX Server VMware Tools, e.g., VMware-esx-tools-3.0.1-32039

2) Check with your team whether any new VMs have been created recently (or any new applications installed on existing VMs).

3) Verify whether any hardware changes have been made recently.

4) Troubleshooting strategies: hangs

Whether the hang occurs at the VM level or at the host level, these steps will help:
a) Check the console for inactivity
b) Ping the host or VM for a response
c) Monitor network traffic from outside the VM or host
d) (VM only) Monitor the VM's performance statistics from the host to see if it is consuming a lot of resources
e) (VM only) Run vm-support -s -i 10 -d 15 to collect performance statistics and logs
f) (VM only) Run vm-support -X to kill the VM, generate core dumps of the VM and collect logs
g) (Host only) Increase the BIOS watchdog timers to see if the system returns to normal operation
h) (Host only) Disable the watchdog timers and see if any other symptoms arise

5) At the # prompt, run vmkdump -l <core_dump_file> to extract the VMkernel log from the core dump. A vmware-log.1 file is extracted from the dump; near the end of this file you can see what happened on the system. That's cool. Thanks to VMware for the built-in decoder!
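Step 5 comes down to reading the tail of the extracted log. The Python 3 sketch below assumes you have already run vmkdump -l and have the extracted log file (vmware-log.1 above) on a machine with Python. It returns the last N lines and can optionally keep only the lines that contain keywords you supply. tail_log, its default of 50 lines and any keywords you pass are hypothetical choices for illustration, and the sketch assumes nothing about the log's exact format.

```python
from collections import deque
from typing import Iterable, List, Optional


def tail_log(path: str, count: int = 50, keywords: Optional[Iterable[str]] = None) -> List[str]:
    """Return the last `count` lines of the log at `path` (hypothetical helper).

    If `keywords` is given, only lines containing at least one keyword
    (case-insensitive) are kept, and `count` applies to those matching lines.
    """
    needles = [k.lower() for k in keywords] if keywords else []
    # A deque with maxlen keeps only the newest `count` lines in memory.
    last_lines: deque = deque(maxlen=count)
    with open(path, encoding="utf-8", errors="replace") as log:
        for line in log:
            line = line.rstrip("\n")
            if needles and not any(n in line.lower() for n in needles):
                continue  # skip lines that match none of the requested keywords
            last_lines.append(line)
    return list(last_lines)
```

For example, tail_log("vmware-log.1", count=40) returns the last 40 lines of the extracted log. Bounding memory with a deque means even a large log is read in a single pass. On the service console itself, tail -n 50 vmware-log.1 gives a similar view without Python.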

Fixing a frozen service console
Another problem that can occur is that the Service Console hangs and won't let you log in locally. This can be caused by hardware lock-ups or a deadlock. Your VMs may continue to operate normally when this happens, but rebooting ESX is usually the only way to recover the Service Console. Before you do that, however, try shutting down your guest VMs and/or using VMotion to migrate them to another ESX host. To do this, use the VI Client, connect remotely via SSH, or use one of the emergency consoles, which you can access by pressing Alt-F2 through Alt-F6. You can also press Alt-F12 to display VMkernel messages on the console screen.

If you are able to shut down or move your VMs, you can then try rebooting the server by issuing the reboot command through the VI Client or one of the alternate consoles. If not, cold-booting the server is your only option.

I found an interesting link with a bunch of PSOD examples and probable resolutions:
http://www.rtfm-ed.co.uk/vmware-content/psod/

 
Don't miss this PDF; it's all about PSOD:
http://download3.vmware.com/vmworld/2006/tac0028.pdf

 
Coming soon: how to troubleshoot when the guest OS causes a PSOD
