On a VMware 5.5 ESXi host I suddenly experienced guest machined that hanged or froze. I had to reset the guest machine to get it up and running again. In the Events tab I could see entries like below at the time of freezing.
Lost access to volume 51d15c1a-17706da4-0ae6-001f29e5c998 (datastore1) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
Device mpx.vmhba1:C0:T0:L0 performance has deteriorated. I/O latency increased from average value of 69864 microseconds to 4383213 microseconds.
I started to check my disks and raid as I suspected a failing disk, but the disks and raid was in working order.
It turned out that the problem started around the time I did a guest snapshot. The datastore had about 14 GB free space out of it’s 924 GB capacity and this seems to be to0 little. After deleting a couple of snapshots the problem disappeared.
When the snapshots had been deleted, I also had to consolidate some of the virtual machines disks.
Edit: A couple of weeks later one of the disks in the raid actually failed without any warning that it was about to fail. After replacing the disk the problems were less frequent but sometimes still occured, and I discovered it was during backups, i.e. with a lot of traffic on the disks.
Moving away one of the most busy clients to another VMware host, the problems disappeared. The disks were SATA disks because this was not really a production system but more of a test and development environment where speed was not so critical. The cause was probably just overloading the disks with more requests than they could handle in combination with a disk about to fail.