HP Proliant “Hardware RAID support is disabled via NVRAM Configuration Setting”

I got my hands on a used HP Proliant server with a P420i RAID controller, but I was unable to create a logical RAID drive using the ACU. A message during boot read “Hardware RAID support is disabled via NVRAM Configuration Setting”. It turned out that RAID had been disabled because the controller had been put in HBA mode.

This can be solved but it is a bit tricky. I used information from the following sources:

https://systemausfall.org/wikis/howto/Disable%20HP%20Proliant%20Hardware-RAID

http://downloads.linux.hpe.com/SDR/project/mcp/

https://wiki.debian.org/HP/ProLiant#HP_Repository

https://cdimage.debian.org/debian-cd/current-live/amd64/iso-hybrid/

  1. Connect one of the network ports to a switch on a LAN with a DHCP server and an Internet connection (so you don’t have to fiddle with manual network configuration).
  2. From another computer, download Debian Live ISO. I used the “standard” version (link above).
  3. Boot the server on Debian Live, either by making a bootable USB stick or by mounting the ISO via ILO (I had to use Firefox; in Chrome the ISO was unmounted mid-process)
  4. When booted into Debian Live, open the APT sources list:
    sudo nano /etc/apt/sources.list
  5. Add the following line to the file and save it (CTRL-X):
    deb http://downloads.linux.hpe.com/SDR/repo/mcp jessie/current non-free
  6. Add the signing keys for the repository:
    curl http://downloads.linux.hpe.com/SDR/hpPublicKey1024.pub | sudo apt-key add -
    curl http://downloads.linux.hpe.com/SDR/hpPublicKey2048.pub | sudo apt-key add -
    curl http://downloads.linux.hpe.com/SDR/hpPublicKey2048_key1.pub | sudo apt-key add -
    curl http://downloads.linux.hpe.com/SDR/hpePublicKey2048_key1.pub | sudo apt-key add -
  7. Update the package lists and install ssacli:
    sudo apt-get update
    sudo apt-get install ssacli
  8. Check the controller status (somewhere in the output you will see that hbamode is true):
    sudo ssacli controller slot=0 show
  9. Disable hbamode:
    sudo ssacli controller slot=0 modify hbamode=off
  10. Now reboot the server, remove the USB stick or unmount the ISO in ILO, and press F5 during boot to start the ACU. You should now be able to create a logical RAID drive.
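For reference, here is a condensed version of steps 4–9 as a single shell session. It assumes the controller sits in slot 0, as it did on my server; if ssacli controller all show reports a different slot, adjust the number accordingly:

    # Add the HPE MCP repository and its signing keys (run from the Debian Live shell)
    echo "deb http://downloads.linux.hpe.com/SDR/repo/mcp jessie/current non-free" | sudo tee -a /etc/apt/sources.list
    for key in hpPublicKey1024 hpPublicKey2048 hpPublicKey2048_key1 hpePublicKey2048_key1; do
        curl http://downloads.linux.hpe.com/SDR/$key.pub | sudo apt-key add -
    done
    # Install the Smart Storage Administrator CLI and turn HBA mode off (slot 0 assumed)
    sudo apt-get update && sudo apt-get install -y ssacli
    sudo ssacli controller slot=0 show
    sudo ssacli controller slot=0 modify hbamode=off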

FreeNAS, VMware and iSCSI with MTU 9000

In order to improve performance, it is often recommended to set the MTU of the iSCSI interfaces to 9000 (jumbo frames) instead of the default 1500.

To do this, it is important that you set the MTU to 9000 on all devices involved: the FreeNAS network interface, the VMware VMkernel NICs and vSwitch, and all ports on the switches connecting them.

Note: Changing these settings will cause a loss of connectivity, so it might be a good idea not to do it on systems in production.

At one point I had forgotten to set the MTU to 9000 on the VMware VMkernel NIC (I had only set it on the vSwitch), which caused connectivity problems and error messages like this in the FreeNAS logs:

WARNING: 172.16.1.200 (iqn.1998-01.com.vmware:host-585e8872): no ping reply (NOP-Out) after 5 seconds; dropping connection

FreeNAS settings

  • In the network interface setup, add “mtu 9000” to the Options field (you can verify the setting from the FreeNAS shell as shown below).
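To confirm that the setting took effect, you can check the interface from the FreeNAS shell. This is just a quick sanity check; igb0 is an assumed interface name, so replace it with whatever interface you use for iSCSI:

    # The output should include "mtu 9000" (igb0 is an assumed interface name)
    ifconfig igb0 | grep mtu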

VMware settings

  • Networking -> VMkernel NICs -> edit -> set MTU to 9000 -> Save
  • Networking -> Virtual switches -> edit -> set MTU to 9000 -> Save (the equivalent esxcli commands are shown below)
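If you prefer the command line, the same two settings can be changed from the ESXi shell with esxcli. The names vmk1 and vSwitch1 below are assumptions; substitute the VMkernel NIC and vSwitch you actually use for iSCSI:

    # Set MTU 9000 on the vSwitch and on the VMkernel NIC (vSwitch1/vmk1 are assumed names)
    esxcli network vswitch standard set --vswitch-name=vSwitch1 --mtu=9000
    esxcli network ip interface set --interface-name=vmk1 --mtu=9000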

Network switch ports

  • Make sure all switch ports along the path between the FreeNAS box and the VMware host allow an MTU of at least 9000

Testing

On the VMware host, enable SSH and log in. Then use the command:

vmkping -s 8972 172.16.1.100 -d

Replace 172.16.1.100 above with the IP address of your FreeNAS. -s 8972 sets the packet size to 8972 bytes, leaving 28 bytes for headers, and -d means fragmentation is not allowed.

If everything works you will get echo replies. If you get the error message “Message too long”, it means that somewhere along the path between your VMware host and the FreeNAS there is a device that does not allow MTU 9000.
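You can run the same test in the opposite direction from the FreeNAS shell. FreeBSD’s ping uses -D to set the Don’t Fragment bit; 172.16.1.200 is the iSCSI address of my VMware host (taken from the log line above), so use your own host’s address:

    # From the FreeNAS (FreeBSD) shell: 8972-byte payload with Don't Fragment set
    ping -D -s 8972 172.16.1.200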

VMware ESXi 5.5 guest freezes randomly [solved]

On a VMware ESXi 5.5 host I suddenly experienced guest machines that hung or froze. I had to reset the guest machine to get it up and running again. In the Events tab I could see entries like the ones below at the time of the freezes.

Lost access to volume 51d15c1a-17706da4-0ae6-001f29e5c998 (datastore1) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
(info, 2018-10-11 04:03:50, datastore1)

Device mpx.vmhba1:C0:T0:L0 performance has deteriorated. I/O latency increased from average value of 69864 microseconds to 4383213 microseconds.
(warning, 2018-10-11 04:03:50, localhost)

I started checking my disks and RAID, as I suspected a failing disk, but both the disks and the RAID were in working order.

It turned out that the problem started around the time I took a guest snapshot. The datastore had about 14 GB free space out of its 924 GB capacity, and this seems to be too little. After deleting a couple of snapshots the problem disappeared.
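To keep an eye on datastore free space, you can check it from the ESXi shell over SSH. This is just a quick way of listing the mounted datastores and their free space; datastore1 is the datastore name in my case:

    # List all datastores with capacity and free space
    esxcli storage filesystem list
    # Or check a single datastore (VMFS volumes are mounted under /vmfs/volumes)
    df -h /vmfs/volumes/datastore1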

After the snapshots had been deleted, I also had to consolidate the disks of some of the virtual machines.

Edit: A couple of weeks later, one of the disks in the RAID actually failed, without any prior warning that it was about to fail. After replacing the disk the problems were less frequent but still occurred occasionally, and I discovered that they happened during backups, i.e. when there was a lot of traffic on the disks.

When I moved one of the busiest virtual machines to another VMware host, the problems disappeared. The disks were SATA disks, since this was not really a production system but more of a test and development environment where speed was not critical. The cause was probably simply that the disks were overloaded with more requests than they could handle, in combination with a disk that was about to fail.