Fake RAID Adventures

The other day I got my geeky hands on two old SuperMicro X8STI-F 1U servers. I plan to use them as build and embedded target emulation servers for my open source projects as well as Minecraft server for my kids :)

During the install of Ubuntu Server 16.04 I realized I wanted try to use the Intel PCH “Fake Raid” on the two 1 TB disks, RAID-1. The support Intel has in Linux and mdadm for their southbridge RAID controller is really excellent!

I’m not so sure though about the level of support that Debian/Ubuntu and systemd have for RAID controllers though. I’ve run into problems with how to shutdown, reboot, and power-off the servers with the RAID.

So far I’ve made out that it is possible to power-off the servers with the power button, but only straight after boot! To make matters worse it doesn’t seem that Ubuntu sets the IMSM controller in “idle” state when rebooting, so unmount (remount ro) fails. I think this is the recommended way:

echo idle > /sys/block/md127/md/sync_action

… for all the md devices found by globbing /sys/block/md*

And then wait for state to become idle before unmounting and remounting / read-only, as you have to do, but that doesn’t seem to be done, so when the system is booted again it always comes up in a degraded state :-/

Anyhow, this little adventure taught me a few tricks worth remembering.

Show status of Intel Container

This command lists member arrays and disks used in the container:

user@example:~$ sudo mdadm -D /dev/md/imsm0

This command details RAID (firmware) capabilities:

user@example:~$ sudo mdadm --detail-platform
       Platform : Intel(R) Matrix Storage Manager
        Version : 8.9.1.1002
    RAID Levels : raid0 raid1 raid10 raid5
    Chunk Sizes : 4k 8k 16k 32k 64k 128k
    2TB volumes : supported
      2TB disks : not supported
      Max Disks : 6
    Max Volumes : 2 per array, 4 per controller
 I/O Controller : /sys/devices/pci0000:00/0000:00:1f.2 (SATA)

Remove a disk from a container

Say disk /dev/sda is unhealthy in any of the member volumes, fail it in all volumes, remove it, and zero out the superblock to re-add:

user@example:~$ sudo mdadm /dev/md/Boot -f /dev/sda
user@example:~$ sudo mdadm /dev/md/Root -f /dev/sda
user@example:~$ sudo mdadm /dev/md/imsm0 -r /dev/sda
user@example:~$ sudo mdadm --zero-superblock --force /dev/sda
user@example:~$ sudo mdadm /dev/md/imsm0 -a /dev/sda

Here Root and Boot are my member volumes in the Intel PCH imsm0 container. I use a small Boot partition for /boot, first partition on the array, to be able to boot using GRUB from it.

If the above doesn’t work for you (it didn’t for me), try zeroing out the complete device, which naturally will take a while, try smaller blocksize (bs) if the following does not work:

user@example:~$ sudo dd if=/dev/zero of=/dev/sda bs=100M

Then reboot and tell the Intel firmware (Ctrl-I maybe) at POST to add the “new” disk to the array, or rather to the container to be correct. When booting up the system again, Linux starts a background rebuild.

Check satus of RAID

Check progress of the rebuild with:

user@example:~$ cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10] 
md125 : active raid1 sda[1] sdb[0]
      2097152 blocks super external:/md127/0 [2/1] [_U]
        resync=DELAYED

md126 : active raid1 sda[1] sdb[0]
      974660608 blocks super external:/md127/1 [2/1] [_U]
  [===>.................]  recovery = 17.5% (171137152/974660740) finish=103.4min speed=129396K/sec

md127 : inactive sda[1](S) sdb[0](S)
      5024 blocks super external:imsm

unused devices: <none>

Which is very useful with the watch command!

Good Luck!