The other day a drive died in a hardware RAID5 array on one of my GlusterFS replica servers. I had a spare drive on hand, but I wasn’t sure what the replacement process was or how much downtime my cluster would incur. To my surprise, I took the replica down and none of the clients even noticed!
For posterity, here are my rough notes of the procedure:
- Power server off
- Replace the failed drive with a new one (new, off-the-shelf drives will be added to the RAID automatically, while previously-used ones might have to be cleared and added manually via MegaCli64; see the sketch below)
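I didn’t need this step myself, but if the controller treats a previously-used replacement drive as “foreign”, something along these lines should clear it and mark it usable (a rough sketch, not verbatim from this rebuild; [16:11] is just this server’s enclosure:slot example):

```
# Scan for and clear any foreign configuration left over on the replacement drive
/opt/MegaRAID/MegaCli/MegaCli64 -CfgForeign -Scan -a0
/opt/MegaRAID/MegaCli/MegaCli64 -CfgForeign -Clear -a0

# Mark the drive as unconfigured-good so the controller can pull it into the array
# (may need -Force if the drive is still flagged as bad)
/opt/MegaRAID/MegaCli/MegaCli64 -PDMakeGood -PhysDrv [16:11] -a0
```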
- Power server on (I started in single user mode so I could assess the RAID before letting GlusterFS try to heal itself)
- Check rebuild status:
```
# /opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -ShowProg -PhysDrv [16:11] -aALL

Rebuild Progress on Device at Enclosure 16, Slot 11 Completed 13% in 110 Minutes.
```
Note: [16:11] is the enclosure and slot number of the drive, which you can piece together from the output of MegaCli64 -PDList -a0.
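For reference, here’s roughly how I’d pull out those enclosure/slot pairs and keep an eye on the rebuild without retyping the command (the grep patterns match the field names MegaCli64 prints; the 10-minute interval is arbitrary):

```
# List each drive's enclosure, slot and current state
/opt/MegaRAID/MegaCli/MegaCli64 -PDList -a0 | grep -E 'Enclosure Device ID|Slot Number|Firmware state'

# Re-check the rebuild progress every 10 minutes
watch -n 600 '/opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -ShowProg -PhysDrv [16:11] -aALL'
```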
- After seeing that the RAID was rebuilding ok, I allowed the system to boot up by exiting single user mode
- GlusterFS automatically detected that it needed to heal the replica:
```
# gluster volume heal homes info
Gathering Heal info on volume homes has been successful

Brick storage0:/mnt/gfs/storage0/sda1/homes
Number of entries: 1
/inzuki/Mglaziovii/GINC1trial_1.trimmed.fastq

Brick storage1:/mnt/gfs/storage1/sdb1/homes
Number of entries: 0
```
- After 30 minutes or so the replica is healed completely:
```
# gluster volume heal homes info
Gathering Heal info on volume homes has been successful

Brick storage0:/mnt/gfs/storage0/sda1/homes
Number of entries: 0

Brick storage1:/mnt/gfs/storage1/sdb1/homes
Number of entries: 0
```
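The self-heal daemon took care of this on its own, but if it hadn’t, GlusterFS 3.3 lets you kick off a heal manually (a sketch using this cluster’s volume name, homes):

```
# Heal only the entries GlusterFS already knows are out of sync
gluster volume heal homes

# Or crawl the whole volume and heal anything that differs between the replicas
gluster volume heal homes full
```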
- After 24 hours the RAID rebuild is finished, and the 30TB RAID5 is Optimal:
```
# /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -L0 -a0

Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-5, Secondary-0, RAID Level Qualifier-3
Size                : 30.013 TB
Sector Size         : 512
Is VD emulated      : Yes
Parity Size         : 2.728 TB
State               : Optimal
Strip Size          : 64 KB
Number Of Drives    : 12
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Cached, Write Cache OK if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Cached, Write Cache OK if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disabled
Encryption Type     : None
Is VD Cached: No
```
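If you only care about the one line that matters in that wall of output, a quick grep does the job (assuming the field name stays “State”):

```
# Print just the virtual drive state; it should read "Optimal" once the rebuild is done
/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -L0 -a0 | grep '^State'
```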
Great Success!
That’s a major win for GlusterFS!
For reference, the GlusterFS version in use during this exercise was 3.3.1 on CentOS 6.4.
Edit (October 24, 2013): I had another drive fail in this server, and this time I simply hot-swapped the drive instead of shutting the server down first. MegaCli64 -PDList -a0 will show the drive going from “Firmware state: Failed” to “Firmware state: Rebuild”.
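To watch that transition without re-running the command by hand, a simple watch loop works (a sketch; the one-minute interval is arbitrary):

```
# Poll the drive states once a minute until the new drive goes from "Rebuild" back to "Online"
watch -n 60 "/opt/MegaRAID/MegaCli/MegaCli64 -PDList -a0 | grep -E 'Slot Number|Firmware state'"
```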