The other day a drive died in a hardware RAID5 array on one of my GlusterFS replica servers. I had a spare drive on hand, but I wasn't sure what the process was for replacing it, or how much downtime my cluster would incur. To my surprise, I took the replica down and none of the clients even noticed!
For posterity, here are my rough notes of the procedure:
- Power server off
- Replace the drive with a new one (new, off-the-shelf drives will be added to the RAID automatically, while previously-used ones might have to be cleared and added manually via MegaCli)
- Power server on (I started in single user mode so I could assess the RAID before letting GlusterFS try to heal itself)
- Check rebuild status:
```
# /opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -ShowProg -PhysDrv [16:11] -aALL

Rebuild Progress on Device at Enclosure 16, Slot 11 Completed 13% in 110 Minutes.
```
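From that progress line you can roughly extrapolate the total rebuild time. A back-of-the-envelope sketch (the progress string is the one above; real rebuilds slow down under I/O load, so treat the result as a lower bound):

```shell
# Estimate total rebuild time from MegaCli's progress line:
# 13% in 110 minutes -> about 110 * 100 / 13 minutes total.
progress='Rebuild Progress on Device at Enclosure 16, Slot 11 Completed 13% in 110 Minutes.'
pct=$(printf '%s\n' "$progress" | sed -n 's/.*Completed \([0-9]*\)%.*/\1/p')
mins=$(printf '%s\n' "$progress" | sed -n 's/.*in \([0-9]*\) Minutes.*/\1/p')
echo "estimated total: $(( mins * 100 / pct )) minutes"
```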
[16:11] is the enclosure and slot number of the drive, which you can piece together from the output of MegaCli64 -PDList -a0
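Piecing that together by hand gets tedious on a 12-drive array. A hedged awk sketch that pairs up the relevant fields from -PDList output (the sample text here is illustrative, not captured from a real controller):

```shell
# Pair "Enclosure Device ID" with "Slot Number" from MegaCli64 -PDList -a0
# output to build the [enclosure:slot] argument. Sample output is canned.
sample='Enclosure Device ID: 16
Slot Number: 11
Firmware state: Rebuild
Enclosure Device ID: 16
Slot Number: 12
Firmware state: Online, Spun Up'

printf '%s\n' "$sample" | awk -F': ' '
  /^Enclosure Device ID/ { enc = $2 }
  /^Slot Number/         { slot = $2 }
  /^Firmware state/      { printf "[%s:%s] %s\n", enc, slot, $2 }'
```

In real use you would pipe the live -PDList output into the same awk program instead of the canned sample.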
- After seeing that the RAID was rebuilding ok, I allowed the system to boot up by exiting single user mode
- GlusterFS automatically detected that it needed to heal the replica:
```
# gluster volume heal homes info

Gathering Heal info on volume homes has been successful

Brick storage0:/mnt/gfs/storage0/sda1/homes
Number of entries: 1
/inzuki/Mglaziovii/GINC1trial_1.trimmed.fastq

Brick storage1:/mnt/gfs/storage1/sdb1/homes
Number of entries: 0
```
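If you just want a quick yes/no on whether self-heal still has work to do, you can total the per-brick counts. A small sketch against output like the above (canned here for illustration):

```shell
# Sum "Number of entries" across bricks from `gluster volume heal <vol> info`.
# A non-zero total means self-heal is still pending. Sample output is canned.
heal_info='Brick storage0:/mnt/gfs/storage0/sda1/homes
Number of entries: 1
Brick storage1:/mnt/gfs/storage1/sdb1/homes
Number of entries: 0'

printf '%s\n' "$heal_info" | awk -F': ' '
  /^Number of entries/ { total += $2 }
  END { print "pending entries: " total+0 }'
```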
- After 30 minutes or so the replica is healed completely:
```
# gluster volume heal homes info

Gathering Heal info on volume homes has been successful

Brick storage0:/mnt/gfs/storage0/sda1/homes
Number of entries: 0

Brick storage1:/mnt/gfs/storage1/sdb1/homes
Number of entries: 0
```
- After 24 hours the RAID rebuild is finished, and the 30TB RAID5 is Optimal:
```
# /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -L0 -a0

Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-5, Secondary-0, RAID Level Qualifier-3
Size                : 30.013 TB
Sector Size         : 512
Is VD emulated      : Yes
Parity Size         : 2.728 TB
State               : Optimal
Strip Size          : 64 KB
Number Of Drives    : 12
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Cached, Write Cache OK if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Cached, Write Cache OK if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disabled
Encryption Type     : None
Is VD Cached: No
```
That’s a major win for GlusterFS!
For reference, the GlusterFS version in use during this exercise was 3.3.1 on CentOS 6.4.
Edit (October 24, 2013): I had another drive fail in this server, and I simply hot swapped the drive this time as opposed to shutting the server down first.
MegaCli64 -PDList -a0 will show the drive going from “Firmware state: Failed” to “Firmware state: Rebuild”.
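That state transition is easy to watch in a loop. A minimal sketch, where the canned list of states stands in for repeatedly grepping "Firmware state" out of MegaCli64 -PDList -a0:

```shell
# Poll until the swapped drive comes back online. The canned states below
# stand in for repeated `MegaCli64 -PDList -a0` polls against the live slot.
for state in "Failed" "Rebuild" "Rebuild" "Online, Spun Up"; do
  echo "drive state: $state"
  case "$state" in "Online"*) break ;; esac
  # sleep 300   # in real use, pause between controller polls
done
```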
One thought on “Replacing Failed Drive in Hardware RAID5 on GlusterFS Replica”
How can a HDD fail in a RAID array and no client notices the failed drive?
That's some epic stuff. Zen-like.