Clear Enormous GlusterFS Mount Logs

Today Munin was complaining that a partition is nearly full on one of my servers. Looking at the disk usage graph it kinda seems like a slow loris DoS attack… Sure enough, something has gone and filled up the /var/log partition:

$ df -h /var/log/
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/vg_root-log  9.9G  9.0G […]

Parallelizing rsync

Last week I had a massive hardware failure on one of the GlusterFS storage nodes in the ILRI, Kenya Research Computing cluster; two drives failed simultaneously on the underlying RAID5. As RAID5 can only withstand one drive failure, the entire 31TB array was toast. FML. After replacing the failed disks, rebuilding the array, and formatting […]

GlusterFS mounts fail at boot on CentOS

After nearly a year of running GlusterFS on my compute cluster, I’m surprised it took me so long to run into this bug. I rebooted one of my compute nodes and found that the machine hung at “Mounting network filesystems” during the boot sequence. From the volume log (/var/log/glusterfs/<volname>.log) on the client, it […]

“No data available” error in GlusterFS brick log

Recently I was poring over my GlusterFS brick logs trying to troubleshoot a problem with self-healing and I saw some (6,000+!) errors that alarmed me:

[2013-11-22 09:21:51.138732] E [posix.c:2668:posix_getxattr] 0-homes-posix: getxattr failed on /mnt/gfs/wingu0/sda1/homes/yajamma/qdd_programs/qdd_files/qdd3_beta/MS_extract.pl: system.posix_acl_access (No data available)

It turns out this error isn’t an error after all, and there’s a patch merged upstream […]