After nearly a year of running GlusterFS on my compute cluster, I’m surprised it took me so long to run into this bug. I rebooted one of my compute nodes and found that the machine hung at Mounting network filesystems during the boot sequence.
From the volume log — /var/log/glusterfs/<volname>.log — on the client, it looks like a networking issue:
[2014-02-28 06:43:15.935153] E [socket.c:2157:socket_connect_finish] 0-glusterfs: connection to 192.168.5.12:24007 failed (No route to host)
The volume in question has _netdev in its mount options in /etc/fstab, and the netfs service is enabled, so this volume should theoretically mount only after networking is up and running. For whatever reason, CentOS 6.4, at the time of this writing, does not behave this way.
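For reference, here is a rough sketch of how one might double-check that part of the setup. The function name check_netdev and the sample fstab line in the comment (server1, myvol, /mnt/gluster) are made up for illustration and are not from my cluster. It only reports whether each glusterfs entry in an fstab-style file carries _netdev.

```shell
# Hypothetical helper: for each glusterfs entry in an fstab-style file,
# report whether its mount options include _netdev.
# An entry is expected to look roughly like this (names are placeholders):
#   server1:/myvol  /mnt/gluster  glusterfs  defaults,_netdev  0 0
check_netdev() {
    awk '$1 !~ /^#/ && $3 == "glusterfs" {
        # Field 4 holds the comma-separated mount options.
        status = ($4 ~ /(^|,)_netdev(,|$)/) ? "ok" : "missing _netdev"
        print $2 ": " status
    }' "${1:-/etc/fstab}"
}

# Usage: check_netdev            (reads /etc/fstab)
#        check_netdev /some/file (reads the given file)
# On CentOS 6, "chkconfig --list netfs" shows whether netfs is enabled.
```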
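The solution is to add a LINKDELAY to your network interface’s configuration, which makes the network scripts pause for that many seconds after bringing the interface up, so the link has time to become usable before the network filesystems are mounted. For example, /etc/sysconfig/network-scripts/ifcfg-eth4: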
...
ONBOOT="yes"
MTU=9212
LINKDELAY=20
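If you have several nodes to fix, the same change can be scripted. This is only a sketch: set_linkdelay is a hypothetical helper name, it assumes GNU sed (for -i, as shipped with CentOS), and it assumes the file ends with a newline. It sets LINKDELAY if the line is missing and updates it otherwise.

```shell
# Hypothetical helper: set or update LINKDELAY in an ifcfg-style file.
# Assumes GNU sed and a whole number of seconds.
set_linkdelay() {
    file="$1"
    seconds="$2"
    if grep -q '^LINKDELAY=' "$file"; then
        # A LINKDELAY line already exists: replace its value in place.
        sed -i "s/^LINKDELAY=.*/LINKDELAY=${seconds}/" "$file"
    else
        # No LINKDELAY line yet: append one.
        printf 'LINKDELAY=%s\n' "$seconds" >> "$file"
    fi
}

# Usage (as root), with the interface from the example above:
#   set_linkdelay /etc/sysconfig/network-scripts/ifcfg-eth4 20
```

The 20 seconds is simply the value that worked for me; a different switch or NIC may need more or less.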
Thanks to someone on #gluster on Freenode IRC for suggesting it!