CentOS mirror script

If you’ve got more than two or three CentOS machines on your network, it’s really a waste of time and bandwidth for each machine to download the same updates as the previous/next machine. Or, sometimes you tell your users “I just need 5 minutes to run some updates and reboot” and it ends up being more like 10 or 20 minutes (hey, we’re in Africa, Internet speeds are unpredictable!).

You can set up a local mirror and then point all the servers in your network at the local copy, which saves bandwidth and gives massive speedups on system updates.

The script

The CentOS wiki has a few example rsync snippets, but I expanded them into something a bit more abstracted, configurable for different environments, and able to handle multiple CentOS versions (i.e. 5.x and 6.x).

centos_mirror.sh:

#!/bin/bash

MINOR_VERSIONS=(5.9 6.4)
LOCK_FILE=/var/lock/subsys/centos_mirror
NETWORK_MIRROR=mirror.nsc.liu.se
DESTINATION_TOP=/export/data/mirror/CentOS

# check to make sure we're not already running
if [ -f "$LOCK_FILE" ]; then
    echo "CentOS mirror script already running."
    exit 1
fi

# sync all versions
for version in "${MINOR_VERSIONS[@]}"; do
    if [ -d "$DESTINATION_TOP/$version" ]; then
        touch "$LOCK_FILE"
        # -a archive, -v verbose, -S sparse files, -H hard links, -P partial + progress;
        # --timeout gives up after 300 seconds without I/O, --delete drops files removed upstream
        rsync -avSHP --timeout=300 --delete --exclude "local*" --exclude "isos" "$NETWORK_MIRROR::CentOS/$version/" "$DESTINATION_TOP/$version/"
        /bin/rm -f "$LOCK_FILE"
    else
        echo "Target directory $DESTINATION_TOP/$version not present."
        exit 1
    fi
done

exit 0

Just make sure to change the paths, versions, servers, etc. to make sense for your environment. For example, we have a great Internet2 link to academic institutions in Sweden, so I picked a CentOS mirror in Sweden. When you’re done, don’t forget to create a cron job to run this script every so often. Be kind to the mirror and don’t sync more than a few times a day.
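As a rough sketch of the cron side, the function below prints an /etc/cron.d style entry that runs the script four times a day. The install path /usr/local/bin/centos_mirror.sh, the log file, and the schedule are my own assumptions for illustration, so swap in whatever fits your box.

```shell
#!/bin/bash

# Hypothetical locations; adjust to wherever you saved the script.
MIRROR_SCRIPT=/usr/local/bin/centos_mirror.sh
MIRROR_LOG=/var/log/centos_mirror.log

# Print a cron.d entry that runs the mirror script as root at
# 00:17, 06:17, 12:17 and 18:17 (an off-the-hour minute avoids the rush at :00).
print_cron_entry() {
    printf '17 */6 * * * root %s >> %s 2>&1\n' "$MIRROR_SCRIPT" "$MIRROR_LOG"
}

print_cron_entry
```

To install it you could redirect the output, for example print_cron_entry > /etc/cron.d/centos_mirror as root. Files in /etc/cron.d need the user field (root here), which a personal crontab does not.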

Start using it

You’ll also have to modify your /etc/yum.repos.d/CentOS-Base.repo: comment out or remove the mirrorlist lines and then add new baseurl lines for each section, e.g.:

[base]
name=CentOS-$releasever - Base
#mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os
baseurl=file:///export/data/mirror/CentOS/$releasever/os/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6

#released updates 
[updates]
name=CentOS-$releasever - Updates
#mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=updates
baseurl=file:///export/data/mirror/CentOS/$releasever/updates/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6

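In this example I point my repo to /export/data/mirror/CentOS/… because that’s where I’ve set my mirror script to sync to. One thing to watch: as far as I know, yum on CentOS expands $releasever to the major version (5 or 6) rather than 5.9 or 6.4, so these baseurl paths assume a symlink such as 6 pointing at 6.4 in the mirror directory, the way the upstream mirrors lay it out. The gpgkey lines shown are the ones for CentOS 6; a 5.x machine keeps its own RPM-GPG-KEY-CentOS-5 line.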
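If you have a lot of machines to convert, the "comment out the mirrorlist lines" step can be scripted. This is only a sketch: disable_mirrorlist is a helper name I made up, it relies on GNU sed's -i option, and you still have to add the baseurl lines yourself.

```shell
#!/bin/bash

# Comment out every active mirrorlist= line in a yum repo file.
# A copy of the original is kept next to it with a .bak suffix
# (yum only reads files ending in .repo, so the backup is ignored).
disable_mirrorlist() {
    local repo_file=$1
    sed -i.bak -e 's/^mirrorlist=/#mirrorlist=/' "$repo_file"
}

# Example (run as root), then clear yum's cached metadata:
#   disable_mirrorlist /etc/yum.repos.d/CentOS-Base.repo
#   yum clean all
```

Lines that are already commented out are left alone, so running it twice is harmless.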

Disk space usage

As of this writing, the disk space used by the CentOS 5.9 and 6.4 repos is:

# du -sh /export/data/mirror/CentOS/{5.9,6.4}
14G	/export/data/mirror/CentOS/5.9
33G	/export/data/mirror/CentOS/6.4

Update — May 31, 2013: I’ve added a timeout to the rsync command. After having some issues with yum on a few machines, I noticed there were some hung rsync tasks on the mirror box, which of course meant that the mirror script never let go of its lock, and subsequent runs of the script would refuse to run. Bad all around! Since adding the timeout I’ve been running for a day now with no issues (no hung rsync processes!).
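The timeout deals with hung rsync processes, but a lock file can still be left behind if the script is killed at the wrong moment. One possible way to harden that, sketched below, is to record the PID in the lock file and treat a lock whose process is gone as stale. The acquire_lock and release_lock helpers are hypothetical names of mine, not part of the script above, and the check is not atomic, so two runs starting in the same instant could both get through; flock(1) from util-linux would be stricter where it is available.

```shell
#!/bin/bash

# Try to take a PID-file lock. Returns 0 if we now hold the lock,
# 1 if another live process holds it. A lock whose recorded PID is
# no longer running (or an empty lock file) is treated as stale and
# taken over. Assumes the script runs as root, so kill -0 can probe any PID.
acquire_lock() {
    local lock_file=$1 pid
    if [ -f "$lock_file" ]; then
        pid=$(cat "$lock_file" 2>/dev/null)
        if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
            return 1
        fi
    fi
    echo $$ > "$lock_file"
}

# Remove the lock file.
release_lock() {
    rm -f "$1"
}

# Usage in the mirror script, in place of the [ -f $LOCK_FILE ] check:
#   acquire_lock "$LOCK_FILE" || { echo "CentOS mirror script already running."; exit 1; }
#   trap 'release_lock "$LOCK_FILE"' EXIT
```

With the trap in place the lock is released on any normal exit, including the exit 1 for a missing target directory, so the touch and rm around rsync would no longer be needed.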