Leveraging the Ansible Python API for Infrastructure Reporting

A few days ago I had to get some basic information from a handful of servers for an inventory report: hostname, IP address, storage capacity, distro version, etc. I already manage all of my servers with Ansible, and there’s a wealth of information available in Ansible’s setup module, so I knew there […]
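
The full script isn’t shown in this teaser, but the idea is simple with the Ansible Python API of that era (the 1.x ansible.runner interface): run the setup module across the inventory and pull out the facts you care about for each host. A rough sketch, assuming the 1.x API; the host pattern and the particular facts reported are my guesses, not necessarily what the post uses:

    # Gather facts with the Ansible 1.x Python API and print a tiny report.
    import ansible.runner

    # Run the "setup" module against every host in the inventory.
    results = ansible.runner.Runner(
        module_name='setup',
        module_args='',
        pattern='all',
        forks=10,
    ).run()

    # Pick a few facts out of each host's result.
    for host, result in sorted(results['contacted'].items()):
        facts = result.get('ansible_facts', {})
        print('%s\t%s\t%s %s' % (
            host,
            facts.get('ansible_default_ipv4', {}).get('address', 'n/a'),
            facts.get('ansible_distribution', 'n/a'),
            facts.get('ansible_distribution_version', ''),
        ))

    # Hosts that couldn't be reached end up under results['dark'].
    for host in results['dark']:
        print('%s\tunreachable' % host)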

Parallelizing rsync

Last week I had a massive hardware failure on one of the GlusterFS storage nodes in the ILRI, Kenya Research Computing cluster; two drives failed simultaneously on the underlying RAID5. As RAID5 can only withstand one drive failure, the entire 31TB array was toast. FML. After replacing the failed disks, rebuilding the array, and formatting […]
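
The excerpt cuts off before the actual approach, but a common way to parallelize rsync (not necessarily the one the post ends up with) is to split the transfer by top-level directory and run several rsync processes at once, since a single rsync stream rarely keeps a big array or fast network busy. A rough sketch in Python; the source, destination, and worker count are placeholders:

    # Run one rsync per top-level directory, several at a time.
    import os
    import subprocess
    from multiprocessing import Pool

    SRC = '/export/data'                  # hypothetical source tree
    DEST = 'backup01:/export/data'        # hypothetical destination (rsync over ssh)
    WORKERS = 4                           # number of concurrent rsync processes

    def sync(subdir):
        # Trailing slashes make rsync copy the *contents* of each subdirectory
        # into the matching directory on the destination.
        src = os.path.join(SRC, subdir) + '/'
        dest = os.path.join(DEST, subdir) + '/'
        return subprocess.call(['rsync', '-a', '--delete', src, dest])

    if __name__ == '__main__':
        subdirs = [d for d in os.listdir(SRC)
                   if os.path.isdir(os.path.join(SRC, d))]
        # Note: files sitting directly in SRC still need one extra, non-recursive pass.
        codes = Pool(WORKERS).map(sync, subdirs)
        print('finished; rsync exit codes: %s' % codes)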

Genome assembly likes RAM!

This is what it looks like when you do a genome assembly and run out of memory… The machine in question actually has 384GB of RAM (not much, as far as machines which do genome assembly go!). Assembling a genome is like doing a massive puzzle; you need to have all the “pieces” in contiguous […]

Update GlusterFS 3.3.1 -> 3.4.0 on CentOS 6.4 cluster

Notes from the GlusterFS 3.3.1 -> 3.4.0 upgrade on my storage/compute cluster at ILRI, Kenya. I referenced Vijay Bellur’s blog post about upgrading to 3.4, then added my own bits using Ansible for my infrastructure (I gave an overview of my Ansible setup here). Our cluster comprises: three “storage” nodes (gluster […]
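
The notes themselves are truncated here, but the general shape of a 3.3 -> 3.4 upgrade on CentOS is: stop glusterd, update the packages, start glusterd again, working through the cluster one host group at a time. A loose sketch of driving that with the Ansible 1.x Python API; the host group and package glob are assumptions (and it presumes you connect as a user that can manage packages and services), not the actual steps from the post:

    # Orchestrate the upgrade steps across a group of Gluster nodes.
    import ansible.runner

    def run(pattern, module, args):
        results = ansible.runner.Runner(module_name=module, module_args=args,
                                        pattern=pattern, forks=10).run()
        for host in results['dark']:
            print('WARNING: %s was unreachable' % host)
        return results

    # "storage" is a hypothetical inventory group for the Gluster server nodes.
    run('storage', 'service', 'name=glusterd state=stopped')
    run('storage', 'command', 'yum -y update glusterfs*')   # yum expands the glob itself
    run('storage', 'service', 'name=glusterd state=started')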

“interactive” script for SLURM

I recently rolled out a new distributed model for our research computing cluster at work. We’re using GlusterFS for networked home directories and SLURM for job/resource scheduling. GlusterFS lets us scale storage with minimal downtime or service disruption, and SLURM allows us to treat compute nodes as generic resources for running users’ jobs (i.e., […]
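
The script itself isn’t shown in this excerpt, but the usual idea behind an “interactive” wrapper is to hide the srun incantation that drops a user into a shell on a compute node, so people stop running heavy work on the head node. A minimal sketch; the option names, defaults, and partition are guesses rather than the post’s actual script:

    #!/usr/bin/env python
    # "interactive": request an allocation and attach a shell to it via srun.
    import os
    import optparse

    parser = optparse.OptionParser(usage='usage: %prog [options]')
    parser.add_option('-c', '--cpus', default='1',
                      help='number of CPU cores to request')
    parser.add_option('-p', '--partition', default='interactive',
                      help='partition (queue) to run in')
    (opts, args) = parser.parse_args()

    # Replace this process with srun; --pty connects the terminal to the
    # remote shell so it behaves like a normal login session.
    os.execvp('srun', ['srun', '--pty',
                       '--cpus-per-task=%s' % opts.cpus,
                       '--partition=%s' % opts.partition,
                       'bash', '-i'])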