Validating Subject Terms Against the AGROVOC REST API

AGROVOC is a controlled vocabulary covering all areas of interest of the Food and Agriculture Organization (FAO) of the United Nations, including food, nutrition, agriculture, fisheries, forestry, environment, etc. It is published by FAO and edited by a community of experts¹. At the time of writing AGROVOC consists of over 36,000 concepts and is […]

Leveraging the Ansible Python API for Infrastructure Reporting

A few days ago I had to gather some basic information from a handful of servers for an inventory report: hostname, IP address, storage capacity, distro version, etc. I already manage all of my servers with Ansible, and there’s a wealth of information available in Ansible’s setup module, so I knew there […]

Parallelizing rsync

Last week I had a massive hardware failure on one of the GlusterFS storage nodes in the Research Computing cluster at ILRI, Kenya: two drives in the underlying RAID5 failed simultaneously. As RAID5 can only withstand one drive failure, the entire 31TB array was toast. FML. After replacing the failed disks, rebuilding the array, and formatting […]

Genome assembly likes RAM!

This is what it looks like when you do a genome assembly and run out of memory… The machine in question actually has 384GB of RAM (not much, as far as machines which do genome assembly go!). Assembling a genome is like doing a massive puzzle; you need to have all the “pieces” in contiguous […]

Update GlusterFS 3.3.1 -> 3.4.0 on CentOS 6.4 cluster

Notes from the GlusterFS 3.3.1 -> 3.4.0 upgrade on my storage/compute cluster at ILRI, Kenya. I referenced Vijay Bellur’s blog post about upgrading to 3.4, then added my own bits using Ansible for my infrastructure (I gave an overview of my Ansible setup here). Our cluster comprises: Three “storage” nodes (gluster […]