I work as a Systems Analyst at the International Livestock Research Institute (ILRI) in Nairobi, Kenya. In the name of alleviating poverty in the developing world, our scientists generate a lot of data about diseases, weather, markets, etc. Lots of data means lots of servers, routers, switches, etc., and, because most science types aren’t tech savvy, I get paid to help them figure out what to buy, where to put it, and how to use it!
Because I’m super organized and love documentation, I keep a log of what I do from day to day: who I helped, how I fixed a bug, what I installed, etc. I’ve been doing it for about a year and a half now, so I thought it’d be quite interesting to see what the data would look like if I used the word frequencies to create a word cloud.
Waaa! I’m jealous!
Quit whining, it’s easy. First, I fed my big “diary” of activities into AntConc to generate the raw numbers:
$ head -n 5 diary_wordlist.txt
10 164 HPC
11 155 I
12 155 org
17 109 server
27 81 ilri
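(If you don’t have AntConc handy, a rough shell approximation would be something along these lines — just a sketch, assuming the raw diary lives in a hypothetical diary.txt; note it prints “count word” with no rank column, so the awk below would need its field numbers adjusted:)

$ tr -cs '[:alpha:]' '\n' < diary.txt | sort | uniq -c | sort -rn > diary_wordlist.txt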
… then massaged them a bit to get a simple word:frequency format:
$ awk '{print $3":"$2}' diary_wordlist.txt > diary_wordlist_frequencies.txt
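Based on the sample above, the new file should look something like this:

$ head -n 5 diary_wordlist_frequencies.txt
HPC:164
I:155
org:155
server:109
ilri:81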
Right now it’s technically ready to go, but there are a lot of commonly occurring junk words such as “to” and “from” that should be cleaned out, or else your word cloud will be extremely underwhelming.
Clean up the junk words
I wrote a script, strip_keywords.sh, which reads lines from a reference file (such as C keywords, common English words, etc.) and strips them from my word list:
#!/bin/bash
# a space-separated list of files containing keywords
# (one keyword per line) that we want to strip from the results
keywords_files="english.txt"

for keywords in $keywords_files
do
    while read -r word
    do
        # entries look like "word:frequency", so delete any line
        # whose word field matches the current keyword
        sed -i "/^$word:/d" diary_wordlist_frequencies.txt
    done < "$keywords"
done
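Here english.txt is assumed to be a plain list with one word per line (“to”, “from”, and friends), and running the script is as simple as:

$ chmod +x strip_keywords.sh
$ ./strip_keywords.sh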
All sorted!
Now paste that bad boy into Wordle and you'll be good to go.