Software for cost-effective community data collection

In 2012 we started enumerating a population of about 40,000 people in the five Mukim (sub-districts) of the District of Segamat, in the state of Johor in Peninsular Malaysia. This marked the start of data collection for establishing a Health and Demographic Surveillance Site (HDSS), grandly called the South East Asia Community Observatory (SEACO).

When establishing SEACO we had the opportunity to think, at length, about how we should collect individual and household data. Should we use paper-based questionnaires? Should we use Android tablets? Should we use a commercial service, or open source software? Once we had collected the data, how should we move it from paper or tablets into a usable database? One of the real challenges in thinking this through was that an HDSS involves the longitudinal follow-up of individuals, households, and communities. Whatever data collection system we chose, therefore, had to make it simple to link the data about Person_1 at Time_1 with the data about Person_1 at Time_2.
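To make the linkage requirement concrete, here is a minimal Python sketch, assuming each individual receives a persistent identifier at first enumeration that is carried into every later round. The record layout and the names `Observation`, `person_id`, `household_id`, `round_no` and `link_by_person` are hypothetical illustrations, not SEACO's actual schema or ODK's data format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Observation:
    """One questionnaire record (hypothetical layout for illustration)."""
    person_id: str     # persistent individual ID, assigned at first enumeration
    household_id: str  # household the person belonged to in this round
    round_no: int      # survey round: 1 = Time_1, 2 = Time_2, ...
    answers: dict      # questionnaire responses collected in this round


def link_by_person(observations):
    """Group observations into per-person histories, ordered by survey round."""
    histories = defaultdict(list)
    for obs in observations:
        histories[obs.person_id].append(obs)
    # Sort each person's records so Time_1 comes before Time_2, and so on.
    for obs_list in histories.values():
        obs_list.sort(key=lambda o: o.round_no)
    return dict(histories)


if __name__ == "__main__":
    records = [
        Observation("P001", "H01", 2, {"smokes": False}),
        Observation("P002", "H01", 1, {"smokes": True}),
        Observation("P001", "H01", 1, {"smokes": True}),
    ]
    linked = link_by_person(records)
    for pid, history in linked.items():
        print(pid, [o.round_no for o in history])
```

The grouping only works if the collection tool re-uses a person's existing identifier at each visit rather than generating a new one, which is why the choice of collection system mattered so much for linkage.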

I eventually settled on OpenDataKit (ODK). ODK is a marvellous piece of software developed by a team of researchers at the University of Washington; it runs on Android tablets and is released under an open source license. We hacked the original codebase to allow the encryption of data on the tablet (encryption later became a mainstream option in ODK), I wrote a small Python script for downloading the data from the tablets, and the IT Manager wrote a PHP script to integrate the data with a MySQL database. We managed the entire process from collection to storage, and it worked extremely well. I hate the idea of using proprietary software if I don't have to, and when we set up SEACO we decided that, as much as possible, we would use open source software so that others could replicate our approach.

 

A SEACO data collector using an Android tablet with ODK to complete a household census, 2012

Recently we moved away from ODK to a proprietary service, SurveyCTO. Unlike ODK, SurveyCTO is a paid service, but for the reasons I set out below, it has so far been worth the move.

ODK did not do exactly what we needed, which meant that the IT team regularly adjusted the codebase (written in Java). Then the lead hacker on the team moved on. That left us short-handed and without anyone who had his familiarity with the ODK codebase. I was torn between finding a new person who could take on the role of ODK hacker and moving to proprietary software. The final decision rested on a few factors that are worth keeping in mind should the question arise again. First, our operation has grown quite large. There are multiple projects running at any one time, and we would have needed a full-time ODK developer. SurveyCTO retained most of the functionality we already had, and it added some features that were useful for monitoring the data as they came in and for managing access to data. Second, the cost of using SurveyCTO was considerably lower than the staff costs of an in-house developer. We would lose the capacity for some of our de novo development but would benefit from a maintained service at a fraction of the cost.

If I had more money, my preference would be to maintain the capacity for in-house development. If I were running only relatively small or one-off cross-sectional studies, I would use ODK without hesitation. For a larger, more complex operation, a commercial service made economic and functional sense.

One of the other services I considered was Magpi. At the time I made the decision, it was more expensive than SurveyCTO for our needs. If, however, you are just beginning to look at the problem, you should look at all the options. I am sure there are now other providers that we did not consider.