In the past, 'digital' data collection involved a months-long game of telephone: the Principal Investigator (PI) would tell the data manager to write a form, the data manager would tell the programmer to make a digital version, the programmer would show it to the data manager, who'd show it to the PI, who'd request a bunch of changes, and so on. Then (if they were following good practice) they'd run a pilot test and find a whole bunch more to change, and the cycle would continue.
The result would be a bespoke data collection tool that only the programmer knew how to update, only the data manager knew how to use, and only the PI knew the purpose of. The tool would be used once, maybe twice, then never looked at again.
It's no surprise that most data were collected on pieces of paper!
Fortunately, we don't live in such dark times any more. Smartphones and the internet have brought with them a variety of powerful ways to collect data without having to code an application from scratch. For us, the big player here is the Open Data Kit (ODK): a powerful, flexible, open-source tool dedicated to making it as easy as possible to collect survey and experimental data using cheap mobile phones.
One of the most powerful things about ODK is that you don't need programmers to come and build your forms for you. Any researcher with decent spreadsheet skills can build a form using the XLSForm standard, upload it to a service like KoboToolbox and have it on their phone ready to collect data within an hour of deciding to run a survey. Bad news for the programmer who’s suddenly out of a job - but is it good news for the researcher? On balance, yes, but such power always comes with a cost.
ODK makes it easy to collect data. But it also makes it easy to collect terrible data.
The tools have removed so many of the barriers between idea and implementation, but by being so flexible they allow us to get away with ignoring a lot of ‘good practice’ around data collection. It's easy to forget the rigour and careful planning that's required to design a truly good data collection form. A good form asks “the right questions, at the right time, to the right people”, and doing all three of those is surprisingly hard!
So how do we address this problem? How do we keep using these quick and flexible tools without falling into the trap of collecting too much data of unknown quality – just because we can?
We often sell ODK to new users with the promise of speed and simplicity. It’s quick to get started - you just need three columns (type, name and label) and you immediately have a valid, working form. Take this example.
This form is perfectly functional. If you save this in Excel and upload it to KoboToolbox, you can immediately start asking people about their caffeine habits. Great! Quick and easy, as promised.
Now let's take a look at some data collected with this form:
Now, obviously this example is faked to show off lots of issues. But I have seen every single issue here in real data collected with ODK, usually at the point where it’s too late to go back and check with the enumerators. So what can we do about it?
Fortunately, all of these issues are preventable by using existing features of ODK. While type, name and label are the only ‘essential’ columns needed for a functioning ODK form, there are some features that are just as vital for collecting good quality data.
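To make the 'three essential columns' idea concrete, here is a rough Python sketch. The questions in `survey_rows` are invented for illustration (they are not the form from this post), and `missing_columns` and `to_csv` are hypothetical helpers of my own, not part of ODK or any XLSForm library.

```python
import csv
import io

# The three columns the post describes as essential for a working form.
REQUIRED_COLUMNS = ("type", "name", "label")

# Hypothetical questions, made up for this sketch.
survey_rows = [
    {"type": "text", "name": "respondent_name", "label": "What is your name?"},
    {"type": "text", "name": "age", "label": "How old are you?"},
    {"type": "text", "name": "drinks", "label": "Which caffeinated drinks do you have?"},
]


def missing_columns(rows):
    """Return the required columns that are empty or absent in at least one row."""
    missing = set()
    for row in rows:
        for column in REQUIRED_COLUMNS:
            if not str(row.get(column, "")).strip():
                missing.add(column)
    return sorted(missing)


def to_csv(rows):
    """Render the rows as CSV text, one line per question, header first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=REQUIRED_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(missing_columns(survey_rows))
print(to_csv(survey_rows))
```

Notice that every question here is a plain `text` question with nothing else attached. That is exactly the kind of form that works straight away but does nothing to protect data quality.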
So, you want to improve your ODK forms. You have some basic questions written as an XLSForm, and you want to add some quality control. Where do you start?
One of the key things to remember is that humans are fallible. We all make mistakes, and we make lots of them during easy, mundane activities like data entry. You may assume that no one will accidentally enter 245 instead of 24 into the 'Age' field, but after 6 years of doing data quality checks, I can assure you this type of mistake is not only likely; it is almost guaranteed, even in surveys of just a few hundred records.
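As a rough illustration of the 'Age' problem, here is a minimal Python sketch of a range check run over already-collected records. The records and the 0 to 120 bounds are my own assumptions, and `out_of_range` is a hypothetical helper. In the form itself, this is the job of the XLSForm `constraint` column, with an expression along the lines of `. >= 0 and . <= 120`, which stops the bad value at the moment of entry rather than afterwards.

```python
def out_of_range(records, field, low, high):
    """Return (row_index, value) pairs whose value falls outside [low, high].

    Missing values are skipped here; they are a separate problem from typos.
    """
    flagged = []
    for index, record in enumerate(records):
        value = record.get(field)
        if value is not None and not (low <= value <= high):
            flagged.append((index, value))
    return flagged


# Made-up records for illustration, including the 245-instead-of-24 slip.
records = [{"age": 24}, {"age": 245}, {"age": 31}, {"age": None}]

# The plausible age bounds are an assumption; choose ones that suit your survey.
flagged = out_of_range(records, "age", 0, 120)
print(flagged)
```

Finding the typo after the fact only tells you that something went wrong. It can't tell you whether the real answer was 24, 25 or 45, which is why it's better to catch it in the form.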
Fortunately, ODK has a lot of useful features you can use to limit these sorts of mistakes. The tips below are all quick to add to a form, not too technical, and will dramatically improve the quality of your data:
The last tip here is a bit more complex than the others. I'm including it because it answers a very common challenge: how to filter through a set of nested option lists, for example, how to identify a specific household.
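The logic behind this kind of nested filtering can be sketched in a few lines of Python. In XLSForm it is usually handled by a `choice_filter` expression on the survey sheet (something like `region=${region}`) plus an extra column on the choices sheet. The regions, villages and households below are made up for this sketch, and `filter_choices` is a hypothetical helper, not an ODK function.

```python
# Made-up choices sheet: each row in a lower-level list carries the name of
# the higher-level option it belongs to in an extra column.
choices = [
    {"list_name": "region", "name": "north", "label": "North"},
    {"list_name": "region", "name": "south", "label": "South"},
    {"list_name": "village", "name": "v1", "label": "Village 1", "region": "north"},
    {"list_name": "village", "name": "v2", "label": "Village 2", "region": "north"},
    {"list_name": "village", "name": "v3", "label": "Village 3", "region": "south"},
    {"list_name": "household", "name": "h1", "label": "Household 1", "village": "v1"},
    {"list_name": "household", "name": "h2", "label": "Household 2", "village": "v1"},
    {"list_name": "household", "name": "h3", "label": "Household 3", "village": "v3"},
]


def filter_choices(rows, list_name, **earlier_answers):
    """Return option names from one list that match every earlier answer."""
    return [
        row["name"]
        for row in rows
        if row["list_name"] == list_name
        and all(row.get(column) == answer for column, answer in earlier_answers.items())
    ]


# The enumerator picks a region, then only sees that region's villages,
# then only sees the households in the chosen village.
villages = filter_choices(choices, "village", region="north")
households = filter_choices(choices, "household", village="v1")
print(villages, households)
```

The point of the extra column is that each answer narrows the next list, so the enumerator never has to scroll through every household in the study to find the right one.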
So, using these tips, what would our caffeine form look like?
Worksheet: choices
I know - this looks a lot more complex than the first version of the form. We've added 5 new columns and an entirely new worksheet. It definitely takes longer to learn and write than the first version! But I hope you can see that it's not that much extra time, and I guarantee this updated form will give you data that is much more usable.
Even if you only do some of these (for example, making questions select_one or select_multiple instead of text, or adding some basic relevant code to improve your form flow), you will see a big difference in the quality of your collected data.
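Here is a small Python sketch of why swapping text for select_one pays off. Both lists of answers are invented for illustration: the first is the sort of thing a free-text question tends to produce, the second is what the same six respondents would yield from a fixed list of options.

```python
from collections import Counter

# Made-up answers to a free-text "favourite drink" question.
free_text_answers = ["Coffee", "coffee", "cofee", "COFFEE ", "tea", "Tea"]

# The same six answers if the question had been a select_one with two options.
select_one_answers = ["coffee", "coffee", "coffee", "coffee", "tea", "tea"]

# Count how many distinct values an analyst would have to deal with.
free_text_counts = Counter(free_text_answers)
select_one_counts = Counter(select_one_answers)

print(len(free_text_counts), "distinct free-text values")
print(len(select_one_counts), "distinct select_one values")
```

Six spellings of two drinks is manageable. A few hundred records with a dozen possible answers is not, and every one of those spellings has to be cleaned by hand before analysis.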
If you want an annotated version of this example form, you can download the Excel files here. We also have an XLSForm template available, which includes all the common column headings I’ve mentioned here.
For more discussion about the technical and non-technical aspects of writing good data collection forms, check out these videos.
Let us know in the comments if you found these tips useful – and if you have any neat ODK tips of your own. I’d like to do another ‘ODK Tips’ post in the future, so perhaps we can feature yours!