Changing the Data Flow Diagram
I mentioned that we want to update our Data Flow diagram. This leads to a whole set of questions, but perhaps the most fundamental one is that of shape. One of the main drawbacks of this model is that it’s linear. You start off thinking about who owns the data, then go through planning, collection, analysis and so on. This sort of works in the context of a ‘typical’ project, but it’s a huge simplification of the relationships between these stages. There’s a danger that people go away from the diagram thinking they don’t need to worry about data storage until after they’ve written the report, or that they don’t need to think about the analysis when “planning data collection and data entry”.
It also misses a key step, namely the influencing of future work. Many projects are adopting “developmental” or iterative approaches to planning, so the final step, “Dissemination & feedback”, should really link back to the beginning of the line.
There - straight away it looks a bit better - more like an ongoing process rather than a rush to the end. So, what’s next? I mentioned last time how much the changing technology has changed the practice of “data management”. What does that mean for our flow diagram?
A couple of years ago, the Statistical Services Centre made a set of videos about how new technology was changing the process of conducting research. One of our videos discusses how the entire project timetable changes when swapping paper forms for digital forms. It’s generally agreed that the work gets “front-loaded” - with much more time required for form creation and careful testing, but much less time needed after collection to get the data properly organised. (In fact, if done well, it’s possible to get an ‘analysis-ready’ dataset the moment the last record is collected! Hard to achieve, but definitely possible.)
This means the focus of our Data Flow needs to change accordingly. I suspect we’ll need to add explicit “testing and piloting” steps, reduce (or remove) the focus on data entry, and probably move the “data storage” to a stage much closer to the start. If you’re collecting data digitally, you need to have a storage solution in place from the start, otherwise you’ll have enumerators wandering around with valuable data on their phones that they can’t do anything with!
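For anyone who likes to see these things written down, here is a rough sketch of the changes suggested so far, in Python. The stage names and their order are my paraphrase for illustration, not the exact wording of the diagram, and the function names (`revise_flow`, `next_stage`) are made up for this example.

```python
# Stage names and order are assumptions paraphrased from the post,
# not the exact labels on the real Data Flow diagram.
ORIGINAL_FLOW = [
    "data ownership",
    "planning data collection and data entry",
    "data collection",
    "data entry",
    "analysis",
    "reporting",
    "data storage",
    "dissemination and feedback",
]


def revise_flow(stages):
    """Apply the suggested changes to a linear list of stages."""
    # Drop the separate data entry step, and lift storage out so it can move.
    revised = [s for s in stages if s not in ("data entry", "data storage")]
    # Storage needs to be in place close to the start.
    revised.insert(1, "data storage")
    # Add an explicit testing and piloting step just before collection.
    revised.insert(revised.index("data collection"), "testing and piloting")
    return revised


def next_stage(stages, current):
    """Return the stage after `current`, wrapping round to the start."""
    position = stages.index(current)
    return stages[(position + 1) % len(stages)]


REVISED_FLOW = revise_flow(ORIGINAL_FLOW)
```

The wrap-around in `next_stage` is the loop back from “Dissemination & feedback” to the beginning: asking for the stage after the last one returns the first. Exactly where storage and piloting should sit is still open for discussion, so treat the positions here as one possible arrangement.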
So, what are your thoughts? Have you encountered “Data Flow” in this context before? What do you think about the diagram, and how well does it map to your experiences of projects involving data?
Author: Dave Mills
Dave developed an IT & data infrastructure that allows us to close information loops and deliver tailored information to diverse users, through data collecting mobile apps. He is also responsible for the development of our eLearning portfolio and Open Educational Resources.