If there is one thing statisticians can claim expertise in, it should be the analysis of data. For data coming from designed experiments, the situation is very clear – or so we and many other scientists are taught! The right way to analyse the data is determined by the design – and that is why we worry about whether those replicates were really replicates, and whether the layout was actually a split-plot or not. As a student I remember being very impressed by, and at the same time mystified by, the equivalence of an analysis based on linear models (the way everyone is taught) and one based entirely on the randomisation of the design. It seemed to imply that there is indeed only one way to analyse data from a given experiment. Now that we work with farmers doing large-N trials, the situation looks much less clear. Yes, we can still do the standard analysis – an analysis of variance, some F-tests, tables of mean treatment effects and so on – but it typically reveals only a little of the story the data have to tell. What’s going on here…?
Think of a trial in which several hundred farmers each compare a set of treatments on their own farm – a commonly used type of design. Typically, farmers are organised into groups of some sort, and data collection is done by the farmers themselves or by agents attached to each group. A process of aggregation eventually results in a data-set reaching the researchers. The analysis then happens on at least three levels:
- Researchers can do the statistical analysis, starting with the standard analysis described above.
- Farmers make their own interpretations of the observations on their farms. These often lead to conclusions very different from those a researcher would reach – for example, the most productive option may not be liked because it is complex to manage.
- At a group level, there can be a process of participatory analysis during which members compare findings. The most valuable results of this are often insights into variation between farms – e.g. that the new treatment only does well on good soil or when planted early.
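To make that group-level insight concrete, here is a toy sketch in Python. All the farms, soil ratings and yields are invented for illustration – the point is only that an overall treatment mean can hide exactly the farm-to-farm variation that participatory analysis surfaces:

```python
# Each record is one farm's yield (t/ha) for a control and a new treatment,
# plus a soil rating. All names and numbers are hypothetical.
from statistics import mean

records = [
    # (soil, control_yield, new_yield)
    ("good", 2.1, 3.0), ("good", 2.4, 3.3), ("good", 1.9, 2.8),
    ("poor", 1.2, 1.1), ("poor", 1.0, 0.9), ("poor", 1.4, 1.3),
]

# The overall treatment difference averages away the interaction...
overall = mean(new - ctl for _, ctl, new in records)

# ...while stratifying by soil reveals it, much as farmers comparing
# notes in a group might.
by_soil = {}
for soil, ctl, new in records:
    by_soil.setdefault(soil, []).append(new - ctl)
effects = {soil: mean(diffs) for soil, diffs in by_soil.items()}

print(f"overall effect: {overall:+.2f} t/ha")
for soil, eff in sorted(effects.items()):
    print(f"  on {soil} soil: {eff:+.2f} t/ha")
```

Here the overall effect looks positive, yet on poor soil the new treatment actually does slightly worse – the kind of finding that rarely appears in a single table of treatment means.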
These insights and explanations should lead researchers to revise the way they look at the data. At the same time, the statistical analysis of the whole data-set can reveal patterns that farmers find useful. For example, knowing that early planting had a similar benefit wherever it was tried helps increase farmers’ confidence in their own results.
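One simple form that feedback can take is a consistency check: how many farms saw the effect go in the same direction? A minimal sketch, again with invented paired yields (t/ha):

```python
# Paired yields for late vs. early planting on each farm.
# Farm identifiers and all numbers are hypothetical.
paired_yields = {
    # farm_id: (late_planting, early_planting)
    "farm_01": (1.8, 2.3), "farm_02": (2.2, 2.6), "farm_03": (1.5, 2.1),
    "farm_04": (2.0, 2.4), "farm_05": (1.7, 1.6), "farm_06": (2.3, 2.9),
}

# Per-farm difference, then the fraction of farms where early planting won.
diffs = [early - late for late, early in paired_yields.values()]
n_positive = sum(d > 0 for d in diffs)
consistency = n_positive / len(diffs)

print(f"early planting helped on {n_positive}/{len(diffs)} farms "
      f"({consistency:.0%})")
```

A statement like "early planting helped on 5 out of 6 farms" is easy to communicate back to a group, and speaks directly to each farmer's confidence in their own single comparison.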
If the information channels are working well, then the initial tentative conclusions from each layer of analysis will be updated. Maybe a few more iterations will be needed to make the most of the data and experience of the experiment before next steps are negotiated by all those concerned.
All this is very much richer and more complex to manage than the ‘analysis of designed experiments’ that we teach trainee statisticians, and is a good example of what needs to change in statistical practice if we are to remain relevant to current applied research.
What are your views on teaching trainee statisticians? Do you feel that it’s relevant and uses enough applied research? Perhaps you’re a trainee statistician yourself? We’re keen to hear your views, so do please jot down a comment or two in the box below 🙂