Learning R: University vs. the Workplace
It has now been six months since my first day as a Statistical Intern here at Stats4SD. In that time I have been lucky to work on a variety of topics, methods and software, which has made for a varied workload. However, one program has stood out above all others in how much I use it: R. As statisticians we are blessed with a range of software to suit our needs and capabilities. Some in the office remain committed to the likes of SPSS or STATA, while others do practically everything in R.
R had been introduced to me at university through a module called "Statistical Computing". As this blog will explore, I do not believe the teaching methods of this course were what a beginner needed. It should hardly be surprising, then, that when it came to my MSc dissertation, all my analysis (multilevel modelling) was actually conducted in STATA. At that point I barely understood how to do a chi-square test in R, let alone how to use R to analyse attitudes from a Europe-wide survey covering more than 20 countries. In all honesty, R was my least favourite statistical software when I arrived at the office. But six months on? I use it for everything.
A graph made in STATA from my MSc dissertation
Application makes all the difference
Of all the issues I had with how I'd been taught R at university, the biggest was the lack of application. My degrees were in the social sciences rather than mathematics or statistics, so all my other teaching in statistical methods and software had been entirely applied. Real-world problems were used as examples to teach effective problem solving and how to answer research questions with statistical techniques. This kind of real-world problem solving is also one of the core features of our work here at Stats4SD.
When it came to learning R at university, however, real-world problem solving was seemingly not a priority. This made it very difficult to focus on the software itself, as the course's demonstrations were either not relatable or assumed a level of statistical knowledge beyond mine. In the workplace, I have been able to learn about classification trees and hierarchical clustering through real data problems arising from our work with the McKnight Foundation. I had not been familiar with either method before, let alone how to apply them in R, but the real-world application allowed me to learn effectively, and the methods were appropriate to my skill level in R at the time. At university, in contrast, basic for and while loops were taught using the example of maximising the likelihood of a truncated Poisson. Without an applied context, I first had to work out what the example was even trying to achieve before I could do anything in R.
So what does this mean in terms of teaching R?
I think it's clear that when teaching R, it is important to know the level of knowledge of the people you are teaching. If you are teaching a room full of theoretical statisticians, the course I took may have been appropriate, but more often than not you will be dealing with a group with a wide variety of skills and knowledge. Real-world problems are not only easier for a wider range of people to understand; they are also what we actually use R for most of the time. They do not even need to be the kind of data problems we receive here at Stats4SD. The data "problems" Sam and Nicolas used for their R course for postgraduate students at Reading University back in November may not have been "real problems" as such, but they used real and understandable data. I may never need to know exactly how many pigs per capita there are in Ireland, but I at least understand what that means, which is more than I could say for a truncated Poisson. The learner can then focus simply on getting to grips with R. To quote a recent paper on applied statistical analysis in R: "Students may be bored if statistical courses are conducted with standard or passive approaches, such as answering questions from textbooks or tutorials."1
Keeping the material up-to-date
Learning R in the workplace has also felt much more current. In other words, the tools, methods and resources we use are more up to date and cover far more of R's functionality. Coming into this internship, I did not know that rmarkdown, shiny or the tidyverse existed, but these are now three of the main tools within R that I use regularly. In fact, the tidyverse packages and rmarkdown were two of the very first things I was introduced to here. The problem with R as I learnt it in the academic setting was that the topics covered only a narrow slice of its functionality. The sheer range of things R can do was not on display, and without this it was difficult to see exactly what advantages R offered over other statistical software.
I have seen many tips and tricks for teaching R across the internet, and one common piece of advice is to focus on teaching the tidyverse, due to its accessibility for those without any programming or extensive theoretical statistics knowledge1,2. As the tidyverse's own website states, it "makes data science faster, easier and more fun"3. I believe it accomplishes this, creating a much more logical flow to data analysis, with easier-to-understand functions and the ability to pipe functions together. With base R functions, learners generally also have to get to grips with unfamiliar coding structures and with different data structures such as arrays, matrices and vectors. For those with little coding experience, this can make learning much harder.
All in all, I am extremely grateful for the lessons in R I have learnt over the past six months. I think the two teaching styles I have encountered come down to a clash of two distinct approaches: passive versus active learning. The academic setting was an odd blend of the two: although it included questions and exercises, all the information was simply fed to us through an HTML site, with no lectures, only small amounts of information and little application. The workplace, by contrast, is by definition a purely active learning experience, with applied and flexible problem solving. I believe that when it comes to learning or teaching a topic such as R, active learning methods are much more appropriate.
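To make the point about piping concrete, here is a minimal sketch contrasting the two styles. It assumes the dplyr package (part of the tidyverse) is installed, and it uses R's built-in mtcars dataset purely for illustration; neither the dataset nor the task comes from any course mentioned here.

```r
# Assumes the dplyr package (part of the tidyverse) is installed.
library(dplyr)

# Base R: the calls are nested, so the code reads from the inside out,
# and tapply() returns a named array rather than a data frame.
base_means <- round(tapply(mtcars$mpg, mtcars$cyl, mean), 1)

# tidyverse: the %>% pipe passes each result on to the next step,
# so the code reads top to bottom in the order the steps happen.
tidy_means <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = round(mean(mpg), 1))

print(base_means)
print(tidy_means)
```

Both versions compute the mean miles per gallon for each number of cylinders; the difference lies mainly in how the steps read and in the kind of object returned, which is the "logical flow" the tidyverse is credited with above.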
Me at my desk in the Stats4SD office
1 Gunawan, A., Cheong, M.L.F. and Poh, J., 2018, December. An Essential Applied Statistical Analysis Course using RStudio with Project-Based Learning for Data Science. In 2018 IEEE International Conference on Teaching, Assessment, and Learning for Engineering (TALE) (pp. 581-588). IEEE.
2 Grolemund, G., 2017. "How to teach R: Common mistakes". Available at: https://rviews.rstudio.com/2017/02/22/how-to-teach-r-common-mistakes/
3 Tidyverse website. Available at: https://www.tidyverse.org/
Author: Alex Thomson
Alex joined the team as a Statistics Intern in October 2019 following the completion of his Undergraduate and Master's degrees in Population and Geography, and Social Research Methods at the University of Southampton. With a background in demographic and social science research, especially family demography, Alex hopes to extend both his skill set and his knowledge of issues affecting the developing world, building on an undergraduate trip to Ghana. With experience in STATA and SPSS, Alex hopes to develop his skills in R, survey design and management while at Stats4SD.