Learning R: University vs. the Workplace
It has now been six months since my
first day as a Statistical Intern here at Stats4SD. In that time I have been
lucky to work with a variety of topics, methods and software, which has given
me a diverse workload. However, one program has stood out above all others in
how much I use it: R. We as statisticians are
blessed with a range of software to suit our needs and capabilities. Some in
the office remain committed to the likes of SPSS or STATA, while others
choose to do practically everything in R.
R had been introduced to me at
university through a module known as “Statistical Computing”. Personally, as
this blog shall explore further, I do not believe the teaching methods of this
course were what I needed as a beginner. It should therefore hardly be
surprising that when it came to my MSc dissertation, all of my analysis
(multilevel modelling) was conducted in STATA. At that point I barely understood how
to do a chi-square test in R, let alone how to use R to
analyse attitudes from a Europe-wide survey covering over 20 countries. In all
honesty, R was my least favourite statistical software when I came to the
office. But six months on? I use it for everything.
graph made in STATA from my MSc Dissertation
Application makes all the difference
Of all the
issues I had with how I’d been taught R at university, the biggest was
the lack of application. As my degrees were neither strictly mathematical nor
statistical (they were social science degrees), the rest of my teaching in
statistical methods and in statistical software other than R had been
entirely applied. Real-world problems were used as examples to teach
effective problem solving and how to answer research questions using
statistical techniques. This real-world problem solving is one of the core
features of our work here at Stats4SD.
When it came to learning R at university, however, this real-world problem solving was seemingly not a priority. This made it very difficult to focus on the software itself, as the demonstrations the course used were either not relatable or required a level of statistical knowledge beyond my own. In the workplace, I have been able to learn about classification trees and hierarchical clustering through real data problems coming out of our work with the McKnight Foundation. I had not been familiar with either method before, let alone how to apply them in R, but the real-world application allowed me to learn effectively, and the methods were appropriate to my skill level in R at the time. In contrast, at university, basic for and while loops were taught using an example of maximising the likelihood of a truncated Poisson. Without the background knowledge, I had to work out what the example was even meant to achieve before I could do anything in R.
So what does this mean in terms of teaching R?
It’s clear that when teaching R, it is important to know the level of knowledge
of the people you are teaching. If you are teaching a room full of theoretical
statisticians then the course I took may have been appropriate, but more
often than not you are likely to deal with a group with a wide variety of
skills and knowledge. Real-world problem solving is not only simpler for
a wider range of people to understand, it is also exactly what we
need to use R for most of the time. The examples do not even need to be the kind of
data problems we receive here at Stats4SD. The data “problems” Sam and Nicolas
used for their R course teaching postgraduate students at Reading University back
in November may not have been “real problems” as such, but they used real
and understandable data. I may never need to know exactly how many pigs per
capita there are in Ireland, but I at least understand what that means, which is
more than I can say for a truncated Poisson. This lets the learner simply focus on getting to
grips with R. To quote a recent paper on applied statistical analysis in R: “Students
may be bored if statistical courses are conducted with standard or passive
approaches, such as answering questions from textbooks or tutorials.”1
Keeping the material up to date
Learning R in the workplace has also demonstrated a much greater sense of recency. In
other words, the tools, methods and resources we use are much more up to date
and cover a larger share of R’s full functionality. Coming into this
internship, I had no knowledge of the existence of rmarkdown, shiny, or the
tidyverse, but these are now three of the main tools within R that I use on a
regular basis. In fact, the tidyverse packages and rmarkdown were two of the
very first things I was introduced to here. The issue with R in the academic
setting, as I learnt it, was that the topics covered were quite limited in terms of
functionality. The sheer array of things R can do was not on display, and without
this it was difficult to see exactly what advantages R offered over other software.
I have seen
multiple tips and tricks for teaching R across the internet; one of the common
pieces of advice is to focus on teaching the tidyverse, due to its accessibility
to those without any programming or extensive theoretical statistics
knowledge 1,2. As the
tidyverse’s own website states, it “makes data science faster, easier and
more fun”3. I believe it accomplishes this, and creates a much
more logical flow to data analysis, with functions that are easier to understand and the
ability to pipe functions together. With the base functions, the learner generally
also has to get to grips with unfamiliar coding structures
and with different types of data structures such as arrays,
matrices and vectors. For those with little coding experience, it can be
difficult to actively learn these topics.
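As a minimal sketch of the difference in flow, the same small task can be written both ways. The task and the use of R's built-in mtcars data are my own choices for illustration, not taken from either course, and the example assumes the dplyr package (part of the tidyverse) is installed.

```r
library(dplyr)  # part of the tidyverse; assumed to be installed

# Illustrative task: average fuel economy (mpg) by number of
# cylinders, for the heavier cars (weight above 3, in 1000 lbs).

# Base R: subset with logical indexing, then aggregate with a formula.
heavy <- mtcars[mtcars$wt > 3, ]
base_result <- aggregate(mpg ~ cyl, data = heavy, FUN = mean)

# tidyverse: the same steps piped together and read top to bottom.
tidy_result <- mtcars %>%
  filter(wt > 3) %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg))

print(base_result)
print(tidy_result)
```

Both versions give the same averages. The piped version names each step with a verb, whereas the base version asks the learner to already understand logical indexing and formula notation.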
All in all, I am extremely grateful for the lessons in R I have learnt over the past six months. I think the two teaching styles I have encountered come down to a clash of two distinct approaches: passive vs active learning. The academic setting was an odd blend of the two: although it included questions and exercises, all the information was simply fed to us in the form of an HTML site, with no lectures, small amounts of information and little application. The workplace, by contrast, is almost by definition an active learning experience, with applied and flexible problem solving. I believe that when it comes to learning or teaching a topic such as R, active learning methods are much more appropriate.
Me at my desk in the Stats4SD office
1 Gunawan, A., Cheong, M.L.F. and Poh, J., 2018. An
Essential Applied Statistical Analysis Course using RStudio with Project-Based
Learning for Data Science. In 2018 IEEE International Conference on Teaching,
Assessment, and Learning for Engineering (TALE) (pp. 581-588). IEEE.
2 Grolemund, G., 2017. “How to teach R: Common
mistakes”. Available at: https://rviews.rstudio.com/2017/02/22/how-to-teach-r-common-mistakes/
3 Tidyverse website. Available at: https://www.tidyverse.org/
Author: Alex Thomson
Alex joined the team as a Statistics Intern in October 2019 following the completion of his Undergraduate and Master’s degrees in Population and Geography, and Social Research Methods at the University of Southampton. With a background in demographic and social science research, especially family demography, Alex hopes to extend both his skillset and his knowledge of issues affecting the developing world, building upon an undergraduate trip to Ghana. With experience in STATA and SPSS, Alex hopes to develop his skills in R, survey design and management while at Stats4SD.