Investigating the R in (R)evolution of Open Science


Dr. Nathaniel Helwig is an associate professor in both the Quantitative and Psychometric Methods program in the Department of Psychology and in the School of Statistics. Helwig is an expert on R: he has developed numerous R packages and trains the next generation of data scientists to use this powerful tool. He recently sat down with Psychology communications staff to talk about why R is so important to the Open Science movement and to the field of psychology.

What exactly is R?

R is an open-source language and programming environment for statistical computing and data visualization that is freely available for all major operating systems (e.g., Mac, Windows, Linux). Users can access and edit the source code, check it for accuracy, use it themselves, and customize it for their own purposes.

According to Helwig, “R is the language of statisticians.” Helwig was first exposed to R in 2007 as a first-year graduate student in Quantitative Psychology at the University of Illinois at Urbana-Champaign. At that time, seeing R used by a psychologist was rare; SPSS was the preferred statistical software. By the time he completed his doctoral degree, however, Helwig had written an 83-page report on how to use R for analyzing categorical data, which he still refers to today. Fast forward to 2023: R has gained significant traction in the field of psychology. Helwig noted that using R in psychological research and teaching is quickly becoming the new norm: “Each time I teach a new class or start a new project, it is quite likely that I will need to develop some new R code.” He suspects that its popularity will only continue to grow in the coming years.


Several factors have contributed to R’s enduring popularity, including:

  • it is open-source and free to use;
  • it provides a wide range of statistical techniques and packages that cater to the needs of statisticians, data scientists, and researchers;
  • it now has a rich ecosystem of packages that extend its functionality, including the easily accessible repository of packages available via the Comprehensive R Archive Network (CRAN), where Helwig publishes the R code he develops;
  • it has powerful data visualization capabilities;
  • it has a strong and active user community;
  • it can easily be integrated with other programming languages;
  • it is compatible with big data technologies;
  • it is now commonly taught in academic settings; and, 
  • it promotes good practices for reproducibility in research.

That last point, reproducibility in research, is an especially critical factor in Open Science.

R and the Open Science movement

Because of R’s open-source nature, its robust community of users, and its support for reproducibility, it has quickly become a key tool in the Open Science movement. R makes it easier for researchers and data scientists to disseminate state-of-the-art statistical methods, share their own analysis work, and reproduce the work of others. On R’s specific impact in psychology, Helwig said:

Psychologists have historically been rather secretive with their code and data. For over 100 years, psychologists have gotten away with publishing papers without any supporting evidence. Now that sharing code and data is the new norm, psychologists are required to share their evidence for the results reported in their publications. This [requirement] is good for the field because it allows others to critically examine the evidence behind an idea. Also, anytime a statistical analysis is conducted, many choices and assumptions are made, and those details do not always make it into the manuscript write-up. By sharing the R code used to conduct the analyses, there is no longer any ambiguity about what was done to obtain the results.

Psychologists who use R, therefore, are well-positioned to contribute to and benefit from the principles of openness, transparency, and collaboration that characterize Open Science.

Interestingly, as the field of psychology continues to evolve toward an Open Science model, so do our perceptions of what counts as a meritorious academic product. Helwig noted that, now that reproducibility and openness are primary concerns in the field, it is becoming more common for open-source software and datasets to be considered academic products in their own right: “[Nonetheless], it seems that we are still a few years away from the time when a good R package is viewed as important as a good peer-reviewed publication.” Many people in the field are starting to realize, though, that developing and maintaining good software often requires far more expertise and work than simply writing up a paper or two.

How can I learn more about R and how it is being used in psychology?

Helwig suggested starting with CRAN, which is where he prefers to publish his own R code. The CRAN Task View: Psychometric Models and Methods, maintained by Patrick Mair at Harvard, highlights several R packages that are useful for analyzing psychological data, including, but not limited to:

  • psych: Procedures for Psychological, Psychometric, and Personality Research (William Revelle; Northwestern University);
  • lavaan: Latent Variable Analysis (Rosseel et al.; Ghent University);
  • mirt: Multidimensional Item Response Theory (Chalmers et al.; York University); and
  • lme4: Linear Mixed-Effects Models using 'Eigen' and S4 (Bates et al.; University of Wisconsin).

In addition, Helwig has many R-related materials posted on his teaching website and software website. For those new to R, four resources may be of particular interest:

To conclude, R stands not only as a powerful statistical programming language but also as a driving force behind the principles of transparency, collaboration, and reproducibility at the heart of the Open Science movement. Its contributions to a more accessible and collaborative scientific landscape are now well established, positioning it to play a pivotal role in shaping the future of scientific knowledge.
 

