R Projects

morals

“For every minute spent organizing, an hour is earned.” — (maybe) Benjamin Franklin.

“A good system shortens the road to the goal.” — Anonymous

“Nobody has ever said ‘I regret having such organized R code’” — Me

in brief…

An R project enables your work to be bundled in a portable, self-contained folder. Within the project, all the relevant scripts, data files, figures/outputs, and history are stored in sub-folders and importantly - the working directory is the project’s root folder.

—The Epidemiologist R Handbook

why wouldn’t you?

a twitter thread:  @JennyBryan what made you reluctant? 
were you worried it would be complicated? curious because I see the reluctance, then joy/relief, 
often in STAT545. @kierisi I couldn't find any good explanation on what Projects did and why it 
was better than manually saving all my R scripts in the same folder

from Jenny Bryan’s talk https://speakerdeck.com/jennybc/zen-and-the-art-of-workflow-maintenance

two pictures: one of a room full of cans, and one with cans
neatly hanging on the wall

from Jenny Bryan’s talk https://speakerdeck.com/jennybc/zen-and-the-art-of-workflow-maintenance

R projects allow you …

  • to have one folder per code project;
  • to have one git repository per project;
  • and to run one R session that’s tied to that project

creating an R project

you should see New Project in the File Menu in RStudio

File -> New Project

creating an R project

that will launch the New Project Wizard which will guide you through the project creation process

Create Project: 
In a new directory? In an existing directory? 
Checkout a version controlled repository?

creating an R project

that will launch the New Project Wizard which will guide you through the project creation process

what kind of project will it be? an
R project? an R Package? a Shiny application? a Quarto project? a Quarto website? etc.

creating an R project

that will launch the New Project Wizard which will guide you through the project creation process

Set a name for the project, create it
as a subdirectory of another folder on your computer, and create a git repository to go 
with it. Optionally open the project in a new window.

opening R projects

there are a few ways you can open an R project

open the .Rproj file from Finder / Explorer

opening R projects

there are a few ways you can open an R project

open the .Rproj file from Spotlight or the Start Menu

opening R projects

there are a few ways you can open an R project

open the project from RStudio dropdown

opening R projects

open the project from RStudio dropdown in a new 
window using the small button on the right

expect to iterate

i want to stress that on any major project, you will likely need to iterate substantially on the code between the first version and the final version.

prototyping

therefore, sometimes it can be useful to prototype your work just to get a sense of what needs to be done, and then later on move onto a more robust workflow.

suggested project layout

folders including: code, docs, outputs, data along with a README and CHANGELOG

working in a single file

working in a single file

use the outline in RStudio

the outline in RStudio

working with multiple files

9 files called 00_dependencies.R, 01_load_data.R, 02_clean_data.R, 03_merge_data.R, 04_exploratory_data_visualization.R, 05_model.R, 06_table_model_results.R, 07_supplementary_analysis.R, 99_run_everything.R

versus:
just a file called script.R

use the goto file/function

(on Windows/Mac/Linux): pressing Ctrl+. [period] will open the Goto Function/File popup:

window for go to file function

use the here package

Burned laptop

original photo from https://commons.wikimedia.org/wiki/File:Burned_laptop_secumem_11.jpg

https://www.tidyverse.org/blog/2017/12/workflow-vs-script/

If the first line of your R script is

setwd("C:\Users\jenny\path\that\only\I\have")

I will come into your office and SET YOUR COMPUTER ON FIRE 🔥.

If the first line of your R script is

rm(list = ls())

I will come into your office and SET YOUR COMPUTER ON FIRE 🔥.

use the here package

the payoff of the here package is that it will automatically detect the root of your package and construct paths based off it.

meaning, you can write code like the following without having to worrying about where your working directory is set to or using absolute paths that are specific to your computer:

library(here)
... 

# read data from the data/ directory
readr::read_csv(here("data/specific_data_file.csv"))

... 

# save a plot to the outputs/ directory 
ggsave(filename = here("outputs/my_best_figure_yet.png"))

extras

renv

hex sticker for renv

renv let’s you control exactly what versions of packages are used within an R project and ensure that your collaborators also use exactly the same package versions.

the workflow is pretty simple:

install.packages("renv")
renv::init() # to initialize a new package lockfile
# work in the project as normal
renv::snapshot() # to save the packages used
# keep working ... 
renv::snapshot() # take an updated snapshot

# then when your collaborators want to load 
# up the package versions as you've locked them
# into the project: 
renv::restore()

this prevents one of the very hardest bugs to debug: silent differences in functionality between versions of the same R package.

learn more about renv: https://rstudio.github.io/renv/

Hodu with a Bubble

renv helps you to implement a principle called encapsulation:

meaning that your code is bundled together with its dependencies in order to ensure a consistent, usable interface.

targets

hex sticker for targets

the targets package allows users to write scripts that explicitly declare which code produces what outputs depending on what inputs and the result is that targets is able to run your workflow from start to finish updating any processes where the underlying data or code have changed.

this avoids the problem of having to ask yourself “do i need to update this intermediate output? i can’t remember if i changed the data or code that underlies it.”

a targets diagram

learn more about targets: https://books.ropensci.org/targets/

rocker

a modified docker sticker with R written above it what if i need the absolute most possible reproducibility because my code has to execute exactly the same no matter the person running it?

then the rocker project may be the solution for you: the rocker project provides docker (containerized) images to support R programming.

(this is quite a bit beyond our course.)

key takeaways

  • use R Projects to help organize your code
  • use sections within your code to help organize .R files
  • use the names of your .R files to help you stay organized
  • expect to iterate!
  • use the here package to make working with paths easy
  • use renv and targets to make reproducing your analysis easy

references