git and github


why version control?

  • makes coordinating files across computers easy 💻
  • keeps a clean history of your code evolution 📜
    • no need for messy suffixes (v1, v2, …, v19380, vfinal) 🗂
  • gives you the chance to ask yourself “do i really want to make these changes?” 🤔
  • streamlines review of external code changes 👥

maybe you already know other ways to share code?

galaxy brain meme with a progression from emailing code, im-ing code, to using git

why git?

  • it’s popular! in the stackoverflow developer survey, 93.9% of developers using version control software said they use git.

as a result, git has a large base of learning resources and runs on all the major operating systems.

there are git downloads for all major operating systems

a bit about git

  • git is an open-source version control system.
  • git stores code and its history in a repository.
  • each revision to the code is added to the repository through a commit process.
  • git allows you to have branches of your code that keep development separate from the main codebase until it is complete.
    • the main version of your code is often on the “main” branch (what used to be called the “master” branch).
  • git allows you to push or pull code from remote servers.

github

  • github is a website and online service with free and paid tiers that allows you to:
    • host git repositories
    • publicize your profile and repositories
    • track issues
    • document your projects with wikis
    • host static websites
    • coordinate teams of developers
    • do project management
    • automate project workflows

github logo

github’s mascot and logo is the octocat, a creature with five octopus-like arms. the octocat character was designed by simon oxley, the same graphic designer who designed the bird logo that twitter originally used.

getting started

personal reflections

  • what features of git and github are you looking to leverage the most?
  • look through examples of successful repositories

installation

a quick note about the terminal

in RStudio you can open up a terminal in a window next to the R Console:

the terminal tab is located next to the Console in RStudio

local setup

  • after installation, you’ll need to configure git:
git config --global user.name 'she-ra, princess of power'
git config --global user.email 'adora@eternia.com'
git config --global --list
  • use the same name and email address you are going to use with github.
  • i would highly recommend using a long-term personal email rather than your institutional email so it’s easy to carry your portfolio of work on github with you after you graduate.

set up a github account

some advice from happy git with r:

  • incorporate your actual name
  • reuse your username from other contexts
  • pick an appropriate username you will be comfortable revealing to future bosses
  • shorter is better
    • be as unique as possible in as few characters as possible
  • make it timeless
    • don’t highlight your current university, employer, or place of residence since these can all change
  • all lowercase is recommended

the form for creating an account on github

set up an ssh key

we recommend setting up ssh (secure shell) key based authentication with github.

this allows your computer to be automatically authenticated when you communicate with github.

follow the instructions here:
https://happygitwithr.com/ssh-keys.html
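
as a rough preview of what those instructions involve, here is a minimal sketch of generating a key pair with ssh-keygen. it writes to a throwaway folder and uses an empty passphrase purely so the demo runs unattended; both of those choices are assumptions for illustration, not recommendations.

```shell
# write the demo key to a throwaway folder, NOT to ~/.ssh
keydir="$(mktemp -d)"

# -t ed25519 : the type of key to create
# -C         : a comment that helps you recognize the key later
# -f         : where to save the key pair
# -N ""      : empty passphrase, only so this demo runs without prompting
ssh-keygen -q -t ed25519 -C "adora@eternia.com" -f "$keydir/id_ed25519" -N ""

# the .pub file is the public half, the part you would paste into github
cat "$keydir/id_ed25519.pub"
```

for a real key you would normally leave off -f and -N so that ssh-keygen offers its default location and asks you for a passphrase. once the public key is added to your github account, ssh -T git@github.com is a quick way to check that authentication works.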

set up a local repository in rstudio

navigate to file → new project

select “create a git repository” which informs rstudio you want to use git.

new project wizard window in rstudio shows how you can specify the directory name, create the project as a subdirectory in a folder, and a checkbox for create a git repository

with the new project wizard window in rstudio, you can specify the directory name for your project, create your project as a subdirectory of another folder, and use the provided checkbox to indicate that you’d like to initialize the project as a git repository.
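
if you prefer the terminal, roughly the same result can be sketched with git init. this is not necessarily what rstudio runs behind the scenes; the folder name my-project is hypothetical, and everything happens in a temporary folder so it is safe to try.

```shell
# create a throwaway parent folder so this sketch is safe to run anywhere
workdir="$(mktemp -d)"

# "my-project" is a hypothetical project name
mkdir "$workdir/my-project"
cd "$workdir/my-project"

# initialize an empty git repository in the current directory
git init -q

# a hidden .git folder now holds the repository's history and settings
ls -d .git
```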

git panel in rstudio

rstudio adds a git tab in your environment/history panel.

this panel is a point-and-click interface to:

  • review your changes
  • stage changes
  • write commits
  • push and pull commits
  • view the commit history
  • navigate branches

the git panel in rstudio shows changes you've made with buttons to see the changes and commit them

the git panel in rstudio lists the changes you’ve made and lets you review them, commit them, and push them to github in a user-friendly interface.

setting up a remote repository

navigate to github.com → login → new repository
and fill out the form.

create an initial commit and push

# let your local repository know about the remote repository
git remote add origin git@github.com:ctesta01/examplerepository.git 

git branch -m main # use "main" as the default branch
git add .gitignore # add a file to the staging area
git commit -m "initial commit" # name your commit 
git push -u origin main # push your commit to the remote repository

the output from running the above code; it shows the commit being pushed to github

these are the above commands being run in the terminal, along with the output produced. you can see that git will report on how many objects and how many bytes are being uploaded.

congrats!
you’re using github!

dancing characters from charlie brown

readmes

  • a readme serves as an introduction to and documentation for your repository.

  • like any documentation, feel free to start small and document as you develop!

  • you can learn more about readmes from github here: docs.github.com/…

basic workflow overview

the basic workflow for making updates to a git repository is done in three steps:

  1. making changes to your files
  2. adding them to the staging area
  3. committing these changes with an explanatory message

a diagram showing a workflow of successive commits

this figure from happy git with r shows examples of commits made in a sequence. each commit is accompanied by an id and a message, and the differences between two commits are referred to as a “diff”.
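
the three steps can be sketched on the terminal as follows. this is a minimal example in a throwaway repository; the file contents, commit messages, and the name and email are all made up for illustration.

```shell
# work in a throwaway repository so nothing real is affected
repo="$(mktemp -d)"
cd "$repo"
git init -q
# identity for this repository only (hypothetical values)
git config user.name "example user"
git config user.email "user@example.com"

# 1. make changes to your files
echo "# my project" > readme.md

# 2. add them to the staging area
git add readme.md

# 3. commit the staged changes with an explanatory message
git commit -q -m "Add readme"

# repeat the cycle for a second commit
echo "more details to come" >> readme.md
git add readme.md
git commit -q -m "Expand readme"

# each commit is listed with its (abbreviated) id and message, newest first
git log --oneline
```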

key commands

git status

git status is a basic command that displays the current state of the working directory.

it’s a good idea to always run git status before changing your code because there may be something you want to commit or address first.

the rstudio git panel displays most of what is displayed in the output of git status.

example output of git status showing the new readme.md and .rproj file

git status from the terminal

example git panel showing the new readme.md and .rproj file

the git panel in rstudio showing new, untracked files
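
to try git status yourself, here is a small sketch in a throwaway repository with one modified file and one untracked file. the file names and identity values are hypothetical.

```shell
# work in a throwaway repository so nothing real is affected
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "example user"        # hypothetical identity
git config user.email "user@example.com"

# start from one committed file
echo "# my project" > readme.md
git add readme.md
git commit -q -m "Add readme"

# change the tracked file and create a new, untracked one
echo "a new line" >> readme.md
echo "notes" > notes.txt

# long form: explains what is modified, staged, or untracked
git status

# short form: one line per file (" M" = modified, "??" = untracked)
git status --short
```

the short form is close to what the rstudio git panel shows: one row per file with a status marker.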

git add

  • adds changes to the staging area (also called the index)

terminal version:
example of how to use the command line to add readme.md to the staging area

git add readme.md adds any changes to the readme.md (including its creation) to the staging area. git status shows us what changes are staged to be included in the next commit.

rstudio panel version:

adding files can be done in the git panel in rstudio just by checking the box next to them

one can add files to the git staging area in the rstudio git panel just by checking the checkbox next to each file in the staged column.
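
the staging area lets you choose which changes go into the next commit. a minimal sketch, assuming two hypothetical files of which only one is staged:

```shell
# work in a throwaway repository so nothing real is affected
repo="$(mktemp -d)"
cd "$repo"
git init -q

# two new files
echo "# my project" > readme.md
echo "scratch work" > notes.txt

# stage only readme.md; notes.txt stays untracked
git add readme.md

# "A " marks a file staged for the next commit, "??" an untracked one
git status --short

# list exactly which files the next commit would contain
git diff --cached --name-only
```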

git commit

  • records changes to the repository from the staging area

terminal version:
a screenshot of calling `git commit -m 'add new readme'` on the command line

git commit -m 'add new readme' bundles the changes in the staging area together as a commit and attaches a commit message (basically a name) to that set of changes.

rstudio panel version:

first you click the commit button

with the changes you want to make staged, click the commit button. you’ll have a chance then to view what changes you’ve made. when you’re sure you want to commit, you can write a commit message, click commit, and then push.

a screenshot of using the git panel in rstudio to commit

optimal commit messages 📮

  • capitalize the first word and do not end in punctuation.
  • use imperative mood in the subject line.
    • example: “add fix for data reading error”
  • specify the type of commit. it is recommended and beneficial to have a consistent set of words to describe your changes.
    • example: bugfix, update, refactor, bump, and so on.
  • the first line should ideally be no longer than 50 characters.
    • be direct! try to eliminate filler words and phrases.
      • examples: though, maybe, i think, kind of.

optimal commit messages 💌

to develop thoughtful commits, consider the following:

  • why have i made these changes?
  • what effect have my changes had?
  • why was the change needed?
  • what are the changes in reference to?
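
one way to put these guidelines into practice is a short subject line plus a body that answers the “why” questions. the sketch below is only an illustration: the file read_data.R, its contents, and the wording of the message are hypothetical.

```shell
# work in a throwaway repository so nothing real is affected
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "example user"        # hypothetical identity
git config user.email "user@example.com"

# a hypothetical script standing in for a real change
echo "# read the data without assuming a header row" > read_data.R
git add read_data.R

# the first -m is the subject line; the second -m becomes the body,
# where you can explain why the change was needed
git commit -q \
  -m "Add fix for data reading error" \
  -m "The reader assumed a header row that some files lack."

# show the subject line and body of the last commit
git log -1 --format="%s%n%n%b"

# check the subject length against the 50-character guideline
subject="$(git log -1 --format=%s)"
echo "subject length: ${#subject}"
```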

keep a changelog

a changelog is a file that contains a curated, informative history of your project’s updates.

  • a changelog allows people to easily see key developments and changes in your project.
  • read more about changelogs at https://keepachangelog.com

kermit rushing to update the changelog on his project before a major deadline.

git push

  • sends local, committed changes to the remote repository

terminal version:
Calling git push on the terminal sends the commits from our local computer to the remote server.

RStudio Panel version:

First you have to click Commit and this window will pop open letting you know your commit has been made.

After you click Commit, a window will open showing you the git command that RStudio ran to create your commit. If there are no errors, you can close the window and then click Push.

After you close that window, you'll see that RStudio lets you know you have a commit ready to be pushed
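
To experiment with pushing without touching GitHub, you can let a local "bare" repository play the role of the remote server. This is a sketch under that assumption; the folder names and identity values are hypothetical.

```shell
workdir="$(mktemp -d)"

# a bare repository plays the role of the remote server in this sketch
git init -q --bare "$workdir/remote.git"

# the local repository, connected to that stand-in remote
git init -q "$workdir/local"
cd "$workdir/local"
git config user.name "example user"        # hypothetical identity
git config user.email "user@example.com"
git remote add origin "$workdir/remote.git"

echo "# my project" > readme.md
git add readme.md
git commit -q -m "Add readme"
git branch -M main

# first push: -u remembers origin/main as the upstream of main
git push -q -u origin main

# later pushes can then be a plain "git push"
echo "more details" >> readme.md
git commit -q -am "Expand readme"
git push -q

# the remote now has the same latest commit as the local branch
git rev-parse HEAD
git --git-dir="$workdir/remote.git" rev-parse main
```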

git diff

  • git diff shows how files differ between their current state and a different version.

terminal version:

git diff on the terminal will print (in highlighted colors) what has been added to or removed from the referenced file since it was last staged or committed. to see changes that are already in the staging area, use git diff --staged.

RStudio Panel version:

in the rstudio git panel you can click on the diff button and it will pop open a window showing the changes that have been made to the file since the last commit.
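
here is a small sketch of git diff before and after staging a change, run in a throwaway repository with made-up file contents.

```shell
# work in a throwaway repository so nothing real is affected
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "example user"        # hypothetical identity
git config user.email "user@example.com"

printf 'line one\nline two\n' > readme.md
git add readme.md
git commit -q -m "Add readme"

# edit the file without staging the change
printf 'line one\nline 2\nline three\n' > readme.md

# unstaged changes: the working directory compared with the staging area
# (removed lines start with "-", added lines start with "+")
git diff

# after staging, plain "git diff" prints nothing; --staged shows the change
git add readme.md
git diff
git diff --staged
```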

git pull

  • downloads and integrates code from one repository to another (i.e., to get up to date with a remote server)

terminal version:

calling git pull on the terminal will download any commits from the remote server and apply them to your codebase. git may print hints suggesting that you configure how it should reconcile your local commits with the remote ones (by merging or rebasing) when the two histories have diverged.

RStudio Panel version:

in the rstudio git panel you can click on the pull button and it will pop open a window showing the changes that have been made to the file as a result of pulling any commits from the remote server.
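
pulling can be tried out locally too. in this sketch a bare repository stands in for the remote server and two working copies, hypothetically named laptop and desktop, stand in for two computers.

```shell
workdir="$(mktemp -d)"

# a bare repository plays the role of the remote server
git init -q --bare "$workdir/remote.git"

# first working copy: create a commit and push it
git init -q "$workdir/laptop"
cd "$workdir/laptop"
git config user.name "example user"        # hypothetical identity
git config user.email "user@example.com"
git remote add origin "$workdir/remote.git"
echo "# my project" > readme.md
git add readme.md
git commit -q -m "Add readme"
git branch -M main
git push -q -u origin main

# second working copy: cloned now, so it will soon be out of date
git clone -q --branch main "$workdir/remote.git" "$workdir/desktop"

# a new commit is pushed from the first copy
echo "more details" >> readme.md
git commit -q -am "Expand readme"
git push -q

# the second copy catches up; --ff-only refuses to do anything more
# complicated than moving forward to the newer commits
cd "$workdir/desktop"
git pull -q --ff-only
git log --oneline
```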

workflows for collaboration

Hodu Tip! with two Dogs

you can use the git clone command to copy a repository from a remote server (like GitHub) onto your computer.

git clone git@github.com:ctesta01/ExampleRepository.git or
git clone https://github.com/ctesta01/ExampleRepository.git
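
to see what a clone contains without needing network access, here is a sketch that clones a small local repository instead of one on GitHub. the paths and identity values are hypothetical.

```shell
workdir="$(mktemp -d)"

# a small local repository stands in for one hosted on GitHub
git init -q "$workdir/source"
cd "$workdir/source"
git config user.name "example user"        # hypothetical identity
git config user.email "user@example.com"
echo "# my project" > readme.md
git add readme.md
git commit -q -m "Add readme"
git branch -M main

# clone it: this copies the files and the full commit history
git clone -q "$workdir/source" "$workdir/copy"
cd "$workdir/copy"

# the clone remembers where it came from as the remote named "origin"
git remote -v
git log --oneline
```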

workflows for collaboration

  • small changes – can be reasonable to work on same branch.
  • large changes – create a branch and pull request.
  • outside developer – fork repository and pull request.
  • remember – when working with collaborators, communication is key!

diagram of a remote repository with code going back and forth between three developers, each of whom push and pull from the remote repository

In the centralized workflow depicted, all developers synchronize their work through a main shared repository (like one on GitHub).

git branches

  • Branches let you continue to work on your code without affecting the main line of development
  • Creating branches for bug fixes and feature development prevents unstable code from disrupting your project or workflow.

terminal version:

git checkout -b new_branch creates a new branch named new_branch and switches to it. the new branch diverges from the most recent commit on whichever branch one was on, in this case the main branch.

RStudio Panel version:

In the RStudio Git panel you can click on the New Branch button (which sometimes appears as just an icon to the left of the current branch name) and it will pop open a window allowing you to specify what you’d like to name your branch.
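
A short sketch of how a branch keeps work separate from main, run in a throwaway repository. The file idea.txt and the identity values are hypothetical.

```shell
# work in a throwaway repository so nothing real is affected
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "example user"        # hypothetical identity
git config user.email "user@example.com"

echo "# my project" > readme.md
git add readme.md
git commit -q -m "Add readme"
git branch -M main

# create a branch and switch to it in one step
git checkout -q -b new_branch

# this commit exists only on new_branch
echo "experimental idea" > idea.txt
git add idea.txt
git commit -q -m "Add experimental idea"

# the asterisk marks the branch you are currently on
git branch

# back on main, the work that only exists on new_branch is not present
git checkout -q main
ls
```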

git forks

  • forking creates a copy of a repository which allows development that will not affect the main project.
  • forking is particularly useful when you would like to develop on another person’s project.
  • forking can also be useful for versions of a project that are being developed in parallel with no intention of merging them later.

A screenshot with an arrow indicating the fork button on GitHub for the example repository
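
Forking itself happens on GitHub, but a common follow-up on your own computer is to register the original project as a second remote so you can fetch its new work. The sketch below imitates that locally: a bare clone stands in for the fork, and the remote name upstream is a widely used convention rather than a requirement. All paths and identity values are hypothetical.

```shell
workdir="$(mktemp -d)"

# the original project (a stand-in for someone else's GitHub repository)
git init -q "$workdir/original"
cd "$workdir/original"
git config user.name "example user"        # hypothetical identity
git config user.email "user@example.com"
echo "# their project" > readme.md
git add readme.md
git commit -q -m "Add readme"
git branch -M main

# a bare clone imitates the copy that GitHub makes when you fork
git clone -q --bare "$workdir/original" "$workdir/fork.git"

# clone your fork, then register the original under the name "upstream"
git clone -q "$workdir/fork.git" "$workdir/my-copy"
cd "$workdir/my-copy"
git remote add upstream "$workdir/original"

# fetch new work from the original without touching your own branches
git fetch -q upstream
git remote -v
```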

pull requests

  • A pull request alerts the repository owner that you have changes you would like merged.
  • It gives the owner a chance to review and test the code before adding it to the repository.
  • Clear, well-chosen language in pull requests is also important!

the Compare and Create Pull Request button that appears on GitHub

GitHub will automatically show a Compare & Pull Request button when it sees that you’ve pushed changes onto a new branch. If you click that, it will open up a form where you can document why you’re submitting a pull request.

the form for creating a pull request, including a title and body

key takeaways

a helpful cheatsheet

key takeaways

  • git will help you version control your code, similar to track changes in MS Word, and collaborate with others.
  • GitHub is an online platform that lets you host your git repositories for free and share them with others, and it is one way to host a professional portfolio of code-based projects.
  • A typical individual workflow will look like:
    1. create your R Project and a repository on GitHub
    2. connect them
    3. make changes to the project
    4. add changes to the staging area (with git add)
    5. commit them (git commit)
    6. push those (git push), and continue making commits until the project is finished.

Watch me Diff, Watch me Rebase from the Happy Git and GitHub for the useR book

some hard-won advice

  • Try out new git procedures in a dummy repository.
  • If you “break” something (or everything), keep calm, there is almost always a fix.

An XKCD comic: This is git. It tracks collaborative work on projects through a beautiful distributed graph theory tree model. Cool, how do we use it? No idea. Just memorize these shell commands and type them to sync up. If you get errors, save your work elsewhere, delete the project, and download a fresh copy.

time for a live demo 🤞