Git for version control

Some motivating examples

  • While working on your analysis code, you accidentally erase the first 35 lines of the script. You only discover this three days later, when you restart R and try to run the script from the top.

  • The code you wrote last week is suddenly giving you different results. A bunch of variables are missing, and the standard errors are huge. When you’re talking with your collaborator, he says that he was cleaning the data a few days ago and everything seemed fine. When you go to check, you discover that he’s replaced your 827 survey responses with 6 group summaries.


Piled Higher and Deeper comic on file names. Source: https://phdcomics.com/comics.php?f=1531

You’re working on a paper with two coauthors. You prepare the final draft to send for submission: paper final.docx. But one of your coauthors discovers a typo. Now it’s paper final fixed typo.docx. Another realizes six references are missing. paper final fixed typo refs.docx. That’s getting confusing so you change it to paper 2 Aug 2021.docx. Once it comes back from review you need to make revisions. Now you have paper 30 Jan 2022.docx and, after your collaborators make their changes, paper 12 February 2022 DJH.docx and paper 12 February 20222 final.docx.

Version control

  • Basic idea: Tools for tracking and reversing changes to code over time
  • Useful for identifying and reversing breaking changes
  • Implementations upload to cloud, track who contributes code, control who can suggest vs. actually change code
  • Good for collaboration, publishing code

Git

  • One of many version control systems
  • Very popular in part thanks to GitHub, which provides free hosting git repositories

Gitting started

Initial commit

Location of the new local repository button in GitKraken
  • Install GitKraken and go through the configuration steps
    • Optional: Preferences -> UI Customization -> Theme -> GitKraken Light
  • Create a new local repository
    • New Tab -> Repositories -> Create a Repository -> Laptop symbol

Creating a new local repository in learning-git with GitKraken
  • Name the new repo learning-git
  • Outside of GitKraken, create a folder for materials for this class, and then a subfolder for livecoding exercises. Use this livecoding folder for “Initialize in.”

Open your OS’s file browser to the new repo
  • GitKraken creates a new folder, learning-git, with the file README.md
    • md is Markdown, a plain-text format that can be read by any text editor
  • The context (right-click) menu for a file has shortcuts to open the file or show it in your OS’s file browser
  • For now, open the file in GitKraken’s built-in text editor
    • Click on the file to bring up the viewer
    • Then click “Edit this file”
  • Add some text and save (command/control-S)
  • GitKraken automatically opens up the file stage pane

Tracking changes

Making your initial commit
  • Tracking changes to a file involves two steps: Staging and committing
  • Stage: You can stage files one at a time, or use the “Stage all changes” button
    • This tells git that we want to store these changes to the file in its archive
  • Commit: Type a message in the comment field and click Commit
    • This tells git to go ahead and do the archiving process
  • The commit is now displayed in the history panel

Make a few more changes to the file. Practice adding text, removing text, and committing them. Note how the changes accumulate in the history panel, and that you can view each change using Diff View.

Time travel

  • We can checkout previous commits to work with old versions of our files
  • In the example, suppose I made a commit with a mistake (my code stopped working or whatever)
  • In the history panel, right-click on a previous commit and select Checkout this commit

Checking out an old commit to travel through time

Trying (and failing) to change the past. My current HEAD commit will disappear as soon as I check out main.
  • We can make changes to the file in the same way as usual,
  • But when we go to stage, GitKraken warns us that we’re in an undetached head state
  • To see what this means, try making a change to the file, adding and committing it, then checking out the commit with the main or master tag

The garden of forking branches

  • To actually change the past, we’ll use a branch
    • Branches allow git to track multiple distinct “timelines” for files
    • For example, most major software projects will have separate “dev” (development) and “release” branches
    • Individual branches will also be created for work on specific areas of the project
    • This allows each area of active work to be isolated from work happening in other areas

Merging fixing-mistake into main
  • After checking out the previous commit, click on Branch in the toolbar
    • Name your new branch fixing-mistake (no spaces!)
  • Start to work on fixing the mistake in the file, then add and commit as usual
  • Now checkout main. Notice:
    • Your commits on fixing-mistake don’t disappear
    • The state of your file changes to the main version
    • The History panel shows the split between the two branches
  • After we’ve finished fixing the mistake, we want to merge these changes back into main
    • Make sure you’re on main
    • Right-click on fixing-mistake and select “Merge fixing-mistake into main”

  • GitKraken will show the commit pane, with a warning about Merge Conflicts
    • This just means that the files you’re combining have conflicting histories, and git needs you to manually sort out what to keep and what to throw away
Important

git is now in a special conflict-resolution state.

Until you resolve the conflicts and finish the merge, a lot of standard git functionality either won’t work at all or will cause weird problems.

If git starts giving you a bunch of weird errors, check to see if you’re in the middle of a merge and need to resolve conflicts.


GitKraken’s merge conflict resolution tool
  • GitKraken will pull up a conflict resolution tool
  • Choose which versions of conflicted lines to keep, and/or make edits directly in the bottom pane
  • The output file will be automatically staged
  • Hit “Commit and Merge” to complete the merge
  • Main now points to the merge commit

Working with GitHub remotes

The remote origin lives on GitHub. Source: https://happygitwithr.com/common-remote-setups.html#ours-you

A remote is a copy of a repository that lives on a server somewhere else

Working with your own repos

  • Look for the Remote section of left panel. Hover over the 0/0 and click to start creating a new remote
  • Select GitHub, your account, and then click “Create remote and push local refs”
  • GitKraken will take a moment to push (upload) the repo, then display a notification
  • Click to view the repo on GitHub
  • The URL should be https://github.com/[username]/learning-git

  • GitHub also provides a simple text editor
  • Make a change or two, then look for the green “Commit changes…” button
  • GitKraken may check in GitHub, and pick up the new commits on the remote
    • If not: Confirm the changes show up on GitHub. Then, in GitKraken, bring up the command palette (control/command + P), type “fetch,” and select “Fetch All.”
  • But the remote commits aren’t automatically loaded locally
  • Click “Pull” to download them to your local copy

Lab: Working with someone else’s repos

  • GitHub lets you download someone else’s repo (clone), and modify it locally, but not upload directly.
  • You can suggest a change to someone else’s code by submitting a pull request, which first requires forking the repository.

Forking copies a repository to your GitHub account. Then you clone the copy to your local machine. You can push to your remote copy as usual. You can suggest changes to the original using a pull request. Source: https://happygitwithr.com/fork-and-clone.html

The fork button is near the upper-right corner of a GitHub repository page. I wasn’t able to find a keyboard shortcut for this. :-(
  • Start with the repo for this week’s lab: https://github.com/data-science-methods/lab-1-git
  • Fork: Look for the fork button in the upper-right
  • After forking, notice you’re now in your copy of the lab: https://github.com/[username]/lab-1-git.

Cloning the lab repo. The screenshot shows two copies of the lab: the original course version, and the fork to my personal account.
  • Clone: After creating the fork, you need to download a copy to your machine.
  • In GitKraken, open a New Tab, then click “Clone a repo”
    • Select GitHub.com
    • Recommended: Change “Where to clone to” to a specific folder for all the labs for this class
    • Type “lab-1-git” into the text box. (Or just use the dropdown.)
    • If GitKraken tells you that it can’t clone a private repo, this probably means you haven’t finished getting access to the GitHub Student Developer Pack. Back out of the cloning process, finish with GitHub, and then try cloning again.
  • After opening the repo, click the “View All Files” checkbox or open the folder in your OS file browser
  • Open lab.html in your web browser and lab.Rmd in RStudio