Readable code is reliable code

Reading

  • Bryan (2018)
  • “What to Look for in a Code Review” (n.d.)
  • Lamb (2018)
  • Wickham (n.d.)

Replicability, reproducibility, reliability

replicability
Qualitatively similar results when we repeat the experiment
(gather new data; may or may not use the same code)
reproducibility
Quantitatively identical outputs when we run the same code on the same data
(same data, same code)
reliability
The code performs the analysis that we think it’s performing
(compare construct validity)

Checking reliability

reliability
The code performs the analysis that we think it’s performing

Can reliability be checked computationally?

Writing code is writing

  • Multiple audiences
    • Collaborators
    • (Some) reviewers and readers of the paper
    • Peers who want to analyze and extend your methods
    • Yourself in six months

Any code of your own that you haven’t looked at for six or more months might as well have been written by someone else. (“Eagleson’s Law”)

  • Your code is readable to the extent that people can easily assess reliability:
    • predict,
    • diagnose, and
    • extend your code

Code style

iNéwritteNélanguagEáconventionSéfoRîpunctuatioNøcapitalizatioNîaiDéco
mprehensioNébYéindicatinGéstructurE


  • iNéwritteNélanguagEáconventionSéfoRîpunctuatioNøcapitalizatioNîaiDéco
    mprehensioNébYéindicatinGéstructurE
    • this is what it’s like to read poorly-styled code
    • conventions only work if they’re shared conventions
  • Style guides provide shared conventions for readable code
    • In-line spacing makes it easier to pick out distinguish functions, operators, and variables in a line
    • Returns distinguish arguments in a function call
    • Indentation corresponds to structure of complex expressions
    • Common conventions for naming, assignment reduce cognitive load
  • Tidyverse style guide: https://style.tidyverse.org/ (Wickham n.d.)

Highlights from the Tidyverse style guide

  • Place all package() calls at the top of the script
  • Limit your code to 80 characters per line
  • Use at least 4 spaces for indenting multiline expression
    • Control-I in RStudio will do automagic indenting
  • In multiline function calls, 1 argument = 1 line
long_function_name <- function(a = "a long argument",
                               b = "another argument",
                               c = "another long argument") {
    # As usual code is indented by four spaces.
}

Spaces: Let your code b r e a t h e

  • Always put spaces after commas, and never before (like English)
  • No space between a function name and the parentheses (like math)
  • Spaces on both sides of infix operators (==, +, -, <-, =)
  • Pipes %>% |> should have a space before and be at the end of the line

Code blocks

When you put a block of code in curly braces {}:

  • { should be the last character on a line
  • } should be the first character on the line
if (y == 0) {
  if (x > 0) {
    log(x)
  } else {
    message("x is negative or zero")
  }
} else {
  y^x
}

Boolean variables vs. control flow

  • Functions that return vectors:
    • &, |, ==, ifelse(), dplyr::if_else()
  • Functions that return a single value:
    • &&, ||, identical
  • if (x) a else b only looks at the first (hopefully single) value of x

Code review is peer feedback on your writing

  • May be done formally as part of journal review
    • But this is super rare
  • Informally, with collaborators
  • Informally, as a stage of writing

Goals of code review

  • Make the code more reliable
  • Integrate your code into the collaborative code base
  • Learning and growth for both author and reviewer

Examples of things that will promote/frustrate these goals?

(Lamb 2018)

What to look for in code review

  • Good things: What does the code do well?
  • Design: The code communicates the author’s intentions.
  • Functionality: The code does what the author intended.
  • Complexity: The code isn’t more complex than it needs to be.
  • Over-engineering: The developer isn’t implementing things they might need in the future but don’t know they need now.
  • Tests: Code has appropriate, well-designed unit tests [or uses assertions].
  • Naming: The developer used clear names for everything.
  • Comments: Comments are clear and useful, and mostly explain why instead of what.
  • Style: The code follows the group’s / a standard style guide.
  • Documentation: Does the code explain what functions or long pipes do and how they’re used?

(adapted from “What to Look for in a Code Review” n.d.)

  • Setup: The documentation/README explains where to get the data and any special software installation.
  • Dependencies: All of the required packages are listed at the top of the script.
  • Reproducibility: The code runs, and produces the output identified elsewhere (comments, paper text)

Labs 5-6

  • Lab 5: Code review of Zhou et al. (2021)
    • Work in pairs
    • Write a rubric first
    • Don’t worry about reproducibility
    • Start working on this Thursday, even if you’re not finished with lab 4
  • Lab 6: Reproducibility check of Zhou et al. (2021)
    • Specifically, the values reported in Table 1

Project: Code Review

https://data-science-methods.github.io/project.html#code-review-and-reproducibility-check

  • I’ll assign you to review someone else’s EDA code
  • Fork and clone the code
  • Submit your feedback using a PR

References