<- function(a = "a long argument",
long_function_name b = "another argument",
c = "another long argument") {
# As usual code is indented by four spaces.
}
Readable code is reliable code
Reading
Replicability, reproducibility, reliability
- replicability
-
Qualitatively similar results when we repeat the experiment
(gather new data; may or may not use the same code) - reproducibility
-
Quantitatively identical outputs when we run the same code on the same data
(same data, same code) - reliability
-
The code performs the analysis that we think it’s performing
(compare construct validity)
Checking reliability
- reliability
- The code performs the analysis that we think it’s performing
Can reliability be checked computationally?
Writing code is writing
- Multiple audiences
- Collaborators
- (Some) reviewers and readers of the paper
- Peers who want to analyze and extend your methods
- Yourself in six months
Any code of your own that you haven’t looked at for six or more months might as well have been written by someone else. (“Eagleson’s Law”)
- Your code is readable to the extent that people can easily assess reliability:
- predict,
- diagnose, and
- extend your code
Code style
iNéwritteNélanguagEáconventionSéfoRîpunctuatioNøcapitalizatioNîaiDéco
mprehensioNébYéindicatinGéstructurE
- iNéwritteNélanguagEáconventionSéfoRîpunctuatioNøcapitalizatioNîaiDéco
mprehensioNébYéindicatinGéstructurE- this is what it’s like to read poorly-styled code
- conventions only work if they’re shared conventions
- Style guides provide shared conventions for readable code
- In-line spacing makes it easier to pick out distinguish functions, operators, and variables in a line
- Returns distinguish arguments in a function call
- Indentation corresponds to structure of complex expressions
- Common conventions for naming, assignment reduce cognitive load
- Tidyverse style guide: https://style.tidyverse.org/ (Wickham n.d.)
Highlights from the Tidyverse style guide
- Place all
package()
calls at the top of the script - Limit your code to 80 characters per line
- Use at least 4 spaces for indenting multiline expression
- Control-I in RStudio will do automagic indenting
- In multiline function calls, 1 argument = 1 line
Spaces: Let your code b r e a t h e
- Always put spaces after commas, and never before (like English)
- No space between a function name and the parentheses (like math)
- Spaces on both sides of infix operators (
==
,+
,-
,<-
,=
) - Pipes
%>%
|>
should have a space before and be at the end of the line
Code blocks
When you put a block of code in curly braces {}
:
{
should be the last character on a line}
should be the first character on the line
if (y == 0) {
if (x > 0) {
log(x)
else {
} message("x is negative or zero")
}else {
} ^x
y }
Boolean variables vs. control flow
- Functions that return vectors:
&
,|
,==
,ifelse()
,dplyr::if_else()
- Functions that return a single value:
&&
,||
,identical
if (x) a else b
only looks at the first (hopefully single) value ofx
Code review is peer feedback on your writing
- May be done formally as part of journal review
- But this is super rare
- Informally, with collaborators
- Informally, as a stage of writing
Goals of code review
- Make the code more reliable
- Integrate your code into the collaborative code base
- Learning and growth for both author and reviewer
Examples of things that will promote/frustrate these goals?
What to look for in code review
- Good things: What does the code do well?
- Design: The code communicates the author’s intentions.
- Functionality: The code does what the author intended.
- Complexity: The code isn’t more complex than it needs to be.
- Over-engineering: The developer isn’t implementing things they might need in the future but don’t know they need now.
- Tests: Code has appropriate, well-designed unit tests [or uses assertions].
- Naming: The developer used clear names for everything.
- Comments: Comments are clear and useful, and mostly explain why instead of what.
- Style: The code follows the group’s / a standard style guide.
- Documentation: Does the code explain what functions or long pipes do and how they’re used?
(adapted from “What to Look for in a Code Review” n.d.)
- Setup: The documentation/README explains where to get the data and any special software installation.
- Dependencies: All of the required packages are listed at the top of the script.
- Reproducibility: The code runs, and produces the output identified elsewhere (comments, paper text)
Labs 5-6
Project: Code Review
https://data-science-methods.github.io/project.html#code-review-and-reproducibility-check
- I’ll assign you to review someone else’s EDA code
- Fork and clone the code
- Submit your feedback using a PR