Introduction

Land acknowledgments

Campus land acknowledgment

We pause to acknowledge all local indigenous peoples, including the Yokuts and Miwuk, who inhabited this land. We embrace their continued connection to this region and thank them for allowing us to live, work, learn, and collaborate on their traditional homeland. Let us now take a moment of silence to pay respect to their elders and to all Yokuts and Miwuk people, past and present.

Instructor’s land acknowledgment

UC Merced and the City of Merced are on the traditional territory of the Yokuts people. This land was stolen by Spanish, Mexican, and American settlers through acts of slavery and genocide. In addition, UC Merced is strongly associated with Ahwahne, known as Yosemite Valley. This valley was the traditional home of the Ahwahnechee people, who were the victims of some especially horrific, state-sponsored genocidal acts. For more on the history of Ahwahne, see https://tinyurl.com/y879jw8s. For more information on land acknowledgments, see https://native-land.ca.

About the instructor

Dan Hicks is a philosopher turned data scientist turned philosopher.

I use they/them pronouns and identify as nonbinary. I grew up in Placerville, about two hours north of Merced in the Sierra Foothills. One branch of my family came to California during the Gold Rush, so I identify heavily as a Californian and have some complicated feelings about the genocide. I finished my PhD in philosophy of science at Notre Dame in 2012. After that I worked in a series of research positions in academia and the federal government. From 2015 to 2019, I used data science methods at least half-time. I joined the faculty at UC Merced in Fall 2019.

What this course isn’t, and is

Is Not:

  • a statistics course (in the way you think)
  • a general introduction to software engineering
  • a basic introduction to R
  • an introduction to machine learning or AI

Is:

  • an introduction to data science
  • about exploratory data analysis, data management, reproducibility — and a little philosophy of science
  • habituation to some good software engineering practices that are especially valuable for data science work

Learning outcomes

By the end of the course, students will be able to:

  1. Apply concepts from software engineering and philosophy of science to methodological decisions in data science (CIS PLO 2 and 4; PSY PLO 2 and 6), such as

    • debugging techniques and functional programming
    • exploratory data analysis and the data-phenomenon-theory distinction
    • reproducibility vs. replicability
    • data justice

  2. Use exploratory data analysis techniques and tools to identify potential data errors and potential phenomena for further analysis (CIS PLO 2 and 4; PSY PLO 2 and 6)

  3. Manage data, analysis, and outputs for reproducibility using practices such as data management, clear directory structure, self-documenting code, version control, and code review (CIS PLO 2, 3, 4; PSY PLO 2, 4, 6)

  4. Identify ethical responsibilities to data providers and subjects and take these responsibilities into account during data collection, analysis, and communication (CIS PLO 2 and 3; PSY PLO 4 and 6)

  5. Identify and justify key methodological decisions, analysis practices, exploratory data analysis findings, and ethical responsibilities to data subjects in written and oral media (CIS PLO 3; PSY PLO 4)

Prerequisites

This course assumes basic competence with introductory R.

“Introductory R”

Lessons 1-6 of the Carpentries “R for Social Scientists” curriculum

  • Installing R and packages
  • Working in the RStudio IDE
  • Common data types
  • Reading and writing CSV files
  • Tidyverse R: mutate(), filter(), select(); plotting with ggplot2

“Basic competence”

Given time and a reference (cheatsheet, Stack Exchange, partner, mentor) you can figure out how to solve a problem
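
As a rough self-check on the skills listed above, here is a minimal sketch that touches each of them. It is not an official placement test. The built-in mtcars data set, the temporary file path, and the weight_lb column are illustrative choices rather than course materials, and the sketch assumes the dplyr, readr, and ggplot2 packages are installed.

```r
# Illustrative self-check only: the data set, file path, and derived
# column are assumptions made for this sketch, not part of the course.
library(dplyr)
library(readr)
library(ggplot2)

# Write a built-in data set to a temporary CSV file, then read it back
csv_path <- tempfile(fileext = ".csv")
write_csv(mtcars, csv_path)
cars <- read_csv(csv_path, show_col_types = FALSE)

# mutate() adds a column, filter() keeps rows, select() keeps columns
heavy_cars <- cars %>%
  mutate(weight_lb = wt * 1000) %>%  # wt is recorded in thousands of pounds
  filter(weight_lb > 3000) %>%
  select(mpg, cyl, weight_lb)

# Scatterplot of fuel economy against weight with ggplot2
weight_plot <- ggplot(heavy_cars, aes(x = weight_lb, y = mpg)) +
  geom_point() +
  labs(x = "Weight (lb)", y = "Miles per gallon")
print(weight_plot)
```

If you can read this code and, with a reference at hand, adapt it to a different data set, you likely have the background the course assumes.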

Course materials

Requirements

Class time
Mix of lecture/live coding, seminar-style discussion, and work time
Labs
Six labs, done in pairs or small groups and submitted via GitHub
Semester-long project
Practicing ideas from the course on a data set of your choice

References