Introduction
Land acknowledgments
Campus land acknowledgment
We pause to acknowledge all local indigenous peoples, including the Yokuts and Miwuk, who inhabited this land. We embrace their continued connection to this region and thank them for allowing us to live, work, learn, and collaborate on their traditional homeland. Let us now take a moment of silence to pay respect to their elders and to all Yokuts and Miwuk people, past and present.
Instructor’s land acknowledgment
UC Merced and the City of Merced are on the traditional territory of the Yokut people. This land was stolen by Spanish, Mexican, and American settlers through acts of slavery and genocide. In addition, UC Merced is strongly associated with Ahwahne, known as Yosemite Valley. This valley was the traditional home of the Ahwahnechee people, who were the victims of some especially horrific, state-sponsored genocidal acts. For more on the history of Ahwahne, see https://tinyurl.com/y879jw8s. For more information on land acknowledgments, see https://native-land.ca.
About the instructor
Dan Hicks is a philosopher turned data scientist turned philosopher.
I use they/them pronouns and identify as nonbinary. I grew up in Placerville, about two hours north of Merced in the Sierra Foothills. One branch of my family came to California during the Gold Rush, so I identify heavily as a Californian and have some complicated feelings about the genocide. I finished my PhD in philosophy of science at Notre Dame in 2012. After that I worked in a series of research positions in academia and the federal government. During 2015-2019 I was using data science methods at least half-time. I joined the faculty at UC Merced in Fall 2019.
- Email: dhicks4@ucmerced.edu
- Student hours: by appointment, https://doodle.com/mm/danhicks/office-hours
- Website: https://dhicks.github.io/
What this course isn’t, and is
Is Not:
- a statistics course (in the way you think)
- a general introduction to software engineering
- a basic introduction to R
- an introduction to machine learning or AI
Is:
- an introduction to data science
- about exploratory data analysis, data management, reproducibility — and a little philosophy of science
- habituation to some good software engineering practices that are especially valuable for data science work
Learning outcomes
By the end of the course, students will be able to
Apply concepts from software engineering and philosophy of science to methodological decisions in data science (CIS PLO 2 and 4; PSY PLO 2 and 6), such as
- debugging techniques and functional programming
- exploratory data analysis and the data-phenomenon-theory distinction
- reproducibility vs. replicability
- data justice
Use exploratory data analysis techniques and tools to identify potential data errors and potential phenomena for further analysis (CIS PLO 2 and 4; PSY PLO 2 and 6)
Manage data, analysis, and outputs for reproducibility using practices such as data management, clear directory structure, self-documenting code, version control, and code review (CIS PLO 2, 3, 4; PSY PLO 2, 4, 6)
Identify ethical responsibilities to data providers and subjects and take these responsibilities into account during data collection, analysis, and communication (CIS PLO 2 and 3; PSY PLO 4 and 6)
Identify and justify key methodological decisions, analysis practices, exploratory data analysis findings, and ethical responsibilities to data subjects in written and oral media (CIS PLO 3; PSY PLO 4)
Prerequisites
This course assumes basic competence with introductory R.
- “Introductory R”: Lessons 1-6 of the Carpentries “R for Social Scientists” curriculum
  - Installing R and packages
  - Working in the RStudio IDE
  - Common data types
  - Reading and writing CSV files
  - Tidyverse R: mutate(), filter(), select(); plotting with ggplot2
- “Basic competence”
  - Given time and a reference (cheatsheet, Stack Exchange, partner, mentor), you can figure out how to solve a problem
Course materials
- Course website: https://data-science-methods.github.io/
- All readings are linked on the schedule
- The Lecture notes link takes you to a formatted version of my slides and notes for class
- Cheatsheets might be useful, depending on how much R background you have
Requirements
- Class time
- Mix of lecture/live coding, seminar-style discussion, and work time
- Labs
- Six labs, done in pairs or small groups and submitted via GitHub
- Semester-long project
- Practicing ideas from the course on a data set of your choice