COMP 5960/6960 - 090 Programming for BioMedical Data Science (R)
University of Utah
Semester: Spring 2026
Time: Self-Paced
Location: Online
Instructor: Tingying He (tingyinghe@sci.utah.edu)
Faculty Coordinator: Jeff Phillips (jeff.m.phillips@utah.edu)
General Information
This course provides an introduction to programming in R, with topics and pacing designed for biomedical students interested in data science. Prior programming experience is not required. Students will learn how to write code for handling data, focusing on data frame representations. Using these common representations, students will learn to prepare data for analysis starting from various formats, visualize its contents, and perform basic analysis to evaluate the data’s veracity. The course is structured as a series of stackable short courses; students must complete four short courses in the semester to fulfill the requirements for this credit-earning course.
Short Courses
This is a composite course consisting of four self-paced online short courses, listed below.
You should complete all four of these courses sequentially.
If you finish a course early, you are welcome to start the next course (you do not need to wait until after the previous course’s due date).
1. Introduction to R for Data Analysis
- Length: 5.5 hours of pre-recorded lectures, 3 projects
Description: This course introduces the R programming language and is designed for beginners who are new to R and coding. Specific topics covered include using RStudio; writing documents with Quarto; basic coding principles; defining variables, vectors, and data frames; pipes; data manipulation with dplyr; and data visualization with ggplot2.
Course Learning Objectives:
- Perform operations with character, numeric, and logical (Boolean) type objects in R
- Use the dplyr library functions select, mutate, filter, summarize, and group_by to manipulate and summarize tabular data
- Use the ggplot2 library to create customizable data visualizations, including histograms, bar charts, and scatterplots
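As a rough illustration of the objectives above, the following sketch chains the five named dplyr functions and then draws a scatterplot with ggplot2. The `patients` data frame and its column names are invented for this example and are not course material; the native pipe `|>` assumes R 4.1 or later.

```r
library(dplyr)
library(ggplot2)

# Hypothetical patient data, invented for illustration only
patients <- data.frame(
  id = 1:6,
  group = c("control", "treated", "control", "treated", "control", "treated"),
  weight_kg = c(70, 82, 65, 90, 77, 68),
  height_m = c(1.75, 1.80, 1.60, 1.85, 1.70, 1.65)
)

# select -> mutate -> filter -> group_by -> summarize
bmi_summary <- patients |>
  select(group, weight_kg, height_m) |>       # keep only the columns we need
  mutate(bmi = weight_kg / height_m^2) |>     # add a computed column
  filter(bmi < 40) |>                         # keep rows meeting a condition
  group_by(group) |>                          # split by study group
  summarize(mean_bmi = mean(bmi), n = n())    # one summary row per group

# A scatterplot of weight against height, colored by group
p <- ggplot(patients, aes(x = height_m, y = weight_kg, color = group)) +
  geom_point()
```

Printing `bmi_summary` shows one row per group, and printing `p` renders the plot.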
2. Advanced R for Data Analysis
- Length: 4 hours of pre-recorded lectures, 4 projects
Description: You’ve learned the basics of R and the tidyverse, and now you’re ready to conduct more sophisticated data manipulation and analysis. This course is designed to elevate your R programming expertise, building on the foundations laid in the Introduction to R for Data Analysis course. You’ll learn advanced techniques and powerful tools that will transform your data workflows and analytical capabilities, such as creating custom R functions to automate your data science pipeline, reducing code redundancy with map functions, refactoring column variables, joining multiple data frames, and reshaping data. This course utilizes the R programming language, and it is assumed that learners have taken our “Introduction to R for Data Analysis” course or have equivalent experience.
Course Learning Objectives:
- Write your own custom R functions
- Reduce redundancy in your code using iteration techniques with the purrr package
- Refactor column variables
- Reshape your data frames
- Join multiple data frames together
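Several of the objectives above might look like the following in practice. This is only a sketch under stated assumptions: the function `f_to_c` and the tables `visits_wide` and `demographics` are hypothetical names made up for this example, and the reshaping step uses `pivot_longer()` from the tidyr package, which the syllabus does not name explicitly.

```r
library(dplyr)
library(purrr)
library(tidyr)

# A custom function (hypothetical): convert Fahrenheit to Celsius
f_to_c <- function(temp_f) {
  (temp_f - 32) * 5 / 9
}

# Iteration with purrr: map_dbl() applies the function to each element
# and returns a numeric vector
temps_c <- map_dbl(c(98.6, 100.4, 97.7), f_to_c)

# Hypothetical wide table: one column per visit
visits_wide <- tibble(
  id = c(1, 2),
  visit_1 = c(98.6, 100.4),
  visit_2 = c(97.7, 99.5)
)

# Reshape from wide to long: one row per patient per visit
visits_long <- visits_wide |>
  pivot_longer(
    cols = starts_with("visit_"),
    names_to = "visit",
    values_to = "temp_f"
  )

# Join a second (hypothetical) table by the shared id column
demographics <- tibble(id = c(1, 2), age = c(34, 58))
visits_joined <- left_join(visits_long, demographics, by = "id")
```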
3. Data Cleaning with R
- Length: 6 hours of pre-recorded lectures, 6 projects
Description: Real-world data can never perfectly capture the real world, and it rarely arrives in a ready-for-analysis format. Before your data is ready for analysis, it is critical that it is formatted appropriately and that you have ensured it represents, to the greatest extent possible, the reality it was originally designed to capture. The process of molding a dataset into a format that satisfies these criteria is known as “data cleaning”. While it may seem like a boring process, data cleaning is arguably the most important stage of the entire data science life cycle, since it ensures that you have a clear understanding of how real-world information is represented in your data as well as its limitations. Since every dataset is “messy” in its own way, the process of data cleaning will be unique to every dataset. This course introduces a series of steps that can be used to help you understand your data and create a custom data cleaning procedure for any dataset, with a focus on biomedical data, such as electronic health records and health survey data. Specific topics covered include identifying and addressing missing values; handling data quality issues, such as invalid and inconsistent values; reshaping data into a “tidy” format; and creating a custom function in R that provides a reproducible and modifiable data cleaning pipeline for your project. This course utilizes the R programming language, and it is assumed that learners have taken our “Introduction to R for Data Analysis” course, or equivalent. It is recommended that students have also taken our “Advanced R for Data Analysis” course, but this is not a requirement.
Course Learning Objectives:
- Understanding data collection procedures and data dictionaries
- Loading and pre-formatting data in R
- Identifying and handling missing values
- Identifying and handling invalid and inconsistent values
- Converting data to a tidy data format
- Creating and implementing a reproducible data cleaning pipeline
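To make the idea of a reproducible cleaning pipeline concrete, here is a minimal sketch of a cleaning function. Everything in it is an assumption made for illustration: the `raw` survey data, the use of -99 as a missing-value code, the 120-year upper bound on plausible ages, and the function name `clean_survey` are not taken from the course.

```r
library(dplyr)

# Hypothetical raw survey data; -99 is ASSUMED to be the code for "missing"
raw <- data.frame(
  id = 1:5,
  age = c(34, -99, 58, 210, 45),
  sex = c("F", "f", "Male", "M", "female")
)

# A reusable cleaning function: rerunning it on the raw data always
# reproduces the same cleaned data, and its arguments make the
# cleaning decisions explicit and easy to change.
clean_survey <- function(data, missing_code = -99, max_age = 120) {
  data |>
    mutate(
      # Missing values: turn the placeholder code into NA
      age = na_if(age, missing_code),
      # Invalid values: treat implausible ages as missing
      age = if_else(age > max_age, NA_real_, age),
      # Inconsistent values: collapse spelling variants into one label
      sex = case_when(
        tolower(sex) %in% c("f", "female") ~ "female",
        tolower(sex) %in% c("m", "male") ~ "male",
        TRUE ~ NA_character_
      )
    )
}

cleaned <- clean_survey(raw)
```

Keeping the thresholds as function arguments is one possible design choice; it lets the same pipeline be rerun with different judgment calls without editing the function body.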
4. Statistics for Data Analysis
- Length: 7.5 hours of pre-recorded lectures, 4 projects
Description: This course is designed to introduce you to the main statistical techniques you’ll need for analyzing real-world data using R. Statistics involves learning about a population by examining data from a sample drawn from that population. This course will focus on applying common statistical methods to biomedical data. For example, you might use the skills from this course to analyze patient information from hospital records to understand health trends in the broader population, or to evaluate the effects of a new drug on a small group of patients to assess its impact on all potential patients. It is recommended that students have also taken our “Advanced R for Data Analysis” and “Data Cleaning with R” courses, but these are not requirements.
Course Learning Objectives:
- Apply statistical thinking
- Perform statistical inference
- Calculate and interpret confidence intervals
- Conduct hypothesis testing
- Implement and interpret linear regression models
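The last three objectives above can be tried out with functions from base R. The sketch below uses the `sleep` and `ToothGrowth` datasets that ship with R; choosing these datasets, and these particular tests, is an assumption made for illustration rather than a description of the course projects.

```r
# Confidence interval: 95% interval for the mean extra hours of sleep
ci <- t.test(sleep$extra)$conf.int

# Hypothesis test: do the two drug groups differ in mean extra sleep?
# (Welch two-sample t-test, the default for t.test with a formula)
test <- t.test(extra ~ group, data = sleep)

# Linear regression: tooth length as a linear function of dose
fit <- lm(len ~ dose, data = ToothGrowth)

# Inspect the results
print(ci)
print(test$p.value)
print(summary(fit)$coefficients)
```

Interpreting these outputs (what the interval covers, what the p-value does and does not say, and what the slope means) is the part that requires statistical thinking.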
Grading
Grade: Each short course contains a number of projects, which appear at the end of each “module”.
Your grade for each short course will be based on the average of your individual project scores.
Your grade for the entire composite course will correspond to the average score you receive across the four short courses.
Late policy: If you do not complete a short course by its due date (see above), you will lose 2% of your grade for that short course per day late, unless instructor permission is granted.
Final exam: There is no final exam for this course.
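The grading arithmetic described in this section could be sketched in R as follows. All scores and late days here are made up for illustration. The sketch also assumes that "2% per day" means 2 percentage points subtracted from the short-course score for each day late, and that a score cannot drop below zero; the syllabus does not spell out either detail, so check with the instructor.

```r
# Hypothetical project scores (in percent) for the four short courses
project_scores <- list(
  intro = c(90, 85, 95),
  advanced = c(80, 88, 92, 84),
  cleaning = c(75, 90, 85, 80, 95, 85),
  statistics = c(88, 92, 79, 91)
)

# Hypothetical number of days each short course was completed late
days_late <- c(intro = 0, advanced = 3, cleaning = 0, statistics = 0)

# Short-course grade: average of that course's project scores
course_scores <- sapply(project_scores, mean)

# ASSUMPTION: 2 percentage points lost per day late, floored at zero
penalized <- pmax(course_scores - 2 * days_late, 0)

# Composite grade: average across the four short courses
final_grade <- mean(penalized)
print(final_grade)  # 85.625
```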