Data Science Team (CSCI 4802/5802) Spring 2026

New: the course is now variable (1 or 3) credit! See below.

When: Tuesdays and Thursdays 2-3:15pm
(1-credit students need only attend Tuesdays, but are welcome to join Thursdays as well)
Where: ECES 112
Professor: Rafael Frongillo
Course managers: Livia Betti, Max Conway

Team website: codata.colorado.edu
Communication: Slack
Team signups: Team sheet
Assignments/grades: Canvas

Course Description

In this unique choose-your-own-adventure course, students gain hands-on experience applying data science techniques and machine learning algorithms to real-world problems. (Theoretical projects are also in scope.) Students work in small teams on projects of their choosing, which could include outreach efforts or entering competitions. Project teams collaborate with the course staff to choose their learning outcomes and goals, submit periodic progress reports, and give several short presentations. The course also features a speaker series to meet researchers and data scientists from academia and industry, and a student-led tutorial series.

Prerequisites: linear algebra or permission of instructor.

Motivation

Data science is one of the most powerful modern tools to understand and impact the world. It is also one of the fastest-growing sectors of our economy, and there is a great demand for data scientists with practical experience applying statistical techniques and machine learning algorithms to real data. Several courses in the CS curriculum develop core techniques, in the areas of machine learning, statistical modeling, network science, numerical analysis, and data science more broadly. While these courses often include a hands-on project, no course specifically focuses on putting this myriad of tools to work on real data and developing intuition for when to apply certain techniques over others. Moreover, many topics that frequently arise in practice, such as data imputation and data imbalance, or even the suite of techniques used in a particular domain, are not covered in standard courses. This course fills these gaps by allowing students to dive deeply into any topic and develop their own learning objectives and outcomes. Projects result in valuable hands-on experience or a more theoretical learning journey that may not align with any of our existing courses.

Topics

Up to you! Students will collaborate with course staff to choose their learning outcomes and goals. Fun recent examples include automatic music generation, analyzing usage data from Boulder BCycle, predicting movie box office earnings, and predicting the performance of materials in engineering tasks. On Thursdays, we will also have short tutorials on topics relevant to the current slate of projects or data science more broadly. A non-exhaustive list of topics is as follows.

Basic Concepts: classification and regression, prediction vs causation, regularization and overfitting.
Algorithms: linear regression, logistic regression, support vector machines, boosting, decision trees and forests, neural networks, gradient and stochastic gradient descent.
Practical Techniques: ensemble methods and aggregation, tradeoffs in regularization, parameter and hyperparameter tuning, data imputation techniques, and cross-validation.
Software and Tools: tutorials on several modern data science software packages.
Context and Industry Practice: via weekly presentations from practicing data scientists, students will learn about techniques actually used in industry and academia, and which algorithms work well for which problems.

Meet a Data Scientist (MADS) Series

At every Tuesday meeting (Spring 2026: 2pm), we begin by hearing from a practicing data scientist or data science researcher. Talks are usually about 20 minutes, and give students a glimpse of the breadth of data science in practice, beyond classroom projects and competitions, and the diversity in the types of data, techniques, and applications. Researchers often share techniques they are developing, or how they use existing techniques to advance scientific knowledge; practicing data scientists often share their experiences about what works and when it works. Talks are typically aimed toward beginner and intermediate students, with some technical details for the more advanced students.

Assessment

The general requirement for the course is to do something cool and tell us about it. Typically you will have the option of joining some larger team-wide effort, like a competition or an outreach project, or forming your own small group of 1-3 students to work on a project of your choosing, subject to approval. The specifics of team-wide efforts like competitions or outreach will change from semester to semester. To document your progress, you will submit three written reports detailing and reflecting on what you have done and/or plan to do. Since every student starts from a different place, we will grade you based on how well you document your learning and growth, rather than against a specific target.

The class will be a mix of four types of enrollees, based on how many credits (1 or 3) and which section (4802 or 5802). Expectations differ across these groups. Many of these expectations are detailed in the specific assignments; at a high level, of course, more is expected of 3-credit than 1-credit students, and of 5802 than 4802 students. Final project write-ups must include both a theory and an ethics component, to underscore why (and when) the chosen techniques work, and what the potential harms are and how to mitigate them. These components will be more substantial for 3-credit students.

Students taking the course for 1 credit are expected to attend Tuesday meetings. Students taking the course for 2 credits [note: not currently an option, but if interested please contact the instructor] are expected to attend Tuesday meetings and the tutorials (first 25 minutes) on Thursday meetings. Finally, 3-credit students are expected to attend both meetings in full (the remainder of the meeting time is for individual project development).

For more information about the team, please visit the team website.