Course Syllabus
This course is open to graduate students, as well as advanced undergraduates with sufficient mathematical maturity. Undergraduate registration requires approval---please submit your consent request online AND contact nsrebro+31120@ttic.edu via email to secure approval. If you are a Northwestern University or UIC graduate student, you may register for this course via IDEAL after obtaining instructor approval from nsrebro+31120@ttic.edu.
TTIC Students: in order to register for the class you must register through the UChicago system. It is not sufficient to register on Populi.
This course is not intended as an introductory machine learning course, and is recommended mostly for students familiar with machine learning problems and methods. Although all formulations and methods are carefully introduced and defined, very few examples of empirical behavior are explored, as students are assumed to have seen these previously. We will also go over basic statistical learning theory (union bounds and VC theory) very quickly in the first few lectures, as most students are expected to have seen this before. That said, mathematically inclined students who prefer learning about concepts through definitions and theorems rather than examples and experimentation, are interested in exploring the mathematical aspects of machine learning, and can understand mathematical material quickly may choose to take this course even without having previously completed an introductory machine learning course.
Lectures: Tuesday and Thursday, 11:00am-12:20pm, TTIC 530
Tutorials: Thursday 1:30-2:30pm, TTIC 529
Instructor: Nati Srebro (nsrebro+31120@ttic.edu). Office hours: Tuesday 1:30-2pm, Thursday 12:30-1pm, TTIC 503.
TA: Gene Li (gene@ttic.edu). Office hours: Wednesday 2-3pm, TTIC 413 (the collaboration space in the SE corner of the building)
Contact: Please use ttic31120-staff@ttic.edu for queries about registration, accommodations, special arrangements, extensions, grading issues, and other individual circumstances. If you have any questions about the class material or homework, please use the Canvas forums. We will answer questions about the class material, lectures, and homework only in the forums, not over email, so that other students can also benefit from the responses. If you need assistance with the homework, please come to the TA office hours.
Course Description
The purpose of this course is to gain a deeper understanding of machine learning by formalizing learning mathematically, studying both statistical and computational aspects of learning, and understanding how these two aspects are inseparable. The course is intended both for students who use machine learning methods and would like to understand them better so as to use them more effectively, and for students interested in the mathematical aspects of learning or who intend to rigorously study or develop learning algorithms.
We will discuss classic results and recent advances in statistical learning theory, touch on computational learning theory, and explore the relationship with stochastic optimization and online regret analysis. Our emphasis will be on concept development and on obtaining a rigorous quantitative understanding of machine learning. We will also study techniques for analyzing and proving performance guarantees for learning methods.
Past incarnations: 2020, 2018, 2016.
Prerequisites
- Mathematical maturity, as obtained, e.g., in a rigorous analysis course.
- Discrete Math (specifically combinatorics and asymptotic notation)
- Probability Theory
- Introduction to Machine Learning
- Algorithms; Basic Complexity Theory (NP-Hardness)
Familiarity with Convex Optimization and Computational Complexity, as well as a background in Statistics, can be helpful but is not required.
Specific Topics
- The Statistical Model (Learning Based on IID Samples):
- The PAC (Probably Approximately Correct) and Agnostic PAC models.
- Generalized Learning and relationship to Stochastic Optimization
- VC Theory: Cardinality, VC dimension, and the growth function
- Description Length Bounds, Structural Risk Minimization
- Uniform Learning and No-Free-Lunch Theorems
- PAC-Bayes
- Compression Bounds
- Scale-sensitive classes: Fat Shattering Dimension, Covering Numbers, and Rademacher Averages
- Tight Characterization of Learning in terms of the VC and Fat Shattering Dimensions
- Relative bounds, "Optimistic" rates, and Local Rademacher Analysis
- Boosting, including three different views of AdaBoost (traditional PAC view; learning a linear predictor with coordinate descent; boosting the margin and l1 as its dual)
- Online Learning, Online Optimization and Online Regret:
- The Perceptron Rule and Online Gradient Descent
- Experts and the Winnow Rule
- Strong convexity, stability, and online learning in Banach spaces as a unifying theory.
- Bregman Divergence and Online Mirror Descent
- Online to Batch Conversion
- Computational Lower Bounds:
- Computational Hardness of Proper Learning
- Cryptographic Hardness of Learning
- Machine Learning and Optimization:
- Stochastic Gradient Descent as a Learning Algorithm
- Optimization and Generalization Guarantees via Stability
- Early Stopping and Algorithmic (Implicit) Bias
- Deep Learning through the lens of Learning Theory
Course notes from the 2020 version of the course can be found here.
Requirements and Grading
- Problem sets. About 5 problem sets, some of which introduce additional material not covered in class. Although not required for the grade (see details below), they are an integral part of the class, and much of the learning is done through the problem sets.
- Final Exam.
- Scribing, editing a scribed lecture, or writing a short note or survey on a topic extending the lecture material.
- Peer-grading: Each student is required to participate in grading one problem set during the quarter. This should take roughly 2-3 hours in total.
- Optional challenge problems on some problem sets.
Additional information and links: Grading, Requirements, and Problem Sets
Recommended Texts
We will not be following a specific text, but some of the material is covered in:
- Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
- Shai Shalev-Shwartz. Online Learning and Online Convex Optimization. Foundations and Trends® in Machine Learning, Vol. 4, No. 2, pp. 107-194, 2012.
- O. Bousquet, S. Boucheron, and G. Lugosi. Introduction to Statistical Learning Theory. Advanced Lectures on Machine Learning, pp. 169-207. Springer Berlin Heidelberg, 2004.
- Michael J. Kearns and Umesh Virkumar Vazirani. An Introduction to Computational Learning Theory. MIT Press, 1994.
- Dmitry Panchenko's course on Statistical Learning Theory. Lecture notes.