The schedule is tentative and subject to change (e.g., snow days).
The lecture slides and other materials will be posted on Piazza.
| Date | Topic | Notes | Recommended reading |
|---|---|---|---|
| Class 1 (Sep 2) | Course overview, logistics, intro to language models | | (Bengio+ 2003) paper, MLP implementation |
| Class 2 (Sep 4) | Supervised learning, architectures, transformers | | (Vaswani+ 2017) paper, GPT implementation |
| Class 3 (Sep 9) | Intro to GPUs, flash attention | | (Dao+ 2022) paper, GPUs |
| Class 4 (Sep 11) | Variants of attention for time/memory optimization | | SVD (ch3), DeepSeek-V2 paper |
| Class 5 (Sep 16) | Backpropagation algorithm | HW1 posted | Backprop notes (ch7.4), implementation |
| Class 6 (Sep 18) | Backprop module formulas | | |
| Class 7 (Sep 23) | Optimization algorithms: SGD, adaptive optimizers | | Adam |
| Class 8 (Sep 25) | Optimization algorithms: low-memory and other recent algorithms | | low memory, Muon |
| Class 9 (Sep 30) | Distributed training on GPUs: data parallelism, DeepSpeed/ZeRO, FSDP | | GPUs, DeepSpeed/ZeRO, FSDP |
| Class 10 (Oct 2) | Distributed training on GPUs: model and activation parallelism | HW2 posted | (Narayanan+ 2021), (Korthikanti+ 2022) |
| Class 11 (Oct 7) | Nearest-neighbor search, locality-sensitive hashing | | LSH |
| Class 12 (Oct 9) | Kernel density estimation | Send title of paper you intend to present | KDE |
| Oct 14 | No class, Monday schedule | | |
| Class 13 (Oct 16) | Graph-based nearest neighbor search on CPUs and GPUs | | HNSW, CAGRA |
| Class 14 (Oct 21) | Sparse approximation of attention | | Sparse FlashAttention, KDEformer, DeepSeek-V3.2, DeepSeek Native Sparse Attention |
| Class 15 (Oct 23) | Mixture of experts | | Switch Transformer, DeepSeek-V3 |
| Class 16 (Oct 28) | Structured state space models | | S4, S4D |
| Class 17 (Oct 30) | Mamba and hybrid models | Project proposal due | Mamba, Mamba2, Nemotron |
| Class 18 (Nov 4) | Finetuning, fast inference | | LoRA, HydraLoRA, Speculative Decoding |
| Class 19 (Nov 6) | Quantization (clustering, hashing, e.g., RaBitQ; KV-cache and model weight compression) | | Residual Quantization, Qinco2, RaBitQ, CommVQ, AQLM |
| Class 20 (Nov 11) | CG, OO: DeepSeek-V3 | | |
| Class 21 (Nov 13) | YK: LLaMA: Open and Efficient Foundation Language Models | Project progress report due | |
| Class 22 (Nov 18) | DL: Faster Causal Attention Over Large Sequences Through Sparse Flash Attention, LG: Hashing-Based-Estimators for Kernel Density in High Dimensions | | |
| Class 23 (Nov 20) | JH: DeepSeek-OCR, YT: Shampoo: Preconditioned Stochastic Tensor Optimization | | |
| Class 24 (Nov 25) | SB: Reducing Activation Recomputation in Large Transformer Models | | |
| Nov 27 | No class, Thanksgiving | | |
| Class 25 (Dec 2) | WL: Decoupled Weight Decay Regularization, MS: Gluon: Making Muon & Scion Great Again! | | |
| Class 26 (Dec 4) | VK: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, EC: RoarGraph | | |
| Class 27 (Dec 9) | XF, ZC: Matryoshka Quantization | Project final report due | |