Instructor: Prof. Alina Ene (CDS 1027)
Office hours: Monday 2:30pm - 3:30pm, Tuesday/Thursday 2:15pm - 3:15pm, in CDS 1027
Class Time: Tuesday/Thursday 3:30pm - 4:45pm
Class Room: MCS B33
Discussion Forum: Piazza
Assignment Submission: Gradescope
Course policies: You must adhere to the CAS Academic Conduct Code. You must acknowledge all collaborators and sources in all submitted work, including homework solutions, project reports, and code. You may not use LLMs to solve homework assignments or to write any text or code that you submit for the project.
Acknowledgements: The course is co-developed with Huy Nguyen, who is teaching a parallel course at NEU. The course materials build on those of courses such as Stanford CS336 and Stanford CS224N. Specific references and credits are given in the lecture slides (posted on Piazza).
This seminar course focuses on the design of efficient algorithms for building modern machine learning models at scale. We aim to cover topics such as adaptive gradient descent algorithms, dimensionality reduction techniques, algorithms for nearest neighbor search and retrieval-augmented generation, and algorithms for training and fine-tuning foundation models. The course will emphasize recent algorithmic developments for state-of-the-art deep learning models and highlight directions for future research.
Students are expected to complete homework assignments (25%), present a research paper (25%), and complete a research project (50%).
Students are expected to present a foundational paper or a recent research paper related to the topics discussed in the course. For longer papers, a team of two may work on the same paper. For most of the topics below, the instructor will present the background to set the stage, followed by student presentations on more recent developments. Here are some suggestions to get started; more papers will be added depending on students' interests and serendipitous discoveries.
Low-rank methods for adaptive optimizers, e.g., Lean and Mean Adaptive Optimization via Subset-Norm and Subspace-Momentum with Convergence Guarantees; APOLLO: SGD-like Memory, AdamW-level Performance (a minimal sketch of the subspace-momentum idea appears after this list)
Matryoshka Quantization (recent work on quantization with a nested structure, a design that has become popular for maintaining multiple models at different scales simultaneously)
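To make the first suggestion concrete: the common idea behind low-rank optimizer methods is to keep optimizer state such as momentum in a low-dimensional subspace, shrinking its memory footprint from the full d x n gradient size to roughly r x n. The sketch below is an illustrative toy version under assumptions of ours; the function name, rank choice, and refresh schedule are invented for illustration and do not reflect the actual algorithms or APIs of the papers listed above.

```python
# Toy sketch of subspace momentum for a matrix parameter W (d x n).
# Momentum is stored in a rank-r subspace instead of at full size.
import numpy as np

def subspace_momentum_step(W, grad, state, rank=4, lr=1e-2, beta=0.9,
                           refresh_every=200):
    """One hypothetical update of W with momentum kept in a rank-`rank` subspace."""
    step = state.setdefault("step", 0)
    # Periodically refresh the subspace from the current gradient's top
    # left singular vectors (the expensive SVD is done only rarely).
    if step % refresh_every == 0 or "P" not in state:
        U, _, _ = np.linalg.svd(grad, full_matrices=False)
        state["P"] = U[:, :rank]                      # d x rank projector
        # Reset momentum for simplicity; real methods carry it across refreshes.
        state["m"] = np.zeros((rank, grad.shape[1]))  # momentum lives in the subspace
    P, m = state["P"], state["m"]
    g_low = P.T @ grad                       # project gradient: rank x n
    m[:] = beta * m + (1.0 - beta) * g_low   # momentum update in the low-rank space
    state["step"] = step + 1
    return W - lr * (P @ m)                  # lift the update back to full size

# Example usage with a random stand-in for a real gradient.
rng = np.random.default_rng(0)
W, state = rng.standard_normal((256, 128)), {}
for _ in range(3):
    grad = rng.standard_normal(W.shape)
    W = subspace_momentum_step(W, grad, state)
```

The memory savings come from the momentum having shape rank x n rather than d x n; the papers above differ in how the subspace is chosen and refreshed and in how the idea is combined with Adam-style adaptive scaling.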
You will prepare your homework solutions and project reports using LaTeX and submit PDFs to Gradescope. LaTeX is a scientific document preparation system; most CS technical publications are prepared using this tool. Good editors exist for most platforms, such as TeXShop for macOS and the cross-platform TeXstudio. An alternative to setting up LaTeX on your machine is to use Overleaf. The Not So Short Introduction to LaTeX is a good reference to get you started.
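If you are new to LaTeX, a minimal homework skeleton might look like the following; the title, packages, and sample equation are illustrative placeholders, not a required template.

```latex
\documentclass[11pt]{article}
\usepackage{amsmath,amssymb} % standard math packages

\title{Homework 1}
\author{Your Name (collaborators: \ldots)}

\begin{document}
\maketitle

\section*{Problem 1}
Inline math is written as $w_{t+1} = w_t - \eta \nabla f(w_t)$,
and displayed equations as
\[
  w_{t+1} = w_t - \eta \nabla f(w_t).
\]

\end{document}
```

Compiling this with pdflatex (or clicking Recompile on Overleaf) produces the PDF you would upload to Gradescope.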
In this course we will be using Python for some of the homework exercises. We recommend that you download a Python distribution such as Anaconda.