Zhenxun Zhuang

Ph.D. Student in Computer Science

Hello! I am a Ph.D. student in the Department of Computer Science at Boston University. I am a member of the Optimization and Machine Learning Lab, advised by Prof. Francesco Orabona. My research focuses on designing algorithms for non-convex optimization problems, with a special interest in Stochastic Gradient Descent (SGD) and its many variants. Apart from proving theoretical guarantees for these algorithms, I am also passionate about investigating their empirical performance in fields like deep learning.

  • Email: zxzhuang [at] bu dot edu
  • Address: Room 131, 111 Cummington Mall, Boston, MA 02215, U.S.


  • 2018-Now

    Boston University, MA, U.S.

    Ph.D. in Computer Science

    Adviser: Francesco Orabona

  • 2016-2018

    Stony Brook University, NY, U.S.

    Ph.D. in Computer Science

    Adviser: Francesco Orabona

  • 2012-2016

    University of Science and Technology of China
    Hefei, Anhui, China

    B.Eng. in Electronic Information Engineering

    Thesis: Prediction & Transform Combined Intra Coding in HEVC

    Adviser: Feng Wu



Exponential Decay

Exponential Step Sizes for Non-Convex Optimization.

Xiaoyu Li*, Zhenxun Zhuang*, Francesco Orabona.

arXiv Preprint, Feb, 2020

Paper        Code


No-regret Non-convex Online Meta-learning

No-regret Non-convex Online Meta-learning.

Zhenxun Zhuang, Yunlong Wang, Kezi Yu, and Songtao Lu.

Proceedings of the 45th International Conference on Acoustics, Speech, and Signal Processing, May, 2020

Online Meta-learning

Online Meta-learning on Non-convex Setting.

Zhenxun Zhuang, Kezi Yu, Songtao Lu, Lucas Glass, Yunlong Wang.

NeurIPS Workshop on Meta-Learning (MetaLearn 2019), Dec, 2019

Non-Convex Optimization

Surrogate Losses for Online Learning of Stepsizes in Stochastic Non-Convex Optimization

Zhenxun Zhuang, Ashok Cutkosky, and Francesco Orabona.

Proceedings of the 36th International Conference on Machine Learning, Jun, 2019

Paper        Code


SGD with Exponentially Decaying Step Sizes

PyTorch implementation of the paper: Exponential Step Sizes for Non-Convex Optimization.

Stochastic Gradient Descent (SGD) is a popular tool for training large-scale machine learning models. Its performance, however, is highly variable and depends crucially on the choice of step sizes. Accordingly, a variety of strategies for tuning the step sizes have been proposed. Yet most of them lack theoretical guarantees, whereas those backed by theory often do not shine in practice. To address this, we introduce exponential step sizes, a novel strategy that is simple to use and enjoys both theoretical and empirical support. In particular, we prove an almost optimal convergence rate for stochastic optimization of smooth non-convex functions. Furthermore, when the Polyak-Łojasiewicz (PL) condition holds, this strategy automatically adapts to the level of noise without knowing it. Finally, we empirically verify on real-world datasets with deep learning architectures that, with only two hyperparameters to tune, it beats or matches the performance of various finely tuned state-of-the-art strategies, including Adam and cosine decay.
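As a rough illustration of the idea, an exponentially decaying schedule can be sketched in plain Python rather than PyTorch. This is a minimal sketch under assumptions, not the project's code: it takes the step size at iteration t to be eta0 * alpha**t, treats the initial step size eta0 and the decay factor alpha as the two hyperparameters, and all function names are hypothetical.

```python
import random


def exponential_step_size(eta0, alpha, t):
    """Step size at iteration t (0-indexed), assumed to be eta0 * alpha**t."""
    if eta0 <= 0.0 or not 0.0 < alpha <= 1.0:
        raise ValueError("need eta0 > 0 and 0 < alpha <= 1")
    return eta0 * alpha ** t


def sgd_exponential(grad_fn, x0, eta0, alpha, num_steps):
    """Run scalar SGD where the step size shrinks geometrically each iteration."""
    x = x0
    for t in range(num_steps):
        x -= exponential_step_size(eta0, alpha, t) * grad_fn(x)
    return x


if __name__ == "__main__":
    # Toy problem (an assumption for illustration): f(x) = (x - 3)^2,
    # observed through gradients corrupted by Gaussian noise.
    rng = random.Random(0)

    def noisy_grad(x):
        return 2.0 * (x - 3.0) + rng.gauss(0.0, 0.1)

    print(sgd_exponential(noisy_grad, x0=0.0, eta0=0.1, alpha=0.99, num_steps=500))
```

On this toy problem the iterate should settle near the minimizer at 3: the early, larger steps make fast progress, and the later, smaller steps damp the effect of the gradient noise.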

Check my project.

Stochastic Gradient Descent with Online Learning

PyTorch implementation of SGDOL from the paper: Surrogate Losses for Online Learning of Stepsizes in Stochastic Non-Convex Optimization.

Non-convex optimization has attracted a lot of attention in recent years, and many algorithms have been developed to tackle it. Many of these algorithms are based on Stochastic Gradient Descent (SGD), proposed by Robbins & Monro over 60 years ago. SGD is intuitive, efficient, and easy to implement. However, it requires a hand-picked parameter, the stepsize, for (fast) convergence, and this parameter is notoriously tedious and time-consuming to tune. Over the last several years, a plethora of adaptive gradient-based algorithms have emerged to ameliorate this problem. They have proved effective at reducing the labor of tuning in practice, but many of them lack theoretical guarantees even in the convex setting. In this project, I implemented SGDOL, an algorithm with self-tuned stepsizes whose convergence rates automatically adapt to the level of noise.
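To give a feel for what "self-tuned stepsizes" can mean, here is an illustrative sketch in plain Python. It is not the repository code, and every detail is an assumption made for illustration: the objective is taken to be M-smooth, two independent stochastic gradients g and g' are drawn at each iterate, the stepsize eta is scored by the quadratic surrogate loss -eta * <g, g'> + (M / 2) * eta**2 * ||g||**2, and the next stepsize minimizes the running sum of these surrogates over [0, 1/M]. The class name, function name, and the regularization constant alpha are hypothetical.

```python
class StepSizeLearner:
    """Learns a stepsize online by minimizing summed quadratic surrogate losses."""

    def __init__(self, smoothness, alpha=10.0):
        self.smoothness = smoothness      # assumed smoothness constant M
        self.sum_inner = alpha            # running sum of <g, g'>, seeded by alpha
        self.sum_normsq = alpha           # running sum of ||g||^2, seeded by alpha
        self.eta = 1.0 / smoothness       # initial stepsize

    def update(self, g, g_prime):
        """Fold in one pair of gradients and return the next stepsize."""
        self.sum_inner += sum(a * b for a, b in zip(g, g_prime))
        self.sum_normsq += sum(a * a for a in g)
        # Unconstrained minimizer of the summed surrogates, then clip to [0, 1/M].
        raw = self.sum_inner / (self.smoothness * self.sum_normsq)
        self.eta = min(max(raw, 0.0), 1.0 / self.smoothness)
        return self.eta


def sgd_with_learned_step(noisy_grad, x0, smoothness, num_steps):
    """SGD on a list of floats where the stepsize is updated after every step."""
    learner = StepSizeLearner(smoothness)
    x = list(x0)
    for _ in range(num_steps):
        g = noisy_grad(x)         # gradient used for the parameter update
        g_prime = noisy_grad(x)   # second, independent gradient at the same point
        x = [xi - learner.eta * gi for xi, gi in zip(x, g)]
        learner.update(g, g_prime)
    return x, learner.eta
```

The intuition behind this sketch: when the two gradient samples agree, their inner product is large relative to the squared norm and the stepsize stays near 1/M; when noise dominates, the inner products average out toward zero and the stepsize shrinks without manual tuning.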

Check my project.

Interactive Data Visualization System for University Rankings

An interactive web-based data visualization system using D3.js for analyzing the Times Higher Education World University Rankings.

More or less, we all like rankings, and we all make rankings. One of the most popular subjects for ranking is the reputation of universities. Although people have argued over their credibility and validity for years, university rankings are still considered an important factor during the application phase. However, there are already dozens of national and international ranking systems available, and they all disagree with each other to some extent. This is because different ranking systems emphasize different things and use different ranking mechanisms. Therefore, before referencing a rank, we should be aware of which factors are included and how they affect the results. In this project, I do not intend to develop a novel ranking system; instead, I analyze the Times Higher Education World University Rankings to discover some interesting patterns.

Check my project.

More to come soon...