E-mail: sunxm at bu dot edu
Hi, I am a Ph.D. student in Computer Science at Boston University, where I have been supervised by Prof. Kate Saenko since Spring 2019. In Summer 2021, I joined Google Cloud to work with Clayton Mellina, Xiao Bian and Kihyuk Sohn. In the summers of 2019 and 2020, I was honored to work with Rogerio Feris and Rameswar Panda at IBM Watson Health. Previously, I received my M.S. in ECE from the University of Michigan, Ann Arbor, and my B.Eng. in Communication Engineering from Beijing University of Posts and Telecommunications.
I am interested in deep learning and computer vision. My recent research focuses on multi-task learning and deep generative models.
CV / GitHub / Google Scholar
Preprints
- Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Naigang Wang, Bowen Pan, Kailash Gopalakrishnan, Aude Oliva, Rogerio Feris, Kate Saenko. "All at Once Network Quantization via Collaborative Knowledge Transfer". arXiv preprint arXiv:2103.01435, 2021.
- Ping Hu, Ximeng Sun, Kate Saenko, Stan Sclaroff. "Weakly-supervised Compositional Feature Aggregation for Few-shot Recognition", arXiv preprint arXiv:1906.04833, 2019.
- Huijuan Xu, Bingyi Kang, Ximeng Sun, Jiashi Feng, Kate Saenko, Trevor Darrell. "Similarity R-C3D for Few-shot Temporal Activity Detection", arXiv preprint arXiv:1812.10000, 2018.
Conferences and Journals
- Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Aude Oliva, Rogerio Feris, Kate Saenko. "Dynamic Network Quantization for Efficient Video Inference". International Conference on Computer Vision (ICCV), 2021.
pdf / project page / code
Overview: Motivated by the effectiveness of quantization for boosting efficiency, we propose a dynamic network quantization framework that selects the optimal precision for each frame, conditioned on the input, for efficient video recognition. Specifically, given a video clip, we train a very lightweight network in parallel with the recognition network to produce a dynamic policy indicating which numerical precision to use for each frame. We train both networks effectively using standard backpropagation, with a loss that balances the competitive performance and resource efficiency required for video recognition. Extensive experiments on four challenging, diverse benchmark datasets demonstrate that our approach provides significant savings in computation and memory usage while outperforming existing state-of-the-art methods.
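The core idea can be illustrated with a toy sketch (not the paper's implementation): a cheap per-frame decision assigns a bit-width to each video frame, trading recognition quality against compute cost. The precision set, the `frame_saliency` scores, and the threshold rule are all hypothetical stand-ins for the learned lightweight policy network.

```python
from typing import List, Tuple

def choose_precisions(frame_saliency: List[float],
                      thresholds: Tuple[float, float] = (0.7, 0.4)) -> List[int]:
    """Assign a bit-width per frame from a toy 'saliency' score in [0, 1].

    Informative frames get full precision; uninformative ones get low-bit
    arithmetic. In the paper this decision comes from a learned lightweight
    policy network trained jointly with the recognizer; here a fixed
    threshold stands in for it.
    """
    hi, lo = thresholds
    out = []
    for s in frame_saliency:
        if s >= hi:
            out.append(32)   # full precision for salient frames
        elif s >= lo:
            out.append(8)    # 8-bit for moderately informative frames
        else:
            out.append(4)    # 4-bit for near-redundant frames
    return out

def relative_cost(bits: List[int]) -> float:
    """Bit-op cost relative to running every frame at full precision
    (a conv layer's bit-ops scale roughly with bitwidth squared)."""
    full = 32 ** 2 * len(bits)
    return sum(b ** 2 for b in bits) / full

policy = choose_precisions([0.9, 0.5, 0.1, 0.2])
print(policy)                 # [32, 8, 4, 4]
print(relative_cost(policy))  # well under 1.0: most frames run cheaply
```

Even this crude rule shows where the savings come from: only a fraction of frames pay full-precision cost, while the rest run in low-bit arithmetic.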
- Ximeng Sun, Rameswar Panda, Rogerio Feris, Kate Saenko. "AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning". Neural Information Processing Systems (NeurIPS), 2020.
pdf / project page / code
Overview: AdaShare is a novel and differentiable approach for efficient multi-task learning that learns the feature sharing pattern to achieve the best recognition accuracy, while restricting the memory footprint as much as possible. Our main idea is to learn the sharing pattern through a task-specific policy that selectively chooses which layers to execute for a given task in the multi-task network. In other words, we aim to obtain a single network for multi-task learning that supports separate execution paths for different tasks.
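The select-or-skip idea can be sketched in a few lines of toy Python (not the released AdaShare code): one shared stack of "layers", where each task executes only the layers its binary policy turns on. The layer functions and the policy values below are illustrative stand-ins; in the paper the blocks are a shared ResNet and the policies are learned end to end with a Gumbel-Softmax relaxation.

```python
from typing import Callable, Dict, List

# A shared backbone as a list of simple layer functions (toy stand-ins
# for residual blocks shared across tasks).
shared_layers: List[Callable[[float], float]] = [
    lambda x: x + 1.0,   # block 0
    lambda x: x * 2.0,   # block 1
    lambda x: x - 0.5,   # block 2
]

# Hypothetical learned binary policies: 1 = execute the block, 0 = skip it.
policies: Dict[str, List[int]] = {
    "segmentation": [1, 1, 1],
    "depth":        [1, 0, 1],
}

def forward(x: float, task: str) -> float:
    """Run the shared network along the task-specific execution path."""
    for layer, keep in zip(shared_layers, policies[task]):
        if keep:
            x = layer(x)  # the block's weights are shared by every task that selects it
    return x

print(forward(1.0, "segmentation"))  # ((1 + 1) * 2) - 0.5 = 3.5
print(forward(1.0, "depth"))         # (1 + 1) - 0.5 = 1.5
```

Both tasks live in a single network, yet each follows its own path: blocks selected by several tasks are shared, and skipped blocks cost nothing at inference, which is what keeps the memory footprint small.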
- Ximeng Sun, Huijuan Xu, Kate Saenko. "TwoStreamVAN: Improving Motion Modeling in Video Generation". IEEE Winter Conference on Applications of Computer Vision (WACV), 2020.
arXiv / demo / code / dataset
Overview: We propose TwoStreamVAN to output a realistic video given an input action label by progressively generating and fusing motion and content features at multiple scales using adaptive motion kernels. In addition, to better evaluate video generation models, we design a new synthetic human action dataset SynAction to bridge the difficulty gap between overcomplicated human action datasets and simple toy datasets.
- Ximeng Sun, Ryan Szeto, Jason Corso. "A Temporally-Aware Interpolation Network for Video Frame Inpainting". Asian Conference on Computer Vision (ACCV), 2018.
paper / demo / code
- Ryan Szeto, Ximeng Sun, Kunyi Lu, Jason Corso. "A Temporally-Aware Interpolation Network for Video Frame Inpainting". IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019.
paper / code
Overview: We propose the first deep learning solution to video frame inpainting. We devise a pipeline composed of two modules: a bidirectional video prediction module and a temporally-aware frame interpolation module. Our experiments demonstrate that our approach produces more accurate and qualitatively satisfying results than a state-of-the-art video prediction method and many strong frame inpainting baselines.
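The two-module pipeline can be illustrated with a toy sketch: predict the missing middle frames forward from the preceding frames and backward from the following frames, then blend the two predictions with time-aware weights, so frames near the start of the gap trust the forward prediction more and vice versa. Frames here are plain floats and the two extrapolation functions are hypothetical stand-ins; the real modules are video prediction networks.

```python
from typing import List

def predict_forward(last_seen: float, n: int, step: float = 1.0) -> List[float]:
    """Stand-in forward predictor: extrapolate n frames past the gap start."""
    return [last_seen + step * (i + 1) for i in range(n)]

def predict_backward(first_after: float, n: int, step: float = 1.0) -> List[float]:
    """Stand-in backward predictor: extrapolate n frames before the gap end."""
    return [first_after - step * (n - i) for i in range(n)]

def inpaint(last_seen: float, first_after: float, n: int) -> List[float]:
    """Fill an n-frame gap by blending the two predictions."""
    fwd = predict_forward(last_seen, n)
    bwd = predict_backward(first_after, n)
    out = []
    for i in range(n):
        w = (i + 1) / (n + 1)  # time-aware weight: small -> trust forward prediction
        out.append((1 - w) * fwd[i] + w * bwd[i])
    return out

# Fill a 3-frame gap between a frame of value 0.0 and a frame of value 4.0.
print(inpaint(0.0, 4.0, 3))  # [1.0, 2.0, 3.0]
```

The time-aware weighting is the key design choice: a single predictor drifts the further it extrapolates, while blending two predictors that drift in opposite directions keeps every in-gap frame anchored to its nearer known endpoint.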
- Xingchao Peng, Zijun Huang, Ximeng Sun, Kate Saenko. "Domain Agnostic Learning with Disentangled Representations". International Conference on Machine Learning (ICML), 2019.
- Rameswar Panda, Chun-Fu Chen, Quanfu Fan, Ximeng Sun, Kate Saenko, Aude Oliva, Rogerio Feris. "AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition". International Conference on Computer Vision (ICCV), 2021.