Nataniel Ruiz

I am a first-year PhD candidate at Boston University in the Image & Video Computing group, advised by Prof. Stan Sclaroff. I am interested in computer vision, machine learning, statistics and representation learning.

In 2018 I was a Spring/Summer intern at the NEC Labs Media Analytics Department, where I worked with Dr. Manmohan Chandraker and Dr. Samuel Schulter. I graduated from Georgia Tech in Fall 2017 with an M.Sc. in Computer Science specializing in Machine Learning, advised by Prof. James Rehg at the Center for Behavioral Imaging.

Previously, I obtained my B.Sc. and M.Sc. in Data Science from Ecole Polytechnique. I also worked as an intern at MIT CSAIL with Dr. Kalyan Veeramachaneni and Dr. Lalana Kagal.

nruiz9 [at]  |  Resume  |  Google Scholar  |  GitHub  |  LinkedIn


I have explored several topics in computer vision, including face and gesture analysis, scene understanding, first-person vision, instructional video understanding and mobile computer vision. During my internship at NEC Labs I worked on topics related to self-driving car perception, visual data simulation and reinforcement learning.

Learning To Simulate
N. Ruiz, S. Schulter, M. Chandraker
In submission to the International Conference on Learning Representations (ICLR), 2019

We propose an algorithm that automatically learns the parameters of a simulation engine so that the training data it generates maximizes the performance of a machine learning model. We present experiments on a toy example, on an object-counting vision task and on semantic segmentation for traffic scenes, evaluated on both simulated and real data.

Connecting Gaze, Scene, and Attention: Generalized Attention Estimation via Joint Modeling of Gaze and Scene Saliency
E. Chong, N. Ruiz, R. Wang, Y. Zhang, A. Rozga, J.M. Rehg
European Conference on Computer Vision (ECCV), 2018

We train a multi-task network to predict gaze direction and visual attention heatmaps on images.

Fine-Grained Head Pose Estimation Without Keypoints
N. Ruiz, E. Chong, J.M. Rehg
Computer Vision and Pattern Recognition Workshop (CVPRW), 2018
code  /  video demo  /  bibtex

By training a deep network with a binned pose classification loss and a pose regression loss on a large dataset, we obtain state-of-the-art head pose estimation results that generalize to different domains.

Learning to Localize and Align Fine-Grained Actions to Sparse Instructions
M. Hahn, N. Ruiz, J.B. Alayrac, I. Laptev, J.M. Rehg
In submission to the Winter Conference on Applications of Computer Vision (WACV), 2019

We present a framework that, given an instructional video, localizes atomic action segments and aligns them to the appropriate instructional step using object recognition and natural language.

Detecting Gaze Towards Eyes in Natural Social Interactions and Its Use in Child Assessment
E. Chong, K. Chanda, Z. Ye, A. Southerland, N. Ruiz, R.M. Jones, A. Rozga, J.M. Rehg
UbiComp and Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), 2017

We introduce the Pose-Implicit CNN, a novel deep learning architecture that predicts eye contact while implicitly estimating the head pose. The model is trained on a dataset of 156 play-session videos, totaling 22 hours, from over 100 children, half of whom have been diagnosed with Autism Spectrum Disorder.

Dockerface: an Easy to Install and Use Faster R-CNN Face Detector in a Docker Container
N. Ruiz, J.M. Rehg
arXiv Preprint, 2017
code  /  bibtex

To help the wider scientific community, we release a pre-trained deep learning face detector that is easy to download and use on images and video.


N. Ruiz
video demo  /  app apk

Real-time object detection on Android using the YOLO network with TensorFlow.