Andrea Burns

Andrea Burns is a second-year PhD student in the Image and Video Computing Group at Boston University, advised by Professor Kate Saenko and Professor Bryan A. Plummer. Her interests lie at the intersection of computer vision and natural language processing, also known as "vision and language." Andrea is interested in improving higher-level reasoning in machine learning models, and working with multimodal input is one way to incorporate realistic task difficulty. Her research has always been interdisciplinary, and she hopes to collaborate with creative thinkers.

Andrea is currently taking CS537: Random Computing and auditing CS591: Deep Learning. Her previous coursework includes CS655: Computer Networks (Fall 2019), CS591: Introduction to Natural Language Processing, CS520: Programming Languages (Spring 2019), CS591: Advanced Optimization Algorithms, and CS585: Image and Video Computing (Fall 2018). She also took a sequence of undergraduate and graduate machine learning courses at Tulane University, where she earned a Bachelor of Science in Mathematics and Computer Science. As an undergraduate, she minored in French and pursued her passion for dance by taking classes and founding her own organization.

This coming summer, she is excited to join the Machine Intelligence group at Google as a research intern. She is open to research under the umbrella of machine learning, computer vision, natural language processing, speech technologies, and human-computer interfaces. Andrea is also interested in using vision-language tools for assistive technologies and hopes to work on these applications in industry after her PhD.

aburns4 [at]  |  CV  |  LinkedIn  |  Google Scholar


I have explored several topics in computer vision and natural language processing, including visually enhanced word embeddings, image captioning, visual speech recognition, sentiment analysis, and more. Published works are listed below; other research projects can be found in the projects section.

Language Features Matter: Effective Language Representations for Vision-Language Tasks
Andrea Burns, Reuben Tan, Kate Saenko, Stan Sclaroff, Bryan A. Plummer
International Conference on Computer Vision (ICCV), 2019
Project Page

We rigorously analyze different word embeddings, language models, and embedding augmentation steps on five common VL tasks: image-sentence retrieval, image captioning, visual question answering, phrase grounding, and text-to-clip retrieval. Our experiments yield some striking results: an average-embedding language model outperforms an LSTM on retrieval-style tasks, and state-of-the-art representations such as BERT perform relatively poorly on vision-language tasks. From this comprehensive set of experiments, we propose a set of best practices for incorporating the language component of VL tasks. To further improve language features, we also show that knowledge from vision-language problems can be transferred across tasks through multi-task training to boost performance. This multi-task training is applied to a new Graph Oriented Vision-Language Embedding (GrOVLE), which we adapt from Word2Vec using WordNet and an original visual-language graph built from Visual Genome, providing a ready-to-use vision-language embedding.
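For readers unfamiliar with the "average embedding" language model mentioned in this abstract, the core idea is that a sentence is represented by the mean of its word vectors. The sketch below is only an illustration under stated assumptions: `TOY_EMBEDDINGS` is a hypothetical, hand-made 3-dimensional vocabulary (not GrOVLE, Word2Vec, or any real pretrained embedding), and cosine similarity stands in for a retrieval scoring step. It is not the paper's implementation.

```python
import math
from typing import Dict, List

# Hypothetical toy embeddings; a real system would load pretrained vectors.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "a": [0.1, 0.0, 0.1],
    "dog": [0.9, 0.2, 0.0],
    "puppy": [0.8, 0.3, 0.1],
    "runs": [0.1, 0.9, 0.2],
    "sprints": [0.2, 0.8, 0.3],
    "car": [0.0, 0.1, 0.9],
}


def average_embedding(sentence: str, table: Dict[str, List[float]]) -> List[float]:
    """Mean of the vectors of in-vocabulary tokens; a zero vector if none are known."""
    dim = len(next(iter(table.values())))
    vectors = [table[tok] for tok in sentence.lower().split() if tok in table]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Rank candidate sentences against a query, as in a retrieval-style task.
query = average_embedding("a puppy sprints", TOY_EMBEDDINGS)
candidates = ["a car", "a dog runs"]
ranked = sorted(
    candidates,
    key=lambda s: cosine(query, average_embedding(s, TOY_EMBEDDINGS)),
    reverse=True,
)
print(ranked)  # → ['a dog runs', 'a car']
```

Averaging ignores word order entirely, which is what makes it a useful, cheap baseline to compare against sequence models such as an LSTM.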

Multispectral Imaging for Improved Liquid Classification in Security Sensor Systems
Andrea Burns, Waheed U. Bajwa
SPIE, 2018

Multispectral imaging can serve as a multimodal source that increases the prediction accuracy of many machine learning algorithms by introducing additional spectral bands into data samples. This paper introduces a newly curated Multispectral Liquid 12-band (MeL12) dataset consisting of 12 classes: eleven liquids and an "empty container" class. The usefulness of multispectral imaging for liquid classification is demonstrated by using a support vector machine to classify the 12 classes of MeL12. The reported results are encouraging and point to the need for additional work on using multispectral imaging to distinguish harmless from dangerous liquids in high-risk environments such as airports, concert halls, and political arenas.

  • Grace Hopper Conference Award, Boston University
  • Invited participant for the Grad Cohort Workshop of the CRA-W
  • Dean's Fellowship Fall 2018, Boston University
  • The Academic Achievement Award Scholarship 2014-18, Tulane University
  • Dean’s List 2014-18, Tulane University
  • The Elsa Freiman Angrist Scholarship 2015-18, Tulane University
  • Friezo Family Found Greater New York Area Scholarship 2015-18, Tulane University


Leveraging Depth for Improved Visual Relation Detection
Andrea Burns, Tammy Qiu, Bryan A. Plummer

Work in progress. Incorporating depth into an RGB-only Visual Relationship Detector predicate classifier to compare RGB and RGB-D feature performance using both inferred and ground-truth depth. Using word embeddings to extrapolate information to classes with few samples.

Automating Web Tasks Across Environment and Ability
Andrea Burns, Kate Saenko, Bryan A. Plummer

Work in progress. Building a mobile application task dataset for use with an environment-agnostic reinforcement learning policy that automates web navigation tasks across different environments. A feasibility classifier and an action-oriented captioning model will also be built to provide tools for blind and low-vision users.

Supervised Machine Learning with Abstract Templates
Andrea Burns
Project Video

Implemented logistic regression and perceptron algorithms by creating abstract supervised learning templates in the ATS programming language.
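The project itself was written in ATS; the Python sketch below only illustrates the general "abstract template" idea and is not a translation of that code. A hypothetical `LinearClassifier` base class fixes the shared training loop, and each subclass supplies just its prediction and per-example update rule: the classic mistake-driven perceptron rule, and one stochastic gradient descent step on the log-loss for logistic regression. The class names, learning rates, and toy data are all assumptions chosen for illustration.

```python
import math
from abc import ABC, abstractmethod
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (features, label in {0, 1})


class LinearClassifier(ABC):
    """Hypothetical template: shared training loop, model-specific update rule."""

    def __init__(self, n_features: int, lr: float = 0.1, epochs: int = 100):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr
        self.epochs = epochs

    def score(self, x: Sequence[float]) -> float:
        """Linear score w . x + b, shared by every subclass."""
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    @abstractmethod
    def predict(self, x: Sequence[float]) -> int:
        """Return a hard label in {0, 1}."""

    @abstractmethod
    def update(self, x: Sequence[float], y: int) -> None:
        """Adjust w and b after seeing one labeled example."""

    def fit(self, data: List[Example]) -> "LinearClassifier":
        for _ in range(self.epochs):
            for x, y in data:
                self.update(x, y)
        return self


class Perceptron(LinearClassifier):
    def predict(self, x):
        return 1 if self.score(x) > 0 else 0

    def update(self, x, y):
        # Perceptron rule: weights change only when the prediction is wrong.
        error = y - self.predict(x)  # -1, 0, or +1
        if error:
            self.w = [wi + self.lr * error * xi for wi, xi in zip(self.w, x)]
            self.b += self.lr * error


class LogisticRegression(LinearClassifier):
    def prob(self, x):
        return 1.0 / (1.0 + math.exp(-self.score(x)))

    def predict(self, x):
        return 1 if self.prob(x) >= 0.5 else 0

    def update(self, x, y):
        # One SGD step on the log-loss; its gradient w.r.t. the score is (p - y).
        grad = self.prob(x) - y
        self.w = [wi - self.lr * grad * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * grad


# Toy, linearly separable data (hypothetical): two well-separated clusters.
data = [([0.0, 0.0], 0), ([1.0, 0.0], 0), ([0.0, 1.0], 0),
        ([3.0, 3.0], 1), ([4.0, 3.0], 1), ([3.0, 4.0], 1)]

results = {}
for model in (Perceptron(2, lr=1.0), LogisticRegression(2, lr=0.5)):
    model.fit(data)
    results[type(model).__name__] = [model.predict(x) for x, _ in data]
print(results)
```

The design choice is that `fit` never changes between models; adding another linear learner means writing only `predict` and `update`.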

Visual Speech Recognition Survey
Andrea Burns
Presentation Slides  

Compared feature representations for visual speech recognition (VSR) on the AVLetters dataset: Hu moments, Zernike moments, HOG descriptors, and LBP-TOP features. Investigated frame-level and video-level classification using an SVM classifier in scikit-learn.

Multimodal Sentiment Analysis for Voice Message Systems
Andrea Burns, Chloe Chen, Mackenna Barker

Created a multimodal machine learning model to predict the urgency of a voice message after classifying it into one of four emotions: anger, fear, joy, and sadness. Used Python's scikit-learn and SDK libraries to apply emotion classification and unsupervised intensity regression to audio and text data.