Andrea Burns is a third-year PhD student at Boston University in the Image and Video Computing Group. She is advised by Prof. Kate Saenko and Prof. Bryan A. Plummer. Her primary research topics include representation learning and the intersection of computer vision and natural language processing (vision and language). Andrea is interested in improving high-level reasoning in machine learning models, as robust methods and common-sense reasoning are necessary for many applications. Her research is interdisciplinary in nature, and she hopes to study reliable vision-language methods for assistive technology.
Andrea's graduate coursework includes:
CS537: Randomness in Computing
CS591: Deep Learning
CS655: Computer Networks
CS591: Introduction to Natural Language Processing
CS520: Programming Languages
CS591: Advanced Optimization Algorithms
CS585: Image and Video Computing
She recently concluded a Fall 2020 research internship with the Robust Perception team at Google Cambridge and will continue part time as a Student Researcher. She is open to research that falls under the umbrella of machine learning, computer vision, natural language processing, speech technologies, and human-computer interaction.
I have explored several topics in computer vision and natural language processing including
visually enhanced word embeddings, multilingual language representations, image captioning,
visual speech recognition, sentiment analysis, and more.
Below I include published works; other research projects can be found in the projects section.
Current multilingual vision-language models either require a large number of additional parameters
for each supported language, or suffer performance degradation as languages are added. In this paper,
we propose a Scalable Multilingual Aligned Language Representation (SMALR) that represents many
languages with few model parameters without sacrificing downstream task performance. SMALR learns
a fixed-size language-agnostic representation for most words in a multilingual vocabulary, keeping language-specific features for only a few. We use a novel masked cross-language modeling loss to align
features with context from other languages. Additionally, we propose a cross-lingual consistency module
that ensures predictions made for a query and its machine translation are comparable. The effectiveness
of SMALR is demonstrated with ten diverse languages, over twice the number supported in vision-language
tasks to date. We evaluate on multilingual image-sentence retrieval and outperform prior work by 3-4%
with less than 1/5th the training parameters of other word embedding methods.
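To make the shared/language-specific split concrete, here is a minimal sketch of the idea in PyTorch. This is my own illustration, not the SMALR code: the module name, the full-size per-language tables, and the mask-based lookup are all simplifications.

```python
# A minimal sketch of SMALR's shared/language-specific split (my own
# illustration, not the paper's code). For simplicity, every per-language
# table here gets the full index space; a real model would remap the few
# retained ids into a much smaller table.
import torch
import torch.nn as nn

class SharedPlusSpecificEmbedding(nn.Module):
    def __init__(self, vocab_size, num_langs, dim=300):
        super().__init__()
        # One table shared across languages (the language-agnostic features).
        self.shared = nn.Embedding(vocab_size, dim)
        # One table per language for the words kept language-specific.
        self.specific = nn.ModuleList(
            nn.Embedding(vocab_size, dim) for _ in range(num_langs)
        )

    def forward(self, token_ids, lang_id, is_specific):
        shared_vecs = self.shared(token_ids)
        specific_vecs = self.specific[lang_id](token_ids)
        # Use the language-specific vector only where the mask marks a word
        # as language-specific; fall back to the shared table elsewhere.
        return torch.where(is_specific.unsqueeze(-1), specific_vecs, shared_vecs)

emb = SharedPlusSpecificEmbedding(vocab_size=1000, num_langs=10)
ids = torch.tensor([[5, 42, 7]])
mask = torch.tensor([[False, True, False]])
out = emb(ids, lang_id=0, is_specific=mask)  # shape: (1, 3, 300)
```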
We rigorously analyze different word embeddings, language models, and embedding augmentation steps
on five common VL tasks: image-sentence retrieval, image captioning, visual question answering, phrase
grounding, and text-to-clip retrieval. Our experiments provide some striking results: an average embedding language model outperforms an LSTM on retrieval-style tasks, and state-of-the-art representations such as BERT perform relatively poorly on vision-language tasks. From this comprehensive set of experiments, we propose a set of best practices for incorporating the language component of VL tasks. To further
elevate language features, we also show that knowledge in vision-language problems can be transferred
across tasks to gain performance with multi-task training. This multi-task training is applied to a
new Graph Oriented Vision-Language Embedding (GrOVLE), which we adapt from Word2Vec using WordNet
and an original visual-language graph built from Visual Genome, providing a ready-to-use vision-language embedding.
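One of the striking results above is that an average embedding language model beats an LSTM on retrieval-style tasks. For readers unfamiliar with the term, here is a rough sketch of such a model; the learned projection head is my own assumption, not the paper's exact architecture.

```python
# A rough sketch of an "average embedding" language model: a sentence is
# represented as the mean of its word vectors, here followed by a learned
# projection (an assumed head, not the paper's exact setup).
import torch
import torch.nn as nn

class AverageEmbeddingModel(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, out_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.proj = nn.Linear(embed_dim, out_dim)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len); id 0 is padding and embeds to zeros.
        vecs = self.embed(token_ids)
        lengths = (token_ids != 0).sum(dim=1, keepdim=True).clamp(min=1)
        return self.proj(vecs.sum(dim=1) / lengths)

model = AverageEmbeddingModel(vocab_size=1000)
sent = model(torch.tensor([[4, 9, 2, 0, 0]]))  # shape: (1, 512)
```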
Multispectral imaging can be used as a multimodal source to increase the prediction accuracy of many machine learning algorithms by introducing additional spectral bands into data samples. This paper introduces a newly curated Multispectral Liquid 12-band (MeL12) dataset consisting of 12 classes: eleven liquids and an "empty container" class. The usefulness of multispectral imaging for liquid classification is demonstrated by training a support vector machine on MeL12 to classify the 12 classes. The reported results are encouraging and point to the need for additional work on classifying harmless and dangerous liquids in high-risk environments, such as airports, concert halls, and political arenas, using multispectral imaging.
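As a rough illustration of this classification setup, the sketch below trains an RBF-kernel SVM on simple per-band statistics. The feature choice, hyperparameters, and toy data are my own stand-ins, not the paper's pipeline or the MeL12 data itself.

```python
# An illustrative SVM pipeline for 12-band multispectral classification.
# The per-band statistics, toy data, and hyperparameters are stand-ins.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def band_features(image):
    """image: (H, W, 12) array -> per-band mean and std (24 values)."""
    bands = image.reshape(-1, image.shape[-1])
    return np.concatenate([bands.mean(axis=0), bands.std(axis=0)])

# Toy stand-in for MeL12: 60 random "images" over 12 classes.
rng = np.random.default_rng(0)
X = np.stack([band_features(rng.random((32, 32, 12))) for _ in range(60)])
y = rng.integers(0, 12, size=60)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X, y)
print(clf.predict(X[:5]))
```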
Awards
Grace Hopper Conference Award, Boston University
Invited Participant, CRA-W Grad Cohort Workshop
Dean's Fellowship Fall 2018, Boston University
The Academic Achievement Award Scholarship 2014-18, Tulane University
Dean's List 2014-18, Tulane University
The Elsa Freiman Angrist Scholarship 2015-18, Tulane University
Friezo Family Foundation Greater New York Area Scholarship 2015-18, Tulane University
Automating Web Tasks Across Environment and Ability
Andrea Burns, Kate Saenko, Bryan A. Plummer
Work in progress. Building a mobile application task dataset to be used with an environment-agnostic reinforcement learning policy for automating web navigation tasks across different environments. A feasibility classifier and action-oriented captioning model will be built to provide tools for low-vision or blind users.
Supervised Machine Learning with Abstract Templates
Andrea Burns
Project Video
Implemented logistic regression and perceptron algorithms by creating abstract supervised learning templates in ATS.
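For reference, the perceptron update rule that one such template instantiates looks roughly like the following, shown in Python rather than ATS for readability; this is an illustration of the algorithm, not the project code.

```python
# The perceptron update rule (illustration only, not the ATS project code).
import numpy as np

def perceptron(X, y, epochs=10, lr=1.0):
    """X: (n, d) features; y: labels in {-1, +1}. Returns weights, bias."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:  # misclassified or on the boundary
                w += lr * yi * xi       # nudge weights toward the example
                b += lr * yi
    return w, b

X = np.array([[2.0, 1.0], [-1.0, -2.0], [1.5, 0.5], [-2.0, -1.0]])
y = np.array([1, -1, 1, -1])
w, b = perceptron(X, y)
```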
Visual Speech Recognition
Compared feature representations for visual speech recognition (VSR) on the AVLetters dataset: Hu moments, Zernike moments, HOG descriptors, and LBP-TOP features. Investigated frame-level and video-level classification using an SVM classifier in scikit-learn.
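As a sketch of the frame-level pipeline, the snippet below extracts HOG descriptors and fits an SVM. The HOG parameters and toy data are assumptions for illustration, not the project's actual settings or the AVLetters data.

```python
# An illustrative frame-level HOG + SVM pipeline (assumed parameters).
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def frame_descriptor(frame):
    """frame: (H, W) grayscale mouth crop -> HOG feature vector."""
    return hog(frame, orientations=8, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

# Toy stand-in for AVLetters: 40 random 64x64 crops, 26 letter classes.
rng = np.random.default_rng(0)
frames = rng.random((40, 64, 64))
labels = rng.integers(0, 26, size=40)

X = np.stack([frame_descriptor(f) for f in frames])
clf = SVC(kernel="linear").fit(X, labels)
```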
Multimodal Sentiment Analysis for Voice Message Systems
Andrea Burns, Chloe Chen, Mackenna Barker
Poster
Created a multimodal machine learning model to predict the urgency of a voice message after categorizing it into one of four emotions: anger, fear, joy, and sadness. Used Python's scikit-learn and SDK libraries to apply emotion classification and unsupervised intensity regression to audio and text data.
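A rough sketch of the two-stage idea (emotion classification followed by an urgency estimate) is below; the text-only features, toy data, and fixed urgency weights are hypothetical stand-ins, not the project's models.

```python
# Illustrative two-stage pipeline: classify the emotion, then map it to an
# urgency estimate. All features, data, and weights here are stand-ins.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["call me back now", "just saying hi", "I am scared", "great news"]
emotions = ["anger", "joy", "fear", "joy"]

emotion_clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
emotion_clf.fit(texts, emotions)

# In the full project, intensity comes from unsupervised regression over
# audio features; a fixed per-emotion weight stands in for it here.
urgency_weight = {"anger": 0.9, "fear": 1.0, "joy": 0.2, "sadness": 0.6}
emotion = emotion_clf.predict(["please call me back right now"])[0]
print(emotion, urgency_weight[emotion])
```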