Vasili Ramanishka, Ph.D. student

I joined the Image and Video Computing research group at Boston University in 2016 after transferring from the University of Massachusetts Lowell. My research focuses on applications of machine learning to image and language understanding. I am advised by Prof. Kate Saenko.

Caption-Guided Visual Saliency

Our approach can produce spatial or spatiotemporal heatmaps for both given input sentences and sentences predicted by the video captioning model. Unlike recent efforts that introduce explicit "attention" layers to selectively attend to certain inputs while generating each word, our approach recovers saliency without the overhead of explicit attention layers, so it can be used to analyze a variety of existing model architectures and improve their design.
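One way to recover a saliency heatmap without an attention layer is leave-one-out scoring: drop each spatial region's descriptor in turn and measure how much the model's probability of a target word falls. The sketch below is illustrative only; the toy sigmoid scorer stands in for a real captioning model, and all names are assumptions, not the actual system.

```python
import math

def word_prob(features, target_weight):
    # Toy stand-in for a captioning model's probability of one word
    # given a list of region descriptors (a real model would run an
    # LSTM decoder over encoded frame features).
    score = sum(f * target_weight for f in features)
    return 1.0 / (1.0 + math.exp(-score))  # sigmoid

def saliency_map(features, target_weight):
    # Saliency of region i = drop in the word's probability when
    # region i's descriptor is zeroed out.
    base = word_prob(features, target_weight)
    heat = []
    for i in range(len(features)):
        dropped = features[:i] + [0.0] + features[i + 1:]
        heat.append(base - word_prob(dropped, target_weight))
    return heat
```

Regions whose removal hurts the word's probability most receive the highest heat, which is the intuition behind attention-free saliency recovery.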

Video to Text

We explore models that produce natural language descriptions for in-the-wild video. The task has important applications in video indexing, human-robot interaction, and describing movies for the blind. Our team VideoLAB was ranked 3rd in the ACM Multimedia 2016 Grand Challenge.
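A typical video-to-text pipeline encodes frame features into a state and then decodes words one at a time. The skeleton below sketches only the greedy decoding loop under that assumption; `step_fn` is a hypothetical stand-in for the trained decoder, not the actual model.

```python
def greedy_caption(frame_features, step_fn, max_len=10):
    # Mean-pool per-frame feature vectors into an initial state
    # (real systems use a learned video encoder instead).
    state = [sum(col) / len(frame_features) for col in zip(*frame_features)]
    words = []
    for _ in range(max_len):
        # step_fn returns the next word and updated decoder state.
        word, state = step_fn(state, words)
        if word == "<eos>":  # stop at the end-of-sentence token
            break
        words.append(word)
    return " ".join(words)
```

For example, a `step_fn` backed by a trained LSTM would score the vocabulary at each step and return the argmax word; the loop above is agnostic to how that scoring is done.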

Semantic Textual Similarity

The task has been developed over recent years with the idea of capturing the degree of equivalence in the underlying semantics conveyed by two snippets of text. This simple formulation has many potential applications, such as language modeling, machine translation, and information extraction. Our team was ranked 6th out of 40 participants in the SemEval-2016 competition.
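STS systems output a graded score, conventionally on a 0–5 scale, rather than a binary match. As a minimal illustration, the baseline below scores two snippets by cosine similarity over token counts and rescales to 0–5; this is an assumption-laden toy, not the features used by our actual submission.

```python
import math
from collections import Counter

def sts_score(text_a, text_b):
    # Bag-of-words vectors from lowercased whitespace tokens.
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    dot = sum(va[tok] * vb[tok] for tok in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    # Rescale cosine similarity [0, 1] to the conventional 0-5 range.
    return 5.0 * dot / (norm_a * norm_b)
```

Identical snippets score 5.0 and snippets with no shared tokens score 0.0; competitive systems replace the token-overlap vectors with learned semantic representations.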