Jongin Kim
I am a first-year PhD student in Computer Science at Boston University, where I am advised by Prof. Derry Wijaya.
My research interest is Multilingual/Crosslingual NLP.
My goal is to develop NLP systems that work well across different languages, so that language technologies are easily accessible to the diverse and disadvantaged users who may need them most.
I have done some research in this direction (see publications below), and I plan to broaden my research topics further as I pursue my Ph.D.
Publications
2021
-
Analysis of Zero-Shot Crosslingual Learning between English and Korean for Named Entity Recognition
Jongin Kim, Nayoung Choi, Seunghyun S. Lim, Jungwhan Kim, Soojin Chung, Hyunsoo Woo, Min Song, and Jinho D. Choi
In Proceedings of the EMNLP Workshop on Multilingual Representation Learning (MRL), 2021
Anthology
Paper
Presentation
-
FantasyCoref: Coreference Resolution on Fantasy Literature Through Omniscient Writer’s Point of View
Sooyoun Han, Sumin Seo, Minji Kang, Jongin Kim, Nayoung Choi, Min Song, and Jinho D. Choi
In Proceedings of the EMNLP Workshop on Computational Models of Reference, Anaphora and Coreference (CRAC), 2021
Anthology
Paper
Presentation
Future Research Directions
- Efficient ways of building datasets for low-resource languages
  - Ways to efficiently build monolingual labeled datasets for low-resource languages
    (particularly data containing knowledge that cannot be transferred from high-resource languages)
    - Active learning, human-in-the-loop machine learning
  - Ways to efficiently crowdsource/collect parallel texts
    (for multilingual neural machine translation)
    - Annotation using images or GIFs as pivots
    - Collecting parallel (comparable) texts from the web and filtering out noise
- Multilingual benchmark datasets that enable more comprehensive evaluation of multilingual models
  - Inclusion of typologically diverse languages
  - Inclusion of more challenging NLU/NLG tasks
- Novel methods for pre-training multilingual language models with more accurate alignment across languages (for better multilingual representations)
  - Multilingual subword tokenizers
  - Encouraging explicit attention between languages
  - Augmenting the model with linguistic knowledge (i.e., incorporating linguistic knowledge into the model)
- Exploring ways to improve cross-lingual transfer learning
  - Improving zero-shot cross-lingual transfer (direct model transfer; a minimal sketch follows this list)
    - Intermediate-task training
    - Overcoming word-order differences
  - Annotation projection
  - Applying cross-lingual transfer learning to other, more challenging tasks
    - Cross-lingual IR, including QA and text summarization
    - Multilingual neural machine translation
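As a rough illustration of the zero-shot cross-lingual transfer (direct model transfer) setting mentioned above, the sketch below fine-tunes a multilingual encoder on English NER only and then evaluates it directly on Korean, mirroring the English-Korean setup studied in the MRL 2021 paper. The WikiANN dataset, the xlm-roberta-base model, and all hyperparameters here are illustrative assumptions of mine, not choices taken from that work.

# Minimal sketch of zero-shot cross-lingual (direct model) transfer for NER:
# fine-tune a multilingual encoder on English data only, then evaluate on Korean.
# Dataset, model name, and hyperparameters are illustrative, not from the paper.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          TrainingArguments, Trainer,
                          DataCollatorForTokenClassification)

model_name = "xlm-roberta-base"          # any multilingual encoder could be used
en = load_dataset("wikiann", "en")       # English NER data for training
ko = load_dataset("wikiann", "ko")       # Korean NER data for zero-shot evaluation
label_names = en["train"].features["ner_tags"].feature.names

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(
    model_name, num_labels=len(label_names))

def encode(batch):
    # Align word-level NER tags with subword tokens; non-first subwords get -100
    # so they are ignored by the loss.
    enc = tokenizer(batch["tokens"], is_split_into_words=True, truncation=True)
    enc["labels"] = []
    for i, tags in enumerate(batch["ner_tags"]):
        prev, aligned = None, []
        for wid in enc.word_ids(batch_index=i):
            aligned.append(-100 if wid is None or wid == prev else tags[wid])
            prev = wid
        enc["labels"].append(aligned)
    return enc

en_enc = en.map(encode, batched=True, remove_columns=en["train"].column_names)
ko_enc = ko.map(encode, batched=True, remove_columns=ko["test"].column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments("en_ner", num_train_epochs=3,
                           per_device_train_batch_size=32),
    train_dataset=en_enc["train"],
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()                           # trained on English only
print(trainer.evaluate(ko_enc["test"]))   # zero-shot evaluation on Korean

The point of the sketch is only the overall recipe: no Korean examples are seen during training, so any Korean performance comes entirely from the multilingual representations learned during pre-training.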