Welcome to Wenda Qin's web page



My name is Wenda Qin. I am a third-year PhD student in computer science at Boston University (BU). My research interests are mainly computer vision and natural language processing.

I am advised by Professor Margrit Betke. I am currently working on several research projects, including ear recognition for Zambian infants and document analysis in Arabic. I also collaborate with people who work in different areas and on different research topics, with the shared research goal of helping people with health issues or disabilities.

I have also been a teaching assistant (TA) and grader for multiple courses at Boston University. Over the past two years, I have been a TA for "Artificial Intelligence" for two semesters and for "Computer Graphics" for one semester. I will continue as a TA for "Computer Graphics" in the fall 2020 semester.


Arabic Document Analysis

My main research interest at the moment is bringing new techniques to document analysis, especially for documents in Arabic. The final goal is to create a scanning device that lets people with visual impairments read newspapers and books in paper form in real time. To accomplish that, the device needs to: 1. split the document image into proper segments; 2. understand the logical purpose of each segment (is it a title, or a page number?); 3. extract text and non-text information from the segments accurately; 4. organize this information so that people can understand what is written on the page.
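As a rough illustration of step 4, the sketch below orders already-labeled segments for reading. It is only a minimal example under stated assumptions, not our actual system: the `Segment` fields, the `reading_order` function, and the `row_tolerance` threshold are hypothetical names chosen for this example, and the row-grouping heuristic is a simplification that would not handle complex layouts.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One region of a page image (all field names are hypothetical)."""
    x: int       # left edge, in pixels
    y: int       # top edge, in pixels
    width: int
    height: int
    label: str   # logical role, e.g. "title", "paragraph", "page_number"
    text: str    # recognized text; empty for non-text regions


def reading_order(segments: List[Segment], row_tolerance: int = 20) -> List[Segment]:
    """Sort segments top to bottom, and right to left within a row.

    A segment joins the current row when its top edge is within
    row_tolerance pixels of the first segment in that row. Arabic is
    written right to left, so the rightmost segment of a row comes first.
    """
    rows: List[List[Segment]] = []
    for seg in sorted(segments, key=lambda s: s.y):
        if rows and abs(seg.y - rows[-1][0].y) <= row_tolerance:
            rows[-1].append(seg)
        else:
            rows.append([seg])

    ordered: List[Segment] = []
    for row in rows:
        # Compare right edges so the rightmost segment is read first.
        ordered.extend(sorted(row, key=lambda s: s.x + s.width, reverse=True))
    return ordered


# Toy page: a title, two columns, and a page number, given in arbitrary order.
page = [
    Segment(40, 900, 60, 30, "page_number", "12"),
    Segment(50, 120, 300, 400, "paragraph", "left column"),
    Segment(400, 110, 300, 400, "paragraph", "right column"),
    Segment(100, 20, 500, 60, "title", "headline"),
]
ordered_texts = [seg.text for seg in reading_order(page)]
print(ordered_texts)
```

In this toy page the right column is read before the left one, which is the behavior a right-to-left script needs. A real system would also have to use the logical labels from step 2, for example to announce a title differently from body text.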

We are currently working on a better segmentation system that splits the document image into segments correctly. This problem has been investigated for more than 20 years, but traditional algorithms are severely limited when they are required to be a general solution for documents in all languages and layouts. In recent years, methods based on deep learning have shown much greater potential for the task, because they allow document segmentation to be learned from large amounts of data instead of heuristic rules. This is demonstrated by the strong performance of deep-learning-based models in segmenting documents and even real-life scene photos. The main difficulty for our research, compared to work that has already been accomplished, is the lack of available training data. The differences between Latin-script languages, East Asian languages, and Arabic limit the direct reuse of existing models and datasets.

So far, we have built a system based on a deep object detection model for segmenting Arabic documents. A visualized example of our system's segmentation is shown in the pictures above. In the coming months, we will build a larger Arabic document training set and establish a better segmentation system based on that dataset.

Ear Study of Zambian Infant Cohort

I am also collaborating with a public health team at Boston University and the University of Zambia. The team is working on building a better health information system in Zambia, especially for local infants. One of the tasks in building such a system is distinguishing a child from other children by his or her ear. This is an interesting and meaningful research project for two reasons. First, the loss of personal health information is a big problem in the Zambian health system. Second, ear images can be collected easily with a mobile phone camera while keeping the infant's identity private, since people usually cannot recognize others from their ears alone.

There are many challenges in such a task. First, the number of images available for testing is limited, and collecting them takes a very long time. Second, children's ears grow over time, so an ear may look different the next time it is photographed. Third, Internet connectivity in the rural areas of Zambia is unstable and computing resources are limited (mostly an Android phone), so a cheap and efficient system needs to be built instead of a sophisticated model that runs on a large computer.

At the moment, we have adopted the Scale-Invariant Feature Transform (SIFT) as the core algorithm of the ear recognition system. It is deployed in an Android application so that the local team in Zambia can use and test the system. We have also introduced multiple traditional computer vision techniques to improve the accuracy and stability of the recognition system, so that it works reasonably well in different environments, regardless of whether the ear photo is taken in the Museum of Science in Boston or in a clinic in a rural area near Lusaka. As our next goal, we are analyzing how children's ear growth affects the recognition algorithm and trying to find a solution to it.
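To give a feel for how SIFT descriptors can be compared, here is a small sketch of one common matching strategy, Lowe's ratio test, followed by a simple vote over enrolled children. This is an assumption made for illustration, not a description of our deployed pipeline: the function names (`ratio_test_matches`, `identify`), the 0.75 threshold, and the idea of scoring by match count are all hypothetical, and the toy descriptors are 2-dimensional, whereas real SIFT descriptors have 128 dimensions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Descriptor = Sequence[float]


def euclidean(a: Descriptor, b: Descriptor) -> float:
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ratio_test_matches(
    query: List[Descriptor], gallery: List[Descriptor], ratio: float = 0.75
) -> List[Tuple[int, int]]:
    """Keep a match only if the nearest gallery descriptor is clearly
    closer than the second nearest (Lowe's ratio test)."""
    matches: List[Tuple[int, int]] = []
    if len(gallery) < 2:
        return matches  # the test needs at least two candidates
    for qi, q in enumerate(query):
        dists = sorted((euclidean(q, g), gi) for gi, g in enumerate(gallery))
        (best, best_idx), (second, _) = dists[0], dists[1]
        if best < ratio * second:
            matches.append((qi, best_idx))
    return matches


def identify(
    query: List[Descriptor], enrolled: Dict[str, List[Descriptor]], ratio: float = 0.75
) -> str:
    """Return the enrolled ID whose descriptors yield the most matches."""
    scores = {
        child_id: len(ratio_test_matches(query, descriptors, ratio))
        for child_id, descriptors in enrolled.items()
    }
    return max(scores, key=scores.get)


# Toy data: 2-D stand-ins for descriptors of a query ear and two enrolled ears.
query = [[0.0, 0.0], [10.0, 10.0]]
enrolled = {
    "child_a": [[0.1, 0.0], [10.0, 10.1], [5.0, 5.0]],
    "child_b": [[3.0, 3.0], [3.5, 3.5], [6.0, 7.0]],
}
best_id = identify(query, enrolled)
print(best_id)
```

The brute-force nearest-neighbor search keeps the example dependency-free. On a phone, a library implementation of descriptor extraction and matching would be the practical choice.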

Other Collaborating Research

I am also working with my colleague, Yi Zheng, on text recognition in natural scenes. We are working on improving text recognition systems by introducing additional information besides image features.

In the coming semester, I will also be a member of a research project with Professor Christopher Gill at BU that uses ultrasound images for pneumonia diagnosis.


  • Lauren Etter, Alinani Simukanga, Wenda Qin, Rachel Pieciak, Chris Gill, Lawrence Mwananyanda, Caroline Carbo, Margrit Betke, Jackson Phiri, 2020: “Project SEARCH (Scanning EARs for Child Health): Validating an Ear Biometric Tool for Patient Identification in Zambia” submitted to Proceedings of the National Academy of Sciences of the United States of America (PNAS), September, 2020, 18 pages.
  • Randa Elanwar, Wenda Qin, and Margrit Betke, 2020: “Extracting text from scanned Arabic books: A large-scale benchmark dataset and a fine-tuned Faster-R-CNN model”, submitted to International Journal on Document Analysis and Recognition (IJDAR), August, 2020, 10 pages.
  • Yi Zheng, Wenda Qin, Derry Wijaya and Margrit Betke, 2020: “LAL: Linguistically Aware Learning for Scene Text Recognition.” accepted by ACM Multimedia Conference (ACM 2020). October, 2020, 9 pages.
  • Randa Elanwar, Wenda Qin, and Margrit Betke, 2017: “Making Scanned Arabic Documents Machine-accessible Using an Ensemble of SVM Classifiers”, International Journal on Document Analysis and Recognition (IJDAR), September, 2017, 25 pages.
  • Wenda Qin, Randa Elanwar, and Margrit Betke, 2017: “LABA: Logical Layout Analysis of Book Page Images in Arabic Using Multiple Support Vector Machines”, 2nd IEEE Int. Workshop on Arabic and derived Script Analysis and Recognition (ASAR 2018). December, 2017, 5 pages.
  • Liannan Lin, Chuan Long and Wenda Qin, 2016: “Analysis of Stanford University’s Human-computer Interaction (HCI) Course System”, 11th International Conference on Computer Science & Education 2016 (ICCSE 2016). August, 2016, 6 pages.