I am a first-year PhD student in the Department of Computer Science at Boston University. I received my Bachelor of Engineering degree from South China University of
Technology in 2016 and my master's degree from Boston University in 2018. My main research interest is
computer vision. I am also interested in machine learning, since it has a strong influence on computer vision today and is a powerful tool for solving
computer vision problems. My undergraduate major was software engineering, and I have internship experience at a software company, so I am also familiar
with IT project management and practical software development.
In addition to computer science, I am a big fan of history and culture, both in Asia and in the Western world. I completed a history class on Medieval England at Boston University and wrote a research report about the diet of people in Medieval England. (Take a look!)
I work with Professor Margrit Betke at Boston University and Dr. Randa Elanwar at the Electronics Research Institute, Egypt. In 2018 we held a layout analysis competition on the BCE-Arabic benchmarking dataset at the 2nd IEEE International Workshop on Arabic and derived Script Analysis and Recognition (ASAR 2018). Here's the webpage if you are interested.
Previously, we did research on the layout analysis of Arabic document images. Arabic documents have received less attention than documents in more widely studied languages such as Chinese and English, and Arabic script differs from other scripts in several ways. For example, characters are joined to each other within a word, so you will see many lines and dots in Arabic text. Several peoples, such as Persians and Arabs, use the Arabic script, though in different ways. Also, it is written from right to left, and the page decoration is often very ornate, which sometimes makes it hard to tell apart from the text. To overcome these difficulties, basic vision techniques can be used to pre-process the image and remove noise and irrelevant objects. Learning techniques such as SVMs and neural networks can be applied to capture the characteristics of Arabic script and thus recognize the text and image areas in the image. The logical function of each text area can then be classified based on its position, size and intensity pattern. Automating the analysis of Arabic documents has great value in bringing convenience to people who use Arabic, especially people with disabilities. In addition, it makes it easier for people all over the world to understand the beautiful Arabic culture through its modern and ancient documents.
(The left image is a raw Arabic document image. The right image is the logical layout classification of the left image produced by our system, with regions labeled as, for example, title, caption, picture, paragraph and page number.)
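As a rough illustration of classifying a region by its position, size and intensity pattern, here is a minimal Python sketch. It is not our actual system: the function names, the feature normalization and the centroid values are all hypothetical, and a simple nearest-centroid rule stands in for a trained SVM or neural network.

```python
from statistics import mean


def region_features(image, box):
    """Return (center_x, center_y, width, height, mean_intensity) for a region.

    image: 2D list of grayscale values in [0, 255], indexed image[row][col].
    box: (left, top, right, bottom) in pixels, with right/bottom exclusive.
    Position and size are normalized by the page size, intensity by 255.
    """
    page_h, page_w = len(image), len(image[0])
    left, top, right, bottom = box
    pixels = [image[r][c] for r in range(top, bottom) for c in range(left, right)]
    return (
        (left + right) / 2 / page_w,
        (top + bottom) / 2 / page_h,
        (right - left) / page_w,
        (bottom - top) / page_h,
        mean(pixels) / 255,
    )


def nearest_centroid(features, centroids):
    """Pick the label whose centroid is closest in squared Euclidean distance."""
    def dist(label):
        return sum((f - c) ** 2 for f, c in zip(features, centroids[label]))
    return min(centroids, key=dist)


# Hypothetical class centroids in the same feature space (made up for this demo,
# not learned from any dataset).
CENTROIDS = {
    "title": (0.5, 0.1, 0.6, 0.1, 0.5),
    "paragraph": (0.5, 0.5, 0.8, 0.4, 0.7),
    "page number": (0.5, 0.95, 0.1, 0.05, 0.8),
}

# Toy 10x10 white page with a dark band near the top.
page = [[255] * 10 for _ in range(10)]
for r in range(0, 2):
    for c in range(2, 8):
        page[r][c] = 0

label = nearest_centroid(region_features(page, (2, 0, 8, 2)), CENTROIDS)
```

On this toy page the wide, dark band at the top comes out closest to the "title" centroid. In a real pipeline the centroids would be replaced by a classifier trained on labeled regions.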
I am currently working on creating a new dataset of Arabic document images for physical and logical layout analysis, using the crowdsourcing platform Figure-Eight. We hope to create a more challenging dataset of document images written in Arabic, with more varied layouts and different scan qualities.
(A self-developed tool for combining and comparing different workers' labeling results on the same image and outputting them in XML format.)
I am also working with an interdisciplinary team at Boston University and the University of Zambia that focuses on creating child health information based on ear recognition, using ear images taken in a laboratory setting.