I am an M.S. student in the Department of Computer Science at Boston University. I received my Bachelor of Engineering degree from South China University of Technology in 2016. My master's program in computer science started in September 2016, and I plan to finish it by June 2018. My main research interest is computer vision. Machine learning is another of my interests, since it has a great influence on computer vision today and is a powerful tool for solving computer vision problems. I majored in software engineering as an undergraduate and have internship experience at a software company, so I am also familiar with IT project management and practical software development.
In addition to computer science, I am a big fan of history and culture, both in Asia and in the Western world. I completed a class on the history of Medieval England at Boston University and wrote a research report on the diet of people in Medieval England. (Take a look!) I also read about ancient Chinese philosophy, and I am always happy to share my understanding of the brilliant wisdom of ancient China with anyone interested.
I am working with Professor Margrit Betke at Boston University and Dr. Randa Elanwar at the Electronics Research Institute, Egypt, and we are holding a layout analysis competition on the BCE-Arabic benchmarking dataset at the 2nd IEEE International Workshop on Arabic and derived Script Analysis and Recognition (ASAR 2018). Here's the webpage if you are interested.
Previously, we were doing research on the layout analysis of Arabic document images. Arabic documents receive less attention than documents in more widely studied languages such as Chinese and English, and they differ from them in several ways. For example, characters are joined to one another within a word, so you will see many connecting strokes and dots in Arabic script. Many peoples, such as Persians and Arabs, use the Arabic script, but in different ways. Arabic is also written from right to left, and page decorations are often very elaborate, which sometimes makes them hard to distinguish from text. To overcome these difficulties, basic vision techniques can be used to pre-process the image and remove noise and irrelevant objects. Learning techniques such as SVMs and neural networks can then capture the characteristics of Arabic script and recognize the text and image areas of the page. The logical function of each text area can be classified based on its position, size and intensity pattern. Automating the analysis of Arabic documents has great value in bringing convenience to people who use Arabic, especially people with disabilities. In addition, it can make it easier for people all over the world to understand the beautiful Arabic culture through its modern and ancient documents.
(The left image is a raw Arabic document image; the right image is the logical layout classification of the left image produced by our system, with classes such as title, caption, picture, paragraph and page number.)
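The pre-processing step mentioned above is not spelled out here. As one illustration of a basic technique often used at that stage, below is a minimal sketch of Otsu's global thresholding, which separates dark ink from a lighter page background by choosing the gray level that maximizes the between-class variance. This is an assumption for illustration, not necessarily what our system does; it works on a flat list of 8-bit grayscale values to stay dependency-free, and the function names are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0-255) that maximizes between-class variance.

    Pixels with value <= t form the dark class (ink), the rest the light class.
    `pixels` is a flat list of 8-bit grayscale values (hypothetical input format).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the dark class so far
    sum0 = 0    # sum of intensities in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the split is meaningless
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:  # strict '>' keeps the lowest best threshold
            best_var = var_between
            best_t = t
    return best_t


def binarize(pixels, t):
    """Map dark pixels (<= t) to 1 (ink) and the rest to 0 (background)."""
    return [1 if p <= t else 0 for p in pixels]
```

A real page would be read with an image library first; in OpenCV, `cv::threshold` with the `THRESH_OTSU` flag performs the same global thresholding.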
For the last scheduled semester of my master's program, I plan to continue my unfinished work on a crowdsourcing project that collects labeled data from workers on the Amazon Mechanical Turk platform. Although crowdsourcing is not directly related to computer vision, it provides the foundation for benchmarking and, of course, training data for machine learning in computer vision. The work is harder than one might expect, because crowdsourcing requires effort from several angles: standards for evaluating and improving the crowdsourcing task and its interfaces, mechanisms to keep out spammers, and so on. Until labeling can be handed over to intelligent machines, crowdsourcing remains unavoidable work for researchers who care about their data sources and data quality.
(A self-developed tool for combining and comparing different workers' labeling results on the same image and outputting them in XML format.)
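The tool above combines several workers' labels for the same image and writes the result as XML. How it actually merges labels is not described here, so the following is only a sketch under assumptions: each worker assigns one label per region ID, labels are merged by simple majority vote (ties go to the label seen first), and the XML layout (`image`, `region`, `agreement`) is hypothetical. The agreement ratio is one simple signal for spotting unreliable regions or workers.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def merge_labels(worker_labels):
    """Majority-vote merge of crowd labels.

    worker_labels: {worker_id: {region_id: label}} (hypothetical format).
    Returns {region_id: (winning_label, agreement_ratio)}.
    """
    votes = {}
    for labels in worker_labels.values():
        for region, label in labels.items():
            votes.setdefault(region, Counter())[label] += 1
    merged = {}
    for region, counter in votes.items():
        # most_common orders equal counts by first occurrence, so ties go to the earlier label
        label, count = counter.most_common(1)[0]
        merged[region] = (label, count / sum(counter.values()))
    return merged


def to_xml(image_name, merged):
    """Serialize merged labels as XML (hypothetical schema)."""
    root = ET.Element("image", name=image_name)
    for region in sorted(merged):
        label, agreement = merged[region]
        ET.SubElement(root, "region", id=str(region), label=label,
                      agreement=f"{agreement:.2f}")
    return ET.tostring(root, encoding="unicode")
```

For example, if three workers label region `r1` as title, title and caption, the merged label is title with an agreement of 0.67.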
Here are the coursework and project reports from one of my favourite classes, "Image and Video Computing". The code is written in C++ with OpenCV 3.1. Most of the projects combine several basic, "pure" computer vision techniques.
Image Processing Using OpenCV
Object Recognition by Template Matching Part I: Bottle Cup Recognition
Object Recognition by Template Matching Part II: Gesture Recognition
Object Detection by Active Contour Model: Fish Objects within Aquarium Video
Object Detection by Active Contour Model: Eels Detection in Water Tank Video
Final Project Report: Traffic Monitoring by Video, Vehicle Tracking and Vehicle Data Analysis
Here are the coursework and project reports from the class "Artificial Intelligence".