Problem Definition
In this project, I aim to detect whether an image contains food, with a particular focus on Kenyan food. Since training a binary classifier is computationally cheaper and simpler than training an object detector, I chose to train a food/non-food classifier instead.
Background Survey
Food recognition is one of the most popular topics and applications in deep learning, and some state-of-the-art models can now outperform humans. However, in real-world applications we should not assume that every image contains food; we only want to recognize food in images that actually contain it. Food/non-food classification is therefore one of the most important prerequisites for successfully recognizing food.
There are several popular food/non-food datasets used by most food/non-food classifiers, such as the IFD and FCD datasets. IFD (Instagram Food/Non-Food Dataset) was collected from Instagram and contains 5,000 food and 5,000 non-food images, while FCD (Food-101 and Caltech-256 Dataset) contains 25,000 food images and 25,000 non-food images. Both datasets share a common advantage: they contain a large number of food images and diverse non-food images, so a trained model is less likely to be confused by rare non-food images.
However, we should also note that the food itself must be diverse enough to train an accurate food detector, since people around the globe have different dietary habits and even different definitions of food; unfortunately, most popular food datasets contain only European and East Asian food. Noting that Indian Ocean Rim countries such as India, Kenya, and some Middle Eastern countries have quite different cuisines, I plan to build a Kenyan Food Dataset (KFD) and try to improve the performance of food/non-food classification on exotic food images.
Baseline
Kagaya et al. proposed a CNN-NIN model trained on the IFD and FCD datasets in "Highly Accurate Food/Non-Food Image Classification based on a Deep Convolutional Neural Network", achieving 95.1% accuracy on IFD and 96.1% on FCD. However, that paper was published in 2015, and in 2019 we can train on better models. By merging the FCD and IFD datasets I set up a baseline dataset called FNF, then trained a ResNet-101 on it; the accuracy reached 98.3%.
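As a rough illustration, the merge can be done by pooling the two datasets class by class and re-splitting. The sketch below assumes FCD and IFD are stored as fcd/{food,non_food} and ifd/{food,non_food} image folders and uses a 90/10 train/test split; both the layout and the split ratio are assumptions for illustration, not the project's exact settings.

```python
import random
import shutil
from pathlib import Path

random.seed(0)  # reproducible split
for cls in ("food", "non_food"):
    # Pool images of this class from both source datasets.
    images = [p for src in ("fcd", "ifd") for p in Path(src, cls).glob("*.jpg")]
    random.shuffle(images)
    cut = int(0.9 * len(images))  # assumed 90/10 train/test split
    for split, subset in (("train", images[:cut]), ("test", images[cut:])):
        dest = Path("fnf", split, cls)
        dest.mkdir(parents=True, exist_ok=True)
        for p in subset:
            # Prefix with the source dataset name to avoid filename collisions.
            shutil.copy(p, dest / f"{p.parent.parent.name}_{p.name}")
```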
Method
I collected Kenyan food images through the Instagram API, scraping about 20 kinds of popular Kenyan food via hashtag search, and randomly downloaded 60,000 images posted in Kenya as non-food candidates. After manual inspection, I set up a Kenyan Food Dataset (KFD) containing 30,000 food images and 30,000 non-food images. I then trained a ResNet-101 on KFD using the TensorFlow Slim module. The network was pretrained on ImageNet and optimized with Adam.
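For reference, the sketch below shows an equivalent fine-tuning setup written in tf.keras (TF 2.x) rather than the TF-Slim pipeline actually used in the project; the directory layout (kfd/train/{food,non_food}), input size, and batch size are illustrative assumptions.

```python
import tensorflow as tf

IMG_SIZE = (224, 224)

def load_split(path):
    # Expects path/{food,non_food}/*.jpg; labels are binary (0/1).
    return tf.keras.preprocessing.image_dataset_from_directory(
        path, image_size=IMG_SIZE, batch_size=32, label_mode="binary")

def build_model():
    # ResNet-101 backbone pretrained on ImageNet, as in the project.
    base = tf.keras.applications.ResNet101(
        include_top=False, weights="imagenet", pooling="avg",
        input_shape=IMG_SIZE + (3,))
    inputs = tf.keras.Input(shape=IMG_SIZE + (3,))
    x = tf.keras.applications.resnet.preprocess_input(inputs)
    x = base(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam",  # Adam, as in the project
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

model = build_model()
model.fit(load_split("kfd/train"), epochs=30)
```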
Experiments
I conducted both single-dataset and cross-dataset evaluations to assess the model's performance (a sketch of the full evaluation grid follows this list):
- In the first part, I run the model on the FNF dataset to observe performance on the baseline dataset.
- In the second part, I test the performance of the model on the KFD dataset.
- In the third part, I evaluate the performance of the model across the FNF and KFD datasets.
- To simulate the real-world situation of facing diverse food, in the fourth part I set up a new dataset called BUFD, the combination of KFD and FNF, and evaluate the model on BUFD to observe how performance varies when training on different datasets.
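Concretely, the whole grid can be expressed as a loop over (training set, evaluation set) pairs. The sketch below reuses the load_split and build_model helpers from the Method sketch above; the dataset directory names and the fixed 30 epochs are assumptions for illustration.

```python
# Train on each dataset in turn, then measure the error rate on the
# training and testing splits of every dataset.
names = ("FNF", "KFD", "BUFD")

for train_name in names:
    model = build_model()
    model.fit(load_split(f"{train_name.lower()}/train"), epochs=30, verbose=0)
    for test_name in names:
        for split in ("train", "test"):
            _, acc = model.evaluate(
                load_split(f"{test_name.lower()}/{split}"), verbose=0)
            print(f"trained on {train_name}, evaluated on {test_name} "
                  f"{split} set: error rate {100 * (1 - acc):.2f}%")
```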
Results
1) Error rate of the model trained on the FNF dataset.
ResNet-101 was trained on the FNF dataset for about 30 epochs and tested on the FNF dataset.
Error rate on FNF dataset
|  | Training Set | Testing Set |
| --- | --- | --- |
| Error rate | 0.11% | 1.65% |
2) Error rate of the model trained on the KFD dataset.
ResNet-101 was trained on the KFD dataset for about 30 epochs and tested on the KFD dataset.
Error rate on KFD dataset
|  | Training Set | Testing Set |
| --- | --- | --- |
| Error rate | 0.44% | 0.62% |
3) Cross-dataset evaluation
ResNet-101 was trained on the FNF dataset for about 30 epochs and tested on the KFD dataset.
Error rate on KFD dataset
|  | Training Set | Testing Set |
| --- | --- | --- |
| Error rate | 2.32% | 2.18% |
ResNet-101 was trained on the KFD dataset for about 30 epochs and tested on the FNF dataset.
Error rate on FNF dataset
|  | Training Set | Testing Set |
| --- | --- | --- |
| Error rate | 3.36% | 3.33% |
4) Performance of the model on the combination of FNF and KFD.
ResNet-101 was trained on the BUFD dataset for about 38 epochs and tested on the BUFD dataset.
Error rate on BUFD dataset
|  | Training Set | Testing Set |
| --- | --- | --- |
| Error rate | 0.50% | 1.11% |
ResNet-101 was trained on the FNF dataset for about 38 epochs and tested on the BUFD dataset.
Error rate on BUFD dataset
|  | Training Set | Testing Set |
| --- | --- | --- |
| Error rate | 1.33% | 1.34% |
ResNet-101 was trained on the KFD dataset for about 38 epochs and tested on the BUFD dataset.
Error rate on BUFD dataset
|  | Training Set | Testing Set |
| --- | --- | --- |
| Error rate | 1.92% | 2.19% |
ResNet-101 was trained on the BUFD dataset for about 38 epochs and tested on the FNF dataset.
Error rate on FNF dataset
|  | Training Set | Testing Set |
| --- | --- | --- |
| Error rate | 0.61% | 0.66% |
ResNet-101 was trained on the BUFD dataset for about 38 epochs and tested on the KFD dataset.
Error rate on KFD dataset
|  | Training Set | Testing Set |
| --- | --- | --- |
| Error rate | 0.52% | 0.37% |
Discussion
- From the first and second experiments, we can see that training on the KFD and FNF datasets is very efficient and the models easily converge to a satisfying point.
- In the third experiment, when we evaluate the model across datasets, the error rate increases significantly. When testing on KFD, the error rate of the model trained on FNF increases from 0.11% to 2.32% on the training set and from 1.65% to 2.18% on the test set. Conversely, the error rate of the model trained on KFD increases from 0.44% to 3.36% on the training set and from 0.62% to 3.33% on the test set when testing on FNF. This shows that, because of the difference in food between the two datasets, the performance of a model trained on only one of them is clearly undermined, which means each dataset contains features the other does not cover.
- In the fourth experiment, I combined the two datasets to create a more diverse food dataset, namely BUFD. Comparing the three experiments tested on BUFD, we find that the error rate decreases as the training set becomes more diverse: from 2.19% (trained on KFD) and 1.34% (trained on FNF) to 1.11% (trained on BUFD).
- Additionally, the last two tables show that the error rate of the model trained on BUFD, when tested on FNF and KFD, is significantly lower than that of the models trained on FNF and KFD respectively: from 1.65% to 0.66% for FNF and from 0.62% to 0.37% for KFD. This may indicate that a more diverse dataset is conducive to training a more accurate classifier.
Conclusions
In this project, I proposed a new dataset, collected by myself, dedicated to detecting food from Kenya and other Indian Ocean Rim countries, and the experiments above demonstrate its effectiveness. I also found that a more diverse food image dataset helps improve recognition performance not only on the added portion but also on the original portion of the dataset.
Credits and Bibliography
Kagaya, H., Aizawa, K. Highly accurate food/non-food image classification based on a deep convolutional neural network. International Conference on Image Analysis and Processing. Springer, Cham, 2015: 350-357.
Merler, M., Wu, H., Uceda-Sosa, R., et al. Snap, Eat, RepEat: a food recognition engine for dietary logging. Proceedings of the 2nd International Workshop on Multimedia Assisted Dietary Management. ACM, 2016: 31-40.