BU-action Datasets

Datasets
This web page is home to three action image datasets: BU101, BU101-unfiltered, and BU203-unfiltered.

Citation
If you use our datasets, please cite this work:

S. Ma, S. A. Bargal, J. Zhang, L. Sigal, S. Sclaroff.
"Do Less and Achieve More: Training CNNs for Action Recognition Utilizing Action Images from the Web."
arXiv, 2015. pdf

Contact
Please contact Shugao Ma if you have any questions.

BU101
1:1 class correspondence with UCF101

Download Images

BU101-unfiltered
1:1 class correspondence with UCF101

Download URLs

BU203-unfiltered
1:1 class correspondence with ActivityNet

Download URLs

BU101

#Classes: 101
#Images: ~23.8K
Average #Images Per Class: 235
Corresponding video dataset: UCF101

This is the largest web action image dataset to date. It is more than double the size of the largest previous action image dataset, Stanford40, in both the number of images and the number of actions. It consists of ~23.8K action images that correspond to the 101 action classes in the UCF101 video dataset. The action categories are divided into five types: Human-Object Interaction, Body-Motion Only, Human-Human Interaction, Playing Musical Instruments, and Sports. For each action class, we automatically download images from the Web (Google, Flickr, etc.) using corresponding key phrases, e.g. "pushup training" for the class pushup, and then manually remove irrelevant images, drawings, and cartoons. We also include 2769 images of relevant actions from the Stanford40 dataset. Each class has at least 100 images, and most classes have 150-300 images. Class statistics can be found here.
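As a rough way to check the per-class figures quoted for BU101 against a local copy, the sketch below counts image files per class. It assumes the downloaded images are unpacked into one subdirectory per class; that layout, the file extensions, and the function names are assumptions made for illustration and are not specified on this page.

```python
from collections import Counter
from pathlib import Path

# Assumed set of image extensions; adjust to match the actual download.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}


def images_per_class(root):
    """Count image files in each immediate subdirectory of `root`.

    Assumes one subdirectory per action class (hypothetical layout).
    """
    counts = Counter()
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for p in class_dir.iterdir()
                if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


def summarize(counts, min_expected=100):
    """Summarize per-class counts and list classes below `min_expected`."""
    total = sum(counts.values())
    return {
        "classes": len(counts),
        "images": total,
        "mean_per_class": total / len(counts) if counts else 0.0,
        "min_per_class": min(counts.values(), default=0),
        "classes_below_min": sorted(
            name for name, n in counts.items() if n < min_expected
        ),
    }
```

Calling `summarize(images_per_class("BU101"))`, where `"BU101"` is a hypothetical local path, should report 101 classes and an empty `classes_below_min` list if the copy matches the description above.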

BU101-unfiltered

#Classes: 101
#Images: ~204K
Average #Images Per Class: 2017
Corresponding video dataset: UCF101

This is a crawled dataset of web action images. It consists of ~204K images, an average of 2017 per class. Its classes are in one-to-one correspondence with those of the UCF101 video action dataset. The action categories are divided into five types: Human-Object Interaction, Body-Motion Only, Human-Human Interaction, Playing Musical Instruments, and Sports. These crawled images are not manually labeled; we refer to them as unfiltered images. For each action class, we automatically download images from the Web (Google, Flickr, etc.) using corresponding key phrases, e.g. "pushup training" for the class pushup. Class statistics can be found here.
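Because the unfiltered datasets are distributed as URLs, a local copy has to be fetched image by image, and some crawled links will likely have gone stale. The sketch below shows one way to do this. It assumes a URL list with one `class<TAB>url` pair per line; that format, the output file naming, and the function names are hypothetical, so adapt the parser to the actual file.

```python
import http.client
import urllib.request
from collections import defaultdict
from pathlib import Path


def group_urls_by_class(lines):
    """Parse 'class<TAB>url' lines (an assumed format) into {class: [urls]}."""
    by_class = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        class_name, sep, url = line.partition("\t")
        if sep and url.strip():
            by_class[class_name].append(url.strip())
    return dict(by_class)


def download_class(class_name, urls, out_root, timeout=10):
    """Try each URL once and return the number of images saved.

    Dead links, timeouts, and malformed URLs are skipped, not retried.
    """
    out_dir = Path(out_root) / class_name
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = 0
    for i, url in enumerate(urls):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                data = resp.read()
        except (OSError, ValueError, http.client.HTTPException):
            continue
        # Hypothetical naming scheme: zero-padded index within the class.
        (out_dir / f"{i:06d}.img").write_bytes(data)
        saved += 1
    return saved
```

Since these images are not manually labeled, anything fetched this way should be treated as noisy training data for its class.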

BU203-unfiltered

#Classes: 203
#Images: ~387K
Average #Images Per Class: 1909
Corresponding video dataset: ActivityNet

This is a crawled dataset of web action images. It consists of ~387K images, an average of 1909 per class. Its classes are in one-to-one correspondence with those of the ActivityNet video action dataset. The action categories are divided into these main types: Personal Care, Working, Eating and Drinking, Socializing and Leisure, Household, Sports and Exercises, and Caring and Helping. These crawled images are not manually labeled; we refer to them as unfiltered images. For each action class, we automatically download images from the Web (Google, Flickr, etc.) using corresponding key phrases, e.g. "pushup training" for the class pushup. Class statistics can be found here.

Copyright Notice: These materials are presented to ensure timely dissemination of scholarly and technical work, and are for academic research purposes only. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.