Temporally Distributed Networks for Fast Video Semantic Segmentation

Ping Hu1 Fabian Caba Heilbron2 Oliver Wang2 Zhe Lin2 Stan Sclaroff1 Federico Perazzi2
1. Boston University        2. Adobe Research

Abstract


    [Code]  [arXiv]  [Poster]

We present TDNet, a temporally distributed network designed for fast and accurate video semantic segmentation. We observe that features extracted from a certain high-level layer of a deep CNN can be approximated by composing features extracted from several shallower sub-networks. Leveraging the inherent temporal continuity of videos, we distribute these sub-networks over sequential frames. Therefore, at each time step, we only need to perform a lightweight computation to extract one group of sub-features from a single sub-network. The full features used for segmentation are then recomposed by applying a novel attention propagation module that compensates for geometry deformation between frames. A grouped knowledge distillation loss is also introduced to further improve the representation power at both the full-feature and sub-feature levels. Experiments on Cityscapes, CamVid, and NYUD-v2 demonstrate that our method achieves state-of-the-art accuracy with significantly faster speed and lower latency.
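To make the idea concrete, below is a minimal PyTorch sketch of the temporally distributed pipeline. The module and parameter names (`TDNetSketch`, `to_q`, `to_k`, `num_subnets`, etc.) and the simplified single-head spatial attention are illustrative assumptions, not the released implementation; in the actual system only one sub-network runs per time step and earlier sub-features are cached, whereas the sketch processes a whole window in one call for clarity.

```python
# A minimal sketch of temporally distributed feature extraction with
# attention-based propagation (assumed simplification, not the authors' code).
import torch
import torch.nn as nn

class TDNetSketch(nn.Module):
    def __init__(self, num_subnets=4, channels=64, num_classes=19):
        super().__init__()
        # One shallow sub-network per frame in the cycle; each produces
        # one group of the full feature map.
        self.subnets = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(3, channels, 3, stride=4, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.ReLU(inplace=True),
            )
            for _ in range(num_subnets)
        )
        # Projections for attention propagation: query from the current
        # frame, keys from the stored sub-features of earlier frames.
        self.to_q = nn.Conv2d(channels, channels, 1)
        self.to_k = nn.Conv2d(channels, channels, 1)
        self.classifier = nn.Conv2d(channels * num_subnets, num_classes, 1)

    def forward(self, frames):
        # frames: list of T consecutive frames, each (B, 3, H, W).
        # Sub-network i processes frame i, so per-frame cost is that of
        # a single shallow network.
        feats = [net(f) for net, f in zip(self.subnets, frames)]
        b, c, h, w = feats[-1].shape
        q = self.to_q(feats[-1]).flatten(2)          # (B, C, HW)
        aligned = [feats[-1]]
        for f in feats[:-1]:
            k = self.to_k(f).flatten(2)              # (B, C, HW)
            v = f.flatten(2)                         # (B, C, HW)
            # Attention over spatial positions compensates for motion
            # between the stored frame and the current one.
            attn = torch.softmax(q.transpose(1, 2) @ k / c ** 0.5, dim=-1)
            aligned.append((v @ attn.transpose(1, 2)).view(b, c, h, w))
        # Recompose the full feature map from the aligned sub-features.
        full = torch.cat(aligned, dim=1)
        return self.classifier(full)

# Usage: four frames of a 256x512 clip -> per-pixel class logits.
frames = [torch.randn(1, 3, 256, 512) for _ in range(4)]
logits = TDNetSketch()(frames)   # (1, 19, 64, 128)
```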


Overview

Citation

@inproceedings{hu2020tdnet,
  title={Temporally Distributed Networks for Fast Video Semantic Segmentation},
  author={Hu, Ping and Caba Heilbron, Fabian and Wang, Oliver and Lin, Zhe and Sclaroff, Stan and Perazzi, Federico},
  booktitle={CVPR},
  year={2020}
}