BIST: Bayesian-Inspired Space-Time Superpixels

Purdue University

TL;DR: BIST can estimate space-time superpixels at over 60 frames per second on 240x320 images, while the next fastest method (TSP) runs at about 2 frames per second. In addition, the shape and number of its superpixels are qualitatively similar to those of the recent BASS method.

Abstract

This paper presents Bayesian-inspired Space-Time Superpixels (BIST): a fast, state-of-the-art method to compute space-time superpixels. BIST is a novel extension of a single-image Bayesian method named BASS, and it is inspired by hill-climbing to a local mode of a Dirichlet-Process Gaussian Mixture Model (DP-GMM). The method is only Bayesian-inspired, rather than actually Bayesian, because it includes heuristic modifications to the theoretically correct sampler. Similar to existing methods, BIST can adapt the number of superpixels to an individual frame using split-merge steps. A key novelty is a new temporal coherence term in the split step, which reduces the chance of splitting propagated superpixels. This term enforces temporal coherence in propagated regions while allowing unconstrained adaptation in disoccluded regions. A hyperparameter sets the strength of this term, and it returns consistent results across multiple videos without special tuning. In wall-clock time, BIST runs over twice as fast as BASS and over 30 times faster than the next fastest space-time superpixel method with open-source code.
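To make "hill-climbing to a local mode" of a Gaussian mixture concrete, the sketch below alternates between fitting one Gaussian per superpixel and reassigning every pixel to its highest-scoring component until the labels stop changing. This is a standard hard-assignment (classification EM) heuristic swapped in for illustration, not BIST's actual update: it assumes a fixed number of components, diagonal covariances, and per-pixel feature vectors (for example position and color), and it omits the Dirichlet-Process prior and the split-merge moves.

```python
import numpy as np


def gaussian_loglik(features, means, variances):
    """Per-pixel log-likelihood under each diagonal Gaussian.

    features: (N, D); means, variances: (K, D). Returns an (N, K) array.
    """
    diff = features[:, None, :] - means[None, :, :]
    terms = diff**2 / variances[None] + np.log(2 * np.pi * variances[None])
    return -0.5 * terms.sum(axis=2)


def hill_climb(features, labels, num_iters=10, min_var=1e-2):
    """Hard-assignment hill climbing on a finite GMM (no DP prior, no split/merge).

    Alternates between fitting one Gaussian per label and reassigning every
    pixel to its highest-scoring component, stopping at a fixed point.
    """
    num_comp = int(labels.max()) + 1
    dim = features.shape[1]
    for _ in range(num_iters):
        means = np.zeros((num_comp, dim))
        variances = np.ones((num_comp, dim))
        log_weights = np.full(num_comp, -np.inf)  # empty components can never win
        for k in range(num_comp):
            members = features[labels == k]
            if len(members) == 0:
                continue
            means[k] = members.mean(axis=0)
            variances[k] = np.maximum(members.var(axis=0), min_var)  # avoid zero variance
            log_weights[k] = np.log(len(members) / len(features))
        scores = gaussian_loglik(features, means, variances) + log_weights[None, :]
        new_labels = scores.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break  # fixed point: a local mode of the hard-assignment objective
        labels = new_labels
    return labels


# Two well-separated groups of "pixels", with one pixel initially mislabeled.
feats = np.array([[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]], dtype=float)
init = np.array([0, 0, 1, 1, 1, 1])
print(hill_climb(feats, init))
```

Each pass can only keep or improve the hard-assignment likelihood, which is why the loop settles at a local mode rather than sampling from the posterior.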

How it Works

BIST superpixels are computed in three major steps: (a) the Shift and Fill step, (b) boundary updates, and (c) splits, merges, and relabeling. The following sequences illustrate BIST in action, detailing every step. For a more complete collection of results, including a visualization of BASS, please see this link on YouTube. The animations are slowed down; the method itself runs in about 10-15 ms per frame on images with resolution 240x320 and about 39 ms per frame on images with resolution 480x940.
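As a rough illustration of step (a), the sketch below forward-warps the previous frame's superpixel labels with a per-pixel motion field and then fills the uncovered (disoccluded) pixels. Several details are assumptions, not taken from BIST: the motion field is an integer optical flow given as input, colliding pixels keep one of the competing labels arbitrarily, and holes are filled by copying a labeled 4-neighbor. The function names `shift_labels` and `fill_holes` are hypothetical.

```python
import numpy as np


def shift_labels(prev_labels, flow):
    """Forward-warp superpixel labels with an integer flow field (hypothetical helper).

    prev_labels: (H, W) int array; flow: (H, W, 2) int array of (dy, dx).
    Pixels that receive no label are marked -1 (holes / disocclusions).
    """
    H, W = prev_labels.shape
    shifted = np.full((H, W), -1, dtype=prev_labels.dtype)
    ys, xs = np.mgrid[0:H, 0:W]
    ty = ys + flow[..., 0]
    tx = xs + flow[..., 1]
    valid = (ty >= 0) & (ty < H) & (tx >= 0) & (tx < W)  # drop pixels that leave the frame
    # If several source pixels land on the same target, one of them is kept (a simplification).
    shifted[ty[valid], tx[valid]] = prev_labels[valid]
    return shifted


def fill_holes(labels, max_iters=1000):
    """Fill -1 pixels by copying a labeled 4-neighbor, repeated until no holes remain."""
    labels = labels.copy()
    for _ in range(max_iters):
        holes = labels == -1
        if not holes.any():
            break
        padded = np.pad(labels, 1, constant_values=-1)
        neighbors = (padded[:-2, 1:-1], padded[2:, 1:-1],  # up, down
                     padded[1:-1, :-2], padded[1:-1, 2:])  # left, right
        for nb in neighbors:
            take = (labels == -1) & (nb != -1)  # still a hole, neighbor is labeled
            labels[take] = nb[take]
        if np.array_equal(labels == -1, holes):
            break  # no progress (e.g., the whole frame is a hole)
    return labels


prev = np.array([[0, 0, 1, 1],
                 [0, 0, 1, 1]])
flow = np.zeros((2, 4, 2), dtype=int)
flow[..., 1] = 1  # everything moves one pixel to the right
shifted = shift_labels(prev, flow)
print(shifted)              # left column is uncovered (-1)
print(fill_holes(shifted))  # uncovered pixels inherit a neighboring label
```

Copying a neighbor is only one possible fill rule; because the split step treats disoccluded regions without the temporal coherence penalty, filled pixels remain free to form new superpixels later.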

Examples

We compare BIST to a recent single-image method named BASS and a space-time superpixel method named TSP. The top-left video shows the groundtruth segmentation for the object of interest, while the other videos show the superpixels intersecting the original input. In the top-right, the space-only BASS superpixels serve as the ideal type of superpixels. In the bottom-right, the space-time TSP superpixels show that superpixels can track an object over time. In the bottom-left, the proposed BIST method produces superpixels qualitatively similar to those of BASS while tracking objects like TSP. More sequences are available here.

Benchmark Results

BIST achieves state-of-the-art results on standard superpixel benchmarks and is the fastest method with open-source code. Please see this resource for a detailed description of each benchmark.

A Temporally Coherent Split Step Controls the Number of Superpixels

Naively using the BASS split step in the space-time case produces several hundred more superpixels than in the single-image case. The proposed split step term introduces a hyperparameter to control the number of new superpixels. A larger value of this hyperparameter reduces the number of superpixels used to represent the image. The number of BIST superpixels matches the number of BASS superpixels when the hyperparameter is set to 4.0.
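The qualitative effect of this hyperparameter can be illustrated with a toy acceptance rule. In the sketch below, each split proposal carries a log acceptance ratio from a BASS-style split move (assumed to be given), and propagated superpixels pay an extra log-penalty `gamma`, while superpixels in disoccluded regions use the unmodified test. The penalty form, the deterministic `> 0` test (instead of comparing against a random draw), and the names `SplitProposal`, `accept_split`, and `count_accepted` are all hypothetical; the actual BIST term is defined in the paper.

```python
from dataclasses import dataclass


@dataclass
class SplitProposal:
    log_hastings: float  # log acceptance ratio of a BASS-style split (assumed given)
    propagated: bool     # True if the superpixel was carried over from the previous frame


def accept_split(p: SplitProposal, gamma: float) -> bool:
    """Hypothetical temporally coherent split test.

    Propagated superpixels pay an extra log-penalty `gamma`;
    disoccluded superpixels are tested without it.
    """
    penalty = gamma if p.propagated else 0.0
    return p.log_hastings - penalty > 0.0


def count_accepted(proposals, gamma):
    """Number of splits (i.e., new superpixels) accepted for a given gamma."""
    return sum(accept_split(p, gamma) for p in proposals)


proposals = [
    SplitProposal(0.5, propagated=True),
    SplitProposal(2.0, propagated=True),
    SplitProposal(5.0, propagated=True),
    SplitProposal(0.5, propagated=False),  # disoccluded region
]
for gamma in (0.0, 1.0, 4.0):
    print(gamma, count_accepted(proposals, gamma))
```

Raising `gamma` removes only splits of propagated superpixels, so the total count falls while the disoccluded proposal is always accepted, mirroring the behavior described above.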

The Relabeling Hyperparameter Controls the Temporal Extent

The Shift and Fill step accounts for rigid-body motion, but the motion estimates can be erroneous, and they struggle when an object's pixel intensity changes suddenly. The relabeling step corrects improperly propagated superpixels. For each superpixel, the mean appearance from the previous frame and the shifted mean location are compared with the current mean appearance and location. If this difference exceeds a threshold, the superpixel is relabeled as a new one. This threshold hyperparameter strongly affects the temporal extent of the superpixels, which can be long (blue), medium (purple), or short (red).
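The relabeling rule can be sketched as a per-superpixel threshold test. This minimal version assumes that the per-superpixel means are already computed and that the "difference" is the Euclidean norm of the combined appearance and location change; the real distance and any weighting between the two terms may differ. The function name `relabel` and the `next_id` counter are hypothetical.

```python
import numpy as np


def relabel(ids, prev_color, shifted_pos, cur_color, cur_pos, threshold, next_id):
    """Give a fresh id to superpixels whose statistics drifted too far (hypothetical helper).

    ids: (K,) ids propagated from the previous frame.
    prev_color, cur_color: (K, C) mean appearance in the previous / current frame.
    shifted_pos, cur_pos: (K, 2) previous mean location after shifting / current mean location.
    The distance (Euclidean over appearance and location combined) is an assumption.
    """
    diff = np.concatenate([cur_color - prev_color, cur_pos - shifted_pos], axis=1)
    dist = np.linalg.norm(diff, axis=1)
    new_ids = ids.copy()
    for k in np.flatnonzero(dist > threshold):
        new_ids[k] = next_id  # treat as a brand-new superpixel, ending its temporal track
        next_id += 1
    return new_ids, next_id


ids = np.array([3, 7])
prev_color = np.array([[0.5, 0.5, 0.5], [0.2, 0.2, 0.2]])
cur_color = np.array([[0.5, 0.5, 0.5], [0.9, 0.2, 0.2]])  # superpixel 7 changed color
shifted_pos = np.array([[10.0, 10.0], [20.0, 20.0]])
cur_pos = np.array([[10.0, 10.5], [20.0, 20.0]])          # superpixel 3 moved slightly

for thr in (1.0, 0.6, 0.1):
    print(thr, relabel(ids, prev_color, shifted_pos, cur_color, cur_pos, thr, next_id=100))
```

A smaller threshold relabels more superpixels, which shortens their temporal extent; a larger threshold lets superpixels persist across more frames.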

Additional Examples of BIST in Action