BIST: Bayesian-Inspired Space-Time Superpixels

Purdue University

TL;DR: BIST can estimate space-time superpixels at over 60 frames per second on 240x320 images, while the next fastest method (TSP) runs at about 2 frames per second. In addition, the shape and number of superpixels are qualitatively similar to those of the recent BASS method.

Abstract

This paper presents an efficient method to compute space-time superpixels and an application of the superpixels called superpixel convolution. The space-time superpixel method extends a single-image Bayesian method named BASS. Our approach, named Bayesian-inspired Space-Time Superpixels (BIST), is inspired by hill-climbing to a local mode of a Dirichlet-Process Gaussian Mixture Model conditioned on the previous frame's superpixel information. The method is only Bayesian-inspired, rather than actually Bayesian, because the split/merge steps are treated as a classification problem rather than derived from a Gibbs sampling update. However, this heuristic reduces the number of split/merge steps from several hundred per frame to only a few. BIST is over twice as fast as BASS and over 10 times faster than other space-time superpixel methods with favorable (and sometimes superior) quality. Specifically, BIST runs at 60 frames per second while TSP runs at about 2 frames per second. Additionally, to garner interest in superpixels, this paper demonstrates their use within deep neural networks. We present a superpixel-weighted convolution layer for single-image denoising that outperforms standard convolution by over 1.5 dB PSNR.

How it Works

BIST superpixels are computed in three major steps: (a) the Shift and Fill step, (b) Boundary Updates, and (c) Splits/Merges/Relabeling. The following sequences illustrate BIST in action, detailing every step. For a more complete collection of results, including a visualization of BASS, please see this link on YouTube. The animations are slowed down for clarity; the method itself runs at about 10 to 15 ms per frame on images with resolution 240x320 and about 39 ms per frame on images with resolution 480x940.
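To relate the per-frame timings above to the frames-per-second figures quoted elsewhere on this page, the conversion is a single division. The helper name `ms_to_fps` is ours, chosen for illustration; it is not part of the BIST code.

```python
def ms_to_fps(ms_per_frame: float) -> float:
    """Convert a per-frame runtime in milliseconds to frames per second."""
    if ms_per_frame <= 0:
        raise ValueError("ms_per_frame must be positive")
    return 1000.0 / ms_per_frame


# Timings quoted in the text above.
for ms in (10.0, 15.0, 39.0):
    print(f"{ms:5.1f} ms/frame is about {ms_to_fps(ms):.1f} frames per second")
```

Under this conversion, 10 to 15 ms per frame corresponds to roughly 67 to 100 frames per second, which agrees with the "over 60 frames per second" figure for 240x320 images, and 39 ms per frame corresponds to roughly 26 frames per second.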

Examples

We compare BIST to a recent single-image method named BASS and a space-time superpixel method named TSP. The top-left video shows the ground-truth segmentation for the object of interest, while the other videos show the superpixels intersecting the original input. In the top-right, the space-only BASS superpixels serve as the ideal type of superpixels. In the bottom-right, the space-time TSP superpixels show that superpixels can track an object over time. In the bottom-left, the proposed BIST method achieves superpixels qualitatively similar to BASS with tracking capabilities like those of TSP. More sequences are available here.

Benchmark Results

BIST achieves state-of-the-art results on standard superpixel benchmarks and is the fastest method with open-source code. Please see this resource for a detailed description of each benchmark.

A Temporally Coherent Split Step Controls the Number of Superpixels

Naively using the BASS split step in the space-time case leads to using several hundred more superpixels than in the single-image case. The proposed split step term introduces a hyperparameter to control the number of new superpixels. A larger parameter value reduces the number of superpixels used to represent the image. The number of BIST superpixels matches the number of BASS superpixels when the hyperparameter is set to 4.0.

The Relabeling Hyperparameter Controls the Temporal Extent

The Shift and Fill step explains rigid-body motion, but it can be erroneous, particularly when an object's pixel intensity changes suddenly. The relabeling step can correct for improperly propagated superpixels. The superpixel's mean appearance from the previous frame and its shifted mean location are compared with the superpixel's current mean appearance and location. If this difference exceeds a threshold value, the superpixel is relabeled as a new one. This threshold hyperparameter has a significant impact on the temporal extent, which can be long (blue), medium (purple), or short (red).
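The relabeling test described above can be sketched as follows. This is a minimal illustration, not the BIST implementation: the squared-Euclidean difference, the `loc_weight` balance between appearance and location, and the function name `relabel_superpixels` are all assumptions made for the example, since this page does not specify the exact distance used.

```python
import numpy as np


def relabel_superpixels(prev_app, shifted_loc, curr_app, curr_loc,
                        labels, threshold, loc_weight=1.0):
    """Give a fresh label to each superpixel whose change exceeds `threshold`.

    prev_app, curr_app:    (K, C) mean appearance in the previous/current frame
    shifted_loc, curr_loc: (K, 2) shifted previous / current mean location
    labels:                (K,) integer superpixel ids
    The distance below is an assumed form, chosen only for illustration.
    """
    prev_app = np.asarray(prev_app, dtype=float)
    curr_app = np.asarray(curr_app, dtype=float)
    shifted_loc = np.asarray(shifted_loc, dtype=float)
    curr_loc = np.asarray(curr_loc, dtype=float)

    # Per-superpixel squared differences in appearance and in location.
    app_diff = np.sum((curr_app - prev_app) ** 2, axis=1)
    loc_diff = np.sum((curr_loc - shifted_loc) ** 2, axis=1)
    diff = app_diff + loc_weight * loc_diff

    # Superpixels that changed too much are treated as new superpixels.
    exceeds = diff > threshold
    new_labels = np.array(labels, dtype=int, copy=True)
    next_label = int(new_labels.max()) + 1
    new_labels[exceeds] = np.arange(next_label, next_label + int(exceeds.sum()))
    return new_labels, exceeds


# Two superpixels: the first barely moves, the second changes brightness sharply.
labels, changed = relabel_superpixels(
    prev_app=[[0.5, 0.5, 0.5], [0.2, 0.2, 0.2]],
    shifted_loc=[[10.0, 10.0], [20.0, 20.0]],
    curr_app=[[0.5, 0.5, 0.5], [0.9, 0.9, 0.9]],
    curr_loc=[[10.0, 11.0], [20.0, 20.0]],
    labels=[0, 1],
    threshold=0.5,
    loc_weight=0.1,
)
print(labels, changed)
```

In this sketch a lower threshold relabels more superpixels, which shortens their temporal extent, while a higher threshold lets labels persist across more frames.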

Additional Examples of BIST in Action