Using Self-Similarity to Design Deep Neural Network Modules for Images and Videos

Streaming Space-Time Superpixels

Temporally consistent, deformable regions can improve estimates of 3D structure from 2D images, and they are important for superpixel-enabled deep learning modules such as attention and convolution. This project proposes a fast (about 10 ms per frame), temporally consistent method for streaming superpixels across the frames of a video. The left figure depicts our novel algorithm propagating superpixels from frame t to frame t+1. The leftmost image is frame t and the rightmost image is frame t+1. The second image (an animated GIF) shows each superpixel shifted according to the optical flow: red marks regions where shifted superpixels overlap (overlaps) and blue marks regions that receive no pixels (holes). The third image (another animated GIF) shows our algorithm iteratively filling the holes and overlaps. Once filled, the image matches frame t+1.
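The shift step described above, and one naive way to fill the resulting holes, can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the project's implementation: the function names `shift_labels` and `fill_holes` are hypothetical, the flow is assumed to be a dense `(H, W, 2)` array with `flow[..., 0]` as the horizontal and `flow[..., 1]` as the vertical displacement, and displacements are rounded to whole pixels. Overlaps simply keep one of the colliding labels, and each hole takes the largest label among its labelled 4-neighbours, repeated until no holes remain; this neighbour rule is a stand-in for the project's own filling step, which is not described here.

```python
import numpy as np


def shift_labels(labels, flow):
    """Forward-warp an integer superpixel label map by rounded optical flow.

    Returns the warped labels (-1 marks holes) and the number of source pixels
    that landed on each destination pixel (a count above 1 marks an overlap).
    """
    H, W = labels.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Assumed flow layout: flow[..., 0] = dx (columns), flow[..., 1] = dy (rows).
    # Destinations are clipped to the image, so pixels leaving the frame pile up
    # at the border (a simplification).
    xd = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, W - 1)
    yd = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, H - 1)
    counts = np.zeros((H, W), dtype=int)
    np.add.at(counts, (yd, xd), 1)  # unbuffered, so repeated targets are all counted
    warped = np.full((H, W), -1, dtype=int)
    warped[yd, xd] = labels  # on an overlap, one of the colliding labels is kept
    return warped, counts


def fill_holes(warped, max_iters=100):
    """Iteratively give each hole the largest label among its labelled 4-neighbours."""
    labels = warped.copy()
    for _ in range(max_iters):
        holes = labels < 0
        if not holes.any():
            break
        padded = np.pad(labels, 1, constant_values=-1)
        # Up, down, left and right neighbours of every pixel (-1 outside the image).
        neighbours = np.stack([padded[:-2, 1:-1], padded[2:, 1:-1],
                               padded[1:-1, :-2], padded[1:-1, 2:]])
        best = neighbours.max(axis=0)
        # Only holes touching at least one labelled pixel are filled this round.
        labels = np.where(holes & (best >= 0), best, labels)
    return labels
```

For example, shifting the 1×4 label map `[[0, 0, 1, 1]]` one pixel to the right gives warped labels `[[-1, 0, 0, 1]]` with counts `[[0, 1, 1, 2]]`: a hole at the left edge and an overlap at the right edge, where the clipped last pixel lands on its neighbour. `fill_holes` then returns `[[0, 0, 0, 1]]`.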

Superpixel Neighborhood Attention

[pdf, code] Images contain objects with deformable boundaries, such as the contours of a human face, yet local attention operators act on square windows. This mixes features from perceptually unrelated regions, which can degrade the quality of a denoiser. This paper proposes using superpixel probabilities to re-weight the local attention map. When images are modeled with latent superpixel probabilities, we show that our re-weighted attention module matches the theoretically optimal denoiser. The left image shows that neighborhood attention (NA) mixes in information from the unrelated blue region, Hard-SNA improperly rejects pixels from the adjacent orange regions, and superpixel neighborhood attention (SNA) correctly selects all the orange pixels and rejects the blue pixels.
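One simple way to re-weight a local attention map with superpixel probabilities is sketched below in Python with NumPy. The exact re-weighting used in the paper may differ; here, as an assumption, each key's attention weight is multiplied by the probability that the query pixel i and key pixel j belong to the same superpixel (the sum over s of P[i, s] * P[j, s]), and the weights are renormalized. The function name `superpixel_neighborhood_attention` is hypothetical, the features double as queries, keys, and values (no learned projections), and windows are simply clipped at image borders.

```python
import numpy as np


def superpixel_neighborhood_attention(x, probs, window=3):
    """Sketch of superpixel-reweighted neighborhood attention.

    x:     (H, W, C) features, used directly as queries, keys and values.
    probs: (H, W, S) soft superpixel assignments; each pixel's row sums to 1.
    """
    H, W, C = x.shape
    r = window // 2
    out = np.zeros_like(x, dtype=float)
    for i in range(H):
        for j in range(W):
            # Square window around (i, j), clipped to the image.
            i0, i1 = max(0, i - r), min(H, i + r + 1)
            j0, j1 = max(0, j - r), min(W, j + r + 1)
            keys = x[i0:i1, j0:j1].reshape(-1, C)
            logits = keys @ x[i, j] / np.sqrt(C)
            attn = np.exp(logits - logits.max())  # unnormalized softmax weights
            # Probability that the query and each key share a superpixel.
            same = probs[i0:i1, j0:j1].reshape(-1, probs.shape[-1]) @ probs[i, j]
            attn = attn * same
            attn /= attn.sum()  # the query itself always has a positive weight
            out[i, j] = attn @ keys
    return out
```

Passing one-hot `probs` restricts each query to keys from its own superpixel, while uniform `probs` makes every `same` value equal, so the function reduces to plain neighborhood attention over the clipped window.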

Space-Time Attention with a Shifted Non-Local Search

[pdf, code] Computing attention maps for videos is challenging because objects move between frames. Even small spatial inaccuracies significantly degrade the attention module's quality. Recent works propose using a deep network to correct these small inaccuracies. In this project, we efficiently implement a space-time grid search that outperforms existing deep neural network alternatives. The image on the left compares a no-shift search, a search using a deep network from related work, and our proposed shifted non-local search.
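The idea of a shifted non-local search can be illustrated with a deliberately naive sketch in Python with NumPy: each query patch in frame t is compared against a small grid of candidate patches centered on its flow-shifted location in frame t+1, so a slightly wrong flow can be corrected by the grid search. The function name, the `radius` and `patch` parameters, the sum-of-squared-differences distance, grayscale `(H, W)` frames, and the flow layout (`flow[..., 0]` horizontal, `flow[..., 1]` vertical) are all assumptions for illustration; the project's implementation is efficient, whereas these Python loops are not.

```python
import numpy as np


def shifted_nonlocal_search(frame0, frame1, flow, radius=1, patch=3):
    """For each pixel of frame0, return the (row, col) in frame1 whose patch is
    closest (sum of squared differences) within a (2*radius+1)^2 grid centered
    on the flow-shifted location."""
    H, W = frame0.shape
    p = patch // 2
    f0 = np.pad(frame0, p, mode="edge")
    f1 = np.pad(frame1, p, mode="edge")
    best = np.zeros((H, W, 2), dtype=int)
    for y in range(H):
        for x in range(W):
            q = f0[y:y + patch, x:x + patch]  # patch centered on (y, x)
            # Center of the search grid: the rounded, clipped flow-shifted location.
            cy = int(np.clip(np.rint(y + flow[y, x, 1]), 0, H - 1))
            cx = int(np.clip(np.rint(x + flow[y, x, 0]), 0, W - 1))
            best_d = np.inf
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ky, kx = cy + dy, cx + dx
                    if not (0 <= ky < H and 0 <= kx < W):
                        continue
                    d = np.sum((q - f1[ky:ky + patch, kx:kx + patch]) ** 2)
                    if d < best_d:
                        best_d = d
                        best[y, x] = (ky, kx)
    return best
```

With zero flow the same function performs a no-shift search, which cannot reach a match lying more than `radius` pixels away; a rough flow estimate moves the grid close enough for the search to find it.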