Machine Learning Seminar

Sagie Benaim


Tel-Aviv University

Learning the ‘speediness’ of videos and generating novel videos from a single sample

In the first part of the talk I will present 'SpeedNet: Learning the Speediness in Videos'. We wish to automatically predict the "speediness" of moving objects in videos: whether they move faster than, at, or slower than their "natural" speed. The core component in our approach is SpeedNet: a novel deep network trained to detect whether a video is playing at its normal rate or is sped up. SpeedNet is trained on a large corpus of natural videos in a self-supervised manner, without requiring any manual annotations. We show how this single, binary classification network can be used to detect arbitrary rates of speediness of objects. We demonstrate prediction results by SpeedNet on a wide range of videos containing complex natural motions, and examine the visual cues it utilizes for making those predictions. Importantly, we show that through predicting the speed of videos, the model learns a powerful and meaningful space-time representation that goes beyond simple motion cues. We demonstrate how those learned features can boost the performance of self-supervised action recognition, and can be used for video retrieval. Furthermore, we also apply SpeedNet to generate time-varying, adaptive video speedups, which can allow viewers to watch videos faster, but with less of the jittery, unnatural motion typical of videos that are sped up uniformly.

In the second part of the talk I will present 'Hierarchical Patch VAE-GAN: Generating Diverse Videos from a Single Sample'. We consider the task of generating diverse and novel videos from a single video sample. Recently, new hierarchical patch-GAN based approaches were proposed for generating diverse images, given only a single sample at training time. When applied to videos, these approaches fail to generate diverse samples, and often collapse into generating samples similar to the training video. We introduce a novel patch-based variational autoencoder (VAE) which allows for much greater diversity in generation. Using this tool, a new hierarchical video generation scheme is constructed: at coarse scales, our patch-VAE is employed, ensuring samples are of high diversity. Subsequently, at finer scales, a patch-GAN renders the fine details, resulting in high-quality videos.

Papers:
Hierarchical Patch VAE-GAN: Generating Diverse Videos from a Single Sample. NeurIPS 2020.
SpeedNet: Learning the Speediness in Videos. CVPR 2020.

* Sagie Benaim is a Ph.D. student at Tel-Aviv University under the supervision of Professor Lior Wolf.

Zoom link:

Date: Sun 29 Nov 2020

Start Time: 11:30

End Time: 12:30

ZOOM Meeting | Electrical Eng. Building