(2020) European Conference on Computer Vision (ECCV).
When a very fast dynamic event is recorded with a low-framerate camera, the resulting video suffers from severe motion blur (due to exposure time) and motion aliasing (due to the low sampling rate in time). True Temporal Super-Resolution (TSR) is more than just Temporal Interpolation (increasing the framerate). It can also recover new high temporal frequencies beyond the temporal Nyquist limit of the input video, thus resolving both motion-blur and motion-aliasing effects that temporal frame interpolation (however sophisticated it may be) cannot undo. In this paper we propose a "Deep Internal Learning" approach for true TSR. We train a video-specific CNN on examples extracted directly from the low-framerate input video. Our method exploits the strong recurrence of small space-time patches inside a single video sequence, both within and across different spatio-temporal scales of the video. We further observe (for the first time) that small space-time patches also recur across dimensions of the video sequence, i.e., when the spatial and temporal dimensions are swapped. In particular, the higher spatial resolution of video frames provides strong examples of how to increase the temporal resolution of that video. Such internal video-specific examples give rise to strong self-supervision, requiring no data but the input video itself. This results in Zero-Shot Temporal-SR of complex videos, which removes both motion blur and motion aliasing, outperforming previous supervised methods trained on external video datasets.
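As a rough illustration of the within-dimension and across-dimension examples described above, the sketch below builds (low-framerate, high-framerate) example pairs from a single video array. It assumes a grayscale video stored as a NumPy array of shape (T, H, W) and models temporal downscaling as a box-filter average over `factor` consecutive frames (standing in for exposure blur) followed by subsampling. The helper names and this blur model are illustrative assumptions, not the paper's training pipeline.

```python
import numpy as np

def temporal_downscale(video, factor):
    """Average every `factor` consecutive slices along axis 0.

    Averaging mimics a longer exposure (motion blur); keeping one averaged
    slice per group mimics a lower sampling rate in time.
    """
    t = (video.shape[0] // factor) * factor           # drop leftover slices
    grouped = video[:t].reshape(t // factor, factor, *video.shape[1:])
    return grouped.mean(axis=1)

def make_training_pairs(video, factor):
    """Hypothetical helper: (low-res, high-res) pairs from one (T, H, W) video."""
    pairs = []
    # Within-dimension example: the input itself, downscaled further in time.
    pairs.append((temporal_downscale(video, factor), video))
    # Across-dimension example: swap time with the vertical spatial axis, so
    # the spatial sampling of the frames plays the role of the time axis.
    swapped = np.transpose(video, (1, 0, 2))          # shape (H, T, W)
    pairs.append((temporal_downscale(swapped, factor), swapped))
    return pairs

if __name__ == "__main__":
    video = np.random.default_rng(0).random((16, 64, 48))   # T, H, W
    for lr, hr in make_training_pairs(video, factor=2):
        print(lr.shape, "->", hr.shape)
```

With this input the script prints `(8, 64, 48) -> (16, 64, 48)` for the within-dimension pair and `(32, 16, 48) -> (64, 16, 48)` for the swapped pair; in a real system such pairs would feed a learned model rather than be used directly.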
(2012) International Conference on Information Science and Applications (ICISA).
This paper presents a novel approach and interface for interactive image segmentation. Our interface uses sparse and inaccurate boundary cues provided by the user to produce a multi-layer segmentation of the image. Using boundary cues allows our interface to rely on a single “boundary brush” to produce a multi-layer segmentation, making it appealing for devices with touch-screen user interfaces. Our method utilizes recent advances in clustering to automatically recover the underlying number of layers without requiring the user to specify it explicitly.
(2012) arXiv.
Discrete energy minimization is a ubiquitous task in computer vision, yet it is NP-hard in most cases. In this work we propose a multiscale framework for coping with the NP-hardness of discrete optimization. Our approach utilizes algebraic multiscale principles to efficiently explore the discrete solution space, yielding improved results on challenging, non-submodular energies for which current methods provide unsatisfactory approximations. In contrast to popular multiscale methods in computer vision, which build an image pyramid, our framework acts directly on the energy to construct an energy pyramid. Deriving a multiscale scheme from the energy itself makes our framework application-independent and widely applicable. Our framework gives rise to two complementary energy coarsening strategies: one in which coarser scales involve fewer variables, and a more revolutionary one in which coarser scales involve fewer discrete labels. We empirically evaluate our unified framework on a variety of both non-submodular and submodular energies, including energies from the Middlebury benchmark.
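The two coarsening strategies can be illustrated algebraically. Assume (this matrix form is an assumption of the sketch) that a pairwise energy over n variables and l labels is written as E(U) = Tr(D U^T) + Tr(W U V U^T), where U is an n x l one-hot labeling matrix, D (n x l) holds unary costs, W (n x n) pairwise weights, and V (l x l) label-to-label costs. Substituting U = P Uc, with a hypothetical hard 0/1 interpolation P that ties each fine variable to one coarse variable, yields an energy of the same form with Dc = P^T D and Wc = P^T W P (fewer variables). Substituting U = Uc Q^T, with Q selecting a subset of the labels, yields Dc = D Q and Vc = Q^T V Q (fewer labels). The code checks both identities numerically; practical interpolations may be soft and considerably more elaborate.

```python
import numpy as np

def energy(U, D, W, V):
    """E(U) = Tr(D U^T) + Tr(W U V U^T) for a one-hot labeling matrix U (n x l)."""
    return np.trace(D @ U.T) + np.trace(W @ U @ V @ U.T)

def coarsen_variables(D, W, P):
    """Fewer variables: substitute U = P @ Uc, where each row of P (n x m) is one-hot."""
    return P.T @ D, P.T @ W @ P

def coarsen_labels(D, V, Q):
    """Fewer labels: substitute U = Uc @ Q.T, where each column of Q (l x lc) is one-hot."""
    return D @ Q, Q.T @ V @ Q

rng = np.random.default_rng(0)
n, l = 6, 3
D = rng.random((n, l))                        # unary costs
W = np.triu(rng.normal(size=(n, n)), 1)
W = W + W.T                                   # symmetric weights, zero diagonal
V = 1.0 - np.eye(l)                           # Potts label costs

# Variable coarsening: 6 fine variables tied pairwise to 3 coarse ones.
P = np.eye(3)[[0, 0, 1, 1, 2, 2]]             # n x m interpolation
Dc, Wc = coarsen_variables(D, W, P)
Uc = np.eye(l)[[2, 0, 1]]                     # a coarse labeling (m x l)
print(np.isclose(energy(P @ Uc, D, W, V), energy(Uc, Dc, Wc, V)))

# Label coarsening: keep only fine labels 0 and 2.
Q = np.eye(l)[:, [0, 2]]                      # l x lc label map
Dq, Vq = coarsen_labels(D, V, Q)
Ul = np.eye(2)[[0, 1, 1, 0, 1, 0]]            # labeling with coarse labels (n x lc)
print(np.isclose(energy(Ul @ Q.T, D, W, V), energy(Ul, Dq, W, Vq)))
```

Both checks print `True`, because the coarse energies are exact rewrites of the fine energy restricted to labelings reachable through P or Q; optimizing at the coarse scale therefore explores a structured subset of the fine solution space.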
(2012) NIPS Workshop on Optimization for Machine Learning.
Current state-of-the-art discrete optimization methods lag behind when it comes to challenging contrast-enhancing discrete energies (i.e., energies favoring different labels for neighboring variables). This work suggests a multiscale approach for these challenging problems. Deriving an algebraic representation allows us to coarsen any pairwise energy using any interpolation in a principled algebraic manner. Furthermore, we propose an energy-aware interpolation operator that efficiently exposes the multiscale landscape of the energy, yielding an effective coarse-to-fine optimization scheme. Results on challenging contrast-enhancing energies show significant improvement over state-of-the-art methods.
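To make "contrast-enhancing" concrete, consider the simplest binary case (an illustrative assumption; the energies in the paper are generally multi-label). A Potts-style pairwise term charges a weight w when two neighbors take different labels. With w > 0 it encourages smoothness and satisfies the binary submodularity condition theta(0,0) + theta(1,1) <= theta(0,1) + theta(1,0), under which graph cuts minimize the energy exactly; with w < 0 it rewards disagreement and violates that condition. The brute-force search on a tiny four-node chain below is only a demonstration, not a practical optimizer.

```python
import itertools
import numpy as np

def potts_term(w):
    """2x2 pairwise cost table: w if the two binary labels differ, 0 otherwise."""
    return np.array([[0.0, w], [w, 0.0]])

def is_submodular(theta):
    """Submodularity condition for a binary pairwise term."""
    return bool(theta[0, 0] + theta[1, 1] <= theta[0, 1] + theta[1, 0])

def brute_force_min(n, edges, theta):
    """Exhaustively minimize the sum of theta over edges; returns (energy, labels)."""
    best = None
    for labels in itertools.product((0, 1), repeat=n):
        e = float(sum(theta[labels[i], labels[j]] for i, j in edges))
        if best is None or e < best[0]:
            best = (e, labels)
    return best

chain = [(0, 1), (1, 2), (2, 3)]
for w in (1.0, -1.0):
    theta = potts_term(w)
    print(w, is_submodular(theta), brute_force_min(4, chain, theta))
```

The smoothing term (w = 1.0) is submodular and is minimized by the constant labeling `(0, 0, 0, 0)` with energy 0.0, whereas the contrast-enhancing term (w = -1.0) is non-submodular and is minimized by the alternating labeling `(0, 1, 0, 1)` with energy -3.0.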
(2012) Doctoral dissertation, Weizmann Institute of Science, Israel.
In this thesis I explore challenging discrete energy minimization problems that arise mainly in the context of computer vision tasks. This work motivates the use of such "hard-to-optimize" non-submodular functionals, and proposes methods and algorithms to cope with the NP-hardness of their optimization. Consequently, this thesis revolves around two axes: applications and approximations. The applications axis motivates the use of such "hard-to-optimize" energies by introducing new tasks. As the energies become less constrained and less structured, the objective function gains expressive power, yielding more accurate models. Results show that challenging, hard-to-optimize energies are better suited to certain computer vision applications. To overcome the resulting challenging optimization tasks, the second axis of this thesis proposes approximation algorithms that cope with the NP-hardness of the optimization. Experiments show that these new methods yield good results for representative challenging problems.
(2011) IEEE International Conference on Computer Vision (ICCV).
This paper introduces a new formulation for discrete image labeling tasks, the Decision Tree Field (DTF), which combines and generalizes random forests and conditional random fields (CRFs), both of which have been widely used in computer vision. In a typical CRF model, the unary potentials are derived from sophisticated random-forest or boosting-based classifiers; however, the pairwise potentials are assumed to (1) have a simple parametric form with a pre-specified and fixed dependence on the image data, and (2) be defined on the basis of a small and fixed neighborhood. In contrast, in DTF, local interactions between multiple variables are determined by means of decision trees evaluated on the image data, allowing the interactions to be adapted to the image content. This results in powerful graphical models that are able to represent complex label structure. Our key technical contribution is to show that the DTF model can be trained efficiently and jointly using a convex approximate likelihood function, enabling us to learn over a million free model parameters. We show experimentally that for applications which have a rich and complex label structure, our model achieves excellent results.
(2011) arXiv.
Clustering is a fundamental task in unsupervised learning. The focus of this paper is the Correlation Clustering functional, which combines positive and negative affinities between the data points. The contribution of this paper is twofold: (i) a theoretical analysis of the functional, and (ii) new optimization algorithms that can cope with large-scale problems (>100K variables) that are infeasible for existing methods. Our theoretical analysis provides a probabilistic generative interpretation for the functional and justifies its intrinsic "model-selection" capability. Furthermore, we draw an analogy between optimizing this functional and the well-known Potts energy minimization. This analogy allows us to suggest several new optimization algorithms, which exploit the intrinsic "model-selection" capability of the functional to automatically recover the underlying number of clusters. We compare our algorithms to existing methods on both synthetic and real data. In addition, we suggest two new applications that are made possible by our algorithms: unsupervised face identification and interactive multi-object segmentation by rough boundary delineation.
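A small sketch of the functional and of its "model-selection" behavior, under stated assumptions: here the Correlation Clustering energy is taken as E(L) = sum over i < j of W[i, j] * [l_i != l_j], with a symmetric affinity matrix W whose positive entries pull points together and negative entries push them apart. This has the form of a Potts energy with mixed-sign weights, except that the number of labels is not fixed in advance. The ICM-style local search below is an illustration, not one of the paper's algorithms: each point may move to any existing cluster or to a fresh one, so the number of clusters emerges from the optimization.

```python
import numpy as np

def cc_energy(W, labels):
    """Sum of W[i, j] over pairs i < j that are placed in different clusters."""
    differ = labels[:, None] != labels[None, :]
    return float(np.sum(np.triu(W * differ, k=1)))

def icm_with_new_label(W, labels, max_sweeps=20):
    """Greedy local search; each accepted move strictly lowers cc_energy."""
    labels = labels.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(labels)):
            row = W[i].copy()
            row[i] = 0.0                      # ignore self-affinity
            # gain[c] = total affinity from i to the members of cluster c;
            # moving i from cluster a to c lowers the energy by gain[c] - gain[a].
            gains = {c: row[labels == c].sum() for c in np.unique(labels)}
            gains[labels.max() + 1] = 0.0     # an empty, fresh cluster
            best = max(gains, key=gains.get)
            if gains[best] > gains[labels[i]] + 1e-12:
                labels[i] = best
                changed = True
        if not changed:
            break
    _, labels = np.unique(labels, return_inverse=True)   # relabel as 0..k-1
    return labels

# Two groups {0, 1} and {2, 3}: attraction (+1) inside, repulsion (-1) across.
W = np.array([[ 0,  1, -1, -1],
              [ 1,  0, -1, -1],
              [-1, -1,  0,  1],
              [-1, -1,  1,  0]], dtype=float)
start = np.zeros(4, dtype=int)                # everything in one cluster
labels = icm_with_new_label(W, start)
print(labels, cc_energy(W, labels))
```

Starting from a single cluster, the search splits the points into the two groups and prints `[1 1 0 0] -4.0`, even though the number of clusters was never supplied.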
(2010) 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 33-40.
Given very few images containing a common object of interest under severe variations in appearance, we detect the common object and provide a compact visual representation of that object, depicted by a binary sketch. Our algorithm is composed of two stages: (i) Detect a mutually common (yet non-trivial) ensemble of 'self-similarity descriptors' shared by all the input images. (ii) Having found such a mutually common ensemble, 'invert' it to generate a compact sketch which best represents this ensemble. This provides a simple and compact visual representation of the common object, while eliminating the background clutter of the query images. It can be obtained from very few query images. Such clean sketches may be useful for detection, retrieval, recognition, co-segmentation, and for artistic graphical purposes.
(2009) 2009 IEEE 12th International Conference on Computer Vision (ICCV), pp. 349-356.
Super-resolution methods can be broadly classified into two families: (i) classical multi-image super-resolution (combining images obtained at subpixel misalignments), and (ii) example-based super-resolution (learning correspondences between low- and high-resolution image patches from a database). In this paper we propose a unified framework for combining these two families of methods. We further show how this combined approach can be applied to obtain super-resolution from as little as a single image (with no database or prior examples). Our approach is based on the observation that patches in a natural image tend to redundantly recur many times inside the image, both within the same scale and across different scales. Recurrence of patches within the same image scale (at subpixel misalignments) gives rise to classical super-resolution, whereas recurrence of patches across different scales of the same image gives rise to example-based super-resolution. Our approach attempts to recover at each pixel its best possible resolution increase based on its patch redundancy within and across scales.
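The cross-scale recurrence observation can be probed directly: for every small patch of an image, find its nearest neighbor among the patches of a downscaled copy of the same image. A small distance means the coarser scale contains a good match whose surrounding region could serve as a higher-resolution example. The sketch assumes a grayscale NumPy image, 2x block-average downscaling, 5x5 patches, and brute-force sum-of-squared-differences search; these are illustrative choices, not the paper's settings, and `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def extract_patches(img, p):
    """All overlapping p x p patches of a 2-D array, flattened to rows."""
    return sliding_window_view(img, (p, p)).reshape(-1, p * p)

def downscale2(img):
    """2x downscaling by averaging non-overlapping 2x2 blocks."""
    h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def cross_scale_nn(img, p=5):
    """For each patch of img, the SSD to its nearest patch in the downscaled img."""
    q = extract_patches(img, p)
    r = extract_patches(downscale2(img), p)
    # Pairwise squared distances via |q|^2 - 2 q.r + |r|^2.
    d2 = (q ** 2).sum(1)[:, None] - 2.0 * q @ r.T + (r ** 2).sum(1)[None, :]
    return np.maximum(d2.min(axis=1), 0.0)   # clip tiny negative rounding errors

if __name__ == "__main__":
    img = np.random.default_rng(0).random((48, 48))
    d = cross_scale_nn(img)
    print(d.shape, d.mean())
```

On natural images (unlike the random noise used in this demo) many patches find close matches at the coarser scale, which is the redundancy the abstract builds on; a practical system would replace the brute-force search with an approximate nearest-neighbor method.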
What Is a Good Image Segment? A Unified Approach to Segment Extraction (2008) Computer Vision - ECCV 2008, Part IV, Proceedings, 5305, pp. 30-44.
There is a huge diversity of definitions of "visually meaningful" image segments, ranging from simple uniformly colored segments and textured segments, through symmetric patterns, up to complex semantically meaningful objects. This diversity has led to a wide range of different approaches for image segmentation. In this paper we present a single unified framework for addressing this problem: "Segmentation by Composition". We define a good image segment as one which can be easily composed using its own pieces, but is difficult to compose using pieces from other parts of the image. This non-parametric approach captures a large diversity of segment types, yet requires no pre-definition or modelling of segment types, nor prior training. Based on this definition, we develop a segment extraction algorithm, i.e., given a single point of interest, provide the "best" image segment containing that point. This induces a figure-ground image segmentation, which applies to a range of different segmentation tasks: single-image segmentation, simultaneous co-segmentation of several images, and class-based segmentation.