The Power of DINO-ViT Features

Vision Transformers (ViTs) are novel and powerful backbones for image analysis. When trained in a self-distillation manner (DINO), they yield intriguing local and global representations. In this project, we set out to explore these representations, identify their strengths and advantages, and show how they give rise to new applications while also solving more straightforward tasks (e.g., segmentation, point correspondences) in a zero-shot manner.
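
As a toy illustration of zero-shot point correspondences from dense descriptors, here is a minimal sketch. It assumes the per-patch DINO-ViT descriptors have already been extracted into NumPy arrays; the function name and the mutual-nearest-neighbor ("best buddies") criterion are illustrative choices, not the papers' exact pipeline.

```python
import numpy as np

def best_buddy_correspondences(desc_a, desc_b):
    """Zero-shot point correspondences between two descriptor maps.

    desc_a: (Na, d), desc_b: (Nb, d) -- rows are patch descriptors
    (e.g., DINO-ViT tokens flattened over the spatial grid).
    Returns (i, j) pairs that are mutual nearest neighbors under
    cosine similarity ("best buddies").
    """
    a = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
    b = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
    sim = a @ b.T                 # cosine similarity matrix
    nn_ab = sim.argmax(axis=1)    # best match in B for each patch of A
    nn_ba = sim.argmax(axis=0)    # best match in A for each patch of B
    return [(i, j) for i, j in enumerate(nn_ab) if nn_ba[j] == i]
```

In practice the descriptors would come from intermediate ViT layers; here any row-wise feature matrix works.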


Shir Amir, Yossi Gandelsman, Shai Bagon, and Tali Dekel, "Deep ViT Features as Dense Visual Descriptors," ECCVW 2022. [project page]

Narek Tumanyan, Omer Bar-Tal, Shai Bagon, and Tali Dekel, "Splicing ViT Features for Semantic Appearance Transfer," CVPR 2022. [project page]

Amit Aflalo, Shai Bagon, Tamar Kashti, and Yonina Eldar, "DeepCut: Unsupervised Segmentation using Graph Neural Networks Clustering," 2022. [project page]

Assessment of COVID-19 in Lung Ultrasound by Combining Anatomy and Sonographic Artifacts Cues using Deep Learning

When assessing the severity of COVID-19 from lung ultrasound (LUS) frames, both anatomical phenomena (e.g., the pleural line, presence of consolidations) and sonographic artifacts, such as A-lines and B-lines, are important. While ultrasound devices aim to provide an accurate visualization of the anatomy, the orientation of the sonographic artifacts differs between probe types. This difference poses a challenge in designing a unified deep artificial neural network capable of handling all probe types.

In this work we improve upon Roy et al. (2020): we train a simple deep neural network to assess the severity of COVID-19 from LUS data. To handle both linear and convex probes in a unified manner we employ two strategies. First, we augment the input frames of convex probes with a "rectified" version in which A-lines and B-lines assume horizontal/vertical orientations close to those obtained with linear probes. Second, we explicitly inform the network of the presence of important anatomical features and artifacts: we use a known Radon-transform-based method to detect the pleural line and B-lines, and feed the detected lines as additional inputs to the network.
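
The "rectification" of convex-probe frames amounts to a polar-to-rectilinear resampling around the fan's virtual apex, after which beam-aligned artifacts (B-lines) become vertical and equal-depth artifacts (A-lines) become horizontal. The sketch below is illustrative only; the function name, nearest-neighbor sampling, and apex parameterization are assumptions, not the paper's implementation.

```python
import numpy as np

def rectify_convex_frame(frame, apex, r_min, r_max, theta_min, theta_max,
                         out_h=128, out_w=128):
    """Resample a convex-probe fan image onto a (depth, angle) grid.

    apex: (row, col) of the fan's virtual origin; depth r and beam
    angle theta index the output, so each output row is a constant
    depth and each output column a constant beam direction.
    """
    rs = np.linspace(r_min, r_max, out_h)
    thetas = np.linspace(theta_min, theta_max, out_w)
    r_grid, t_grid = np.meshgrid(rs, thetas, indexing="ij")
    # nearest-neighbor lookup back into the fan image
    rows = np.clip((apex[0] + r_grid * np.cos(t_grid)).round().astype(int),
                   0, frame.shape[0] - 1)
    cols = np.clip((apex[1] + r_grid * np.sin(t_grid)).round().astype(int),
                   0, frame.shape[1] - 1)
    return frame[rows, cols]
```

A quick sanity check: in an image whose intensity equals the distance from the apex, every rectified row should be (nearly) constant.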

[Michael Roberts, Oz Frank, Shai Bagon, Yonina C. Eldar, and Carola-Bibiane Schönlieb. "AI and Point of Care Image Analysis for COVID-19." In Artificial Intelligence in Covid-19, pp. 85-119. Springer, Cham, 2022.]

[Oz Frank, Nir Schipper, Mordehay Vaturi, Gino Soldati, Andrea Smargiassi, Riccardo Inchingolo, Elena Torri, Tiziano Perrone, Federico Mento, Libertario Demi, Meirav Galun, Yonina C. Eldar, and Shai Bagon, "Integrating Domain Knowledge Into Deep Networks for Lung Ultrasound With Applications to COVID-19," IEEE Transactions on Medical Imaging (2021)]

[A recorded talk at the Acoustics Virtually Everywhere, The 179th Meeting of the Acoustical Society of America]

Across Scales & Across Dimensions: Temporal Super-Resolution using Deep Internal Learning

When a very fast dynamic event is recorded with a low-framerate camera, the resulting video suffers from severe motion blur (due to exposure time) and motion aliasing (due to low sampling rate in time). True Temporal Super-Resolution (TSR) is more than just Temporal Interpolation (increasing framerate). It also recovers new high temporal frequencies beyond the temporal Nyquist limit of the input video, thus resolving both motion blur and motion aliasing. In this paper we propose a "Deep Internal Learning" approach for true TSR. We train a video-specific CNN on examples extracted directly from the low-framerate input video. Our method exploits the strong recurrence of small space-time patches inside a single video sequence, both within and across different spatio-temporal scales of the video. We further observe (for the first time) that small space-time patches recur also across dimensions of the video sequence - i.e., when swapping the spatial and temporal dimensions. In particular, the higher spatial resolution of video frames provides strong examples as to how to increase the temporal resolution of that video. Such internal video-specific examples give rise to strong self-supervision, requiring no data but the input video itself. This results in Zero-Shot Temporal-SR of complex videos, which removes both motion blur and motion aliasing, outperforming previous supervised methods trained on external video datasets.
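
The across-dimensions recurrence can be made concrete with a small sketch: swapping the temporal axis of a (T, H, W) clip with one of the spatial axes turns spatial slices into "frames", so the higher spatial resolution can supply examples for increasing temporal resolution. The function name and dictionary keys below are illustrative, not the paper's code.

```python
import numpy as np

def spatiotemporal_swaps(video):
    """Return the three axis orderings of a (T, H, W) clip.

    In the swapped views, y-t (or x-t) slices play the role of frames;
    patches recurring across these views provide the internal training
    examples described above.
    """
    return {
        "t-h-w": video,                     # original clip
        "h-t-w": video.transpose(1, 0, 2),  # y-t slices become frames
        "w-h-t": video.transpose(2, 1, 0),  # x-t slices become frames
    }
```

Each swapped view is just a re-indexing (no data is copied conceptually), which is why such examples are free to extract from the input video itself.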

[project page] [github]

InGAN: Capturing and Remapping the "DNA" of a Natural Image

Generative Adversarial Networks (GANs) typically learn a distribution of images in a large image dataset, and are then able to generate new images from this distribution. However, each natural image has its own internal statistics, captured by its unique distribution of patches. In this paper we propose an "Internal GAN" (InGAN) - an image-specific GAN - which trains on a single input image and learns its internal distribution of patches. It is then able to synthesize a plethora of new natural images of significantly different sizes, shapes and aspect-ratios - all with the same internal patch-distribution (same "DNA") as the input image. In particular, despite large changes in global size/shape of the image, all elements inside the image maintain their local size/shape. InGAN is fully unsupervised, requiring no additional data other than the input image itself. Once trained on the input image, it can remap the input to any size or shape in a single feedforward pass, while preserving the same internal patch distribution. InGAN provides a unified framework for a variety of tasks, bridging the gap between textures and natural images.
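
One rough way to see the "internal patch distribution" is to compare images by distances between their overlapping patches. The minimal sketch below is illustrative only (it is not InGAN's training code, and the function names are made up): a retargeted image that preserves the input's "DNA" should have near-zero patch distance to it.

```python
import numpy as np

def extract_patches(img, k=5):
    """All overlapping k-by-k patches of a 2-D image, flattened to rows.
    These patches are the 'internal statistics' the text refers to."""
    h, w = img.shape
    return np.stack([img[i:i + k, j:j + k].ravel()
                     for i in range(h - k + 1)
                     for j in range(w - k + 1)])

def patch_distance(img_a, img_b, k=5):
    """Mean squared distance from each patch of img_a to its nearest
    patch in img_b (one direction of a bidirectional patch similarity)."""
    pa, pb = extract_patches(img_a, k), extract_patches(img_b, k)
    d = ((pa[:, None, :] - pb[None, :, :]) ** 2).sum(-1)
    return d.min(axis=1).mean()
```

For example, tiling an image changes its global size but keeps every local patch, so the patch distance to the original stays zero.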

[project page] [github]

Discrete Energy Minimization


Matlab code implementing the discrete multiscale optimization presented in:
Shai Bagon and Meirav Galun, "A Unified Multiscale Framework for Discrete Energy Minimization," arXiv 2012,
and Shai Bagon and Meirav Galun, "A Multiscale Framework for Challenging Discrete Optimization," NIPS Workshop on Optimization for Machine Learning, 2012.
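
For readers unfamiliar with discrete energy minimization, a minimal single-scale baseline is Iterated Conditional Modes (ICM) on a pairwise Potts energy; the multiscale framework above coarsens such energies and solves them coarse-to-fine. The sketch below is a generic illustration, not this package's code.

```python
import numpy as np

def icm(unary, edges, lam, iters=10):
    """Iterated Conditional Modes for a pairwise Potts energy:
        E(x) = sum_i unary[i, x_i] + lam * sum_{(i,j) in edges} [x_i != x_j]
    Greedily updates one variable at a time to its locally optimal label.
    """
    n, num_labels = unary.shape
    x = unary.argmin(axis=1)            # start from the unary optimum
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    for _ in range(iters):
        changed = False
        for i in range(n):
            costs = unary[i].copy()
            for j in nbrs[i]:           # Potts penalty against neighbors
                costs += lam * (np.arange(num_labels) != x[j])
            best = costs.argmin()
            if best != x[i]:
                x[i] = best
                changed = True
        if not changed:                 # converged to a local minimum
            break
    return x
```

ICM easily gets stuck in poor local minima, which is precisely the kind of difficulty the multiscale framework is designed to mitigate.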



Correlation clustering

Matlab code implementing the optimization algorithms presented in:
Shai Bagon and Meirav Galun, "Large Scale Correlation Clustering Optimization," arXiv 2011.
The code may be applicable to other graph partitioning problems as well.
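
Correlation clustering partitions a graph with signed ("attract"/"repel") edge weights without fixing the number of clusters in advance. The sketch below is a simple greedy pivot heuristic for illustration only; it is not one of the paper's large-scale algorithms, and the names are made up.

```python
import random

def pivot_correlation_clustering(n, weights, seed=0):
    """Greedy 'pivot' heuristic for correlation clustering.

    weights[(i, j)] > 0 means nodes i and j attract (prefer the same
    cluster), < 0 means they repel. Repeatedly pick a random pivot and
    group all still-unassigned nodes that attract it.
    """
    rng = random.Random(seed)
    unassigned = list(range(n))
    clusters = []
    while unassigned:
        pivot = unassigned.pop(rng.randrange(len(unassigned)))
        cluster = [pivot]
        for v in unassigned[:]:
            w = weights.get((pivot, v), weights.get((v, pivot), 0.0))
            if w > 0:                    # attraction: join pivot's cluster
                cluster.append(v)
                unassigned.remove(v)
        clusters.append(cluster)
    return clusters
```

Note that the number of clusters emerges from the signs of the weights, which is what makes the formulation attractive for graph partitioning.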



Sketch the Common

Matlab code implementing the sketching part of Shai Bagon, Or Brostovsky, Meirav Galun and Michal Irani, "Detecting and Sketching the Common," CVPR 2010.

[project page] [github]

Matlab Wrappers


Matlab wrapper for Veksler, Boykov, Zabih and Kolmogorov's implementation of the Graph Cut algorithm. Please use the following citation if you use this software. A simple example of image segmentation using graph cuts is included.
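
As background: binary-label MRF energies of the kind this wrapper handles reduce to an s-t minimum cut, which is found via maximum flow. The tiny solver below (Edmonds-Karp on a dense capacity matrix) only illustrates the reduction target; it is not the Boykov-Kolmogorov algorithm the library implements.

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp max-flow (= min-cut value) on a dense capacity matrix."""
    n = len(cap)
    flow = [[0] * n for _ in range(n)]
    total = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:      # no augmenting path: flow is maximal
            return total
        # bottleneck capacity along the path
        v, bottleneck = t, float("inf")
        while v != s:
            u = parent[v]
            bottleneck = min(bottleneck, cap[u][v] - flow[u][v])
            v = u
        # augment, maintaining skew symmetry for residual edges
        v = t
        while v != s:
            u = parent[v]
            flow[u][v] += bottleneck
            flow[v][u] -= bottleneck
            v = u
        total += bottleneck
```

By the max-flow/min-cut theorem, the returned value equals the minimum cut separating s from t, i.e., the minimum of the corresponding binary energy.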



Robust P^n

Matlab wrapper for Lubor Ladicky, Pushmeet Kohli and Philip Torr's Minimizing Robust Higher Order Potentials using Move Making Algorithms. This software is for research purposes only; use the following citations in any resulting publication.
Note: this wrapper adds the ability to vary the weights of the nodes participating in a higher-order potential, as described in the technical report.



Approximate Nearest Neighbors

Matlab class providing an interface to the ANN (Approximate Nearest Neighbors) library of David Mount and Sunil Arya.
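
For context, the interface such a wrapper exposes is k-nearest-neighbor search. The brute-force linear scan below only shows that interface; the ANN library accelerates it with kd-trees/BD-trees and an optional approximation factor. Names here are illustrative.

```python
import heapq
import math

def knn(points, query, k):
    """Exact k-nearest-neighbor search by linear scan.

    points: list of coordinate tuples; query: coordinate tuple.
    Returns the indices of the k points closest to the query,
    ordered from nearest to farthest.
    """
    return heapq.nsmallest(k, range(len(points)),
                           key=lambda i: math.dist(points[i], query))
```

The exact scan is O(n) per query; tree-based (approximate) search trades a small error tolerance for much faster queries in low dimensions.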



EDISON mean-shift Segmentation

Matlab wrapper for EDISON mean-shift image segmentation.
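
For readers unfamiliar with mean shift: each point is iteratively moved to the mean of its neighborhood until it settles on a mode of the density, and points sharing a mode form a segment. EDISON does this in a joint spatial-color feature space; the flat-kernel sketch below only illustrates the core update rule, and its names are illustrative.

```python
import numpy as np

def mean_shift_modes(points, bandwidth, iters=30):
    """Flat-kernel mean shift: repeatedly replace every point by the
    mean of all points within `bandwidth` of it. Points that converge
    to the same mode belong to the same cluster/segment.

    points: (n, d) array of feature vectors.
    """
    x = np.asarray(points, dtype=float).copy()
    for _ in range(iters):
        # pairwise distances define each point's flat-kernel neighborhood
        dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
        masks = dists < bandwidth
        x = np.stack([x[m].mean(axis=0) for m in masks])
    return x
```

In image segmentation the feature vectors would be (row, col, L, u, v) per pixel, so spatially close pixels with similar color converge to the same mode.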