Aspects of visual intelligence

Atoms of recognition

This project explores the recognition of tiny image patches of objects, object parts and object interactions, which we call MIRCs (MInimal Recognizable Configurations). These images are minimal in the sense that any cropped or down-scaled sub-image of them is no longer recognizable to human observers. The project addresses both the psychophysical aspects of this capacity and the computational recognition mechanisms that could support it.
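The minimality criterion above can be sketched in code. This is an illustration under stated assumptions, not the project's actual procedure: the names `descendants`, `is_minimal` and `is_recognizable` are hypothetical, the 0.8 size factor is an arbitrary choice, and `is_recognizable` stands in for a human recognition test that the text does not specify.

```python
import numpy as np

def descendants(patch, factor=0.8):
    """Return four corner crops and one down-scaled copy of a 2D patch.

    `factor` (assumed value) is the side length of each descendant
    relative to the parent patch.
    """
    h, w = patch.shape
    ch = max(1, int(round(h * factor)))
    cw = max(1, int(round(w * factor)))
    crops = [
        patch[:ch, :cw],          # top-left crop
        patch[:ch, w - cw:],      # top-right crop
        patch[h - ch:, :cw],      # bottom-left crop
        patch[h - ch:, w - cw:],  # bottom-right crop
    ]
    # Nearest-neighbour down-scaling: same field of view, fewer samples.
    rows = np.linspace(0, h - 1, ch).round().astype(int)
    cols = np.linspace(0, w - 1, cw).round().astype(int)
    reduced = patch[np.ix_(rows, cols)]
    return crops + [reduced]

def is_minimal(patch, is_recognizable, factor=0.8):
    """True if the patch is recognizable but none of its descendants are."""
    if not is_recognizable(patch):
        return False
    return not any(is_recognizable(d) for d in descendants(patch, factor))
```

In practice `is_recognizable` would be backed by behavioral data, for example a threshold on the fraction of observers who name the patch correctly; that threshold is likewise an assumption here.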
What takes the brain so long: Object recognition at the level of minimal images develops for up to seconds of presentation time

Rich empirical evidence has shown that visual object recognition in the brain is fast and effortless, with relevant brain signals reported to start as early as 80 ms. Here we study the time trajectory of the recognition process at the level of minimal recognizable images (termed MIRCs). Subjects were assigned to one of nine exposure conditions: 50 up to 2000 ms with or without masking, and unlimited time. Response time after presentation was not limited. The results show that in the masked conditions, recognition rates develop gradually over an extended period, e.g. an average of 18% for 200 ms exposure and 45% for 500 ms, and keep increasing significantly with longer exposure, even beyond 2 s. When presented for unlimited time (until response), MIRC recognition rates were equivalent to the rates of full-object images presented for 50 ms followed by masking. What takes the brain so long to recognize such images? We discuss why processes involving eye movements, perceptual decision-making and pattern completion are unlikely explanations. Instead, we hypothesize that MIRC recognition requires an extended top-down process complementing the feed-forward phase.
Scene understanding

Visual scene understanding is a fundamental cognitive ability, which involves integration of multiple processes at different levels, including depth and spatial relations, object detection and recognition of actions and interactions.
Asked a question such as "What happens in this image?", a child can give an answer like "A boy was climbing on a tree trunk to help a small kitten down, while two other children and a dog were watching."
We are interested in understanding the computational mechanisms underlying this remarkable cognitive ability in humans, and aim to develop computational models based on these mechanisms towards artificial human-level scene understanding.
Variable resolution

Recognizing and segmenting out small objects and object parts against a cluttered background is challenging for current visual models, including deep neural networks. In contrast, humans master this ability once they fixate on the target object. In this project we explore a variable resolution model for object recognition, inspired by the variable resolution of the human retina. Given restricted computational resources, the model acquires visual information at high resolution around the fixation point, at the expense of lower resolution in the periphery. We evaluate the efficiency of the model by comparing its performance with that of an alternative constant resolution model. We test the model's ability to 'fixate' on the target by applying the model iteratively to a set of test images, and compare the results with human fixation trajectories for the same image stimuli.
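One simple way to picture the resolution trade-off is a stack of concentric windows around the fixation point, each covering a wider field of view with the same number of samples. The sketch below is only an illustration under that assumption; the function name `foveate`, the window size and the number of levels are hypothetical and are not taken from the project's model.

```python
import numpy as np

def foveate(image, fix_row, fix_col, size=16, levels=3):
    """Sample `levels` concentric windows centred on the fixation point.

    Level k spans size * 2**k pixels per side but keeps only
    size x size samples, so resolution halves at each level.
    Plain subsampling is used (no anti-aliasing) to keep the sketch short.
    """
    h, w = image.shape
    channels = []
    for k in range(levels):
        step = 2 ** k
        offsets = (np.arange(size) - size // 2) * step
        # Clamp coordinates so windows near the border stay inside the image.
        rows = np.clip(fix_row + offsets, 0, h - 1)
        cols = np.clip(fix_col + offsets, 0, w - 1)
        channels.append(image[np.ix_(rows, cols)])
    return np.stack(channels)

image = np.arange(64 * 64, dtype=float).reshape(64, 64)
sampled = foveate(image, fix_row=32, fix_col=32)
```

With these assumed settings the variable resolution input holds 3 x 16 x 16 = 768 samples for a 64 x 64 field of view, against 4096 samples at full constant resolution. A constant resolution baseline with the same budget would have to subsample the whole image uniformly, losing detail at the fixation point.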
To Which Out-Of-Distribution Object Orientations Are DNNs Capable of Generalizing?

The capability of Deep Neural Networks (DNNs) to recognize objects in orientations outside the distribution of the training data, i.e. out-of-distribution (OoD) orientations, is not well understood. For humans, behavioral studies have shown that recognition accuracy varies across OoD orientations, with generalization much better for some orientations than for others. In contrast, for DNNs, it remains unknown how generalization abilities are distributed among OoD orientations. In this paper, we investigate the limitations of DNNs' generalization capacities by systematically inspecting patterns of success and failure of DNNs across OoD orientations. We use an intuitive and controlled, yet challenging learning paradigm, in which some instances of an object category are seen at only a few geometrically restricted orientations, while other instances are seen at all orientations. The effect of data diversity is also investigated by increasing the number of instances seen at all orientations in the training set. We present a comprehensive analysis of DNNs' generalization abilities and limitations for representative architectures (ResNet, Inception, DenseNet and CORnet). Our results reveal an intriguing pattern: DNNs are only capable of generalizing to instances of objects that appear like 2D, i.e. in-plane, rotations of in-distribution orientations.
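To make the notion of an in-plane rotation concrete, the following sketch checks whether a new orientation differs from a seen one only by a rotation about the viewing axis. It is a geometric illustration, not the paper's analysis: the camera is assumed to look along the z axis, orientations are assumed to be 3x3 rotation matrices in camera coordinates, and the helper names are hypothetical.

```python
import numpy as np

def rot_x(angle):
    """Rotation about the x axis (an out-of-plane rotation for a z-axis camera)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_z(angle):
    """Rotation about the z axis (the assumed viewing axis, i.e. in-plane)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def is_in_plane_rotation(r_seen, r_new, tol=1e-8):
    """True if r_new equals rot_z(theta) @ r_seen for some angle theta."""
    relative = r_new @ r_seen.T  # rotation taking the seen pose to the new one
    z_axis = np.array([0.0, 0.0, 1.0])
    # A rotation about z leaves the z axis fixed: last row and column are (0, 0, 1).
    return bool(np.allclose(relative[2], z_axis, atol=tol)
                and np.allclose(relative[:, 2], z_axis, atol=tol))

seen = rot_x(0.3)
in_plane = rot_z(1.0) @ seen      # image rotates in the picture plane
out_of_plane = rot_x(1.0) @ seen  # object turns to reveal a different view
```

Under these assumptions, `in_plane` is the kind of OoD orientation the abstract describes as appearing like a 2D rotation of an in-distribution orientation, while `out_of_plane` is not.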
Adaptive parts model project

This project aims to explore the adaptation of a visual recognition system in a dynamically changing environment. Given an initial model of an object category from a certain viewing direction, we suggest a mechanism to extend recognition capabilities to new viewing directions, by observing unlabeled natural video streams.
