Scene Gist Categorization from Perceptual Relations

Humans have a remarkable ability to comprehend visual scenes rapidly and accurately. Whether we quickly change television channels, browse photo albums, or simply try to cross the road, our visual system works with superb efficiency, accuracy, and speed to extract the meaning of each scene. Examine the rapid sequence of scenes on the right. Is it fair to say that you can indeed grasp the gist of most of them?

But what characterizes the visual processing underlying this categorization process? In this project we focus on one aspect of this question: prior knowledge about the perceptual relations between the different scene categories. To date, computational algorithms for scene categorization rarely consider the possible effect of such perceptual relations. However, even intuitively, when our visual system observes a bedroom scene for a fraction of a second and "deliberates" how to categorize it, what possibly comes to mind in addition to "bedroom" are classes like "living room" or "kitchen". It appears as if our visual system does not even consider possibilities such as "coast" or "highway", or more generally, scenes which are perceptually "distant" from the observed reference class. Put differently, prior knowledge about the perceptual relations between the different categories of scenes may facilitate better, more efficient, and faster categorization.
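The intuition above can be illustrated with a minimal sketch. It is not the method from the papers: the class list, the distance values, the threshold, and the function names are all made up for illustration. It assumes a table of pairwise perceptual distances is already available and uses it to discard candidate classes that are far from the classifier's leading guess.

```python
# Hypothetical illustration only: classes and distances are invented.
CLASSES = ["bedroom", "living room", "kitchen", "coast", "highway"]

# Symmetric perceptual distances in [0, 1]; smaller means more similar.
DISTANCE = {
    ("bedroom", "living room"): 0.20,
    ("bedroom", "kitchen"): 0.30,
    ("living room", "kitchen"): 0.25,
    ("coast", "highway"): 0.60,
    ("bedroom", "coast"): 0.90,
    ("bedroom", "highway"): 0.95,
    ("living room", "coast"): 0.90,
    ("living room", "highway"): 0.90,
    ("kitchen", "coast"): 0.95,
    ("kitchen", "highway"): 0.90,
}


def perceptual_distance(a, b):
    """Look up the (symmetric) distance between two classes."""
    if a == b:
        return 0.0
    if (a, b) in DISTANCE:
        return DISTANCE[(a, b)]
    return DISTANCE[(b, a)]


def candidate_classes(reference, max_distance=0.5):
    """Classes perceptually close enough to the reference class."""
    return [c for c in CLASSES
            if perceptual_distance(reference, c) <= max_distance]


def restrict_scores(scores, max_distance=0.5):
    """Keep only classes near the top-scoring class and renormalize."""
    reference = max(scores, key=scores.get)
    kept = {c: scores[c] for c in candidate_classes(reference, max_distance)}
    total = sum(kept.values())
    return {c: s / total for c, s in kept.items()}
```

With these invented numbers, a scene whose leading guess is "bedroom" would only be compared against "bedroom", "living room", and "kitchen", while "coast" and "highway" are never considered.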




Online Experiment

Play and Contribute

Formalizing the basic idea described above requires extracting the perceptual relations between scene classes. While we use controlled lab experiments to collect these data for a small number of classes (see papers), doing the same with larger sets becomes infeasible without an enormous number of subjects, something that may only be possible by harnessing the power of the web. We therefore invite you to play our perceptual game and contribute to this data collection in a Pair-Matching Categorization (PMC) experiment. Please click on the image on the left to begin. Please note that the web-based experiment requires a web browser and just 4-5 minutes of your time. (Unfortunately, at this point the software is supported on Windows platforms only.) We greatly appreciate your participation and thank you for helping with the collection of these data.




Papers and Presentations


Who and Where...
This research is joint work by Ilan Kadar and Ohad Ben-Shahar of the Computer Science Department, Ben-Gurion University of The Negev, Beer Sheva, Israel. Different parts of it have been presented at various computational and biological vision conferences, including CVPR 2012, VSS 2011, and VSS 2012.


This work was funded in part by the European Commission in the 7th Framework Programme (CROPS GA no 246252). We also gratefully acknowledge the generous support of the Frankel fund, the Paul Ivanier Center for Robotics Research, and the Zlotowski Center for Neuroscience at Ben-Gurion University.