Unsupervised Scene “Understanding”
For very large scientific datasets with many image classes and objects, producing the ground-truth data for supervised (trained) algorithms can represent a substantial, and potentially expensive, human effort. In these situations there is scope for the use of unsupervised approaches, such as clustering, which can model collections of images and automatically summarise their content without human training.
To explore how modelling context effects clustering results, I derived several new algorithms that simultaneously cluster images and segments (super-pixels) within images. These algorithms also model collections of photos such as photo albums. Images are defined by whole-scene descriptors and the distribution of “objects” (segment clusters) within them. The images and segments are clustered using this joint representation, which is also more interpretable by people. The intuition behind this approach is that by knowing something about the type of scene (image cluster), object detection (segment clustering) can be improved. That is, we are likely to find trees in a forest. Additionally, by knowing about the distribution and co-occurrence of objects in an image, we have a better idea of the type of scene (cows and grass most likely make a rural scene).
These algorithms for unsupervised scene understanding outperform other unsupervised algorithms for segment and scene clustering. This is because of how they model context. These algorithms were even found to be competitive with state of the art supervised and semi-supervised approaches to scene understanding, as well as being scalable to larger datasets. See my ICCV paper, CVIU article and my thesis (ch. 5 & 6) for more information.