Clustering Images Over Many Datasets

Large image collections are often partitioned into distinct but related groups, such as photo albums from different environments that contain similar scenes. For example, a hiking holiday album may contain many images of forests and perhaps a few of villages, whereas a conference trip album may contain many urban scenes and images of people, with perhaps a few images of parkland. These groups, or albums, can be thought of as providing context for the images they contain.

I have formulated a latent Dirichlet allocation-like algorithm and applied it to this problem. It shares image clusters across groups (albums) but keeps the cluster proportions (mixture weights) specific to each group, thereby modelling the context of each group. As a result, the algorithm is better at finding clusters than regular mixture-model-based approaches, and is often faster on large datasets. See my thesis (ch. 4) for more information.
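
The idea of shared clusters with group-specific mixture weights can be illustrated with a toy sketch. This is not the algorithm from the thesis: it swaps in plain EM on a 1-D Gaussian mixture with a fixed variance, and the function name `fit_shared_clusters`, the even-spacing initialisation, and the made-up album data are all assumptions chosen for illustration only. Each number stands in for a one-dimensional image feature.

```python
import math


def fit_shared_clusters(albums, k, n_iter=50, var=1.0):
    """Toy EM: cluster means are shared by all albums, weights are per album.

    Assumes every album is non-empty and that the data lie close enough to
    the initial means for the likelihoods not to underflow to zero.
    """
    pooled = sorted(x for xs in albums.values() for x in xs)
    lo, hi = pooled[0], pooled[-1]
    # Assumed initialisation: spread the shared means evenly over the data range.
    means = [lo + (hi - lo) * c / max(k - 1, 1) for c in range(k)]
    weights = {name: [1.0 / k] * k for name in albums}

    for _ in range(n_iter):
        num = [0.0] * k  # responsibility-weighted sums, pooled over all albums
        den = [0.0] * k  # total responsibility per cluster, pooled over all albums
        new_weights = {}
        for name, xs in albums.items():
            counts = [0.0] * k
            for x in xs:
                # E-step: the album's own weights act as the prior over clusters.
                lik = [
                    weights[name][c] * math.exp(-0.5 * (x - means[c]) ** 2 / var)
                    for c in range(k)
                ]
                total = sum(lik)
                for c in range(k):
                    r = lik[c] / total
                    counts[c] += r
                    num[c] += r * x
                    den[c] += r
            # M-step (per album): mixture weights are specific to this album.
            new_weights[name] = [cnt / len(xs) for cnt in counts]
        weights = new_weights
        # M-step (shared): every album contributes to the same cluster means.
        means = [num[c] / den[c] if den[c] > 0 else means[c] for c in range(k)]
    return means, weights


# Made-up data: low values play the role of "forest" images, high values "urban".
albums = {
    "hiking": [0.1, -0.2, 0.0, 0.3, -0.1, 10.2],
    "conference": [9.8, 10.1, 10.0, 9.9, 0.2],
}
means, weights = fit_shared_clusters(albums, k=2)
```

In this sketch both albums contribute to the same two cluster means, while `weights["hiking"]` and `weights["conference"]` differ, which is the sense in which the per-group weights capture each album's context.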

Album clusters