Doctoral defence: Tarun Khajuria “Scene understanding in human and computer vision”

Tarun Khajuria
  • 28 Apr 2026
  • 10:15–13:15
  • Delta Study Building (Narva mnt 18-1017), and online
  • UT Institute of Computer Science
Doctoral defence

On 28 April at 10:15, Tarun Khajuria will defend his doctoral thesis “Scene understanding in human and computer vision” to obtain the degree of Doctor of Philosophy (in Computer Science).

Supervisor
Assoc. Prof. Jaan Aru, University of Tartu

Opponents
Prof. Tim Kietzmann, University of Osnabrück (Germany)
Assist. Prof. Stephane Deny, Aalto University (Finland)

Summary
Humans have the ability to flexibly interpret the same visual scene in multiple ways. For example, in a cinema hall, we can identify individual seats as chairs, bean bags, or couches, while also perceiving them as part of the larger structure of rows and sections that define walkable paths. This flexibility also enhances the robustness of our visual system under challenging conditions by utilising structure to infer missing information, while also allowing us to ignore irrelevant objects in the scene to avoid spurious associations. In this way, when trying to understand a scene, human vision is not just about passively receiving information from the environment. Rather, it involves actively collecting information to make sense of the scene.

In this thesis, we explored the similarities and differences between human and machine vision in terms of active understanding of the scene. For this, we first designed a challenging vision task in which objects are hidden in images inspired by star constellations. Human participants solving this task reported iteratively refining multiple hypotheses during their solving process. We then explored how this process can be replicated using computer models. Specifically, we tested a method that generates possible interpretations of objects in constellation images and examined how well it mirrors human perception. Finally, we tested many AI models to examine how they process multiple objects in natural scenes. This analysis highlighted shortcomings in how these models represent less important background objects, which can help in using the models more effectively in AI systems.

Overall, the findings of this thesis offer insights into how people make sense of uncertain visual information and suggest ways in which computer models can be tested and designed to mimic this ability. These results can contribute to our understanding of human perception and help advance artificial vision systems beyond simple pattern recognition.
