My diploma thesis: GHR-VQA
GHR-VQA: Graph-guided Hierarchical Relational Reasoning for Video Question Answering
For my diploma thesis, I worked on Video Question Answering under the co-supervision of Prof. Maragos (NTUA) and Dr. Pitsikalis (Deeplab).
After an extensive literature review, I identified the need for more lightweight representations of semantic information. I therefore transformed videos into spatio-temporal scene graphs, which offer a structured, compact, and comprehensive representation of the objects in each scene and the relations between them. By encoding these graphs into embeddings with Graph Neural Networks, we obtain rich yet compact representations of each video. These representations are then passed through a hierarchical conditional relation network to answer questions about the video.
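The graph-encoding step can be illustrated with a minimal sketch, assuming a hypothetical format where each frame's scene graph holds a list of object feature vectors plus directed relation edges. It runs one round of mean-aggregation message passing with a single weight matrix and a ReLU, then mean-pools the nodes into one embedding per frame. This is a generic GNN layer chosen for illustration, not the exact architecture used in GHR-VQA. The names `SceneGraph`, `gnn_layer`, `frame_embedding` and `video_embedding` are hypothetical, and the hierarchical conditional relation network that consumes the resulting sequence is not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


@dataclass
class SceneGraph:
    """Hypothetical per-frame scene graph: object features and (source, target) relation edges."""
    node_features: List[Vector]
    edges: List[Tuple[int, int]]


def vec_add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def vec_scale(a: Vector, s: float) -> Vector:
    return [x * s for x in a]


def mat_vec(w: Matrix, v: Vector) -> Vector:
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def mean(vectors: List[Vector]) -> Vector:
    total = vectors[0]
    for v in vectors[1:]:
        total = vec_add(total, v)
    return vec_scale(total, 1.0 / len(vectors))


def gnn_layer(graph: SceneGraph, weight: Matrix) -> List[Vector]:
    """One message-passing round: h_i' = ReLU(W * mean(h_j for j in incoming(i) plus i itself))."""
    incoming: List[List[Vector]] = [[] for _ in graph.node_features]
    for src, dst in graph.edges:
        incoming[dst].append(graph.node_features[src])
    updated = []
    for i, feat in enumerate(graph.node_features):
        aggregated = mean(incoming[i] + [feat])  # self-loop keeps the node's own feature
        updated.append([max(0.0, x) for x in mat_vec(weight, aggregated)])
    return updated


def frame_embedding(graph: SceneGraph, weight: Matrix) -> Vector:
    """Mean-pool the updated node features into a single frame-level vector."""
    return mean(gnn_layer(graph, weight))


def video_embedding(frames: List[SceneGraph], weight: Matrix) -> List[Vector]:
    """Encode each frame's scene graph; the ordered list is the video's spatio-temporal representation."""
    return [frame_embedding(g, weight) for g in frames]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    frames = [
        SceneGraph(node_features=[[1.0, 0.0], [0.0, 1.0]], edges=[(0, 1)]),
        SceneGraph(node_features=[[2.0, -1.0]], edges=[]),
    ]
    print(video_embedding(frames, identity))
```

With an identity weight matrix, the first frame yields `[0.75, 0.25]` and the second `[2.0, 0.0]`, because the ReLU clips the negative feature. A trained model would learn the weights, typically stack several layers, and may also encode relation types on the edges.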
Compared to state-of-the-art methods, our approach achieves higher accuracy across benchmarks and produces more semantically reasonable results.
This work was submitted to the 33rd European Signal Processing Conference (EUSIPCO 2025) and the pre-print can be found here.