2025
Manojlovska, Anastasija; Ramachandra, Raghavendra; Spathoulas, Georgios; Struc, Vitomir; Grm, Klemen: Interpreting Face Recognition Templates using Natural Language Descriptions. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACV-W) 2025, pp. 1-10, Tucson, USA, 2025.

Tags: CLIP, explainability, face recognition, natural language, symbolic representations, xai

Abstract: Explainable artificial intelligence (XAI) aims to ensure that an AI system's decisions are transparent and understandable to humans, which is particularly important in sensitive application areas such as surveillance, security, and law enforcement. In these and related areas, understanding the internal mechanisms that govern the decision-making of AI-based systems can increase trust and, consequently, user acceptance. While various methods have been developed to provide insights into the behavior of AI-based models, solutions capable of explaining different aspects of these models in natural language remain limited in the literature. In this paper, we therefore propose a novel approach for interpreting the information content encoded in face templates produced by state-of-the-art (SOTA) face recognition models. Specifically, we utilize the text encoder of the Contrastive Language-Image Pretraining (CLIP) model to generate natural language descriptions of the face attributes present in the face templates. We implement two versions of our approach: one with the off-the-shelf CLIP text encoder and one with a version fine-tuned on the VGGFace2 and MAADFace datasets. Our experimental results indicate that fine-tuning the text encoder under the contrastive training paradigm increases the attribute-based explainability of face recognition templates, while both models provide valuable, human-understandable insights into modern face recognition models.
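To give a concrete sense of the approach described in the abstract, the sketch below shows how a face template could be scored against natural language attribute descriptions using CLIP's text encoder. This is an illustrative approximation, not the authors' implementation: the Hugging Face transformers CLIP classes and checkpoint name are assumptions made for convenience, the attribute prompts are made up, and the linear projection mapping the face template into CLIP's embedding space is a hypothetical stand-in for whatever alignment the paper actually learns.

import torch
from transformers import CLIPTokenizer, CLIPTextModelWithProjection

# Off-the-shelf CLIP text encoder (with its projection head), analogous
# to the paper's first variant; the checkpoint name is an assumption.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
text_encoder = CLIPTextModelWithProjection.from_pretrained(
    "openai/clip-vit-base-patch32"
)

# Illustrative attribute descriptions (hypothetical prompts).
prompts = [
    "a photo of a person wearing eyeglasses",
    "a photo of a person with a beard",
    "a photo of a young person",
]

# Embed the descriptions in CLIP's joint text-image space and normalize.
inputs = tokenizer(prompts, padding=True, return_tensors="pt")
with torch.no_grad():
    text_embeds = text_encoder(**inputs).text_embeds  # shape: (3, 512)
text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)

# Placeholder for a 512-d template from a face recognition model; the
# linear map into CLIP's space is hypothetical and would be learned.
face_template = torch.randn(1, 512)
projection = torch.nn.Linear(512, 512)
face_embed = projection(face_template)
face_embed = face_embed / face_embed.norm(dim=-1, keepdim=True)

# Cosine similarity ranks which descriptions the template supports.
scores = (face_embed @ text_embeds.T).squeeze(0)
for prompt, score in sorted(zip(prompts, scores.tolist()), key=lambda p: -p[1]):
    print(f"{score:+.3f}  {prompt}")

In a full pipeline, the projection (or the text encoder itself, as in the paper's fine-tuned variant) would be trained contrastively on attribute-annotated data such as VGGFace2/MAADFace, so that templates and matching attribute descriptions end up close together in the shared embedding space.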