2025
Soltandoost, Elahe; Plesh, Richard; Schuckers, Stephanie; Peer, Peter; Štruc, Vitomir: Extracting Local Information from Global Representations for Interpretable Deepfake Detection. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision - Workshops (WACV-W) 2025, pp. 1-11, Tucson, USA, 2025.

Tags: CNN, deepfake DAD, deepfakes, faceforensics++, media forensics, xai

Abstract: The detection of deepfakes has become increasingly challenging due to the sophistication of manipulation techniques that produce highly convincing fake videos. Traditional detection methods often lack transparency and provide limited insight into their decision-making processes. To address these challenges, we propose a Locally-Explainable Self-Blended (LESB) DeepFake detector that, in addition to the final fake-vs-real classification decision, also indicates which local facial region (i.e., eyes, mouth, or nose) contributed most to the decision. At the heart of the detector is a novel Local Feature Discovery (LFD) technique that can be applied to the embedding space of pretrained DeepFake detectors and identifies embedding-space directions that encode variations in the appearance of local facial features. We demonstrate the merits of the proposed LFD technique and LESB detector in comprehensive experiments on four popular datasets, i.e., Celeb-DF, the DeepFake Detection Challenge, Face Forensics in the Wild, and FaceForensics++, and show that the proposed detector is not only competitive with strong baselines but also exhibits enhanced transparency in the decision-making process by providing insights into the contribution of local face parts to the final detection decision.
2024
Dragar, Luka; Rot, Peter; Peer, Peter; Štruc, Vitomir; Batagelj, Borut: W-TDL: Window-Based Temporal Deepfake Localization. In: Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing (MRAC ’24), held with the 32nd ACM International Conference on Multimedia (MM’24), ACM, 2024.

Tags: CNN, deepfake DAD, deepfakes, deeplearning, detection, localization

Abstract: The quality of synthetic data has advanced to such a degree of realism that distinguishing it from genuine data samples is increasingly challenging. Deepfake content, including images, videos, and audio, is often used maliciously, necessitating effective detection methods. While numerous competitions have propelled the development of deepfake detectors, a significant gap remains in accurately pinpointing the temporal boundaries of manipulations. To address this, we propose an approach for temporal deepfake localization (TDL) that combines a window-based method for audio (W-TDL) with a complementary frame-based visual model. Our contributions include an effective method for detecting and localizing fake video and audio segments, along with a strategy for handling unbalanced training labels in spoofed audio datasets. Our approach leverages the EVA visual transformer for frame-level analysis and a modified TDL method for audio, achieving competitive results in the 1M-DeepFakes Detection Challenge. Comprehensive experiments on the AV-Deepfake1M dataset demonstrate the effectiveness of our method in detecting and localizing deepfake manipulations.