2024
Dragar, Luka; Rot, Peter; Peer, Peter; Štruc, Vitomir; Batagelj, Borut. W-TDL: Window-Based Temporal Deepfake Localization. Proceedings Article. In: Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing (MRAC ’24), Proceedings of the 32nd ACM International Conference on Multimedia (MM’24), ACM, 2024.

Tags: CNN, deepfake DAD, deepfakes, deeplearning, detection, localization

Abstract: The quality of synthetic data has advanced to such a degree of realism that distinguishing it from genuine data samples is increasingly challenging. Deepfake content, including images, videos, and audio, is often used maliciously, necessitating effective detection methods. While numerous competitions have propelled the development of deepfake detectors, a significant gap remains in accurately pinpointing the temporal boundaries of manipulations. Addressing this, we propose an approach for temporal deepfake localization (TDL) utilizing a window-based method for audio (W-TDL) and a complementary visual frame-based model. Our contributions include an effective method for detecting and localizing fake video and audio segments and addressing unbalanced training labels in spoofed audio datasets. Our approach leverages the EVA visual transformer for frame-level analysis and a modified TDL method for audio, achieving competitive results in the 1M-DeepFakes Detection Challenge. Comprehensive experiments on the AV-Deepfake1M dataset demonstrate the effectiveness of our method, providing an effective solution to detect and localize deepfake manipulations.