2026 |
Larue, Nicolas; Štruc, Vitomir; Peer, Peter; Vu, Ngoc-Son Learning the Manifold of Authenticity: Hybrid-Curvature Representation Learning for Generalizable Deepfake Detection Journal Article In: IEEE Access, pp. 1–14, 2026, ISSN: 2169-3536. Tags: deep learning, deepfake, deepfake DAD, deepfake detection, hyperbolic learning, media forensics. Abstract: The practical utility of deepfake detectors is crippled by a crisis of generalization: models that perform well on known manipulation techniques consistently fail when faced with unseen forgeries. We argue this failure stems from a fundamental geometric mismatch. Existing methods implicitly assume that the manifold of authentic faces can be modeled in a space of uniform curvature, typically Euclidean, which inadequately captures the complex, multi-scale structure of facial features. This paper validates the hypothesis that authentic faces lie on a manifold whose geometry is inherently hybrid, requiring both angular compactness (a spherical property) and hierarchical organization (a hyperbolic property). To resolve this geometric mismatch, we introduce a novel detector, CTrue, that learns a unified, hybrid-curvature representation of facial authenticity. Trained exclusively on real faces via self-supervised learning, our method simultaneously projects facial embeddings onto two complementary manifolds: a hypersphere to enforce compactness and a hyperbolic space to model the natural feature hierarchy. A single set of mathematically optimal prototypes acts as a “geometric bridge”, unifying the learning objectives in both spaces. At inference, a composite score measures an embedding’s deviation from this learned manifold.
On challenging cross-dataset and cross-manipulation benchmarks, our method achieves competitive generalization under a strictly pristine-only training setting, showing that hybrid-curvature representations provide an effective and data-efficient alternative for deepfake detection. |
2025 |
Batagelj, Borut; Kronovšek, Andrej; Štruc, Vitomir; Peer, Peter Robust cross-dataset deepfake detection with multitask self-supervised learning Journal Article In: ICT Express, pp. 1-5, 2025. Tags: deepfake, deepfake DAD, deepfake detection, multi-task learning, segmentation. Abstract: Deepfake detection is increasingly critical due to the rise of manipulated media. Existing methods often require extensive datasets and struggle with interpretability issues. To address these issues, this study introduces a novel one-class approach for detecting and localizing deepfake artifacts in videos, using authentic images to generate manipulated data for training. By integrating segmentation and leveraging convolutional neural networks with visual transformers, the method predicts both the presence and location of the generated manipulations. Experiments on seven deepfake datasets and emerging diffusion-based manipulations show that our approach consistently outperforms existing methods, demonstrating superior accuracy and localization capabilities. |
Soltandoost, Elahe; Plesh, Richard; Schuckers, Stephanie; Peer, Peter; Štruc, Vitomir Extracting Local Information from Global Representations for Interpretable Deepfake Detection Proceedings Article In: Proceedings of IEEE/CVF Winter Conference on Applications of Computer Vision - Workshops (WACV-W) 2025, pp. 1-11, Tucson, USA, 2025. Tags: CNN, deepfake DAD, deepfakes, faceforensics++, media forensics, xai. Abstract: The detection of deepfakes has become increasingly challenging due to the sophistication of manipulation techniques that produce highly convincing fake videos. Traditional detection methods often lack transparency and provide limited insight into their decision-making processes. To address these challenges, we propose in this paper a Locally-Explainable Self-Blended (LESB) DeepFake detector that, in addition to the final fake-vs-real classification decision, also provides information on which local facial region (i.e., eyes, mouth or nose) contributed the most to the decision process. At the heart of the detector is a novel Local Feature Discovery (LFD) technique that can be applied to the embedding space of pretrained DeepFake detectors and allows identifying embedding space directions that encode variations in the appearance of local facial features. We demonstrate the merits of the proposed LFD technique and LESB detector in comprehensive experiments on four popular datasets, i.e., Celeb-DF, DeepFake Detection Challenge, Face Forensics in the Wild and FaceForensics++, and show that the proposed detector is not only competitive in comparison to strong baselines, but also exhibits enhanced transparency in the decision-making process by providing insights on the contribution of local face parts in the final detection decision. |
2024 |
Dragar, Luka; Rot, Peter; Peer, Peter; Štruc, Vitomir; Batagelj, Borut W-TDL: Window-Based Temporal Deepfake Localization Proceedings Article In: Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing (MRAC ’24), Proceedings of the 32nd ACM International Conference on Multimedia (MM’24), ACM, 2024. Tags: CNN, deepfake DAD, deepfakes, deeplearning, detection, localization. Abstract: The quality of synthetic data has advanced to such a degree of realism that distinguishing it from genuine data samples is increasingly challenging. Deepfake content, including images, videos, and audio, is often used maliciously, necessitating effective detection methods. While numerous competitions have propelled the development of deepfake detectors, a significant gap remains in accurately pinpointing the temporal boundaries of manipulations. Addressing this, we propose an approach for temporal deepfake localization (TDL) utilizing a window-based method for audio (W-TDL) and a complementary visual frame-based model. Our contributions include an effective method for detecting and localizing fake video and audio segments and addressing unbalanced training labels in spoofed audio datasets. Our approach leverages the EVA visual transformer for frame-level analysis and a modified TDL method for audio, achieving competitive results in the 1M-DeepFakes Detection Challenge. Comprehensive experiments on the AV-Deepfake1M dataset demonstrate the effectiveness of our method, providing an effective solution to detect and localize deepfake manipulations. |