Department of Computer Science and Engineering, Sipna College of Engineering and Technology Amravati, Maharashtra, India.
World Journal of Advanced Research and Reviews, 2026, 30(02),1140-1151
Article DOI: 10.30574/wjarr.2026.30.2.1169
Received on 22 March 2026; revised on 06 May 2026; accepted on 09 May 2026
The rapid proliferation of synthetic media, colloquially known as "deepfakes," driven by advanced Generative Adversarial Networks (GANs) and diffusion models, presents a formidable challenge to digital forensics, personal privacy, and societal trust. While Convolutional Neural Networks (CNNs) have historically served as the cornerstone for detecting such manipulations, they frequently generalize poorly to unseen manipulation algorithms and lack robustness against real-world distortions. This paper introduces DeepShield, an industry-grade, full-stack deepfake detection web application powered by a fine-tuned SigLIP2 (Sigmoid Loss for Language-Image Pre-training) vision-language encoder. Unlike traditional CNN-based approaches that rely solely on hierarchical spatial feature extraction, the proposed model uses a transformer-based architecture pre-trained with a sigmoid loss, enabling it to capture global semantic context and subtle texture inconsistencies.
The system was evaluated on the prithivMLmods/OpenDeepfake-Preview dataset, achieving an overall accuracy of 94.44%. The model achieved a precision of 97.18% for the "Fake" class and a recall of 97.34% for the "Real" class. Both figures indicate that authentic media is rarely flagged as fake, which limits false accusations in forensic scenarios. Furthermore, this research bridges the gap between theoretical modeling and practical application by implementing a user-centric forensic interface featuring an interactive Region of Interest (ROI) selector and temporal video analysis. Comparative analysis indicates that the proposed SigLIP2 model outperforms standard CNN architectures and existing Convolutional Vision Transformer (CViT) benchmarks, offering a robust, scalable solution for digital media authentication.
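The link between the two reported metrics and false accusations can be made concrete with a small sketch. Treating "Fake" as the positive class, both Fake-class precision and Real-class recall fall only when real images are flagged as fake. The class name `Confusion`, its field names, and the counts below are hypothetical and chosen for illustration; they are not the paper's confusion matrix.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion counts with "Fake" as the positive class (hypothetical helper)."""
    fake_as_fake: int  # fakes correctly flagged (true positives)
    fake_as_real: int  # fakes missed (false negatives)
    real_as_fake: int  # authentic media flagged as fake ("false accusations")
    real_as_real: int  # authentic media correctly passed (true negatives)

    def accuracy(self) -> float:
        total = (self.fake_as_fake + self.fake_as_real
                 + self.real_as_fake + self.real_as_real)
        return (self.fake_as_fake + self.real_as_real) / total

    def precision_fake(self) -> float:
        # Of everything flagged as fake, the share that really is fake.
        return self.fake_as_fake / (self.fake_as_fake + self.real_as_fake)

    def recall_real(self) -> float:
        # Of all authentic items, the share that was not flagged.
        return self.real_as_real / (self.real_as_real + self.real_as_fake)


# Illustrative counts only; NOT taken from the paper's evaluation.
cm = Confusion(fake_as_fake=90, fake_as_real=10, real_as_fake=5, real_as_real=95)

print(f"accuracy       = {cm.accuracy():.4f}")
print(f"precision Fake = {cm.precision_fake():.4f}")
print(f"recall Real    = {cm.recall_real():.4f}")
```

In this sketch, `real_as_fake` appears in the denominator of both `precision_fake` and `recall_real`, so reducing false accusations raises both figures at once, while missed fakes (`fake_as_real`) affect neither.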
Deepfake Detection; SigLIP2; Vision Transformers; Digital Forensics; Flask; Web Application; Generative Adversarial Networks
Chaitali Charandas Daware, V. K. Shandilya and N. P. Mohod. Deepfake Image Detection: From CNN to Vision Transformer. World Journal of Advanced Research and Reviews, 2026, 30(02), 1241-1255. Article DOI: https://doi.org/10.30574/wjarr.2026.30.2.1169