Cyber threat detection using voice and speech analysis

Kaosar Hossain 1, *, Sufia Zareen 2 and Sahadat Khandakar 3

1 Student, BSc in Computer Science, American International University-Bangladesh.
2 Student, Masters in Genetics, Osmania University, Hyderabad, India.
3 Student, BSc in Electrical Engineering, BRAC University, Bangladesh.
 
Research Article
World Journal of Advanced Research and Reviews, 2021, 10(03), 508-517
Article DOI: 10.30574/wjarr.2021.10.3.0254
 
Publication history: 
Received on 26 April 2021; revised on 19 June 2021; accepted on 28 June 2021
 
Abstract: 
Fraud detection in voice interactions has become an active field of research as the number of fraudulent activities conducted through this channel has increased. Current systems fail to identify fraudulent voices because conventional analysis methods do not reliably distinguish the features of ordinary speech from those of malicious speech. In this work, we examine how Vision Transformer (ViT) models, with their capacity for complex pattern recognition, can be applied to voice signals for speech fraud detection. Voiceprints serve not only for voice-based identification; they can also help systems that are otherwise unreliable at detecting when users are not who they claim to be, which is significant given the growing use of voiceprints for user verification in settings such as customer service and telecommunications. We address this by using a ViT, trained on both male and female voices, to detect subtle anomalies in speech patterns associated with fraud. The comparative analysis (Figure 18) supports the effectiveness of the proposed method: ViT achieved an accuracy of 95% and outperformed classic models such as CNN in precision, recall, and F1-score. However, the results also show signs of overfitting, in which the model performs very well on training data but fails to generalize to the validation set. In short, ViT shows potential for voice fraud detection, but further work on regularization, early stopping, and possibly data augmentation is needed to prevent overfitting and improve out-of-sample accuracy.
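As an illustration of the early-stopping remedy mentioned above, the following is a minimal sketch, not the authors' implementation. The function name `early_stopping_epoch`, the `patience` and `min_delta` parameters, and the loss values are all hypothetical and chosen only to show the idea: training halts once the validation loss has stopped improving for a fixed number of epochs, which is the point at which a model typically begins to overfit.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the epoch index at which training would stop, or None.

    val_losses: validation loss recorded after each epoch (hypothetical data).
    patience:   number of consecutive non-improving epochs tolerated.
    min_delta:  minimum decrease that counts as an improvement.
    """
    best = float("inf")
    stale = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None  # the loss kept improving; no early stop triggered


# Illustrative (made-up) validation losses: improvement, then divergence.
losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70, 0.80]
stop_at = early_stopping_epoch(losses, patience=3)
```

In practice the weights saved at the best validation epoch (index 2 in this made-up series) would be restored after stopping, so the returned epoch marks when training ends rather than which checkpoint is kept.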
 
Keywords: 
Voice-Based Fraud Detection; Vision Transformer (ViT); Machine Learning; Anomaly Detection; Overfitting; Precision and Recall; Data Augmentation; Model Generalization; Speech Signal Processing
 