FusionNet: A parallel deep learning model for speech recognition with feature clustering

Revati Harichandra Ramteke; Seema B. Rathod

doi:10.30574/wjarr.2025.28.3.4010

Revati Harichandra Ramteke ¹,* and Seema B. Rathod ²

¹ Research Scholar, MTech Computer Science and Engineering, SIPNA College of Engineering and Technology, Amravati.

² Professor, Computer Science and Engineering, SIPNA College of Engineering and Technology, Amravati.

Research Article

World Journal of Advanced Research and Reviews, 2025, 28(03), 001-008

Article DOI: 10.30574/wjarr.2025.28.3.4010

DOI url: https://doi.org/10.30574/wjarr.2025.28.3.4010

Publication history

Received on 22 October 2025; revised on 29 November 2025; accepted on 01 December 2025

Abstract

FusionNet is a parallel, hybrid deep-learning framework for speech recognition and on-device speech-to-text processing. The system is implemented as an Android application (Java/XML) and integrated with Firebase Realtime Database to support secure, user-centric data management. Audio input passes through a multi-stage preprocessing pipeline in which Mel-frequency cepstral coefficient (MFCC), spectral, and temporal features are extracted and then clustered with K-Means to group acoustically similar speech segments. The clustered representations are processed in parallel by a dual-branch architecture: a Convolutional Neural Network (CNN) that learns spectral signatures and a Bidirectional Long Short-Term Memory (BiLSTM) network that models temporal dependencies. The embeddings from the two branches are fused and classified by a Random Forest classifier, improving prediction stability in noisy or accent-variable conditions.

To enhance semantic clarity, an NLP engine supported by a generative AI model refines the raw transcriptions, corrects contextual errors, and extracts user intent. Real-time inference runs through TensorFlow Lite (TFLite), enabling low-latency, energy-efficient execution directly on mobile hardware without cloud dependency. FusionNet demonstrates robustness to ambient noise and speaker variability and handles multilingual inputs, making it a practical and scalable solution for voice-driven applications. The hybrid architecture combines clustering, parallel deep learning, classical ML classification, and generative AI reasoning to deliver a high-accuracy speech recognition system tailored for real-world deployment.
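
The K-Means step mentioned in the abstract (grouping acoustically similar segments by their feature vectors) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy 3-dimensional vectors stand in for per-frame MFCC, spectral, and temporal features, and the function name `kmeans`, the farthest-point seeding, and the choice of `k=2` are assumptions made only for this example.

```python
import math


def kmeans(points, k, max_iterations=50):
    """Lloyd's-algorithm K-Means over equal-length feature vectors.

    Seeding is deterministic (an assumption for this sketch): start from the
    first point, then repeatedly add the point farthest from all chosen seeds.
    """
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points, key=lambda p: min(math.dist(p, c) for c in centroids)
        )
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iterations):
        # Assignment step: attach each vector to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical stand-ins for per-frame feature vectors (not real MFCCs).
frames = [
    [0.1, 0.2, 0.1], [0.0, 0.1, 0.2], [0.2, 0.0, 0.1],  # one acoustic group
    [5.0, 5.1, 4.9], [5.2, 4.8, 5.0], [4.9, 5.0, 5.1],  # a second group
]
labels, centroids = kmeans(frames, k=2)
print(labels)
```

In a pipeline like the one described, the resulting cluster labels (or the clustered vectors themselves) would be what is passed on to the CNN and BiLSTM branches; how FusionNet encodes that hand-off is not specified in the abstract.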

Keywords

Speech Recognition; FusionNet; MFCC; CNN–BiLSTM; Feature Clustering; K-Means; Random Forest; NLP; Generative AI; Speech-to-Text; On-Device AI; TensorFlow Lite; Mobile Deep Learning; Firebase Realtime Database; Multilingual Processing

Download Article PDF

https://wjarr.com/sites/default/files/fulltext_pdf/WJARR-2025-4010.pdf

How to cite this article

Revati Harichandra Ramteke and Seema B. Rathod. FusionNet: A parallel deep learning model for speech recognition with feature clustering. World Journal of Advanced Research and Reviews, 2025, 28(3), 001-008. Article DOI: https://doi.org/10.30574/wjarr.2025.28.3.4010
