Silent Speech Recognition (SSR) addresses critical communication challenges in noisy environments and for individuals with speech impairments. This research presents a novel vision-based SSR system employing a hybrid 3D Convolutional Neural Network (3D-CNN) and Bidirectional Long Short-Term Memory (Bi-LSTM) architecture. Unlike acoustic speech recognition systems, which fail in silent or noisy conditions, our approach relies exclusively on visual information from lip movements. The system integrates automated Region-of-Interest (ROI) extraction, spatiotemporal feature learning through 3D convolution, and bidirectional temporal modeling trained with Connectionist Temporal Classification (CTC) loss. Experimental validation on the GRID Corpus benchmark demonstrates superior performance, with a Word Error Rate (WER) of 17.06% and a Character Error Rate (CER) of 7.12%, representing a 44.3% improvement over traditional Hidden Markov Models and a 20.3% improvement over 2D-CNN baselines. Ablation studies confirm that 3D convolution contributes a 4.34-percentage-point improvement while bidirectional processing adds 2.14 points. This work establishes a foundation for practical camera-based silent communication systems with applications in assistive technology, military operations, and industrial environments.
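The WER and CER figures above are standard edit-distance metrics: the Levenshtein distance between the predicted and reference transcripts, normalized by reference length, computed over words for WER and over characters for CER. A minimal sketch of how such metrics are typically computed (not the authors' evaluation code):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences via dynamic
    programming, using a rolling 1-D array for O(len(hyp)) memory."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))  # distances for the empty-reference row
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i  # prev holds the diagonal cell
        for j in range(1, n + 1):
            cur = dp[j]
            if ref[i - 1] == hyp[j - 1]:
                dp[j] = prev  # tokens match: no edit needed
            else:
                # substitution (diag), deletion (above), insertion (left)
                dp[j] = 1 + min(prev, dp[j], dp[j - 1])
            prev = cur
    return dp[n]

def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character Error Rate: char-level edit distance / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

# GRID-style command sentence (illustrative example, not from the paper)
print(wer("place blue at a one now", "place blue at b one now"))
```

A reported WER of 17.06% means roughly one word in six requires a substitution, insertion, or deletion to recover the reference transcript; the lower CER reflects that many word errors differ by only a few characters.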
Silent Speech Recognition, 3D Convolutional Neural Networks, Bidirectional LSTM, Visual Speech Recognition, Deep Learning, Lip Reading
IRE Journals:
Kaushal Kumar, CP Gokul, Nakul Bharadwaj, Ujjval Sharma, Dr. Nazia Tabassum "Vision-Based Silent Speech Recognition Using Hybrid 3D-CNN and Bi-LSTM Architecture" Iconic Research And Engineering Journals Volume 9 Issue 6 2025 Page 2295-2303 https://doi.org/10.64388/IREV9I6-1713196
IEEE:
K. Kumar, C. P. Gokul, N. Bharadwaj, U. Sharma, and N. Tabassum, "Vision-Based Silent Speech Recognition Using Hybrid 3D-CNN and Bi-LSTM Architecture," Iconic Research And Engineering Journals, vol. 9, no. 6, pp. 2295-2303, 2025, doi: 10.64388/IREV9I6-1713196.