Wav2Vec Meets Conformer: A Novel Hybrid Approach for Multilingual Deepfake Audio Detection
  • Author(s): Usha Janakiraman; Priyadharshini Ambalavanan; Padmapriya S
  • Paper ID: 1712477
  • Page: 2438-2446
  • Published Date: 03-12-2025
  • Published In: Iconic Research And Engineering Journals
  • Publisher: IRE Journals
  • e-ISSN: 2456-8880
  • Volume/Issue: Volume 9 Issue 5 November-2025
Abstract

Deepfake audio refers to synthetic speech that closely mimics a real person's voice, posing risks to security and privacy. This paper proposes a hybrid detection framework combining XLS-R, a multilingual speech representation model, with the Conformer architecture, which captures both local and global audio dependencies. XLS-R extracts rich multilingual embeddings, while the Conformer leverages temporal and contextual features to distinguish genuine from AI-generated speech. Evaluation on benchmark datasets demonstrates that the proposed system achieves improved accuracy and robustness across multiple languages and acoustic conditions.
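The pipeline described in the abstract can be sketched as a two-stage model: frame-level embeddings from a multilingual speech encoder feed a Conformer stack whose pooled output is classified as genuine or spoofed. The sketch below is illustrative only, not the authors' exact configuration: the XLS-R front end is stood in for by random 1024-dimensional frame features (in practice these would come from a pretrained checkpoint such as facebook/wav2vec2-xls-r-300m), and the block sizes, kernel width, and two-block depth are assumptions.

```python
# Minimal sketch of the hybrid XLS-R + Conformer detector (illustrative
# hyperparameters; XLS-R output is replaced by a random placeholder tensor).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConformerBlock(nn.Module):
    """One Conformer block: half-step FFN -> self-attention -> conv module
    -> half-step FFN, each with a residual connection."""
    def __init__(self, d_model=256, n_heads=4, kernel_size=15):
        super().__init__()
        self.ff1 = nn.Sequential(nn.LayerNorm(d_model),
                                 nn.Linear(d_model, 4 * d_model), nn.SiLU(),
                                 nn.Linear(4 * d_model, d_model))
        self.attn_norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Convolution module: pointwise -> GLU -> depthwise -> BN -> pointwise
        self.conv_norm = nn.LayerNorm(d_model)
        self.pointwise1 = nn.Conv1d(d_model, 2 * d_model, 1)
        self.depthwise = nn.Conv1d(d_model, d_model, kernel_size,
                                   padding=kernel_size // 2, groups=d_model)
        self.bn = nn.BatchNorm1d(d_model)
        self.pointwise2 = nn.Conv1d(d_model, d_model, 1)
        self.ff2 = nn.Sequential(nn.LayerNorm(d_model),
                                 nn.Linear(d_model, 4 * d_model), nn.SiLU(),
                                 nn.Linear(4 * d_model, d_model))
        self.final_norm = nn.LayerNorm(d_model)

    def forward(self, x):                       # x: (batch, time, d_model)
        x = x + 0.5 * self.ff1(x)
        a = self.attn_norm(x)
        x = x + self.attn(a, a, a, need_weights=False)[0]
        c = self.conv_norm(x).transpose(1, 2)   # (batch, d_model, time)
        c = F.glu(self.pointwise1(c), dim=1)
        c = self.pointwise2(F.silu(self.bn(self.depthwise(c))))
        x = x + c.transpose(1, 2)
        x = x + 0.5 * self.ff2(x)
        return self.final_norm(x)

class DeepfakeDetector(nn.Module):
    """Conformer encoder over (frozen) XLS-R frame embeddings -> 2-way logits."""
    def __init__(self, embed_dim=1024, d_model=256, n_blocks=2):
        super().__init__()
        self.proj = nn.Linear(embed_dim, d_model)     # XLS-R dim -> model dim
        self.blocks = nn.Sequential(*[ConformerBlock(d_model)
                                      for _ in range(n_blocks)])
        self.head = nn.Linear(d_model, 2)             # genuine vs. spoofed

    def forward(self, embeddings):                    # (batch, time, embed_dim)
        h = self.blocks(self.proj(embeddings))
        return self.head(h.mean(dim=1))               # mean-pool over time

# Placeholder for XLS-R output: 3 clips, 50 frames, 1024-dim features each.
feats = torch.randn(3, 50, 1024)
logits = DeepfakeDetector()(feats)
print(logits.shape)  # torch.Size([3, 2])
```

Keeping the XLS-R encoder frozen and training only the Conformer stack and classifier head is a common strategy for this kind of hybrid, since the pretrained multilingual representations transfer across languages while the lightweight head adapts to the spoofing task.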

Keywords

Conformer, Deepfake Audio, Multilingual Speech Representation, XLS-R

Citations

IRE Journals:
Usha Janakiraman, Priyadharshini Ambalavanan, Padmapriya S "Wav2Vec Meets Conformer: A Novel Hybrid Approach for Multilingual Deepfake Audio Detection" Iconic Research And Engineering Journals Volume 9 Issue 5 2025 Page 2438-2446 https://doi.org/10.64388/IREV9I5-1712477

IEEE:
U. Janakiraman, P. Ambalavanan, and P. S, "Wav2Vec Meets Conformer: A Novel Hybrid Approach for Multilingual Deepfake Audio Detection," Iconic Research And Engineering Journals, vol. 9, no. 5, pp. 2438-2446, 2025. https://doi.org/10.64388/IREV9I5-1712477