Neural Machine Translation of Source Code Across Programming Languages Using Transformer Architecture

Pallavi Mahale

doi:10.64388/IREV9I12-1719036


Paper 1719036 · Published · Vol 9 · Issue 12


Subject area: Science, Engineering and Technology · Area of research: Neural network

DOI: https://doi.org/10.64388/IREV9I12-1719036

Abstract

Automatically translating source code between programming languages is a critical challenge in modern software engineering, particularly for legacy system migration and cross-platform development. This paper proposes a Transformer-based Neural Machine Translation (NMT) framework for source-to-source code translation, targeting language pairs including Python↔Java, Python↔C++, and Java↔C++. Unlike traditional rule-based transpilers, our approach leverages pre-trained code models (CodeT5+) fine-tuned on a curated multilingual parallel corpus and augmented with Abstract Syntax Tree (AST) structural embeddings to better capture code semantics. We introduce a post-processing semantic validation module that uses unit-test execution feedback and compiler signals to iteratively refine translations toward functional equivalence. Experimental evaluation on standard benchmarks (TransCoder-test, AVATAR, CodeNet) demonstrates state-of-the-art Computational Accuracy (CA@1) scores and significant reductions in compilation errors compared to baseline models. Our work addresses the open challenge of semantic preservation in neural code translation, contributing both a novel architecture and a new evaluation protocol.
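The validation idea in the abstract (compile the candidate, run unit tests, feed the errors back, retry) can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes the target language is Python so that the built-in `compile` can stand in for the compiler signal, and that unit tests are plain `assert` statements. The names `check_candidate`, `refine`, `toy_translate`, and `ca_at_1` are hypothetical, and the model is replaced by a stub. CA@1 is read here as the fraction of problems whose single returned translation passes all of its tests, which is also an assumption.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interface: (source code, feedback from the last attempt) -> candidate code.
Translator = Callable[[str, Optional[str]], str]


def check_candidate(code: str, tests: str) -> Optional[str]:
    """Return None if the candidate compiles and passes the tests, else an error message."""
    try:
        compiled = compile(code, "<candidate>", "exec")  # compiler signal
    except SyntaxError as err:
        return f"compile error: {err.msg}"
    namespace: dict = {}
    try:
        # NOTE: exec runs untrusted code; a real system would sandbox this step.
        exec(compiled, namespace)  # define the translated functions
        exec(tests, namespace)     # unit-test execution feedback
    except Exception as err:
        return f"test failure: {type(err).__name__}: {err}"
    return None


def refine(source: str, tests: str, translate: Translator,
           max_rounds: int = 3) -> Tuple[str, bool]:
    """Re-translate with error feedback until the tests pass or rounds run out."""
    feedback: Optional[str] = None
    candidate = ""
    for _ in range(max_rounds):
        candidate = translate(source, feedback)
        feedback = check_candidate(candidate, tests)
        if feedback is None:
            return candidate, True
    return candidate, False


def ca_at_1(passed: List[bool]) -> float:
    """Fraction of problems whose returned translation passed all its tests."""
    return sum(passed) / len(passed) if passed else 0.0


def toy_translate(source: str, feedback: Optional[str]) -> str:
    """Stub standing in for the model: wrong at first, fixed once it sees feedback."""
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


if __name__ == "__main__":
    java_src = "int add(int a, int b) { return a + b; }"
    code, ok = refine(java_src, "assert add(2, 3) == 5", toy_translate)
    print(ok, ca_at_1([ok]))
```

In this sketch the feedback string is the only channel between validator and translator; how the actual framework encodes compiler and test signals for the model is not specified in the abstract.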

Keywords

Neural Machine Translation, Source Code Translation, Transformer Architecture, CodeT5, Abstract Syntax Tree, Semantic Preservation, Legacy Migration

How to cite this paper

Pallavi Mahale "Neural Machine Translation of Source Code Across Programming Languages Using Transformer Architecture" Iconic Research And Engineering Journals Volume 9 Issue 12 2026 Page 2039-2047 https://doi.org/10.64388/IREV9I12-1719036

Pallavi Mahale "Neural Machine Translation of Source Code Across Programming Languages Using Transformer Architecture" Iconic Research And Engineering Journals, vol. 9, no. 12, Jun. 2026, doi: https://doi.org/10.64388/IREV9I12-1719036

Pallavi Mahale (2026). Neural Machine Translation of Source Code Across Programming Languages Using Transformer Architecture. Iconic Research And Engineering Journals, 9(12). doi: https://doi.org/10.64388/IREV9I12-1719036

Pallavi Mahale "Neural Machine Translation of Source Code Across Programming Languages Using Transformer Architecture" Iconic Research And Engineering Journals, vol. 9, no. 12, Jun. 2026. Crossref, https://doi.org/10.64388/IREV9I12-1719036

@article{1719036,
      author = {Pallavi Mahale},
      title = {Neural Machine Translation of Source Code Across Programming Languages Using Transformer Architecture},
      journal = {Iconic Research And Engineering Journals},
      year = {2026},
      volume = {9},
      number = {12},
      pages = {2039--2047},
      issn = {2456-8880},
      url = {https://www.irejournals.com/formatedpaper/1719036.pdf},
      keywords = {Neural Machine Translation, Source Code Translation, Transformer Architecture, CodeT5, Abstract Syntax Tree, Semantic Preservation, Legacy Migration},
      month = {June},
      doi = {10.64388/IREV9I12-1719036}
  }