The prevalence of source code plagiarism in educational and professional contexts has exposed the limitations of traditional detection tools that rely heavily on syntactic similarity, such as string matching and token-based comparison. This work proposes and evaluates an anti-plagiarism system based on semantic code analysis, designed to detect three distinct types of plagiarism: Type I (superficial changes), Type II (structural modifications), and Type III (logic-preserving transformations). The system combines Abstract Syntax Tree (AST) representations, graph-based similarity metrics, and supervised machine learning models to capture deep semantic relationships between code samples. Evaluation was conducted on a scraped dataset of 100 Python code pairs comprising both plagiarized and non-plagiarized samples. The proposed system achieved high classification performance, with a macro-averaged precision of 0.92, recall of 0.88, and F1-score of 0.90. AST-based analysis consistently outperformed lexical methods, particularly in identifying complex plagiarism: for Type III cases, the semantic method yielded an F1-score of 0.86, compared to 0.55 for string matching methods. Among the similarity metrics tested, Tree Edit Distance (TED) achieved the highest F1-score (0.93), whereas the combined metric vector delivered balanced performance across all classes (F1-score: 0.90). The Random Forest classifier outperformed the other machine learning and rule-based models, achieving a macro F1-score of 0.90, with a confusion matrix showing high true positive rates across all classes. These results confirm the effectiveness of semantic and structure-aware techniques in detecting varied forms of code plagiarism and highlight the value of combining graph-theoretic methods with machine learning for robust classification.
The findings support wider adoption of semantic detection systems in educational technology, software forensics, and automated code review platforms.
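To illustrate why an AST-based comparison can see through superficial edits that defeat string matching, the following is a minimal sketch and not the system evaluated in the paper. It parses two Python snippets with the standard `ast` module, renames identifiers to canonical placeholders, and compares the resulting node sequences. The helper names (`node_sequence`, `ast_similarity`) are hypothetical, and the `difflib` sequence ratio is used here only as a simple stand-in for Tree Edit Distance.

```python
import ast
from difflib import SequenceMatcher


class _Normalizer(ast.NodeTransformer):
    """Rename identifiers to v0, v1, ... in order of first appearance."""

    def __init__(self):
        self.names = {}

    def _canon(self, name):
        return self.names.setdefault(name, f"v{len(self.names)}")

    def visit_FunctionDef(self, node):
        node.name = self._canon(node.name)
        self.generic_visit(node)  # continue into arguments and body
        return node

    def visit_arg(self, node):
        node.arg = self._canon(node.arg)
        return node

    def visit_Name(self, node):
        node.id = self._canon(node.id)
        return node


def _label(node):
    # Keep the canonical identifier so variable usage patterns still matter.
    if isinstance(node, ast.Name):
        return f"Name:{node.id}"
    if isinstance(node, ast.arg):
        return f"arg:{node.arg}"
    return type(node).__name__


def node_sequence(source):
    """Parse source (comments and formatting vanish) and list node labels."""
    tree = _Normalizer().visit(ast.parse(source))
    return [_label(n) for n in ast.walk(tree)]


def ast_similarity(a, b):
    """Similarity in [0, 1]; an assumed proxy, not Tree Edit Distance."""
    matcher = SequenceMatcher(None, node_sequence(a), node_sequence(b),
                              autojunk=False)
    return matcher.ratio()


original = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
renamed = ("# sum a list\ndef add_all(values):\n    acc = 0\n"
           "    for item in values:\n        acc += item\n    return acc\n")
unrelated = "def greet(name):\n    print('hello', name)\n"

print(ast_similarity(original, renamed))
print(ast_similarity(original, unrelated))
```

Under this sketch, the renamed and commented copy normalizes to the same node sequence as the original, so a Type I style disguise is scored as identical, while the unrelated function scores lower. Detecting Type II and Type III changes would need the richer metrics and the trained classifier described above.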
Code Plagiarism Detection, Abstract Syntax Tree (AST), Semantic Analysis, Graph Similarity, Machine Learning, Random Forest, Tree Edit Distance, Code Obfuscation, Multi-class Classification, Software Forensics
IRE Journals:
Idowu Olugbenga Adewumi, Samuel Eleojo Agene, Victoria Bola Oyekunle
"Design and Evaluation of an Anti-Plagiarism System Using Semantic Code Analysis" Iconic Research And Engineering Journals Volume 9 Issue 1 2025 Page 1891-1903
IEEE:
Idowu Olugbenga Adewumi, Samuel Eleojo Agene, Victoria Bola Oyekunle
"Design and Evaluation of an Anti-Plagiarism System Using Semantic Code Analysis" Iconic Research And Engineering Journals, 9(1)