The convergence of artificial intelligence (AI), speech processing, and computer graphics has given rise to digital humans that simulate real-world interactions with increasing realism. This research focuses on the design and development of an AI-generated digital human capable of engaging in natural communication through speech, vision, and expression. Unlike conventional chatbots, the system incorporates multilingual speech-to-text and text-to-speech modules, a 3D avatar capable of lip synchronization and emotional expression, and a vision module for accessibility support. Built on a modular architecture that integrates Flask, Ollama’s Phi language model, Google Speech Recognition, YOLOv8, and Ready Player Me avatars rendered through Three.js, the system enables virtual interactions that are inclusive, expressive, and human-like. Experimental evaluation indicates improvements in conversational engagement, accessibility for visually impaired users, and the naturalness of avatar interactions. The proposed system represents a step towards highly interactive digital companions with real-world applicability in education, healthcare, customer support, and assistive technologies.
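As a concrete illustration of how such a stack can be wired together, the following is a minimal Python sketch, not the authors' implementation, of the text path: a Flask endpoint that forwards user text to a locally running Ollama Phi model over Ollama's default REST API and returns the reply for the avatar front end to speak. The route name, port, and model tag ("phi") are assumptions.

```python
# Minimal sketch of the Flask + Ollama text path described in the abstract.
# Assumes Ollama is running locally with the "phi" model pulled; the /chat
# route and port 5000 are illustrative choices, not the paper's API.
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


@app.route("/chat", methods=["POST"])
def chat():
    # Expect a JSON body like {"text": "Hello"} from the avatar front end.
    user_text = request.get_json(force=True).get("text", "")

    # Request a single (non-streamed) completion from the Phi model.
    resp = requests.post(
        OLLAMA_URL,
        json={"model": "phi", "prompt": user_text, "stream": False},
        timeout=60,
    )
    resp.raise_for_status()
    reply = resp.json()["response"]

    # The front end would pass this reply to text-to-speech and lip sync.
    return jsonify({"reply": reply})


if __name__ == "__main__":
    app.run(port=5000)
```

In a fuller pipeline, speech recognition output (e.g., from the speech_recognition library's Google recognizer) would feed this endpoint, and the reply would drive the text-to-speech and Three.js avatar modules.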
Keywords: Artificial Intelligence, Digital Human, Human-Computer Interaction, Speech Processing, Multilingual Systems, Accessibility, Virtual Avatar.
IRE Journals:
C Shalini, Bhanupriya K, Madhu Kumar A, Faiza Siddique, Soniya Komal V, "AI Generated Digital Human for Virtual Interactions", Iconic Research And Engineering Journals, Volume 9, Issue 2, 2025, pp. 840-845.
IEEE:
C Shalini, Bhanupriya K, Madhu Kumar A, Faiza Siddique, Soniya Komal V, "AI Generated Digital Human for Virtual Interactions," Iconic Research And Engineering Journals, vol. 9, no. 2, pp. 840-845, 2025.