Personalized Image-to-Audio Bedtime Story Generation Using Multimodal AI with User Profiling

Krutika Sushil Nikumbh; Dr. Prakash Kene

doi:10.64388/IREV9I11-1718331

Paper 1718331 · Published · Vol 9 · Issue 11

Subject area: Science, Engineering and Technology · Area of research: Multimodal AI and Personalized Storytelling

Abstract

Bedtime stories help children improve imagination, communication skills, and emotional connection with parents. However, generating personalized and engaging bedtime stories daily can be challenging for parents. Existing AI-based storytelling systems mainly rely on text inputs, resulting in generic narratives that lack contextual relevance and emotional adaptation. This paper presents a multimodal framework that transforms an input image into a personalized bedtime audio story using user profile attributes such as age, preferences, and mood. The system combines image understanding, generative language models, and text-to-speech techniques to produce context-aware and emotionally adaptive narratives with calming themes. The proposed approach improves storytelling by integrating visual context and personalization, making stories more engaging and meaningful. This work highlights the potential of multimodal AI in enhancing bedtime routines and supporting child well-being.
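
The three stages named in the abstract (image understanding, story generation, text-to-speech) can be pictured as a simple chain. The following is a minimal sketch under stated assumptions, not the authors' implementation: the `UserProfile` fields mirror the attributes the abstract lists, while the prompt wording, the function names, and the three stage callables are hypothetical placeholders for whatever models a real system would use.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class UserProfile:
    """Hypothetical profile; the abstract names age, preferences, and mood."""
    age: int
    preferences: List[str] = field(default_factory=list)
    mood: str = "calm"


def build_story_prompt(caption: str, profile: UserProfile) -> str:
    """Merge the image description and the profile into one prompt.

    The wording is illustrative only; the paper's prompt is not given here.
    """
    likes = ", ".join(profile.preferences) or "no stated preferences"
    return (
        f"Write a calming bedtime story for a {profile.age}-year-old child.\n"
        f"The child likes: {likes}.\n"
        f"The child's current mood: {profile.mood}.\n"
        f"Base the story on this scene: {caption}"
    )


def generate_bedtime_audio(
    image_bytes: bytes,
    profile: UserProfile,
    describe_image: Callable[[bytes], str],
    generate_text: Callable[[str], str],
    synthesize_speech: Callable[[str], bytes],
) -> bytes:
    """Chain the three stages; each callable stands in for a real model."""
    caption = describe_image(image_bytes)          # image understanding
    prompt = build_story_prompt(caption, profile)  # personalization
    story = generate_text(prompt)                  # generative language model
    return synthesize_speech(story)                # text-to-speech
```

Passing the stages in as callables keeps the sketch independent of any particular captioning, language, or speech model, so each stage can be swapped or stubbed out for testing.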

Keywords

Multimodal AI, Personalized Storytelling, Image-to-Audio, Text-to-Speech, User Profiling

How to cite this paper

Krutika Sushil Nikumbh, Dr. Prakash Kene "Personalized Image-to-Audio Bedtime Story Generation Using Multimodal AI with User Profiling" Iconic Research And Engineering Journals Volume 9 Issue 11 2026 Page 4443-4449 https://doi.org/10.64388/IREV9I11-1718331

Krutika Sushil Nikumbh, Dr. Prakash Kene "Personalized Image-to-Audio Bedtime Story Generation Using Multimodal AI with User Profiling" Iconic Research And Engineering Journals, vol. 9, no. 11, May 2026, doi: https://doi.org/10.64388/IREV9I11-1718331

Krutika Sushil Nikumbh, Dr. Prakash Kene (2026). Personalized Image-to-Audio Bedtime Story Generation Using Multimodal AI with User Profiling. Iconic Research And Engineering Journals, 9(11). doi: https://doi.org/10.64388/IREV9I11-1718331

Krutika Sushil Nikumbh, Dr. Prakash Kene "Personalized Image-to-Audio Bedtime Story Generation Using Multimodal AI with User Profiling" Iconic Research And Engineering Journals, vol. 9, no. 11, May 2026. Crossref, https://doi.org/10.64388/IREV9I11-1718331

@article{1718331,
      author = {Krutika Sushil Nikumbh and Prakash Kene},
      title = {Personalized Image-to-Audio Bedtime Story Generation Using Multimodal AI with User Profiling},
      journal = {Iconic Research And Engineering Journals},
      year = {2026},
      volume = {9},
      number = {11},
      pages = {4443--4449},
      issn = {2456-8880},
      url = {https://www.irejournals.com/formatedpaper/1718331.pdf},
      abstract = {Bedtime stories help children improve imagination, communication skills, and emotional connection with parents. However, generating personalized and engaging bedtime stories daily can be challenging for parents. Existing AI-based storytelling systems mainly rely on text inputs, resulting in generic narratives that lack contextual relevance and emotional adaptation. This paper presents a multimodal framework that transforms an input image into a personalized bedtime audio story using user profile attributes such as age, preferences, and mood. The system combines image understanding, generative language models, and text-to-speech techniques to produce context-aware and emotionally adaptive narratives with calming themes. The proposed approach improves storytelling by integrating visual context and personalization, making stories more engaging and meaningful. This work highlights the potential of multimodal AI in enhancing bedtime routines and supporting child well-being.},
      keywords = {Multimodal AI, Personalized Storytelling, Image-to-Audio, Text-to-Speech, User Profiling},
      month = may,
      doi = {10.64388/IREV9I11-1718331}
  }