Different concepts in the field of Artificial Intelligence are on the rise these days, and generating captions from given images is one of them. The ability to train a machine that, given an image, can describe its contents has applications in robotics and many other domains. The primary purpose of this paper is to propose a model that describes images and generates their captions using concepts from Deep Learning and Machine Translation. The model aims to detect the different objects within an image, recognize the relationships between them, and then generate the desired captions. The model, developed in Python, is trained on the Flickr 8K dataset to accomplish this. It was built using Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks. In addition to discussing VGG16, a CNN variant that proved useful in our use case, the paper delves into the fundamental notions of CNNs. The broader aim of this model is to help the masses: it can be used in image indexing to assist those with visual impairments, it can be integrated into social networks, and it can serve other applications as well.
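The encoder-decoder arrangement described in the abstract (a VGG16 CNN to extract image features, an LSTM to generate the caption word by word) can be sketched as follows. This is a minimal illustration, not the paper's exact architecture: the vocabulary size, caption length, and layer widths below are assumed placeholder values, and the image branch takes pre-extracted VGG16 fc2 features (4096-dimensional) rather than raw pixels.

```python
# Hedged sketch of a CNN-encoder + LSTM-decoder captioning model,
# assuming TensorFlow/Keras. All hyperparameters are illustrative.
from tensorflow.keras.layers import (Input, Dense, Dropout, Embedding,
                                     LSTM, add)
from tensorflow.keras.models import Model

VOCAB_SIZE = 5000   # assumed vocabulary size for Flickr 8K captions
MAX_LEN = 34        # assumed maximum caption length in tokens
FEAT_DIM = 4096     # dimension of VGG16 fc2 features (extracted offline)

# Image branch: VGG16 feature vector -> dense projection
img_in = Input(shape=(FEAT_DIM,), name="image_features")
img_vec = Dense(256, activation="relu")(Dropout(0.5)(img_in))

# Text branch: partial caption so far -> embedding -> LSTM
txt_in = Input(shape=(MAX_LEN,), name="caption_prefix")
txt_emb = Embedding(VOCAB_SIZE, 256, mask_zero=True)(txt_in)
txt_vec = LSTM(256)(Dropout(0.5)(txt_emb))

# Merge both branches and predict the next word of the caption
merged = add([img_vec, txt_vec])
hidden = Dense(256, activation="relu")(merged)
out = Dense(VOCAB_SIZE, activation="softmax", name="next_word")(hidden)

model = Model(inputs=[img_in, txt_in], outputs=out)
model.compile(loss="categorical_crossentropy", optimizer="adam")
```

At inference time such a model is run in a loop: the caption is seeded with a start token, the most probable next word is appended, and the loop repeats until an end token or `MAX_LEN` is reached.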
Ishaan Taneja, Sunil Maggu, "Generating Captions for Images Using Neural Networks", Iconic Research And Engineering Journals, Volume 6, Issue 12, 2023, pp. 214-218.