By: Abarna J¹, Charubala A², Abinaya S³, Matheswaran P⁴, and Atchaya N⁵
¹ Assistant Professor, Department of Computer Science and Engineering, K. Ramakrishnan College of Technology, Tiruchirappalli, Tamil Nadu, India
²⁻⁵ Student, Department of Computer Science and Engineering, K. Ramakrishnan College of Technology, Tiruchirappalli, Tamil Nadu, India
Generating descriptive text for images is a challenging task that requires advanced deep learning techniques. This paper presents a descriptive sentence generator for images that combines a convolutional neural network (CNN) with a recurrent neural network (RNN). The proposed model uses Xception, a pretrained CNN, to extract image features, which a long short-term memory (LSTM) network then processes to generate textual descriptions. The model is trained on the Flickr8k dataset, which contains roughly 8,000 images, each paired with five different textual descriptions. Pairing Xception with an LSTM covers both feature extraction and sequential data processing, leading to more accurate caption generation. The results demonstrate that the proposed model can generate grammatically correct and contextually relevant image captions. Further improvements can be achieved by training on larger datasets, such as Flickr30k, and by using more capable hardware to speed up computation. This research contributes a practical approach to automatic image captioning, which can benefit visually impaired individuals and support various academic applications.
Keywords: Convolutional neural network, deep learning, Flickr8k, LSTM, recurrent neural network, sentence generator, Xception
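
The abstract describes an LSTM that turns image features into a caption one word at a time. The sketch below illustrates one common way to set that up. It is not taken from the paper: the boundary tokens and the function names (`clean_caption`, `build_vocab`, `training_pairs`, `greedy_caption`) are all assumptions made for illustration. Each caption is split into (prefix, next word) training pairs, and a greedy loop shows how a trained model could emit a caption at inference time. The Xception feature extraction itself is omitted. Images are referred to by an id, and `predict_next` is a stand-in for the trained network.

```python
from collections import Counter

START, END = "<start>", "<end>"  # hypothetical boundary tokens, not from the paper


def clean_caption(text):
    """Lowercase, keep purely alphabetic words, and add boundary tokens."""
    words = [w for w in text.lower().split() if w.isalpha()]
    return [START] + words + [END]


def build_vocab(captions_by_image):
    """Map each word to an integer id; id 0 is reserved for padding."""
    counts = Counter(
        word
        for captions in captions_by_image.values()
        for caption in captions
        for word in clean_caption(caption)
    )
    return {word: i + 1 for i, word in enumerate(sorted(counts))}


def training_pairs(image_id, caption, vocab, max_len):
    """Yield (image_id, left-padded prefix ids, next-word id) for one caption."""
    ids = [vocab[w] for w in clean_caption(caption)]
    for t in range(1, len(ids)):
        prefix = ids[:t][-max_len:]
        padded = [0] * (max_len - len(prefix)) + prefix
        yield image_id, padded, ids[t]


def greedy_caption(predict_next, vocab, max_len):
    """Generate words one at a time until END or max_len words are produced.

    predict_next stands in for the trained model: it receives the word ids
    generated so far and returns the id of the most likely next word.
    """
    id_to_word = {i: w for w, i in vocab.items()}
    ids = [vocab[START]]
    for _ in range(max_len):
        next_id = predict_next(ids)
        if id_to_word[next_id] == END:
            break
        ids.append(next_id)
    return " ".join(id_to_word[i] for i in ids[1:])
```

In a full pipeline, each training pair would be combined with the image's feature vector, and `predict_next` would wrap a call to the trained decoder. Greedy decoding is the simplest choice here; the abstract does not say which decoding strategy the authors used.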
