Image Caption Generator Implementation

International Journal of Image Processing and Pattern Recognition
Volume: 10 | Issue: 02 | Year: 2024 | Subscription
Received: 05/06/2024 | Accepted: 06/22/2024 | Published: 10/08/2024


By: Sejal Jain, Saloni Agarwal, Sonam Gour, Sanjivani Sharma, and Shrishti Agarwal

Abstract

Image caption generation is a software technology that takes an image as input and produces a descriptive caption in text form. This technology finds application in many modern fields. For instance, automatically generating captions for medical images aids diagnosis and improves reporting efficiency, helping healthcare professionals quickly interpret complex visuals. In autonomous vehicles, image captioning enables the vehicle to understand and communicate about its surroundings, improving safety and navigation. In journalism, generated captions for news images can enhance reader comprehension and engagement. This paper provides an overview of the technologies that can be used to develop an image caption generator using the Flickr8k dataset from Kaggle. The implementation uses tools such as OpenCV, which are widely employed by leading technology companies such as Google and Microsoft. The paper also includes snapshots of the generated outputs to illustrate the model's effectiveness. The primary aim of this implementation is to gain insight into the practical use of these tools and technologies in real-world projects.
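As a rough illustration of the kind of pipeline the abstract describes, a captioning model is typically trained on cleaned, tokenized captions bounded by start/end markers, with each word mapped to an integer id. The sketch below shows that preprocessing step only; the helper names and the two sample captions are illustrative, not taken from the paper, and the actual Flickr8k loading and model code would differ.

```python
# Minimal sketch of caption preprocessing for an encoder-decoder
# captioning pipeline. Helper names and sample captions are
# hypothetical; real Flickr8k captions would be read from its
# caption file rather than hard-coded here.
import re
from collections import Counter

START, END = "startseq", "endseq"

def clean_caption(text):
    """Lowercase, keep alphabetic words only, add start/end tokens."""
    words = re.findall(r"[a-z]+", text.lower())
    return f"{START} {' '.join(words)} {END}"

def build_vocab(captions, min_count=1):
    """Map each word seen at least min_count times to an id (0 kept for padding)."""
    counts = Counter(w for c in captions for w in c.split())
    words = sorted(w for w, n in counts.items() if n >= min_count)
    return {w: i + 1 for i, w in enumerate(words)}

def encode(caption, vocab):
    """Turn a cleaned caption into the integer sequence a decoder trains on."""
    return [vocab[w] for w in caption.split() if w in vocab]

raw = ["A dog runs across the grass.", "Two children play football!"]
cleaned = [clean_caption(c) for c in raw]
vocab = build_vocab(cleaned)
seqs = [encode(c, vocab) for c in cleaned]
```

In a full system these integer sequences would feed an embedding layer and a recurrent or attention-based decoder, while a pretrained CNN supplies the image features.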


Citation:

How to cite this article: Sejal Jain, Saloni Agarwal, Sonam Gour, Sanjivani Sharma, and Shrishti Agarwal, Image Caption Generator Implementation. International Journal of Image Processing and Pattern Recognition. 2024; 10(02): -p.

How to cite this URL: Sejal Jain, Saloni Agarwal, Sonam Gour, Sanjivani Sharma, and Shrishti Agarwal, Image Caption Generator Implementation. International Journal of Image Processing and Pattern Recognition. 2024; 10(02): -p. Available from: https://journalspub.com/publication/ijippr-v10i02-11101/
