A Comprehensive Review of Convolutional Neural Network Architectures and Evolution

Volume: 12 | Issue: 1 | Year 2026 | Subscription
International Journal of Image Processing and Pattern Recognition
Received Date: 10/14/2025
Acceptance Date: 12/22/2025
Published On: 2026-02-18
First Page: 32
Last Page: 40


By: Swarnali Kundu, Biswasri Datta, Tousif Parvej, Pritam Pal, and Fakruddin Ali Ahmed.

1-4 Student, Department of Information Technology, B. P. Poddar Institute of Management & Technology, Kolkata, West Bengal, India
5 Assistant Professor, Department of Information Technology, B. P. Poddar Institute of Management & Technology, Kolkata, West Bengal, India

Abstract

Convolutional Neural Networks (CNNs) have become a foundational deep learning framework in computer vision because they can automatically extract layered, increasingly complex features directly from raw image data. Drawing inspiration from the human visual cortex, CNNs use convolutional, pooling, and activation layers to extract spatial patterns ranging from basic edges to intricate object components. This study gives an overview of the basic CNN architecture and explains its key elements, such as kernels, convolution operations, pooling strategies, activation functions, and fully connected layers. Classic CNN models such as LeNet, AlexNet, VGGNet, and GoogLeNet are examined to show how deep learning architectures have evolved in depth, computational efficiency, and feature extraction capability. These architectures have greatly advanced large-scale image classification and recognition. Despite their effectiveness, CNNs still have drawbacks, including high computational cost, the need for large labeled datasets, and limited interpretability. To serve real-time and embedded applications, future developments are expected to concentrate on lightweight architectures, greater model transparency, and improved training efficiency. Overall, CNNs remain central to contemporary artificial intelligence research because they let machines process and interpret visual data with ever-increasing accuracy and robustness.
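
As a rough illustration of the convolution, activation, and pooling stages named in the abstract, the following minimal pure-Python sketch runs one such stage on a toy image. It is not taken from the article: the function names, the 5x5 image, and the hand-picked edge kernel are all assumptions chosen for illustration, and a real CNN would learn its kernels during training.

```python
def conv2d(image, kernel):
    """Stride-1, no-padding cross-correlation (the 'convolution' used in CNN layers)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Element-wise ReLU activation: negative responses are clipped to zero."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h = len(fmap) // size
    out_w = len(fmap[0]) // size
    return [
        [
            max(fmap[i * size + a][j * size + b]
                for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy 5x5 image (assumed): dark columns on the left, bright columns on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hand-picked vertical-edge kernel (assumed; learned from data in a real CNN).
kernel = [[-1, 1],
          [-1, 1]]

features = max_pool2d(relu(conv2d(image, kernel)))
print(features)  # [[2, 0], [2, 0]]
```

The nonzero entries mark where the dark-to-bright edge lies, and pooling halves the resolution of the feature map while keeping the strongest response in each window.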

Citation:

How to cite this article: Swarnali Kundu, Biswasri Datta, Tousif Parvej, Pritam Pal, and Fakruddin Ali Ahmed. A Comprehensive Review of Convolutional Neural Network Architectures and Evolution. International Journal of Image Processing and Pattern Recognition. 2026; 12(1): 32-40p.

How to cite this URL: Swarnali Kundu, Biswasri Datta, Tousif Parvej, Pritam Pal, and Fakruddin Ali Ahmed. A Comprehensive Review of Convolutional Neural Network Architectures and Evolution. International Journal of Image Processing and Pattern Recognition. 2026; 12(1): 32-40p. Available from: https://journalspub.com/publication/ijippr/article=26340
