
Response and reactions

In this blog post, I would like to discuss how the chosen paper, "Going Deeper with Convolutions", is structured, analyzed, written, and presented. The paper was accepted at the 28th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015) and has 20,805 citations on Google Scholar as of today. It introduces a deep convolutional neural network architecture, codenamed Inception, that achieved a new state of the art for image classification and detection in the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) 2014.

In general, I found this paper to be an extraordinary piece of research in the area of large-scale image classification and detection. The paper is clearly structured into sections and properly explained with the necessary tables and figures. On the other hand, it requires readers to have at least intermediate knowledge of convolutional neural networks. It would nevertheless be an ideal starting point for those who are interested in deep convolutional architectures and want to explore how stacked Inception modules can be used to build such a deep network. The approach of the authors (Szegedy et al., 2014) is quite remarkable, as their GoogLeNet submission performs better while keeping the computational budget constant and uses 12 times fewer parameters than the winning architecture of (Krizhevsky et al., 2012).


The authors (Szegedy et al., 2014) then review the related literature and identify two major obstacles to improving the performance of deep neural networks: a larger number of parameters, which makes the enlarged network more prone to overfitting, and the increased use of computational resources. It can therefore be said that the authors reviewed the relevant literature critically and identified a significant research gap.


The main contribution of the paper is explained in two sections: the architectural details and the implementation of the GoogLeNet network. To reduce the dimensionality of the naïve Inception module, the authors introduce 1×1 convolutions that compute reductions before the expensive 3×3 and 5×5 convolutions. GoogLeNet itself is 22 layers deep (counting only layers with parameters) and is built from stacked convolution, max-pooling, Inception, average-pooling, and dropout layers. Therefore, I think the setup of the network is genuinely deep and complex, which shows that big networks can be learned while keeping the number of parameters manageable. GoogLeNet won the large-scale image classification and detection competition of ILSVRC 2014.
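To make the dimension-reduction idea concrete, here is a minimal sketch of an Inception module with 1×1 reductions, written in PyTorch. This is my own illustration, not the authors' code; the channel counts in the usage example follow the inception (3a) row of the paper's Table 1.

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Inception block with dimension reduction: 1x1 convolutions shrink
    the channel count before the costly 3x3 and 5x5 convolutions."""
    def __init__(self, in_ch, c1, c3_red, c3, c5_red, c5, pool_proj):
        super().__init__()
        self.branch1 = nn.Sequential(
            nn.Conv2d(in_ch, c1, kernel_size=1), nn.ReLU(inplace=True))
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, c3_red, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(c3_red, c3, kernel_size=3, padding=1), nn.ReLU(inplace=True))
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, c5_red, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(c5_red, c5, kernel_size=5, padding=2), nn.ReLU(inplace=True))
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, kernel_size=1), nn.ReLU(inplace=True))

    def forward(self, x):
        # All branches preserve the spatial size, so their outputs
        # can be concatenated along the channel dimension.
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)

# Channel counts from the inception (3a) stage: 192 input channels at 28x28.
block = InceptionModule(192, 64, 96, 128, 16, 32, 32)
out = block(torch.randn(1, 192, 28, 28))
print(out.shape)  # torch.Size([1, 256, 28, 28]); 64 + 128 + 32 + 32 = 256
```

The saving is easy to see on the 5×5 branch: mapping 192 channels directly to 32 with 5×5 kernels would cost 192·32·25 ≈ 154k weights, whereas reducing to 16 channels first costs 192·16 + 16·32·25 ≈ 16k.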


In my opinion, the overall structure of the paper, with its well-formed sections and proper scientific explanation and logic, works very well, and reading it has definitely increased my understanding of deep convolutional networks. However, the readership of this paper seems rather limited, as it requires an in-depth understanding of neural network algorithms.


References:

· Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems 25 (pp. 1097–1105).

· Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., … Rabinovich, A. (2014). Going Deeper with Convolutions. arXiv preprint arXiv:1409.4842.
