ImageNet Classification with Deep Convolutional Neural Networks

Abstract

We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
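The abstract's figure of 60 million parameters can be checked with simple arithmetic over the five convolutional and three fully-connected layers. The sketch below is illustrative only: the kernel sizes, channel counts, and layer widths are not stated in the abstract and are assumed here from the architecture section of the paper (including the split of some convolutional layers across two GPUs, which halves the input depth seen by each kernel). The names `CONV_LAYERS`, `FC_LAYERS`, and `count_parameters` are hypothetical.

```python
# Assumed layer shapes (not given in the abstract; taken from the paper's
# architecture description). For conv layers the third number is the input
# depth seen by EACH kernel, which is halved where the layer is split
# across two GPUs.
CONV_LAYERS = [
    # (name, kernel_height, kernel_width, input_depth_per_kernel, num_kernels)
    ("conv1", 11, 11, 3, 96),
    ("conv2", 5, 5, 48, 256),
    ("conv3", 3, 3, 256, 384),
    ("conv4", 3, 3, 192, 384),
    ("conv5", 3, 3, 192, 256),
]

FC_LAYERS = [
    # (name, inputs, outputs)
    ("fc6", 6 * 6 * 256, 4096),  # flattened output of the last pooling layer
    ("fc7", 4096, 4096),
    ("fc8", 4096, 1000),         # feeds the 1000-way softmax
]


def count_parameters():
    """Return a dict mapping layer name to weights + biases for that layer."""
    counts = {}
    for name, kh, kw, depth, kernels in CONV_LAYERS:
        counts[name] = kh * kw * depth * kernels + kernels  # one bias per kernel
    for name, n_in, n_out in FC_LAYERS:
        counts[name] = n_in * n_out + n_out                 # one bias per output
    return counts


if __name__ == "__main__":
    counts = count_parameters()
    total = sum(counts.values())
    fc_total = sum(counts[name] for name, _, _ in FC_LAYERS)
    for name, n in counts.items():
        print(f"{name}: {n:,}")
    print(f"total: {total:,}")
    print(f"fully-connected share: {fc_total / total:.1%}")
```

Under these assumed shapes the total comes to roughly 61 million, consistent with the abstract, and about 96% of the parameters sit in the three fully-connected layers. That concentration is one reason the abstract singles out those layers for overfitting control.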

Description

✨ Summary

The 2012 paper ‘ImageNet Classification with Deep Convolutional Neural Networks’ by Krizhevsky, Sutskever, and Hinton has had a lasting influence on computer vision and deep learning. It demonstrated the power of convolutional neural networks (CNNs) by introducing ‘AlexNet’, a large, deep network that won the ILSVRC-2012 (ImageNet Large Scale Visual Recognition Challenge) competition with a top-5 test error rate of 15.3%, compared to 26.2% for the second-best entry. Its use of the then-recent dropout technique for regularization, together with an efficient GPU implementation for training, set a new standard in image classification.
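As a rough illustration of the dropout regularizer mentioned above, the following minimal sketch follows the scheme described in the paper itself rather than anything stated on this page: during training each hidden unit's output is zeroed with probability 0.5, and at test time every unit is kept but its output is multiplied by 0.5. The function names `dropout_train` and `dropout_test` are hypothetical, and plain Python lists stand in for real layer activations.

```python
import random


def dropout_train(activations, drop_prob=0.5, rng=random):
    """Training-time dropout: zero each activation independently
    with probability drop_prob, leave the rest unchanged."""
    return [0.0 if rng.random() < drop_prob else a for a in activations]


def dropout_test(activations, drop_prob=0.5):
    """Test-time counterpart: keep every unit but scale its output by the
    keep probability, so it matches the training-time expected value."""
    keep_prob = 1.0 - drop_prob
    return [a * keep_prob for a in activations]


if __name__ == "__main__":
    hidden = [1.0, 2.0, 3.0, 4.0]
    print(dropout_train(hidden))  # a random subset of units is zeroed
    print(dropout_test(hidden))   # every unit is kept and halved
```

Many current libraries instead rescale the surviving activations during training ("inverted" dropout) and leave test-time outputs untouched; the two variants agree in expectation.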

This approach delivered a major improvement in image classification accuracy and highlighted the potential of deep learning models for complex tasks. The paper has been extensively cited and is considered foundational to the rapid growth of deep learning, influencing subsequent architectures such as VGG, GoogLeNet, and ResNet. Its impact is also evident in industry adoption of CNNs for image and video processing, which has driven advances in areas such as autonomous vehicles, medical imaging, and augmented reality.

Notable follow-up works include VGGNet (arXiv:1409.1556), GoogLeNet (Inception, arXiv:1409.4842), and ResNet (arXiv:1512.03385). These works build on the principles demonstrated by AlexNet and further improve the capability and efficiency of deep neural networks.