Who Really Invented Convolutional Neural Networks? The History of the Technology That Transformed AI

Image: the neocognitron, the first experimental convolutional neural network.

Many introductions to convolutional neural networks begin with a simple and widely repeated story. According to this version of events, CNNs appeared toward the end of the 1980s, when Yann LeCun and colleagues at AT&T Bell Labs demonstrated a neural network that could recognize handwritten digits. Clips from that period, particularly the well-known 1989 video, are often shown in lectures and online summaries. As a result, the idea that CNNs began in 1989 has become part of the popular narrative surrounding modern artificial intelligence.

A closer look reveals a much older and more layered history. CNNs did not emerge from a single discovery or a single laboratory. Instead, they grew from several lines of research developed over nearly a decade, beginning in Japan before gaining widespread recognition in Europe and the United States. These efforts led to the first convolutional architecture, the first backpropagation-based convolutional model, and the first large-scale applications. Each of these innovations played a crucial role in shaping what is now one of the most influential classes of models in machine learning.

This article reconstructs that history in a factual and neutral manner. It clarifies the contributions of Kunihiko Fukushima, whose neocognitron established the structure that defines CNNs. It explains the significance of the 1988 work of Wei Zhang and collaborators, who created the first two-dimensional convolutional network trained with backpropagation. It then describes the later achievements of LeCun and his colleagues, who developed practical and widely adopted systems. The purpose is not to elevate one group at the expense of another, but to present an accurate account of how CNNs truly developed.

The Neocognitron and the Origins of the CNN Architecture

In 1979 and 1980, Kunihiko Fukushima introduced a neural architecture that is now recognized as the first deep convolutional neural network. He called his design the neocognitron. The system included several elements that later became central to CNNs. It used a hierarchy of layers that performed localized feature extraction and downsampling. It also incorporated the principle of translational invariance, which allows a network to recognize a pattern even when the pattern appears in slightly different positions.

The parallels between the neocognitron and modern CNNs are substantial. Fukushima’s design included what are now known as convolutional layers and pooling layers. The network extracted increasingly abstract features as information passed through its depth. These ideas appear in virtually all CNNs used today in fields ranging from computer vision to speech processing.
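
To make that architectural parallel concrete, the sketch below expresses the neocognitron's layer pattern in modern terms. It is an illustrative reconstruction, not Fukushima's model: the layer sizes are arbitrary, and PyTorch's Conv2d and MaxPool2d layers stand in loosely for his S-cell and C-cell stages.

    import torch
    import torch.nn as nn

    # Layer pattern in the spirit of the neocognitron: alternating local
    # feature extraction (convolution) and downsampling (pooling).
    # Sizes are illustrative, not Fukushima's.
    features = nn.Sequential(
        nn.Conv2d(1, 8, kernel_size=5),   # S-cell analogue: local feature extraction
        nn.ReLU(),
        nn.MaxPool2d(2),                  # C-cell analogue: downsampling
        nn.Conv2d(8, 16, kernel_size=5),  # deeper layer: more abstract features
        nn.ReLU(),
        nn.MaxPool2d(2),
    )

    # A small shift of the input pattern changes the output features only
    # slightly, illustrating the translation tolerance pooling provides.
    x = torch.zeros(1, 1, 28, 28)
    x[0, 0, 8:13, 8:13] = 1.0                    # a simple square pattern
    x_shifted = torch.roll(x, shifts=1, dims=3)  # same pattern, one pixel right

    with torch.no_grad():
        diff = (features(x) - features(x_shifted)).abs().mean()
        print(f"mean feature difference after a one-pixel shift: {diff:.4f}")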

This work was significant for several reasons. First, it introduced a new and sophisticated architecture at a time when computing power was limited and neural networks were not yet widely accepted. Second, it offered an original approach inspired partly by knowledge of the visual cortex, yet it remained a distinct and engineered model rather than an attempt to replicate biology. Finally, it established concepts that would later become central to the success of deep learning.

The main difference between the neocognitron and contemporary CNNs concerns training. Fukushima used unsupervised, self-organizing learning procedures instead of backpropagation, which had not yet become standard. These methods allowed the model to develop internal representations but did not involve the gradient-based optimization processes that now define most deep learning systems.
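
As a rough illustration of that distinction, the fragment below sketches one competitive, backpropagation-free update in the general spirit of self-organizing learning: the filter that responds most strongly to an input patch is reinforced toward it. The winner-take-all rule shown is a generic stand-in for illustration, not Fukushima's exact procedure.

    import numpy as np

    rng = np.random.default_rng(0)
    filters = rng.normal(size=(4, 5, 5))   # four local feature detectors
    patch = rng.normal(size=(5, 5))        # one 5x5 input patch

    # Competition: the filter with the strongest response to the patch wins.
    responses = (filters * patch).sum(axis=(1, 2))
    winner = int(responses.argmax())

    # Reinforcement without gradients: move the winning filter toward the
    # patch and renormalize. No error signal is propagated backward.
    filters[winner] += 0.1 * patch
    filters[winner] /= np.linalg.norm(filters[winner])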

Even with this distinction, the neocognitron stands as the first convolutional neural network in the architectural sense. By developing its core structure, Fukushima laid the foundation on which all later CNN models were built.

The 1986 NHK Demonstration of Handwritten Digit Recognition

A particularly important piece of CNN history is a short film produced in 1986 by Fukushima, S. Miyake, and T. Ito at the NHK Science and Technical Research Laboratories. The video shows the neocognitron recognizing handwritten digits. A user writes a number on a tablet, and the system identifies it. This demonstration shows that convolution-based digit recognition was already functioning several years before the more familiar 1989 Bell Labs video.

For many years, this earlier demonstration remained relatively unknown. Several factors may explain why it did not become widely discussed. The neocognitron relied on learning rules that differed from the backpropagation framework that soon gained prominence. The research community in the mid-1980s had limited access to the computational resources required to train deep models, and interest in neural networks was still reemerging after earlier periods of skepticism. The NHK video was created in a context where international distribution of research materials was not as straightforward as it is today.

Despite limited visibility at the time, the 1986 demonstration represents an early and noteworthy milestone in the development of CNNs. It offers clear evidence that convolutional recognition systems were already capable of handling handwritten digits before the approach gained international prominence.

The First Backpropagation-Trained Two-Dimensional CNN: Zhang et al. in 1988

A separate development in the late 1980s provided what many now consider the first modern convolutional neural network. In 1988, Wei Zhang, J. Tanida, K. Itoh, and Y. Ichioka presented a model that used two-dimensional convolutional filters trained with backpropagation. Their system performed pattern recognition, including character recognition, which made it functionally similar to the CNNs that would later be adopted worldwide.

This contribution is important because it showed that convolutional architectures and backpropagation could be combined into a single, unified system. The approach allowed the network to learn its filters directly from labeled data. This capability is a defining feature of current CNNs and played a major role in the success of deep learning.
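
The sketch below illustrates that combination in modern form: two-dimensional filters whose weights are updated by backpropagation from labeled examples. It is a generic PyTorch illustration of the principle, not a reconstruction of the Zhang et al. system, which was realized with an optical architecture.

    import torch
    import torch.nn as nn

    # 2-D convolutional filters trained end to end with backpropagation.
    model = nn.Sequential(
        nn.Conv2d(1, 4, kernel_size=3),  # the 2-D filters to be learned
        nn.ReLU(),
        nn.Flatten(),
        nn.Linear(4 * 26 * 26, 10),      # ten-way classifier (e.g., digits)
    )
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()

    # One gradient step on a placeholder labeled batch: the loss gradient
    # flows back through the classifier into the convolutional filters.
    images = torch.randn(8, 1, 28, 28)
    labels = torch.randint(0, 10, (8,))

    loss = loss_fn(model(images), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                     # filter weights updated from labeled data
    print(f"training loss: {loss.item():.3f}")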

Although this work clearly advanced the state of the art, it did not gain the level of attention that later CNN research achieved. Several practical and historical factors likely contributed. The paper appeared in conference proceedings that were not widely circulated internationally. The deep learning research community was not yet globally connected, and there were fewer opportunities for rapid dissemination of technical advances. Meanwhile, interest in backpropagation was rising quickly, and some of the most influential publications were appearing in prominent English language venues with greater reach.

Even so, the 1988 model offers an important connection between Fukushima's early conceptual architecture and the later CNNs developed at Bell Labs. It demonstrates that backpropagation-based convolutional models were operational before the end of the 1980s and before the more widely known 1989 results.

The Bell Labs Era and the Move Toward Practical CNNs

Beginning in 1989, Yann LeCun and colleagues at AT&T Bell Labs produced work that played a central role in the global recognition and adoption of CNNs. Their research combined convolutional architectures with efficient training methods and emphasized reproducible experiments and clear demonstrations. The well-known 1989 video showing live digit recognition helped introduce CNNs to a broad audience.

During the 1990s, LeCun's group continued to refine their models. They improved training procedures, explored new tasks, and integrated CNNs into operational systems. One of the most significant achievements was the use of CNN-based technology for automated mail sorting at the United States Postal Service. These applications provided strong evidence that CNNs were effective for real-world tasks and not only for laboratory experiments.

The key contribution of the Bell Labs research was not the invention of the CNN architecture or the first use of backpropagation in convolutional models. Those achievements belonged to earlier work. Instead, the Bell Labs research program made CNNs practical, scalable, and widely visible. It produced a coherent methodology, supplied detailed publications, and demonstrated applications that attracted international attention. As a result, these systems became the reference point for many later researchers.

Why Early Contributions Were Less Visible

Historical visibility in science often depends on factors beyond the technical merit of a contribution. Fukushima’s earlier work appeared primarily in Japanese journals and used learning rules that were soon overshadowed by the global rise of backpropagation. Zhang and colleagues published in a venue that did not reach a broad international audience, and their work arrived during a transitional moment in the field, when interest in neural networks was expanding rapidly in specific regions.

Institutional influence also played a role. Bell Labs was an internationally recognized research center with strong connections to conferences, journals, and global collaborators. As a result, the CNN systems produced there gained greater exposure and shaped how the history of the field was later remembered.

These factors do not diminish the contributions of any group. Instead, they highlight how dissemination, publication language, and institutional reach can influence how scientific developments are perceived and remembered.

A Clear and Accurate Development Timeline

When the historical record is examined carefully, a clear timeline emerges. The architecture that defines modern CNNs originated with Fukushima's neocognitron at the end of the 1970s. The first two-dimensional convolutional neural network trained by backpropagation was introduced by Zhang and collaborators in 1988. The practical and widely influential implementations that shaped global awareness were developed by LeCun and colleagues beginning in 1989.

This timeline reflects complementary contributions. Fukushima established the conceptual structure. Zhang and colleagues demonstrated the feasibility of combining convolution with backpropagation. LeCun and collaborators developed methods and applications that significantly expanded the reach and influence of CNNs.

Why This History Matters

Understanding the full history of convolutional neural networks serves several purposes. It provides a more accurate and complete picture of how the field developed. It acknowledges the contributions of researchers who played key roles in important innovations. It also highlights how major scientific advances often emerge from different research groups working independently across various locations.

Recognizing this more complete timeline encourages the research community to consider a broad range of sources and contributions. It also illustrates that transformative technologies often develop gradually rather than through a single defining moment.

Conclusion

The history of convolutional neural networks spans several distinct phases. Kunihiko Fukushima introduced the original convolutional architecture in 1979 and 1980. Wei Zhang and colleagues presented the first backpropagation-trained two-dimensional CNN in 1988. Yann LeCun and collaborators developed the practical and widely recognized systems that helped establish CNNs as a central approach in machine learning.

Each contribution played a significant role. Together, they form a coherent and well supported account of how CNNs developed into one of the most important technologies in modern artificial intelligence. This fuller history provides a more accurate understanding of the field and reflects the collaborative nature of scientific progress.

References

Fukushima, K. (1979). Neural network model for a mechanism of pattern recognition unaffected by shift in position. Transactions of the IECE of Japan, J62-A(10), 658–665. English version published in Biological Cybernetics, 36, 193–202 (1980).

Fukushima, K., Miyake, S., and Ito, T. (1986). Handwritten digit recognition demonstration. NHK Science and Technical Research Laboratories video.

Zhang, W., Tanida, J., Itoh, K., and Ichioka, Y. (1988). Shift-invariant pattern recognition neural network and its optical architecture. Proceedings of the Annual Conference of the Japan Society of Applied Physics.

LeCun, Y. (1989). Convolutional neural network handwritten digit demonstration. AT&T Bell Laboratories video.

Schmidhuber, J. (2025). Who invented convolutional neural networks? AI Blog.
