Convolutional Neural Networks

Pallavi Krishna
3 min read · Jan 1, 2022

Vision is a fascinating journey of neurons, memory, imprinting and physiology. But also, critically, maths. The way a neural network decodes an image is close to, but not an accurate representation of, how our own brains work. Neural networks rely critically on computational capability: they must handle not just the capacity of one brain, but all the brains in the world (i.e. all the images in the world, at all angles). This is why the development of neural networks in recent years has relied largely on computational advances.

Understanding the Convolutional Neural Network

Convolution is what happens when we take a smaller function and slide it across another function. The output of this slide is the convolution product of the two functions. In simple terms, it is like sliding a magnifying glass across the length and breadth of an image.
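The sliding described above can be sketched in a few lines of NumPy. This is an illustrative toy implementation (and, as in most deep learning libraries, the kernel is not flipped, so strictly it computes cross-correlation):

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` across `image` (stride 1, no padding) and
    return the convolution product at each position."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # element-wise multiply the window under the kernel, then sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16).reshape(4, 4).astype(float)
kernel = np.ones((2, 2)) / 4.0   # a simple averaging "magnifying glass"
result = convolve2d(image, kernel)
print(result.shape)              # (3, 3): a 2x2 window fits 3 times per axis
```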

While we can certainly implement a fully connected MLP architecture for images, given the number of neurons that would need to be trained, this amounts to millions, even trillions, of computations, making plain neural nets infeasible. What we do instead is use a convolution layer after the input layer in the network architecture. This reduces the number of computations, and also lets us identify the finer details of the image first and then progress to larger structures, which mimics image perception by the neurons in the brain.
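A back-of-the-envelope comparison shows why the savings matter. The layer sizes below (a 224 by 224 grayscale image, 1,000 hidden neurons, 32 filters of 3 by 3) are illustrative assumptions, not figures from the article:

```python
# Rough weight counts for one layer over a 224x224 grayscale image
image_pixels = 224 * 224

# Fully connected: every one of 1,000 hidden neurons sees every pixel
fc_weights = image_pixels * 1000
print(fc_weights)    # 50_176_000 weights for a single dense layer

# Convolutional: 32 filters of size 3x3, shared across the whole image
conv_weights = 32 * 3 * 3
print(conv_weights)  # 288 weights for a single conv layer
```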

A Convolutional Neural Network architecture generally consists of:

  • Input layer
  • Convolution layer/s
  • Pooling layer/s
  • Output layers

Convolution Layer

A convolution layer can be defined by three parameters:

  1. Number of neurons
  2. The size of the convolution filter
  3. The stride length and width of the filter

The one obvious problem with this is that the size of the filter and its stride might not always fit neatly at the edge of the image. If we stop the filter's slide before the image ends, we lose some data around the boundary of the image. This is easily rectified by zero padding around the image to fit the size of the filter and its stride. This leads us to a fourth parameter of a convolution layer:

4. Padding, which must be either ‘Valid’ or ‘Same’. ‘Valid’ padding uses no zero padding, while ‘Same’ adds zero padding where necessary.
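The two padding modes give different output sizes. A minimal sketch of the usual deep learning convention (‘valid’ shrinks the map, ‘same’ preserves the input size at stride 1):

```python
import math

def output_size(n, kernel, stride, padding):
    """Spatial output size along one axis for 'valid' vs 'same' padding."""
    if padding == "valid":
        # no zero padding: the filter stops where it no longer fits
        return (n - kernel) // stride + 1
    elif padding == "same":
        # zero padding added as needed to cover every input position
        return math.ceil(n / stride)
    raise ValueError("padding must be 'valid' or 'same'")

print(output_size(5, 3, 1, "valid"))  # 3: the slide stops before the edge
print(output_size(5, 3, 1, "same"))   # 5: zero padding preserves the size
```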

Understanding the Filters or Convolution Kernels

The filters can themselves be visualized as images. The filters used at the beginning of a CNN usually detect simple lines or edges. For example, a 4 by 4 filter that is all zeroes except on its diagonal will ignore everything except the presence of a diagonal shape in the image.
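The diagonal-filter example can be checked numerically. Below, two hypothetical 4 by 4 "images" (one with a diagonal stroke, one with a vertical stroke) are compared against the diagonal filter; since the filter is the same size as the image, convolution reduces to a single dot product:

```python
import numpy as np

# A 4x4 filter that is zero everywhere except its main diagonal
diag_filter = np.eye(4)

# Toy images: one diagonal stroke, one vertical stroke
diagonal_image = np.eye(4)
vertical_image = np.zeros((4, 4))
vertical_image[:, 1] = 1.0

# Element-wise multiply and sum: the filter's response to each image
resp_diag = np.sum(diagonal_image * diag_filter)
resp_vert = np.sum(vertical_image * diag_filter)
print(resp_diag, resp_vert)  # 4.0 1.0 -> the diagonal lights up the filter
```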

Pooling layer

Pooling layers further reduce the size of the image by subsampling. They reduce computational load, curb overfitting, and make the network tolerant to small shifts in the image.
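Max pooling is the most common form of this subsampling. A minimal sketch, assuming non-overlapping 2 by 2 windows:

```python
import numpy as np

def max_pool(image, size=2):
    """Subsample by taking the max of each non-overlapping size x size block."""
    h, w = image.shape
    out = np.zeros((h // size, w // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            block = image[i * size:(i + 1) * size, j * size:(j + 1) * size]
            out[i, j] = block.max()
    return out

image = np.array([[1., 2., 5., 6.],
                  [3., 4., 7., 8.],
                  [9., 1., 2., 1.],
                  [5., 6., 3., 4.]])
print(max_pool(image))  # [[4. 8.] [9. 4.]] -- a 4x4 map shrinks to 2x2
```

Note that pooling has no trainable weights; it only discards spatial detail, which is what makes the network tolerant to small shifts.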

Several combinations of such layers give us a final CNN.

Ref:

https://www.khanacademy.org/math/differential-equations/laplace-transform/convolution-integral/v/introduction-to-the-convolution

Aurélien Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow, First Edition, March 2017

Image — https://www.datatechnotes.com/2018/09/image-convolution-example-in-r.html

