A very brief peek into the human brain and machine learning for absolute beginners!

Suraj Pandey
15 min read · Sep 8, 2022


Disclaimer: I have come across many friends who do not belong to the computer science community but are highly interested in learning about Machine Learning/Artificial Intelligence for their professional or personal needs. This article is written especially for them. I have deliberately picked only the crucial and easy-to-grasp parts, so that they get a quick launchpad to start their learning journey. Anyone seeking in-depth knowledge may easily find apt materials online. That said, even for the well-versed, there are perspectives in this article worth a read. Now you may read on!

“Anything can be accomplished through simplicity; the hard part is to believe in this practice” - this realisation of mine has grown stronger as I learn more about the things around me. For instance, consider the grand mechanism of evolution, which has created such a variety of flora and fauna around the world. You may find not only various levels of intelligence but also various structures for it. On one side we have humans, with such a rich, almost centralised form of intelligence that it allows them to surpass sensory perception and work on abstract stimuli such as ideas and thoughts. On the other side we have organisms such as the octopus, which has most of its ‘brain’ spread across its body, forming a decentralised intelligence. This allows it to achieve impressive feats like changing both the texture and the colors of its body in no time to hit that perfect camouflage. Then there are organisms like the jellyfish, which has neither a heart nor a brain, yet has managed to survive and fascinate us. This eclectic tapestry would be extremely hard to create if one were assigned the task. However, evolution achieved all this through simplicity: minute changes across a large span of time, practised regularly, allowed species to drift apart from each other, resulting in the vast variety.

Similarly, in matters as abstract as war, simplicity plays a key role in determining the outcome. In the first battle of Panipat (India), for instance, it was simple strategic decisions like the tulughma and araba, applied repeatedly on a large scale, that collectively allowed Babur to defeat a much larger army of Lodhi, laying the foundation of the vast Mughal rule in the years to come. A similar pattern can be found in the case of Michael Phelps (the American competitive swimmer), who ascribed most of his achievement to his non-stop adherence to his training schedule for 5–6 years. All he said he did was follow the same regimen with extreme regularity (although ancillary activities would have played their part). Innumerable such examples could be quoted, but to keep it short: simplicity combined with regularity (repetition) is a universal key to intricacy, and thereby richness.

But wait! Why did we digress so far? We did so because we had to peek into one such intricate product of nature: the human brain. The human brain has both physically and temporally spaced specialised components. For instance, the reptilian, limbic and neocortical portions of the brain each have different roles to play and take differing durations to develop. However, even this intricate marvel of nature has an underlying simple structure. It's the neural network!

A neural network is a network of neurons! Voila!

What? The above definition of neural networks did not amaze you? I get it; I did not tell you anything novel in that definition. But you need to read between the lines in order to grasp the crux. First, you need to understand what a neuron is.

Fig 1: A neuron (connected with other neurons)

As can be seen in Fig 1, a neuron is a cell with tentacles (dendrites and an axon). Put simply, a neuron receives inputs through its dendrites and gives output through its axon's terminals (the direction gets reversed during sleep according to recent studies, which in itself is a large and interesting topic to be discussed later). An input is inherently an electric impulse generated by a neighbouring neuron. This impulse enters (realised via neurotransmitters) through the branch-like structures (dendrites) into the neuron's body. If the incoming impulse is strong enough, the neuron fires a signal across its body, along the axon to the axon terminal. Thus the signal is propagated to the next neuron through the synapse. This is a very coarse picture of what actually happens inside a neuron. The important point is that a neuron does not allow passage to all incoming signals, but only to those that cross a certain threshold (the action potential).

Now coming back to our definition of neural networks. Imagine a network of neurons.

Fig 2: Neural network activations in the brain

As can be seen in Fig 2, depending on the incoming signals, different neurons can fire or remain silent in different regions of the brain. This firing can be synchronous or asynchronous across different brain locations. And as explained in an earlier article (The missing links, what separates us from singularity), such parallel synchronous and asynchronous firings are what allow us to perform both basic and abstract functions such as locomotion, perception, thinking, imagination, etc. I hope you now appreciate the presence of simplicity in the human brain (in the form of a simple neuron that passes or stops a signal) and how its repetition (in the form of a neural network) allows us to acquire the fascinating intelligence that we have.

Now that you are introduced to the tools of simplicity and repetition, it's time to actually use them and understand an Artificial Neural Network (ANN)! But before that, let us have a look at a problem statement through which we will both understand and realise the need for machine learning, leading us to the use of ANNs.

Consider this image:

Fig 3: The MNIST digit 4

Fig 3 shows an image of the number 4 (taken from the well-known MNIST dataset). The image is actually a collection of pixels arranged in a 28x28 grid. The ‘0’ at the top left of the image shows the row number and the ‘0’ at the bottom left shows the column number in the grid. Thus the top-left pixel can be pointed out as (0,0), indicating the 0th row and the 0th column. Now if I ask you what the color of the pixel at (0,0) is, you would say it's dark. Similarly, the color of the pixel at (27,27) (the bottom-right corner) is also dark. The color at approximately (15,4) is bright. Thus one can look at each pixel of the image and get its color. A pixel here is actually nothing but a number, displayed on the screen as a color. Thus the image is just a collection of numbers (28x28 = 784 numbers in total, with values in the range 0 to 255). The lower the number, the darker its color; the higher the number, the brighter its color. Thus the value at (0,0) should be close to 0 and the value at (15,4) should be close to 255.
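To make this concrete, here is a tiny sketch in Python with NumPy. It uses a synthetic 28x28 grid rather than an actual MNIST image (the painted stroke and its position are my own illustrative choices), but the layout and the 0-to-255 value range are exactly as described above:

```python
import numpy as np

# A synthetic stand-in for an MNIST image: a 28x28 grid of numbers.
# 0 is fully dark, 255 is fully bright (the real dataset uses the same range).
image = np.zeros((28, 28), dtype=np.uint8)

# Paint a rough vertical stroke in the middle columns.
image[4:24, 12:16] = 255

# Index pixels as (row, column), just like the (0,0) and (27,27) examples.
print(image[0, 0])    # a corner pixel: dark, value 0
print(image[15, 13])  # inside the stroke: bright, value 255
print(image.shape)    # (28, 28) -> 784 numbers in total
```

Real MNIST images follow exactly this layout; only the pixel values differ.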

Now that the actual ‘numerics’ of the image are known, let us try to achieve a basic human task: image classification. That is, given an image, can I tell which class it belongs to? The MNIST dataset has images of the numbers 0 to 9. So given any image from the MNIST dataset, can I tell its class, i.e., the represented number? Let us start with the number below:

Fig 4: The MNIST digit 1

As can be seen in Fig 4, we need to recognise the number 1. An intuitive and seemingly valid approach is to follow the morphology of the number in order to recognise it. The morphology of the number 1 is such that, in the given image, it makes some of the vertically placed middle portion of the image bright while the remaining part stays dark. Thus it would seem correct to come up with the following set of rules (a model) to classify an image as 1:

  1. The pixels in the column numbers 0 to 10 and 16 to 27 should all be dark (value close to 0)
  2. The pixels in the columns in the middle should be bright (value close to 255).
  3. Pixels at the intersection of the middle columns with few of the rows at the top and the bottom should be dark.
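If you fancy some code, the three rules above can be roughly translated into a Python function. The exact thresholds for ‘dark’ and ‘bright’ and the row/column cut-offs below are my own illustrative choices, not part of any standard recipe:

```python
import numpy as np

def looks_like_one(image, dark_max=50, bright_min=200):
    """A rough, rule-based check for the digit 1 on a 28x28 pixel grid.
    Thresholds and ranges are illustrative, not tuned values."""
    # Rule 1: columns 0 to 10 and 16 to 27 should all be dark.
    sides = np.concatenate([image[:, 0:11], image[:, 16:28]], axis=1)
    if sides.max() > dark_max:
        return False
    # Rule 2: the middle columns should contain bright pixels.
    middle = image[:, 11:16]
    if middle.max() < bright_min:
        return False
    # Rule 3: the top and bottom few rows of the middle columns
    # should be dark (the stroke should not touch the border).
    if middle[0:3, :].max() > dark_max or middle[25:28, :].max() > dark_max:
        return False
    return True

# A synthetic, upright "1" satisfies all three rules...
upright = np.zeros((28, 28), dtype=np.uint8)
upright[4:24, 12:15] = 255
print(looks_like_one(upright))   # True

# ...while an all-dark image fails rule 2.
blank = np.zeros((28, 28), dtype=np.uint8)
print(looks_like_one(blank))     # False
```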

Voila! we got ourselves the logic to recognise the image of class 1. But wait! what about the image below?

Fig 5: A tilted 1 in MNIST dataset

Fig 5 also displays an image of class 1 from the MNIST dataset. However, this image doesn't fit our earlier-defined set of rules. This is because the number 1 here is ‘slanted’, so some portion of the number bleeds into columns that were supposed to be left dark. For instance, according to rule 1, all columns beyond 16 should have dark pixels, but here the top-right portion of the number 1 actually lies in the column range 16 to 21. So maybe we can change our model a bit, as follows:

  1. The pixels in the column numbers 0 to 10 and 16 to 27 should all be dark (value close to 0)
  2. The pixels in the columns in the middle should be bright (value close to 255).
  3. Pixels at the intersection of the middle columns with few of the rows at the top and the bottom should be dark.
  4. If any of the above columns/rows have a portion opposite to its desired value (dark or bright pixel), then its diagonally opposite portion should also have the same value.

Think about rule 4 for a while and you will realise that we are trying to accommodate the ‘slanting’ of 1 here. So by now we have managed to come up with rules to classify an image as 1. But wait, it's just the tip of the iceberg. Behold some of the remaining images of the MNIST dataset:

Fig 6: Some samples from the MNIST dataset

As can be seen from Fig 6, as we consider more classes of numbers (2, 3 and 9 are shown), it gets tougher to come up with rules that treat images of the same class as similar while at the same time differentiating across different classes. Thus a major problem with rule-based models is that as the complexity of the input data increases, the underlying rules get qualitatively and quantitatively intractable and difficult to manage. Now that you have realised this difficulty, it is time to summon Machine Learning (ML)!

So what is the big deal about it?

It is that Machine Learning is a method where, instead of creating the rules to solve a problem, one simply feeds data about the target problem to a model, which in return carves out the rules itself. Thus under ML, we give in data and get back rules to solve a problem. Did not get the intuition yet? Consider this example:

Fig 7: Using a stencil to draw accurately

As can be seen in Fig 7, an artist is able to draw accurate and consistent drawings of leaves using a stencil. Something similar is what we do in ML. ML starts with a random model. A random model is nothing but a blank sheet out of which a trained model (the stencil) will emerge. When we provide data to the model (the blank sheet), the irrelevant portions of the model are cut away and the relevant ones are allowed to stay. Analogously, we cut the blank paper sheet in such a way that a leaf may be carved out of it: cutting in the relevant areas and allowing paper to stay wherever an obstruction is required. Once the stencil is carved out, we can use it to draw the image it represents by allowing color to pass through. Similarly, once the model is learnt, we can use it to classify an image by passing the image through the model and getting an output at the other end. An ML model is nothing but a (generally) large and complex mathematical equation (a set of rules) that takes an image as input and gives the desired result as output. However, this complex mathematical equation is achieved by following the principle of simplicity. Remember ANNs? Those are exactly how we model such complex mathematical rules using simple methods and repetition. That is why ANNs come under the methods of ML.

Workings of an ANN:

Before we dive into the ANN, let us understand what an artificial neuron is. An artificial neuron is nothing but an extremely simple mathematical structure that takes an input and gives an output. For example, consider the following mathematical function:

Sum(a,b) = a+b

Here ‘Sum’ takes 2 inputs and gives back an output. So if one gives 2 and 3 as inputs to ‘Sum’, it will apply its ‘+’ operation and add the two inputs, giving back 5. You can consider ‘Sum’ a neuron: whenever you feed 2 numbers to this neuron, you get back their sum as its output. However, we often seek more functionality from a neuron than merely adding its inputs. For instance, while adding the inputs, we may want to give importance to some of them and allow them to make more impact on the output. This is achieved by ‘weighing’ the inputs (simply multiplying a fraction with the particular input). We may also want the output to not be a direct sum of the inputs but a value that is high only when a threshold is crossed. This adds non-linearity to the model (which in itself is a separate topic). An artificial neuron can be understood as follows:

Fig 8: An Artificial neuron

As shown in Fig 8, an artificial neuron can have several inputs (x₁, x₂ … xₙ). These inputs are fed into the neuron via weighted links (w₁, w₂ … wₙ). The weights are directly multiplied with the values of the inputs. Thus the resulting input values that go into the neuron get enhanced or attenuated depending on the associated link's weight. Finally, all these weighted inputs are added, yielding a summation. This summation can be a high or a low value, depending on the strength of the inputs. In order to introduce non-linearity, an activation function is applied over this summation, which simply yields a mathematical value whose magnitude depends on whether the incoming summation crosses a threshold or not (for now, we do not need to discuss bias). Thus we finally obtain a mathematical value as the output of the neuron. Now here comes the interesting part!
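Before moving on, note that the whole neuron of Fig 8 boils down to a weighted sum followed by a threshold check. Here is a minimal sketch in Python (the input values, weights and threshold below are arbitrary numbers of my own choosing):

```python
import numpy as np

def neuron(inputs, weights, threshold=1.0):
    """An artificial neuron: weigh the inputs, sum them, then apply a
    simple step activation that fires only past a threshold."""
    summation = np.dot(inputs, weights)   # w1*x1 + w2*x2 + ... + wn*xn
    return 1.0 if summation >= threshold else 0.0

x = np.array([0.5, 0.8, 0.2])

# Strong weights let the signal through...
print(neuron(x, np.array([1.0, 1.0, 1.0])))  # 1.0 (sum = 1.5 >= 1.0)

# ...while attenuating weights keep the neuron silent.
print(neuron(x, np.array([0.2, 0.2, 0.2])))  # 0.0 (sum = 0.3 < 1.0)
```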

Fig 9: An ANN

Fig 9 shows how individual neurons (the circles in hidden layers 1 and 2) can be connected to form an ANN. Each neuron's output is an input to some other neuron. The best part about this network is its non-linearity and its multi-layered structure. Depending on the strength of the incoming inputs (often called activations), different neurons in the ANN can exist in differing output states (high or low). Thus different regions of this network get ‘fired up’ depending on the type of input fed in. Finally, the correct output gets lit up depending on the associated input.

Fig 10: Flow of activations through an ANN for MNIST data

As seen in Fig 10, we can perform our image classification task using an ANN. An input sample (an image of a number, 9 in the given example) is fed to the ANN: each pixel (the 1st to the 784th) is an input. Depending on these inputs, the activations (high outputs of neurons) flow across the ANN to yield the correct output (the class 9). To be more precise, an input pixel is fed to the neurons of the first layer after its value is weighed. Depending on this weighing, some neurons of this layer may get activated (cross the threshold to give a high output) and feed a high input to the next layer's neurons. This process continues till the final layer, which gives the result of the classification. Here the class of the input image may be read off as the number of the most activated neuron of the last layer. Thus, without actually specifying any rules, we are able to obtain a mathematical model (the ANN) that gets from the input (an image) to the output (its class). One can imagine this process as color being passed through a stencil: the holes in the stencil allow the color to pass through while the obstructions stop it, yielding the appropriate image on the other side.

It may be observed that neurons in an ANN are arranged in ‘layers’. Each layer is nothing but a group of neurons connected to the neurons in the previous layer. Due to this multi-layered structure, we can realise a mathematical model that appreciates both the granular and the high-level features in our data. For instance, for the image of the number 9, the lower-level layers (towards the left) will be able to capture lower-level (pixel-level) details, like: is the top-right pixel dark or not? The higher-level layers (towards the right) will be able to capture higher-level (feature-level) details, like: is there a loop in the number, or is there a straight line in it? In fact, a Deep Neural Network exploits this multi-layered approach intensively, using many layers of neurons to learn various levels of features of the data, thereby learning a better representation for it. This is the freedom an ANN gives you: to accommodate deeper insights about the data, you do not need further complex rules but merely further repetitions of layers, which is a trivial task!
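A minimal sketch of such a multi-layered forward pass, using NumPy with random (untrained) weights, looks like this. The layer sizes and the sigmoid activation are my own illustrative choices; a real network would have trained weights:

```python
import numpy as np

def sigmoid(z):
    # A smooth activation: low sums map near 0, high sums near 1.
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """Pass an input through successive layers: each layer is a weight
    matrix, and each neuron's output feeds the next layer's neurons."""
    activation = x
    for w in layers:
        activation = sigmoid(w @ activation)
    return activation

rng = np.random.default_rng(0)
pixels = rng.random(784)                    # a flattened 28x28 "image"

layers = [
    rng.standard_normal((16, 784)) * 0.05,  # hidden layer 1: 16 neurons
    rng.standard_normal((16, 16)) * 0.5,    # hidden layer 2: 16 neurons
    rng.standard_normal((10, 16)) * 0.5,    # output layer: one neuron per class
]

output = forward(pixels, layers)
print(output.shape)            # (10,) -> one value per digit class
print(int(np.argmax(output)))  # the "most activated" class (meaningless
                               # here, since the weights are untrained)
```

Adding "deeper insight" really is just appending another weight matrix to the `layers` list.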

Now that you have got familiar with ANNs, let it be known that we have not yet discussed a crucial and ingenious component: the training of an ANN. As you might have observed, the weights in an ANN play a major role. It is these weights that guide the flow of activations, finally leading them to the correct output. Thus setting the weights to optimal values is crucial for an ANN model. But where did we set them? We haven't yet. Setting these weights to optimal values that suit the problem at hand is what we call training an ANN. One may sit down for a week and optimise all the weights of an ANN to satisfy all the classes (0 to 9 in the case of MNIST) by trial and error, but that would be an insane way to do it. Given the good computation power of our personal computers, and simple yet effective methods like gradient descent, optimising the weights becomes very easy. That is exactly where data comes into the picture to guide the setting of the weights. Both the data and its class (called the ground truth) are made available to the model, and the weights are adjusted according to whether the model's output matches the actual class. This process of providing the data and tuning the weights is repeated multiple times (each repetition is called an epoch) to make the model better each time.

Drawing on the stencil analogy: during model training, each time we pass color through the stencil, we check whether the desired pattern formed on the other side, and we keep clipping the stencil until it does. You may now understand why data availability is so important for ML tasks. With tools like TensorFlow and Google Colab, each step of ANN creation becomes very easy to realise on a personal computer, and in less than 10 minutes at that! In fact, all of what you have read till now is sufficient to build your own neural network and perform machine learning.
We shall be doing exactly that in the next article: Machine Learning for absolute beginners: Your first Neural Network in ~10 minutes & < 20 lines of code!
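Meanwhile, to make the idea of training less abstract, here is a toy sketch in plain NumPy: a single artificial neuron learning the logical OR of two inputs via gradient descent. The task, learning rate and epoch count are my own illustrative choices; training a full ANN on MNIST follows the same loop, just with many more weights:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: the neuron should learn the logical OR of its two inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 1], dtype=float)  # ground truth

rng = np.random.default_rng(0)
w = rng.standard_normal(2) * 0.1  # random initial weights (the "blank sheet")
b = 0.0                           # bias term
lr = 1.0                          # learning rate

for epoch in range(2000):         # each pass over the data is one epoch
    pred = sigmoid(X @ w + b)     # forward pass: the model's guesses
    error = pred - y              # compare with the ground truth
    # Gradient descent: nudge the weights against the error's gradient.
    grad = pred * (1 - pred) * error
    w -= lr * (X.T @ grad)
    b -= lr * grad.sum()

print(np.round(sigmoid(X @ w + b)))  # [0. 1. 1. 1.] -> OR learned
```

Libraries like TensorFlow automate exactly this loop (and the gradient computation) at scale.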


Suraj Pandey

A computer science researcher straying across observations, facts, theories and applications. Also fascinated by intelligence, chaos and philosophy in general.