How to make machines learn

What is Machine Learning? In simple words, it gives machines the ability to learn like humans, the same old definition that you must have read everywhere.
But how can machines learn? Do they even know the concept of learning? All these questions must have stormed your brain when you first heard about Machine Learning.

Let’s start from the very beginning. Suppose you are a computer scientist and you need to develop an algorithm so that machines can learn like humans. Humans use all of their senses to interact with the world. These senses convert your interactions with the world into signals, which are then processed and stored by your brain. Since computers have no such senses (yet…), we need to provide them with the information. So how can we do that? We could give them an image or a video from a camera, or text from documents or websites. As a matter of fact, images, text, video, audio, etc. can all be used as data for computers.

“Computers are able to see, hear and learn. Welcome to the future.”

~Dave Waters

Now we have our data, but the computer still can’t learn from it; for the computer, it is just a bunch of zeros and ones.

One thing that we have and computers don’t (yet…) is consciousness: we know what we are doing and why we are doing it. We have the ability to reason and to question things around us, but computers are not capable of such things. So how can we make them learn? We will use a powerful tool that has been around for a very long time: statistics.

First of all, let me make one thing clear: when we say a computer is learning, what we really mean is that it has gained the ability to generalize. For example, suppose you are a one-year-old kid and your mother shows you something like this (Fig 1):

Fig 1

And she tells you that it is a ball. All you register is that your mother is showing you something and saying “ball”. The next day she does it again, and now you wonder why your mother shows you this round object and says “ball” every time. On the third day, when your mother repeats it, you start to think that maybe this round object is called a ball. So when your mother repeats this a fourth time, you immediately recall that this is a ball. How did you do that? You were shown something over and over again until you had gained enough information about it. Now, even if you are shown a different red ball, you will immediately recognize it as a ball. But since you still do not have enough information about balls, you are likely to call this (Fig 2) a ball as well:

Fig 2

This would not have happened if your mother had also shown you this (Fig 2) and told you it is a fruit. You would then have formed a proper generalization of the ball, maybe by learning about the textures of the ball and the fruit. Now, back to being a computer scientist: you have to bring this ability to computers, and for that we will use statistics.

Since everything is a number for computers, I will use numbers as data for the rest of the blog. Suppose you are given some points like (1,1), (2,2), (3,3) and are asked for the other half of (4,?). The answer is 4, right? How did you do that? You probably saw the pattern that both numbers in each pair are the same, which in maths is the line y=x. So what if your computer could do the same: take these points, plot a line that fits most of them, and use the function that represents the line to give values of y for future x?

Fig 3: A graph f(x)=x representing your data
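
Here is a minimal sketch of that idea in Python. The blog does not say how the line should be fitted, so this assumes a line through the origin, y=mx, and uses the least-squares formula for its slope. The names fit_slope and predict are made up for illustration.

```python
def fit_slope(points):
    """Least-squares slope m for a line y = m*x through the origin."""
    sum_xy = sum(x * y for x, y in points)
    sum_xx = sum(x * x for x, _ in points)
    return sum_xy / sum_xx


def predict(m, x):
    """Use the fitted line to give y for a new x."""
    return m * x


points = [(1, 1), (2, 2), (3, 3)]
m = fit_slope(points)
print(m)              # 1.0, i.e. the line y = x
print(predict(m, 4))  # 4.0, the other half of (4, ?)
```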

Let’s take another example. You have been given the points (2,4), (4,8), (8,16); what will be the other half of (9,?)? This time, think like a computer. Since a computer doesn’t know about patterns, it will not be able to figure out the function at first glance like you did. So now you are a computer: you look at these points and draw a random line of the form y=mx, starting with m=1, i.e. y=x (remember?). How well does your function do? Is it able to represent your data correctly?

Fig 4: Red points are the ones that y=x predicted and the black line is the actual line that fits

If you feed your inputs (2,4,8) into the line y=x, you get (2,4,8) as predictions. The error = (actual output - predicted output), summed over all points, is (4-2)+(8-4)+(16-8), i.e. 14. Now try a different value of m to reduce the error. Since the predictions of our current function lag behind the actual outputs, we may want to increase m to, say, 1.5. How about now? Does this represent your data correctly?

Fig 5: Red points are the predicted ones (y=1.5x) and the black line is the actual line that fits
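
If you want to check this arithmetic yourself, here is a small sketch of the error calculation exactly as described above: the sum of (actual output - predicted output) over all points. The function name total_error is made up for illustration.

```python
def total_error(m, points):
    """Sum of (actual output - predicted output) for the line y = m*x."""
    return sum(y - m * x for x, y in points)


points = [(2, 4), (4, 8), (8, 16)]
print(total_error(1, points))    # 14
print(total_error(1.5, points))  # 7.0
```

One caveat: a plain signed sum like this lets positive and negative errors cancel each other out, so in practice the absolute or squared differences are usually summed instead. For this data and these values of m all the differences are non-negative, so the numbers come out the same.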

Umm, not exactly. Now the outputs are 3, 6, 12 respectively and the error is 7. Though our error is smaller now, we are still some way from fitting the data perfectly. How about m=2? This brings the error to 0. So what you did is try different values of m and move in the direction that decreases the error. This is exactly how a machine learning algorithm learns. You may have noticed that we only change the value of m. That is because it is the only value in our hands; everything else (our inputs and outputs) is predefined. Therefore the machine can only manipulate the value of m, which is also known as a weight.

Fig 6: (Red: y=1.5x; Green: y=x; Black: y=2x)
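
The trial-and-error search described above can be sketched like this. It is only an illustration under a few assumptions of mine: a fixed step of 0.5, and the sum of absolute errors as the measure, so that errors do not cancel out when m overshoots. The names search_slope and total_abs_error are made up, and real algorithms usually follow gradients instead of trying neighbouring values.

```python
def total_abs_error(m, points):
    """Sum of |actual output - predicted output| for the line y = m*x."""
    return sum(abs(y - m * x) for x, y in points)


def search_slope(points, m=1.0, step=0.5, max_steps=100):
    """Keep nudging m in whichever direction lowers the error."""
    best = total_abs_error(m, points)
    for _ in range(max_steps):
        candidates = [m + step, m - step]
        errors = [total_abs_error(c, points) for c in candidates]
        if min(errors) >= best:
            break  # neither neighbour is better, so stop here
        best = min(errors)
        m = candidates[errors.index(best)]
    return m, best


points = [(2, 4), (4, 8), (8, 16)]
m, err = search_slope(points)
print(m, err)  # 2.0 0.0
print(m * 9)   # 18.0, the other half of (9, ?)
```

Starting from m=1 (error 14), the search moves to m=1.5 (error 7) and then to m=2 (error 0), the same path we walked by hand.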

The data that I mentioned earlier was for lines which pass through (0,0), but not all lines pass through the origin. For those we can use y=mx+c, where m and c are now our weights; the computer will try to approximate the values of both m and c in order to reduce the error. Now, what about nonlinear data? Or data with more than one independent variable? For the latter, we could define a function like f(x,y,z)=ax+by+cz+d, where a, b, c, d are our weights and x, y, z are our independent variables. For this type of data, the computer will learn these four weights (a, b, c, d). We could increase the number of independent variables and weights if our data demands more. And for the former, we could introduce a more complex function, for example by passing mx+c or ax+by+cz+d through a sigmoid or tanh function. I will not go deep into them. But now you understand how the machine actually learns.

In the real world, the data is never perfect. It contains noise, so you can’t trust it completely, and you would not try to bring the error all the way to 0; rather, you try to make it as small as you reasonably can. If the machine trusts noisy data (data with a lot of wrong values) too much, it will start to give wrong answers on new data because it has generalized poorly (this is known as overfitting). And if it does not learn enough, it will still not be able to predict properly (underfitting). A proper balance needs to be maintained in how much you trust your data.
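
To make the y=mx+c case from this section concrete, here is a minimal sketch of a computer learning both weights. The blog does not name a specific method, so this assumes gradient descent on the mean squared error, which is one common choice. The data points, the learning rate, the step count and the name fit_line are all made up for illustration; the points lie on y=2x+1.

```python
def fit_line(points, lr=0.05, steps=2000):
    """Learn the weights m and c of y = m*x + c by gradient descent."""
    m, c = 0.0, 0.0  # start from an arbitrary guess
    n = len(points)
    for _ in range(steps):
        # Gradients of mean((y - (m*x + c))**2) with respect to m and c
        grad_m = sum(-2 * x * (y - (m * x + c)) for x, y in points) / n
        grad_c = sum(-2 * (y - (m * x + c)) for x, y in points) / n
        # Move both weights a small step in the direction that lowers the error
        m -= lr * grad_m
        c -= lr * grad_c
    return m, c


points = [(0, 1), (1, 3), (2, 5), (3, 7)]  # hypothetical data on y = 2x + 1
m, c = fit_line(points)
print(round(m, 3), round(c, 3))  # 2.0 1.0
```

This is the same "move towards the smaller error" idea as before, except that the direction now comes from the slope of the error instead of from trying values by hand.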

To sum up, in machine learning the machine generalizes from the data by building the best possible function to represent it (by approximating the weights) and uses that function to predict future values.
The weights are the memory units in which the machine stores what it has learned 😉

Now you know what actually happens in ML 🙂 But this was just the simplest explanation of machine learning. More specifically, what I used to explain the concept was the working of supervised learning; there are other approaches too, such as unsupervised learning, semi-supervised learning, and reinforcement learning. By now, you know what exactly a machine does to learn things. This small example takes a much bigger and more complex form in real algorithms, which use linear algebra, statistics, calculus, probability, etc., but at the base level they are all doing the same thing: learning weights and minimizing error.

I will see you in my next blog. Till then, feel free to get in touch: