AI Sidewalk #2: Delving into Binary SVMs (Part 1)

Shantanu Phadke

Beginning of a new journey. Credit to www.pexels.com/@tony-367405

Last time we explored the concept of classification and used sklearn’s built-in SVM model to predict wine type based on quantities of various compounds such as Alcohol, Malic Acid, Flavanoids, and Magnesium. We achieved roughly 97.2% accuracy on our test set, but we black-boxed much of how SVMs actually function.

This time we will dive straight into the theory behind binary SVMs (Support Vector Machines), including the different types of SVMs and the mathematical optimizations behind them.

Let’s start off with the simplest subset of classification problems: ones with two possible output classes. The input data for this setting comes in pairs (x_i, y_i), where x_i is the set of input features of the i-th sample and y_i is the correct output label for the i-th sample. Here each y_i can only take on one of two possible values.
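To pin down notation (the ±1 label convention below is the standard SVM assumption, not something stated explicitly in this post), we can write the training set as

\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{n}, \qquad x_i \in \mathbb{R}^d, \qquad y_i \in \{-1, +1\}.

Encoding the two classes as -1 and +1 is what will later let us write the margin constraints for both classes in a single expression of the form y_i(w^T x_i + b) ≥ 1.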

Credit to https://pixabay.com/users/geralt-9301/

Okay, so we have defined how our data looks and our desired outcome, but what do we do next? Well, we define a loss function. Loss functions measure how closely aligned our model’s current predictions are with the set of true labels, so we will generally look for ways to minimize the value of our selected loss.

What type of loss function will we use for SVMs? Well, that depends on what type of SVM we decide to use. There are two types of SVMs we will talk about: (i) Hard Margin SVMs and (ii) Soft Margin SVMs. Their loss functions and optimization methods differ because the two models prioritize different things when fitting data: Hard Margin SVMs cannot exhibit any misclassification errors on the training data, while Soft Margin SVMs tolerate some.

Hard Margin SVMs prioritize maximizing the margin on the data they are trained on. The margin is the minimum distance between any input point and the decision boundary our model comes up with. Why would we want to do this in the first place? Intuitively, the larger the margin, the more room the boundary has before nearby, unseen points start falling on the wrong side, so wider margins tend to generalize better.
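To make “margin” precise (this is the standard geometric definition; the original post’s figures and equation images aren’t reproduced here), the distance from a point x_i to the boundary w^T x + b = 0, and the resulting margin m over the whole training set, are

d(x_i) = \frac{\lvert w^\top x_i + b \rvert}{\lVert w \rVert}, \qquad m = \min_i \frac{y_i\,(w^\top x_i + b)}{\lVert w \rVert}.

The factor y_i in the second expression keeps the quantity positive only when a point lies on its correct side of the boundary, which is exactly the hard-margin requirement.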

Above we saw an ideal case where all points were in fact linearly separable, but the question remains: what do we do when that isn’t the case? Well, this is where Soft-Margin SVMs come to the rescue! Where Hard-Margin SVMs fail to find a decision boundary, this second type of SVM can still find a solution, since it follows relaxed constraints that allow for misclassifications.

Why can’t Hard-Margin SVMs work on data that isn’t linearly separable? After all, what exactly restricts these models from making errors? Well, the proof is in the pudding, or in this case, in the math. Here we go.

(1) All of the points MUST be perfectly separated, and each individual point should have a distance from the decision boundary that is at least as large as the margin m.

(2) The margin has to be greater than or equal to 0.

Since our decision boundary will be the hyperplane defined by w^T*x + b = 0, we can derive the first constraint as follows:
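The original post’s equation images aren’t reproduced here, but the derivation they refer to is the standard one. Requiring every point to lie at least a margin m away from the boundary, on its correct side, gives

\frac{y_i\,(w^\top x_i + b)}{\lVert w \rVert} \ge m \quad \forall i.

Because (w, b) can be rescaled without changing the boundary, we may fix m \lVert w \rVert = 1, so the constraint becomes y_i(w^\top x_i + b) \ge 1, and maximizing m = 1/\lVert w \rVert is equivalent to the hard-margin optimization

\min_{w,\,b} \; \frac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad y_i\,(w^\top x_i + b) \ge 1 \;\; \forall i.

If the data are not linearly separable, no choice of (w, b) can satisfy every one of these constraints, so the problem has no feasible solution at all; that is precisely why Hard-Margin SVMs cannot tolerate misclassifications.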

We want a way of extending Hard-Margin SVMs so that the newly formed model is capable of producing decision boundaries even when it isn’t possible to classify the data perfectly. We will accomplish this by adding extra variables, known as slack variables, into our existing optimization: each point (x_i, y_i) gets a slack variable epsilon_i measuring how badly it violates the margin constraint (the resulting optimization is sketched after the two cases below).

Case I: Point (x_i, y_i) is correctly classified (possibly lying inside the margin) -> 0 ≤ epsilon_i ≤ 1

Case II: Point (x_i, y_i) is misclassified -> epsilon_i > 1
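The post’s equation image for this step isn’t available here, but the standard soft-margin objective these slack variables lead to (with a hyperparameter C ≥ 0 trading off a wide margin against the total amount of slack, a common convention) is

\min_{w,\,b,\,\epsilon} \; \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \epsilon_i \quad \text{subject to} \quad y_i\,(w^\top x_i + b) \ge 1 - \epsilon_i, \;\; \epsilon_i \ge 0 \;\; \forall i.

A large C punishes slack heavily and behaves much like a Hard-Margin SVM, while a small C accepts more margin violations in exchange for a wider margin; this is the same role the C argument plays in sklearn’s SVC model, which we used last time.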

Dual Formulation of the Optimization

Although the above optimization does in fact make mathematical sense, it is tough to optimize directly. Luckily for us, in comes the concept of duality, where we can express a minimization problem as an equivalent maximization problem! I’m not going to go into the details of deriving this dual in this post for the sake of length, but the process is called Lagrangian Duality, and the resulting maximization becomes:
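The original equation image isn’t reproduced here either; the standard result of applying Lagrangian Duality to the soft-margin problem above, with one multiplier alpha_i per training point, is

\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j\, y_i y_j\, x_i^\top x_j \quad \text{subject to} \quad 0 \le \alpha_i \le C \;\; \forall i, \quad \sum_{i=1}^{n} \alpha_i y_i = 0.

Notice that the training points only ever enter through the inner products x_i^\top x_j, which is exactly what will open the door to kernels next time.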

That’s quite a bit of theory and math for this time. Next time we will explore Lagrangian Duality further, see how the coefficients we derive from the optimization above help us get optimal weights for our SVM model, and introduce the concept of kernels!

Feedback really helps me figure out how these articles are being perceived and helps increase the quality of future posts, so if you enjoyed this read, be sure to leave a clap, and if you have any questions, concerns, or feedback, leave them in the comments. Thank you!

Credit to https://pixabay.com/users/athree23-6195572/
