A classic tool for classification: Logistic Regression

Behind the scenes of the machine learning model

yash shah
3 min read · Nov 30, 2020

If you want to predict whether a child will buy chocolate or not, or solve any similar problem with a binary (yes/no) answer, logistic regression is one of the most useful tools in the data science and analytics domain. It can also handle questions with more than two possible answers (multi-class), such as which chocolate a boy would purchase in a given scenario: Mars, Skittles, or M&Ms.

Logistic Regression

The logistic regression model is a supervised machine learning technique used to predict a categorical target variable. It can be implemented quickly in Python using scikit-learn. A comprehensive description of the model can be found in the scikit-learn documentation.

code implementation
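The original screenshot is not available here, so the following is only a minimal sketch of what such an implementation might look like. The synthetic data from `make_classification`, the label names `"buys"` and `"does_not_buy"`, and the choice to wrap the model in a pipeline with `StandardScaler` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption): 200 rows, 4 continuous features.
X, y_numeric = make_classification(n_samples=200, n_features=4, random_state=0)

# Hypothetical string labels, to show that the target needs no manual encoding.
y = np.where(y_numeric == 1, "buys", "does_not_buy")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale the continuous features, then fit the logistic regression model.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

predicted_labels = model.predict(X_test)               # class labels
predicted_probabilities = model.predict_proba(X_test)  # one column per class
test_accuracy = model.score(X_test, y_test)            # mean accuracy on held-out rows
```

`predict` returns the original string labels, while `predict_proba` returns the fitted probabilities that those labels are derived from.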

One key thing to note is that the model from this library encodes the target labels internally, so we do not have to encode the target variable ourselves. The features are a different matter: for the model to work well, a data scientist needs to encode categorical features and normalize continuous features before fitting the model.
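One possible way to handle that preprocessing is sketched below with scikit-learn's `ColumnTransformer`. The tiny DataFrame, its column names, and its values are made up purely for illustration and are not data from this article.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up toy data (an assumption): two continuous and one categorical feature.
data = pd.DataFrame({
    "age": [6, 9, 12, 7, 10, 14, 8, 11],
    "pocket_money": [1.0, 2.5, 5.0, 0.5, 3.0, 6.0, 1.5, 4.0],
    "favourite_flavour": ["milk", "dark", "milk", "fruit",
                          "dark", "milk", "fruit", "dark"],
    "buys_chocolate": ["no", "no", "yes", "no", "yes", "yes", "no", "yes"],
})
features = data.drop(columns="buys_chocolate")
target = data["buys_chocolate"]  # string labels, left unencoded on purpose

# Normalize the continuous columns and one-hot encode the categorical one.
preprocess = ColumnTransformer([
    ("scale", StandardScaler(), ["age", "pocket_money"]),
    ("encode", OneHotEncoder(handle_unknown="ignore"), ["favourite_flavour"]),
])

model = make_pipeline(preprocess, LogisticRegression())
model.fit(features, target)

# 2 scaled columns + 3 one-hot columns reach the classifier.
transformed = model.named_steps["columntransformer"].transform(features)
```

Keeping the preprocessing inside the pipeline means the same scaling and encoding are applied automatically when the model later predicts on new rows.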

Diving further into logistic regression, it uses the sigmoid function as its basis function, fitted to the data to give the probability that an event occurs. In simple words, the model fits a sigmoid curve that outputs the probability of the target variable, and that probability is then classified into a category based on a threshold (commonly 0.5). This threshold defines the decision boundary, so it plays a significant role.

sigmoid function
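The sigmoid function is sigmoid(z) = 1 / (1 + e^(-z)), where z is the weighted sum of the features. A minimal NumPy sketch of the function and of the thresholding step follows; the helper name `predict_class` and the example weights are hypothetical, chosen only to illustrate the idea.

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_class(features, weights, bias, threshold=0.5):
    """Turn linear scores into probabilities, then into 0/1 labels."""
    probability = sigmoid(features @ weights + bias)
    labels = (probability >= threshold).astype(int)
    return labels, probability

# Hypothetical weights and inputs, not fitted to any real data.
weights = np.array([1.0, -1.0])
bias = 0.0
features = np.array([[0.0, 0.0], [2.0, 1.0], [-3.0, 0.5]])

labels, probability = predict_class(features, weights, bias)
```

Raising or lowering `threshold` moves the decision boundary: a higher value makes the model more reluctant to predict the positive class.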

Behind the scenes, logistic regression uses cross-entropy, also known as log loss, as its cost function for binary classification. Gradient descent is then used to find the weights that minimize this cost function.

cost function
combined cost function
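In its combined form, the cost averaged over m examples is J = -(1/m) * sum(y * log(p) + (1 - y) * log(1 - p)), where p is the sigmoid output for each example. The sketch below shows one way this cost and a plain batch gradient descent loop could be written in NumPy. The learning rate, step count, and random toy data are assumptions, and library implementations such as scikit-learn typically rely on other solvers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y_true, y_prob, eps=1e-12):
    """Binary cross-entropy, clipped so that log(0) never occurs."""
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

def fit_gradient_descent(X, y, learning_rate=0.1, n_steps=500):
    """Plain batch gradient descent on the log loss."""
    weights = np.zeros(X.shape[1])
    bias = 0.0
    losses = []
    for _ in range(n_steps):
        y_prob = sigmoid(X @ weights + bias)
        losses.append(log_loss(y, y_prob))
        error = y_prob - y  # gradient of the loss with respect to the score
        weights -= learning_rate * (X.T @ error) / len(y)
        bias -= learning_rate * error.mean()
    return weights, bias, losses

# Random toy data (an assumption): class 1 when the two features sum to > 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

weights, bias, losses = fit_gradient_descent(X, y)
train_accuracy = np.mean((sigmoid(X @ weights + bias) >= 0.5) == y)
```

The recorded `losses` list makes it easy to check that the cost falls as the weights are updated.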

For classification with more than two categories (multi-class), the softmax activation function can be used to normalize the model's outputs into a probability distribution (each value in the range [0, 1], all values summing to 1). Alternatively, the one-vs-rest approach fits a separate binary model for each class and predicts the class with the highest probability across all of those models.

softmax function
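For a vector of class scores z, softmax(z)_i = e^(z_i) / sum_j e^(z_j). A small NumPy sketch follows, reusing the chocolate example from the introduction; the scores themselves are invented for illustration.

```python
import numpy as np

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    # Subtracting the maximum leaves the result unchanged but avoids overflow.
    shifted = scores - np.max(scores, axis=-1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores, axis=-1, keepdims=True)

candies = ["Mars", "Skittles", "M&Ms"]
scores = np.array([2.0, 1.0, 0.1])  # invented scores, one per class

probabilities = softmax(scores)
predicted = candies[int(np.argmax(probabilities))]
```

The predicted class is simply the one with the largest probability, which is also the one with the largest raw score.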

Logistic regression is a classification model (like SVM, Random Forest, and Naïve Bayes), so its performance can be measured by accuracy, precision, recall, or F1 score, depending on the use case.
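These metrics are all available in `sklearn.metrics`. The short sketch below computes them on a pair of made-up label lists, which are assumptions for illustration rather than results from a real model.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Made-up true and predicted labels (1 = buys chocolate, 0 = does not).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 0, 1, 0]

accuracy = accuracy_score(y_true, y_pred)    # share of all predictions that are right
precision = precision_score(y_true, y_pred)  # share of predicted 1s that are truly 1
recall = recall_score(y_true, y_pred)        # share of true 1s that were found
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall
```

Here the model never raises a false alarm but misses one buyer, so precision is higher than recall; which of the two matters more depends on the use case.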

In summary, the logistic regression model takes a statistical approach: it fits the data to the sigmoid function and minimizes its cost function by means of gradient descent to estimate the probability, and from it the class, for the given features.
