# What is logistic regression and how does it work If you don’t understand logistic regression you certainly don’t understand deep learning.

– Jonathan Gordon, PhD Machine Learning & Deep Learning, University of Cambridge

## Logistic regression interpretation

What is it about logistic regression that you should know?

Logistic regression is a traditional statistics technique that is also very popular as a machine learning tool.

Logistic regression predicts whether something is true or false instead of predicting something continuous. Logistic regressions ability to provide probabilities and classify, new samples using continuous and discrete measurements.

## When to use logistic regression?

Logistic regression is a beautiful example of what is called a “generalized linear model” in statistics, and it embodies a very deep principle that is worth knowing just for its own sake. If you think of ML (or statistics) as constructing generative models of data, a very simple principle is to assume the data is generated as a “linear” process, the simplest of all models.

– Sridhar Mahadevan, PhD Computer Science, Rutgers University

Logistic regression can be used to classify samples, and it can use different types of data like size and/or genotype to do that classification, and it can also be used to assess what variables are useful for classifying samples ie.

Logistic regression typically needs a lot less data than deep learning.

There’s a right way and a wrong way to implement machine learning, and for nearly every problem, beginning with the most complicated, data-hungry model you can think of is almost certainly the wrong way to go.

Deep learning can’t do what logistic regression can. Deep learning does not have parameter estimates for each variable in the model. It’s also explainable, in the sense that it’s not just a black box for bringing things in and taking them out. You may be interested in both (or in addition to) prediction and description at times.

Machine Learning (ML) classifies Logistic Regression (LR). Here’s how to implement machine learning the best way:

1. Begin with the simplest model that solves the problem, such as logistic regression.
2. Take any measurements. Are the results up to par? If that’s the case, you’re done. If you don’t, this model becomes your starting point.
3. Build a more complex model, such as a decision tree.
4. Collect the same data as before. How do they match up to your starting point? If it’s slightly better, you’re done. If it’s better but not quite there, go back to phase 3 and try again for a better model. If it isn’t dramatically better, you may need to reconsider your feature engineering and start over.

Notice, models that are linear:

• When compared to deep neural nets, it is possible to train quickly.
• Can manage wide feature sets with ease.
• Can be trained using algorithms that don’t necessitate a lot of tinkering with learning rates and the like.
• Can be debugged and interpreted more effectively than neural networks You should look at the weights assigned to each function to see which one has the most effect on a prediction.
• Provide a great place to start learning about machine learning.
• In industry, they’re commonly used.

#### Consider this example of using logistic regression

Let’s say you’re looking for a traffic recovery scenario after Google’s December update. One of the likely reasons for a drop in traffic could be a poor-quality link profile, and you decide to check backlinks and reject toxic ones.

For this analysis, I used data from Majestic and Netpeak Software.

I also used Orange software for backlink analysis.

As you can see in the figure, the data is marked with two classes of spam (value “Yes”) and non-spam (value “No”).

I used the Logistic Regression widget and the Model Explain widget to get information about the most important factors that affect the likelihood that a referring domain will be assigned to the appropriate class.

The Model Explain widget reports that the most important criteria are:

1. Majestic Trust Ratio.
2. Trust Flow.
3. Citation Flow.

We can see that a referring domain is classified as Spam if it has a low Trust Flow value and a high Citation Flow value. In other words, a lot of backlinks but little trust.

Please read what a data point in a machine learning model is.

### SUMMARY

Logistic regression, like all regression analyses, is a predictive analysis. To define data and illustrate the relationship between one dependent binary variable and one or more nominal, ordinal, interval, or ratio-level independent variables, logistic regression is used. Logistic regression is the appropriate regression analysis for the dichotomous variable (binary).