Why use Logistic Regression instead of Linear Regression 


In linear regression, the observations (red) are assumed to be the result of random deviations (green) from an underlying relationship (blue) between a dependent variable (y) and an independent variable (x).

I. Quantamental Investing 

Quantamental investing uses machine learning, mathematical modelling, and data analysis to estimate the probability that a trade will be profitable. It combines traditional fundamental analysis with quantitative analysis in an effort to generate higher alpha, that is, returns that beat the market. This involves using sophisticated mathematical models and algorithms to analyze large amounts of data, including financial statements, market trends, and economic indicators. The results of these quantitative analyses are then combined with traditional fundamental analysis methods. By integrating the two approaches, quantamental investing aims to improve the accuracy and speed of investment decision-making, create more adaptive portfolios, and reduce human errors and biases, with the ultimate goal of enhancing investment performance.

II. Logistic Regression 

Figure 1. The standard logistic function f(y); note that f(y) lies strictly between 0 and 1 for all y.

Logistic regression is a statistical method used to model the relationship between a binary outcome variable (i.e. a variable that takes on only two possible values, such as 0 and 1, pass and fail, malignant and not malignant, or spam and not spam) and one or more independent predictor variables. In machine learning, logistic regression is used for classification tasks.

It therefore falls under supervised learning: it uses a sigmoidal (logistic) function to transform the output of a linear equation into a probability value. Because the outcome is a probability, results are bounded between 0 and 1. This probability can then be used to classify a new observation into one of two categories based on a threshold value. There are three types of logistic regression: binary, multinomial (three or more unordered output categories), and ordinal (three or more ordered output categories).
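The mapping from a linear equation to a probability, and then to a class label, can be sketched in a few lines of Python. This is a minimal illustration only: the coefficient values, the intercept, and the helper names (predict_proba, classify) are assumptions made up for the example, not values from any fitted model.

```python
import math


def sigmoid(z: float) -> float:
    """Standard logistic function; maps a real number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, coefficients, intercept):
    """Probability of the positive class: sigmoid of the linear combination."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return sigmoid(z)


def classify(probability, threshold=0.5):
    """Assign class 1 when the probability reaches the threshold, else class 0."""
    return 1 if probability >= threshold else 0


# Hypothetical coefficients for two illustrative predictors (not fitted values).
coefficients = [0.8, -1.2]
intercept = -0.5

p = predict_proba([2.0, 0.5], coefficients, intercept)
label = classify(p)
print(f"probability={p:.3f}, class={label}")
```

The threshold of 0.5 is only a common default; in practice it is a design choice that depends on the relative cost of false positives and false negatives.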

III. Linear Regression 

Linear regression is a statistical method that aims to identify the best linear relationship between a continuous dependent variable and one or more independent predictor variables. It estimates the coefficient(s) of a linear equation by minimizing the error between the fitted regression line and the data points.

Example of simple linear regression, which has one independent variable

Linear regression is often visualized as a straight line on a scatter plot. It uses methods such as least squares to find the best-fit line for a set of data points. Linear models are easy to implement and interpret, and they use straightforward statistical error estimators. The parameters of the linear regression model represent the slope and intercept of the best-fit line, and the slope indicates the direction and magnitude of the relationship between the dependent variable and the independent predictor variables.
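For the single-predictor case, the least-squares slope and intercept have a closed form. The following is a minimal sketch of that calculation; the function name and the data points are made up for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up observations lying roughly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_simple_linear_regression(xs, ys)
prediction = intercept + slope * 6.0  # extrapolate to a new x value
print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction at x=6: {prediction:.2f}")
```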

IV. Why use Logistic Regression 

The intuition behind logistic regression is to describe data and explain the relationship between one binary dependent variable and one or more independent predictor variables, which may be categorical (nominal or ordinal) or continuous. This regression framework is used to predict the probability of an event occurring and to categorize the observation accordingly.

Logistic regression is widely used in finance because it is particularly suited to modeling binary events that are common in the financial domain, such as loan default in credit risk modeling, fraud detection, bankruptcy, and the direction of market moves. It estimates the probability of such an event from a set of pertinent characteristics (such as credit score, income, and employment history) and other risk factors (such as loan term, interest rates, economic factors, and regulatory changes).

Using linear regression in these scenarios leads to unreliable predictions, because its output is unbounded: it can predict values below 0 or above 1, which cannot be interpreted as probabilities of a binary outcome.
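The out-of-range problem is easy to reproduce. The sketch below fits a least-squares line to made-up binary labels (the data and the helper names are assumptions for illustration) and evaluates it at the two ends of the observed range.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical predictor values with binary (0/1) outcomes.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

slope, intercept = fit_line(xs, ys)


def predict(x):
    """Raw linear prediction; nothing constrains it to the range 0 to 1."""
    return intercept + slope * x


low = predict(1.0)   # falls below 0 even though x = 1 is an observed point
high = predict(8.0)  # rises above 1 even though x = 8 is an observed point
print(f"prediction at x=1: {low:.3f}, prediction at x=8: {high:.3f}")
```

Even within the range of the training data, the fitted line produces a negative value at one end and a value above 1 at the other, so its output cannot be read as a probability.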

Unlike linear regression, which yields a best-fit line from which future continuous values can be predicted, logistic regression is built for classification and categorization.

Logistic regression uses a sigmoidal function to map the output of a model to a probability value between 0 and 1. This value can be interpreted as the probability that the target variable takes one of the two possible outcomes. The sigmoidal function is an S-shaped curve with the equation

$$f(y) = \frac{1}{1 + e^{-y}}$$

where ‘y’ is the input to the function. The input is the sum product of the explanatory variables ‘x_n’ and the regression coefficients ‘β_n’, i.e. $y = \sum_n \beta_n x_n$.

The S-shaped curve of the sigmoidal function allows the model to capture a non-linear relationship between the independent predictor variables and the probability of the outcome, while the log-odds of the outcome remain a linear function of the predictors. Log-odds is defined as the natural logarithm of the odds, i.e. the probability of success divided by the probability of failure.
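The link between the sigmoidal function and the log-odds can be checked with a short derivation. Writing p for the predicted probability f(y), and taking y to be the sum product of coefficients and explanatory variables described above, the steps are:

```latex
\begin{aligned}
p &= f(y) = \frac{1}{1 + e^{-y}} \\
1 - p &= \frac{e^{-y}}{1 + e^{-y}} \\
\frac{p}{1 - p} &= e^{y} \\
\ln\!\left(\frac{p}{1 - p}\right) &= y = \sum_n \beta_n x_n
\end{aligned}
```

In other words, the log-odds equal the linear input exactly, which is why each coefficient can be read as the change in log-odds for a one-unit change in its predictor.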

Simply applying a sigmoidal function to the output of a linear regression model would not convert it into a logistic regression model. Although doing so would constrain the output to lie between 0 and 1, the transformed output would not accurately represent probabilities, nor would it be appropriate for binary classification tasks, because the coefficients would still have been fitted to minimize squared error against the raw 0/1 labels. Logistic regression instead models the log-odds of the dependent variable directly as a linear function of the independent predictor variables. It then applies the sigmoidal function to obtain the predicted probabilities for the binary classification task.
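The difference can be seen on a small example. The sketch below is illustrative only: the data are made up, and plain gradient descent on the log-loss is used as one simple way to fit the coefficients (an assumption of this example, not a method prescribed by the sources). It compares a logistic regression fit with a least-squares line whose output is merely passed through the sigmoid.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(xs, ys, learning_rate=0.1, epochs=20000):
    """Fit intercept b0 and slope b1 by gradient descent on the mean log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_b0 = 0.0
        grad_b1 = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(b0 + b1 * x) - y  # predicted probability minus label
            grad_b0 += error
            grad_b1 += error * x
        b0 -= learning_rate * grad_b0 / n
        b1 -= learning_rate * grad_b1 / n
    return b0, b1


def fit_line(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up, overlapping binary data (the classes are not perfectly separable).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic_regression(xs, ys)
a0, a1 = fit_line(xs, ys)

# Logistic regression: the log-odds b0 + b1 * x are modeled directly.
fitted_low, fitted_high = sigmoid(b0 + b1 * 1.0), sigmoid(b0 + b1 * 8.0)
# Naive alternative: squash the least-squares prediction through the sigmoid.
squashed_low, squashed_high = sigmoid(a0 + a1 * 1.0), sigmoid(a0 + a1 * 8.0)

print(f"logistic regression: p(x=1)={fitted_low:.3f}, p(x=8)={fitted_high:.3f}")
print(f"sigmoid of linear fit: p(x=1)={squashed_low:.3f}, p(x=8)={squashed_high:.3f}")
```

On this data the squashed linear output stays close to 0.5 at the low end of the range, where every nearby observation is a 0, whereas the logistic regression fit assigns that point a probability near 0.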

V. Conclusion 

In conclusion, logistic regression is a powerful machine learning tool for classification problems in trading and investing. Its ability to identify important features and relationships between variables provides valuable insight into the underlying data. Its capacity to incorporate both categorical and continuous data, whether simple or complex, together with its ease of interpretation, makes it useful for decision-making.

Written by Varun Chandra Gupta  



