Why use Logistic Regression instead of Linear Regression
I. Quantamental Investing
Quantamental investing is the use of machine learning, mathematical modelling, and data analysis to estimate the probability that a trade will be profitable. It combines traditional fundamental analysis with quantitative analysis in pursuit of alpha that beats the market. This involves using sophisticated mathematical models and algorithms to analyze large amounts of data, including financial statements, market trends, and economic indicators. The results of these quantitative analyses are then combined with traditional fundamental analysis methods. By integrating the two approaches, quantamental investing aims to improve the accuracy and speed of investment decision-making, create more adaptive portfolios, and reduce human error and bias, ultimately enhancing investment performance.
II. Logistic Regression
Logistic regression is a statistical method used to model the relationship between a binary outcome variable (i.e. a variable that takes only two possible values, such as 0 and 1, pass and fail, malignant and benign, spam and not spam) and one or more independent predictor variables. In machine learning, logistic regression is used for classification tasks.
As a result, it falls under the supervised learning methodology and uses a sigmoidal, or logistic, function to transform the output of a linear equation into a probability value. Because the outcome is a probability, results are bounded between 0 and 1. This probability can then be used to classify a new observation into one of two categories based on a threshold value. There are three types of logistic regression – binary, multinomial (three or more unordered output categories), and ordinal (three or more ordered output categories).
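The probability-then-threshold idea above can be sketched in a few lines. This is a minimal illustration, not a fitted model – the score `z` stands in for the output of a linear equation, and the 0.5 threshold is the conventional default:

```python
import math

def sigmoid(z):
    # Map any real-valued linear score to a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def classify(z, threshold=0.5):
    # Convert the probability into one of the two categories
    return 1 if sigmoid(z) >= threshold else 0

print(round(sigmoid(2.0), 3))      # a large positive score gives a probability near 1
print(classify(2.0), classify(-2.0))
```

Raising or lowering the threshold trades off the two kinds of classification error, which matters in applications like fraud detection where the two mistakes have very different costs.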
III. Linear Regression
Linear regression is a statistical method that aims to identify the best linear relationship between a continuous dependent variable and the independent predictor variables. This form of regression estimates the coefficient(s) of a linear equation, involving one or more predictor variables, by minimizing the error between the fitted regression line and the datapoints.
The result is often visualized as a straight line on a scatter plot. Linear regression uses methods such as ordinary least squares to find the best-fit line for a set of datapoints. Linear models are easy to implement and interpret and use straightforward statistical error estimators. The parameters of the linear regression model represent the slope and intercept of the best-fit line and indicate the strength and direction of the relationship between the dependent variable and the independent predictor variables.
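A least-squares fit of this kind takes only a couple of lines with NumPy. The data below is an illustrative, noiseless series with a known trend (y = 2x + 1), so the recovered slope and intercept match it exactly:

```python
import numpy as np

# Illustrative data with a known linear trend: y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Ordinary least squares fit of a degree-1 polynomial returns (slope, intercept)
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)  # slope ≈ 2.0, intercept ≈ 1.0
```

With real, noisy data the fitted coefficients would only approximate the underlying trend, and the residuals would feed the usual error estimators.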
IV. Why use Logistic Regression
The intuition behind using logistic regression is to describe data and explain the relationship between one dependent binary variable and one or more nominal or ordinal independent predictor variables. This regression framework is used to predict the probability of an event occurring and to categorize it.
Logistic regression is a widely used methodology in finance because it is particularly suited to modeling binary events – credit risk, fraud detection, bankruptcy risk, market direction, etc. – that are common in the financial domain. Logistic regression produces well-calibrated results from a set of pertinent characteristics (such as credit score, income, and employment history) and other risk factors (such as loan term, interest rates, economic conditions, and regulatory changes).
Using linear regression in the aforementioned scenarios leads to inaccurate predictions because linear regression can predict values outside the binary range of 0 to 1.
Unlike linear regression, where a best-fit line is obtained from which future values can be predicted directly, logistic regression is built for classification and categorization.
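The out-of-range problem is easy to demonstrate. Below, a least-squares line is fitted to made-up binary labels (think 0 = no default, 1 = default); extrapolating that line yields "probabilities" below 0 and above 1, which is exactly why the sigmoid transformation is needed:

```python
import numpy as np

# Illustrative binary outcome against a single feature
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0])

# Fit a straight line to the 0/1 labels by ordinary least squares
slope, intercept = np.polyfit(x, y, 1)

# Linear regression happily extrapolates outside the [0, 1] range
pred_low = slope * 0.0 + intercept    # prediction at x = 0 falls below 0
pred_high = slope * 7.0 + intercept   # prediction at x = 7 rises above 1
print(pred_low, pred_high)
```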
Logistic regression uses a sigmoidal function to map the output of a model to a probability value between 0 and 1. This probability can be interpreted as the probability of the target variable taking one of the two possible outcomes. The sigmoidal function is an S-shaped curve with the equation

f(y) = 1 / (1 + e^(−y))

where y is the input to the function: the sum product of the explanatory variables xₙ and the regression coefficients βₙ, i.e. y = β₀ + β₁x₁ + … + βₙxₙ.
The S-shaped curve of the sigmoidal function allows the model to capture the non-linear relationship between the independent predictor variables and the log-odds of the dependent variables. Log-odds is defined as the natural logarithm of the probability of success divided by the probability of failure (i.e. odds).
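The relationship between the sigmoid and the log-odds can be checked numerically: the log-odds (logit) function is exactly the inverse of the sigmoid, so applying one after the other recovers the original linear score. A small sketch of that identity:

```python
import math

def sigmoid(z):
    # Probability of success given a linear score z
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    # Natural log of the odds: ln(p / (1 - p))
    return math.log(p / (1.0 - p))

# logit(sigmoid(z)) == z: the log-odds are linear in the score
z = 1.5
print(log_odds(sigmoid(z)))  # recovers 1.5
```

This is why logistic regression is said to model the log-odds as a linear function of the predictors: on the log-odds scale the model is a straight line, while on the probability scale it is the S-curve.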
Simply applying a sigmoidal function to the output of an already-fitted linear regression model would not convert it into a logistic regression model. Although this would constrain the output to lie between 0 and 1, the transformed output would not accurately represent probabilities, nor would it be appropriate for binary classification tasks. Logistic regression instead directly models the log-odds of the dependent variable as a linear function of the independent predictor variables, estimating the coefficients under the logistic loss, and then applies the sigmoidal function to obtain the predicted probabilities for the binary classification task.
In conclusion, logistic regression is a powerful machine learning tool for classification problems in trading and investing. Its ability to identify important features and relationships between variables provides valuable insight into the underlying data. Its capacity to incorporate both categorical and continuous, simple and complex data, together with its ease of interpretation, makes it useful for decision-making.
Written by Varun Chandra Gupta