# Causality In Predictions


Causal prediction is a family of mathematical methods for uncovering the causal relationship between dependent variables (the results) and independent variables (the causes), using regression analysis, econometric models, and input-output models.

Among these, regression-based prediction builds an econometric model from the causal relationships and the degree of mutual influence between things, then uses that model to forecast. In the real economy, many variables are inherently related, and some are driven by other variables or factors. A variable that is driven is called the dependent or explained variable; the variables that drive it are called independent or explanatory variables.
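As a minimal sketch of the regression idea, the snippet below fits a one-variable linear model by ordinary least squares using only the Python standard library. The function name `fit_simple_ols` and the advertising and sales figures are made up for illustration. A fitted slope describes how the explained variable moves with the explanatory variable in the sample; reading it as a causal effect still requires assumptions about how the data were generated.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data: explanatory variable (ad spend) and explained variable (sales).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]  # constructed to lie exactly on 1 + 2 * x

intercept, slope = fit_simple_ols(ad_spend, sales)
forecast = intercept + slope * 6.0  # predicted sales for a new ad spend of 6
print(intercept, slope, forecast)
```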

Regression analysis comes in many forms rather than a single formula, and causal relationships need not be measured by regression at all. An A/B test is another option, and probability and statistics offer many more sophisticated techniques for examining causality.
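One common way to analyze an A/B test is a two-proportion z-test, sketched below with the standard library. The function name and the conversion counts are hypothetical, and the normal approximation assumes reasonably large groups with random assignment of users to variants.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Standard normal CDF evaluated at |z|, via the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Hypothetical experiment: 1,000 users per variant.
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(z, p_value)
```

Because users are assigned to variants at random, a small p-value here supports a causal reading of the difference in a way that a regression on observational data alone does not.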

Regression analysis is one of the analytical methods reflecting the causal relationship between the explained variable and the explanatory variable.

Generally speaking, prediction means inferring the future from what has happened in the past. This resembles the posterior probability in Bayes' theorem. The posterior differs from the prior in that a prior probability distribution need not have an objective basis; it can rest partly or entirely on subjective belief. The posterior probability distribution combines the sample information with the prior distribution, which makes it more objective.

Predictions based on the posterior probability therefore rest on more objective and tighter causal logic. In forecasting, we usually draw a sample from the population. The prior probability reflects our understanding of the unknown parameters before sampling. The sample then brings new information, our understanding of the unknown parameters changes accordingly, and that change is reflected in the posterior probability.
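The update from prior to posterior can be sketched with a Beta-Binomial model, one of the simplest cases where the posterior has a closed form. The choice of a Beta prior, the function names, and the sample counts below are assumptions made for illustration only.

```python
def update_beta(prior_alpha, prior_beta, successes, failures):
    """Posterior Beta parameters after observing binomial sample data."""
    return prior_alpha + successes, prior_beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Subjective prior belief: the unknown rate is probably around 50%.
prior = (2.0, 2.0)

# Hypothetical sample: 30 successes and 70 failures out of 100 draws.
posterior = update_beta(*prior, successes=30, failures=70)

print(beta_mean(*prior))      # belief before sampling
print(beta_mean(*posterior))  # belief after sampling, pulled toward the data
```

The posterior mean sits between the prior mean and the sample proportion, and moves closer to the sample proportion as the sample grows.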

Therefore, sampling is particularly important.

Sampling errors carry over into prediction errors. Survivorship bias occurs if companies are excluded from the analysis because they have gone out of business or for other reasons related to poor performance. Data-mining bias comes from finding models by repeatedly searching through databases for patterns. Look-ahead bias exists if the model uses data that was not available to market participants at the time they act in the model.

Finally, time-period bias is present if the time period used makes the results time-period specific or if the time period used includes a point of structural change.
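To make the first of these biases concrete, the sketch below compares an average return computed over all funds with one computed over survivors only. The fund records are entirely hypothetical and are chosen only to show the direction of the distortion.

```python
from statistics import mean

# Hypothetical fund records; "survived" marks funds still operating today.
funds = [
    {"name": "A", "annual_return": 0.08, "survived": True},
    {"name": "B", "annual_return": 0.12, "survived": True},
    {"name": "C", "annual_return": -0.15, "survived": False},
    {"name": "D", "annual_return": 0.05, "survived": True},
    {"name": "E", "annual_return": -0.20, "survived": False},
]


def average_return(records, survivors_only=False):
    """Average annual return, optionally dropping funds that closed."""
    selected = [r for r in records if r["survived"] or not survivors_only]
    return mean(r["annual_return"] for r in selected)


full_sample = average_return(funds)
survivors = average_return(funds, survivors_only=True)
print(full_sample, survivors)  # the survivors-only figure is higher
```

A model trained only on the surviving funds would learn from an overly optimistic picture of past performance.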

Consider a few companies that rely on machine learning. Most of them use machine learning methods to study past information in order to make predictions. Amazon, for instance, uses machine learning across its consumer services, from its online store to Kindle and Echo devices.

Machine learning is used to determine user preferences (such as product purchases), and it also powers the Alexa engine, Alexa smart home devices, Amazon JHIM, Amazon Rekognition, Amazon Music, and other functions. The recommendation system suggests products you may need based on your past purchase and browsing history on Amazon. These recommendations often end up satisfying customers’ discretionary, non-essential wants, which drives more spending on Amazon.

Facebook also uses machine learning extensively. Its two billion users encounter machine learning every day without realizing it. Content feed ranking, advertising, and Facer are the main services that machine learning supports at Facebook.

The feed ranking algorithm lets users see the content most important to them first when they visit Facebook. Through training, general models learn which user and environmental factors should affect content ranking.

Later, when a user visits Facebook, the model selects from thousands of candidates to build a personalized feed of images and other content, and determines the best ordering of the selected items. Advertising systems use machine learning to determine which ads to display to a given user.

Ad models are trained on user characteristics, user context, previous interactions, and ad attributes to learn to predict which ads a user is most likely to click. Then, when the user visits Facebook, these inputs are passed through the trained model, which immediately determines which ads to display.
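The scoring step described above can be illustrated with a toy logistic model. This is not Facebook's system: the feature names, the weights (which would normally come out of training), and the candidate ads are all hypothetical, and a production model would use far more inputs.

```python
from math import exp


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))


# Hypothetical weights standing in for the output of a training run.
WEIGHTS = {"bias": -2.0, "clicked_similar_before": 1.5, "matches_interest": 1.0}


def predict_click_probability(features):
    """Predicted probability that the user clicks an ad with these features."""
    z = WEIGHTS["bias"] + sum(WEIGHTS[name] * value for name, value in features.items())
    return sigmoid(z)


# Hypothetical candidate ads with features derived from past user behavior.
candidates = {
    "ad_shoes": {"clicked_similar_before": 1.0, "matches_interest": 1.0},
    "ad_cars": {"clicked_similar_before": 0.0, "matches_interest": 1.0},
    "ad_books": {"clicked_similar_before": 0.0, "matches_interest": 0.0},
}

# Rank candidates from most to least likely to be clicked.
ranked = sorted(
    candidates,
    key=lambda ad: predict_click_probability(candidates[ad]),
    reverse=True,
)
print(ranked)
```

Every feature in this sketch is built from past behavior, so the ranking reflects historical correlation between those traces and clicks rather than a tested causal effect.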

The causal link behind these predictions and recommendations lies mostly in the traces users have left on the Internet in the past. Facer is Facebook’s face detection and recognition framework. Given an image, it first finds all the faces in it. It then runs a user-specific face recognition algorithm to determine whether each face is likely to belong to one of the user’s friends. Facebook uses this service to suggest friends that users may want to tag in their photos.

Finally, Google applies machine learning to vision and image processing, language processing, search engine ranking, voice recognition, and search prediction. It also offers cloud AI services to Google Cloud customers, letting them add machine learning to their applications for image search and recognition, translation, and voice control.

Written by Xinying Lai

Edited by Jimei Shen, Tianyi Li, Calvin Ma & Alexander Fleiss
