Building Artificial Intelligence & Machine Learning
The very phrase conjures up images of advanced robotics: machines smarter than humans, machines capable of making their own decisions. For decades filmmakers have created renditions of artificial intelligence ranging from the benign, such as Rosie the robot maid in The Jetsons, to the malignant, such as HAL in 2001: A Space Odyssey.
It is no wonder, then, that at every meeting with potential investors the question pops up: “What is Artificial Intelligence?” Unfortunately, outside of popular culture, artificial intelligence means something far more prosaic. It does not necessarily mean that the machine can truly think for itself, or start answering philosophical questions. It is simply “the science of making computers do activities that require intelligence when done by humans.”
For decades, computer scientists and mathematicians have worked to increase the scope and range of the intelligent activities a computer can handle.
When the computer first came about, it was little more than a simple calculator, a box of gears that could perform simple arithmetic. However, with the advent of solid-state memory in the 1950s and 1960s, computers could start to “remember” previous calculations and build ever more complicated structures. The field of artificial intelligence really came into its own in the 1980s and 1990s, as computers became cheap enough and fast enough to crunch huge amounts of data.
There are two main approaches towards creating an intelligent program.
The first approach to creating Artificial Intelligence was rule-based. A programmer or scientist would code into the computer a set of rules or statements which determined what the computer did on various inputs.
Grammar-checking software is a good example. One might argue that proofreading a paper is an activity that requires a modicum of intelligence, which makes grammar-checking software a simple form of AI. Most grammar checkers work by programming into the computer the rules of English grammar: what types of words can follow intransitive and transitive verbs, where an adjective is placed in relation to the noun, and so on. From this set of rules, the program can take in a sentence, a paragraph, or a whole paper and accurately detect where the mistakes are. Certain semantic parsers, programs that try to “understand” what a sentence means, use a similar rule-based approach to break down an English sentence.
Another example is almost every chess-playing program. The computer is given a set of rules for judging how good or bad a board position is. It then searches through the possible moves in a position and applies that set of rules to each resulting position to determine the best move to make.
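The rule-based evaluation and search just described can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the code of any real chess program: the only hand-coded rule is the conventional material count (pawn 1, knight and bishop 3, rook 5, queen 9), positions are written in standard FEN notation, and the candidate moves with their resulting positions are entered by hand rather than generated. Real engines use far richer rules and search many moves deep (for example with minimax), not just one.

```python
# Hand-coded "expert rule": conventional material values (king excluded).
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}


def evaluate(fen: str) -> int:
    """Score a position from White's point of view using only material balance."""
    placement = fen.split()[0]  # first FEN field: piece placement, rank by rank
    score = 0
    for ch in placement:
        value = PIECE_VALUES.get(ch.lower())
        if value is None:  # digits (empty squares) and '/' (rank separators)
            continue
        score += value if ch.isupper() else -value  # uppercase pieces are White's
    return score


def best_move(candidates: dict[str, str]) -> str:
    """One-ply search: apply the rule to each resulting position, keep the best."""
    return max(candidates, key=lambda move: evaluate(candidates[move]))


# Hypothetical position: a White rook on d2 faces an undefended black queen on d5.
start = "4k3/8/8/3q4/8/8/3R4/4K3 w - - 0 1"
candidates = {  # move -> resulting position, entered by hand for this example
    "Rxd5": "4k3/8/8/3R4/8/8/8/4K3 b - - 0 1",
    "Kf1": "4k3/8/8/3q4/8/8/3R4/5K2 b - - 0 1",
    "Rd1": "4k3/8/8/3q4/8/8/8/3RK3 b - - 0 1",
}
print(evaluate(start))        # -4: Black is a queen for a rook ahead
print(best_move(candidates))  # Rxd5
```

Everything the program "knows" lives in `PIECE_VALUES`; changing how positions are judged means a human rewriting that table, which is exactly the limitation discussed next.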
This approach, while sometimes extremely effective, has many drawbacks.
Disadvantages of Rule-Based Algorithms
First of all, it requires the rules to be written, which means that the programmer needs expert knowledge of the subject matter in order to create the rule set. In addition, one must be able to quantify this expert knowledge. In chess, for example, when grandmasters look at the board and decide whether their position is good or not, they rely heavily on intuition. It is not easy for them to break down and properly weight the individual aspects of the position, which further complicates the creation of the rules.
Also, the rules must remain valid both now and in the future. Rule-based semantic parsers are actually not widely used in the field because, for the most part, people do not use proper English grammar! And even when the grammar rules are followed, they do not always lead to a unique interpretation.
Difficult to adapt Rule-based algorithms to other purposes
In addition, this approach to creating artificial intelligence is very cumbersome. For each new problem and each new behaviour, a complicated set of rules needs to be devised and implemented. And if the underlying parameters of the problem change, for example when new common usages are adopted into the rules of English grammar, the program needs to be rewritten. Programs written solely with this methodology therefore tend to be narrow in scope and to need constant refinement.
Machine Learning Approach
The second approach scientists have taken towards artificial intelligence is the machine learning approach. Machine learning is a process in which a program is given a corpus of data, such as historical stock information and returns, and a task or set of tasks, such as predicting the returns of stocks in the future. The learning algorithm is considered successful if its ability to complete each task improves as its corpus of data, called its training data, grows.
How machine learning works
More formally, every machine learning algorithm depends on three things, each of which must be specified in a form a program can use. First, there must be an experience set, sometimes called a training set: the data the algorithm will “learn” from. Next, there must be a task, some action we are trying to make the machine perform, such as playing a game of chess, predicting the outcome of a game, or predicting a stock's return. Finally, there must be a performance measure, some way for the algorithm to tell which of two ways of completing the task is better. In general, a machine learning algorithm attempts to find its own rules and methods in order to optimize its performance measure.
Linear Regression example of machine learning
Least-squares regression can be thought of as a very limited learning algorithm. The training set consists of a number of (x, y) data pairs, the task is to predict the y value, and the performance measure is the sum of the squared differences between the predicted and actual y values. Of course, generally speaking, we wouldn't choose exactly this performance measure, since most often we want to measure performance not on the data in our training set but on future data.
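To make the three ingredients concrete, here is a minimal least-squares sketch in plain Python; the data points are invented for illustration. The training pairs are the experience, predicting y from x is the task, and the sum of squared errors is the performance measure. The last line scores the fitted line on two held-out “future” points, which is the kind of measurement we usually care about.

```python
def fit_least_squares(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared errors on (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def sum_squared_error(model, xs, ys):
    """Performance measure: sum of squared differences between predicted and actual y."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys))


# Experience (training set): invented (x, y) pairs.
train_x, train_y = [1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.0]
# Held-out "future" data the model never saw while fitting.
future_x, future_y = [6, 7], [12.1, 13.8]

model = fit_least_squares(train_x, train_y)
print("slope, intercept:", [round(v, 2) for v in model])                       # [1.97, 0.09]
print("training SSE:", round(sum_squared_error(model, train_x, train_y), 3))   # 0.091
print("future SSE:", round(sum_squared_error(model, future_x, future_y), 4))
```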
Uses mathematical optimization techniques
Machine learning really came into vogue in the 1990s, as machines became faster and improved mathematical optimization techniques were developed and refined. These algorithms have proven very successful and have often outperformed the straight rules-based approach.
Deep Blue Example
As a famous example, the chess program Deep Blue, which challenged then-reigning world champion Garry Kasparov in 1996 and went on to beat him in 1997, used machine learning to play the game. Instead of simply being handed a set of rules for valuing each board position, the program was given a large set of board positions that had been evaluated by a group of masters. The masters did not assign a numerical value to each board; they merely indicated whether the position gave an advantage to either side, or whether the position was equal. It was then up to the program to decide how to weight the different factors on the board so as to match the masters' evaluations as closely as possible. The result, a computer beating a man widely considered one of the best chess players of all time, was impressive to say the least.
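The weight-fitting step can be illustrated with ordinary least squares. This is a hypothetical sketch, not Deep Blue's actual procedure: we invent three board factors (material, mobility, and king-safety differences, White minus Black) and encode each master's verdict as +1 (White better), 0 (equal), or -1 (Black better). The program then picks the factor weights whose scores match those verdicts as closely as possible in the squared-error sense. The made-up verdicts happen to be perfectly consistent with weights of 1, 0.5 and 0.25, so the fit recovers them exactly; with real, noisy expert labels, least squares would instead find the best compromise.

```python
import numpy as np

# Hypothetical factor values for eight positions:
# columns = [material diff, mobility diff, king-safety diff] (White minus Black).
factors = np.array([
    [1, 0, 0],
    [-1, 0, 0],
    [0, 2, 0],
    [0, 0, 4],
    [0, -2, 0],
    [1, -2, 0],
    [0, 1, -2],
    [0, 0, -4],
], dtype=float)
# Masters' verdicts: +1 White better, 0 equal, -1 Black better.
verdicts = np.array([1, -1, 1, 1, -1, 0, 0, -1], dtype=float)

# Choose the weights minimizing the squared gap between scores and verdicts.
weights, *_ = np.linalg.lstsq(factors, verdicts, rcond=None)
scores = factors @ weights

print("learned weights:", np.round(weights, 3))
print("scores match verdicts:", np.allclose(scores, verdicts))  # True
```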
Advantages to Machine Learning
The machine learning approach has several advantages over its rule-based counterpart.
Expert knowledge in the subject is not necessary
First, expert knowledge is less necessary. Because machine learning algorithms learn their own rules and methods for solving the problem, they can create smarter and better rules than the programmer could.
Algorithm can be “smarter” than the person
Secondly, machine learning algorithms can build on and use human intuition in finding solutions, rather than forcing the human programmer to break down his own intuitive decisions. Deep Blue is the perfect example of this advantage: instead of forcing chess experts to find and properly weight the factors in a position, the experts were allowed to do what they do best, deciding whether a board position is good or not. The actual weighting of the factors was left up to the machine.
Adaptability of many algorithms
Lastly, good machine learning algorithms can be used for many purposes and do not need to be maintained as diligently as rule-based systems. As their body of experience grows, learning algorithms can modify their rules to take the new reality into account.
Because of these advantages, machine learning has started to take over from rules-based systems, although a fusion of the two is often used to tackle real-world problems.
Advantages of AI
Artificial Intelligence solutions can offer many advantages over human expert decision-making.
First, compared to their human counterparts, AI programs are capable of incredible mathematical precision. Computers have an enormous capacity for crunching numbers. Even if everybody on Earth got together and started to add and subtract numbers, they wouldn't come close to the number-crunching ability of a $1,000 consumer computer, let alone a fast supercomputer.
Perfect recall and vast memory abilities
Computers can store enormous amounts of data and recall any of it exactly. Humans, in contrast, are pretty bad at remembering specific facts, and decisions are often justified with vague or half-remembered truths.
Algorithms are reproducible and dispassionate
Artificial intelligence algorithms are dispassionate and are not saddled with preconceived notions. Given the same data and settings, they reach the same decisions every time, and they can only make rules based on what is actually found in the training data. This can sometimes be a problem for the AI, when the biases and perceptions a human would have are accurate but not reflected in the data. More often, though, it is the human decision-makers who carry a lot of baggage with them. When things are going well, humans often ignore or trivialize potential negative outcomes; conversely, when things are going poorly, they often ignore or trivialize potential positive outcomes.
Disadvantages of AI programs
Artificial Intelligence programs are saddled with their own drawbacks, however.
Framing the Problem
Because machine learning algorithms depend on mathematical optimization, it can be difficult to translate the question you want answered, or the behaviour you want, into a proper mathematical form. Often the problems that the machine learner “solves” are only approximations of what you actually want to solve. In finance, for example, mean-variance optimization has become the gold standard for deciding how to construct a portfolio. Yet even making the inaccurate assumptions that people have exponential utility curves and that returns are normally distributed, one would only be led to the belief that the Sharpe ratio (the mean return divided by its standard deviation) should be optimized, not mean-variance. Mean-variance is used simply because it is computationally and mathematically easier to solve.
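For reference, the two objectives mentioned above can be written in standard textbook form. The notation is ours, not the article's: w is the vector of portfolio weights (summing to 1), μ the expected returns, Σ their covariance matrix, λ > 0 a risk-aversion parameter, and r_f the risk-free rate.

```latex
\begin{aligned}
\text{Mean-variance:}\quad & \max_{w}\; w^{\top}\mu \;-\; \frac{\lambda}{2}\, w^{\top}\Sigma\, w \\
\text{Sharpe ratio:}\quad  & \max_{w}\; \frac{w^{\top}\mu - r_f}{\sqrt{w^{\top}\Sigma\, w}}
\end{aligned}
```

Because Σ is positive semidefinite, the mean-variance objective is a concave quadratic in w, so with linear constraints it is a standard quadratic program. The Sharpe ratio is a ratio of return to risk and does not fit that template without reformulation, which is one concrete sense in which the easier problem tends to be the one that gets solved.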
Cannot create important factors from a set of rules
It is also currently very difficult for an AI to come up with its own problem to solve, or its own factors. Chess programs, even Deep Blue, need the factors they use to make decisions to be fed to them by the programmers. This is somewhat disappointing, since in a very meaningful sense everything about a game is known. As a game, chess has very clear and explicit win conditions and a fixed methodology for moving pieces, and there is only a finite number of different board positions. (Blondie24) Even so, attempts to create machine learning algorithms that abstract meaningful factors from the rules have not been terribly successful.
Dependent on programmer to provide it data
The learning algorithm cannot learn what you do not give it; in that sense it is a black box that can only work with the inputs it is handed. Thus the programmer needs to be capable of determining which factors are important and relevant. If he misses a factor, the learning algorithm may perform poorly, or not at all. One way around this is the “kitchen sink” method, in which every conceivable piece of data is given to the machine learning algorithm. This does eliminate the possibility that the programmer leaves out a crucial piece of information; however, throwing everything at the problem, including the kitchen sink, leads to...
Over-fitting the Data
Overfitting the data. Machine learning algorithms are so powerful and so exact that, if one is not careful in designing the learner, one can come up with nonsense. Overfitting is essentially finding patterns and signals that happened to hold in the training set but will not hold in data outside it. So when the algorithm is actually used, it will make poor decisions.
Types of Overfitting
There are two general ways in which an artificial intelligence system may over-fit the data.
The first and most common way to overfit is through spurious correlations in the data. Spurious correlations occur when, by random chance, a factor given to the AI appears highly correlated with the output. Since this correlation arose by chance, however, it will not be present in data outside the training set.
Bangladeshi Butter Production
An example that has been passed around for a while now is Bangladeshi butter production. In a study performed in 1995, a Caltech professor took hundreds of data series published by the UN and its member countries and searched for the series that would have been the best predictor of the S&P 500. He found that over the period 1983-1993, butter production in Bangladesh had the highest correlation, above 0.85. Of course this was a nonsense result, and in the years both before 1983 and after 1993 the correlation was non-existent.
Garbage In Garbage Out
The more factors that are provided to a learning algorithm, the greater the chance that it will learn something that is true only over its training set and will not be true in the future.
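The Bangladeshi butter effect is easy to reproduce with pure noise. In this sketch (simulated data, NumPy assumed available), 1,000 random “factors” and a random “target” are generated independently, so every correlation between them is spurious by construction. Screening the factors on a 20-period training sample still turns up some that look impressively predictive, but the same factors show much weaker correlations on a fresh 20-period sample.

```python
import numpy as np

rng = np.random.default_rng(42)
n_train, n_test, n_factors = 20, 20, 1000

# Simulated data: every factor AND the target are independent noise,
# so any correlation between them is spurious by construction.
X_train = rng.standard_normal((n_train, n_factors))
y_train = rng.standard_normal(n_train)
X_test = rng.standard_normal((n_test, n_factors))
y_test = rng.standard_normal(n_test)


def correlations(X, y):
    """Pearson correlation of each column of X with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))


in_sample = correlations(X_train, y_train)
top = np.argsort(-np.abs(in_sample))[:10]  # the 10 "best predictors" in training
out_of_sample = correlations(X_test[:, top], y_test)

print("best in-sample |r|:", round(float(np.abs(in_sample[top[0]])), 2))
print("mean |r| of top 10, training:", round(float(np.abs(in_sample[top]).mean()), 2))
print("mean |r| of top 10, new data:", round(float(np.abs(out_of_sample).mean()), 2))
```

Adding more factors only widens the search, so the best in-sample correlation tends to climb while the out-of-sample result stays near zero.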
The second common way of overfitting is creating a model that is more complicated than the signal it is trying to capture. This problem is not caused by having too many factors, but by allowing the algorithm too much leeway in deciding how to relate those factors to the output.
Of course, you do not want to curtail the complexity of your model arbitrarily and harm its ability to make intelligent choices, so models are often built to be extremely complex. Overfitting can then be avoided with an appropriate choice of performance measure.
Age vs. Height Example
[Figure: scatterplot of a child's age (x-axis) against his height (y-axis), shown with a least-squares regression line, and again with a curve from a learning algorithm given too much leeway, which fits the points “better” but is nonsensical.]
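The age-versus-height picture can be reproduced numerically. This sketch uses invented data (NumPy assumed available): heights in centimetres that follow a straight line in age, plus a fixed ±2 cm “measurement error” that alternates in sign so the result is deterministic. A straight line and a degree-10 polynomial are both fitted by least squares to the 11 training points. The polynomial has enough leeway to pass through every point, so its training error is essentially zero, yet its prediction for a 14-year-old is wildly off, while the straight line's is close.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Invented training data: ages 2-12 and heights (cm) on a straight line,
# plus a fixed +/-2 cm "measurement error" that alternates in sign.
ages = np.arange(2, 13, dtype=float)
noise = 2.0 * (-1.0) ** np.arange(ages.size)


def true_height(age):
    """The underlying (hypothetical) relationship the learner should find."""
    return 80.0 + 6.0 * age


heights = true_height(ages) + noise


def fit_and_score(degree):
    """Least-squares polynomial fit; returns (training MSE, error at age 14)."""
    model = Polynomial.fit(ages, heights, degree)  # rescales ages internally
    train_mse = float(np.mean((model(ages) - heights) ** 2))
    error_at_14 = float(model(14.0) - true_height(14.0))  # outside the training range
    return train_mse, error_at_14


for degree in (1, 10):
    mse, err = fit_and_score(degree)
    print(f"degree {degree:2d}: training MSE {mse:7.3f}, error at age 14 {err:10.1f} cm")
```

Judged only by training error, the degree-10 curve is the “better” fit; judged on a child it has not seen, it is nonsense.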
So, to sum up:
“Machine Learning Algorithms are like an unstable explosive. When handled correctly, they can move mountains, but when mishandled they are more likely to blow up in your face.”
When to use AI
We can see from these examples that artificial intelligence and machine learning have become very powerful decision-making tools. Machine learning algorithms really start to shine when the problem offers a large amount of data to learn from. These algorithms have shown a tremendous ability to take large data sets and accurately combine the different factors to create predictions, often better than even a team of experts. So the question you should be asking yourself is: why aren't AI investment strategies used more often in investing?
After all, the financial markets offer a bewildering array of information, from tick-by-tick prices and option-chain prices to fundamental information about companies and securities, and all this data goes back decades. In addition, the markets seem complicated enough that it seems unlikely that humans are really taking into account all the relevant information when they make decisions. This should be an area where machine learning algorithms really shine. Yet to date, only a few traders outside high-frequency trading employ learning algorithms to aid their investment process.
Challenges of AI Investing
It turns out that when one tries to apply machine learning to longer-term investments, a number of challenges arise.
System Constantly Evolving
First, the financial system is constantly changing and evolving, and it does not necessarily change smoothly.
Market Conditions Constantly Changing
Stock volatilities can shift rapidly from one day to the next. So if you are trying to estimate how much a stock will go up using historical data, you have to take into account that the past market you are learning from is not the same as the current one. It is similar, but certainly not the same.
Meanings of Factors changes over time
It is even hard to find factors to base decisions on that are not changing. Apple is not the same company today that it was a year ago, and certainly not the same company it was 10 years ago. It used to be that each dollar of revenue and each dollar of profit Apple made came from the sale of computers. Now, of course, the majority of its revenue and profit comes from sales of phones and music players, and from the somewhat new industry of tablet computers. So the individual risk factors, and the way we should value Apple's revenue, have shifted dramatically over the past decade. This is not really an issue, or at least not a large one, for shorter-term high-frequency traders: when predicting only a couple of days or minutes ahead, you don't have to learn from data from 10 years ago, so you don't have to worry as much about the distributions of your factors changing. For longer-term investment horizons, these factor drifts need to be taken into account.
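One common, generic way to cope with this kind of drift (not necessarily the approach described later in this article) is to weight recent observations more heavily than old ones. The sketch below uses invented numbers: a factor pays off +1 for 100 periods, then its meaning changes and it pays -1 for the next 20. A plain average over the full history still says the factor is strongly positive, while an exponentially weighted moving average with decay 0.9 (an arbitrary choice here) has already turned clearly negative.

```python
def full_history_mean(values):
    """Treats every past observation as equally relevant."""
    return sum(values) / len(values)


def ewma(values, decay=0.9):
    """Exponentially weighted moving average: older data counts for less."""
    estimate = values[0]
    for v in values[1:]:
        estimate = decay * estimate + (1 - decay) * v
    return estimate


# Invented factor payoffs: the relationship flips after period 100.
payoffs = [1.0] * 100 + [-1.0] * 20

print(round(full_history_mean(payoffs), 3))  # 0.667
print(round(ewma(payoffs), 3))               # -0.757
```

The trade-off is that a faster-forgetting estimate also reacts more to noise, which matters given the next challenge.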
High Noise to Signal Ratio
Another impediment to applying machine learning algorithms to long-term investing is that as the holding period increases, the volatility and noise also increase. More noise means a greater danger of overfitting the data set you are learning from. What you want to learn is a pattern or signal that will stay true in the future. But when there is a lot of noise in the data, it becomes easy to start “learning the noise”: learning patterns or signals that will not hold in the future because they were just artifacts of the volatility in the system.
Very Complicated and inter-related Systems
To top it all off, financial markets and stock prices are incredibly complicated and chaotic systems. When making only short-term predictions, the markets can be fairly easy to model. You have a group of unknown buyers willing to buy the stock and a group of unknown sellers willing to sell it. You can gain hints and clues about the numbers of buyers and sellers, and the prices at which they are willing to buy and sell, from the order books of that stock and of other stocks, and from that information one can try to find a way to buy low and sell high. When you start holding stocks for days, weeks, months, or even years, the system becomes much more complicated. Suddenly you have to take into account the appreciation or depreciation of the dollar, rising commodity prices, surpluses in natural gas inventories, Greece defaulting on its debt, US GDP figures, China's trade figures, and the list goes on. With so many factors influencing the direction of a stock's price, it can be tempting to keep giving the learning algorithm more and more of them. However, especially given the high level of noise in the system, adding more and more factors can easily lead to the Bangladeshi butter problem. It is all too easy to find spurious correlations in the data set that can trip up even the best machine learning algorithms.
So these are the challenges that long-term investing poses for machine learning algorithms, and the reason more people are not applying machine learning to make investment decisions. Simply going on the web, downloading a neural network or support vector machine software package, handing it a bunch of data, and sitting back to wait for results is not going to work.
AI Investing in Practice
That is not to say, of course, that the challenges are insurmountable. We at Rebellion Research have spent many years working on and devising an algorithm that has shown itself capable of surmounting these hurdles. To deal with the problems that longer-term investment horizons create, we had to make a few adjustments to the learning algorithm.
Large Set of Factors that Correspond to Investment Styles
First and foremost, we came up with a large set of factors to feed the machine. These factors were selected to span the breadth and depth of investment strategies. By this I mean that our factors can be combined to recreate almost every investing style: from a deep value style that cares mostly about the value of the assets a company holds, to a value investor more interested in steady and conservative cash flows, to a growth investor who looks for steady and continued growth from a market leader, to a growth investor looking for the next upstart company, a “game-changer” that will revolutionize its industry. The investment styles encompass most styles of investing, including value/growth, momentum/contrarian, and macro styles.
So we reposition the learning problem from finding out which stocks are likely to outperform, and by how much, to finding out which styles of investing are likely to perform well in the future.
Create Stable Factors
By looking at the problem in this slightly different way, we are able to solve the problem of the meanings of our factors changing across our historical data set. The sources of a company's revenue will change. The value of a dollar of earnings for a company is not constant over time; it depends on how that dollar was made and how likely the company is to keep earning it. Investment styles, however, although they have grown and become more nuanced with time, are relatively constant. A deep-value style concerned with making sure a company's liquidation value is greater than its market cap will make the necessary adjustments when a company's assets start to deteriorate (even while they stay the same on the balance sheet). This allows us to use over a decade's worth of historical data without it losing relevance to the styles of today.
Factors contain lots of relevant information
Unfortunately, using these factors has a drawback. To span the space of investment styles, which is a huge space, you need a lot of factors. In fact, we use several thousand factors to make our investment decisions. Not all of these factors are incorporated for each stock, only about 30-40 of them are, but the presence of so many factors still creates a much greater chance of overfitting.
Correcting For Over-fitting
For example, imagine a naïve way of using these factors to build an investment portfolio. One could look at how the factors have performed historically and identify the investment style that has performed best over the past decade. Then, having identified that style, one could simply buy all the stocks that correspond to it, sit back, and watch the alpha accrue. Unfortunately, this approach is almost certainly doomed to failure. The risk that the single best investment style is just the one that happened to work well over the past decade, and will not work in the future, is simply too high. With so many factors, some are always bound to float to or near the top of the list purely by random chance.
Modified Bayesian Learner
That brings me to the second modification we made, which was to use a modified Bayesian learner. Bayesian learners are well-known and very useful algorithms for machine learning.
Bayesian Learners Update probabilities given new information
Bayesian learners get their name because they rely on what is known as Bayes' theorem in probability, which dictates how the probabilities of events should be updated given new pieces of knowledge. What is particularly appealing is that the programmer can explicitly model and control how readily the learning algorithm learns from the data.
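As a worked illustration of Bayes' theorem, with entirely hypothetical numbers: suppose an investment style either “works” (it beats the market in any given year with probability 0.6) or is noise (probability 0.5), and we observe it beating the market in 8 of 10 years. The same evidence leads to very different conclusions depending on the prior probability that the style works, which is one lever the programmer controls.

```python
from math import comb


def binomial_likelihood(wins, trials, p):
    """Probability of exactly `wins` successes in `trials` independent years."""
    return comb(trials, wins) * p**wins * (1 - p) ** (trials - wins)


def posterior_works(prior, wins, trials, p_works=0.6, p_noise=0.5):
    """Bayes' theorem: P(works | data) = P(data | works) P(works) / P(data)."""
    like_works = binomial_likelihood(wins, trials, p_works)
    like_noise = binomial_likelihood(wins, trials, p_noise)
    evidence = like_works * prior + like_noise * (1 - prior)
    return like_works * prior / evidence


print(round(posterior_works(prior=0.5, wins=8, trials=10), 3))  # 0.733
print(round(posterior_works(prior=0.1, wins=8, trials=10), 3))  # 0.234
```

With a skeptical prior of 10%, eight good years out of ten only raise the probability that the style works to about 23%.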
Rate of Learning can be controlled
That is, the programmer can set the burden of proof that the data in the training set needs to overcome, and how precise the eventual predictions can be. By controlling the precision of the algorithm, we can make sure that a single strategy does not completely dominate all the others.
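The modified Bayesian learner itself is not described in this article, so the following is only a sketch of one standard way to get this kind of control: a tempered (or “power”) posterior, in which each style's likelihood is raised to a learning rate η between 0 and 1 before applying Bayes' rule. With η = 1 the update is ordinary Bayes and the style with the most evidence quickly dominates; a smaller η demands more evidence before any one style can pull ahead. The log-likelihood values are made up for five hypothetical styles.

```python
import numpy as np


def tempered_posterior(prior, log_likelihoods, eta):
    """Posterior weights proportional to prior * likelihood**eta (computed in log space)."""
    log_w = np.log(prior) + eta * np.asarray(log_likelihoods, dtype=float)
    log_w -= log_w.max()  # guard against overflow before exponentiating
    w = np.exp(log_w)
    return w / w.sum()


prior = np.full(5, 1 / 5)                      # no initial preference among styles
log_likelihoods = [12.0, 10.5, 9.0, 3.0, 0.0]  # hypothetical evidence per style

for eta in (1.0, 0.1):
    weights = tempered_posterior(prior, log_likelihoods, eta)
    print(f"eta={eta}: weights {np.round(weights, 3)}, largest {weights.max():.2f}")
```

With these numbers the leading style receives roughly 79% of the weight at η = 1 but only about 30% at η = 0.1, leaving the others with a meaningful say.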
Algorithm does not become too precise
Therefore, our investment decision on any single stock is not dominated by one particular strategy but is influenced by about 30-40 of them. The number of strategies influencing our portfolio as a whole is much larger.
Overcoming the Bangladeshi Butter Problem
This helps to mitigate the risk of spurious correlations. To be sure, many of the investment styles we hold in a positive light will turn out to be the result of spurious correlations; they will not be good indicators of stock performance in the future. However, by relying on many different investment styles to make our investment decisions, we can make it very likely that at least some of our factors are meaningful, so the fate of the fund does not depend on any one investment style continuing to perform in the future.
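A back-of-the-envelope calculation shows why spreading decisions across many styles helps. Purely as an assumption for illustration, suppose each style relied on has an independent 20% chance of reflecting a genuine effect, with the rest being spurious. The chance that none of k such styles is genuine is then 0.8 to the power k, which shrinks quickly as k grows. Real styles are correlated with one another, so the true figure would be less favourable than this idealized one.

```python
def prob_all_spurious(k, p_genuine=0.2):
    """Chance that none of k independent styles is genuine (independence assumed)."""
    return (1 - p_genuine) ** k


for k in (1, 5, 30):
    print(k, round(prob_all_spurious(k), 5))
# 1 0.8
# 5 0.32768
# 30 0.00124
```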
A corollary is that using this modified Bayesian learner to make predictions about our stocks makes us more resilient to changes in the future performance of investment styles. As we know, one investment style can be superior to another for very long periods of time. Throughout the 1990s, growth strategies significantly outperformed value strategies, and after the dot-com crash value suddenly reigned supreme again. So even if an investment style tends to do well over the long term, it can certainly suffer prolonged periods of lackluster performance. Because our investment decisions rely on such a variety of investment styles, the effect of a couple of styles suddenly switching their performance characteristics is mitigated.
So, to wrap things up, we can see that long-term investment horizons pose more hurdles to applying artificial intelligence than many other fields do. However, with a certain amount of foresight and careful design of the learner, these challenges can be surmounted, and I fully expect to see machine learning algorithms play a larger part in actually investing in the stock market, as opposed to just trading in it.
Written by Jeremy Newton, Edited by Alexander Fleiss