Quantitative Trading Based on Machine Learning

In recent years, many machine learning algorithms have been successfully applied to quantitative trading. However, most of them require parameters and features to be selected manually according to market conditions, which limits model performance in complex market situations. We plan to build an unsupervised machine learning model that uses clustering and dimensionality reduction algorithms to select stocks and extract features, together with reinforcement learning methods that adjust the parameters of a CNN, so that the model can adapt to changes in different markets, retrain automatically, and generate trading strategies.

1 Problem statement 

Our group plans to design an automated portfolio generation system based on machine learning algorithms. The project includes comparing the performance of various classification and time series algorithms on financial market data, as well as a trading strategy that is executed automatically based on the current state of the market.

2 Significance 

Most current trading algorithms rely on manually selected factors and on parameters that are adjusted by hand to market conditions. Inspired by the flexibility and feature learning capability of CNNs, we organize financial data as two-dimensional "data pictures" that are fed into a CNN, supplemented by new features extracted by unsupervised algorithms, and design a network structure suited to financial markets. The advantage of this model is that it can predict stock prices without manual intervention and adjust its parameters automatically according to market conditions, thereby reducing information loss and improving prediction accuracy in quantitative trading.

3 Related Work 

Many existing machine learning methods are being applied in quantitative trading. Linear regression is probably the most widely used method for discovering strong empirical regularities in large amounts of data, and it can be applied to predicting excess returns on large stocks. Let $X$ denote a (row) vector of $p$ predictive variables, $Y$ the return on the asset, $\alpha$ a column vector of coefficients, and $\epsilon$ a normally distributed random error term with zero mean. The model of asset returns can then be written as $Y = X\alpha + \epsilon$ [6].

The K-nearest neighbor (KNN) algorithm can also be used to predict stock prices, and its predictions can be very close to, and almost parallel with, the actual prices. In classification problems, KNN compares a test set with the training data using similarity metrics. According to the reported results, it is robust, with a very small error ratio, and its results are rational and reasonable. Boosting can be used to forecast stock returns as well. Overfitting occurs when a model picks up noise instead of signal, and avoiding it is especially important when forecasting stock returns because of their low signal-to-noise ratio. As the number of boosting iterations keeps increasing, the errors always decrease and eventually become negligible [8].
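As a concrete illustration of the linear model $Y = X\alpha + \epsilon$, the following minimal sketch estimates $\alpha$ by ordinary least squares with NumPy. Stacking $T$ observations of the row vector $X$ gives a $T \times p$ matrix. The predictors, coefficients, and returns below are synthetic and serve only to show the mechanics; they are not market data or results.

```python
import numpy as np

def fit_linear_returns(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Estimate alpha in y = X @ alpha + eps by ordinary least squares.

    X: (T, p) matrix, one row of p predictive variables per observation.
    y: (T,) vector of asset returns.
    """
    alpha, *_ = np.linalg.lstsq(X, y, rcond=None)
    return alpha

# Synthetic example (illustrative only): p = 3 predictors, T = 500 observations.
rng = np.random.default_rng(0)
T, p = 500, 3
X = rng.normal(size=(T, p))
true_alpha = np.array([0.5, -0.2, 0.1])
eps = rng.normal(scale=0.01, size=T)   # zero-mean normal error term
y = X @ true_alpha + eps

alpha_hat = fit_linear_returns(X, y)
print(np.round(alpha_hat, 2))
```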

Among machine learning models, the combination of genetic programming and random forests performs well in stock price prediction, and it inspired the modifications to the original AlphaNet. Its essence is to apply a genetic algorithm to the parameter tuning of the random forest, using the strong search ability and flexibility of the genetic algorithm to optimize the random forest's parameters. The method makes the model more interpretable, and it can be embedded directly in the feature extraction layer of the neural network in our method, covering all operation functions used in genetic programming. However, its disadvantage is that the modeling process is complex and requires more manual intervention: when the stock pool, forecast period, or data frequency changes, the model must be retrained. Our proposed method, which is an end-to-end model, can overcome this shortcoming.
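The idea of tuning random forest parameters with a genetic algorithm can be sketched as follows. This is a minimal, hypothetical illustration: the search space (`n_estimators`, `max_depth`), population size, elitism, uniform crossover, and mutation rate are all assumptions, as is the `fitness` function. In practice `fitness` would train a random forest with the candidate parameters and return a validation score; here a toy function stands in so the sketch runs without a machine learning library.

```python
import random
from typing import Callable, Dict, List

# Hypothetical search space for two random forest hyperparameters.
SPACE = {"n_estimators": list(range(50, 501, 50)), "max_depth": list(range(2, 21))}
Params = Dict[str, int]

def random_params(rng: random.Random) -> Params:
    return {k: rng.choice(v) for k, v in SPACE.items()}

def crossover(a: Params, b: Params, rng: random.Random) -> Params:
    # Uniform crossover: each gene is inherited from either parent.
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in SPACE}

def mutate(p: Params, rng: random.Random, rate: float = 0.2) -> Params:
    # With probability `rate`, resample a gene from the search space.
    return {k: (rng.choice(SPACE[k]) if rng.random() < rate else v) for k, v in p.items()}

def genetic_search(fitness: Callable[[Params], float], pop_size: int = 20,
                   generations: int = 30, elite: int = 4, seed: int = 0) -> Params:
    rng = random.Random(seed)
    population: List[Params] = [random_params(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Rank by fitness (higher is better); the elite survive unchanged.
        population.sort(key=fitness, reverse=True)
        parents = population[:elite]
        children = []
        while len(children) < pop_size - elite:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=fitness)

# Toy stand-in for "validation score of a random forest with these parameters".
def toy_fitness(p: Params) -> float:
    return -abs(p["n_estimators"] - 300) / 50 - abs(p["max_depth"] - 8)

best = genetic_search(toy_fitness)
print(best)
```

Because the elite are carried over unchanged, the best fitness in the population never decreases from one generation to the next.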

4 Model Framework 
4.1 Data Preparation 

We will choose the S&P 100 as our initial stock pool and download daily adjusted closing prices for all 100 stocks from 2020-01-01 to 2022-01-01. We will use the first half of the data (2020-2021) as training data for model prediction and the second half (2021-2022) as testing data to evaluate model performance. We will also obtain the daily returns of the OEX (the S&P 100 index) from 2021-01-01 to 2022-01-01 as a market benchmark for comparison.
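The chronological split can be sketched with pandas, assuming the adjusted closing prices have already been downloaded into a DataFrame indexed by date with one column per ticker. The tickers `AAA` and `BBB` and the random-walk prices are placeholders, not real data. Splitting by date rather than at random keeps every test observation strictly after the training period, which avoids look-ahead bias.

```python
import numpy as np
import pandas as pd

def split_by_date(frame: pd.DataFrame, split_date: str = "2021-01-01"):
    """Chronological split: rows before split_date for training, the rest for testing."""
    cutoff = pd.Timestamp(split_date)
    return frame.loc[frame.index < cutoff], frame.loc[frame.index >= cutoff]

# Placeholder data: random-walk prices for two hypothetical tickers on business days.
dates = pd.bdate_range("2020-01-01", "2021-12-31")
rng = np.random.default_rng(0)
steps = rng.normal(0.0, 0.01, size=(len(dates), 2))
prices = pd.DataFrame(100 * np.exp(np.cumsum(steps, axis=0)),
                      index=dates, columns=["AAA", "BBB"])

train_prices, test_prices = split_by_date(prices)
returns = prices.pct_change().dropna()        # daily simple returns
train_returns, test_returns = split_by_date(returns)
print(len(train_prices), len(test_prices))
```

Computing returns on the full series before splitting keeps the first test-day return, which only uses the previous (training-period) close.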

4.2 Stock Pre-selection Process 

Because we want a smaller stock pool to save computing resources, we need to pick stocks from the S&P 100 according to certain principles. Using stock data from 2020-01-01 to 2022-01-01, we try the two approaches described below.

4.2.1 Based on diversity 

From the S&P 100 pool, we want to pick 30 stocks that represent all 100 stocks, based on their data from 2020-01-01 to 2022-01-01. We first apply K-means to cluster the 100 stocks into 10 groups and pick one stock from each group based on its profit from 2020-01-01 to 2022-01-01. We then remove these 10 stocks, apply K-means again to cluster the remaining 90 stocks into 10 groups, and again pick one stock from each group based on profit. Finally, we take 10 more stocks from the remaining 80 in the same way. The resulting 30 stocks form our new stock pool, which we expect to represent the original 100 stocks while preserving diversity.
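The three-round selection can be sketched as follows. The text does not specify which features are clustered, so this sketch assumes each stock is represented by its standardized daily-return series and that "profit" means cumulative return over the period, with the most profitable stock picked from each cluster. K-means is implemented directly in NumPy to keep the example self-contained; the return matrix in the usage lines is random noise, used only to show the call.

```python
import numpy as np

def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's K-means; returns a cluster label for each row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster became empty.
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def select_by_diversity(returns: np.ndarray, rounds: int = 3, k: int = 10,
                        seed: int = 0) -> list:
    """returns: (n_stocks, n_days) daily returns. Returns indices of selected stocks."""
    # Standardize each series so clustering reflects co-movement rather than scale.
    z = (returns - returns.mean(axis=1, keepdims=True)) / returns.std(axis=1, keepdims=True)
    profit = np.prod(1.0 + returns, axis=1) - 1.0   # cumulative return (assumed "profit")
    remaining = np.arange(len(returns))
    selected = []
    for r in range(rounds):
        labels = kmeans(z[remaining], k, seed=seed + r)
        for j in np.unique(labels):
            members = remaining[labels == j]
            selected.append(int(members[np.argmax(profit[members])]))
        remaining = np.setdiff1d(remaining, selected)   # drop picked stocks
    return selected

# Usage with random placeholder data: 100 stocks, about two years of trading days.
rng = np.random.default_rng(1)
daily_returns = rng.normal(0.0005, 0.02, size=(100, 504))
picked = select_by_diversity(daily_returns)
print(len(picked), picked[:5])
```

If a cluster ends up empty, that round contributes fewer than `k` stocks; a production version might re-seed empty clusters instead.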

4.2.2 Based on stability 

From the S&P 100 pool, we pick the 30 stocks with the highest Sortino ratio. The Sortino ratio is a modification of the Sharpe ratio that measures the risk-adjusted return of a stock. It is calculated as $S = \frac{R_p - r_f}{\sigma_d}$, where $R_p$ is the actual or expected portfolio return, $r_f$ is the risk-free rate, and $\sigma_d$ is the standard deviation of the downside (negative returns). A higher Sortino ratio means that the stock earns more return per unit of downside risk it takes on, which also indicates higher stability. These 30 stocks then form the default portfolio we start with.
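The stability-based pre-selection can be sketched in NumPy as follows. Two assumptions are made: $\sigma_d$ is computed as the downside deviation, i.e. the root mean square of returns falling below $r_f$ (one common definition; variants exist), and the ratios are left unannualized. Annualizing would multiply every stock's ratio by the same positive factor, so the ranking would not change. The return matrix in the usage lines is synthetic.

```python
import numpy as np

def sortino_ratio(returns: np.ndarray, rf: float = 0.0) -> np.ndarray:
    """Per-period Sortino ratio for each row of `returns` (n_stocks, n_days).

    rf is the per-period risk-free rate. The downside deviation is the root
    mean square of excess returns below zero (positive excess counts as zero).
    A stock with no returns below rf gets an infinite ratio.
    """
    excess = returns - rf
    downside = np.minimum(excess, 0.0)
    sigma_d = np.sqrt((downside ** 2).mean(axis=1))
    return excess.mean(axis=1) / sigma_d

def select_by_stability(returns: np.ndarray, n: int = 30, rf: float = 0.0) -> np.ndarray:
    """Indices of the n stocks with the highest Sortino ratio, best first."""
    s = sortino_ratio(returns, rf)
    return np.argsort(s)[::-1][:n]

# Synthetic usage: stocks with increasing mean daily returns and equal volatility.
rng = np.random.default_rng(2)
means = np.linspace(-0.002, 0.004, 100)
rets = rng.normal(means[:, None], 0.02, size=(100, 504))
top = select_by_stability(rets)
print(top[:10])
```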

4.3 Return Prediction 

The method we propose uses a model called AlphaNet to predict stock prices. AlphaNet was originally designed for automatic image matting and performs well on many image datasets. Inspired by its flexibility and feature learning capability, we organize financial data as two-dimensional "data pictures", supplemented with new features extracted by unsupervised algorithms, and feed them into AlphaNet. To apply AlphaNet to quantitative trading, previous research has made several modifications to the original network and achieved good performance on real datasets.

Figure 1 shows our modified AlphaNet structure for stock data. As in a CNN, the input of AlphaNet is organized as data images: each stock's data from t − n to t forms one data image, so every stock in the pool contributes one data image per time interval. In a data image, every column corresponds to a time step (e.g., a single day) and each row to a different kind of value: open (opening price), high (highest price), low (lowest price), close (closing price), volume, return1 (one-day yield), and two ratios, volume/low and low/high. More selected features can be added to the data images.

After preprocessing, the input is first processed by a feature extraction layer. In the original AlphaNet, this layer consists of two convolutional layers; in our modified version, they are replaced by a convolutional layer for feature extraction and a normalization layer to ensure nonlinearity. This replacement is inspired by research on applying genetic programming to quantitative trading, and Table 1 shows the definitions of these functions. Next, the extracted features are fed into two RNN or LSTM layers. The resulting features are then concatenated and passed to a fully connected layer, whose output is the ten-day yield of each stock, which is the label of each data image.

Figure 1: structure of modified AlphaNet
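A minimal NumPy sketch of how one data image and its label might be built. Assumptions: the raw data for a stock is an array of daily open, high, low, close, and volume; the image covers the `window` trading days ending at day t; and the ten-day yield label is the forward return close[t+10] / close[t] − 1. The function names and the synthetic price series are illustrative only.

```python
import numpy as np

FEATURES = ["open", "high", "low", "close", "volume", "return1", "volume/low", "low/high"]

def data_image(ohlcv: np.ndarray, t: int, window: int) -> np.ndarray:
    """Build one stock's data image for days t-window+1 .. t.

    ohlcv: (n_days, 5) array with columns open, high, low, close, volume.
    Returns an (8, window) array: rows follow FEATURES, columns are days.
    """
    if t - window < 0:
        raise ValueError("need one earlier day to compute return1")
    o, h, l, c, v = (ohlcv[t - window + 1 : t + 1, i] for i in range(5))
    prev_close = ohlcv[t - window : t, 3]
    ret1 = c / prev_close - 1.0               # one-day yield
    return np.vstack([o, h, l, c, v, ret1, v / l, l / h])

def ten_day_label(ohlcv: np.ndarray, t: int, horizon: int = 10) -> float:
    """Forward return over `horizon` days from day t (assumed label)."""
    close = ohlcv[:, 3]
    return float(close[t + horizon] / close[t] - 1.0)

# Synthetic OHLCV series for one hypothetical stock.
rng = np.random.default_rng(3)
n_days = 60
close = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, n_days)))
open_ = close * (1 + rng.normal(0, 0.002, n_days))
high = np.maximum(open_, close) * 1.01
low = np.minimum(open_, close) * 0.99
volume = rng.integers(1_000, 10_000, n_days).astype(float)
ohlcv = np.column_stack([open_, high, low, close, volume])

img = data_image(ohlcv, t=39, window=30)
label = ten_day_label(ohlcv, t=39)
print(img.shape, round(label, 4))
```

Stacking the images of all stocks for the same day gives an array of shape (n_stocks, 8, window), the kind of input a convolutional feature extraction layer would consume.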

In summary, based on this modified AlphaNet, we propose the following procedure: on day T + i, select the portfolio according to the data of the previous i days, predict the returns of the stocks in the portfolio on day T + i + 1, derive a trading strategy, and repeat these steps on day T + i + 1.
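The rolling procedure can be summarized as a walk-forward loop. This is a hypothetical skeleton: `select`, `predict`, and the trading `rule` are placeholders for the stock pre-selection step, the modified AlphaNet, and a trading strategy that the text does not specify, and the equal-weight long-only rule shown is only an example. The key property is that every decision made on day d uses only information available up to d.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Forecast = Dict[str, float]   # predicted next-day return per ticker
Weights = Dict[str, float]    # portfolio weight per ticker

def walk_forward(days: Sequence[int],
                 select: Callable[[int], List[str]],
                 predict: Callable[[int, List[str]], Forecast],
                 rule: Callable[[Forecast], Weights]) -> List[Tuple[int, Weights]]:
    """Repeat select -> predict -> trade for each day, in chronological order."""
    log = []
    for d in days:
        portfolio = select(d)              # pre-selection with data up to day d
        forecast = predict(d, portfolio)   # predicted returns for day d + 1
        log.append((d, rule(forecast)))    # trading decision taken on day d
    return log

def long_positive_equal_weight(forecast: Forecast) -> Weights:
    """Hypothetical rule: equal weights on stocks with a positive forecast."""
    winners = [s for s, r in forecast.items() if r > 0]
    return {s: 1.0 / len(winners) for s in winners} if winners else {}

# Dummy stand-ins for the selection step and the model (placeholders only).
def dummy_select(d: int) -> List[str]:
    return ["AAA", "BBB", "CCC"]

def dummy_predict(d: int, portfolio: List[str]) -> Forecast:
    return {s: (0.01 if (i + d) % 2 == 0 else -0.01) for i, s in enumerate(portfolio)}

log = walk_forward(range(3), dummy_select, dummy_predict, long_positive_equal_weight)
for day, weights in log:
    print(day, weights)
```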


[1] Israel, Ronen, Bryan T. Kelly, and Tobias J. Moskowitz. "Can Machines 'Learn' Finance?" Journal of Investment Management (2020).

[2] Lin, Chen, et al. "Mining of stock selection factors based on genetic programming." Financial engineering research of Huatai Securities. 2019.

[3] Lin, Chen, et al. "AlphaNet: Factor Mining Neural Networks." Financial engineering research of Huatai Securities. 2020.

[4] Sharma, Rishab, Rahul Deora, and Anirudha Vishvakarma. "AlphaNet: An Attention Guided Deep Network for Automatic Image Matting." 2020 International Conference on Omni-layer Intelligent Systems (COINS). IEEE, 2020.

[5] Chen, Liang-Chieh, et al. "Encoder-decoder with atrous separable convolution for semantic image segmentation." Proceedings of the European Conference on Computer Vision (ECCV). 2018.

[6] Desai, V. S., and R. Bharati. "A comparison of linear regression and neural network methods for predicting excess returns on large stocks." Annals of Operations Research 78 (1998): 127–163.

[7] Alkhatib, K., H. Najadat, I. Hmeidi, and M. K. A. Shatnawi. "Stock price prediction using k-nearest neighbor (kNN) algorithm." International Journal of Business, Humanities and Technology 3, no. 3 (2013): 32–44.

[8] Rasekhschaffe, K. C., and R. C. Jones. "Machine learning for stock selection." Financial Analysts Journal 75, no. 3 (2019): 70–88.
