Stock Returns: Investigating Short-term Determinants of Stock Returns Using Classification
- Background and motivation
Predicting stock prices is one of the most widely studied problems, attracting researchers from many fields, including economics, history, statistics, mathematics, and computer science. The volatile nature of the stock market makes it difficult to apply simple time-series or regression techniques.
According to the well-known Efficient Market Hypothesis, stock prices already reflect all available information, so the direction of price movements should follow a random walk.
Nevertheless, certain market phenomena run contrary to this theory. For instance, Jegadeesh and Titman (1993), in their study of the profitability of momentum strategies, found that stock prices tend to exhibit momentum in the short term.
Stocks that have recently been rising tend to keep rising, and stocks that have recently been falling tend to keep falling. This tendency implies some short-term predictability in future stock prices, contradicting the Efficient Market Hypothesis.
Today, momentum is one of the most academically investigated effects, and it has shown strong persistence. The theory holds that a stock’s trend tends to continue for a considerable time: stocks that have performed well in the past keep performing well,
while stocks that have performed poorly continue to perform badly. This effect has been observed in stock markets in both developed countries and emerging markets. To build a daily trading strategy on momentum theory, we forecast each stock’s direction of movement (up or down) using a support vector machine (SVM) model with 18 momentum factors.
- Data collection and Cleaning
We first gather, from Wharton Research Data Services, a list of every ticker that is or has ever been a constituent of the S&P 500, in order to avoid survivorship bias.
We use all of these tickers as our stock universe, which keeps the dataset fair and representative. For each ticker, we fetch the open, high, low, and close prices, the volume, and the historical and mean implied volatility over the past 10, 20, and 30 days from the Sharadar daily equity price database and the Vol database using the Quandl API.
After merging all the requested data, we obtain one large data frame in which the rows are grouped by ticker and sorted by ascending date.
For data cleaning, we drop the rows that contain NaN values, which make up about 2% of the data frame. Because the proportion of invalid data is small, removing it should not significantly affect the subsequent steps. As a result, we have a data universe that comprises 719 stocks for each day, and over 1.6 million entries in total.
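The cleaning step described above can be sketched with pandas as follows. This is only an illustration on a toy frame: the tickers, prices, and column names are hypothetical and do not reflect the actual Sharadar schema.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the merged frame; tickers, values, and column names are hypothetical.
raw = pd.DataFrame({
    "ticker": ["BBB", "AAA", "AAA", "BBB", "AAA"],
    "date": pd.to_datetime(
        ["2010-01-05", "2010-01-05", "2010-01-04", "2010-01-04", "2010-01-06"]
    ),
    "open": [20.0, 10.0, 10.5, 21.0, np.nan],  # one missing value
    "close": [20.5, 10.2, 10.0, 20.0, 10.1],
})


def clean(frame):
    """Drop rows containing NaN, then order rows by ticker and ascending date."""
    cleaned = frame.dropna()
    return cleaned.sort_values(["ticker", "date"]).reset_index(drop=True)


cleaned = clean(raw)
dropped_share = 1 - len(cleaned) / len(raw)
print(cleaned)
print(f"dropped {dropped_share:.0%} of rows")
```

Reporting the share of dropped rows makes it easy to check that the loss stays small, as it did in our dataset.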
- Prediction Method
3.1 Factor Construction
In our project, we investigated the 191 alpha factors of GuoTai JunAn Securities (GuoTai JunAn Securities, 2017) and selected the 12 most significant ones, all of which are related to price and volume. Combined with each stock’s historical volatility and mean implied volatility over 10, 20, and 30 days, they form our final 18-factor model.
Figure 3.1 The 18 factors selected in the final model
The factors are all constructed from daily price and volume statistics, and we apply further data cleaning at this stage. For example, when constructing Alpha54, we replaced negative and positive infinity with NaN. On this basis, we built a style-neutral multi-factor stock selection strategy based on short-term price and volume characteristics.
Notably, all 18 factors capture past information, since their formulas are computed over rolling windows. This is the cornerstone of our momentum strategy: we use past information to infer a short-term price tendency.
Because training our model is a supervised classification problem, we also transformed our target variable Y.
Using the formula (Close - Open) / Open, we calculated each day’s percentage price change and shifted the column so that the factors of day N are aligned with the return of day N+1. Then, we ranked the stocks each day by return in descending order.
Finally, we transformed the daily returns with a quantile cut function, dividing them into 5 groups: the 20% of stocks with the highest returns get the signal 4, the next 20% get the signal 3, and so on. In our portfolio construction, discussed later, we focus only on going long the stocks predicted to have signal 4, the group with the highest returns.
Figure 3.2 Transformed data
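The target transformation can be sketched as below. The column names and the toy data are hypothetical, and ranking before the quantile cut (to break ties between equal returns) is an assumption added for robustness rather than something stated in our write-up.

```python
import pandas as pd


def add_quintile_labels(frame):
    """Attach the next-day open-to-close return and its daily quintile (0..4)."""
    out = frame.sort_values(["ticker", "date"]).copy()
    out["ret"] = (out["close"] - out["open"]) / out["open"]
    # Align day N factors with the day N+1 return, separately for each ticker.
    out["next_ret"] = out.groupby("ticker")["ret"].shift(-1)
    out = out.dropna(subset=["next_ret"])
    # Quantile cut per day: 4 = top 20% of returns, 0 = bottom 20%.
    # Ranking first (assumption) avoids duplicate bin edges when returns tie.
    out["signal"] = out.groupby("date")["next_ret"].transform(
        lambda r: pd.qcut(r.rank(method="first"), 5, labels=False)
    )
    return out


# Hypothetical toy data: five tickers, two days, flat on day 1.
tickers = ["A", "B", "C", "D", "E"]
toy = pd.DataFrame({
    "ticker": tickers * 2,
    "date": pd.to_datetime(["2010-01-04"] * 5 + ["2010-01-05"] * 5),
    "open": [100.0] * 10,
    "close": [100.0] * 5 + [98.0, 99.0, 100.0, 101.0, 102.0],
})
labeled = add_quintile_labels(toy)
print(labeled[["ticker", "date", "next_ret", "signal"]])
```

Only the first day survives in the toy output, because the last day of each ticker has no following return to predict.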
3.2 Support Vector Machine
The SVM is a supervised machine learning model used mainly for classification.
Alongside well-known techniques such as naive Bayes, decision trees, and rule induction, the Support Vector Machine (SVM) has gained increasing attention and has been widely adopted for data classification problems.
SVMs are known to perform well on high-dimensional datasets.
An advantage of the SVM algorithm is that its training is not affected by local minima; in addition, it copes comparatively well with high-dimensional data because the decision boundary depends only on the support vectors. However, SVM performance depends heavily on its parameter settings and kernel selection, both of which affect learning and generalization performance.
3.3 Kernel Method
An SVM learns to separate different groups by forming decision boundaries. This sounds simple, but not all data are linearly separable. In fact, most real-world data are distributed in ways that make it hard to separate the classes with a linear boundary. In such cases, we can use the kernel method to implicitly map the data into a higher-dimensional space where the classes are easier to separate, at little computational cost even when that space is very high-dimensional.
Figure 3.3 Idea of Kernel Method
Different SVM variants use different kernel functions, for example the linear kernel and nonlinear kernels such as polynomial, radial basis function (RBF), and sigmoid.
Figure 3.4 Kernel Functions
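As an illustration, the four kernel families named above can be written out in plain Python. The parameterization (gamma, coef0, degree) follows the common convention used in scikit-learn's documentation; the default values here are arbitrary choices for the example, not the settings of our model.

```python
import math


def dot(x, z):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, z))


def linear_kernel(x, z):
    return dot(x, z)


def polynomial_kernel(x, z, degree=3, gamma=1.0, coef0=1.0):
    return (gamma * dot(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=0.1):
    # Similarity decays with squared Euclidean distance; gamma sets how fast.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x, z, gamma=0.1, coef0=0.0):
    return math.tanh(gamma * dot(x, z) + coef0)


x, z = [1.0, 2.0], [2.0, 0.0]
print(linear_kernel(x, z), polynomial_kernel(x, z), rbf_kernel(x, z), sigmoid_kernel(x, z))
```

Each function returns a similarity score for a pair of points; the SVM only ever needs these pairwise values, which is why the high-dimensional mapping never has to be computed explicitly.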
3.4 Hyperparameter Tuning
Hyperparameters are critical in building robust and accurate models. They help us balance bias and variance and thus keep the model from overfitting or underfitting.
First, recall that an SVM tries to:
1. Increase the distance between the decision boundary and the classes
2. Maximize the number of points that are correctly classified in the training set
There is an obvious trade-off between these two goals. To label every point in the training set correctly, the decision boundary might have to pass very close to one particular class. In that case, accuracy on the test dataset might be lower, because the decision boundary is too sensitive to noise and to small changes in the independent variables.
On the other hand, the decision boundary can be placed as far as possible from each class, at the expense of some misclassified points. This trade-off is controlled by the C parameter.
If C is small, the penalty for misclassified points is low, so a decision boundary with a large margin is chosen at the expense of a greater number of misclassifications. If C is large, the SVM tries to minimize the number of misclassified training points, which can cause overfitting.
Gamma controls how far the influence of a single training sample reaches once the samples are mapped to the high-dimensional space. Large gamma values produce class regions that are too specific, which can lead to overfitting.
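For reference, the roles of C and gamma can be read off the standard textbook formulation of the soft-margin SVM and the RBF kernel. The notation below (w, b, the slack variables \xi_i, and the feature map \phi) is the usual textbook notation and is given as background; it is written for the two-class case, which multi-class implementations typically combine over pairs of classes.

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad
y_i\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i,\qquad \xi_i \ge 0,

K(x, x') = \phi(x)^{\top}\phi(x') = \exp\bigl(-\gamma\,\lVert x - x'\rVert^{2}\bigr).
```

The first term rewards a wide margin, while the second charges C for every unit of margin violation \xi_i, so a larger C tolerates fewer violations. In the kernel, a larger \gamma makes the similarity between two points fall off faster with distance.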
Grid search optimizes the SVM parameters (C, gamma, degree, etc.) by evaluating every combination in a predefined grid, using the cross-validation (CV) score as the performance metric. The goal is to identify hyperparameter combinations that let the classifier predict unseen data accurately.
One of the biggest problems in SVM parameter optimization is that there are no established ranges for C and gamma. We believe that the wider the parameter range, the better the chance that grid search finds the best combination.
Therefore, in our project, we searched C and gamma over the range from 0.01 to 10.
We applied sklearn’s grid search method with five-fold cross-validation on 10% of our total data and took the mode of the best parameters across iterations: C = 0.5, gamma = 0.1, kernel = radial basis function.
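A search of this kind could be set up roughly as follows with scikit-learn's `GridSearchCV` and `SVC`. This is a sketch under assumptions: the data is random noise standing in for the 18 factors and 5 signal classes, and the specific grid points between 0.01 and 10 are illustrative, since the exact grid is not listed here.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 200 samples, 18 factors, 5 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 18))
y = rng.integers(0, 5, size=200)

# Candidate values spanning 0.01 to 10; the exact grid points are an assumption.
param_grid = {
    "C": [0.01, 0.1, 0.5, 1, 10],
    "gamma": [0.01, 0.1, 1, 10],
    "kernel": ["rbf"],
}

# Five-fold cross-validation over every combination in the grid.
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

best = search.best_params_
print(best, round(search.best_score_, 3))
```

On random labels the cross-validated accuracy should hover around the 20% chance level, which is a useful sanity check before running the search on real factors.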
The training accuracies for all classes range from 32% to 38.5%, and the testing accuracies for all classes range from 21% to 33%, that is, 1 to 13 percentage points above the 20% accuracy expected from random guessing among five classes. An example of train and test accuracy is shown in Figure 3.6.
Figure 3.6 Exemplary train and test accuracy
Figure 3.7 Hyperparameter Tuning
- Portfolio Construction, Risk Management and Transaction Costs
Since our predictions are made daily, we develop a daily trading strategy. We train the model on the factors from days 1 to N against the transformed Y values from days 2 to N+1, and apply the model on day N+2 to avoid look-ahead bias, that is, the use of future data.
After trying different values for the number of training days N, including 7, 14, 30, 60, 90, and 120, we fixed N at 14 days. That is, we train the model on the factors from days 1-14 against the Y values of days 2-15, then apply the trained model to the 16th-day factors to make the corresponding prediction (the transformed signal Y for the 17th day).
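The day indexing of this scheme can be made explicit with a small generator. The function name is hypothetical, and sliding the window forward by one day at a time is an assumption about how the procedure is repeated.

```python
def walk_forward_windows(n_days, train_days=14):
    """Yield (factor_days, label_days, apply_day, target_day) with 1-based days.

    Assumes the training window slides forward one day at a time.
    """
    start = 1
    while start + train_days + 2 <= n_days:
        factor_days = range(start, start + train_days)         # days 1..N
        label_days = range(start + 1, start + train_days + 1)  # days 2..N+1
        apply_day = start + train_days + 1                     # day N+2
        target_day = apply_day + 1                             # signal predicted for day N+3
        yield factor_days, label_days, apply_day, target_day
        start += 1


windows = list(walk_forward_windows(n_days=20))
first = windows[0]
print(list(first[0]), list(first[1]), first[2], first[3])
```

With N = 14, the first window trains on factors from days 1-14 and labels from days 2-15, applies the model on day 16, and targets the signal for day 17, so no training label postdates the day on which the model is applied.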
In theory, the model outputs a probability for each rank for each stock, and the rank with the highest probability is chosen as the stock’s class for that day.
In practice, however, the C and gamma parameters discussed in Section 3.4 also shape the decision boundaries, so these probabilities are not the only factor that affects the final selection of ranks. After obtaining the results, we sort the stocks by class, and our strategy trades all stocks in the highest-performing class, class 4. The number of stocks traded each day varies from 85 to about 120.
As discussed earlier, the class 4 stocks are those predicted to have the highest 20% of returns among all the stocks in our universe. We rank them by the “confidence” of the SVM output, which indicates the probability that a stock belongs to class 4. We chose to trade all of them after examining the results of trading only the top 30, 50, or 80 stocks ranked by “confidence”.
In addition, we size our positions based on the “confidence” defined above. Instead of the traditional 1/N (equal-weight) method, we allocate to each stock in proportion to its normalized “confidence”.
Because we do not have access to intraday data, we cannot use the usual stop-loss order approach; risk management in our strategy can only be achieved by limiting the position allocated to each stock.
We therefore took a more straightforward approach to reducing risk: we only trade stocks whose weight is less than 1% of the portfolio, aiming for a maximum drawdown of 10%.
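One possible reading of confidence-based sizing with a position limit is sketched below. It rests on two assumptions that are not spelled out in this write-up: the normalized confidence is taken to be each stock's confidence divided by the sum over the day's selected stocks, and stocks at or above the limit are simply skipped without redistributing their weight. The function name and tickers are hypothetical.

```python
def confidence_weights(confidences, max_weight=0.01):
    """Weight each stock by its share of total confidence, then apply the limit.

    Assumptions: weight = confidence / sum of confidences; stocks whose weight
    is not below max_weight are skipped, and their weight is left uninvested.
    """
    total = sum(confidences.values())
    weights = {ticker: c / total for ticker, c in confidences.items()}
    return {ticker: w for ticker, w in weights.items() if w < max_weight}


# Hypothetical confidences; a loose 40% limit keeps the toy example readable.
toy_confidences = {"AAA": 0.5, "BBB": 0.3, "CCC": 0.2}
toy_weights = confidence_weights(toy_confidences, max_weight=0.4)
print(toy_weights)
```

In the toy example, AAA would take half the portfolio and is therefore skipped, while BBB and CCC keep their confidence-proportional weights.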
Finally, we take transaction costs into account in the backtest.
Specifically, we account for trading fees and slippage.
Trading commissions are typically zero these days, but there can be a tax of about 2 basis points per trade, so we include it as part of the transaction cost. The strategy buys stocks when the market opens in the morning and sells them when it closes in the afternoon.
However, in practice we cannot trade all the stocks at the same instant, and bid/ask spreads change while the orders are being filled, so the executed prices will not be exactly the open or close prices in the dataset. We therefore apply a slippage of 0.01% to account for the effect of the bid-ask spread.
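The effect of these costs on a single open-to-close round trip can be sketched as follows. Applying both the 2 basis point tax and the 0.01% slippage on the buy leg and again on the sell leg, multiplicatively, is an assumption about how the costs are charged; the function name is hypothetical.

```python
TAX_RATE = 0.0002  # about 2 basis points per trade
SLIPPAGE = 0.0001  # 0.01% per trade


def net_intraday_return(open_price, close_price, tax=TAX_RATE, slippage=SLIPPAGE):
    """Return of buying at the open and selling at the close, net of costs.

    Assumption: slippage and tax are both charged on each of the two legs.
    """
    buy_price = open_price * (1 + slippage)    # fill slightly above the open
    sell_price = close_price * (1 - slippage)  # fill slightly below the close
    cost_in = buy_price * (1 + tax)
    proceeds = sell_price * (1 - tax)
    return proceeds / cost_in - 1


flat_day = net_intraday_return(100.0, 100.0)
up_day = net_intraday_return(100.0, 101.0)
print(f"{flat_day:.4%} {up_day:.4%}")
```

Under these assumptions a flat day costs roughly 6 basis points, which adds up quickly for a strategy that trades every day.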
- Benchmarking and Results
We assume a starting capital of 100,000 dollars on 2010-01-01. Each day, we construct our portfolio using the method discussed above, buy the stocks at their open prices, and sell them at their close prices (with slippage applied and fees paid immediately).
The result is summarized below:
|Metric|Value|
|---|---|
|Sharpe ratio (r = 3.22%)|0.34|
|Strategy annualized return|6.08%|
Figure 5.1 Backtesting Result I
The backtest shows a maximum drawdown of -17.3%, which is worse than the 10% target we set for risk management.
The strategy has a Sharpe ratio of 0.34, measured against the 2010 risk-free rate (10-year Treasury note), and an annualized return of 6.1%. In total, the cumulative return is 80.4% over the 10-year period.
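The performance measures quoted above can be computed along the following lines. The function names are hypothetical, the annualization convention for the Sharpe ratio (252 trading days, sample standard deviation of daily excess returns) is an assumption, and the toy equity curve and daily returns are made up purely to exercise the functions.

```python
import math
import statistics


def annualized_return(cumulative_return, years):
    """Constant yearly growth rate that compounds to the cumulative return."""
    return (1 + cumulative_return) ** (1 / years) - 1


def max_drawdown(equity_curve):
    """Largest peak-to-trough decline, returned as a negative fraction."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1)
    return worst


def sharpe_ratio(daily_returns, risk_free_annual=0.0322, periods_per_year=252):
    """Annualized Sharpe ratio from daily returns (convention is an assumption)."""
    daily_rf = risk_free_annual / periods_per_year
    excess = [r - daily_rf for r in daily_returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)


ann = annualized_return(0.804, 10)                 # figures reported above
mdd = max_drawdown([100, 120, 90, 110, 80, 130])   # made-up equity curve
sr = sharpe_ratio([0.01, -0.005, 0.007, 0.002, -0.003])  # made-up daily returns
print(f"{ann:.2%}")
```

The first call confirms that an 80.4% cumulative return over 10 years corresponds to the reported annualized return of about 6.08%.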
To benchmark our performance, we set two references. One is investing 100,000 dollars in the S&P 500 on 2010-01-01 and holding it until 2020; the other is day-trading the S&P 500 at the open and close with the same fee and slippage structure. The results are shown below:
Figure 5.2 Backtesting Result II
Figure 5.3 Comparison Plot with S&P 500
The “holding S&P 500” method turns out to be the most profitable, generating an 11.45% annualized return. The beta of our strategy is close to 0.3 on average. However, our strategy did show alpha, at least from 2010 to 2015, compared with the “day-trading S&P 500” method, which generated an annualized return of -7.6% over the period, possibly because of the large amount of trading fees paid.
- Limitation and Future improvements
We acknowledge several limitations of our model. One is the selection of factors. We selected only 18 factors related to volatility, price, and volume, and there is room to explore other, possibly more effective, factors. Furthermore, our strategy shows alpha decay in 2016.
We think this may be because the model relies only on momentum, with no mean-reversion component, and because it is highly sensitive to its inputs and parameters. We assume the same market structure in 2016 as in 2010, yet market structure changes over the years: by 2016 the market weighed more heavily on technology stocks, but the SVM did not appear to adjust, and the portfolio remained weighted toward stocks such as energy and financial firms.
Figure 6.1 Stocks traded in 2016
Another limitation concerns data collection. We collected and used daily stock price data. In the real world, however, changes happen on the order of minutes or even seconds. For more accurate results, we should perhaps use price data at a far more granular, intraday level.
By observing intraday trends, we could create more robust models that capitalize on sudden changes in momentum. Also, since our model predicts future returns from daily stock prices, it would be better to tune our parameters using more data.
However, because of limited access to raw data, we only have data for a 10-year period, which might affect the accuracy of the model’s predictions.
Overall, our model could be improved by adding more significant factors, researching which factors remain effective over the long term across different market scenarios, tuning more parameters, and using a more accurate dataset with an appropriate time span.
Written by : Zhicheng Liang, Wenxin Mu, Zhanbo Peng, Jiawen Zhang & Jing Zhang
- Reference
Compustat Daily Updates – Index Constituents. Retrieved from: https://wrds-sol1.wharton.upenn.edu/output/15eea9e21df966f1.csv
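Fubing, L., Chen, L., Aolin, C., Fanxue, M. (2017). 国泰君安-基于短周期价量特征的多因子选股体系——数量化专题之九十三 [GuoTai JunAn: A multi-factor stock selection system based on short-period price and volume characteristics, Quantitative Special Topic No. 93]. GuoTai JunAn Securities. Retrieved from: https://guorn.com/static/upload/file/3/134065454575605.pdf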