What Should Unsupervised Machine Learning Be Used For?

Artificial Intelligence & Machine Learning

Unsupervised Machine Learning in the Stock Market

In recent years, machine learning has gained a huge amount of attention and has been widely applied in various fields. According to a survey conducted by BarclayHedge in 2018, 56.4% of hedge fund professionals use machine learning in their investment decisions, compared with only 20% a year earlier. The majority of machine learning approaches applied to the stock market are based on supervised learning: penalized regression is used to predict stock returns, for example, and classification methods are used to select stocks for portfolio optimization. This paper, however, aims to explore the benefits of applying unsupervised learning to the stock market.

Clustering, one of the main tasks in unsupervised machine learning, organizes a set of data points into groups in such a way that data points in the same group are more similar to each other than to those in other groups. K-means clustering is one of the most popular clustering approaches. It iteratively partitions observations into K non-overlapping clusters, in which each observation belongs to the cluster with the nearest mean. One application of K-means clustering to the stock market is to train an algorithm to group companies based on their stock price movements over a chosen period.
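The partitioning step described above can be sketched in a few lines of Python. This is a minimal illustration of the standard Lloyd iteration using only the standard library, not a production implementation; the function name `kmeans`, its parameters, and the toy points are assumptions made for this example.

```python
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Minimal K-means sketch: alternate assignment and mean-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct data points
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster with the nearest mean.
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
    return labels, centroids

# Toy data (hypothetical): two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, k=2)
```

The first three points end up sharing one label and the last three the other. In practice one would more likely rely on a library implementation, which also handles better initialisation and multiple restarts.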

We subtract the open price from the close price of each company’s stock on every date to obtain the stock’s daily movements, and use these as clustering features. Because price movements differ in scale from company to company, we need to normalize the features, that is, the movements, using their cross-sectional means and standard deviations so that the clustering is not skewed. Once we have chosen K and obtained the normalized features, we can fit the K-means model and use the trained model to predict the cluster label of each company. In the end, we should see companies within the same industry clustered together, with banks grouped with banks and hospitals with hospitals. Analysts can then use the cluster results in research such as fundamental analysis and corporate bond risk analysis.
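The feature construction can be sketched as follows. The tickers and prices are made up for illustration, and the helper names `daily_movements` and `cross_sectional_zscore` are assumptions of this sketch. It reads "cross-sectional" as "across companies on the same date"; standardizing each company's own time series would be a different design choice.

```python
from statistics import mean, pstdev

# Hypothetical data: company -> list of (open, close) prices, one pair per trading day.
prices = {
    "BANK_A": [(100.0, 101.0), (101.0, 100.5), (100.5, 102.0)],
    "BANK_B": [(50.0, 50.6), (50.6, 50.3), (50.3, 51.0)],
    "HOSP_A": [(200.0, 198.0), (198.0, 199.5), (199.5, 199.0)],
}

def daily_movements(rows):
    """Close price minus open price for each day."""
    return [close - open_ for open_, close in rows]

def cross_sectional_zscore(movements):
    """For each date, subtract the mean across companies and divide by the std."""
    names = list(movements)
    n_days = len(movements[names[0]])
    normalized = {name: [] for name in names}
    for t in range(n_days):
        day = [movements[name][t] for name in names]
        mu, sigma = mean(day), pstdev(day)
        for name in names:
            # Guard against a date on which every company moved identically.
            z = (movements[name][t] - mu) / sigma if sigma > 0 else 0.0
            normalized[name].append(z)
    return normalized

movements = {name: daily_movements(rows) for name, rows in prices.items()}
normalized = cross_sectional_zscore(movements)
```

Each company's list in `normalized` is its feature vector, with one entry per date, and can be passed to a K-means routine.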

Besides research, unsupervised machine learning can also support investment strategies. A famous algorithmic trading strategy is pairs trading, which can be implemented through several approaches, including cointegration, time series, and distance approaches. Pairs trading was pioneered by Gerry Bamberger and later led by Nunzio Tartaglia’s quantitative group at Morgan Stanley in the 1980s, and the cointegration approach is the most mainstream. It is a market-neutral strategy that matches a long position with a short position in two cointegrated stocks: whenever the relationship between the two securities temporarily weakens, the trader shorts the outperforming stock and buys the underperforming one. Because the strategy is market neutral, it aims to profit from the convergence of the two stock prices in the long run, regardless of the overall market’s performance. However, using only past price data to identify correlated pairs may not guarantee that the prices will converge in the future. Stocks with similar characteristics are more likely to move together in the future, and past price is only a single dimension of data that cannot capture all the different characteristics of a stock. Unsupervised learning can help us cluster companies not only by past stock prices but also by additional stock characteristics.
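To make the trading rule concrete, here is a simplified sketch of a spread-based signal. It is not a full cointegration strategy: no cointegration test is run, the hedge ratio comes from a plain OLS slope, and the mean and standard deviation are computed over the whole sample, which a live strategy could not do. The function names, the `entry_z` threshold, and the price series are all assumptions for illustration.

```python
from statistics import mean, pstdev

def hedge_ratio(x, y):
    """OLS slope of y regressed on x (with intercept)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

def pair_signals(price_a, price_b, entry_z=1.0):
    """Short the outperformer and buy the underperformer when the spread is stretched."""
    beta = hedge_ratio(price_b, price_a)                # regress A on B
    spread = [a - beta * b for a, b in zip(price_a, price_b)]
    mu, sigma = mean(spread), pstdev(spread)            # full-sample (simplification)
    signals = []
    for s in spread:
        z = (s - mu) / sigma
        if z > entry_z:
            signals.append("short A / long B")          # A has outperformed B
        elif z < -entry_z:
            signals.append("long A / short B")          # A has underperformed B
        else:
            signals.append("flat")
    return signals

# Hypothetical prices: A tracks 2 * B except on two days.
price_b = [10.0, 11.0, 12.0, 11.0, 10.0, 11.0, 12.0, 11.0]
price_a = [20.0, 22.0, 24.0, 24.0, 20.0, 22.0, 24.0, 20.0]
signals = pair_signals(price_a, price_b)
```

On this toy series the rule opens a position only on the two days when A drifts away from its usual relationship with B, and stays flat otherwise.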

As mentioned in the second paragraph, K-means clustering can be used to group companies with their historical returns as features.

We can add further characteristics to the search for similar stock pairs. For example, Green Zhang’s paper uses OLS to identify characteristics that are significantly related to stock returns, such as beta, leverage, current ratio, and quick ratio. These additional features help us separate stocks into clusters more accurately. Agglomerative clustering is another clustering method: it treats each data point as its own cluster at the beginning and then merges clusters step by step until the target number of clusters is reached.
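The bottom-up merging can be sketched as below. This version uses average linkage, which is one of several possible linkage rules and is an assumption of this example, as are the function name `agglomerative`, the tickers, and the two illustrative features (beta and leverage), which are taken to be already scaled.

```python
import math

def agglomerative(points, n_clusters):
    """Bottom-up clustering: start with singletons, repeatedly merge the closest pair."""
    clusters = [[i] for i in range(len(points))]  # every point starts as its own cluster

    def linkage(c1, c2):
        # Average linkage: mean pairwise Euclidean distance between members.
        total = sum(math.dist(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the two closest clusters (a < b) and merge b into a.
        a, b = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Hypothetical (beta, leverage) features, assumed to be already normalized.
names = ["BANK_A", "BANK_B", "HOSP_A", "HOSP_B"]
features = [(1.2, 0.9), (1.1, 0.8), (0.5, 0.3), (0.6, 0.35)]
clusters = agglomerative(features, n_clusters=2)
grouped = [[names[i] for i in c] for c in clusters]
```

Here the two banks end up in one cluster and the two hospitals in the other. The pairwise search is quadratic per merge, which is fine for a sketch but slow for a large stock universe.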

Before we do any kind of clustering on stocks, we always need to normalize and scale our data to avoid skewed clustering. To normalize a feature, we can simply subtract its cross-sectional mean and divide by its cross-sectional standard deviation. Because clustering algorithms rely on distance measures, the model assigns equal weight to all features without considering how significant each one is. Unsupervised learning can also help with this issue. PCA is a popular dimension-reduction technique for large, high-dimensional datasets: it reduces highly correlated features into a few uncorrelated composite variables. If we apply PCA to the normalized features and drop the principal components that explain little of the total variance before running K-means clustering on the stocks, we can mitigate the skewed clustering issue.
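The PCA step can be sketched with NumPy as follows. The function name `pca_reduce`, the 95% variance threshold, and the synthetic feature matrix are assumptions chosen for illustration; the data is built so that four features are driven by only two underlying factors.

```python
import numpy as np

def pca_reduce(X, var_threshold=0.95):
    """Keep the leading principal components that explain at least var_threshold of variance."""
    Xc = X - X.mean(axis=0)                      # centre each feature
    cov = np.cov(Xc, rowvar=False)               # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]            # re-sort by descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()          # share of total variance per component
    n_keep = int(np.searchsorted(np.cumsum(explained), var_threshold)) + 1
    n_keep = min(n_keep, len(eigvals))
    return Xc @ eigvecs[:, :n_keep], explained

# Synthetic example: 50 stocks, 4 features, but only 2 independent factors.
rng = np.random.default_rng(0)
f1, f2 = rng.normal(size=(50, 1)), rng.normal(size=(50, 1))
X = np.hstack([f1, f1 + 0.05 * rng.normal(size=(50, 1)),
               f2, f2 + 0.05 * rng.normal(size=(50, 1))])
X = (X - X.mean(axis=0)) / X.std(axis=0)         # normalize features before PCA
reduced, explained = pca_reduce(X)
```

The four correlated columns collapse into two uncorrelated components, and `reduced` can then be used in place of the original features for K-means.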

In conclusion, unsupervised machine learning works well both for clustering stocks and for dimension reduction ahead of clustering, and it can be usefully applied to the stock market.

Yiyang Yao


