Can machine learning help in trading? Machine Learning Pair-trading Strategy



Pairs trading is a trading strategy that matches long and short positions in two highly correlated stocks. Because finding trading pairs in a large stock pool is time-consuming, we develop a pair-trading framework based on an unsupervised approach that uses PCA and DBSCAN to search stock pools for trading pairs. The framework executes the trading strategy based on the z-score of the price ratio of the two stocks; using it to backtest 128 trading pairs from 2016 to 2021 yields a return of about $40,000.


Pair trading is a hedging strategy that matches a short position with a long position in two underlying securities that have a high positive correlation. The strategy monitors the performance of the two correlated stocks, buying the undervalued share and selling the overvalued one; the profit comes from the gap between them closing.

Pair trading involves monitoring historically correlated securities. It rests on a big assumption: the market is neutral. This means that two securities that have historically moved in the same direction will keep moving in the same direction. Therefore, the two securities in a pair are usually from the same industry or are direct competitors. The assumption also implies that the outperforming stock will come back down to the neutral price, while the underperforming stock will rise back to it.

Pair trading has some advantages. First, it can mitigate potential losses and risks: because the strategy holds two securities, a loss on the underperforming one can be partly absorbed by the other. Second, it lets the trader earn profits regardless of market conditions, whether the market is rising, falling or swinging. In fact, the biggest advantage of pair trading is that the trader is hedged against broad market moves, which usually does not happen in ordinary directional trading. The hedge comes from selling the overvalued security and buying the undervalued one, which limits the chances of loss.
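A small worked example makes the hedging claim concrete. The sketch below, with hypothetical prices, values a dollar-neutral position (long the undervalued stock A, short the same dollar amount of the overvalued stock B). A market-wide move that hits both stocks by the same percentage leaves the P&L at zero, while a convergence of the two prices produces a profit. The function name and the numbers are illustrative assumptions, not taken from our backtest.

```python
def pair_pnl(a_entry, b_entry, a_exit, b_exit, notional=1000.0):
    """P&L of a dollar-neutral pair: long `notional` dollars of A, short `notional` dollars of B."""
    shares_a = notional / a_entry              # long leg
    shares_b = notional / b_entry              # short leg
    long_pnl = shares_a * (a_exit - a_entry)
    short_pnl = -shares_b * (b_exit - b_entry)
    return long_pnl + short_pnl

# Market falls 10% and both stocks fall together: the two legs cancel out
print(pair_pnl(100.0, 50.0, 90.0, 45.0))       # 0.0
# A (undervalued) rises while B (overvalued) falls: both legs make money
print(pair_pnl(100.0, 50.0, 105.0, 48.0))      # 90.0
```

The hedge only removes the common move; if the two prices diverge further instead of converging, both legs lose.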

Pair trading is therefore a powerful strategy, based on the assumption that the market is neutral. Our model uses an unsupervised approach to build a pair-trading framework that searches stock pools for trading pairs.

Trading pair detection 

Trading pair detection can be simplified to a correlation-based clustering problem, and many unsupervised learning algorithms can be used for it, such as k-means, DBSCAN and PCA (Principal Component Analysis).



K-means is a popular method for cluster analysis. It partitions n observations into k clusters, with each observation belonging to the cluster with the nearest mean, which serves as the prototype of the cluster. It is relatively simple to implement, but the parameter k must be set manually.
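To show where k enters, here is a minimal NumPy sketch of Lloyd's algorithm for k-means. It is not the implementation we used; the deterministic farthest-first initialisation is an assumption made so the example is reproducible (libraries such as scikit-learn default to k-means++). Note that every point, outliers included, is forced into some cluster.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm: returns (labels, centers) for a user-chosen k."""
    X = np.asarray(X, dtype=float)
    # Farthest-first initialisation: start at X[0], then repeatedly add
    # the point farthest from all centers chosen so far
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)

    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center
        dist2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist2.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Two tight groups plus one far-away outlier
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [50, -40]])
labels, _ = kmeans(X, k=3)
print(labels)   # the outlier ends up alone in a cluster of its own
```

This mirrors the behaviour discussed next: with noisy data, k-means spends clusters on isolated points instead of setting them aside.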

In our model, we use K-means to cluster the stocks, combined with a covariance estimate to find historically similar pairs of stocks. This clustering produced 137 clusters. However, K-means assigns every point, outliers included, to some cluster, which leaves many clusters containing only one stock. It is therefore not an efficient way for us to find stock pairs.

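import numpy as np
from sklearn import cluster, covariance

# Daily returns of every stock with a complete price history;
# `stock` is the downloaded price DataFrame with an 'Adj Close' column group
returns = stock['Adj Close'].pct_change().iloc[1:, :].dropna(axis=1)

# Sparse inverse-covariance (graphical lasso) model of the stock structure
edge_model = covariance.GraphicalLassoCV(alphas=4, n_refinements=4, tol=0.0001,
                                         enet_tol=0.0001, max_iter=100, mode='cd',
                                         n_jobs=None, verbose=False)

# Standardise the returns, then fit the covariance model
X = returns.copy()
X /= X.std(axis=0)
edge_model.fit(X)

# Cluster the stocks; this call is affinity propagation on the learned
# covariance, which chooses the number of clusters by itself
_, labels = cluster.affinity_propagation(edge_model.covariance_)
n_labels = labels.max()

names = np.array(returns.columns.tolist())
for i in range(n_labels + 1):
    print('Cluster %i: %s' % ((i + 1), ', '.join(names[labels == i])))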


Since K-means produces too many clusters, which we suspect is caused by noise in the dataset, we use PCA and DBSCAN to improve on it. DBSCAN is a density-based clustering algorithm that forms clusters from dense regions of data points and ignores low-density regions. Its advantage over K-means is that it can be applied to noisy datasets and readily identifies outliers. In DBSCAN, points are divided into core points, (density-)reachable points and outliers. See Fig.1.

Figure 1: DBSCAN algorithm 

If a point is found to be in a dense part of a cluster, its neighborhood is also part of that cluster. All points within the neighborhood are added, as are their own neighborhoods when they are also dense. When a density-connected cluster has been completely found, a new unvisited point is retrieved and processed, leading to the discovery of another cluster or of noise. The following code performs DBSCAN clustering on the stocks, after first using PCA to reduce the dimension of the data.

We retain 50 principal components because the cumulative variance diagram in Fig.2 shows that 50 components retain about 70% of the variance of the original data, which we judged sufficient for the clustering algorithm. DBSCAN is then applied to the dimensionality-reduced data, and the final number of clusters is 13. This is far fewer than with K-means, because DBSCAN labels noisy points as outliers instead of forcing them into clusters.


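import numpy as np
from sklearn import preprocessing
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA

N_PRIN_COMPONENTS = 50  # keeps about 70% of the variance (Fig.2)

# Rows of `returns` are trading days, columns are stocks
pca = PCA(n_components=N_PRIN_COMPONENTS)
pca.fit(returns)

# Describe each stock by its standardised loadings on the principal components
X = np.hstack((pca.components_.T,))
X = preprocessing.StandardScaler().fit_transform(X)

# Density-based clustering; the label -1 marks noise (outliers)
clf = DBSCAN(eps=3, min_samples=3)
clf.fit(X)
labels = clf.labels_

n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
print("\nClusters: %d" % n_clusters_)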
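The choice of 50 components follows from the cumulative explained-variance curve. A minimal NumPy sketch of that rule, assuming `returns` is a days × stocks array, computes the explained-variance ratios from the singular values of the centred data (the same quantity scikit-learn reports as `explained_variance_ratio_`) and returns the smallest number of components whose cumulative share reaches a target such as 70%. The function name is hypothetical.

```python
import numpy as np

def components_for_variance(returns, target=0.70):
    """Smallest number of principal components whose cumulative explained variance >= target."""
    X = np.asarray(returns, dtype=float)
    X = X - X.mean(axis=0)                       # PCA works on centred data
    s = np.linalg.svd(X, compute_uv=False)       # singular values, largest first
    ratio = s ** 2 / np.sum(s ** 2)              # explained-variance ratio per component
    cumulative = np.cumsum(ratio)
    n = int(np.searchsorted(cumulative, target)) + 1
    return min(n, len(cumulative)), cumulative

# Toy data: four "stocks" driven by one common factor plus a little noise
rng = np.random.default_rng(0)
factor = rng.standard_normal(250)
returns = np.outer(factor, np.ones(4)) + 0.1 * rng.standard_normal((250, 4))
n, cumulative = components_for_variance(returns, target=0.70)
print(n)   # 1: the common factor explains almost all of the variance
```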

Figure 2: Cumulative variance diagram of PCA 
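The cluster-expansion procedure described for DBSCAN can be illustrated with a short from-scratch NumPy sketch. It is for illustration only (the pipeline above uses scikit-learn's `DBSCAN`). Following scikit-learn's convention, a point counts toward its own neighborhood, so a core point has at least `min_samples` points within distance `eps`; noise is labelled -1.

```python
import numpy as np

def dbscan(X, eps, min_samples):
    """Minimal DBSCAN: returns cluster labels, with -1 for noise."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    neighbours = [np.flatnonzero(dist[i] <= eps) for i in range(n)]  # includes i itself
    is_core = np.array([len(nb) >= min_samples for nb in neighbours])

    labels = np.full(n, -1)
    cluster_id = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue                     # already assigned, or not dense enough to seed a cluster
        labels[i] = cluster_id
        queue = list(neighbours[i])
        while queue:                     # expand through density-reachable points
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id
                if is_core[j]:           # only core points extend the cluster further
                    queue.extend(neighbours[j])
        cluster_id += 1
    return labels

X = np.array([[0, 0], [0, 0.5], [0.5, 0], [10, 10], [10, 10.5], [10.5, 10], [50, 50]])
print(dbscan(X, eps=1.0, min_samples=3))   # two clusters (0 and 1) and one noise point (-1)
```

Unlike the k-means sketch, the isolated point is not given a cluster of its own; it is simply marked as noise.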
Pair selection 

After the clustering is completed, we can select trading pairs in the clustering. For K-means, we select trading pairs according to the correlation coefficient between two stocks. If the correlation coefficient is greater than 0.2, they are considered to be a trading pair. 
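A minimal NumPy sketch of this selection rule, assuming `returns` is a days × stocks array and `labels` holds one cluster label per stock (the function name is hypothetical): every pair of stocks inside the same cluster is kept if the correlation of their returns exceeds 0.2.

```python
import numpy as np
from itertools import combinations

def correlated_pairs(returns, labels, names, threshold=0.2):
    """Pairs of stocks in the same cluster whose return correlation exceeds `threshold`."""
    returns = np.asarray(returns, dtype=float)    # days x stocks
    labels = np.asarray(labels)
    pairs = []
    for c in np.unique(labels):
        if c == -1:
            continue                              # skip noise, if the clustering marks any
        members = np.flatnonzero(labels == c)
        for i, j in combinations(members, 2):
            rho = np.corrcoef(returns[:, i], returns[:, j])[0, 1]
            if rho > threshold:
                pairs.append((names[i], names[j], float(rho)))
    return pairs

a = np.array([1.0, -1.0, 2.0, -2.0, 1.0, -1.0])
returns = np.column_stack([a, 2 * a, [1.0, 1.0, -1.0, -1.0, 0.0, 0.0]])
print(correlated_pairs(returns, labels=[0, 0, 0], names=['A', 'B', 'C']))
```

Here A and B move in lockstep and form a pair, while C is uncorrelated with both and is left out.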

For DBSCAN, we select trading pairs by cointegration, because the data used by DBSCAN has been dimensionality-reduced, and cointegration reflects the long-term relationship between two price series better than the correlation coefficient does.

The trading pair selection code for DBSCAN is as follows. 

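import numpy as np
from statsmodels.tsa.stattools import coint

def find_cointegrated_pairs(data, significance=0.05):
    """Test every pair of columns in `data` (a price DataFrame) for cointegration."""
    n = data.shape[1]
    score_matrix = np.zeros((n, n))
    pvalue_matrix = np.ones((n, n))
    keys = data.keys()
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            # Drop missing dates jointly so the two series stay aligned
            pair = data[[keys[i], keys[j]]].dropna(axis=0)
            S1 = pair[keys[i]]
            S2 = pair[keys[j]]
            score, pvalue, _ = coint(S1, S2)
            score_matrix[i, j] = score
            pvalue_matrix[i, j] = pvalue
            # Small p-value: reject the null hypothesis of no cointegration
            if pvalue < significance:
                pairs.append((keys[i], keys[j]))
    return score_matrix, pvalue_matrix, pairs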
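`coint` from statsmodels implements the augmented Engle-Granger two-step test. Its first step can be sketched with plain NumPy: regress S1 on S2 by ordinary least squares to get the hedge ratio, and take the residual as the spread. The second step, a unit-root (ADF) test on that spread that produces the p-value compared with `significance`, is left to statsmodels here. The function name is hypothetical.

```python
import numpy as np

def engle_granger_spread(s1, s2):
    """Step 1 of Engle-Granger: OLS fit s1 = alpha + beta * s2; returns (alpha, beta, spread)."""
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    A = np.column_stack([np.ones_like(s2), s2])    # intercept + regressor
    (alpha, beta), *_ = np.linalg.lstsq(A, s1, rcond=None)
    spread = s1 - (alpha + beta * s2)              # stationary if the pair is cointegrated
    return alpha, beta, spread

# Toy pair: S2 is a random walk, S1 tracks 1.5 * S2 plus a small stationary wiggle
rng = np.random.default_rng(42)
s2 = 50 + np.cumsum(rng.standard_normal(500))
s1 = 3 + 1.5 * s2 + 0.2 * rng.standard_normal(500)
alpha, beta, spread = engle_granger_spread(s1, s2)
print(round(beta, 2))
```

Both toy prices wander like random walks, yet the spread stays close to zero; that mean-reverting spread is what the z-score strategy later trades.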


Visualization plays an important part in learning more about the available data and identifying its main patterns. After clustering, we wanted a visual way to evaluate the result. We tried two methods: locally linear embedding (LLE) and t-SNE (t-distributed Stochastic Neighbor Embedding).

Locally linear embedding 

Locally linear embedding (LLE) seeks a lower-dimensional projection of the data that preserves distances within local neighborhoods. It can be thought of as a series of local principal component analyses that are globally compared to find the best non-linear embedding. Unlike decomposition methods such as PCA, it relies on nearest-neighbor relationships, which allows it to capture nonlinear structures that would otherwise be lost. We used a Python package to build the locally linear embedding visualization. The result is shown in Figure 3.
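The "local" step of LLE can be illustrated on a single point. The sketch below, written with NumPy rather than the package we used, solves for the weights that best reconstruct a point from its nearest neighbors, subject to the weights summing to one; the small ridge term mirrors the regularisation scikit-learn applies when the local Gram matrix is singular. A full LLE then looks for low-dimensional coordinates that preserve these weights, which is not shown.

```python
import numpy as np

def lle_weights(x, neighbours, reg=1e-3):
    """Weights w (summing to 1) minimising ||x - sum_j w_j * neighbours[j]||^2."""
    x = np.asarray(x, dtype=float)
    Z = np.asarray(neighbours, dtype=float) - x    # neighbours relative to x
    G = Z @ Z.T                                    # local Gram matrix (k x k)
    trace = np.trace(G)
    G = G + (reg * trace if trace > 0 else reg) * np.eye(len(G))
    w = np.linalg.solve(G, np.ones(len(G)))
    return w / w.sum()                             # enforce sum(w) = 1

# A point halfway between its two neighbours gets equal weights
print(lle_weights([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]]))
```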

Figure 3: Locally linear embedding result 

The generated plot is a 2-dimensional scatter plot. Points with the same color belong to the same cluster, and the label beside each point is the stock's name. A line between two points indicates a high correlation between them, so the two stocks can be treated as a pair.

Although the result gives us a 2-dimensional plot that helps us understand the distribution of the data, the method has some disadvantages. Most importantly, some points in the plot above are too dispersed while others are too crowded, which does not help us judge the clustering. We therefore rejected the locally linear embedding visualization and tried other methods.


Because locally linear embedding performed poorly, we tried t-SNE. t-distributed Stochastic Neighbor Embedding (t-SNE) is a statistical method for visualizing high-dimensional data by giving each data point a location in a 2D or 3D map. It is a nonlinear dimensionality reduction technique well suited to embedding high-dimensional data in a low-dimensional space of two or three dimensions. Specifically, it models each high-dimensional object by a 2D or 3D point such that, with high probability, similar objects are modeled by nearby points and dissimilar objects by distant points. We use the following code to implement t-SNE.

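import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
from sklearn.manifold import TSNE

# X: standardised returns (days x stocks) from the first snippet, so X.T has
# one row per stock; edge_model and labels come from the clustering steps above
r_tsne = TSNE(learning_rate=1000, perplexity=25, random_state=1337).fit_transform(X.T)
embedding = r_tsne.T

plt.figure(1, facecolor='w', figsize=(10, 8))
ax = plt.axes([0., 0., 1., 1.])

# Partial correlations from the sparse inverse covariance (precision) matrix
partial_correlations = edge_model.precision_.copy()
d = 1 / np.sqrt(np.diag(partial_correlations))
partial_correlations *= d
partial_correlations *= d[:, np.newaxis]
non_zero = (np.abs(np.triu(partial_correlations, k=1)) > 0.03)

# One point per stock, coloured by cluster label
plt.scatter(embedding[0], embedding[1], s=100 * d ** 2, c=labels)

# Draw an edge between every pair of stocks with a non-negligible partial correlation
start_idx, end_idx = np.where(non_zero)
segments = [[embedding[:, start], embedding[:, stop]]
            for start, stop in zip(start_idx, end_idx)]
values = np.abs(partial_correlations[non_zero])
lc = LineCollection(segments, norm=plt.Normalize(0, .7 * values.max()))
lc.set_array(values)
lc.set_linewidths(15 * values)
ax.add_collection(lc)

# Pad the axis limits around the embedding (np.ptp = max - min)
plt.xlim(embedding[0].min() - .15 * np.ptp(embedding[0]),
         embedding[0].max() + .10 * np.ptp(embedding[0]))
plt.ylim(embedding[1].min() - .03 * np.ptp(embedding[1]),
         embedding[1].max() + .03 * np.ptp(embedding[1]))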
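The `TSNE` call above hides the core idea of the method: how similar two points are considered to be in the 2D map. A minimal NumPy sketch of that one quantity, the Student-t (one degree of freedom) affinity q_ij normalised over all pairs, is below. It covers only this piece: t-SNE also builds Gaussian affinities p_ij in the original space, with a bandwidth tuned to the chosen perplexity, and moves the map points to minimise the KL divergence between the two distributions.

```python
import numpy as np

def tsne_q(Y):
    """Low-dimensional t-SNE affinities: q_ij proportional to (1 + ||y_i - y_j||^2)^-1."""
    Y = np.asarray(Y, dtype=float)
    sq_dist = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    num = 1.0 / (1.0 + sq_dist)
    np.fill_diagonal(num, 0.0)          # a point is not its own neighbour
    return num / num.sum()              # normalise over all pairs i != j

Q = tsne_q([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
print(Q[0, 1] > Q[0, 2])   # True: nearby map points get a much larger affinity
```

The heavy tail of the Student-t kernel is what lets t-SNE push dissimilar stocks far apart, which is why the clusters in its plots separate more clearly than in the LLE plot.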

We visualized the results of K-means and DBSCAN separately; they are shown in Fig.4 and Fig.5.

Figure 4: Visualization for K-means 
Figure 5: Visualization for DBSCAN 

In the t-SNE results, many of the K-means classes are mixed together and related to each other. This is undesirable: we want stocks in the same class to be strongly correlated and different classes to be weakly correlated.

DBSCAN performs comparatively well: the finally selected trading pairs are clearly separated from one another, and stocks of the same category lie close together.

Algorithm evaluation 

Based on the results above, we consider PCA+DBSCAN the better method, and we develop our trading strategy on top of it.

Pair trading strategy 

Bollinger Bands 

Bollinger bands are a popular indicator used in many trading strategies. They normally consist of two bands, a lower band and an upper band, calculated as follows:

$$\mathrm{BOLU} = \mathrm{MA}(TP, n) + m \cdot \sigma[TP, n]$$

$$\mathrm{BOLD} = \mathrm{MA}(TP, n) - m \cdot \sigma[TP, n]$$


where:

BOLU = Upper Bollinger Band

BOLD = Lower Bollinger Band

MA = Moving Average

TP = Typical Price, (High + Low + Close) / 3

n = Number of days in the smoothing period

m = Number of standard deviations

σ[TP, n] = Standard deviation of TP over the last n periods

The two key inputs are the length n of the moving average and the number of standard deviations m. They are chosen according to the trader's preference; one of the most commonly used settings is bands at 2 standard deviations around a 20-day moving average of the stock price.
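A minimal NumPy sketch of the band calculation, assuming a one-dimensional array of prices stands in for the typical price TP and that the population standard deviation (ddof=0) is used; conventions differ between charting tools, so both choices are assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def bollinger_bands(prices, n=20, m=2.0):
    """Return (middle, upper, lower) bands; the first n-1 values are NaN."""
    p = np.asarray(prices, dtype=float)
    middle = np.full(p.shape, np.nan)
    sd = np.full(p.shape, np.nan)
    windows = sliding_window_view(p, n)       # one row per n-day window
    middle[n - 1:] = windows.mean(axis=1)     # n-day simple moving average
    sd[n - 1:] = windows.std(axis=1)          # population std (ddof=0), an assumption
    return middle, middle + m * sd, middle - m * sd

middle, upper, lower = bollinger_bands([1, 2, 3, 4], n=3, m=2.0)
print(middle)
print(upper)
```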

Trading strategy utilizing Bollinger Bands 

Bollinger bands give investors a sense of whether the market is overbought or oversold, i.e., whether a stock is trading above or below its intrinsic or fair value. To use Bollinger bands as an indicator, we assume the market is mean-reverting: when the stock price deviates substantially from its average, we expect it to revert to the mean in the long term.

Using mean reversion as our fundamental assumption, we can build a trading strategy with Bollinger bands. We calculate the simple moving average of a stock's price and derive its upper and lower Bollinger bands at, say, 2 standard deviations. We then watch the current stock price. When the price reaches the upper band, we treat the stock as overbought because it has deviated too far from its historical average, so we short it and expect the price to fall back toward the mean. Conversely, when the price hits the lower band, it has fallen a lot; we go long and hold the position until the price rises back to the mean. When the price fluctuates around its mean, we close out all positions and stop trading until it hits either band again.

Above is an example of using Bollinger bands to decide buy and sell points. The "sell zone" is where the stock price hits the upper blue band, and we enter the "buy zone" when the price reaches the lower blue band. We cease all trading activity while the price hits neither band.
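The entry and exit rule above can be expressed as a small state machine. In this sketch, `ma` and `sd` are the moving average and standard deviation behind the bands. "Fluctuates around its mean" is interpreted, as an assumption, as the price being within `exit_z` standard deviations of the mean; otherwise the current position is held. The returned values are +1 (long), -1 (short) and 0 (flat).

```python
import numpy as np

def bollinger_positions(prices, ma, sd, m=2.0, exit_z=0.5):
    """Position per day: -1 short at/above the upper band, +1 long at/below the lower band,
    0 once the price is back within exit_z standard deviations of the mean."""
    z = (np.asarray(prices, dtype=float) - ma) / sd   # distance from the mean in std units
    positions = np.zeros(len(z), dtype=int)
    current = 0
    for t, zt in enumerate(z):
        if zt >= m:
            current = -1          # overbought: short and wait for the price to fall
        elif zt <= -m:
            current = 1           # oversold: go long and wait for the price to rise
        elif abs(zt) < exit_z:
            current = 0           # back near the mean: close out
        positions[t] = current    # otherwise keep the existing position (a NaN z also keeps it)
    return positions

prices = [10.0, 12.5, 11.0, 10.2, 7.9, 9.0, 10.3]
print(bollinger_positions(prices, ma=10.0, sd=1.0))
```

Between the band and the exit zone the position is simply held, which is what keeps the strategy from flipping back and forth on small moves.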

Price Ratio Z-Score Bollinger Band 

Building on how Bollinger bands work, we designed our own indicator, the Price Ratio Z-Score Bollinger Band. Instead of the stock price, we use the price ratio S1/S2 and its simple moving average as input. We then compute its z-score, $Z_i = (X_i - \bar{X}) / \sigma_X$, where $X_i$ is the price ratio at time i, $\bar{X}$ its mean and $\sigma_X$ its standard deviation. The z-score gives the number of standard deviations a data point lies from the mean, so it plays the role of the "m" parameter in the original Bollinger band calculation.
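A minimal NumPy sketch of this indicator. With `window=None` it applies the formula above over the whole series, using the population standard deviation; the rolling option, which uses a trailing n-day mean and standard deviation like the original bands, is an assumption about how the moving average enters. The function name is hypothetical.

```python
import numpy as np

def ratio_zscore(s1, s2, window=None):
    """Z-score of the price ratio S1/S2.
    window=None: Z_i = (X_i - mean(X)) / std(X) over the whole series.
    window=n:    the same with the trailing n-day mean and std (NaN for the first n-1 days)."""
    ratio = np.asarray(s1, dtype=float) / np.asarray(s2, dtype=float)
    if window is None:
        return (ratio - ratio.mean()) / ratio.std()
    z = np.full(ratio.shape, np.nan)
    for t in range(window - 1, len(ratio)):
        w = ratio[t - window + 1:t + 1]
        z[t] = (ratio[t] - w.mean()) / w.std()
    return z

print(ratio_zscore([2.0, 4.0, 6.0], [1.0, 1.0, 1.0]))   # about [-1.22, 0, 1.22]
```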

Next, we fix a specific trading rule. We use Z = ±1 as the buy/sell points and the interval Z ∈ (−0.5, +0.5) as the time to close. Specifically, when the z-score hits the upper band at +1 sigma, we short one share of S1 and buy the same dollar value of S2. At this point S1 is outperforming S2, so, by the mean reversion assumption, we expect either the price of S1 to fall or the price of S2 to rise back toward the average. Conversely, when the z-score hits the lower band at −1 sigma, we do the opposite: we buy one share of S1 and sell the same dollar value of S2, because we expect S1 to rise and S2 to fall. When the z-score is inside (−0.5, 0.5), we close out all open positions and stop trading until the z-score hits ±1 again. The detailed script of our trading strategy is below.

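import numpy as np

def run_pair_strategy(S1, S2, ratios, zscore,
                      zthreshhold_buysell=1.0, zthreshhold_clear=0.5):
    # Work on plain arrays so that S1[i] means "the i-th trading day"
    S1, S2, ratios, zscore = (np.asarray(a, dtype=float) for a in (S1, S2, ratios, zscore))
    money = 0.0
    countS1 = countS2 = 0.0
    maximum_S1_needed = maximum_S2_needed = 0.0
    gain_count = loss_count = 0
    gain_amount = loss_amount = 0.0
    gold_marker_gain = np.full(len(ratios), np.nan)   # z-scores of profitable closes
    gold_marker_loss = np.full(len(ratios), np.nan)   # z-scores of losing closes

    for i in range(len(ratios)):
        if zscore[i] > zthreshhold_buysell:
            # Ratio too high: short one share of S1, buy ratios[i] shares of S2
            tmp = S1[i] - S2[i] * ratios[i]
            money += tmp
            countS1 -= 1
            countS2 += ratios[i]
        elif zscore[i] < -zthreshhold_buysell:
            # Ratio too low: buy one share of S1, short ratios[i] shares of S2
            tmp = S2[i] * ratios[i] - S1[i]
            money += tmp
            countS1 += 1
            countS2 -= ratios[i]
        elif abs(zscore[i]) < zthreshhold_clear:
            # Ratio back near its mean: value of closing every open position
            tmp = countS1 * S1[i] + S2[i] * countS2
            if maximum_S1_needed < countS1:
                maximum_S1_needed = countS1
            if maximum_S2_needed < countS2:
                maximum_S2_needed = countS2
            if tmp > 0:
                gain_count += 1
                gain_amount += tmp
                gold_marker_gain[i] = zscore[i]
            elif tmp < 0:
                loss_count += 1
                loss_amount += tmp
                gold_marker_loss[i] = zscore[i]
            money += tmp
            countS1 = 0
            countS2 = 0

    # Close whatever is still open on the last day
    tmp = countS1 * S1[-1] + S2[-1] * countS2
    if tmp > 0:
        gain_count += 1
        gain_amount += tmp
    elif tmp < 0:
        loss_count += 1
        loss_amount += tmp
    money += tmp

    return {'money': money, 'gain_count': gain_count, 'gain_amount': gain_amount,
            'loss_count': loss_count, 'loss_amount': loss_amount,
            'maximum_S1_needed': maximum_S1_needed, 'maximum_S2_needed': maximum_S2_needed,
            'gold_marker_gain': gold_marker_gain, 'gold_marker_loss': gold_marker_loss}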


We used the S&P 500 as our stock pool and scraped historical data for all 500 stocks from 2005-01-01 to 2021-01-01. The first 70% of the dataset, roughly 2005 to 2016, was used to train the model to pick correlated stock pairs; in total we picked 128 pairs from the 408 stocks drawn from the historical S&P 500 list. We then tested the model on the remaining 30%, from 2016 to 2021, to check the strategy's performance, and it generated a total return of approximately $40,000.
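The important property of this split is that it is chronological: pairs are chosen on earlier data and evaluated only on later data, which avoids look-ahead bias. A sketch of a 70/30 time-ordered split, using calendar years as a stand-in for the daily rows (the helper name is hypothetical):

```python
def chronological_split(rows, train_frac=0.7):
    """Split time-ordered rows into (train, test) without shuffling."""
    cut = round(len(rows) * train_frac)
    return rows[:cut], rows[cut:]

years = list(range(2005, 2021))          # 16 calendar years of data
train, test = chronological_split(years)
print(train[-1], test[0])                # 2015 2016
```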

Below is an example plot of the automated trading activity for a single pair. It shows the z-score of the pair's price ratio from 2016-01 to 2021-01. The red and green dashed lines mark the upper and lower bands at 1 sigma; when the z-score reaches a band, we open a long or short position. The triangles across the plot mark the times when the z-score falls into the "no man's land" interval, at which point we close out all positions and remain inactive.

Below is another example, showing the trading activity of a pair of correlated stocks. Red triangles mark selling points and green triangles buying points. S1 and S2 clearly execute opposite trades at the same time.


In this model, we applied different clustering and visualization methods to find suitable pairs of stocks for pair trading. After comparing them, we first used PCA to reduce the stock data to 50 dimensions, then clustered it with DBSCAN and visualized the result with t-SNE. The pairs were found with a coint threshold value of 0.3, and a total of 128 pairs were picked. Finally, we built our trading strategy on the z-score of these stock pairs, which generated a total return of approximately $40,000 with zero initial capital over the 2016 to 2021 test period.

The biggest advantage of our strategy is that hedging happens automatically: we sell the overvalued security and buy the undervalued one, under the assumption that the temporary difference will converge, which limits the chances of loss. This lets us mitigate potential losses and risk during trading.

However, much of the strategy's profit comes from its high trading frequency; once transaction fees are taken into account, the profit would be noticeably lower. The strategy also depends heavily on the pair of securities having a strong statistical relationship, and historical correlation does not always persist. To improve the model, we will put more effort into identifying and testing the pairs' statistical relationships and will update the pairing process at a suitable interval.
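The effect of fees can be estimated with simple arithmetic. The sketch assumes a proportional cost per transaction and counts four transactions per round trip (opening and closing both legs); the figures in the example are hypothetical, not from our backtest.

```python
def net_pnl(gross_pnl, n_round_trips, notional_per_leg, cost_rate):
    """Gross P&L minus proportional costs; each round trip trades both legs twice (open + close)."""
    costs = n_round_trips * 4 * notional_per_leg * cost_rate
    return gross_pnl - costs

# Hypothetical numbers: 500 round trips, $1,000 per leg, 10 basis points per transaction
print(net_pnl(gross_pnl=10_000.0, n_round_trips=500, notional_per_leg=1_000.0, cost_rate=0.001))
# 8000.0: $2,000 of costs removes a fifth of the gross profit
```

Because costs scale with the number of round trips while the gain per trade is small, a high-frequency pair strategy is especially sensitive to the cost rate.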

