Trading Agricultural Commodities : Policy Gradient Learners in Trading Agricultural ETFs
This paper analyzes the use of Reinforcement Learning in trading Agricultural ETFs. The first section of the paper examines the performance of a naive trading strategy, which serves as a baseline for a later comparison with Reinforcement Learning strategies.
Here, returns were very volatile, with agents both severely under- and over-performing index benchmarks. The second part of this paper examines the use of the Monte Carlo Policy Gradient algorithm in trading.
Finally, results of performing Dynamic Time Warping are presented for side-by-side comparisons between naive trading and trading using reinforcement learning algorithms. Overall, reinforcement learning produces more volatile results but a positive average Sharpe ratio, whereas naive trading produces negative average returns.
ETFs and Price History
There are seven Agricultural ETFs that are examined in this paper: Teucrium Corn Fund (CORN), Teucrium Soybean (SOYB), iPath Series B Bloomberg Coffee Subindex (JO), iPath Bloomberg Cocoa Subindex (NIB), Teucrium Sugar (CANE), iPath Series B Bloomberg Livestock Subindex (COW), and iPath Series B Bloomberg Sugar (SGG).
Figure 1: Price history of seven agricultural ETFs examined in the paper. The vertical line represents Jan 30, 2020, when the World Health Organization declared COVID-19 a public health emergency of international concern.
The coronavirus outbreak prompted many stores and restaurants across the country to shut down, and lower demand for produce drove the prices of many agricultural ETFs to their lowest in a decade. From the historical price graph of agricultural ETFs, we can see that COW, SGG, JO and NIB have both higher and more volatile prices. Since the onset of coronavirus, this volatility has notably increased. The national lockdown caused by the coronavirus pandemic has greatly affected supply and demand in the US food industry. However, the prices of CANE, SOYB and CORN have been relatively stable. SGG and COW exhibited the biggest percentage drops in price since the national lockdown; sugar demand dropped for the first time in four decades, and thousands of meat processing plants closed due to rapidly falling demand associated with COVID-19, global lockdowns and safety risks. The only exception seems to be JO, which reflects the returns that are potentially available through an unleveraged investment in the futures contracts on coffee. There has been steady consumer demand for coffee despite the pandemic, which helped to lift the price of the commodity.
Naive Trading Strategy and Results
This naive trading strategy utilizes a simple moving average (SMA) to determine whether the ETF is relatively over- or under-valued. Henceforth, “agent” will refer to the computer assuming/selling positions. The durations of the SMA for each of the ETFs for this paper were determined through empirical observation and are listed in the section below.
Every day after the initial training period (the duration of the SMA), the agent examines the pseudo-z-score (calculated as the number of standard deviations of the SMA that the price is above or below the SMA). If the pseudo-z-score exceeds a set “buy threshold”, then the agent assumes a short position. If it falls below a negative “buy threshold” then, by the same logic, a long position is taken. This is due to the assumption of mean reversion.
If the agent holds positions, whenever the price of the ETF falls/rises back to within a threshold of the SMA (the “sell threshold”), positions are liquidated, as mean reversion will, theoretically, have occurred. In addition, once the proportion of the backtesting period that has elapsed exceeds a certain threshold (the “safe threshold”), positions are liquidated whenever a profit can be turned, regardless of the ETF's price relative to the SMA. At the end of the backtesting period, all positions are liquidated.
The buy/sell thresholds are 1.5 and 0.1, respectively.
The safe threshold is 0.6.
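The rules above can be sketched in a few lines of Python. This is a minimal illustration rather than the paper's implementation: the function names are hypothetical, and it assumes that the standard deviation is taken over the prices inside the SMA window and that the window includes the current day's price.

```python
import statistics

BUY_THRESHOLD = 1.5    # from the text
SELL_THRESHOLD = 0.1   # from the text
SAFE_THRESHOLD = 0.6   # from the text

def pseudo_z_score(prices, window):
    """Distance of the latest price from its SMA, in window standard deviations.

    Assumption: the deviation is the population standard deviation of the
    prices inside the SMA window.
    """
    recent = prices[-window:]
    sma = statistics.fmean(recent)
    sd = statistics.pstdev(recent)
    if sd == 0:
        return 0.0
    return (prices[-1] - sma) / sd

def naive_signal(z, position):
    """position: 0 = flat, +1 = long, -1 = short."""
    if position == 0:
        if z > BUY_THRESHOLD:
            return "short"      # price looks over-valued, bet on reversion
        if z < -BUY_THRESHOLD:
            return "long"       # price looks under-valued
        return "hold"
    if abs(z) <= SELL_THRESHOLD:
        return "liquidate"      # price is back near the SMA
    return "hold"

def safe_exit(elapsed_fraction, unrealized_pnl):
    """Late in the backtest, take any available profit."""
    return elapsed_fraction > SAFE_THRESHOLD and unrealized_pnl > 0
```

For example, nine closes at 10 followed by a close of 13 give a pseudo-z-score of about 3 over a 10-day window, which exceeds the buy threshold and would trigger a short position.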
The periods in which the simple moving average was measured are listed in the table below.
The results of this trading strategy are displayed in the tables below.
These returns serve as a control to examine whether Reinforcement Learning has the potential to improve upon naive trading strategies. For instance, liquidation at the end of the backtesting period regardless of profits is neither a truly viable nor a profitable strategy. However, it allows for an exploration of whether or not RL strategies can learn to avoid long holding periods, especially considering the volatility of agricultural ETF prices.
Reinforcement Learning Strategy and Results
The reinforcement learning strategy is based on the Monte Carlo Policy Gradient (REINFORCE) Algorithm:
[IMAGE OF REINFORCE GOES HERE]
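For orientation, the commonly cited textbook form of the REINFORCE update is reproduced here. The symbols are the conventional ones and are assumptions rather than the notation of the paper's own figure: $\theta$ are the policy parameters, $\alpha$ the learning rate, $\gamma$ the discount factor, $R_k$ the reward at step $k$, and $G_t$ the discounted return from step $t$ of an episode of length $T$.

```latex
G_t = \sum_{k=t+1}^{T} \gamma^{\,k-t-1} R_k,
\qquad
\theta \leftarrow \theta + \alpha \, \gamma^{t} \, G_t \, \nabla_\theta \ln \pi_\theta(A_t \mid S_t)
```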
There are two distinct states in an RL trading environment: one in which the agent holds positions and the other in which the agent does not. Hence, two different neural networks were trained: one to decide which actions to take if the agent held positions and the other to decide which actions to take if the agent did not. The architectures of the neural networks were as follows.
Holding neural network:
3 Hidden Layers (128 neurons, 64 neurons, and 32 neurons)
2-Dimensional Outputs, using Categorical Cross Entropy Loss
Non-holding neural network:
3 Hidden Layers (128 neurons, 64 neurons, and 32 neurons)
3-Dimensional Outputs, using Categorical Cross Entropy Loss
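The two architectures can be sketched as plain fully connected networks. The layer sizes and output dimensions come from the list above; everything else (ReLU hidden activations, a softmax output, the weight initialization, the ordering of the actions, and the helper names) is an assumption made for illustration.

```python
import math
import random

def init_mlp(sizes, seed=0):
    """Random weights for a fully connected network with the given layer sizes."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers

def forward(layers, x):
    """ReLU on hidden layers, softmax on the output layer (both assumed)."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    peak = max(x)                       # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

N_INPUTS = 6  # the text states six daily inputs

# Holding network: 2 outputs (assumed order: liquidate, hold)
holding_net = init_mlp([N_INPUTS, 128, 64, 32, 2], seed=1)
# Non-holding network: 3 outputs (assumed order: long, short, do nothing)
non_holding_net = init_mlp([N_INPUTS, 128, 64, 32, 3], seed=2)
```

Calling `forward(non_holding_net, state)` on a six-element state returns three action probabilities that sum to one, which is the form of output that categorical cross-entropy expects.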
Every epoch iterated over all 2 months of training. Every day, six inputs were found:
- The current pseudo-z-score, calculated as the number of standard deviations the price was from the SMA, with the SMA defined as in the naive trading strategy
- Opening price
- Closing price
- High price
- Low price
During training, the agent took an action every day depending on whether it was holding a position or not. If the agent was not holding a position, then the agent either randomly assumed a long/short position or did nothing. If the agent held a position, then the agent randomly either sold or held it.
Every state (with state being defined as the set of 6 inputs) was rewarded using a discounted Monte Carlo rollout at the end of the epoch, with the original reward being the PnL of the transaction. For instance, if a position was bought at day 1 and sold at day 5 with an ROI of 0.1%, then the original (non-discounted) rewards for all of days 1 to 5 would have been constant at 0.1%.
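The reward assignment can be sketched as follows. This is an illustrative reading of the description, not the paper's code: the function names are hypothetical, days outside any trade are assumed to receive a reward of zero, and the discount factor of 0.99 is a placeholder because the paper does not state one.

```python
def raw_rewards(n_days, trades):
    """Spread each trade's ROI over every day the position was open.

    trades: list of (open_day, close_day, roi), with 0-indexed inclusive days.
    Assumption: days with no open position receive a reward of 0.
    """
    rewards = [0.0] * n_days
    for open_day, close_day, roi in trades:
        for day in range(open_day, close_day + 1):
            rewards[day] = roi
    return rewards

def discounted_returns(rewards, gamma=0.99):
    """Monte Carlo return G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# The example from the text: bought on day 1, sold on day 5, ROI of 0.1%.
example = raw_rewards(7, [(0, 4, 0.001)])
```

Here `example` holds 0.001 for the five days of the trade and 0 for the remaining two, and `discounted_returns(example)` converts those raw rewards into the returns used to weight each day's action.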
Every day in the testing period, the agent took actions depending on whether positions were being held or not. If positions were not held, then the inputs were fed into the non-holding neural network, and the agent either assumed a long/short position or did nothing. If positions were held, then inputs were fed into the holding neural network and the agent either liquidated its position or held it.
Additionally, each position was given a lifespan of 10 days, at which point it was forcefully liquidated to ensure that the agent did not fall prey to bear/bull markets.
Finally, after 30 days, no new positions were assumed; the only remaining action was to liquidate an existing position, if there was one. At the end of the backtesting period, any positions still held were forcefully liquidated.
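The position lifecycle constraints can be expressed as a small helper that returns the actions available on a given test day. The 10-day lifespan and the 30-day cutoff come from the text; the function name, the 1-indexed day convention, the reading of "after 30 days" as day 31 onward, and the choice not to open a position on the final day are assumptions.

```python
MAX_HOLDING_DAYS = 10         # position lifespan from the text
NO_NEW_POSITIONS_AFTER = 30   # no new positions after this day (from the text)

def allowed_actions(day, last_day, holding, days_held=0):
    """Actions available on a test day (days are 1-indexed, an assumption)."""
    if holding:
        if days_held >= MAX_HOLDING_DAYS or day == last_day:
            return ["liquidate"]          # forced liquidation
        return ["liquidate", "hold"]
    if day > NO_NEW_POSITIONS_AFTER or day == last_day:
        return ["do_nothing"]             # too late to open a position
    return ["long", "short", "do_nothing"]
```

The network's output would then be restricted to the returned actions, so a forced liquidation overrides whatever the holding network prefers.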
Dynamic Time Warping Analysis
As a preface to this section, it must be noted that Reinforcement Learning is a convergence-based ML strategy. Given the large computational power required to establish neural network convergence on such large datasets, it was infeasible to strive for convergence when conducting research for this paper. Nevertheless, a general trend might still be established, since, by observation, training for more than 100 epochs does not modify returns significantly.
It is evident that the difference in performance between using the Monte Carlo Policy Gradient and Naive SMA trading on Agricultural ETFs is inconsistent. However, if the backtesting periods are separated into two distinct groups: one where training differed significantly from testing, and one where they were relatively similar, it is clear that RL performance can be attributed to the agent being exposed to scenarios during training that also appeared during testing.
Using Dynamic Time Warping (DTW), we can quantify and test this hypothesis. Dynamic Time Warping is a technique used to find an optimal mapping between two time series of equal or different lengths, by minimizing the sum of Euclidean distances between the points that are mapped to each other.
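A minimal sketch of the classic dynamic-programming formulation of DTW for two one-dimensional price series is given below. It is not necessarily the implementation used for the paper; for scalar prices the Euclidean distance between two points reduces to the absolute difference, and no windowing constraint or normalization is applied (both assumptions).

```python
def dtw_distance(a, b):
    """Minimum summed point-to-point distance over all monotonic alignments."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])       # 1-D Euclidean distance
            cost[i][j] = d + min(cost[i - 1][j],      # a advances alone
                                 cost[i][j - 1],      # b advances alone
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

Because the alignment may repeat points, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is 0 even though the series differ in length, whereas two series at different price levels, such as `[0, 0]` and `[1, 1]`, accumulate a distance of 2.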
The Euclidean sums are displayed in the tables below.
Although it is difficult to establish a direct linear relationship between the sum of Euclidean distances used to perform a DTW on the training/testing time series and the portfolio performance, examination of each of these tables suggests a general trend between the two. The greater the DTW distance, the less successfully the agent was able to apply strategies learned in the training environment to the testing environment.
This is a fairly intuitive concept. The more the testing environment resembles the training environment, the better the agent will perform, with the caveat that overfitting must be prevented. Using traditional strategies to avoid overfitting such as cross-validation and regularization, as well as ensuring the training data contains a diverse set of price fluctuations could be key to better agent performance. This leads to several possible steps for further exploration.
Sharpe Ratio Analysis
| | Naive Sharpe Ratio | RL Sharpe Ratio |
| Average Sharpe Ratio | -0.0029556667 | 0.01209644444 |
Overall, even though both strategies exhibit low Sharpe ratios, the Reinforcement Learning trading strategy has relatively higher and more volatile portfolio Sharpe ratios than the naïve trading strategy. The risk-adjusted return generated by the Reinforcement Learning trading strategy is much higher before January 2020. Specifically, from September 2019 to January 2020, the Reinforcement Learning strategy performs almost 8 times better than the naïve trading strategy. After January 2020, the Sharpe ratios generated by the Reinforcement Learning strategy are much lower due to its high portfolio standard deviations, especially in March, when most of the agricultural ETFs exhibited their biggest percentage drops in price as a result of the coronavirus pandemic.
Before the coronavirus fueled the recession in the market, the Sharpe ratio of the Reinforcement Learning strategy was around 0.059 on average, and it dropped to -0.026 after January 2020. For the naïve strategy, by contrast, the difference between the average Sharpe ratios before and after the coronavirus outbreak is less than 0.01%. With restaurants and schools shuttered during the national lockdown, prices of and demand for essential agricultural products have fallen. Over the past few years, farmers had already endured a slew of financial hardships from the U.S.-China trade war. As the coronavirus pandemic disrupted supply chains across the country, farmers were left with an abundance of food that they could not sell. As a result, the prices of Agricultural ETFs dropped rapidly after January 2020, and by April 2020, many of the ETFs examined in the portfolio had reached an all-time low.
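For reference, a Sharpe ratio of the kind compared above can be computed from a series of per-period returns as sketched below. The paper does not state its risk-free rate, return frequency, or annualization, so the zero risk-free rate, the use of the sample standard deviation, and the absence of annualization are all assumptions.

```python
import statistics

def sharpe_ratio(returns, risk_free_rate=0.0):
    """Mean excess return divided by the sample standard deviation of excess returns.

    Assumptions: returns are per-period, the risk-free rate is per-period,
    and no annualization factor is applied.
    """
    excess = [r - risk_free_rate for r in returns]
    sd = statistics.stdev(excess)   # requires at least two observations
    if sd == 0:
        return float("nan")         # undefined when returns never vary
    return statistics.fmean(excess) / sd
```

A strategy with a higher mean return can still have a lower Sharpe ratio if its returns are much more dispersed, which is the effect described for the Reinforcement Learning strategy after January 2020.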
Economic Analysis of Trading Performance
Figures 2-8: Naive vs RL ROIs
Figure 9: Price movement of CORN, SOYB, NIB, and JO
Figure 10: Price movement of COW, SGG, and CANE
The graphs provided illustrate the accuracy of both the naive trading model and the Reinforcement Learning (RL) in comparison to the actual pricing of the ETFs. In general, both models performed better in the first two months, with naive trading on average doing better than RL. Both models performed the best for COW and CANE and most poorly for JO and NIB. The two models also have the greatest variation in output for February and March. A brief economic analysis of market volatility in agriculture and the impact of fear associated with the SARS-CoV-2, or COVID-19, pandemic may help to contextualize the inconsistencies.
In December and January, market sentiment can explain the low performance of Reinforcement Learning. The U.S. and China reached a preliminary trade deal in mid-December, spurring some optimism in investors. In this deal, Beijing pledged to buy an additional $200B in goods/services, including agriculture, over the next two years, but doubts soon grew about whether China would uphold the deal. In the graphs, we see that the output for RL increased, but that of naive trading decreased for NIB during this period, and the outputs for CORN differed in magnitude of decrease. Naive trading in both of these cases performed better than RL. The initial increase in investments followed by the drop due to skepticism likely contributed to the already volatile agricultural ETF prices, limiting the performance of RL.
Another likely contributing factor to the model output accuracy for CORN is the declining price of ethanol in the U.S. Gulf around this time (with the exception of early January). Because most ethanol in the U.S. is produced from corn, the activity of ethanol production and sales plays into the price of corn. On January 1st, crude oil prices went down 60% and corn by 10-15%. On January 9th, ethanol production fell to its lowest level in a month, 1.062M barrels/day. While we should expect to see decreases in the performance of CORN in both the naive trading and RL model outputs, CORN decreases only in naive trading from December to January.
As demonstrated by the variability of output between the two models, volatility and uncertainty set in during January 2020. The quick emergence of coronavirus instilled fear and perhaps even hysteria. That month, the U.S. learned of its first case, and cases in China continued to grow. Investors grew worried that the virus would disrupt the global economy, resulting in a deepened stock sell-off on the 31st of January. Note, however, that the increase in NIB shown by naive trading is accurate. The decrease shown for SGG, however, should be an increase.
February, on the other hand, saw rising ROI, or only a small decrease (CORN and SOYB), in almost every category, with the exception of NIB and SGG. However, the output of naive trading was accurate in increasing for NIB, and SGG should also be increasing. This positive turn is likely a result of the panic-buying that began at the end of the month and continued into March. Basic foods and meats were stockpiled, resulting in price increases. Wholesale choice beef, for example, experienced a 25% increase in price despite the cost of production falling. This event is mirrored in the steep increase in COW. These prices were only temporary, however, falling as calm set in and panic-buying slowed.
It is important to note that the extreme volatility, or overreactions, of the market can take weeks or months to move past. In the case of coronavirus, it seems that the overreactions are a result of a disconnect between commodity prices and the underlying market. As in the example above, beef prices were increasing while shelves were empty because of stockpiling. The underlying market tells a different story: the stock market crash began on February 20th, with the Dow Jones Industrial Average dropping from 29219.98 to 18591.93 by March 23rd.
Price volatility in March likely served as the impetus for the great variability in model outputs for the ETFs, especially for JO. In the Naive ROI graph, there is a steep increase in JO’s percentage of ROI that is not seen in the other graphs. Mid-February is the beginning of price volatility for this ETF, which returned to normal trends only in May. Mirroring the increase in the positive percent change of the ETF, the naive trading model is accurate in showing a steep increase in the ROI of JO in March. However, the steep decrease in the RL model is inaccurate and likely a result of low performance during high volatility.
CORN also had major discrepancies between the outputs of naive trading and RL models. In this case, however, naive trading decreases from February through to April, but RL has a steep decrease in March and a steep increase thereafter. As demonstrated in the actual prices of the ETF, naive trading performed better than the RL model as a consistent decrease is accurate. In reality, the price percent change of the ETF varied from about -7.0% to -24.0%.
Figure 11: COW ticker history
In general, the model outputs of the ETFs in April and May were more comparable and not as marked by coronavirus uncertainty. NIB, JO, and CANE displayed the most variability in model output during these two months.
Some increase in ROI in those two months is expected, as the Consumer Price Index (CPI) for both food-at-home and food-away-from-home consumption increased. The CPI for food-at-home consumption increased 2.6% from March to April, and 1.0% from April to May (compared to only 0.5% from February to March), with food-at-home prices up 2.4% from last May. Increases for food-away-from-home consumption were 0.1% from March to April and 0.4% from April to May (compared to 0.2% from February to March), with food-away-from-home prices up 2.1% from last May. Because restaurants continue to face difficulty in opening due to social distancing guidelines, food-away-from-home consumption pales in comparison to food eaten at home.
Figure 12: 12-month CPI change for All Urban Consumers
Still, variability in consumption continues to affect the performance of RL compared to naive trading. As outlined in the graph above, we are starting to see positive trends in CPI beginning in May. (Note that all urban consumers represent about 93% of the U.S. population.) February to March and March to April experienced the greatest drops in CPI, helping to explain the drops in ROI in March in most ETFs in all four models. The slowed decrease in CPI likely also contributed to the increase in nearly all ETFs in April.
Because much of the period over which we tested naive trading and RL was marked by extreme market volatility, performance, especially that of RL, seems to have been affected by it. In times of extreme volatility, such as February and March, RL performs more poorly than in stable periods such as September through November and May.
To improve performance for this particular strategy, there are two key areas which should be investigated.
Stationarity of Time Series Data
Simple Moving Average/Bollinger Bands strategies are most successful when mean reversion occurs naturally in time series. To test this, an Augmented Dickey-Fuller test might be employed.
An Augmented Dickey-Fuller Test (ADF) is, by definition, an augmentation of the number of lags in the original Dickey-Fuller Test. A Dickey-Fuller test is based on the model of an autoregressive process of order 1 (AR(1)):
$$y_t = \rho\, y_{t-1} + u_t, \qquad t = 1, \dots, T$$
Here, $T$ is the number of observations in the time series. The test evaluates the null hypothesis that $\rho = 1$, which would indicate that a unit root is present and thus that the time series is not stationary, against the alternative hypothesis that $|\rho| < 1$, which would indicate that the time series is stationary. The error term $u_t$ comes from a normal distribution with an expected value of 0. The Augmented Dickey-Fuller test merely extends the number of lags to $p$, which in this paper was determined by minimizing the Akaike Information Criterion (AIC):
$$y_t = \rho\, y_{t-1} + \sum_{i=1}^{p} \delta_i\, \Delta y_{t-i} + u_t$$
Here, the null and alternative hypotheses are the same as in the original Dickey-Fuller Test. The purpose of using the ADF is to include autoregressive processes of orders greater than 1, which intuitively could model the time series more accurately.
One possible modification to the strategies outlined above is to require that, before any trade is performed, the ADF test indicates stationarity over a calculated number of preceding days. This would ensure that the agent is better able to use the price spread to assume positions accurately.
Exit Strategy and Cutting Losses
To improve agent performance, instead of forcefully liquidating all positions at the end of a set holding period, it would be wise to instead either exit the market upon turning a profit at an earlier stage, or, with a clearly defined strategy to cut losses, maintain the positions for a longer period of time.
A neural network, for instance, might be able to identify the ideal period over which to perform a Mann-Kendall test to identify a decreasing/increasing trend. This would allow loss-cutting strategies to be executed better without impeding the agent’s long-holding-period strategies.
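As an illustration of the trend test mentioned above, the sketch below implements the basic Mann-Kendall test using the normal approximation. It omits the variance correction for tied values, which is a simplifying assumption, and the function name, return format, and significance level of 0.05 are hypothetical choices rather than anything specified in the paper.

```python
import math

def mann_kendall(series, alpha=0.05):
    """Basic Mann-Kendall trend test (no correction for ties, an assumption).

    Returns (S, Z, two-sided p-value, trend label).
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)    # sign of each pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)      # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    if p_value < alpha:
        trend = "increasing" if z > 0 else "decreasing"
    else:
        trend = "no trend"
    return s, z, p_value, trend
```

Applied to the closing prices over a candidate look-back window, a "decreasing" result on a long position (or "increasing" on a short) could serve as one possible trigger for cutting losses early.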
Areas for Further Analysis
There are a few key areas for further analysis. Firstly, given that Agricultural ETFs are so volatile in pricing, it would be important to investigate the inherent driving factors in price, on a daily basis, as they might provide neural networks with value in predicting movement.
Secondly, instead of training/testing neural networks on daily closing prices on monthly aggregates, intraday data might result in better performance.
Thirdly, given that neural networks perform poorly on testing sets that differ significantly in behaviour and value to training sets, it would be interesting to examine strategies to train neural networks to identify, during testing, whether the data it is being fed behaves similarly to the data it was trained on. This could help it cut losses at a much faster rate than it currently can.
Finally, an alternative set of inputs to the agent could include long-term driving factors of agricultural prices, which would provide long-term trading strategies.
The prices of agricultural ETFs are largely driven by supply and demand balances.
For example, many investors are betting that as living standards rise in developing countries, the demand for more and higher quality food, especially meat, will grow, boosting prices. Agriculture ETF prices can also change based on trade agreements: when China stopped purchasing all soybeans from the US in 2018 in response to Trump’s escalation of the Sino-US trade war, the soybean ETF reached record lows as the US faced a glut in supply. In 2018, when American pig farmers were forced to pay pork tariffs to Mexico, the largest buyer of US pork by volume, the lean hogs ETF plummeted.
The supply of agricultural products can also change drastically due to unforeseen natural causes such as weather, pests or disease. For example, earlier in 2020, the wheat ETF spiked after low harvests in Australia were announced due to poor weather, and after concerns that large swarms of locusts were traveling to southwest Asia, further threatening the supply of wheat. Additionally, in 2019, a deadly outbreak of African swine fever killed nearly half of China’s hogs. This sudden decrease in the supply of hogs sent prices of the lean hog ETF plummeting, and also lowered the demand for soy, the main feed for hogs. Lastly, the demand for certain crops, while less variable, can also influence the prices of agriculture ETFs, as revealed by the coronavirus pandemic.
By feeding information such as government trade agreement data, weather, and consumer demand indicators into a neural network, the learning engine might be able to detect direct relationships between these factors and long-term ETF price trends.