Trading Agricultural Commodities: Policy Gradient Learners in Trading Agricultural ETFs

Abstract

This paper analyzes the use of Reinforcement Learning (RL) in trading agricultural ETFs. The first section examines the performance of a naive trading strategy, which later serves as a baseline for comparison with RL strategies.

Returns under the naive strategy were very volatile, with agents both severely underperforming and outperforming index benchmarks. The second part of this paper examines the use of the Monte Carlo Policy Gradient (REINFORCE) algorithm in trading.

Finally, the results of Dynamic Time Warping (DTW) are presented alongside side-by-side comparisons of naive trading and trading with reinforcement learning algorithms. Overall, reinforcement learning produces more volatile results and a positive average Sharpe ratio, whereas naive trading yields negative average returns.

ETFs and Price History

Seven agricultural ETFs are examined in this paper: Teucrium Corn Fund (CORN), Teucrium Soybean (SOYB), iPath Series B Bloomberg Coffee Subindex (JO), iPath Bloomberg Cocoa Subindex (NIB), Teucrium Sugar (CANE), iPath Series B Bloomberg Livestock Subindex (COW), and iPath Series B Bloomberg Sugar (SGG).

Price History

Figure 1: Price history of the seven agricultural ETFs examined in the paper. The vertical line marks Jan 30, 2020, when the World Health Organization declared COVID-19 a public health emergency of international concern.

The coronavirus outbreak prompted many stores and restaurants across the country to shut down, and lower demand for produce drove the prices of many agricultural ETFs to their lowest levels in a decade. The historical price graph shows that COW, SGG, JO and NIB have both higher and more volatile prices, and this volatility increased notably after the onset of the pandemic. The national lockdown greatly affected both supply and demand in the US food industry. By contrast, the prices of CANE, SOYB and CORN remained relatively stable. SGG and COW exhibited the biggest percentage drops in price after the national lockdown: sugar demand dropped for the first time in four decades, and thousands of meat processing plants closed as demand fell rapidly amid COVID-19, global lockdowns and safety risks. The only exception seems to be JO, which tracks the returns potentially available through an unleveraged investment in coffee futures contracts. Consumer demand for coffee remained steady despite the pandemic, which helped lift the commodity's price.

Naive Trading Strategy and Results

This naive trading strategy uses a simple moving average (SMA) to determine whether an ETF is relatively over- or undervalued. Henceforth, “agent” refers to the program that assumes and liquidates positions. The SMA period for each ETF was determined through empirical observation and is listed in the section below.

Every day after the initial training period (the length of the SMA window), the agent examines the pseudo-z-score, calculated as the number of standard deviations (measured over the SMA window) by which the price lies above or below the SMA. If the pseudo-z-score exceeds a set “buy threshold”, the agent assumes a short position; if it falls below the negative of the buy threshold, the agent, by the same logic, takes a long position. Both rules rest on the assumption of mean reversion.
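A minimal Python sketch of this entry rule follows. It assumes the standard deviation is the sample standard deviation of the same window of prices used for the SMA (the text does not say which estimator was used); `pseudo_z_score` and `entry_signal` are hypothetical names.

```python
from statistics import mean, stdev

BUY_THRESHOLD = 1.5  # value listed under "Buy/Sell Thresholds"


def pseudo_z_score(prices, window):
    """Standard deviations the latest price lies above (+) or below (-) its SMA.

    Assumption: both the SMA and the standard deviation use the last `window`
    prices, including the current one.
    """
    recent = prices[-window:]
    sd = stdev(recent)
    if sd == 0:
        return 0.0  # flat window: the price sits exactly on its SMA
    return (prices[-1] - mean(recent)) / sd


def entry_signal(z, buy_threshold=BUY_THRESHOLD):
    """Mean-reversion entry: short when stretched above the SMA, long when below."""
    if z > buy_threshold:
        return "short"
    if z < -buy_threshold:
        return "long"
    return None  # no new position


# Ten days at 10.0 followed by a jump to 13.0, with an 11-day SMA.
prices = [10.0] * 10 + [13.0]
print(entry_signal(pseudo_z_score(prices, window=11)))  # short
```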

While the agent holds positions, they are liquidated whenever the price of the ETF falls or rises back to within a threshold of the SMA (the “sell threshold”), since mean reversion will, theoretically, have occurred. In addition, once the proportion of the backtesting period that has elapsed exceeds a certain threshold (the “safe threshold”), positions are liquidated whenever a profit can be realized, regardless of the ETF's price relative to the SMA. At the end of the backtesting period, all positions are liquidated.
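The exit rules can be sketched the same way. This is a hedged illustration: the sell threshold is assumed to be measured in the same pseudo-z-score units as the buy threshold, the elapsed proportion of the backtest is assumed to be the 0-indexed day divided by the number of trading days, and `should_liquidate` is a hypothetical helper.

```python
SELL_THRESHOLD = 0.1  # value listed under "Buy/Sell Thresholds"
SAFE_THRESHOLD = 0.6  # value listed under "Safe Threshold"


def should_liquidate(z, day, total_days, unrealized_pnl,
                     sell_threshold=SELL_THRESHOLD, safe_threshold=SAFE_THRESHOLD):
    """Return True if an open position should be closed on this (0-indexed) day."""
    # 1. End of the backtesting period: everything is closed.
    if day >= total_days - 1:
        return True
    # 2. Price is back within the sell threshold of the SMA: mean reversion occurred.
    if abs(z) <= sell_threshold:
        return True
    # 3. Past the safe threshold, take any available profit regardless of z.
    if day / total_days > safe_threshold and unrealized_pnl > 0:
        return True
    return False


print(should_liquidate(z=1.0, day=15, total_days=20, unrealized_pnl=2.0))  # True
```

Keeping the three conditions in this order makes the end-of-period liquidation unconditional, matching the description above.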

Parameters Used

Buy/Sell Thresholds

The buy/sell thresholds are 1.5 and 0.1, respectively.

Safe Threshold

The safe threshold is 0.6, i.e., 60% of the backtesting period.

SMA Periods

The periods in which the simple moving average was measured are listed in the table below.

ETF     SMA Period (days)
CORN    11
SOYB    12
NIB     11
JO      10
CANE    11
COW     11
SGG     12

Results

The results of this trading strategy are displayed in the tables below.

2020 May

ETF     ROI (%)
CORN    1.16E-14 (effectively 0)
SOYB    0.146306
NIB     8.452004
JO      1.067073
CANE    0
COW     -2.38619
SGG     4.831991

2020 April

ETF     ROI (%)
CORN    -5.85443
SOYB    -1.6703
NIB     -4.10448
JO      -7.97222
CANE    7.472458
COW     -2.66538
SGG     2.404858

2020 March

ETF     ROI (%)
CORN    -3.91433
SOYB    -3.09704
NIB     -16.169
JO      25.44473
CANE    -21.4386
COW     -13.3188
SGG     -22.6449

2020 February

ETF     ROI (%)
CORN    -2.6994
SOYB    3.036295
NIB     2.902454
JO      -0.80037
CANE    0.94086
COW     7.690418
SGG     1.796407

2020 January

ETF     ROI (%)
CORN    1.852516
SOYB    -5.19577
NIB     -9.09091
JO      -17.5819
CANE    5.483405
COW     -7.67008
SGG     10.16703

2019 December

ETF     ROI (%)
CORN    3.270703
SOYB    5.985205
NIB     0.579876
JO      -8.70233
CANE    -3.84989
COW     0.021853
SGG     -3.78141

2019 November

ETF     ROI (%)
CORN    -4.40294
SOYB    -2.42215
NIB     -3.21305
JO      -11.4807
CANE    -0.90772
COW     -1.23434
SGG     -0.37613

2019 October

ETF     ROI (%)
CORN    0.988142
SOYB    -1.22502
NIB     2.549087
JO      1.445001
CANE    2.074074
COW     2.787016
SGG     2.697352

2019 September

ETF     ROI (%)
CORN    1.929704
SOYB    3.179973
NIB     -10.2703
JO      -1.76829
CANE    3.582555
COW     3.592783
SGG     3.535883

These returns serve as a control for examining whether Reinforcement Learning can improve upon naive trading strategies. For instance, liquidating at the end of the backtesting period regardless of profit is neither a truly viable nor a profitable strategy. However, it allows us to explore whether RL strategies can learn to avoid long holding periods, especially given the volatility of agricultural ETF prices.

Reinforcement Learning Strategy and Results

The reinforcement learning strategy is based on the Monte Carlo Policy Gradient (REINFORCE) algorithm:

[Figure: the REINFORCE (Monte Carlo Policy Gradient) algorithm]
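The algorithm figure is not reproduced here. For reference, the standard episodic REINFORCE update is sketched below in this paper's setting, writing $r_k$ for the (PnL-based) reward assigned to day $k$, $\gamma$ for the discount factor, $\alpha$ for the learning rate, $T$ for the number of days in the episode and $\pi_\theta$ for a policy network. This is the textbook form, not necessarily the exact variant the authors implemented.

```latex
G_t = \sum_{k=t}^{T-1} \gamma^{\,k-t}\, r_k,
\qquad
\theta \leftarrow \theta + \alpha\, \gamma^{t}\, G_t\, \nabla_\theta \ln \pi_\theta(a_t \mid s_t)
```

Because the categorical cross-entropy of the taken action $a_t$ is $-\ln \pi_\theta(a_t \mid s_t)$, minimizing $G_t$ times that loss produces the same gradient step, which is one plausible reading of the cross-entropy losses listed below. Many implementations also drop the $\gamma^{t}$ factor.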

There are two distinct states in an RL trading environment: one in which the agent holds positions and one in which it does not. Hence, two neural networks were trained: one to choose actions when the agent held positions and the other to choose actions when it did not. The architectures of the neural networks were as follows.

Holding neural network:

6-Dimensional Inputs

3 Hidden Layers (128 neurons, 64 neurons, and 32 neurons)

2-Dimensional Outputs, using Categorical Cross Entropy Loss

Non-holding neural network:

6-Dimensional Inputs

3 Hidden Layers (128 neurons, 64 neurons, and 32 neurons)

3-Dimensional Outputs, using Categorical Cross Entropy Loss

Each epoch iterated over the full two months of training data. Each day, six inputs were computed:

  1. The current pseudo-z-score, calculated in the same way as in the naive trading strategy (the number of standard deviations the price lies from the SMA)
  2. Opening price
  3. Closing price
  4. High price
  5. Low price
  6. Volume
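A dependency-free sketch of the two policy networks and of sampling an action from them is shown below. Assumptions not stated in the paper: ReLU hidden activations, a softmax output layer, Gaussian weight initialization, and inputs that have already been scaled (raw volume, for example, would otherwise dominate the other features). Only the forward pass and action sampling are shown; training would update the weights with the return-weighted cross-entropy gradient. All names are hypothetical.

```python
import math
import random

HOLDING_ACTIONS = ["sell", "hold"]                  # 2 outputs
NON_HOLDING_ACTIONS = ["long", "short", "nothing"]  # 3 outputs


def init_mlp(sizes, seed=0):
    """Random weights and zero biases for a fully connected net, e.g. [6, 128, 64, 32, 3]."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers, x):
    """ReLU hidden layers, then a softmax turning the last layer into action probabilities."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    top = max(x)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


holding_net = init_mlp([6, 128, 64, 32, len(HOLDING_ACTIONS)], seed=1)
non_holding_net = init_mlp([6, 128, 64, 32, len(NON_HOLDING_ACTIONS)], seed=2)


def choose_action(holding, features, rng=random):
    """Sample an action from the network that matches the agent's current state.

    `features` = [pseudo_z, open, close, high, low, volume], already scaled.
    """
    if holding:
        net, actions = holding_net, HOLDING_ACTIONS
    else:
        net, actions = non_holding_net, NON_HOLDING_ACTIONS
    probs = forward(net, features)
    return rng.choices(actions, weights=probs, k=1)[0]


x = [1.7, 0.20, 0.30, 0.35, 0.15, 0.05]  # illustrative scaled inputs
print(choose_action(holding=False, features=x, rng=random.Random(0)))
```

Sampling from the softmax, rather than always taking the most probable action, is what gives a policy-gradient agent its exploration during training.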

During training, the agent took an action each day depending on whether or not it was holding a position. If it was not holding a position, it randomly either assumed a long or short position or did nothing. If it held a position, it randomly either sold or held it.

Every state (defined as the set of six inputs) was rewarded using a discounted Monte Carlo rollout at the end of the epoch, with the original reward being the PnL of the transaction. For instance, if a position was bought on day 1 and sold on day 5 with an ROI of 0.1%, then the original (undiscounted) reward for each of days 1 to 5 would be 0.1%.
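The reward assignment and the discounted rollout can be sketched as follows. Assumptions: days without an open position receive a reward of 0, days are 0-indexed, and the discount factor `gamma` (default 0.99 here) is a hypothetical value, since the paper does not report one.

```python
def trade_rewards(num_days, trades):
    """Raw per-day rewards: every day of a trade receives that trade's ROI.

    `trades` is a list of (entry_day, exit_day, roi) with inclusive, 0-indexed days.
    Days with no open position get 0 (an assumption).
    """
    rewards = [0.0] * num_days
    for entry, exit_, roi in trades:
        for day in range(entry, exit_ + 1):
            rewards[day] = roi
    return rewards


def discounted_returns(rewards, gamma=0.99):
    """Monte Carlo return G_t = r_t + gamma * G_{t+1}, computed backwards from the last day."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# The example from the text: bought on day 1, sold on day 5 (indices 0 to 4), ROI 0.1%.
rewards = trade_rewards(7, [(0, 4, 0.1)])
print(rewards)  # [0.1, 0.1, 0.1, 0.1, 0.1, 0.0, 0.0]
print(discounted_returns(rewards, gamma=0.9))
```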

Every day in the testing period, the agent took actions depending on whether positions were being held or not. If positions were not held, then the inputs were fed into the non-holding neural network, and the agent either assumed a long/short position or did nothing. If positions were held, then inputs were fed into the holding neural network and the agent either liquidated its position or held it. 

Additionally, each position was given a lifespan of 10 days, after which it was forcibly liquidated to ensure that the agent did not get trapped in prolonged bear or bull markets.

Finally, after 30 days, no new positions were assumed; only an existing position, if there was one, could be liquidated. At the end of the backtesting period, any positions still held were forcibly liquidated.
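The testing-period rules can be combined into one loop, sketched below under stated assumptions: `policy` is a hypothetical stand-in for the two networks, the 10-day lifespan is counted as 10 days elapsed since entry, and no new position is opened on the same day as an exit.

```python
MAX_LIFESPAN = 10   # days a position may stay open before forced liquidation
NO_NEW_AFTER = 30   # no new positions from this (0-indexed) day onward


def run_test_episode(num_days, policy):
    """Apply the testing-period trading rules to a policy.

    `policy(day, holding)` is a hypothetical callable: when flat it returns
    "long", "short" or "nothing"; when holding it returns "sell" or "hold".
    Returns a list of (entry_day, exit_day, side) tuples with 0-indexed days.
    """
    trades = []
    position = None  # (entry_day, side) while a position is open
    for day in range(num_days):
        last_day = day == num_days - 1
        if position is not None:
            entry, side = position
            expired = day - entry >= MAX_LIFESPAN
            # Forced exits (end of backtest, lifespan reached) are checked first,
            # so `policy` is only consulted when neither applies.
            if last_day or expired or policy(day, True) == "sell":
                trades.append((entry, day, side))
                position = None
            continue  # assumption: no new position on the day of an exit
        if day < NO_NEW_AFTER and not last_day:
            action = policy(day, False)
            if action in ("long", "short"):
                position = (day, action)
    return trades


def always_long(day, holding):
    """Example policy: always go long when flat, never sell voluntarily."""
    return "hold" if holding else "long"


print(run_test_episode(45, always_long))
# [(0, 10, 'long'), (11, 21, 'long'), (22, 32, 'long')]
```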

Results

2020 May

ETF     ROI (%)
CORN    1.41309
SOYB    0.43668
NIB     5.71323
JO      -10.4046*
CANE    7.00535
COW     6.17073
SGG     0.54993

2020 April

ETF     ROI (%)
CORN    -5.06353*
SOYB    -0.2859*
NIB     11.98497
JO      -9.75323*
CANE    2.54775*
COW     2.28002
SGG     5.77425

2020 March

ETF     ROI (%)
CORN    -12.1037*
SOYB    -7.6463*
NIB     -11.8589*
JO      3.80331
CANE    -19.5684*
COW     -27.0555*
SGG     -25.621*

2020 February

ETF     ROI (%)
CORN    1.74968
SOYB    -3.6904*
NIB     -19.3575*
JO      0.52194
CANE    2.70891
COW     2.66600
SGG     -25.13598*

2020 January

ETF     ROI (%)
CORN    -3.0202*
SOYB    -6.93132*
NIB     4.83107
JO      -16.7474
CANE    3.26281
COW     -8.96402*
SGG     4.69009

2019 December

ETF     ROI (%)
CORN    0.74982
SOYB    3.13507
NIB     3.69027
JO      12.36534
CANE    2.65556
COW     3.03188
SGG     3.78695

2019 November

ETF     ROI (%)
CORN    -1.65563*
SOYB    0.31799*
NIB     -0.17759*
JO      5.46503
CANE    0.30257
COW     -2.83181*
SGG     1.16023

2019 October

ETF     ROI (%)
CORN    1.11247
SOYB    1.20833
NIB     5.55935
JO      1.72112
CANE    1.95534
COW     2.74592
SGG     -0.42069*

2019 September

ETF     ROI (%)
CORN    4.44432
SOYB    6.25951
NIB     7.24048
JO      7.61177
CANE    0.77882
COW     2.91964
SGG     0.55090

Dynamic Time Warping Analysis

As a preface to this section, it must be noted that Reinforcement Learning is a convergence-based ML strategy. Given the large computational power required to reach neural network convergence on such large datasets, it was infeasible to train to convergence for this paper. A general trend might still be established, however, since training for more than 100 epochs was observed not to change returns significantly.

The difference in performance between the Monte Carlo Policy Gradient and naive SMA trading on agricultural ETFs is evidently inconsistent. However, if the backtesting periods are separated into two groups, one where the training data differed significantly from the testing data and one where they were relatively similar, the results suggest that RL performance depends on whether the agent was exposed during training to scenarios that also appeared during testing.

Dynamic Time Warping (DTW) can be used to quantify and test this hypothesis. DTW finds an optimal alignment between two time series of equal or different lengths by minimizing the sum of Euclidean distances between the points that are mapped to each other; here it is applied to each ETF's training and testing time series.
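A minimal sketch of classic DTW via dynamic programming for one-dimensional series (e.g. a training and a testing price series) is given below; for scalars, the Euclidean distance reduces to the absolute difference. The paper does not state which inputs, normalization or window constraint were used, so this is illustrative only, and `dtw_distance` is a hypothetical name.

```python
import math


def dtw_distance(a, b):
    """Minimum total distance over all monotone alignments of sequences a and b.

    cost[i][j] is the cheapest alignment of a[:i] with b[:j]; each step may
    advance in a, in b, or in both (a match).
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # Euclidean distance between two scalars
            cost[i][j] = d + min(cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1],      # advance in b only
                                 cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


# A series and a "slowed-down" copy align perfectly, so the distance is 0.
print(dtw_distance([1, 2, 3], [1, 1, 2, 2, 3, 3]))  # 0.0
```

The table costs O(nm) time and memory, which is manageable for a few months of daily data.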

The resulting sums of Euclidean distances are displayed in the tables below.

2020 May

ETF     Euclidean Sum
CORN    6481893
SOYB    4024293
NIB     826067.6
JO      2792490
CANE    771436.6
COW     858250.7
SGG     224817.8

2020 April

ETF     Euclidean Sum
CORN    1386559
SOYB    459868.2
NIB     492450.7
JO      3481253
CANE    640762.6
COW     608944.8
SGG     220466.8

2020 March

ETF     Euclidean Sum
CORN    1146737
SOYB    546656
NIB     1635924
JO      2394711
CANE    1475560
COW     1034717
SGG     203793.1

2020 February

ETF     Euclidean Sum
CORN    761561.7
SOYB    579194.5
NIB     4689589
JO      2096867
CANE    1274665
COW     823848.4
SGG     112240.7

2020 January

ETF     Euclidean Sum
CORN    660758
SOYB    631382.4
NIB     5509994
JO      2512662
CANE    1237405
COW     683179.5
SGG     141283.8

2019 December

ETF     Euclidean Sum
CORN    833179.7
SOYB    610274.6
NIB     3732158
JO      2391850
CANE    1094615
COW     851692.8
SGG     104660.6

2019 November

ETF     Euclidean Sum
CORN    2031015
SOYB    603888.6
NIB     4603125
JO      1180082
CANE    1012869
COW     1257430
SGG     113533.5

2019 October

ETF     Euclidean Sum
CORN    2031015
SOYB    799728.6
NIB     1708028
JO      1283893
CANE    749429.7
COW     1038146
SGG     85479.63

2019 September

ETF     Euclidean Sum
CORN    1697768
SOYB    501927.8
NIB     719955.3
JO      1067927
CANE    1030973
COW     312421.6
SGG     89126.32

Although it is difficult to establish a direct linear relationship between the sum of Euclidean distances obtained by performing DTW on the training/testing time series and portfolio performance, examination of these tables shows a general trend between the two: the greater the DTW distance, the less successfully the agent applied strategies learned in the training environment to the testing environment.

This is a fairly intuitive result: the more closely the testing environment resembles the training environment, the better the agent will perform, with the caveat that overfitting must be prevented. Traditional techniques for avoiding overfitting, such as cross-validation and regularization, together with ensuring that the training data contain a diverse set of price fluctuations, could be key to better agent performance. This suggests several possible directions for further exploration.

Sharpe Ratio Analysis

Month                   Naive Sharpe Ratio    RL Sharpe Ratio
2020 May                0.032672              0.018024
2020 April              -0.023946             0.010360
2020 March              -0.033432             -0.092804
2020 February           0.034427              -0.00036351
2020 January            -0.024714             -0.00029913
2019 December           -0.016266             0.074547
2019 November           -0.067041             0.003969
2019 October            0.068019              0.066050
2019 September          0.003680              0.094986
Average Sharpe Ratio    -0.0029556667         0.01209644444
Standard Deviation      4.164605437           6.010615039
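The paper does not state how its Sharpe ratios were computed (return frequency, risk-free rate, or annualization). As a hedged reference point, a common definition is the mean excess return divided by its sample standard deviation, sketched below with a risk-free rate of 0 assumed by default; `sharpe_ratio` is a hypothetical name.

```python
import math
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=1):
    """Mean excess return divided by its sample standard deviation.

    `periods_per_year` > 1 (e.g. 252 for daily returns) applies the common
    square-root-of-time annualization; the default leaves the ratio per-period.
    """
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)


print(round(sharpe_ratio([0.01, 0.03]), 4))  # 1.4142
```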

Overall, although both strategies exhibit low Sharpe ratios, the Reinforcement Learning strategy has higher and more volatile portfolio Sharpe ratios than the naive strategy. The risk-adjusted return generated by the Reinforcement Learning strategy is much higher before January 2020; specifically, from September 2019 to January 2020, the Reinforcement Learning strategy performs almost 8 times better than the naive strategy. After January 2020, the Sharpe ratios generated by the Reinforcement Learning strategy are much lower because of its high portfolio standard deviations, especially in March, when most of the agricultural ETFs exhibited their biggest percentage price drops because of the coronavirus pandemic.

Before the coronavirus-fueled market downturn, the Sharpe ratio of the Reinforcement Learning strategy averaged around 0.059; after January 2020, it dropped to -0.026. For the naive strategy, by contrast, the difference between the average Sharpe ratios before and after the coronavirus outbreak is less than 0.01%. With restaurants and schools shuttered during the national lockdown, prices of and demand for essential agricultural products have fallen. Over the past few years, farmers had already endured a slew of financial hardships from the U.S.-China trade war, and as the pandemic disrupted supply chains across the country, they were left with an abundance of food that they could not sell. As a result, the prices of agricultural ETFs dropped rapidly after January 2020, and by April 2020 many of the ETFs examined in the portfolio had reached all-time lows.

Economic Analysis of Trading Performance

Figures 2-8: Naive vs RL ROIs

Figure 9: Price movement of CORN, SOYB, NIB, and JO 

Figure 10: Price movement of COW, SGG, and CANE

The graphs illustrate the accuracy of both the naive trading model and the Reinforcement Learning (RL) model relative to the actual ETF prices. In general, both models performed better in the first two months, with naive trading on average doing better than RL. Both models performed best for COW and CANE and worst for JO and NIB, and their outputs varied most in February and March. A brief economic analysis of market volatility in agriculture and of the fear associated with the COVID-19 pandemic (caused by the SARS-CoV-2 virus) may help to contextualize these inconsistencies.

In December and January, market sentiment may explain the low performance of Reinforcement Learning. The U.S. and China reached a preliminary trade deal in mid-December, spurring some optimism among investors. In this deal, Beijing pledged to buy an additional $200B in goods and services, including agricultural products, over the next two years, but doubts soon grew about whether China would uphold the deal. In the graphs, the RL output for NIB increased during this period while the naive trading output decreased, and the outputs for CORN decreased by different magnitudes. In both cases, naive trading performed better than RL. The initial increase in investment followed by a drop driven by skepticism likely added to the already volatile agricultural ETF prices, limiting the performance of RL.

Another likely contributing factor to the model output accuracy for CORN is the declining price of ethanol in the U.S. Gulf around this time (with the exception of early January). Because most ethanol in the U.S. is produced from corn, ethanol production and sales affect corn prices. On January 1st, crude oil prices went down 60% and corn by 10-15%, and on January 9th ethanol production fell to its lowest level in a month, 1.062M barrels/day. While we would therefore expect decreases for CORN in both the naive trading and RL model outputs, CORN decreases only under naive trading from December to January.

As the variability of output between the two models shows, volatility and uncertainty set in during January 2020. The rapid emergence of the coronavirus instilled fear and perhaps even hysteria. That month, the U.S. confirmed its first case, and the number of cases in China continued to grow. Investors grew worried that the virus would disrupt the global economy, contributing to a deepening stock sell-off on January 31. Note, however, that the increase in NIB shown by naive trading is accurate, whereas the decrease in SGG should be an increase.

February, on the other hand, saw rising ROI, or at most a small decrease (CORN and SOYB), in almost every category, with the exception of NIB and SGG. However, the naive trading outputs correctly increase for NIB, and SGG should also be increasing. This positive turn is likely a result of the panic buying that began at the end of the month and continued into March, as basic foods and meats were stockpiled, pushing prices up. Wholesale Choice beef, for example, rose 25% in price despite falling production costs, an event mirrored in the steep increase in COW. These price increases were temporary, however, easing as calm set in and panic buying slowed.

It is important to note that extreme volatility, or overreaction, in the market can take weeks or months to pass. In the case of the coronavirus, the overreactions seem to result from a disconnect between commodity prices and the underlying market. As in the example above, beef prices were rising while shelves were empty because of stockpiling. The underlying market told a different story: the stock market crash began on February 20th, with the Dow Jones Industrial Average dropping from 29,219.98 to 18,591.93 by March 23rd.

Price volatility in March likely drove the large variability in model outputs for the ETFs, especially for JO. In the naive ROI graph, there is a steep increase in JO’s ROI that is not seen in the other graphs. Price volatility for this ETF began in mid-February, and normal trends returned only in May. Mirroring the increase in the ETF's positive percent change, the naive trading model correctly shows a steep increase in the ROI of JO in March. However, the steep decrease in the RL model is inaccurate and likely reflects poor performance during high volatility.

CORN also showed major discrepancies between the outputs of the naive trading and RL models. Here, naive trading decreases from February through April, whereas RL shows a steep decrease in March and a steep increase thereafter. Because the actual price of the ETF decreased consistently, naive trading performed better than the RL model; the ETF's price percent change ranged from about -7.0% to -24.0%.

Figure 11: COW ticker history

In general, the model outputs of the ETFs in April and May were more comparable and not as marked by coronavirus uncertainty. NIB, JO, and CANE displayed the most variability in model output during these two months.

Some increase in ROI over those two months is expected, as the Consumer Price Index (CPI) for both food-at-home and food-away-from-home consumption increased. The CPI for food at home increased 2.6% from March to April and 1.0% from April to May (compared with only 0.5% from February to March), with food-at-home prices up 2.4% from the previous May. The CPI for food away from home increased 0.1% from March to April and 0.4% from April to May (compared with 0.2% from February to March), with food-away-from-home prices up 2.1% from the previous May. Because restaurants continue to have difficulty reopening under social distancing guidelines, food-away-from-home consumption pales in comparison to food eaten at home.

Figure 12: 12-month CPI change for All Urban Consumers

Still, variability in consumption continues to affect the performance of RL relative to naive trading. As the graph above shows, positive trends in CPI begin to appear in May. (Note that all urban consumers represent about 93% of the U.S. population.) February to March and March to April saw the greatest drops in CPI, which helps to explain the March drops in ROI for most ETFs in both models. The slowing decline in CPI likely also contributed to the increase in nearly all ETFs in April.

Because much of the testing period coincided with extreme market volatility, RL performance in particular appears to have been affected by it. In times of extreme volatility, such as February and March, RL performs worse than in stable periods such as September through November and May.

Future Steps

To improve performance for this particular strategy, there are two key areas which should be investigated.

Stationarity of Time Series Data

Simple Moving Average/Bollinger Band strategies are most successful when mean reversion occurs naturally in a time series. To test for this, an Augmented Dickey-Fuller test might be employed.

An Augmented Dickey-Fuller (ADF) test extends the original Dickey-Fuller test by adding lagged difference terms. A Dickey-Fuller test is based on the model of an autoregressive process of order 1 (AR(1)):

$$ y_t = \rho\, y_{t-1} + u_t, \qquad t = 1, \dots, T $$

Here, T is the number of observations in the time series. The test evaluates the null hypothesis $\rho = 1$, which would indicate that a unit root is present and thus that the time series is not stationary, against the alternative hypothesis $\rho < 1$, which would indicate that the time series is stationary. The error term $u_t$ is white noise with an expected value of 0. The Augmented Dickey-Fuller test extends the model to p lags of the differenced series, where p in this paper was determined by minimizing the Akaike Information Criterion (AIC):

$$ y_t = \rho\, y_{t-1} + \sum_{i=1}^{p} \delta_i\, \Delta y_{t-i} + u_t, \qquad \Delta y_{t-i} = y_{t-i} - y_{t-i-1} $$

Here, the null and alternative hypotheses are the same as in the original Dickey-Fuller test. The purpose of the ADF is to accommodate autoregressive processes of order greater than 1, which intuitively could model the time series more accurately.

One possible modification to the strategies outlined above is to permit trades only when the ADF test indicates stationarity over a calculated number of preceding days. This would help ensure that the agent uses the price spread to assume positions more accurately.
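A hedged, dependency-free sketch of such a gate follows. It uses the plain Dickey-Fuller regression with an intercept (an assumption, because price levels are not centered at zero) and no lagged differences, i.e. p = 0 rather than the AIC-selected p described above. The -2.86 threshold is the approximate asymptotic 5% critical value for this intercept-only case; for the full augmented test with AIC lag selection, statsmodels' `adfuller(x, autolag="AIC")` is the usual library route. `dickey_fuller_t` and `allow_trading` are hypothetical names.

```python
import math
import random


def dickey_fuller_t(series):
    """t-statistic on gamma in  dy_t = c + gamma * y_{t-1} + u_t  (no lagged differences).

    gamma = rho - 1, so a strongly negative t-statistic rejects the unit-root null rho = 1.
    """
    x = series[:-1]                                       # y_{t-1}
    d = [b - a for a, b in zip(series[:-1], series[1:])]  # dy_t
    n = len(d)
    x_bar, d_bar = sum(x) / n, sum(d) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    gamma = sum((xi - x_bar) * (di - d_bar) for xi, di in zip(x, d)) / sxx
    c = d_bar - gamma * x_bar
    ssr = sum((di - c - gamma * xi) ** 2 for xi, di in zip(x, d))
    se = math.sqrt(ssr / (n - 2) / sxx)  # OLS standard error of gamma
    return gamma / se


DF_CRITICAL_5PCT = -2.86  # approximate asymptotic 5% value, intercept-only case


def allow_trading(window, critical=DF_CRITICAL_5PCT):
    """Hypothetical gate: only trade when the recent window looks stationary."""
    return dickey_fuller_t(window) < critical


# Simulated series: a mean-reverting AR(1) around 50 and a random walk.
rng = random.Random(42)
mean_reverting, random_walk = [50.0], [50.0]
for _ in range(250):
    mean_reverting.append(50.0 + 0.2 * (mean_reverting[-1] - 50.0) + rng.gauss(0, 1))
    random_walk.append(random_walk[-1] + rng.gauss(0, 1))
print(allow_trading(mean_reverting))  # True
```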

Exit Strategy and Cutting Losses

To improve agent performance, instead of forcibly liquidating all positions at the end of a set holding period, it would be wise either to exit the market upon turning a profit at an earlier stage or, with a clearly defined strategy for cutting losses, to maintain positions for a longer period of time.

A neural network, for instance, might be able to identify the ideal window over which to perform a Mann-Kendall test for an increasing or decreasing trend. This would allow loss-cutting strategies to be executed more effectively without impinging on the agent’s long-holding-period strategies.
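Below is a minimal sketch of the Mann-Kendall trend test that such a loss-cutting rule could call, using the normal approximation without a tie correction (an assumption; tied prices would need an adjusted variance). How the window is chosen, whether by a neural network or otherwise, is left open, and `mann_kendall_z` is a hypothetical name.

```python
import math


def mann_kendall_z(series):
    """Mann-Kendall Z statistic (normal approximation, no tie correction).

    Z > 1.96 suggests an increasing trend and Z < -1.96 a decreasing one at
    roughly the 5% two-sided level.
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of each later-minus-earlier pair
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        return (s - 1) / math.sqrt(var_s)  # continuity correction
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


print(round(mann_kendall_z(list(range(10))), 3))  # 3.935
```

A hypothetical rule might cut a long position when the statistic over the recent window falls below -1.96, i.e. when a significant downward trend is detected.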

Areas for Further Analysis

There are a few key areas for further analysis. Firstly, given that Agricultural ETFs are so volatile in pricing, it would be important to investigate the inherent driving factors in price, on a daily basis, as they might provide neural networks with value in predicting movement.

Secondly, using intraday data, instead of training and testing neural networks on daily closing prices over monthly periods, might result in better performance.

Thirdly, given that neural networks perform poorly on testing sets that differ significantly in behaviour and value to training sets, it would be interesting to examine strategies to train neural networks to identify, during testing, whether the data it is being fed behaves similarly to the data it was trained on. This could help it cut losses at a much faster rate than it currently can.

Finally, an alternative set of inputs to the agent could include long-term driving factors of agricultural prices, which would provide long-term trading strategies. 

The prices of agricultural ETFs are largely driven by supply and demand balances.

For example, many investors are betting that as living standards rise in developing countries, demand for more and higher-quality food, especially meat, will grow, boosting prices. Agricultural ETF prices can also change with trade agreements: when China stopped purchasing soybeans from the US in 2018 in response to Trump’s escalation of the Sino-US trade war, the soybean ETF reached record lows as the US faced a glut in supply. In 2018, when Mexico, the largest buyer of US pork by volume, imposed tariffs on US pork, the lean hogs ETF plummeted.

The supply of agricultural products can also change drastically because of unforeseen natural causes such as weather, pests or disease. For example, earlier in 2020, the wheat ETF spiked after low harvests in Australia were announced due to poor weather, and after concerns that large swarms of locusts were traveling to southwest Asia, further threatening the supply of wheat. Additionally, in 2019, a deadly outbreak of African swine fever killed nearly half of China’s hogs. This sudden decrease in the supply of hogs sent prices of the lean hog ETF plummeting and also lowered the demand for soy, the main feed for hogs. Lastly, demand for certain crops, while less variable, can also influence the prices of agricultural ETFs, as the coronavirus pandemic revealed.

By feeding information such as government trade agreement data, weather, and consumer demand indicators into a neural network, the learning engine might be able to detect direct relationships between these factors and long-term ETF price trends.

Trading Agricultural Commodities : Policy Gradient Learners in Trading Agricultural ETFs written by Alexander Fleiss, Vishal Dhileepan, Corina Perez-Cobb, Zoe Wang & Rohan Mehta