Machine Learning For Trading
In multi-period trading with realistic market impact, determining the dynamic trading strategy that optimizes expected utility of final wealth is a hard problem.
In this paper we show that, with an appropriate choice of the reward function, reinforcement learning techniques (specifically, Q-learning) can successfully handle the risk-averse case.
We provide a proof of concept in the form of a simulated market which permits a statistical arbitrage even with trading costs.
The Q-learning agent finds and exploits this arbitrage.
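The setup above can be sketched with a toy example. The following is an illustrative tabular Q-learning agent trading a simulated mean-reverting price; it is not the paper's actual simulation, and all parameters (the mean-reversion speed `kappa`, noise scale `sigma`, trading `cost`, state grid, and learning rates) are assumptions chosen for the sketch.

```python
import numpy as np

# Toy sketch: tabular Q-learning on a simulated mean-reverting market.
# All parameters below are illustrative assumptions, not the paper's values.
rng = np.random.default_rng(0)

kappa, sigma = 0.1, 0.1          # AR(1) mean-reversion speed and noise scale
cost = 0.001                     # proportional cost per unit traded
actions = np.array([-1, 0, 1])   # target position: short, flat, long

# Discretize the price so the state is (price bin, current position).
edges = np.linspace(-0.5, 0.5, 10)   # 10 edges -> 11 price bins
Q = np.zeros((11, 3, 3))             # Q[price_bin, position_idx, action_idx]
alpha, gamma_disc, eps = 0.1, 0.9, 0.1

def price_bin(p):
    return int(np.digitize(p, edges))

p, pos_idx = 0.0, 1                  # start at fair value, flat
for _ in range(200_000):
    s = price_bin(p)
    a = rng.integers(3) if rng.random() < eps else int(np.argmax(Q[s, pos_idx]))
    p_next = p - kappa * p + sigma * rng.normal()
    # Reward = P&L of the new position minus the cost of trading into it.
    r = actions[a] * (p_next - p) - cost * abs(actions[a] - actions[pos_idx])
    s_next = price_bin(p_next)
    Q[s, pos_idx, a] += alpha * (r + gamma_disc * Q[s_next, a].max()
                                 - Q[s, pos_idx, a])
    p, pos_idx = p_next, a           # new position index equals action index

# With mean reversion, the learned values should favor being long when the
# price is far below its mean and short when it is far above.
policy = Q.argmax(axis=2)
```

The design choice that matters here is the reward: each step's reward is the wealth increment net of trading costs, so maximizing cumulative reward aligns with the trading objective; to model risk aversion, one could further subtract a penalty on the variability of the wealth increment.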
We show how machine learning can be applied to the problem of discovering and implementing dynamic trading strategies in the presence of transaction costs.
In reinforcement learning, agents learn how to choose actions in order to optimize a multi-period cumulative “reward”; if each period's reward is taken to be the increment in wealth, this objective has the same mathematical form as the investor's problem.
The investor cannot directly control future changes in wealth; rather, the investor makes trading or portfolio-selection decisions which affect the probability distribution of future wealth. In effect, the investor chooses which lottery to play.
In the theory of financial decision-making a lottery is any random variable with units of wealth.
In the generalized meaning of the word “lottery” due to Pratt (1964), any investment is a lottery. Playing a lottery results in a risk, defined as any random increment to one's total wealth.
A lottery could have a positive mean, in which case some investors would pay to play it. If it has zero mean, then any risk-averse investor would pay an amount, called the risk premium, to remove the risk from their portfolio.
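The risk premium can be made concrete with a standard textbook example: for an investor with exponential (CARA) utility facing a zero-mean Gaussian lottery, the premium has the closed form γσ²/2. The utility function and the parameter values below are illustrative assumptions, not from this paper.

```python
import numpy as np

# Risk premium of a zero-mean Gaussian lottery under exponential (CARA)
# utility u(w) = -exp(-gamma * w). Parameters are illustrative assumptions.
gamma = 2.0      # absolute risk aversion (assumed)
sigma = 0.3      # lottery standard deviation (assumed)

rng = np.random.default_rng(42)
x = rng.normal(0.0, sigma, size=1_000_000)   # zero-mean lottery draws

# Certainty equivalent: the sure wealth change with the same expected utility.
ce = -np.log(np.mean(np.exp(-gamma * x))) / gamma
risk_premium = -ce   # the amount the investor would pay to shed the risk

# Closed form for a Gaussian lottery under CARA utility: gamma * sigma^2 / 2
print(risk_premium, gamma * sigma**2 / 2)    # both close to 0.09
```

Because the premium is strictly positive even though the lottery's mean is zero, the Monte Carlo estimate illustrates directly why a risk-averse investor would pay to remove such a risk.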
Gordon Ritter is a professor at NYU Courant and Tandon, Baruch College, and Columbia, and a buy-side quantitative trader and portfolio manager (named Buy-Side Quant of the Year, 2019).