Scientific Computing
Reproduction of A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem
Qichao HE
Objective
A reinforcement learning framework that provides a deep learning solution to the portfolio management problem.
A novel topology of Ensemble of Identical Independent Evaluators (EIIE).
Experiment on crypto-currency assets.
The objective of this paper is to…
The author proposed a novel approach which is called
New asset class: crypto assets
Dataset
Price data for 12 cryptocurrencies (11 non-cash assets plus Bitcoin as cash).
Data fetched from Poloniex exchange.
Selection based on trading volume.
Bitcoin is treated as cash; all other assets are denominated in BTC.
Data period: 2014-07-01 to 2017-04-27.
Flat fake price-movements (0 decay rates) are used to fill the missing data points.
The dataset they chose
Treatment of missing data points
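The flat-fill idea can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a missing period is represented as `None` and is filled by carrying the previous close forward (the authors' implementation may instead back-fill assets that were not yet listed). The function name `fill_flat` and the candle layout are hypothetical.

```python
def fill_flat(candles):
    """Fill missing periods (None) with a flat candle at the previous close.

    Each candle is a dict with 'high', 'low' and 'close'. A flat candle has
    high == low == close, so the price does not move in that period.
    """
    filled = []
    last_close = None
    for candle in candles:
        if candle is None:
            if last_close is None:
                raise ValueError("no earlier price to carry forward")
            # Assumption: forward-fill with the last known closing price.
            candle = {"high": last_close, "low": last_close, "close": last_close}
        filled.append(candle)
        last_close = candle["close"]
    return filled


# Made-up example series with one missing period in the middle.
series = [
    {"high": 1.2, "low": 1.0, "close": 1.1},
    None,
    {"high": 1.3, "low": 1.1, "close": 1.2},
]
filled = fill_flat(series)
```

A flat candle implies a price relative of exactly 1 for the filled period, so it contributes neither gain nor loss during training.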
Dataset
The price tensor has shape (3, 50, 11).
3 features are fed to the model: high, low, close.
50 periods: 50 × 30 min = 25 hours (1 day and 1 hour).
11 assets.
The BTC price is always 1 because BTC is the cash asset in which all prices are quoted.
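A sketch of how one input window of shape (3, 50, 11) could be assembled. The layout `history[feature][period][asset]` and the name `price_tensor` are hypothetical, and dividing every price by the asset's latest close is an assumed normalisation, not something stated on the slides. The prices are synthetic and only serve to check the shape.

```python
FEATURES = ("high", "low", "close")
WINDOW = 50      # periods per input window (50 x 30 min)
N_ASSETS = 11    # non-cash assets


def price_tensor(history, t):
    """Build a (3, 50, 11) nested list from history[feature][period][asset].

    The window covers periods t-49 .. t. Every price is divided by the
    asset's closing price at period t (assumed normalisation).
    """
    if t < WINDOW - 1:
        raise ValueError("not enough history for a full window")
    latest_close = history["close"][t]
    return [
        [
            [history[f][p][a] / latest_close[a] for a in range(N_ASSETS)]
            for p in range(t - WINDOW + 1, t + 1)
        ]
        for f in FEATURES
    ]


# Synthetic, made-up prices purely to exercise the function.
history = {
    f: [[1.0 + 0.01 * p + 0.1 * a for a in range(N_ASSETS)] for p in range(60)]
    for f in FEATURES
}
x = price_tensor(history, t=59)
shape = (len(x), len(x[0]), len(x[0][0]))
```

Under this normalisation the latest closing price of every asset becomes 1, so the network sees relative price movements rather than absolute price levels.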
Model & Algorithm
The key characteristic of EIIE is that it divides the network into one small subnet per asset (hence 11 subnets for 11 assets).
They are joined in the output layer, which is a softmax layer.
Each subnet individually evaluates the time series and learns the price pattern of a single asset.
However, the network parameters are shared among the subnets.
Three network variants are used as the policy: CNN, RNN, LSTM
Softmax
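The EIIE structure described above can be sketched in a few lines. This is a toy illustration under stated assumptions: the per-asset evaluator is reduced to a single linear scorer (the real subnets are CNN, RNN or LSTM), and the `cash_bias` term and all function names are hypothetical. What it does show is the point made on the slide: one set of parameters scores every asset independently, and a softmax joins the scores into portfolio weights.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def evaluate_asset(series, shared_weights):
    """Score one asset's price series using the shared parameters."""
    return sum(w * x for w, x in zip(shared_weights, series))


def eiie_weights(asset_series, shared_weights, cash_bias=0.0):
    """Return portfolio weights [cash, asset_1, ..., asset_m]."""
    # The same shared_weights are applied to every asset independently.
    scores = [cash_bias] + [evaluate_asset(s, shared_weights) for s in asset_series]
    return softmax(scores)


shared = [0.5, -0.2, 0.1]          # made-up shared parameters
assets = [
    [1.00, 1.01, 1.02],            # made-up normalised price series
    [1.00, 0.99, 0.98],
    [1.00, 1.00, 1.00],
]
weights = eiie_weights(assets, shared)
identical = eiie_weights([[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]], shared)
```

Because the parameters are shared, two assets with identical price histories always receive identical weights, and the number of parameters does not grow with the number of assets.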
Model & Algorithm
Convolutional Neural Network (CNN)
Model & Algorithm
Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM)
Model & Algorithm
CNN implementation
Model & Algorithm
RNN & LSTM implementation
Model & Algorithm
Optimization Algorithm
Online Stochastic Batch Learning
Update
Weight Matrix
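One way to picture Online Stochastic Batch Learning is as a sampler that prefers recent mini-batches. The sketch below assumes that the start of each mini-batch is drawn with a geometrically decaying probability, P(k) = beta * (1 - beta)**k for a lag of k periods behind the most recent possible start. That form, the parameter name `beta`, and the function name are assumptions for illustration, not taken from the slides.

```python
import random


def sample_batch_start(t, batch_size, beta, rng):
    """Sample a mini-batch start index t_b with 0 <= t_b <= t - batch_size.

    The lag k = (t - batch_size) - t_b is geometric, P(k) = beta * (1 - beta)**k,
    so recent batches are picked more often. Lags that would fall before
    index 0 are redrawn.
    """
    latest_start = t - batch_size
    if latest_start < 0:
        raise ValueError("not enough history for one batch")
    while True:
        k = 0
        # Count failures before the first success of probability beta.
        while rng.random() >= beta:
            k += 1
        if k <= latest_start:
            return latest_start - k


rng = random.Random(0)  # fixed seed so the sketch is repeatable
starts = [
    sample_batch_start(t=1000, batch_size=50, beta=0.05, rng=rng)
    for _ in range(2000)
]
```

A larger `beta` concentrates training on the most recent market data, while a smaller one spreads the samples further back in history.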
Results
All three models achieved at least 4-fold returns in 50 days (green boxes)
What they did not say in the article (red boxes)
Results
Performance
Results
Interpretation of the low Sharpe ratio
Highly correlated assets
Overfitting the data
Volatility not in the reward function
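For reference, a minimal sketch of the Sharpe ratio as mean excess return divided by the sample standard deviation of the returns. The return series are made up and no annualisation is applied; they only illustrate why a strategy with large swings scores low even when its average return is similar.

```python
import math


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by its sample standard deviation."""
    excess = [r - risk_free for r in returns]
    n = len(excess)
    if n < 2:
        raise ValueError("need at least two returns")
    mean = sum(excess) / n
    variance = sum((r - mean) ** 2 for r in excess) / (n - 1)
    std = math.sqrt(variance)
    if std == 0:
        raise ValueError("zero volatility: Sharpe ratio undefined")
    return mean / std


# Made-up per-period returns with a similar mean but very different volatility.
steady = [0.010, 0.012, 0.009, 0.011]
volatile = [0.20, -0.15, 0.18, -0.19]

steady_sharpe = sharpe_ratio(steady)
volatile_sharpe = sharpe_ratio(volatile)
```

A reward that only looks at returns cannot tell these two series apart by their risk, which is consistent with the last bullet above.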
Steps of Reproduction: Download the data
Poloniex API: https://docs.poloniex.com/#returnchartdata
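A sketch of how a chart-data request for 30-minute candles could be built. The base URL and the query parameter names (`command`, `currencyPair`, `start`, `end`, `period`) are assumptions based on the legacy public API that the link above documents; that endpoint may no longer be served, so the code only constructs the URL and performs no network call. The pair `BTC_ETH` is just an example.

```python
import calendar
from urllib.parse import urlencode

BASE_URL = "https://poloniex.com/public"  # assumed legacy public endpoint


def chart_data_url(pair, start_date, end_date, period_seconds=1800):
    """Build a returnChartData URL; dates are (year, month, day) in UTC."""
    start = calendar.timegm(start_date + (0, 0, 0))
    end = calendar.timegm(end_date + (0, 0, 0))
    query = urlencode({
        "command": "returnChartData",
        "currencyPair": pair,
        "start": start,
        "end": end,
        "period": period_seconds,  # 1800 s = one 30-minute candle
    })
    return f"{BASE_URL}?{query}"


# Data period used in the paper: 2014-07-01 to 2017-04-27.
url = chart_data_url("BTC_ETH", (2014, 7, 1), (2017, 4, 27))
```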
Reproduction Result: Train the model
The authors provide the source code on GitHub: https://github.com/ZhengyaoJiang/PGPortfolio
Built with TensorFlow
Reproduction Result: YES and NO
Their result (left) and my reproduction (right).
Random seed not known.
Results are unstable with respect to hyperparameters.
Reproduction Result: Extend the backtest
Extend the period (the green line).
Performance is poor when the backtest is extended to August 2018.
Reflection & Further Research
Conceptually hard to match price patterns. (Autonomous driving vs. technical trading)
Alternative data.
Modify reward function to incorporate risk.
Try an A3C model and share the weight matrix entirely.
Incorporate more prior knowledge.
And always… MORE DATA.
You can teach a child to drive a car, yet trading floors are full of Harvard and MIT graduates. The conceptual difficulty is not on the same level.
In theory you can approximate any function if given enough data.
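The suggestion to modify the reward function to incorporate risk could look like the sketch below. It assumes the baseline reward is the average logarithmic return and subtracts a variance penalty; the penalty form, the `risk_aversion` parameter and the function name are hypothetical choices for illustration, and other risk measures would work just as well.

```python
import math


def risk_adjusted_reward(period_returns, risk_aversion=1.0):
    """Average log-return minus risk_aversion times its variance."""
    logs = [math.log(1.0 + r) for r in period_returns]
    n = len(logs)
    mean = sum(logs) / n
    variance = sum((x - mean) ** 2 for x in logs) / n
    # With risk_aversion = 0 this reduces to the plain average log-return.
    return mean - risk_aversion * variance


# Made-up return series: one steady, one with large swings.
steady = [0.01, 0.01, 0.01, 0.01]
volatile = [0.11, -0.09, 0.11, -0.09]

steady_reward = risk_adjusted_reward(steady)
volatile_reward = risk_adjusted_reward(volatile)
plain_reward = risk_adjusted_reward(steady, risk_aversion=0.0)
```

With such a penalty the agent is no longer rewarded for returns alone, which is one possible way to address the low Sharpe ratio noted in the results.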
Summary
Interesting to apply reinforcement learning in Computational Finance.
Novel approach of EIIE.
The overfitting issue still needs to be addressed.
Q&A
Questions?