Scientific Computing


Reproduction of A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem

Qichao HE


Objective

Reinforcement Learning framework to provide a deep machine learning solution to the portfolio management problem.

A novel topology of Ensemble of Identical Independent Evaluators (EIIE).

Experiment on crypto-currency assets.

The objective of this paper is to…

The author proposed a novel approach which is called

New asset class crypto assets


Dataset

Price data for 12 cryptocurrencies (BTC as cash plus 11 other assets).

Data fetched from Poloniex exchange.

Selection based on trading volume.

Bitcoin treated as cash, all other assets are denoted in BTC.

Data period: 2014-07-01 to 2017-04-27.

Flat fake price-movements (0 decay rates) are used to fill the missing data points.
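The flat filling of missing data points can be sketched as follows. This is a minimal illustration, not the authors' code: the `Candle` layout and the `fill_flat` helper are hypothetical names, and the sketch assumes that gaps before an asset's first quote are filled with its first known close while later gaps carry the previous close forward.

```python
from typing import List, Optional, Tuple

Candle = Tuple[float, float, float]  # (high, low, close); layout is an assumption


def fill_flat(candles: List[Optional[Candle]]) -> List[Candle]:
    """Replace missing candles (None) with flat candles, i.e. zero price movement."""
    first = next((c for c in candles if c is not None), None)
    if first is None:
        raise ValueError("series contains no real data point")
    last_close = first[2]  # used for gaps that precede the first real candle
    filled: List[Candle] = []
    for candle in candles:
        if candle is None:
            # high = low = close, so the filled period shows no movement
            candle = (last_close, last_close, last_close)
        filled.append(candle)
        last_close = candle[2]
    return filled


series = [None, (1.2, 1.0, 1.1), None, (1.3, 1.1, 1.25)]
filled = fill_flat(series)
```

A filled candle has identical high, low and close, so the close-to-close return over a filled period is exactly zero.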

The dataset they chose.

Treatment of missing data points.


Dataset

Price Tensor is of shape (3, 50, 11)

3 features are fed to the model: high, low, close.

50 periods: 50 × 30 min = 25 hours (1 day and 1 hour).

11 assets.

The BTC price is always 1 because BTC is the cash asset in which all other prices are denoted.
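A small sketch of how a tensor of shape (3, 50, 11) could be assembled from per-asset candles. The helper name `build_price_tensor`, the synthetic prices, and the normalization by each asset's latest close are assumptions made for illustration; the slides only state the shape and the three features.

```python
from typing import List, Tuple

Candle = Tuple[float, float, float]  # (high, low, close)

FEATURES = 3         # high, low, close
WINDOW = 50          # number of periods fed to the model
PERIOD_MINUTES = 30  # length of one period


def build_price_tensor(history: List[List[Candle]]) -> List[List[List[float]]]:
    """Return tensor[feature][period][asset] from history[asset][time].

    Each price is divided by that asset's latest close (an assumption here),
    so the most recent close of every asset becomes 1.0.
    """
    tensor = []
    for f in range(FEATURES):
        plane = []
        for t in range(-WINDOW, 0):  # the last WINDOW periods, oldest first
            plane.append([asset[t][f] / asset[-1][2] for asset in history])
        tensor.append(plane)
    return tensor


# Synthetic data: 11 assets, 60 periods each, slowly rising prices.
history = [
    [(p * 1.01, p * 0.99, p) for p in (1.0 + 0.001 * (i + 1) * t for t in range(60))]
    for i in range(11)
]
tensor = build_price_tensor(history)
shape = (len(tensor), len(tensor[0]), len(tensor[0][0]))
window_hours = WINDOW * PERIOD_MINUTES / 60
```

Here `shape` is `(3, 50, 11)` and `window_hours` is 25, which matches the "1 day and 1 hour" on the slide.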


Model & Algorithm

The key characteristic of EIIE is that the network is divided into a small subnet for each asset (so there are 11 subnets for 11 assets).

The subnets are joined at the output layer, which is a softmax layer.

Each subnet individually evaluates the time series and learns the pattern of the price series of a single asset.

However, the network parameters are shared among the subnets.

Three variants of the policy network: CNN, RNN, LSTM

Softmax
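The EIIE idea (one identical evaluator per asset, shared parameters, scores joined by a softmax) can be sketched in a few lines. This is a toy illustration under stated assumptions: a linear scorer stands in for the CNN/RNN/LSTM subnet, and the names `score_asset`, `eiie_weights` and the `cash_bias` term are hypothetical, not taken from the slides.

```python
import math
from typing import List


def score_asset(window: List[float], shared_weights: List[float], bias: float) -> float:
    """Identical evaluator: every asset is scored with the same parameters."""
    return sum(w * x for w, x in zip(shared_weights, window)) + bias


def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def eiie_weights(windows: List[List[float]], shared_weights: List[float],
                 bias: float, cash_bias: float = 0.0) -> List[float]:
    """Score each asset independently, then join the scores in one softmax.

    Index 0 of the result is the cash (BTC) position; the rest follow `windows`.
    """
    scores = [cash_bias] + [score_asset(w, shared_weights, bias) for w in windows]
    return softmax(scores)


windows = [
    [1.00, 1.01, 1.02, 1.03],  # rising asset
    [1.00, 0.99, 0.98, 0.97],  # falling asset
    [1.00, 1.01, 1.02, 1.03],  # same pattern as the first asset
]
shared = [-1.0, 0.0, 0.0, 1.0]  # toy parameters: last price minus first price
portfolio = eiie_weights(windows, shared, bias=0.0)
```

Because the parameters are shared, two assets with the same price pattern receive the same weight, and the softmax keeps the weights positive and summing to one.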


Model & Algorithm

Convolutional Neural Network (CNN)

Model & Algorithm

Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM)

Model & Algorithm

CNN implementation

Model & Algorithm

RNN & LSTM implementation

Model & Algorithm

Optimization Algorithm

Online Stochastic Batch Learning

Update

Weight Matrix
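One possible reading of Online Stochastic Batch Learning is that mini-batches are time-ordered windows whose starting point is sampled with a preference for recent data. The sketch below assumes a geometric distribution over how far back the batch starts; the function name, the `beta` parameter and the resampling used to stay in range are illustrative assumptions, not details given on the slides.

```python
import random


def sample_batch_start(n_periods: int, batch_size: int, beta: float,
                       rng: random.Random) -> int:
    """Sample the start index of a batch covering [start, start + batch_size).

    The gap between the most recent possible start and the chosen start
    follows P(gap = k) = beta * (1 - beta) ** k, so recent batches are favoured.
    Draws that fall before index 0 are rejected and redrawn.
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    max_gap = n_periods - batch_size
    if max_gap < 0:
        raise ValueError("not enough periods for one batch")
    while True:
        gap = 0
        while rng.random() >= beta:  # continue with probability 1 - beta
            gap += 1
        if gap <= max_gap:
            return max_gap - gap


rng = random.Random(0)  # fixed seed so the sketch is repeatable
starts = [sample_batch_start(1000, 50, 0.1, rng) for _ in range(1000)]
mean_start = sum(starts) / len(starts)
```

With `beta = 0.1` the average gap is about 9 periods, so most sampled batches sit close to the latest data while older periods are still visited occasionally.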

Results

All three models were able to achieve at least 4-fold returns in 50 days (green boxes)

What they did not say in the article (red boxes)

Results

Performance

Results

Interpretation of low Sharpe ratio

Highly correlated assets

Overfitting the data

Volatility not in the reward function
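To make the last point concrete, the sketch below compares a return-only reward with the Sharpe ratio on two made-up return series. It assumes the reward is the average logarithmic return per period and uses a per-period, unannualized Sharpe ratio with a zero risk-free rate; the numbers are invented for illustration and are not results from the paper or the reproduction.

```python
import math
from statistics import mean, stdev
from typing import List


def average_log_return(period_returns: List[float]) -> float:
    """Return-only reward: mean of log(1 + r). Volatility does not enter."""
    return mean(math.log(1.0 + r) for r in period_returns)


def sharpe_ratio(period_returns: List[float], risk_free: float = 0.0) -> float:
    """Per-period Sharpe ratio: mean excess return over its sample deviation."""
    excess = [r - risk_free for r in period_returns]
    spread = stdev(excess)
    if spread == 0:
        raise ValueError("Sharpe ratio is undefined for constant returns")
    return mean(excess) / spread


steady = [0.010, 0.012, 0.009, 0.011]   # small, stable gains
volatile = [0.30, -0.20, 0.25, -0.15]   # larger gains with big swings

reward_steady = average_log_return(steady)
reward_volatile = average_log_return(volatile)
sharpe_steady = sharpe_ratio(steady)
sharpe_volatile = sharpe_ratio(volatile)
```

The volatile series earns the higher reward but the much lower Sharpe ratio, which is the pattern one would expect from an agent trained on returns alone.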

Steps of Reproduction: Download the data

Poloniex API: https://docs.poloniex.com/#returnchartdata
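A request for 30-minute candles over the data period could be built as shown below. This is a sketch under assumptions: the base URL is the legacy public endpoint as I recall it and may no longer be served, `BTC_ETH` is only an example pair, and `chart_data_url` is a hypothetical helper. The code only constructs the URL and does not contact the exchange.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Assumption: legacy public endpoint of the Poloniex API.
BASE_URL = "https://poloniex.com/public"


def chart_data_url(pair: str, start: datetime, end: datetime,
                   period_seconds: int = 1800) -> str:
    """Build a returnChartData query; 1800 seconds is one 30-minute period."""
    query = urlencode({
        "command": "returnChartData",
        "currencyPair": pair,
        "start": int(start.timestamp()),  # UNIX timestamps in seconds
        "end": int(end.timestamp()),
        "period": period_seconds,
    })
    return f"{BASE_URL}?{query}"


start = datetime(2014, 7, 1, tzinfo=timezone.utc)
end = datetime(2017, 4, 27, tzinfo=timezone.utc)
url = chart_data_url("BTC_ETH", start, end)
```

The resulting URL can be passed to any HTTP client; one request per currency pair would be needed to cover all assets.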

Reproduction Result: Train the model

The authors provide the source code on GitHub: https://github.com/ZhengyaoJiang/PGPortfolio

Built with TensorFlow

Reproduction Result: YES and NO

Their result (left) and my reproduction (right).

The random seed used by the authors is not known.

Results are unstable with respect to the hyperparameters.


Reproduction Result: Extend the backtest

Extend the period (the green line).

Performance is poor when the backtest is extended to August 2018.

Reflection & Further Research

Conceptually hard to match price patterns. (Autonomous driving vs. technical trading)

Alternative data.

Modify reward function to incorporate risk.

Try an A3C model and share the weight matrix entirely.

Incorporate more prior knowledge.

And always… MORE DATA.
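As an illustration of the "modify reward function to incorporate risk" item, one simple option is to subtract a variance penalty from the average log return. This is only one of many possible designs; the function name and the `risk_aversion` parameter are hypothetical, and the return series are invented.

```python
import math
from statistics import mean, pvariance
from typing import List


def risk_adjusted_reward(period_returns: List[float], risk_aversion: float = 1.0) -> float:
    """Mean log return minus a penalty proportional to its variance.

    risk_aversion = 0 recovers a pure return-based reward.
    """
    log_returns = [math.log(1.0 + r) for r in period_returns]
    return mean(log_returns) - risk_aversion * pvariance(log_returns)


steady = [0.010, 0.012, 0.009, 0.011]   # small, stable gains
volatile = [0.30, -0.20, 0.25, -0.15]   # larger gains with big swings

plain_steady = risk_adjusted_reward(steady, risk_aversion=0.0)
plain_volatile = risk_adjusted_reward(volatile, risk_aversion=0.0)
adjusted_steady = risk_adjusted_reward(steady, risk_aversion=1.0)
adjusted_volatile = risk_adjusted_reward(volatile, risk_aversion=1.0)
```

Without the penalty the volatile series is preferred; with `risk_aversion = 1.0` the ranking flips in favour of the steady series.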

You can teach a child to drive a car; however, you can find many Harvard and MIT graduates on a trading floor. The conceptual difficulty is not on the same level.

In theory you can approximate any function if given enough data.


Summary

Interesting to apply reinforcement learning in Computational Finance.

Novel approach of EIIE.

The overfitting issue needs to be addressed.

Q&A

Questions?