Dynamic Portfolio Management with Deep Reinforcement Learning (Portfolio of Fidelity® Funds)

DRL is used to provide dynamic asset allocation for a portfolio

Investment Industry
Reinforcement Learning
A2C
PPO
OpenAI Gym
finrl
stable baselines3
pyfolio
Author

Kobus Esterhuysen

Published

November 19, 2021

1. Problem: Dynamic Asset Allocation

Asset allocation refers to the partitioning of available funds in a portfolio among various investment products. Investment products may be categorized in multiple ways. One common (and high-level) classification contains three classes:

  • Stocks
  • Bonds (fixed income)
  • Cash and cash equivalents

Other classification systems may divide each of these categories into subclasses, for example stocks into US and international stocks, or into small-cap, mid-cap, and large-cap stocks.

Another aspect of asset allocation concerns the underlying strategy. Some common strategies are:

  • Tactical asset allocation
  • Strategic asset allocation
  • Constant weight asset allocation
  • Integrated asset allocation
  • Insured asset allocation
  • Dynamic asset allocation

We will focus on the last of these strategies: dynamic asset allocation. With this approach the mix of assets is adjusted at regular intervals to capitalize on the strengthening and weakening of the economy and the rise and fall of markets. This strategy traditionally depends on the judgment of a portfolio manager. In this project, however, we will attempt to replace the portfolio manager with an AI agent trained on past market behavior by means of Deep Reinforcement Learning.

For this Proof-Of-Concept (POC) project we will keep things as simple as possible. The portfolio will consist of 9 Fidelity mutual funds. We will use the DJIA as a handy reference against which the performance of our agent’s dynamic behavior can be compared. Our agent will have the opportunity to adjust the mix of these funds on a daily basis. In practice, a 401(k) agent, for example, might be set up to make monthly adjustments to reduce transaction costs or to avoid trading-related constraints. For simplicity, trading costs will not be taken into account for now.

We have selected one mutual fund from each of the following Fidelity fund categories:

  • Large Value
    • Fidelity® Blue Chip Value Fund (FBCVX)
  • Small/Mid Value
    • Fidelity® Mid-Cap Value Fund (FSMVX)
  • Income-Oriented
    • Fidelity® Dividend Growth Fund (FDGFX)
  • Large Blend
    • Fidelity® US Low Volatility Equity Fund (FULVX)
  • Small/Mid Blend
    • Fidelity® Stock Selector Small-Cap Fund (FDSCX)
  • Go-Anywhere
    • Fidelity® Capital Appreciation Fund (FDCAX)
  • Large Growth
    • Fidelity® Growth Discovery Fund (FDSVX)
  • Small/Mid Growth
    • Fidelity® Small-Cap Growth Fund (FCPGX)
  • Diversifiers
    • Fidelity® Founders Fund (FIFNX)

Our agent will have $1,000,000 to invest when the project starts.

To implement this POC we will make use of the FinRL framework, as well as Yahoo Finance (via the yfinance downloader) to acquire financial data.

2. Solution Proposal

Investment decisions are sequential by nature. Furthermore, a decision that looks optimal in the present may turn out not to be optimal over the longer term. Then there are the complexities of the investment landscape: varying market conditions, disruptive political events, and other economic uncertainties. Reinforcement Learning is well suited to this kind of sequential decision problem.

The solution requires the setup of a digital twin of the investor’s portfolio. In RL terms this model of the portfolio is called an environment. The environment maintains a state, which is modified by applying actions to it.
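To make this interface concrete, the sketch below shows the bare Gym contract such an environment must satisfy. This is not the FinRL class used later (StockPortfolioEnv); it is a minimal, assumed skeleton with placeholder transition logic.

import gym
import numpy as np

class PortfolioEnvSketch(gym.Env):
    """Minimal sketch of the environment contract; not the actual FinRL class."""

    def __init__(self, n_assets=9, initial_amount=1_000_000):
        self.n_assets = n_assets
        self.initial_amount = initial_amount
        # one action per fund: the fraction of the portfolio to allocate this cycle
        self.action_space = gym.spaces.Box(low=0.0, high=1.0, shape=(n_assets,))
        # one state component per fund: the current value of the holdings
        self.observation_space = gym.spaces.Box(low=0.0, high=np.inf, shape=(n_assets,))

    def reset(self):
        self.state = np.full(self.n_assets, self.initial_amount / self.n_assets)
        return self.state

    def step(self, action):
        # placeholder transition: apply the allocation, advance one trading day,
        # revalue the holdings, and compute the reward
        reward, done, info = 0.0, False, {}
        return self.state, reward, done, info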

We will choose the following state vector (measured daily in our case) for the environment:

\[ \Large \begin{aligned} s_1 &= \text{Value of FBCVX holdings} \\ s_2 &= \text{Value of FSMVX holdings} \\ s_3 &= \text{Value of FDGFX holdings} \\ \text{...} \\ s_8 &= \text{Value of FCPGX holdings} \\ s_9 &= \text{Value of FIFNX holdings} \end{aligned} \]

The following action vector (applied daily in our case) will be set up to influence the environment/portfolio:

\[ \Large \begin{aligned} a_1 &= \text{Fraction to be invested in FBCVX this cycle} \\ a_2 &= \text{Fraction to be invested in FSMVX this cycle} \\ a_3 &= \text{Fraction to be invested in FDGFX this cycle} \\ \text{...} \\ a_8 &= \text{Fraction to be invested in FCPGX this cycle} \\ a_9 &= \text{Fraction to be invested in FIFNX this cycle} \end{aligned} \]

All action values are in the [0, 1] interval.
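Note that a raw policy output is not guaranteed to respect this interval or to sum to one, so the action is normalized into proper portfolio weights before being applied. A minimal sketch of such a normalization using a softmax (FinRL's portfolio environment does something along these lines):

import numpy as np

def actions_to_weights(raw_actions):
    """Map raw policy outputs to non-negative weights that sum to 1 (softmax)."""
    e = np.exp(raw_actions - np.max(raw_actions))  # subtract max for numerical stability
    return e / e.sum()

# example: 9 raw action values -> 9 portfolio weights
weights = actions_to_weights(np.array([0.3, -1.2, 0.7, 0.0, 0.1, 0.5, -0.4, 0.2, 0.9]))
print(weights.sum())  # 1.0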

The reward r is given by

\[ \Large \begin{aligned} r(s,a,s') &= \log(v'/v) \end{aligned} \]

where \(v\) and \(v'\) are the portfolio values at states \(s\) and \(s'\) respectively.
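As a quick numerical check of this definition: a portfolio that grows from $1,000,000 to $1,010,000 in one step earns a reward of log(1.01) ≈ 0.00995, while a drop of the same size yields roughly the negative of that.

import numpy as np

def step_reward(v, v_next):
    """Log return of the portfolio value over one step."""
    return np.log(v_next / v)

print(step_reward(1_000_000, 1_010_000))  # ~ 0.00995 (gain)
print(step_reward(1_000_000,   990_000))  # ~ -0.01005 (loss)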

The model of the portfolio/environment will have the following parameters:

\[ \Large \begin{aligned} \theta_1 &= \text{Initial Amount} \\ \theta_2 &= \text{Transaction Cost Pct} \\ \theta_3 &= \text{Size of State and Action Spaces} \\ \theta_4 &= \text{Reward Scaling} \\ \theta_5 &= \text{Technical Indicator List} \end{aligned} \]
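In practice these parameters are collected into a keyword-argument dictionary that is handed to the environment constructor. The sketch below is illustrative only; the key names are assumptions in the style of FinRL's StockPortfolioEnv and may differ slightly between FinRL versions.

# illustrative parameter dictionary (key names assumed, in the style of StockPortfolioEnv)
stock_dimension = 9                       # number of funds in the portfolio
env_kwargs = {
    "initial_amount": 1_000_000,          # theta_1
    "transaction_cost_pct": 0.0,          # theta_2
    "stock_dim": stock_dimension,         # theta_3: size of the state/action spaces
    "state_space": stock_dimension,
    "action_space": stock_dimension,
    "reward_scaling": 1e-1,               # theta_4
    "tech_indicator_list": ["macd", "rsi_30", "cci_30", "dx_30"],  # theta_5
}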

3. Implementation of the Solution

To implement the environment we use the OpenAI Gym tooling. The agent is implemented with the FinRL Python library, which wraps stable-baselines3. Two algorithms will be investigated for the agent’s policy:

  • Advantage Actor-Critic (A2C)
  • Proximal Policy Optimization (PPO)

This implementation allows the agent to allocate the investor’s funds among the 9 selected Fidelity funds. The goal is to maximize net worth at the end of the investment horizon.
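Both algorithms are available through stable-baselines3, which FinRL's DRLAgent class wraps. As a hedged sketch (bypassing the wrapper), they could be instantiated directly against a Gym environment; here env is assumed to be the portfolio environment constructed later in the notebook.

from stable_baselines3 import A2C, PPO

# env is assumed to be the (vectorized) portfolio environment built later in this notebook
model_a2c = A2C("MlpPolicy", env, verbose=0)
model_ppo = PPO("MlpPolicy", env, verbose=0)

model_a2c.learn(total_timesteps=10_000)  # short training run, for illustration only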

The plotly library will be used for visualization. The code will run on the Google Colab platform. To start with, we install the Python packages needed.

# hide
# install plotly and finrl library
!pip install plotly==4.4.1
!wget https://github.com/plotly/orca/releases/download/v1.2.1/orca-1.2.1-x86_64.AppImage -O /usr/local/bin/orca
!chmod +x /usr/local/bin/orca
!apt-get install xvfb libgtk2.0-0 libgconf-2-4
!pip install git+https://github.com/AI4Finance-LLC/FinRL-Library.git
!pip install PyPortfolioOpt
(Installation output trimmed. The run completes successfully, installing finrl 0.3.3, elegantrl 0.3.2, pyfolio 0.9.2, stable-baselines3 1.3.0, yfinance 0.1.66, PyPortfolioOpt 1.5.1, and cvxpy 1.1.17, among other dependencies.)

Import the packages needed:

import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
matplotlib.use('Agg')
%matplotlib inline
import datetime
from finrl.apps import config
from finrl.neo_finrl.preprocessor.yahoodownloader import YahooDownloader
from finrl.neo_finrl.preprocessor.preprocessors import FeatureEngineer, data_split
from finrl.neo_finrl.env_portfolio_allocation.env_portfolio import StockPortfolioEnv
from finrl.drl_agents.stablebaselines3.models import DRLAgent
from finrl.plot import backtest_stats, backtest_plot, get_daily_return, get_baseline,convert_daily_return_to_pyfolio_ts

import gym
from gym.utils import seeding
from gym import spaces
from stable_baselines3.common.vec_env import DummyVecEnv

from pyfolio import timeseries
import plotly
import plotly.graph_objs as go
/usr/local/lib/python3.7/dist-packages/pyfolio/pos.py:27: UserWarning: Module "zipline.assets" not found; multipliers will not be applied to position notionals.
  'Module "zipline.assets" not found; multipliers will not be applied'
import sys
sys.path.append("../FinRL-Library")
# hide
pd.set_option('display.max_rows', 100)

Set up some directories:

import os
if not os.path.exists("./" + config.DATA_SAVE_DIR):
    os.makedirs("./" + config.DATA_SAVE_DIR)
if not os.path.exists("./" + config.TRAINED_MODEL_DIR):
    os.makedirs("./" + config.TRAINED_MODEL_DIR)
if not os.path.exists("./" + config.TENSORBOARD_LOG_DIR):
    os.makedirs("./" + config.TENSORBOARD_LOG_DIR)
if not os.path.exists("./" + config.RESULTS_DIR):
    os.makedirs("./" + config.RESULTS_DIR)

3.0 Parameters

DATA_START = '2008-01-01' 
TRAIN_START = '2009-01-01'
TRADE_START = '2020-07-01'
DATA_END = '2021-09-01'
LOOKBACK = 252 #trading days in one year
INITIAL_AMOUNT = 1_000_000 #dollars
TRANSACTION_COST_PCT = 0
REWARD_SCALING = 1e-1 #scaling factor applied to the reward signal

3.1 Download Data

We use daily price data from Yahoo Finance.

# hide
config
<module 'finrl.apps.config' from '/usr/local/lib/python3.7/dist-packages/finrl/apps/config.py'>
# hide
config.DOW_30_TICKER
['AXP',
 'AMGN',
 'AAPL',
 'BA',
 'CAT',
 'CSCO',
 'CVX',
 'GS',
 'HD',
 'HON',
 'IBM',
 'INTC',
 'JNJ',
 'KO',
 'JPM',
 'MCD',
 'MMM',
 'MRK',
 'MSFT',
 'NKE',
 'PG',
 'TRV',
 'UNH',
 'CRM',
 'VZ',
 'V',
 'WBA',
 'WMT',
 'DIS',
 'DOW']
# https://www.fidelity.com/mutual-funds/fidelity-funds/overview
my_tickers = [
  # Large Value
  'FBCVX', #Fidelity® Blue Chip Value Fund
  # 'FLVEX', #Fidelity® Large-Cap Value Enhanced Index Fund
  # 'FSLVX', #Fidelity® Stock Selector Large-Cap Value Fund
  # 'FVDFX', #Fidelity® Value Discovery Fund

  # Small/Mid Value
  # 'FLPSX', #Fidelity® Low-Priced Stock Fund
  'FSMVX', #Fidelity® Mid-Cap Value Fund
  # 'FDVLX', #Fidelity® Value Fund
  # 'FSLSX', #Fidelity® Value Strategies Fund
  # 'FCPVX', #Fidelity® Small-Cap Value Fund

  # Income-Oriented
  # 'FEQTX', #Fidelity® Equity Dividend Income Fund
  # 'FEQIX', #Fidelity® Equity-Income Fund
  # 'FGRIX', #Fidelity® Growth & Income Portfolio Fund
  'FDGFX', #Fidelity® Dividend Growth Fund

  # Large Blend
  # 'FSEBX', #Fidelity® Sustainability U.S. Equity Fund, NEW
  'FULVX', #Fidelity® US Low Volatility Equity Fund
  # 'FDEQX', #Fidelity® Disciplined Equity Fund
  # 'FLCEX', #Fidelity® Large-Cap Core Enhanced Index Fund
  # 'FLCSX', #Fidelity® Large-Cap Stock Fund
  # 'FGRTX', #Fidelity® Mega-Cap Stock Fund

  # Small/Mid Blend
  # 'FMEIX', #Fidelity® Mid-Cap Enhanced Index Fund
  # 'FCPEX', #Fidelity® Small-Cap Enhanced Index Fund
  # 'FSLCX', #Fidelity® Small-Cap Stock Fund
  'FDSCX', #Fidelity® Stock Selector Small-Cap Fund
  # 'FSCRX', #Fidelity® Small-Cap Discovery Fund

  # Go-Anywhere
  'FDCAX', #Fidelity® Capital Appreciation Fund
  # 'FCNTX', #Fidelity® Contrafund®
  # 'FMAGX', #Fidelity® Magellan® Fund
  # 'FMILX', #Fidelity® New Millennium Fund

  # Large Growth
  # 'FBGRX', #Fidelity® Blue Chip Growth Fund
  # 'FEXPX', #Fidelity® Export & Multinational Fund
  # 'FTQGX', #Fidelity® Focused Stock Fund
  # 'FFIDX', #Fidelity® Fund
  'FDSVX', #Fidelity® Growth Discovery Fund
  # 'FDGRX', #Fidelity® Growth Company Fund
  # 'FLGEX', #Fidelity® Large-Cap Growth Enhanced Index Fund
  # 'FOCPX', #Fidelity® OTC Portfolio
  # 'FDSSX', #Fidelity® Stock Selector All Cap Fund
  # 'FTRNX', #Fidelity® Trend Fund

  # Small/Mid Growth
  # 'FDEGX', #Fidelity® Growth Strategies Fund
  # 'FMCSX', #Fidelity® Mid-Cap Stock Fund
  # 'FSSMX', #Fidelity® Stock Selector Mid-Cap Fund
  'FCPGX', #Fidelity® Small-Cap Growth Fund

  # Diversifiers
  # 'FWOMX', #Fidelity® Women's Leadership Fund
  'FIFNX', #Fidelity® Founders Fund
  # 'FLVCX', #Fidelity® Leveraged Company Stock Fund
]
df = YahooDownloader(start_date = DATA_START,
                     end_date = DATA_END,
                     # ticker_list = config.DOW_30_TICKER).fetch_data()
                     ticker_list = my_tickers).fetch_data()
(yfinance progress bars for the nine individual ticker downloads omitted)
Shape of DataFrame:  (25183, 8)
df
date open high low close volume tic day
0 2008-01-02 14.420000 14.420000 14.420000 11.721002 0 FBCVX 2
1 2008-01-02 15.660000 15.660000 15.660000 6.663240 0 FCPGX 2
2 2008-01-02 26.370001 26.370001 26.370001 11.040798 0 FDCAX 2
3 2008-01-02 28.930000 28.930000 28.930000 10.950327 0 FDGFX 2
4 2008-01-02 19.670000 19.670000 19.670000 11.192782 0 FDSCX 2
... ... ... ... ... ... ... ... ...
25178 2021-08-31 36.700001 36.700001 36.700001 36.700001 0 FDSCX 1
25179 2021-08-31 56.790001 56.790001 56.790001 56.790001 0 FDSVX 1
25180 2021-08-31 19.110001 19.110001 19.110001 19.110001 0 FIFNX 1
25181 2021-08-31 29.379999 29.379999 29.379999 29.379999 0 FSMVX 1
25182 2021-08-31 12.160000 12.160000 12.160000 12.160000 0 FULVX 1

25183 rows × 8 columns

# hide
print(df['day'].unique())
# df.loc[100:150, 'day'] #assume day-of-week: 0 - 4
[2 3 4 0 1]
# 
# Verify the 9 unique tickers
# https://www.investopedia.com/ask/answers/who-or-what-is-dow-jones/
lst = list(df['tic'].unique())
print(len(lst), lst)
9 ['FBCVX', 'FCPGX', 'FDCAX', 'FDGFX', 'FDSCX', 'FDSVX', 'FSMVX', 'FIFNX', 'FULVX']

3.2 Data Understanding and Preparation

We will keep showing snippets of the data set as it evolves to aid understanding. We need to check for missing data and to do some feature engineering; we rely on FinRL’s FeatureEngineer class for both. Some of the technical indicators used are:

  • Moving Average Convergence Divergence (MACD)

The MACD is primarily used to gauge the strength of a stock’s price movement. It does this by measuring the divergence of two exponential moving averages (EMAs), commonly a 12-period EMA and a 26-period EMA (a small pandas sketch follows after this list).

  • Relative Strength Index (RSI)

The RSI aims to indicate whether a market is considered to be overbought or oversold in relation to recent price levels.

  • Commodity Channel Index (CCI)

The CCI is a momentum-based oscillator used to help determine when an investment vehicle is reaching an overbought or oversold condition.

FinRL also provides a financial turbulence index, which measures extreme asset price fluctuations (we do not use it here, since use_turbulence=False is passed to the FeatureEngineer below).
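As a concrete example of the first of these indicators, here is a minimal pandas sketch of the MACD line. The indicators in this notebook actually come from the stockstats library via FeatureEngineer; the function and column names below are assumptions for illustration.

import pandas as pd

def macd_line(close: pd.Series, fast: int = 12, slow: int = 26) -> pd.Series:
    """MACD line: fast EMA of the close price minus slow EMA of the close price."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    return ema_fast - ema_slow

# e.g. macd_line(df[df['tic'] == 'FBCVX'].set_index('date')['close'])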

3.2.1 Add technical indicators

%%time
fe = FeatureEngineer(
  use_technical_indicator=True,
  use_turbulence=False,
  user_defined_feature=False)
df = fe.preprocess_data(df)
Successfully added technical indicators
CPU times: user 15.6 s, sys: 1.76 s, total: 17.4 s
Wall time: 15.7 s
# df.head(100)
df
date open high low close volume tic day macd boll_ub boll_lb rsi_30 cci_30 dx_30 close_30_sma close_60_sma
0 2008-01-02 14.420000 14.420000 14.420000 11.721002 0 FBCVX 2 0.000000 11.743293 11.674326 0.000000 -66.666667 100.000000 11.721002 11.721002
3441 2008-01-02 15.660000 15.660000 15.660000 6.663240 0 FCPGX 2 0.000000 11.743293 11.674326 0.000000 -66.666667 100.000000 6.663240 6.663240
6882 2008-01-02 26.370001 26.370001 26.370001 11.040798 0 FDCAX 2 0.000000 11.743293 11.674326 0.000000 -66.666667 100.000000 11.040798 11.040798
10323 2008-01-02 28.930000 28.930000 28.930000 10.950327 0 FDGFX 2 0.000000 11.743293 11.674326 0.000000 -66.666667 100.000000 10.950327 10.950327
13764 2008-01-02 19.670000 19.670000 19.670000 11.192782 0 FDSCX 2 0.000000 11.743293 11.674326 0.000000 -66.666667 100.000000 11.192782 11.192782
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
10322 2021-08-31 50.099998 50.099998 50.099998 50.099998 0 FDCAX 1 0.485560 50.233594 47.837405 63.290963 186.262675 33.935129 48.798333 48.066166
13763 2021-08-31 37.290001 37.290001 37.290001 35.175362 0 FDGFX 1 0.160866 35.429981 34.401928 56.769455 108.062182 14.914749 34.818795 34.498705
17204 2021-08-31 36.700001 36.700001 36.700001 36.700001 0 FDSCX 1 0.355394 36.956232 34.422768 56.811784 158.442705 22.592461 35.431666 35.356833
20645 2021-08-31 56.790001 56.790001 56.790001 56.790001 0 FDSVX 1 0.616046 56.891673 53.754767 63.869985 13.372480 1.499019 55.064085 54.107896
24086 2021-08-31 29.379999 29.379999 29.379999 29.379999 0 FSMVX 1 0.269141 29.717076 28.292924 56.730943 100.071603 17.984678 28.728666 28.399833

24087 rows × 16 columns

We see that the FeatureEngineer has added some features:

  • macd
  • boll_ub (upper Bollinger Band)
  • boll_lb (lower Bollinger Band)
  • rsi_30 (with a lookback of 30)
  • cci_30 (with a lookback of 30)
  • dx_30 (with a lookback of 30)
  • close_30_sma (close price simple moving average with a lookback of 30)
  • close_60_sma (close price simple moving average with a lookback of 60)
# hide
# on stockstats library: (seems like FeatureEngineer makes use of it)
# https://medium.com/codex/this-python-library-will-help-you-get-stock-technical-indicators-in-one-line-of-code-c11ed2c8e45f

3.2.2 Add covariance matrix as a feature

Adding the covariance matrix of the funds’ returns as a feature has some advantages. In particular, it can be used to quantify the risk (standard deviation) associated with any given set of portfolio weights.
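To illustrate: given a weight vector w and the covariance matrix Σ of asset returns, the portfolio variance is wᵀΣw and the risk is its square root. Below is a toy sketch with synthetic returns and a hypothetical equal-weight portfolio; the real covariance matrices are computed from the trailing year of fund returns further down.

import numpy as np

rng = np.random.default_rng(0)
toy_returns = rng.normal(0.0, 0.01, size=(252, 9))  # synthetic daily returns for 9 funds
cov = np.cov(toy_returns, rowvar=False)             # 9 x 9 covariance matrix

w = np.full(9, 1 / 9)                               # hypothetical equal-weight portfolio
portfolio_variance = w @ cov @ w                    # w^T . Sigma . w
portfolio_risk = np.sqrt(portfolio_variance)        # standard deviation (risk)
print(portfolio_risk)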

df = df.sort_values(['date','tic'], ignore_index=True)
df
date open high low close volume tic day macd boll_ub boll_lb rsi_30 cci_30 dx_30 close_30_sma close_60_sma
0 2008-01-02 14.420000 14.420000 14.420000 11.721002 0 FBCVX 2 0.000000 11.743293 11.674326 0.000000 -66.666667 100.000000 11.721002 11.721002
1 2008-01-02 15.660000 15.660000 15.660000 6.663240 0 FCPGX 2 0.000000 11.743293 11.674326 0.000000 -66.666667 100.000000 6.663240 6.663240
2 2008-01-02 26.370001 26.370001 26.370001 11.040798 0 FDCAX 2 0.000000 11.743293 11.674326 0.000000 -66.666667 100.000000 11.040798 11.040798
3 2008-01-02 28.930000 28.930000 28.930000 10.950327 0 FDGFX 2 0.000000 11.743293 11.674326 0.000000 -66.666667 100.000000 10.950327 10.950327
4 2008-01-02 19.670000 19.670000 19.670000 11.192782 0 FDSCX 2 0.000000 11.743293 11.674326 0.000000 -66.666667 100.000000 11.192782 11.192782
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
24082 2021-08-31 50.099998 50.099998 50.099998 50.099998 0 FDCAX 1 0.485560 50.233594 47.837405 63.290963 186.262675 33.935129 48.798333 48.066166
24083 2021-08-31 37.290001 37.290001 37.290001 35.175362 0 FDGFX 1 0.160866 35.429981 34.401928 56.769455 108.062182 14.914749 34.818795 34.498705
24084 2021-08-31 36.700001 36.700001 36.700001 36.700001 0 FDSCX 1 0.355394 36.956232 34.422768 56.811784 158.442705 22.592461 35.431666 35.356833
24085 2021-08-31 56.790001 56.790001 56.790001 56.790001 0 FDSVX 1 0.616046 56.891673 53.754767 63.869985 13.372480 1.499019 55.064085 54.107896
24086 2021-08-31 29.379999 29.379999 29.379999 29.379999 0 FSMVX 1 0.269141 29.717076 28.292924 56.730943 100.071603 17.984678 28.728666 28.399833

24087 rows × 16 columns

df.index = df.date.factorize()[0] #now each unique date maps to a single integer index value
df
date open high low close volume tic day macd boll_ub boll_lb rsi_30 cci_30 dx_30 close_30_sma close_60_sma
0 2008-01-02 14.420000 14.420000 14.420000 11.721002 0 FBCVX 2 0.000000 11.743293 11.674326 0.000000 -66.666667 100.000000 11.721002 11.721002
0 2008-01-02 15.660000 15.660000 15.660000 6.663240 0 FCPGX 2 0.000000 11.743293 11.674326 0.000000 -66.666667 100.000000 6.663240 6.663240
0 2008-01-02 26.370001 26.370001 26.370001 11.040798 0 FDCAX 2 0.000000 11.743293 11.674326 0.000000 -66.666667 100.000000 11.040798 11.040798
0 2008-01-02 28.930000 28.930000 28.930000 10.950327 0 FDGFX 2 0.000000 11.743293 11.674326 0.000000 -66.666667 100.000000 10.950327 10.950327
0 2008-01-02 19.670000 19.670000 19.670000 11.192782 0 FDSCX 2 0.000000 11.743293 11.674326 0.000000 -66.666667 100.000000 11.192782 11.192782
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3440 2021-08-31 50.099998 50.099998 50.099998 50.099998 0 FDCAX 1 0.485560 50.233594 47.837405 63.290963 186.262675 33.935129 48.798333 48.066166
3440 2021-08-31 37.290001 37.290001 37.290001 35.175362 0 FDGFX 1 0.160866 35.429981 34.401928 56.769455 108.062182 14.914749 34.818795 34.498705
3440 2021-08-31 36.700001 36.700001 36.700001 36.700001 0 FDSCX 1 0.355394 36.956232 34.422768 56.811784 158.442705 22.592461 35.431666 35.356833
3440 2021-08-31 56.790001 56.790001 56.790001 56.790001 0 FDSVX 1 0.616046 56.891673 53.754767 63.869985 13.372480 1.499019 55.064085 54.107896
3440 2021-08-31 29.379999 29.379999 29.379999 29.379999 0 FSMVX 1 0.269141 29.717076 28.292924 56.730943 100.071603 17.984678 28.728666 28.399833

24087 rows × 16 columns

# hide
# len(df.index.unique())
# range(lookback, len(df.index.unique()))
lst = [i for i in range(LOOKBACK, len(df.index.unique()))]
lst[:20], lst[-1]
([252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271], 3440)
%%time
cov_list = []    #one covariance matrix of fund returns per trading day
return_list = [] #one frame of trailing one-year daily returns per trading day
for i in range(LOOKBACK, len(df.index.unique())):
  data_lookback = df.loc[i-LOOKBACK:i, :] #rows for the trailing LOOKBACK trading days
  price_lookback = data_lookback.pivot_table(index='date', columns='tic', values='close')
  return_lookback = price_lookback.pct_change().dropna() #daily returns per fund
  return_list.append(return_lookback)
  covs = return_lookback.cov().values #9x9 covariance matrix of the daily returns
  cov_list.append(covs)
CPU times: user 43.9 s, sys: 581 ms, total: 44.4 s
Wall time: 43.8 s
len(df['date'].unique()), len(df['date'].unique()[LOOKBACK:])
(3441, 3189)
len(cov_list), len(return_list)
(3189, 3189)
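As a consistency check (using the df, LOOKBACK, and cov_list objects defined above), the number of covariance matrices should equal the number of unique trading dates minus the 252-day lookback window, and each matrix should be 7 × 7:

# 3441 unique dates - 252 lookback days = 3189 covariance matrices
assert len(cov_list) == len(df['date'].unique()) - LOOKBACK
assert cov_list[0].shape == (len(df['tic'].unique()), len(df['tic'].unique()))  # 7 x 7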
# hide
# cov_list[0]
# return_list[:1]
# 
# form a dataframe with the cov_list and return_list
df_cov = pd.DataFrame({'date':df.date.unique()[LOOKBACK:], 'cov_list':cov_list, 'return_list':return_list})
df_cov
date cov_list return_list
0 2008-12-31 [[0.0008916783690050918, 0.0007199340713349982... tic FBCVX FCPGX FDCAX ... ...
1 2009-01-02 [[0.0008961331319844811, 0.0007229547753450831... tic FBCVX FCPGX FDCAX ... ...
2 2009-01-05 [[0.0008942290687322837, 0.0007206030648001816... tic FBCVX FCPGX FDCAX ... ...
3 2009-01-06 [[0.0008951474693283972, 0.0007217056039965993... tic FBCVX FCPGX FDCAX ... ...
4 2009-01-07 [[0.0008975149225319093, 0.0007241915142094865... tic FBCVX FCPGX FDCAX ... ...
... ... ... ...
3184 2021-08-25 [[8.139554607177268e-05, 7.076161408798078e-05... tic FBCVX FCPGX FDCAX ... ...
3185 2021-08-26 [[8.15840000747442e-05, 7.098311940863228e-05,... tic FBCVX FCPGX FDCAX ... ...
3186 2021-08-27 [[8.155510926610815e-05, 7.162918231818502e-05... tic FBCVX FCPGX FDCAX ... ...
3187 2021-08-30 [[8.162658991218995e-05, 7.155445795151861e-05... tic FBCVX FCPGX FDCAX ... ...
3188 2021-08-31 [[8.137781480231036e-05, 7.150181363650654e-05... tic FBCVX FCPGX FDCAX ... ...

3189 rows × 3 columns

# 
# merge df_cov with the main dataframe
df = df.merge(df_cov, on='date')
df
date open high low close volume tic day macd boll_ub boll_lb rsi_30 cci_30 dx_30 close_30_sma close_60_sma cov_list return_list
0 2008-12-31 7.900000 7.900000 7.900000 6.548186 0 FBCVX 2 -0.003032 6.625188 6.032323 48.240961 102.573711 6.437515 6.187183 6.453092 [[0.0008916783690050918, 0.0007199340713349982... tic FBCVX FCPGX FDCAX ... ...
1 2008-12-31 8.690000 8.690000 8.690000 3.697545 0 FCPGX 2 0.015371 3.687956 3.319084 48.638490 137.594713 11.647507 3.427356 3.594576 [[0.0008916783690050918, 0.0007199340713349982... tic FBCVX FCPGX FDCAX ... ...
2 2008-12-31 15.730000 15.730000 15.730000 6.669960 0 FDCAX 2 0.025542 6.761376 6.140492 48.908868 110.240063 8.794697 6.289111 6.487707 [[0.0008916783690050918, 0.0007199340713349982... tic FBCVX FCPGX FDCAX ... ...
3 2008-12-31 15.790000 15.790000 15.790000 6.350790 0 FDGFX 2 0.025277 6.407065 5.710326 48.573492 115.808269 8.665272 5.894904 6.192935 [[0.0008916783690050918, 0.0007199340713349982... tic FBCVX FCPGX FDCAX ... ...
4 2008-12-31 10.530000 10.530000 10.530000 6.006501 0 FDSCX 2 0.025076 6.001624 5.402930 48.564497 128.462223 10.677362 5.559625 5.839951 [[0.0008916783690050918, 0.0007199340713349982... tic FBCVX FCPGX FDCAX ... ...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
22318 2021-08-31 50.099998 50.099998 50.099998 50.099998 0 FDCAX 1 0.485560 50.233594 47.837405 63.290963 186.262675 33.935129 48.798333 48.066166 [[8.137781480231036e-05, 7.150181363650654e-05... tic FBCVX FCPGX FDCAX ... ...
22319 2021-08-31 37.290001 37.290001 37.290001 35.175362 0 FDGFX 1 0.160866 35.429981 34.401928 56.769455 108.062182 14.914749 34.818795 34.498705 [[8.137781480231036e-05, 7.150181363650654e-05... tic FBCVX FCPGX FDCAX ... ...
22320 2021-08-31 36.700001 36.700001 36.700001 36.700001 0 FDSCX 1 0.355394 36.956232 34.422768 56.811784 158.442705 22.592461 35.431666 35.356833 [[8.137781480231036e-05, 7.150181363650654e-05... tic FBCVX FCPGX FDCAX ... ...
22321 2021-08-31 56.790001 56.790001 56.790001 56.790001 0 FDSVX 1 0.616046 56.891673 53.754767 63.869985 13.372480 1.499019 55.064085 54.107896 [[8.137781480231036e-05, 7.150181363650654e-05... tic FBCVX FCPGX FDCAX ... ...
22322 2021-08-31 29.379999 29.379999 29.379999 29.379999 0 FSMVX 1 0.269141 29.717076 28.292924 56.730943 100.071603 17.984678 28.728666 28.399833 [[8.137781480231036e-05, 7.150181363650654e-05... tic FBCVX FCPGX FDCAX ... ...

22323 rows × 18 columns
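Because the merge on date is an inner join, the first 252 trading days (which have no covariance entry) are dropped: 3,189 dates × 7 funds = 22,323 rows, down from 3,441 × 7 = 24,087. A quick check against the objects defined above:

# every remaining date has one row per fund and one covariance entry
assert len(df) == len(df_cov) * len(df['tic'].unique())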

df = df.sort_values(['date','tic']).reset_index(drop=True)
df
date open high low close volume tic day macd boll_ub boll_lb rsi_30 cci_30 dx_30 close_30_sma close_60_sma cov_list return_list
0 2008-12-31 7.900000 7.900000 7.900000 6.548186 0 FBCVX 2 -0.003032 6.625188 6.032323 48.240961 102.573711 6.437515 6.187183 6.453092 [[0.0008916783690050918, 0.0007199340713349982... tic FBCVX FCPGX FDCAX ... ...
1 2008-12-31 8.690000 8.690000 8.690000 3.697545 0 FCPGX 2 0.015371 3.687956 3.319084 48.638490 137.594713 11.647507 3.427356 3.594576 [[0.0008916783690050918, 0.0007199340713349982... tic FBCVX FCPGX FDCAX ... ...
2 2008-12-31 15.730000 15.730000 15.730000 6.669960 0 FDCAX 2 0.025542 6.761376 6.140492 48.908868 110.240063 8.794697 6.289111 6.487707 [[0.0008916783690050918, 0.0007199340713349982... tic FBCVX FCPGX FDCAX ... ...
3 2008-12-31 15.790000 15.790000 15.790000 6.350790 0 FDGFX 2 0.025277 6.407065 5.710326 48.573492 115.808269 8.665272 5.894904 6.192935 [[0.0008916783690050918, 0.0007199340713349982... tic FBCVX FCPGX FDCAX ... ...
4 2008-12-31 10.530000 10.530000 10.530000 6.006501 0 FDSCX 2 0.025076 6.001624 5.402930 48.564497 128.462223 10.677362 5.559625 5.839951 [[0.0008916783690050918, 0.0007199340713349982... tic FBCVX FCPGX FDCAX ... ...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
22318 2021-08-31 50.099998 50.099998 50.099998 50.099998 0 FDCAX 1 0.485560 50.233594 47.837405 63.290963 186.262675 33.935129 48.798333 48.066166 [[8.137781480231036e-05, 7.150181363650654e-05... tic FBCVX FCPGX FDCAX ... ...
22319 2021-08-31 37.290001 37.290001 37.290001 35.175362 0 FDGFX 1 0.160866 35.429981 34.401928 56.769455 108.062182 14.914749 34.818795 34.498705 [[8.137781480231036e-05, 7.150181363650654e-05... tic FBCVX FCPGX FDCAX ... ...
22320 2021-08-31 36.700001 36.700001 36.700001 36.700001 0 FDSCX 1 0.355394 36.956232 34.422768 56.811784 158.442705 22.592461 35.431666 35.356833 [[8.137781480231036e-05, 7.150181363650654e-05... tic FBCVX FCPGX FDCAX ... ...
22321 2021-08-31 56.790001 56.790001 56.790001 56.790001 0 FDSVX 1 0.616046 56.891673 53.754767 63.869985 13.372480 1.499019 55.064085 54.107896 [[8.137781480231036e-05, 7.150181363650654e-05... tic FBCVX FCPGX FDCAX ... ...
22322 2021-08-31 29.379999 29.379999 29.379999 29.379999 0 FSMVX 1 0.269141 29.717076 28.292924 56.730943 100.071603 17.984678 28.728666 28.399833 [[8.137781480231036e-05, 7.150181363650654e-05... tic FBCVX FCPGX FDCAX ... ...

22323 rows × 18 columns

3.3 Modeling

The portfolio within the market is modeled with the OpenAI Gym framework; in RL terms this is the environment. The agent is built with the FinRL framework, which wraps Stable Baselines3. As the agent interacts with the environment it gradually learns an allocation strategy from the reward signal; the reward at each step is the total value of the portfolio.
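For orientation, the interaction between agent and environment follows the standard Gym loop sketched below. Stable Baselines3 runs the equivalent of this loop internally when we train the agent later, so the snippet is only illustrative; choose_action is a hypothetical stand-in for the agent's policy.

# Illustrative Gym interaction loop (Stable Baselines3 performs the
# equivalent of this internally during training).
obs = env.reset()
done = False
while not done:
    action = choose_action(obs)                  # hypothetical policy
    obs, reward, done, info = env.step(action)   # reward = portfolio value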

3.3.1 Training data

TRAIN_START, TRADE_START
('2009-01-01', '2020-07-01')
train = data_split(df, TRAIN_START, TRADE_START)
train
date open high low close volume tic day macd boll_ub boll_lb rsi_30 cci_30 dx_30 close_30_sma close_60_sma cov_list return_list
0 2009-01-02 8.150000 8.150000 8.150000 6.755406 0 FBCVX 4 0.030432 6.702175 6.010826 49.991970 152.551464 13.863211 6.209512 6.445367 [[0.0008961331319844811, 0.0007229547753450831... tic FBCVX FCPGX FDCAX ... ...
0 2009-01-02 8.870000 8.870000 8.870000 3.774134 0 FCPGX 4 0.032454 3.737249 3.305107 49.880034 162.366452 16.518011 3.442107 3.588619 [[0.0008961331319844811, 0.0007229547753450831... tic FBCVX FCPGX FDCAX ... ...
0 2009-01-02 16.280001 16.280001 16.280001 6.903177 0 FDCAX 4 0.058199 6.834997 6.136275 51.154864 170.139325 17.925843 6.316292 6.487900 [[0.0008961331319844811, 0.0007229547753450831... tic FBCVX FCPGX FDCAX ... ...
0 2009-01-02 16.370001 16.370001 16.370001 6.584068 0 FDGFX 4 0.060919 6.498272 5.695824 50.606719 165.116529 16.838559 5.921930 6.184977 [[0.0008961331319844811, 0.0007229547753450831... tic FBCVX FCPGX FDCAX ... ...
0 2009-01-02 10.740000 10.740000 10.740000 6.126287 0 FDSCX 4 0.052369 6.074467 5.390190 49.704302 150.989986 15.152331 5.582884 5.829957 [[0.0008961331319844811, 0.0007229547753450831... tic FBCVX FCPGX FDCAX ... ...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2892 2020-06-30 35.930000 35.930000 35.930000 33.073399 0 FDCAX 1 0.501019 33.705111 31.615998 57.598799 82.309417 17.475881 32.193710 30.489567 [[0.000538045795431353, 0.0004554362595919771,... tic FBCVX FCPGX FDCAX ... ...
2892 2020-06-30 25.700001 25.700001 25.700001 23.743460 0 FDGFX 1 0.034722 26.677363 22.296985 50.881596 -25.411227 0.081819 24.063118 22.812814 [[0.000538045795431353, 0.0004554362595919771,... tic FBCVX FCPGX FDCAX ... ...
2892 2020-06-30 22.870001 22.870001 22.870001 22.532381 0 FDSCX 1 0.174035 23.630273 21.395077 53.642964 31.031452 8.899931 22.311030 21.079976 [[0.000538045795431353, 0.0004554362595919771,... tic FBCVX FCPGX FDCAX ... ...
2892 2020-06-30 45.220001 45.220001 45.220001 37.346981 0 FDSVX 1 0.708226 37.957114 35.252713 59.415280 99.055416 21.233245 36.033532 33.922269 [[0.000538045795431353, 0.0004554362595919771,... tic FBCVX FCPGX FDCAX ... ...
2892 2020-06-30 18.670000 18.670000 18.670000 18.302086 0 FSMVX 1 0.013502 20.369239 17.161312 51.025374 -19.628866 1.400737 18.481807 17.546607 [[0.000538045795431353, 0.0004554362595919771,... tic FBCVX FCPGX FDCAX ... ...

20251 rows × 18 columns
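FinRL's data_split selects the rows whose dates fall in the requested interval and re-factorizes the index so that the training set again starts at day 0. The sketch below paraphrases what it does; see the installed FinRL version for the exact implementation.

# Rough sketch of FinRL's data_split (paraphrased, not the library source).
def data_split_sketch(df, start, end):
    data = df[(df.date >= start) & (df.date < end)]
    data = data.sort_values(['date', 'tic'], ignore_index=True)
    data.index = data.date.factorize()[0]   # day 0, 1, 2, ... within the split
    return data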

3.3.2 Portfolio (Environment)

class Portfolio(gym.Env):
    """A portfolio/market environment
    Attributes
    ----------
        df: DataFrame
            input data
        stock_dim : int
            number of unique stocks
        hmax : int
            maximum number of shares to trade
        initial_amount : int
            start money
        transaction_cost_pct: float
            transaction cost percentage per trade
        reward_scaling: float
            scaling factor for reward, good for training
        state_space: int
            the dimension of input features
        action_space: int
            equals stock dimension
        tech_indicator_list: list
            a list of technical indicator names
        turbulence_threshold: int
            a threshold to control risk aversion
        day: int
            an increment number to control date
    Methods
    -------
    step()
        apply the agent's actions (softmax-normalized into portfolio weights),
        compute the reward, and return the next observation
    reset()
        reset the environment to the first trading day
    render()
        return the current state
    save_asset_memory()
        return account value at each time step
    save_action_memory()
        return actions/positions at each time step
    """
    metadata = {'render.modes': ['human']}

    def __init__(self, 
                df,
                stock_dim,
                hmax,
                initial_amount,
                transaction_cost_pct,
                reward_scaling,
                state_space,
                action_space,
                tech_indicator_list,
                turbulence_threshold=None,
                lookback=LOOKBACK,
                day=0):
        #super(StockEnv, self).__init__()
        #money = 10 , scope = 1
        self.day = day
        self.lookback = lookback
        self.df = df
        self.stock_dim = stock_dim
        self.hmax = hmax
        self.initial_amount = initial_amount
        self.transaction_cost_pct = transaction_cost_pct
        self.reward_scaling = reward_scaling
        self.state_space = state_space
        self.action_space = action_space
        self.tech_indicator_list = tech_indicator_list

        # action_space normalization and shape is self.stock_dim
        self.action_space = spaces.Box(low=0, high=1, shape=(self.action_space,)) 
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(self.state_space + len(self.tech_indicator_list), self.state_space))

        # load data from a pandas dataframe
        self.data = self.df.loc[self.day, :]
        self.covs = self.data['cov_list'].values[0]
        self.state =  np.append(np.array(self.covs), [self.data[tech].values.tolist() for tech in self.tech_indicator_list ], axis=0)
        self.terminal = False
        self.turbulence_threshold = turbulence_threshold
        # initialize state: initial portfolio return + individual stock returns + individual weights
        self.portfolio_value = self.initial_amount

        # memorize portfolio value each step
        self.asset_memory = [self.initial_amount]
        # memorize portfolio return each step
        self.portfolio_return_memory = [0]
        self.actions_memory = [[1/self.stock_dim]*self.stock_dim]
        self.date_memory = [self.data.date.unique()[0]]

    def step(self, actions):
        self.terminal = self.day >= len(self.df.index.unique()) - 1

        if self.terminal:
            df = pd.DataFrame(self.portfolio_return_memory)
            df.columns = ['daily_return']
            plt.plot(df.daily_return.cumsum(), 'r')
            plt.savefig('results/cumulative_reward.png')
            plt.close()
            
            plt.plot(self.portfolio_return_memory, 'r')
            plt.savefig('results/rewards.png')
            plt.close()

            print("=================================")
            print("begin_total_asset:{}".format(self.asset_memory[0]))           
            print("end_total_asset:{}".format(self.portfolio_value))

            df_daily_return = pd.DataFrame(self.portfolio_return_memory)
            df_daily_return.columns = ['daily_return']
            if df_daily_return['daily_return'].std() !=0:
              sharpe = (252**0.5)*df_daily_return['daily_return'].mean()/ \
                       df_daily_return['daily_return'].std()
              print("Sharpe: ",sharpe)
            print("=================================")
            
            return self.state, self.reward, self.terminal, {}
        else:
            weights = self.softmax_normalization(actions) 
            self.actions_memory.append(weights)
            last_day_memory = self.data

            #load next state
            self.day += 1
            self.data = self.df.loc[self.day,:]
            self.covs = self.data['cov_list'].values[0]
            self.state =  np.append(np.array(self.covs), [self.data[tech].values.tolist() for tech in self.tech_indicator_list ], axis=0)
            portfolio_return = sum(((self.data.close.values / last_day_memory.close.values)-1)*weights)
            log_portfolio_return = np.log(sum((self.data.close.values / last_day_memory.close.values)*weights)) #. computed but not used further
            # update portfolio value
            new_portfolio_value = self.portfolio_value*(1+portfolio_return)
            self.portfolio_value = new_portfolio_value

            # save into memory
            self.portfolio_return_memory.append(portfolio_return)
            self.date_memory.append(self.data.date.unique()[0])            
            self.asset_memory.append(new_portfolio_value)

            # the reward is the new portfolio value (the end portfolio value at the terminal step)
            self.reward = new_portfolio_value
        return self.state, self.reward, self.terminal, {}

    def reset(self):
        self.asset_memory = [self.initial_amount]
        self.day = 0
        self.data = self.df.loc[self.day,:]
        # load states
        self.covs = self.data['cov_list'].values[0]
        self.state =  np.append(np.array(self.covs), [self.data[tech].values.tolist() for tech in self.tech_indicator_list ], axis=0)
        self.portfolio_value = self.initial_amount
        #self.cost = 0
        #self.trades = 0
        self.terminal = False 
        self.portfolio_return_memory = [0]
        self.actions_memory=[[1/self.stock_dim]*self.stock_dim]
        self.date_memory=[self.data.date.unique()[0]] 
        return self.state
    
    def render(self, mode='human'):
        return self.state
        
    def softmax_normalization(self, actions):
        numerator = np.exp(actions)
        denominator = np.sum(np.exp(actions))
        softmax_output = numerator/denominator
        return softmax_output

    def save_asset_memory(self):
        date_list = self.date_memory
        portfolio_return = self.portfolio_return_memory
        #print(len(date_list))
        #print(len(asset_list))
        df_account_value = pd.DataFrame({'date':date_list,'daily_return':portfolio_return})
        return df_account_value

    def save_action_memory(self):
        # date and close price length must match actions length
        date_list = self.date_memory
        df_date = pd.DataFrame(date_list)
        df_date.columns = ['date']
        
        action_list = self.actions_memory
        df_actions = pd.DataFrame(action_list)
        df_actions.columns = self.data.tic.values
        df_actions.index = df_date.date
        #df_actions = pd.DataFrame({'date':date_list,'actions':action_list})
        return df_actions

    def _seed(self, seed=None):
        self.np_random, seed = seeding.np_random(seed)
        return [seed]

    def get_sb_env(self):
        e = DummyVecEnv([lambda: self])
        obs = e.reset()
        return e, obs
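The arithmetic inside step() is simple: the raw actions are softmax-normalized into weights that sum to one, and the day's portfolio return is the weighted sum of the per-fund price ratios. A small self-contained numeric illustration with made-up prices for three hypothetical funds:

import numpy as np
# Made-up one-step illustration of the weight/return arithmetic in step().
raw_actions = np.array([0.2, -0.1, 0.5])
weights = np.exp(raw_actions) / np.exp(raw_actions).sum()          # softmax, sums to 1
prev_close = np.array([10.0, 20.0, 30.0])
new_close  = np.array([10.5, 19.8, 30.9])
portfolio_return = np.sum((new_close / prev_close - 1) * weights)  # weighted daily return
new_portfolio_value = 1_000_000 * (1 + portfolio_return)
print(weights.round(3), round(portfolio_return, 4), round(new_portfolio_value, 2))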
# hide
#. was 29 in original notebook !
stock_dimension = len(train['tic'].unique())
state_space = stock_dimension
print(f"Stock Dimension: {stock_dimension}, State Space: {state_space}")
Stock Dimension: 7, State Space: 7
# hide
config.TECHNICAL_INDICATORS_LIST
['macd',
 'boll_ub',
 'boll_lb',
 'rsi_30',
 'cci_30',
 'dx_30',
 'close_30_sma',
 'close_60_sma']
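Note that only 7 of the 9 funds selected at the outset survive the preprocessing, presumably because the other two lack price history back to 2008, so both the state and action dimensions are 7. The observation returned by the environment stacks the 7×7 return-covariance matrix on top of one row per technical indicator, giving a (7 + 8) × 7 = 15 × 7 array. A quick check against the objects defined above:

# observation = covariance matrix (7 x 7) stacked with 8 indicator rows -> (15, 7)
expected_obs_shape = (stock_dimension + len(config.TECHNICAL_INDICATORS_LIST), stock_dimension)
print(expected_obs_shape)   # (15, 7)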
env_kwargs = {
    "hmax": 100, 
    "initial_amount": INITIAL_AMOUNT, 
    "transaction_cost_pct": TRANSACTION_COST_PCT, 
    "state_space": state_space, 
    "stock_dim": stock_dimension, 
    "tech_indicator_list": config.TECHNICAL_INDICATORS_LIST, 
    "action_space": stock_dimension, 
    "reward_scaling": REWARD_SCALING,
}
e_train_gym = Portfolio(df=train, **env_kwargs)
env_train, _ = e_train_gym.get_sb_env()
print(type(env_train))
<class 'stable_baselines3.common.vec_env.dummy_vec_env.DummyVecEnv'>

3.3.3 Portfolio Manager (Agent)

  • We investigate the performance of two models for the agent:

    • A2C (Advantage Actor-Critic)
    • PPO (Proximal Policy Optimization)

Both algorithms are used through their Stable Baselines3 implementations (the PyTorch successor to OpenAI's Baselines), accessed via FinRL's DRLAgent wrapper.

# hide
# https://towardsdatascience.com/finrl-for-quantitative-finance-tutorial-for-portfolio-allocation-9b417660c7cd
# hide
# agent = DRLAgent(env=env_train)
# DDPG_PARAMS = {
#   # "n_steps": 10, 
#   # "ent_coef": 0.005, 
#   "learning_rate": 0.0004,
# }
# model_ddpg = agent.get_model(model_name="ddpg", model_kwargs=DDPG_PARAMS)
# model_ddpg
# hide
# %%time
# trained_ddpg = agent.train_model(model=model_ddpg, tb_log_name='ddpg', total_timesteps=1_000)
# hide
# agent = DRLAgent(env=env_train)
# SAC_PARAMS = {
#   # "n_steps": 10, 
#   # "ent_coef": 0.005, 
#   "learning_rate": 0.0004,
# }
# model_sac = agent.get_model(model_name="sac", model_kwargs=SAC_PARAMS)
# model_sac
# hide
# %%time
# trained_sac = agent.train_model(model=model_sac, tb_log_name='sac', total_timesteps=1_000)
# hide
# agent = DRLAgent(env=env_train)
# TD3_PARAMS = {
#   # "n_steps": 10, 
#   # "ent_coef": 0.005, 
#   "learning_rate": 0.0004,
# }
# model_td3 = agent.get_model(model_name="td3", model_kwargs=TD3_PARAMS)
# model_td3
# hide
# %%time
# trained_sac = agent.train_model(model=model_sac, tb_log_name='sac', total_timesteps=1_000)
# hide
# agent = DRLAgent(env=env_train)
# MADDDPG_PARAMS = {
#   # "n_steps": 10, 
#   # "ent_coef": 0.005, 
#   "learning_rate": 0.0004,
# }
# model_madddpg = agent.get_model(model_name="madddpg", model_kwargs=MADDDPG_PARAMS)
# model_madddpg
# hide
# %%time
# trained_madddpg = agent.train_model(model=model_madddpg, tb_log_name='madddpg', total_timesteps=1_000)
3.3.3.1 A2C
agent = DRLAgent(env=env_train)
A2C_PARAMS = {
  "n_steps": 10, 
  "ent_coef": 0.005, 
  "learning_rate": 0.0004,
}
model_a2c = agent.get_model(model_name="a2c", model_kwargs=A2C_PARAMS)
model_a2c
{'n_steps': 10, 'ent_coef': 0.005, 'learning_rate': 0.0004}
Using cuda device
<stable_baselines3.a2c.a2c.A2C at 0x7fc9fc325d10>
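For reference, the get_model call above roughly corresponds to constructing the Stable Baselines3 model directly with an MLP policy. The sketch below is only an approximation; FinRL may apply additional defaults such as the policy network architecture.

from stable_baselines3 import A2C
# Approximate direct-SB3 equivalent of agent.get_model("a2c", ...) above.
model_a2c_direct = A2C(
    policy="MlpPolicy",
    env=env_train,
    n_steps=10,
    ent_coef=0.005,
    learning_rate=0.0004,
    tensorboard_log="tensorboard_log/a2c",
)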
%%time
trained_a2c = agent.train_model(model=model_a2c, tb_log_name='a2c', total_timesteps=50_000)
Logging to tensorboard_log/a2c/a2c_1
-------------------------------------
| time/                 |           |
|    fps                | 84        |
|    iterations         | 100       |
|    time_elapsed       | 11        |
|    total_timesteps    | 1000      |
| train/                |           |
|    entropy_loss       | -9.88     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0004    |
|    n_updates          | 99        |
|    policy_loss        | 9.6e+07   |
|    reward             | 1770969.9 |
|    std                | 0.993     |
|    value_loss         | 1.09e+14  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 126       |
|    iterations         | 200       |
|    time_elapsed       | 15        |
|    total_timesteps    | 2000      |
| train/                |           |
|    entropy_loss       | -9.87     |
|    explained_variance | -2.38e-07 |
|    learning_rate      | 0.0004    |
|    n_updates          | 199       |
|    policy_loss        | 1.51e+08  |
|    reward             | 2978305.2 |
|    std                | 0.992     |
|    value_loss         | 3.02e+14  |
-------------------------------------
=================================
begin_total_asset:1000000
end_total_asset:4110079.559657067
Sharpe:  0.6979807594386715
=================================
-------------------------------------
| time/                 |           |
|    fps                | 148       |
|    iterations         | 300       |
|    time_elapsed       | 20        |
|    total_timesteps    | 3000      |
| train/                |           |
|    entropy_loss       | -9.88     |
|    explained_variance | 0         |
|    learning_rate      | 0.0004    |
|    n_updates          | 299       |
|    policy_loss        | 5.4e+07   |
|    reward             | 1094988.2 |
|    std                | 0.992     |
|    value_loss         | 3.82e+13  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 164       |
|    iterations         | 400       |
|    time_elapsed       | 24        |
|    total_timesteps    | 4000      |
| train/                |           |
|    entropy_loss       | -9.89     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0004    |
|    n_updates          | 399       |
|    policy_loss        | 1.04e+08  |
|    reward             | 2092528.2 |
|    std                | 0.993     |
|    value_loss         | 1.53e+14  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 175       |
|    iterations         | 500       |
|    time_elapsed       | 28        |
|    total_timesteps    | 5000      |
| train/                |           |
|    entropy_loss       | -9.88     |
|    explained_variance | 0         |
|    learning_rate      | 0.0004    |
|    n_updates          | 499       |
|    policy_loss        | 1.49e+08  |
|    reward             | 3133837.2 |
|    std                | 0.992     |
|    value_loss         | 3.65e+14  |
-------------------------------------
=================================
begin_total_asset:1000000
end_total_asset:3976451.6208797866
Sharpe:  0.6857333704791829
=================================
-------------------------------------
| time/                 |           |
|    fps                | 182       |
|    iterations         | 600       |
|    time_elapsed       | 32        |
|    total_timesteps    | 6000      |
| train/                |           |
|    entropy_loss       | -9.86     |
|    explained_variance | 0         |
|    learning_rate      | 0.0004    |
|    n_updates          | 599       |
|    policy_loss        | 6.88e+07  |
|    reward             | 1255545.6 |
|    std                | 0.99      |
|    value_loss         | 6.26e+13  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 190       |
|    iterations         | 700       |
|    time_elapsed       | 36        |
|    total_timesteps    | 7000      |
| train/                |           |
|    entropy_loss       | -9.85     |
|    explained_variance | 0         |
|    learning_rate      | 0.0004    |
|    n_updates          | 699       |
|    policy_loss        | 1.3e+08   |
|    reward             | 2371918.2 |
|    std                | 0.989     |
|    value_loss         | 1.86e+14  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 196       |
|    iterations         | 800       |
|    time_elapsed       | 40        |
|    total_timesteps    | 8000      |
| train/                |           |
|    entropy_loss       | -9.86     |
|    explained_variance | 1.19e-07  |
|    learning_rate      | 0.0004    |
|    n_updates          | 799       |
|    policy_loss        | 1.8e+08   |
|    reward             | 3547805.2 |
|    std                | 0.989     |
|    value_loss         | 4.39e+14  |
-------------------------------------
=================================
begin_total_asset:1000000
end_total_asset:4161686.6336636837
Sharpe:  0.7041053033538833
=================================
-------------------------------------
| time/                 |           |
|    fps                | 199       |
|    iterations         | 900       |
|    time_elapsed       | 45        |
|    total_timesteps    | 9000      |
| train/                |           |
|    entropy_loss       | -9.84     |
|    explained_variance | 0         |
|    learning_rate      | 0.0004    |
|    n_updates          | 899       |
|    policy_loss        | 7.17e+07  |
|    reward             | 1521407.1 |
|    std                | 0.987     |
|    value_loss         | 7.52e+13  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 203       |
|    iterations         | 1000      |
|    time_elapsed       | 49        |
|    total_timesteps    | 10000     |
| train/                |           |
|    entropy_loss       | -9.83     |
|    explained_variance | 0         |
|    learning_rate      | 0.0004    |
|    n_updates          | 999       |
|    policy_loss        | 1.41e+08  |
|    reward             | 2594842.5 |
|    std                | 0.986     |
|    value_loss         | 2.43e+14  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 206       |
|    iterations         | 1100      |
|    time_elapsed       | 53        |
|    total_timesteps    | 11000     |
| train/                |           |
|    entropy_loss       | -9.82     |
|    explained_variance | 0         |
|    learning_rate      | 0.0004    |
|    n_updates          | 1099      |
|    policy_loss        | 2.02e+08  |
|    reward             | 3621684.2 |
|    std                | 0.985     |
|    value_loss         | 5.13e+14  |
-------------------------------------
=================================
begin_total_asset:1000000
end_total_asset:4182719.841788154
Sharpe:  0.7065226810322777
=================================
-------------------------------------
| time/                 |           |
|    fps                | 208       |
|    iterations         | 1200      |
|    time_elapsed       | 57        |
|    total_timesteps    | 12000     |
| train/                |           |
|    entropy_loss       | -9.8      |
|    explained_variance | 5.96e-08  |
|    learning_rate      | 0.0004    |
|    n_updates          | 1199      |
|    policy_loss        | 6.41e+07  |
|    reward             | 1435004.8 |
|    std                | 0.982     |
|    value_loss         | 6.42e+13  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 211       |
|    iterations         | 1300      |
|    time_elapsed       | 61        |
|    total_timesteps    | 13000     |
| train/                |           |
|    entropy_loss       | -9.79     |
|    explained_variance | 5.96e-08  |
|    learning_rate      | 0.0004    |
|    n_updates          | 1299      |
|    policy_loss        | 1.38e+08  |
|    reward             | 2707523.2 |
|    std                | 0.98      |
|    value_loss         | 2.51e+14  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 213       |
|    iterations         | 1400      |
|    time_elapsed       | 65        |
|    total_timesteps    | 14000     |
| train/                |           |
|    entropy_loss       | -9.78     |
|    explained_variance | 0         |
|    learning_rate      | 0.0004    |
|    n_updates          | 1399      |
|    policy_loss        | 2.29e+08  |
|    reward             | 4180936.8 |
|    std                | 0.979     |
|    value_loss         | 6.08e+14  |
-------------------------------------
=================================
begin_total_asset:1000000
end_total_asset:4271331.5898148315
Sharpe:  0.7141078133246254
=================================
-------------------------------------
| time/                 |           |
|    fps                | 214       |
|    iterations         | 1500      |
|    time_elapsed       | 69        |
|    total_timesteps    | 15000     |
| train/                |           |
|    entropy_loss       | -9.78     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0004    |
|    n_updates          | 1499      |
|    policy_loss        | 8.03e+07  |
|    reward             | 1770714.2 |
|    std                | 0.978     |
|    value_loss         | 1.03e+14  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 216       |
|    iterations         | 1600      |
|    time_elapsed       | 73        |
|    total_timesteps    | 16000     |
| train/                |           |
|    entropy_loss       | -9.78     |
|    explained_variance | 0         |
|    learning_rate      | 0.0004    |
|    n_updates          | 1599      |
|    policy_loss        | 1.33e+08  |
|    reward             | 2799781.5 |
|    std                | 0.979     |
|    value_loss         | 2.76e+14  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 218       |
|    iterations         | 1700      |
|    time_elapsed       | 77        |
|    total_timesteps    | 17000     |
| train/                |           |
|    entropy_loss       | -9.77     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0004    |
|    n_updates          | 1699      |
|    policy_loss        | 2.09e+08  |
|    reward             | 3782954.2 |
|    std                | 0.977     |
|    value_loss         | 4.7e+14   |
-------------------------------------
=================================
begin_total_asset:1000000
end_total_asset:4133498.8876433456
Sharpe:  0.7009423253577455
=================================
-------------------------------------
| time/                 |           |
|    fps                | 218       |
|    iterations         | 1800      |
|    time_elapsed       | 82        |
|    total_timesteps    | 18000     |
| train/                |           |
|    entropy_loss       | -9.75     |
|    explained_variance | 0         |
|    learning_rate      | 0.0004    |
|    n_updates          | 1799      |
|    policy_loss        | 8.77e+07  |
|    reward             | 1747342.0 |
|    std                | 0.974     |
|    value_loss         | 1.08e+14  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 220       |
|    iterations         | 1900      |
|    time_elapsed       | 86        |
|    total_timesteps    | 19000     |
| train/                |           |
|    entropy_loss       | -9.73     |
|    explained_variance | 1.79e-07  |
|    learning_rate      | 0.0004    |
|    n_updates          | 1899      |
|    policy_loss        | 1.55e+08  |
|    reward             | 2885395.5 |
|    std                | 0.972     |
|    value_loss         | 2.98e+14  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 221       |
|    iterations         | 2000      |
|    time_elapsed       | 90        |
|    total_timesteps    | 20000     |
| train/                |           |
|    entropy_loss       | -9.72     |
|    explained_variance | 0         |
|    learning_rate      | 0.0004    |
|    n_updates          | 1999      |
|    policy_loss        | 2.15e+08  |
|    reward             | 3976518.0 |
|    std                | 0.971     |
|    value_loss         | 5.31e+14  |
-------------------------------------
=================================
begin_total_asset:1000000
end_total_asset:4046834.5908127716
Sharpe:  0.6919991450241213
=================================
-------------------------------------
| time/                 |           |
|    fps                | 221       |
|    iterations         | 2100      |
|    time_elapsed       | 94        |
|    total_timesteps    | 21000     |
| train/                |           |
|    entropy_loss       | -9.71     |
|    explained_variance | 1.19e-07  |
|    learning_rate      | 0.0004    |
|    n_updates          | 2099      |
|    policy_loss        | 8.41e+07  |
|    reward             | 1543899.6 |
|    std                | 0.968     |
|    value_loss         | 8.56e+13  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 222       |
|    iterations         | 2200      |
|    time_elapsed       | 98        |
|    total_timesteps    | 22000     |
| train/                |           |
|    entropy_loss       | -9.71     |
|    explained_variance | 1.19e-07  |
|    learning_rate      | 0.0004    |
|    n_updates          | 2199      |
|    policy_loss        | 1.46e+08  |
|    reward             | 2804042.8 |
|    std                | 0.968     |
|    value_loss         | 3.04e+14  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 223       |
|    iterations         | 2300      |
|    time_elapsed       | 102       |
|    total_timesteps    | 23000     |
| train/                |           |
|    entropy_loss       | -9.7      |
|    explained_variance | 5.96e-08  |
|    learning_rate      | 0.0004    |
|    n_updates          | 2299      |
|    policy_loss        | 1.99e+08  |
|    reward             | 4275337.5 |
|    std                | 0.967     |
|    value_loss         | 6.46e+14  |
-------------------------------------
=================================
begin_total_asset:1000000
end_total_asset:4052447.3183236937
Sharpe:  0.6928298846678986
=================================
-------------------------------------
| time/                 |           |
|    fps                | 223       |
|    iterations         | 2400      |
|    time_elapsed       | 107       |
|    total_timesteps    | 24000     |
| train/                |           |
|    entropy_loss       | -9.68     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0004    |
|    n_updates          | 2399      |
|    policy_loss        | 8.6e+07   |
|    reward             | 1638133.4 |
|    std                | 0.965     |
|    value_loss         | 1.05e+14  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 224       |
|    iterations         | 2500      |
|    time_elapsed       | 111       |
|    total_timesteps    | 25000     |
| train/                |           |
|    entropy_loss       | -9.68     |
|    explained_variance | -2.38e-07 |
|    learning_rate      | 0.0004    |
|    n_updates          | 2499      |
|    policy_loss        | 1.35e+08  |
|    reward             | 2705907.2 |
|    std                | 0.965     |
|    value_loss         | 2.72e+14  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 225       |
|    iterations         | 2600      |
|    time_elapsed       | 115       |
|    total_timesteps    | 26000     |
| train/                |           |
|    entropy_loss       | -9.67     |
|    explained_variance | 5.96e-08  |
|    learning_rate      | 0.0004    |
|    n_updates          | 2599      |
|    policy_loss        | 1.97e+08  |
|    reward             | 3837006.5 |
|    std                | 0.964     |
|    value_loss         | 4.66e+14  |
-------------------------------------
=================================
begin_total_asset:1000000
end_total_asset:4123691.04929031
Sharpe:  0.700619600708785
=================================
-------------------------------------
| time/                 |           |
|    fps                | 225       |
|    iterations         | 2700      |
|    time_elapsed       | 119       |
|    total_timesteps    | 27000     |
| train/                |           |
|    entropy_loss       | -9.68     |
|    explained_variance | 0         |
|    learning_rate      | 0.0004    |
|    n_updates          | 2699      |
|    policy_loss        | 8.79e+07  |
|    reward             | 1748944.0 |
|    std                | 0.965     |
|    value_loss         | 1.14e+14  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 226       |
|    iterations         | 2800      |
|    time_elapsed       | 123       |
|    total_timesteps    | 28000     |
| train/                |           |
|    entropy_loss       | -9.67     |
|    explained_variance | 0         |
|    learning_rate      | 0.0004    |
|    n_updates          | 2799      |
|    policy_loss        | 1.77e+08  |
|    reward             | 2919523.0 |
|    std                | 0.964     |
|    value_loss         | 3.12e+14  |
-------------------------------------
=================================
begin_total_asset:1000000
end_total_asset:4164192.583783833
Sharpe:  0.7051087865111408
=================================
-------------------------------------
| time/                 |           |
|    fps                | 226       |
|    iterations         | 2900      |
|    time_elapsed       | 127       |
|    total_timesteps    | 29000     |
| train/                |           |
|    entropy_loss       | -9.65     |
|    explained_variance | 1.19e-07  |
|    learning_rate      | 0.0004    |
|    n_updates          | 2899      |
|    policy_loss        | 4.63e+07  |
|    reward             | 951793.56 |
|    std                | 0.961     |
|    value_loss         | 2.8e+13   |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 227       |
|    iterations         | 3000      |
|    time_elapsed       | 132       |
|    total_timesteps    | 30000     |
| train/                |           |
|    entropy_loss       | -9.65     |
|    explained_variance | 0         |
|    learning_rate      | 0.0004    |
|    n_updates          | 2999      |
|    policy_loss        | 9.13e+07  |
|    reward             | 1990526.0 |
|    std                | 0.96      |
|    value_loss         | 1.45e+14  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 227       |
|    iterations         | 3100      |
|    time_elapsed       | 136       |
|    total_timesteps    | 31000     |
| train/                |           |
|    entropy_loss       | -9.63     |
|    explained_variance | 0         |
|    learning_rate      | 0.0004    |
|    n_updates          | 3099      |
|    policy_loss        | 1.58e+08  |
|    reward             | 3170679.8 |
|    std                | 0.958     |
|    value_loss         | 3.69e+14  |
-------------------------------------
=================================
begin_total_asset:1000000
end_total_asset:4067124.4699334074
Sharpe:  0.6948431965746262
=================================
-------------------------------------
| time/                 |           |
|    fps                | 227       |
|    iterations         | 3200      |
|    time_elapsed       | 140       |
|    total_timesteps    | 32000     |
| train/                |           |
|    entropy_loss       | -9.64     |
|    explained_variance | 0         |
|    learning_rate      | 0.0004    |
|    n_updates          | 3199      |
|    policy_loss        | 6.1e+07   |
|    reward             | 1306404.8 |
|    std                | 0.959     |
|    value_loss         | 5.29e+13  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 228       |
|    iterations         | 3300      |
|    time_elapsed       | 144       |
|    total_timesteps    | 33000     |
| train/                |           |
|    entropy_loss       | -9.63     |
|    explained_variance | 1.19e-07  |
|    learning_rate      | 0.0004    |
|    n_updates          | 3299      |
|    policy_loss        | 9.71e+07  |
|    reward             | 2191711.2 |
|    std                | 0.958     |
|    value_loss         | 1.7e+14   |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 228       |
|    iterations         | 3400      |
|    time_elapsed       | 148       |
|    total_timesteps    | 34000     |
| train/                |           |
|    entropy_loss       | -9.63     |
|    explained_variance | 0         |
|    learning_rate      | 0.0004    |
|    n_updates          | 3399      |
|    policy_loss        | 1.56e+08  |
|    reward             | 3346152.2 |
|    std                | 0.958     |
|    value_loss         | 4.1e+14   |
-------------------------------------
=================================
begin_total_asset:1000000
end_total_asset:4034374.463013146
Sharpe:  0.6906021236478135
=================================
-------------------------------------
| time/                 |           |
|    fps                | 228       |
|    iterations         | 3500      |
|    time_elapsed       | 153       |
|    total_timesteps    | 35000     |
| train/                |           |
|    entropy_loss       | -9.62     |
|    explained_variance | 5.96e-08  |
|    learning_rate      | 0.0004    |
|    n_updates          | 3499      |
|    policy_loss        | 6.01e+07  |
|    reward             | 1337613.8 |
|    std                | 0.957     |
|    value_loss         | 6.05e+13  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 229       |
|    iterations         | 3600      |
|    time_elapsed       | 157       |
|    total_timesteps    | 36000     |
| train/                |           |
|    entropy_loss       | -9.64     |
|    explained_variance | 1.79e-07  |
|    learning_rate      | 0.0004    |
|    n_updates          | 3599      |
|    policy_loss        | 1.4e+08   |
|    reward             | 2489865.5 |
|    std                | 0.96      |
|    value_loss         | 2.28e+14  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 229       |
|    iterations         | 3700      |
|    time_elapsed       | 161       |
|    total_timesteps    | 37000     |
| train/                |           |
|    entropy_loss       | -9.64     |
|    explained_variance | 1.19e-07  |
|    learning_rate      | 0.0004    |
|    n_updates          | 3699      |
|    policy_loss        | 1.88e+08  |
|    reward             | 4019591.0 |
|    std                | 0.96      |
|    value_loss         | 5.58e+14  |
-------------------------------------
=================================
begin_total_asset:1000000
end_total_asset:4119809.4408771424
Sharpe:  0.7003255647826773
=================================
-------------------------------------
| time/                 |           |
|    fps                | 229       |
|    iterations         | 3800      |
|    time_elapsed       | 165       |
|    total_timesteps    | 38000     |
| train/                |           |
|    entropy_loss       | -9.65     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0004    |
|    n_updates          | 3799      |
|    policy_loss        | 6.5e+07   |
|    reward             | 1363994.8 |
|    std                | 0.96      |
|    value_loss         | 6.05e+13  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 229       |
|    iterations         | 3900      |
|    time_elapsed       | 169       |
|    total_timesteps    | 39000     |
| train/                |           |
|    entropy_loss       | -9.64     |
|    explained_variance | 0         |
|    learning_rate      | 0.0004    |
|    n_updates          | 3899      |
|    policy_loss        | 1.15e+08  |
|    reward             | 2599753.0 |
|    std                | 0.959     |
|    value_loss         | 2.45e+14  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 230       |
|    iterations         | 4000      |
|    time_elapsed       | 173       |
|    total_timesteps    | 40000     |
| train/                |           |
|    entropy_loss       | -9.62     |
|    explained_variance | 5.96e-08  |
|    learning_rate      | 0.0004    |
|    n_updates          | 3999      |
|    policy_loss        | 2.09e+08  |
|    reward             | 3773874.0 |
|    std                | 0.957     |
|    value_loss         | 5.35e+14  |
-------------------------------------
=================================
begin_total_asset:1000000
end_total_asset:4030151.780621599
Sharpe:  0.691531028386293
=================================
-------------------------------------
| time/                 |           |
|    fps                | 230       |
|    iterations         | 4100      |
|    time_elapsed       | 178       |
|    total_timesteps    | 41000     |
| train/                |           |
|    entropy_loss       | -9.62     |
|    explained_variance | 0         |
|    learning_rate      | 0.0004    |
|    n_updates          | 4099      |
|    policy_loss        | 9.04e+07  |
|    reward             | 1612590.9 |
|    std                | 0.957     |
|    value_loss         | 8.73e+13  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 230       |
|    iterations         | 4200      |
|    time_elapsed       | 182       |
|    total_timesteps    | 42000     |
| train/                |           |
|    entropy_loss       | -9.62     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0004    |
|    n_updates          | 4199      |
|    policy_loss        | 1.48e+08  |
|    reward             | 2647467.8 |
|    std                | 0.958     |
|    value_loss         | 2.66e+14  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 230       |
|    iterations         | 4300      |
|    time_elapsed       | 186       |
|    total_timesteps    | 43000     |
| train/                |           |
|    entropy_loss       | -9.62     |
|    explained_variance | 5.96e-08  |
|    learning_rate      | 0.0004    |
|    n_updates          | 4299      |
|    policy_loss        | 1.79e+08  |
|    reward             | 3713941.2 |
|    std                | 0.957     |
|    value_loss         | 5.12e+14  |
-------------------------------------
=================================
begin_total_asset:1000000
end_total_asset:4077198.0467443867
Sharpe:  0.695992993463236
=================================
-------------------------------------
| time/                 |           |
|    fps                | 230       |
|    iterations         | 4400      |
|    time_elapsed       | 190       |
|    total_timesteps    | 44000     |
| train/                |           |
|    entropy_loss       | -9.62     |
|    explained_variance | 0         |
|    learning_rate      | 0.0004    |
|    n_updates          | 4399      |
|    policy_loss        | 8.99e+07  |
|    reward             | 1753287.2 |
|    std                | 0.957     |
|    value_loss         | 1.12e+14  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 231       |
|    iterations         | 4500      |
|    time_elapsed       | 194       |
|    total_timesteps    | 45000     |
| train/                |           |
|    entropy_loss       | -9.62     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0004    |
|    n_updates          | 4499      |
|    policy_loss        | 1.6e+08   |
|    reward             | 3013050.0 |
|    std                | 0.957     |
|    value_loss         | 3.14e+14  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 231       |
|    iterations         | 4600      |
|    time_elapsed       | 198       |
|    total_timesteps    | 46000     |
| train/                |           |
|    entropy_loss       | -9.63     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0004    |
|    n_updates          | 4599      |
|    policy_loss        | 1.91e+08  |
|    reward             | 4056158.0 |
|    std                | 0.958     |
|    value_loss         | 6.01e+14  |
-------------------------------------
=================================
begin_total_asset:1000000
end_total_asset:4110861.0279944185
Sharpe:  0.6995703979641089
=================================
-------------------------------------
| time/                 |           |
|    fps                | 231       |
|    iterations         | 4700      |
|    time_elapsed       | 203       |
|    total_timesteps    | 47000     |
| train/                |           |
|    entropy_loss       | -9.63     |
|    explained_variance | -1.19e-07 |
|    learning_rate      | 0.0004    |
|    n_updates          | 4699      |
|    policy_loss        | 6.95e+07  |
|    reward             | 1597928.8 |
|    std                | 0.958     |
|    value_loss         | 7.49e+13  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 231       |
|    iterations         | 4800      |
|    time_elapsed       | 207       |
|    total_timesteps    | 48000     |
| train/                |           |
|    entropy_loss       | -9.63     |
|    explained_variance | 5.96e-08  |
|    learning_rate      | 0.0004    |
|    n_updates          | 4799      |
|    policy_loss        | 1.2e+08   |
|    reward             | 2745886.8 |
|    std                | 0.958     |
|    value_loss         | 2.58e+14  |
-------------------------------------
-------------------------------------
| time/                 |           |
|    fps                | 232       |
|    iterations         | 4900      |
|    time_elapsed       | 211       |
|    total_timesteps    | 49000     |
| train/                |           |
|    entropy_loss       | -9.62     |
|    explained_variance | 5.96e-08  |
|    learning_rate      | 0.0004    |
|    n_updates          | 4899      |
|    policy_loss        | 1.83e+08  |
|    reward             | 4007404.8 |
|    std                | 0.957     |
|    value_loss         | 5.87e+14  |
-------------------------------------
=================================
begin_total_asset:1000000
end_total_asset:4086023.462167963
Sharpe:  0.6967232525311317
=================================
-------------------------------------
| time/                 |           |
|    fps                | 231       |
|    iterations         | 5000      |
|    time_elapsed       | 215       |
|    total_timesteps    | 50000     |
| train/                |           |
|    entropy_loss       | -9.63     |
|    explained_variance | 1.19e-07  |
|    learning_rate      | 0.0004    |
|    n_updates          | 4999      |
|    policy_loss        | 9.05e+07  |
|    reward             | 1792232.6 |
|    std                | 0.958     |
|    value_loss         | 1.13e+14  |
-------------------------------------
CPU times: user 3min 29s, sys: 2.08 s, total: 3min 31s
Wall time: 3min 35s
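The Sharpe values printed at the end of each training episode come from the environment's terminal branch: the mean of the daily portfolio returns divided by their standard deviation, annualized with √252. A self-contained illustration with made-up returns:

import pandas as pd
# Made-up daily returns; same formula as in Portfolio.step()'s terminal branch.
daily_returns = pd.Series([0.001, -0.002, 0.0015, 0.0005, 0.002])
sharpe = (252 ** 0.5) * daily_returns.mean() / daily_returns.std()
print(round(sharpe, 3))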
3.3.3.2 PPO
agent = DRLAgent(env = env_train)
PPO_PARAMS = {
  "n_steps": 2048,
  "ent_coef": 0.005,
  "learning_rate": 0.001,
  "batch_size": 128,
}
model_ppo = agent.get_model("ppo",model_kwargs = PPO_PARAMS)
model_ppo
{'n_steps': 2048, 'ent_coef': 0.005, 'learning_rate': 0.001, 'batch_size': 128}
Using cuda device
<stable_baselines3.ppo.ppo.PPO at 0x7fc981276f10>
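As with A2C, the call above is roughly equivalent to constructing the Stable Baselines3 PPO model directly (an approximation; FinRL may add its own defaults). With n_steps=2048 and batch_size=128, each rollout of 2,048 steps is split into 16 minibatches per optimization epoch.

from stable_baselines3 import PPO
# Approximate direct-SB3 equivalent of agent.get_model("ppo", ...) above.
model_ppo_direct = PPO(
    policy="MlpPolicy",
    env=env_train,
    n_steps=2048,
    ent_coef=0.005,
    learning_rate=0.001,
    batch_size=128,
    tensorboard_log="tensorboard_log/ppo",
)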
%%time
# train PPO agent
trained_ppo = agent.train_model(model=model_ppo, tb_log_name='ppo', total_timesteps=50_000)
Logging to tensorboard_log/ppo/ppo_1
----------------------------------
| time/              |           |
|    fps             | 307       |
|    iterations      | 1         |
|    time_elapsed    | 6         |
|    total_timesteps | 2048      |
| train/             |           |
|    reward          | 3199747.8 |
----------------------------------
=================================
begin_total_asset:1000000
end_total_asset:3913564.3893383564
Sharpe:  0.6788581038411543
=================================
------------------------------------------
| time/                   |              |
|    fps                  | 275          |
|    iterations           | 2            |
|    time_elapsed         | 14           |
|    total_timesteps      | 4096         |
| train/                  |              |
|    approx_kl            | 8.119969e-09 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -9.93        |
|    explained_variance   | 5.96e-08     |
|    learning_rate        | 0.001        |
|    loss                 | 6.38e+14     |
|    n_updates            | 10           |
|    policy_gradient_loss | -1.26e-06    |
|    reward               | 2267978.8    |
|    std                  | 1            |
|    value_loss           | 1.28e+15     |
------------------------------------------
=================================
begin_total_asset:1000000
end_total_asset:3994425.904324733
Sharpe:  0.6872894807654567
=================================
------------------------------------------
| time/                   |              |
|    fps                  | 267          |
|    iterations           | 3            |
|    time_elapsed         | 22           |
|    total_timesteps      | 6144         |
| train/                  |              |
|    approx_kl            | 8.032657e-09 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -9.93        |
|    explained_variance   | 0            |
|    learning_rate        | 0.001        |
|    loss                 | 9.55e+14     |
|    n_updates            | 20           |
|    policy_gradient_loss | -1.22e-06    |
|    reward               | 1323024.1    |
|    std                  | 1            |
|    value_loss           | 1.95e+15     |
------------------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 266          |
|    iterations           | 4            |
|    time_elapsed         | 30           |
|    total_timesteps      | 8192         |
| train/                  |              |
|    approx_kl            | 8.294592e-09 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -9.93        |
|    explained_variance   | -1.19e-07    |
|    learning_rate        | 0.001        |
|    loss                 | 1.13e+15     |
|    n_updates            | 30           |
|    policy_gradient_loss | -1.11e-06    |
|    reward               | 4083331.2    |
|    std                  | 1            |
|    value_loss           | 2.51e+15     |
------------------------------------------
=================================
begin_total_asset:1000000
end_total_asset:4150878.4127395344
Sharpe:  0.7027972589326308
=================================
----------------------------------------
| time/                   |            |
|    fps                  | 262        |
|    iterations           | 5          |
|    time_elapsed         | 39         |
|    total_timesteps      | 10240      |
| train/                  |            |
|    approx_kl            | 7.5961e-09 |
|    clip_fraction        | 0          |
|    clip_range           | 0.2        |
|    entropy_loss         | -9.93      |
|    explained_variance   | 2.38e-07   |
|    learning_rate        | 0.001      |
|    loss                 | 9.87e+14   |
|    n_updates            | 40         |
|    policy_gradient_loss | -9.28e-07  |
|    reward               | 2970283.2  |
|    std                  | 1          |
|    value_loss           | 1.95e+15   |
----------------------------------------
=================================
begin_total_asset:1000000
end_total_asset:3998877.2903906824
Sharpe:  0.6871327226982832
=================================
------------------------------------------
| time/                   |              |
|    fps                  | 261          |
|    iterations           | 6            |
|    time_elapsed         | 46           |
|    total_timesteps      | 12288        |
| train/                  |              |
|    approx_kl            | 7.508788e-09 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -9.93        |
|    explained_variance   | 0            |
|    learning_rate        | 0.001        |
|    loss                 | 8.7e+14      |
|    n_updates            | 50           |
|    policy_gradient_loss | -9.23e-07    |
|    reward               | 1590401.4    |
|    std                  | 1            |
|    value_loss           | 1.84e+15     |
------------------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 261          |
|    iterations           | 7            |
|    time_elapsed         | 54           |
|    total_timesteps      | 14336        |
| train/                  |              |
|    approx_kl            | 8.556526e-09 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -9.93        |
|    explained_variance   | 0            |
|    learning_rate        | 0.001        |
|    loss                 | 1.26e+15     |
|    n_updates            | 60           |
|    policy_gradient_loss | -1.26e-06    |
|    reward               | 4469934.5    |
|    std                  | 1            |
|    value_loss           | 2.36e+15     |
------------------------------------------
=================================
begin_total_asset:1000000
end_total_asset:4096060.7173107606
Sharpe:  0.6982076816270193
=================================
------------------------------------------
| time/                   |              |
|    fps                  | 259          |
|    iterations           | 8            |
|    time_elapsed         | 63           |
|    total_timesteps      | 16384        |
| train/                  |              |
|    approx_kl            | 6.315531e-09 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -9.93        |
|    explained_variance   | -1.19e-07    |
|    learning_rate        | 0.001        |
|    loss                 | 1.29e+15     |
|    n_updates            | 70           |
|    policy_gradient_loss | -9.07e-07    |
|    reward               | 2862059.0    |
|    std                  | 1            |
|    value_loss           | 2.49e+15     |
------------------------------------------
=================================
begin_total_asset:1000000
end_total_asset:4058297.672070343
Sharpe:  0.694059504812043
=================================
------------------------------------------
| time/                   |              |
|    fps                  | 258          |
|    iterations           | 9            |
|    time_elapsed         | 71           |
|    total_timesteps      | 18432        |
| train/                  |              |
|    approx_kl            | 8.789357e-09 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -9.93        |
|    explained_variance   | 0            |
|    learning_rate        | 0.001        |
|    loss                 | 7.67e+14     |
|    n_updates            | 80           |
|    policy_gradient_loss | -1.49e-06    |
|    reward               | 2085537.0    |
|    std                  | 1            |
|    value_loss           | 1.35e+15     |
------------------------------------------
=================================
begin_total_asset:1000000
end_total_asset:4194078.3287386466
Sharpe:  0.7088772216329564
=================================
------------------------------------------
| time/                   |              |
|    fps                  | 258          |
|    iterations           | 10           |
|    time_elapsed         | 79           |
|    total_timesteps      | 20480        |
| train/                  |              |
|    approx_kl            | 7.683411e-09 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -9.93        |
|    explained_variance   | 1.19e-07     |
|    learning_rate        | 0.001        |
|    loss                 | 1.05e+15     |
|    n_updates            | 90           |
|    policy_gradient_loss | -1.07e-06    |
|    reward               | 1272599.6    |
|    std                  | 1            |
|    value_loss           | 2.08e+15     |
------------------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 258          |
|    iterations           | 11           |
|    time_elapsed         | 87           |
|    total_timesteps      | 22528        |
| train/                  |              |
|    approx_kl            | 7.945346e-09 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -9.93        |
|    explained_variance   | 0            |
|    learning_rate        | 0.001        |
|    loss                 | 1.46e+15     |
|    n_updates            | 100          |
|    policy_gradient_loss | -1.14e-06    |
|    reward               | 4016451.2    |
|    std                  | 1            |
|    value_loss           | 2.78e+15     |
------------------------------------------
=================================
begin_total_asset:1000000
end_total_asset:4195896.423487038
Sharpe:  0.7074945647632417
=================================
------------------------------------------
| time/                   |              |
|    fps                  | 257          |
|    iterations           | 12           |
|    time_elapsed         | 95           |
|    total_timesteps      | 24576        |
| train/                  |              |
|    approx_kl            | 8.178176e-09 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -9.93        |
|    explained_variance   | 0            |
|    learning_rate        | 0.001        |
|    loss                 | 8.19e+14     |
|    n_updates            | 110          |
|    policy_gradient_loss | -1.2e-06     |
|    reward               | 2698685.2    |
|    std                  | 1            |
|    value_loss           | 1.71e+15     |
------------------------------------------
=================================
begin_total_asset:1000000
end_total_asset:4200279.1240972085
Sharpe:  0.7078889896224082
=================================
-------------------------------------------
| time/                   |               |
|    fps                  | 256           |
|    iterations           | 13            |
|    time_elapsed         | 103           |
|    total_timesteps      | 26624         |
| train/                  |               |
|    approx_kl            | 6.9849193e-09 |
|    clip_fraction        | 0             |
|    clip_range           | 0.2           |
|    entropy_loss         | -9.93         |
|    explained_variance   | -2.38e-07     |
|    learning_rate        | 0.001         |
|    loss                 | 9.95e+14      |
|    n_updates            | 120           |
|    policy_gradient_loss | -9.4e-07      |
|    reward               | 1757732.2     |
|    std                  | 1             |
|    value_loss           | 1.96e+15      |
-------------------------------------------
-------------------------------------------
| time/                   |               |
|    fps                  | 257           |
|    iterations           | 14            |
|    time_elapsed         | 111           |
|    total_timesteps      | 28672         |
| train/                  |               |
|    approx_kl            | 6.9849193e-09 |
|    clip_fraction        | 0             |
|    clip_range           | 0.2           |
|    entropy_loss         | -9.93         |
|    explained_variance   | 1.19e-07      |
|    learning_rate        | 0.001         |
|    loss                 | 1.29e+15      |
|    n_updates            | 130           |
|    policy_gradient_loss | -9.14e-07     |
|    reward               | 3963198.2     |
|    std                  | 1             |
|    value_loss           | 2.63e+15      |
-------------------------------------------
=================================
begin_total_asset:1000000
end_total_asset:4117226.909721386
Sharpe:  0.7009549483123549
=================================
-------------------------------------------
| time/                   |               |
|    fps                  | 256           |
|    iterations           | 15            |
|    time_elapsed         | 119           |
|    total_timesteps      | 30720         |
| train/                  |               |
|    approx_kl            | 8.8475645e-09 |
|    clip_fraction        | 0             |
|    clip_range           | 0.2           |
|    entropy_loss         | -9.93         |
|    explained_variance   | 1.19e-07      |
|    learning_rate        | 0.001         |
|    loss                 | 1.14e+15      |
|    n_updates            | 140           |
|    policy_gradient_loss | -1.46e-06     |
|    reward               | 2481026.2     |
|    std                  | 1             |
|    value_loss           | 2.14e+15      |
-------------------------------------------
=================================
begin_total_asset:1000000
end_total_asset:3976828.667046468
Sharpe:  0.686149853760772
=================================
------------------------------------------
| time/                   |              |
|    fps                  | 255          |
|    iterations           | 16           |
|    time_elapsed         | 128          |
|    total_timesteps      | 32768        |
| train/                  |              |
|    approx_kl            | 9.313226e-09 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -9.93        |
|    explained_variance   | -1.19e-07    |
|    learning_rate        | 0.001        |
|    loss                 | 7.51e+14     |
|    n_updates            | 150          |
|    policy_gradient_loss | -1.65e-06    |
|    reward               | 1788779.9    |
|    std                  | 1            |
|    value_loss           | 1.54e+15     |
------------------------------------------
=================================
begin_total_asset:1000000
end_total_asset:4011862.64593621
Sharpe:  0.6882706958170082
=================================
-----------------------------------------
| time/                   |             |
|    fps                  | 255         |
|    iterations           | 17          |
|    time_elapsed         | 136         |
|    total_timesteps      | 34816       |
| train/                  |             |
|    approx_kl            | 8.96398e-09 |
|    clip_fraction        | 0           |
|    clip_range           | 0.2         |
|    entropy_loss         | -9.93       |
|    explained_variance   | 0           |
|    learning_rate        | 0.001       |
|    loss                 | 1.03e+15    |
|    n_updates            | 160         |
|    policy_gradient_loss | -1.76e-06   |
|    reward               | 1022607.75  |
|    std                  | 1           |
|    value_loss           | 2.15e+15    |
-----------------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 255          |
|    iterations           | 18           |
|    time_elapsed         | 144          |
|    total_timesteps      | 36864        |
| train/                  |              |
|    approx_kl            | 6.722985e-09 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -9.93        |
|    explained_variance   | -1.19e-07    |
|    learning_rate        | 0.001        |
|    loss                 | 1.33e+15     |
|    n_updates            | 170          |
|    policy_gradient_loss | -6.24e-07    |
|    reward               | 3342887.0    |
|    std                  | 1            |
|    value_loss           | 2.65e+15     |
------------------------------------------
=================================
begin_total_asset:1000000
end_total_asset:3874303.678110981
Sharpe:  0.6749724874730472
=================================
-----------------------------------------
| time/                   |             |
|    fps                  | 255         |
|    iterations           | 19          |
|    time_elapsed         | 152         |
|    total_timesteps      | 38912       |
| train/                  |             |
|    approx_kl            | 7.82893e-09 |
|    clip_fraction        | 0           |
|    clip_range           | 0.2         |
|    entropy_loss         | -9.93       |
|    explained_variance   | 0           |
|    learning_rate        | 0.001       |
|    loss                 | 6.35e+14    |
|    n_updates            | 180         |
|    policy_gradient_loss | -1.17e-06   |
|    reward               | 2591912.2   |
|    std                  | 1           |
|    value_loss           | 1.39e+15    |
-----------------------------------------
=================================
begin_total_asset:1000000
end_total_asset:4075396.0292179375
Sharpe:  0.6960579699082773
=================================
------------------------------------------
| time/                   |              |
|    fps                  | 255          |
|    iterations           | 20           |
|    time_elapsed         | 160          |
|    total_timesteps      | 40960        |
| train/                  |              |
|    approx_kl            | 7.945346e-09 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -9.93        |
|    explained_variance   | 1.19e-07     |
|    learning_rate        | 0.001        |
|    loss                 | 9.7e+14      |
|    n_updates            | 190          |
|    policy_gradient_loss | -8.17e-07    |
|    reward               | 1483324.1    |
|    std                  | 1            |
|    value_loss           | 1.91e+15     |
------------------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 255          |
|    iterations           | 21           |
|    time_elapsed         | 168          |
|    total_timesteps      | 43008        |
| train/                  |              |
|    approx_kl            | 9.546056e-09 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -9.93        |
|    explained_variance   | 0            |
|    learning_rate        | 0.001        |
|    loss                 | 1.38e+15     |
|    n_updates            | 200          |
|    policy_gradient_loss | -1.91e-06    |
|    reward               | 3484793.0    |
|    std                  | 1            |
|    value_loss           | 2.54e+15     |
------------------------------------------
=================================
begin_total_asset:1000000
end_total_asset:4115279.627057315
Sharpe:  0.7000977033778167
=================================
-------------------------------------------
| time/                   |               |
|    fps                  | 254           |
|    iterations           | 22            |
|    time_elapsed         | 176           |
|    total_timesteps      | 45056         |
| train/                  |               |
|    approx_kl            | 5.6170393e-09 |
|    clip_fraction        | 0             |
|    clip_range           | 0.2           |
|    entropy_loss         | -9.93         |
|    explained_variance   | 0             |
|    learning_rate        | 0.001         |
|    loss                 | 1.01e+15      |
|    n_updates            | 210           |
|    policy_gradient_loss | -9.59e-07     |
|    reward               | 3015818.0     |
|    std                  | 1             |
|    value_loss           | 2.04e+15      |
-------------------------------------------
=================================
begin_total_asset:1000000
end_total_asset:4155706.236713753
Sharpe:  0.703959200241537
=================================
------------------------------------------
| time/                   |              |
|    fps                  | 254          |
|    iterations           | 23           |
|    time_elapsed         | 185          |
|    total_timesteps      | 47104        |
| train/                  |              |
|    approx_kl            | 9.022187e-09 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -9.93        |
|    explained_variance   | 0            |
|    learning_rate        | 0.001        |
|    loss                 | 8.5e+14      |
|    n_updates            | 220          |
|    policy_gradient_loss | -1.94e-06    |
|    reward               | 1757937.2    |
|    std                  | 1            |
|    value_loss           | 1.72e+15     |
------------------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 254          |
|    iterations           | 24           |
|    time_elapsed         | 193          |
|    total_timesteps      | 49152        |
| train/                  |              |
|    approx_kl            | 7.887138e-09 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -9.93        |
|    explained_variance   | 0            |
|    learning_rate        | 0.001        |
|    loss                 | 1.13e+15     |
|    n_updates            | 230          |
|    policy_gradient_loss | -1.5e-06     |
|    reward               | 3765411.8    |
|    std                  | 1            |
|    value_loss           | 2.44e+15     |
------------------------------------------
=================================
begin_total_asset:1000000
end_total_asset:3983449.2318723113
Sharpe:  0.686623843915132
=================================
------------------------------------------
| time/                   |              |
|    fps                  | 254          |
|    iterations           | 25           |
|    time_elapsed         | 201          |
|    total_timesteps      | 51200        |
| train/                  |              |
|    approx_kl            | 6.548362e-09 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -9.93        |
|    explained_variance   | 5.96e-08     |
|    learning_rate        | 0.001        |
|    loss                 | 1.28e+15     |
|    n_updates            | 240          |
|    policy_gradient_loss | -7.5e-07     |
|    reward               | 3082500.2    |
|    std                  | 1            |
|    value_loss           | 2.53e+15     |
------------------------------------------
CPU times: user 3min 21s, sys: 1.52 s, total: 3min 23s
Wall time: 3min 22s

3.4 Evaluation

We now use the most recent portion of the data to evaluate the performance of the two trained agents (A2C and PPO). This step is also referred to as back-testing, or simply trading. None of this data was seen during training. The start date of the trading window is captured in the parameter TRADE_START.

TRADE_START
'2020-07-01'
# hide
env_kwargs
{'action_space': 7,
 'hmax': 100,
 'initial_amount': 1000000,
 'reward_scaling': 0.1,
 'state_space': 7,
 'stock_dim': 7,
 'tech_indicator_list': ['macd',
  'boll_ub',
  'boll_lb',
  'rsi_30',
  'cci_30',
  'dx_30',
  'close_30_sma',
  'close_60_sma'],
 'transaction_cost_pct': 0}
trade = data_split(df, TRADE_START, DATA_END)
e_trade_gym = Portfolio(df=trade, **env_kwargs)
e_trade_gym
<__main__.Portfolio at 0x7fc9816700d0>
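For reference, data_split (from the FinRL library) essentially restricts the combined price DataFrame to the requested date window and re-indexes it by trading day. Below is a minimal sketch of that idea, not the library's actual code; it assumes df carries the date and tic columns used throughout this notebook. The assert is a quick sanity check that the trading window starts no earlier than TRADE_START.
# Minimal sketch of the date-window filtering idea behind data_split
# (illustrative only; not the FinRL implementation)
def split_by_date(data, start, end, date_col="date"):
    window = data[(data[date_col] >= start) & (data[date_col] < end)]
    return window.sort_values([date_col, "tic"], ignore_index=True)

# Sanity check: the trading window should start no earlier than TRADE_START
assert trade.date.min() >= TRADE_START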
# Download the DJIA as a baseline, then compute its summary statistics and cumulative return
baseline_df = get_baseline(
        ticker="^DJI",
        start=TRADE_START,
        end=DATA_END)
baseline_df_stats = backtest_stats(baseline_df, value_col_name='close')
baseline_returns = get_daily_return(baseline_df, value_col_name="close")
dji_cumpod = (baseline_returns + 1).cumprod() - 1
[*********************100%***********************]  1 of 1 completed
Shape of DataFrame:  (295, 8)
Annual return          0.311845
Cumulative returns     0.374034
Annual volatility      0.140762
Sharpe ratio           2.006165
Calmar ratio           3.491806
Stability              0.950106
Max drawdown          -0.089308
Omega ratio            1.397014
Sortino ratio          2.988706
Skew                        NaN
Kurtosis                    NaN
Tail ratio             1.094883
Daily value at risk   -0.016614
dtype: float64
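The dji_cumpod line above compounds the baseline's daily returns into a cumulative-return series: each day contributes a growth factor of (1 + r), and the running product minus 1 is the return since the start of the window. A minimal standalone illustration of the same arithmetic, using three hypothetical daily returns:
import pandas as pd

# Three hypothetical daily returns: +1%, -0.5%, +2%
r = pd.Series([0.01, -0.005, 0.02])
cum = (r + 1).cumprod() - 1
print(cum.round(6))  # -> 0.010000, 0.004950, 0.025049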
# Run each trained policy over the trading window to get daily returns and actions
df_daily_return_a2c, df_actions_a2c = DRLAgent.DRL_prediction(model=trained_a2c, environment=e_trade_gym)
df_daily_return_ppo, df_actions_ppo = DRLAgent.DRL_prediction(model=trained_ppo, environment=e_trade_gym)
time_ind = pd.Series(df_daily_return_a2c.date)
# Compound the daily returns into cumulative-return series
a2c_cumpod = (df_daily_return_a2c.daily_return + 1).cumprod() - 1
ppo_cumpod = (df_daily_return_ppo.daily_return + 1).cumprod() - 1
# Convert to pyfolio-compatible return series for the performance statistics
DRL_strat_a2c = convert_daily_return_to_pyfolio_ts(df_daily_return_a2c)
DRL_strat_ppo = convert_daily_return_to_pyfolio_ts(df_daily_return_ppo)

perf_func = timeseries.perf_stats
perf_stats_all_a2c = perf_func(returns=DRL_strat_a2c, factor_returns=DRL_strat_a2c, positions=None, transactions=None, turnover_denom="AGB")
perf_stats_all_ppo = perf_func(returns=DRL_strat_ppo, factor_returns=DRL_strat_ppo, positions=None, transactions=None, turnover_denom="AGB")
=================================
begin_total_asset:1000000
end_total_asset:1524583.336446113
Sharpe:  2.369076062885316
=================================
hit end!
=================================
begin_total_asset:1000000
end_total_asset:1507488.2263027485
Sharpe:  2.2708997438017855
=================================
hit end!
# hide
len(df_actions_a2c.columns)
7
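The pyfolio statistics computed above (perf_stats_all_a2c and perf_stats_all_ppo) are not displayed in this run. A convenient way to compare the two agents, sketched here, is to concatenate the two results into a single table:
import pandas as pd

# Side-by-side view of the pyfolio performance metrics for the two agents
perf_stats_cmp = pd.concat([perf_stats_all_a2c, perf_stats_all_ppo],
                           axis=1, keys=["A2C", "PPO"])
print(perf_stats_cmp)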

3.4.1 Inspect actions

For the sake of interest, we inspect some of the actions taken by the A2C agent.

# Inspect the daily actions (portfolio weights) chosen by the A2C agent
df_actions_a2c
FBCVX FCPGX FDCAX FDGFX FDSCX FDSVX FSMVX
date
2020-07-01 0.142857 0.142857 0.142857 0.142857 0.142857 0.142857 0.142857
2020-07-02 0.197429 0.103983 0.103983 0.103983 0.103983 0.103983 0.282655
2020-07-06 0.242351 0.089156 0.089156 0.242351 0.158675 0.089156 0.089156
2020-07-07 0.250948 0.230227 0.092319 0.149551 0.092319 0.092319 0.092319
2020-07-08 0.112299 0.214671 0.078973 0.214671 0.214671 0.078973 0.085741
... ... ... ... ... ... ... ...
2021-08-25 0.257359 0.106575 0.094677 0.094677 0.094677 0.094677 0.257359
2021-08-26 0.270076 0.151744 0.099356 0.099356 0.099356 0.099356 0.180758
2021-08-27 0.222533 0.081865 0.081865 0.222533 0.081865 0.086806 0.222533
2021-08-30 0.124118 0.186260 0.085769 0.233143 0.124187 0.085769 0.160754
2021-08-31 0.223638 0.223638 0.082272 0.082272 0.082272 0.082272 0.223638

295 rows × 7 columns
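Each row of df_actions_a2c is the target allocation across the seven funds for that trading day, and the first day is an equal split of 1/7 per fund, so the weights in a row should sum to (approximately) 1. A quick sanity check, sketched here:
import numpy as np

# Every daily action should be a set of portfolio weights summing to ~1
row_sums = df_actions_a2c.sum(axis=1)
print(row_sums.min(), row_sums.max())
assert np.allclose(row_sums, 1.0, atol=1e-6)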

Here is a visualization of the A2C agent's actions on two of the funds (note that the FSMVX series is plotted with a minus sign, so it appears mirrored below the zero line):

# hide
my_tickers
['FBCVX',
 'FSMVX',
 'FDGFX',
 'FULVX',
 'FDSCX',
 'FDCAX',
 'FDSVX',
 'FCPGX',
 'FIFNX']
fig = go.Figure()
fig.update_layout(width=900, height=600)
fig.add_trace(go.Scatter(x=time_ind, y=df_actions_a2c['FBCVX'], mode='lines', name='FBCVX A2C'))
# FSMVX is plotted negated (mirrored below the zero line) to keep the two traces visually separate
fig.add_trace(go.Scatter(x=time_ind, y=-df_actions_a2c['FSMVX'], mode='lines', name='FSMVX A2C'))
fig.update_layout(
    legend=dict(
        x=0,
        y=1,
        traceorder="normal",
        font=dict(
            family="sans-serif",
            size=20,
            color="black"
        ),
        bgcolor="White",
        bordercolor="white",
        borderwidth=2))
fig.update_layout(title={
        'text': "A2C Actions on FBCVX and FSMVX",
        'y': 0.87,
        'x': 0.48,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.update_layout(
    paper_bgcolor='rgba(1, 1, 0, 0)',
    plot_bgcolor='rgba(1, 1, 0, 0)',
    xaxis_title="Date",
    yaxis = dict(titlefont=dict(size=26), title="Daily Actions"),
    font=dict(size=15))
# fig.update_layout(font_size = 20)
fig.update_traces(line=dict(width=2))
fig.update_xaxes(showline=True, linecolor='black', showgrid=False, gridwidth=1, gridcolor='Black', mirror=True)
fig.update_yaxes(showline=True,linecolor='black', showgrid=False, gridwidth=1, gridcolor='Black', mirror=True)
fig.update_yaxes(zeroline=True, zerolinewidth=1, zerolinecolor='Grey')
fig.show()

3.4.2 Inspect daily return

Here is a visualization of the daily return of the portfolio under each of the two agents:

fig = go.Figure()
fig.update_layout(width=900, height=600)
fig.add_trace(go.Scatter(x=time_ind, y=DRL_strat_a2c, mode='lines', name='A2C'))
fig.add_trace(go.Scatter(x=time_ind, y=DRL_strat_ppo, mode='lines', name='PPO'))
fig.update_layout(
    legend=dict(
        x=0,
        y=1,
        traceorder="normal",
        font=dict(
            family="sans-serif",
            size=20,
            color="black"
        ),
        bgcolor="White",
        bordercolor="white",
        borderwidth=2))
fig.update_layout(title={
        'text': "Daily Return of A2C & PPO",
        'y': 0.87,
        'x': 0.48,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.update_layout(
    paper_bgcolor='rgba(1, 1, 0, 0)',
    plot_bgcolor='rgba(1, 1, 0, 0)',
    xaxis_title="Date",
    yaxis = dict(titlefont=dict(size=26), title="Daily Return"),
    font=dict(size=15))
# fig.update_layout(font_size = 20)
fig.update_traces(line=dict(width=2))
fig.update_xaxes(showline=True, linecolor='black', showgrid=False, gridwidth=1, gridcolor='Black', mirror=True)
fig.update_yaxes(showline=True,linecolor='black', showgrid=False, gridwidth=1, gridcolor='Black', mirror=True)
fig.update_yaxes(zeroline=True, zerolinewidth=1, zerolinecolor='Grey')
fig.show()

3.4.3 Inspect cumulative return

Finally, we inspect the cumulative return of the portfolio achieved by each of the agents, with the DJIA index serving as a baseline for reference. Both agents end up with a larger cumulative return than the DJIA.
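As a quick numeric check of that claim before plotting, we can print the final value of each cumulative-return series (a minimal sketch using the series computed earlier; output not shown here):
# Final cumulative return over the trading window for each strategy
print(f"A2C : {a2c_cumpod.iloc[-1]:.2%}")
print(f"PPO : {ppo_cumpod.iloc[-1]:.2%}")
print(f"DJIA: {dji_cumpod.iloc[-1]:.2%}")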

fig = go.Figure()
fig.update_layout(width=900, height=900)
fig.add_trace(go.Scatter(x=time_ind, y=dji_cumpod, mode='lines', name='DJIA', line=dict(color="#a9a9a9")))
fig.add_trace(go.Scatter(x=time_ind, y=a2c_cumpod, mode='lines', name='A2C'))
fig.add_trace(go.Scatter(x=time_ind, y=ppo_cumpod, mode='lines', name='PPO'))
fig.update_layout(
    legend=dict(
        x=0,
        y=1,
        traceorder="normal",
        font=dict(
            family="sans-serif",
            size=16,
            color="black"
        ),
        bgcolor="White",
        bordercolor="white",
        borderwidth=2))
fig.update_layout(title={
        'text': "Cumulative Return of A2C & PPO against DJIA",
        'y': 0.92,
        'x': 0.48,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.update_layout(
    paper_bgcolor='rgba(1, 1, 0, 0)',
    plot_bgcolor='rgba(1, 1, 0, 0)',
    xaxis_title="Date",
    yaxis = dict(titlefont=dict(size=26), title="Cumulative Return"),
    font=dict(size=15))
# fig.update_layout(font_size = 20)
fig.update_traces(line=dict(width=2))
fig.update_xaxes(showline=True, linecolor='black', showgrid=False, gridwidth=1, gridcolor='Black', mirror=True)
fig.update_yaxes(showline=True,linecolor='black', showgrid=False, gridwidth=1, gridcolor='Black', mirror=True)
fig.update_yaxes(zeroline=True, zerolinewidth=1, zerolinecolor='Grey')
fig.show()

Disclaimer

This content is intended for research purposes only and is not meant to be used in any form of trading. Past performance is no guarantee of future results. If you suffer losses from making use of this content, directly or indirectly, you alone are responsible for those losses. The author will not be held responsible in any way.