Asset Selling

Using the Powell Unified Framework & Sequential Decision Analytics to find the optimal time to sell an asset

Investment Industry
Inventory Management
Powell Unified Framework
Reinforcement Learning
Python
Author

Kobus Esterhuysen

Published

July 11, 2023

0 INTRODUCTION

The overall structure of this project and report follows the traditional CRISP-DM format. However, instead of CRISP-DM’s “4 Modeling” section, we insert the “6 step modeling process” of Dr. Warren Powell in section 4 of this document. Dr. Powell’s unified framework shows great promise for unifying the formalisms of at least a dozen different fields. Using his framework gives easier access to thinking patterns in these other fields that might be beneficial and informative for the sequential decision problem at hand. Traditionally, this kind of problem would be approached from the reinforcement learning perspective. However, Dr. Powell’s wider and more comprehensive perspective almost certainly provides additional value.

Here is information on Dr. Powell’s perspective on Sequential Decision Analytics.

To make a strong mapping between the code in this notebook and the mathematics in the Powell Unified Framework (PUF), we use the following convention for naming Python identifiers:

  • How to read/say
    • var name & flavor first
    • at t/n
    • for entity OR of/with attribute
    • \(\hat{R}^{fail}_{t+1,a}\) has code Rhat__fail_tt1_a which is read: “Rhatfail at t+1 of/with (attribute) a”
  • Superscripts
    • variable names have a double underscore to indicate a superscript
    • \(X^{\pi}\): has code X__pi, is read X pi
    • when there is a ‘natural’ distinction between the variable symbol and the superscript (e.g. a change in case), the double underscore is sometimes omitted: Xpi instead of X__pi, or MSpend_t instead of M__Spend_t
  • Subscripts
    • variable names have a single underscore to indicate a subscript
    • \(S_t\): has code S_t, is read ‘S at t’
    • \(M^{Spend}_t\) has code M__Spend_t which is read: “MSpend at t”
    • \(\hat{R}^{fail}_{t+1,a}\) has code Rhat__fail_tt1_a which is read: “Rhatfail at t+1 of/with (attribute) a” [RLSO-p436]
  • Arguments
    • collection variable names may have argument information added
    • \(X^{\pi}(S_t)\): has code X__piIS_tI, is read ‘X pi in S at t’
    • the surrounding I’s are used to imitate the parentheses around the argument
  • Next time/iteration
    • variable names that indicate one step in the future are quite common
    • \(R_{t+1}\): has code R_tt1, is read ‘R at t+1’
    • \(R^{n+1}\): has code R__nt1, is read ‘R at n+1’
  • Rewards
    • State-independent terminal reward and cumulative reward
      • \(F\): has code F for terminal reward
      • \(\sum_{n}F\): has code cumF for cumulative reward
    • State-dependent terminal reward and cumulative reward
      • \(C\): has code C for terminal reward
      • \(\sum_{t}C\): has code cumC for cumulative reward
  • Vectors where components use different names
    • \(S_t(R_t, p_t)\): has code S_t.R_t and S_t.p_t, is read ‘S at t in R at t, and, S at t in p at t’
    • the code implementation is by means of a named tuple
      • self.State = namedtuple('State', SVarNames) for the ‘class’ of the vector
      • self.S_t for the ‘instance’ of the vector
  • Vectors where components reuse names
    • \(x_t(x_{t,GB}, x_{t,BL})\): has code x_t.x_t_GB and x_t.x_t_BL, is read ‘x at t in x at t for GB, and, x at t in x at t for BL’
    • the code implementation is by means of a named tuple
      • self.Decision = namedtuple('Decision', xVarNames) for the ‘class’ of the vector
      • self.x_t for the ‘instance’ of the vector
  • Use of mixed-case variable names
    • to reduce confusion, mixed-case variable names are sometimes preferred (even though this is not a best practice in the Python community), reserving underscores and double underscores for math-related variables
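
The convention above can be illustrated with a minimal sketch. The values and the placeholder policy below are hypothetical and serve only to show how the identifiers map to the mathematical symbols; they are not taken from the model developed later.

```python
from collections import namedtuple

# Hypothetical two-component state, used only to illustrate the naming.
State = namedtuple('State', ['R_t', 'p_t'])  # 'class' of the vector S_t
S_t = State(R_t=1, p_t=16.0)                 # 'instance' of the vector S_t

def X__pi(S_t):
    """X^pi: a placeholder policy that always holds (x_t = 0)."""
    return 0

# X^pi(S_t): the surrounding I's imitate the parentheses around the argument.
X__piIS_tI = X__pi(S_t)

# R_{t+1}: 'tt1' marks one step into the future.
R_tt1 = S_t.R_t + X__piIS_tI

print(S_t.R_t, S_t.p_t, X__piIS_tI, R_tt1)
```

Reading the identifiers aloud follows the list above: S_t.p_t is 'S at t in p at t', and R_tt1 is 'R at t+1'.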

1 BUSINESS UNDERSTANDING

The problem in this project was chosen as a starting example for a client need relating to making optimal investment decisions regarding a given portfolio. Although the present example only deals with the decision to either hold or sell an asset, it is developed to provide the basis for expansion towards this need. The example comes from the free book by Dr. Powell, Sequential Decision Analytics and Modeling. However, the code has been modified quite substantially.

The original code for this example can be found here.

2 DATA UNDERSTANDING

# import pdb
# pdb.set_trace()
from collections import namedtuple, defaultdict
import pandas as pd
import numpy as np
import random
import matplotlib.pyplot as plt
from matplotlib.ticker import FormatStrFormatter
import matplotlib as mpl
from copy import copy
import math
pd.options.display.float_format = '{:,.4f}'.format
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', None)
! python --version
Python 3.10.12

The parameters of the system-under-steer (SUS) are:

sheet1 = pd.read_excel(f'{base_dir}/asset_selling_policy_parameters.xlsx', sheet_name="Sheet1")
sheet1
Unnamed: 0 param1 param2
0 sell_low 2 NaN
1 high_low 4 10.0000
2 track 0 4.0000
params = zip(sheet1['param1'], sheet1['param2'])
param_list = list(params); print(f'{param_list=}')
param_list=[(2, nan), (4, 10.0), (0, 4.0)]
LOWER_LIMIT_SellLow = param_list[0][0]; print(f'{LOWER_LIMIT_SellLow=}')
LOWER_LIMIT_HighLow = param_list[1][0]; print(f'{LOWER_LIMIT_HighLow=}')
UPPER_LIMIT_HighLow = param_list[1][1]; print(f'{UPPER_LIMIT_HighLow=}')
TRACK_SIGNAL_Track = param_list[2][0]; print(f'{TRACK_SIGNAL_Track=}')
ALPHA_Track = param_list[2][1]; print(f'{ALPHA_Track=}')
LOWER_LIMIT_SellLow=2
LOWER_LIMIT_HighLow=4
UPPER_LIMIT_HighLow=10.0
TRACK_SIGNAL_Track=0
ALPHA_Track=4.0
sheet2 = pd.read_excel(f"{base_dir}/asset_selling_policy_parameters.xlsx", sheet_name="Sheet2")
sheet2
low_min low_max high_min high_max increment_size
0 0 0 0.0100 5 0.0050
sheet3 = pd.read_excel(f"{base_dir}/asset_selling_policy_parameters.xlsx", sheet_name="Sheet3")
sheet3
Policy TimeHorizon DiscountFactor InitialPrice InitialBias UpStep DownStep Variance Iterations PrintStep
0 track 40 0.9900 16 Up 1 -1 2 10000 40
biasdf = pd.read_excel(f"{base_dir}/asset_selling_policy_parameters.xlsx", sheet_name="Sheet4")
biasdf
Unnamed: 0 Up Neutral Down
0 Up 0.9000 0.1000 0.0000
1 Neutral 0.2000 0.6000 0.2000
2 Down 0.0000 0.1000 0.9000
W_biasdf_cum = pd.concat([
  biasdf[['Unnamed: 0']],
  biasdf[['Up','Neutral','Down']].cumsum(axis=1)], axis=1) #.
W_biasdf_cum = W_biasdf_cum.set_index(['Unnamed: 0'])
W_biasdf_cum
Up Neutral Down
Unnamed: 0
Up 0.9000 1.0000 1.0000
Neutral 0.2000 0.8000 1.0000
Down 0.0000 0.1000 1.0000
INIT_PRICE = sheet3['InitialPrice'][0]; INIT_PRICE
16
INIT_BIAS = sheet3['InitialBias'][0]; INIT_BIAS
'Up'
W_UpStep = sheet3['UpStep'][0]
W_DownStep = sheet3['DownStep'][0]
W_Variance = sheet3['Variance'][0]
W_UpStep, W_DownStep, W_Variance
(1, -1, 2)
L = sheet3['Iterations'][0]; L
10000
SEED_TRAIN = 77777777
SEED_EVALU = 88888888
piNAMES = ['X__SellLow', 'X__HighLow', 'X__Track'] #policy names
SNAMES = [ #state variable names
    'R_t',   #resource
    'p_t',   #price
    'pbar_t', #smoothed price
]
xNAMES = ['x_t'] #decision variable names
class PriceSimulator():
  def __init__(self,
    biasCdfs=W_biasdf_cum,
    upStep=W_UpStep,
    downStep=W_DownStep,
    variance=W_Variance,
    seed=None):

    self.biasCdfs = biasCdfs
    self.upStep = upStep
    self.downStep = downStep
    self.variance = variance
    self.prng = np.random.RandomState(seed)
    self.bias = 'Neutral'

  def simulate(self):
    # the change in price is drawn from a normal distribution whose mean is the
    # bias step; note that normal() takes the standard deviation as its second
    # argument, so self.variance is used as the scale here
    b_t = self.prng.choice(['Down', 'Neutral', 'Up'])
    # b_t = self.bias #
    biasCdf = self.biasCdfs.loc[b_t] # row of cumulative probabilities for b_t
    coin = self.prng.random_sample()
    if coin < biasCdf['Up']:
      b_tt1 = 'Up' #new bias
      b_tt1_val = self.upStep #bias
    elif coin < biasCdf['Neutral']:
      b_tt1 = 'Neutral' #new bias
      b_tt1_val = 0 #bias
    else:
      b_tt1 = 'Down' #new bias
      b_tt1_val = self.downStep #bias
    self.bias = b_tt1
    # p_tt1 = p_t + self.prng.normal(b_tt1_val, self.variance) #price
    phat_tt1 = self.prng.normal(b_tt1_val, self.variance) #change in price
    W_tt1 = {
        # "p_t": p_tt1,
        "p_t": phat_tt1,
        "b_t": b_tt1, #just for display
        "b_t_val": b_tt1_val #just for display
    }
    return W_tt1

  def plot_output(self, df1):
    n_charts = 3
    ylabelsize = 16
    mpl.rcParams['lines.linewidth'] = 1.2
    default_colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
    fig, axs = plt.subplots(n_charts, sharex=True)
    fig.set_figwidth(13); fig.set_figheight(9)
    fig.suptitle('Price Simulation', fontsize=20)

    xi = 0
    axs[xi].set_title(f'')
    axs[xi].set_ylim(auto=True); axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
    axs[xi].step(df1['p_t'], random.choice(default_colors), where='post')
    axs[xi].axhline(y=0, color='k', linestyle=':')
    y1ab = '$p_{t}$'
    axs[xi].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)

    xi = 1
    axs[xi].set_ylim(auto=True); axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
    axs[xi].step(df1['b_t_val'], random.choice(default_colors), where='post')
    axs[xi].axhline(y=0, color='k', linestyle=':')
    y1ab = '$b_{t,val}$'
    axs[xi].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)

    xi = 2
    axs[xi].set_ylim(auto=True); axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
    axs[xi].step(df1['b_t'], random.choice(default_colors), where='post')
    y1ab = '$b_{t}$'
    axs[xi].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)

    axs[xi].set_xlabel('$t$', rotation=0, ha='center', va='center', fontweight='bold', size=ylabelsize)
W_biasdf_cum
Up Neutral Down
Unnamed: 0
Up 0.9000 1.0000 1.0000
Neutral 0.2000 0.8000 1.0000
Down 0.0000 0.1000 1.0000
# EXAMPLE OF CUSTOMIZED SIMULATOR
# x = ['Up', 'Neutral', 'Down']
# biasCdfs = pd.DataFrame(
#   [[.9, 1., 1.], #'Up' cdf
#    [.2, .8, 1.], #'Neutral' cdf
#    [0., .1, 1.]], #'Down' cdf
#   index=x,
#   columns=x,
# )
# price_sim = PriceSimulator(
#   # T__sim=T__sim,
#   biasCdfs=biasCdfs,
#   upStep=1,
#   downStep=-1,
#   variance=W_Variance
# )

price_sim = PriceSimulator(seed=SEED_TRAIN) # use pars from spreadsheet
T__sim = 20
PriceData = []
for i in range(T__sim):
  entry = list(price_sim.simulate().values())
  PriceData.append(entry)
labels = ['p_t', 'b_t', 'b_t_val']
df = pd.DataFrame.from_records(data=PriceData, columns=labels); df[:10]
p_t b_t b_t_val
0 0.3365 Up 1
1 3.1084 Neutral 0
2 2.0958 Neutral 0
3 1.1369 Neutral 0
4 0.7226 Up 1
5 -1.1433 Up 1
6 3.9397 Neutral 0
7 -0.4858 Down -1
8 2.1811 Down -1
9 0.4218 Down -1
price_sim.plot_output(df)

del price_sim

3 DATA PREPARATION

We will use the data provided by the simulator directly. There is no need to perform additional data preparation.

4 MODELING

4.1 Narrative

We are holding an asset and we are looking for the best time to sell it. For simplicity of thought we will consider this asset to be a single share in a company. In later POCs we will allow for multiple shares in multiple companies with the ability to buy/hold/sell a decided number of shares at each step. We assume that our decisions in the current project do not affect the price of the share. The price varies according to a stochastic process (as coded in the above price simulator).

4.2 Core Elements

This section attempts to answer three important questions:

  • What metrics are we going to track?
  • What decisions do we intend to make?
  • What are the sources of uncertainty?

For this problem, the only metric we are interested in is the amount of profit we make after each decision window. A single type of decision needs to be made at the start of each window - whether we want to hold on to the asset, or to sell it at the current price. The only source of uncertainty is the price of the asset (share).

4.3 Mathematical Model | SUS Design

A Python class is used to implement the model for the SUS (System-Under-Steer):

class AssetSellingModel():
  def __init__(self, S_0_info):
    ...
    ...

4.3.1 State variables

The state variables represent what we need to know.

  • \(R_t\)
    • the number of shares at time \(t\)
    • will be either 0 or 1 in this project
    • measured in units
  • \(p_t\)
    • price of the share at time \(t\)
  • \(\bar{p}_{t}\)
    • smoothed estimate of the price of the share at time \(t\)
    • smoothing happens according to \[\bar{p}_t = (1 - \alpha) \bar{p}_{t-1} + \alpha p_t\]

The state is:

\(S_t = (R_t, p_t, \bar{p}_{t})\)

The state variables are represented by the following variables in the AssetSellingModel class:

self.State = namedtuple('State', SNAMES) # 'class'
self.S_t = self.build_state(S_0_info) # 'instance'

where

SNAMES = [ #state variable names
    'R_t',   #resource
    'p_t',   #price
    'pbar_t', #smoothed price
]
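
The smoothing equation for the smoothed price can be made concrete with a short sketch. The function name `smooth_prices`, the price sequence, and the values of alpha and the initial smoothed price are assumptions chosen for illustration only.

```python
def smooth_prices(prices, alpha, pbar_0):
    """Apply pbar_t = (1 - alpha) * pbar_{t-1} + alpha * p_t along a price path."""
    pbar = pbar_0
    smoothed = []
    for p_t in prices:
        pbar = (1 - alpha) * pbar + alpha * p_t
        smoothed.append(pbar)
    return smoothed

# Hypothetical prices; alpha = 0.5 weighs the old estimate and the new price equally.
pbars = smooth_prices([16.0, 18.0, 17.0], alpha=0.5, pbar_0=16.0)
print(pbars)  # [16.0, 17.0, 17.0]
```

A smaller alpha makes the smoothed price react more slowly to new prices, while alpha = 1 simply reproduces the latest price.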

4.3.2 Decision variables

The decision variables represent what we control.

  • \(x_t\)
    • the number of shares held or sold
    • \(x_t=0\) for hold
    • \(x_t=-1\) for sell
    • \(x_t \in \{0,-1\}\)
  • Constraints
    • we can only sell if we are holding the share:
      • \(-x_t \le R_t\)
  • Decisions are made with a policy (TBD below):
    • \(X^{\pi}(S_t)\)

The decision variables are represented by the following variables in the AssetSellingModel class:

self.Decision = namedtuple('Decision', xNAMES) # 'class'

where

xNAMES = ['x_t']
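
As a minimal sketch of the decision set and the selling constraint, the check below tests whether a decision is admissible in a given state. The helper `is_feasible` is hypothetical and is not part of the AssetSellingModel class; the state values are illustrative.

```python
from collections import namedtuple

State = namedtuple('State', ['R_t', 'p_t', 'pbar_t'])
Decision = namedtuple('Decision', ['x_t'])

def is_feasible(S_t, x_t):
    """x_t must be 0 (hold) or -1 (sell), and a sale requires holding the share."""
    return x_t.x_t in (0, -1) and S_t.R_t + x_t.x_t >= 0

holding = State(R_t=1, p_t=16.0, pbar_t=16.0)  # still holding the share
sold = State(R_t=0, p_t=16.0, pbar_t=16.0)     # share already sold
hold, sell = Decision(x_t=0), Decision(x_t=-1)

print(is_feasible(holding, sell))  # True
print(is_feasible(sold, sell))     # False
print(is_feasible(sold, hold))     # True
```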

4.3.3 Exogenous information variables

The exogenous information variables represent what we did not know (when we made a decision). These are the variables that we cannot control directly. The information in these variables becomes available after we make the decision \(x_t\).

When we assume that the price in each time period is revealed, without any model to predict the price based on past prices, we have, using approach 1:

\[ p_{t+1} = W_{t+1} \]

Alternatively, when we assume that we observe the change in price \(\hat{p}_{t+1}=p_{t+1}-p_{t}\), we have, using approach 2:

\[ \begin{aligned} p_{t+1} &= p_t + W_{t+1} \\ &= p_t + \hat{p}_{t+1} \end{aligned} \]

We will make use of approach 2 which means that the exogenous information, \(W_{t+1}\), is the observed change in price of the share.
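
A small sketch of approach 2: given an initial price and a sequence of observed price changes, the price path is the running sum of the changes. The initial price and the changes below are made-up numbers, not simulator output.

```python
import numpy as np

p_0 = 16.0                         # initial price (illustrative)
phat = np.array([1.0, -0.5, 2.0])  # hypothetical W_1, W_2, W_3 (price changes)

# p_{t+1} = p_t + phat_{t+1}, applied repeatedly, is a cumulative sum.
prices = p_0 + np.cumsum(phat)
print(prices)  # [17.  16.5 18.5]

# Differencing the full path recovers the exogenous information again.
recovered = np.diff(np.concatenate(([p_0], prices)))
print(recovered)  # [ 1.  -0.5  2. ]
```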

The exogenous information is obtained by a call to

SIM.simulate()

where SIM is a global PriceSimulator instance.

The latest exogenous information can be accessed by calling the following method from class AssetSellingModel():

def W_fn(self):
    W_tt1 = SIM.simulate()
    return W_tt1

4.3.4 Transition function

The transition function describes how the state variables evolve over time. Because we currently have three state variables in the state, \(S_t=(R_t,p_t,\bar{p}_t)\), we have the equations:

\[ \begin{aligned} R_{t+1} &= R_t + x_t \\ p_{t+1} &= p_t + \hat{p}_{t+1} \\ \bar{p}_t &= (1 - \alpha) \bar{p}_{t-1} + \alpha p_t \end{aligned} \]

Collectively, they represent the general transition function:

\[ S_{t+1} = S^M(S_t,X^{\pi}(S_t),W_{t+1}) \]
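
The three transition equations can be sketched as one standalone function. This is a simplified illustration under stated assumptions: the function name `transition` is hypothetical, the decision and price change are plain numbers rather than named tuples, and the floors at zero used in the implementation further below are omitted.

```python
from collections import namedtuple

State = namedtuple('State', ['R_t', 'p_t', 'pbar_t'])

def transition(S_t, x_t, phat_tt1, alpha):
    """Return the next state from the current state, decision and price change."""
    R_tt1 = S_t.R_t + x_t        # R_{t+1} = R_t + x_t
    p_tt1 = S_t.p_t + phat_tt1   # p_{t+1} = p_t + phat_{t+1}
    # The stored smoothed price is refreshed with the price observed at t.
    pbar_t = (1 - alpha) * S_t.pbar_t + alpha * S_t.p_t
    return State(R_tt1, p_tt1, pbar_t)

S_0 = State(R_t=1, p_t=16.0, pbar_t=16.0)
S_1 = transition(S_0, x_t=0, phat_tt1=1.5, alpha=0.5)    # hold
S_2 = transition(S_1, x_t=-1, phat_tt1=-2.0, alpha=0.5)  # sell
print(S_1)  # State(R_t=1, p_t=17.5, pbar_t=16.0)
print(S_2)  # State(R_t=0, p_t=15.5, pbar_t=16.75)
```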

4.3.5 Objective function

The objective function captures the performance metrics of the solution to the problem. It is given by:

\[ \max_{\pi} \mathbb{E} \sum_{t=0}^T C(S_t,x_t) \]

where \[ C(S_t,x_t) = -p_tx_t \]
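
The expectation in the objective is typically estimated by averaging the cumulative contribution over simulated sample paths. The sketch below shows the idea under simplifying assumptions: prices follow a zero-mean Gaussian random walk (not the bias-driven simulator of section 2), and the policy `sell_at_end` is a hypothetical baseline that holds until the final period.

```python
import numpy as np

def contribution(p_t, x_t):
    """C(S_t, x_t) = -p_t * x_t, so selling (x_t = -1) earns the price p_t."""
    return -p_t * x_t

def sell_at_end(t, T, R_t, p_t):
    """Hypothetical baseline policy: hold until t = T, then sell if still holding."""
    return -1 if (t == T and R_t == 1) else 0

def simulate_path(policy, p_0, T, rng):
    """Accumulate contributions along one sample path."""
    R_t, p_t, total = 1, p_0, 0.0
    for t in range(T + 1):
        x_t = policy(t, T, R_t, p_t)
        total += contribution(p_t, x_t)
        R_t += x_t
        p_t += rng.normal(0.0, 1.0)  # assumed zero-mean price change
    return total

rng = np.random.default_rng(0)
totals = [simulate_path(sell_at_end, p_0=16.0, T=5, rng=rng) for _ in range(1000)]
estimate = float(np.mean(totals))
print(f'{estimate=:.2f}')
```

With zero-mean price changes the expected selling price equals the initial price, so the estimate should land close to 16. Comparing such estimates across policies and parameter values is what the maximization over policies amounts to in practice.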

4.3.6 Implementation of SUS Model

class AssetSellingModel():
    def __init__(self, S_0_info):
      self.S_0_info = S_0_info
      self.State = namedtuple('State', SNAMES) #. 'class'
      self.S_t = self.build_state(S_0_info) #. 'instance'
      self.Decision = namedtuple('Decision', xNAMES) #. 'class'
      self.Ccum = 0.0 #. cumulative reward

    def build_state(self, info):
      return self.State(*[info[sn] for sn in SNAMES])

    def build_decision(self, info):
        return self.Decision(*[info[xn] for xn in xNAMES])

    # this function gives the exogenous information that is dependent on a
    # random process (in the case of the asset selling model, it is the
    # change in price)
    def W_fn(self):
        W_tt1 = SIM.simulate()
        return W_tt1

    def S__M_fn(self, S_t, x_t, W_tt1, theta, piName):
        # R_t
        R_tt1 = max(0, S_t.R_t + x_t.x_t)

        # p_t
        p_t = S_t.p_t
        p_tt1 = max(0, p_t + W_tt1['p_t']) #W_tt1['p_t'] has CHANGE in price

        # pbar_t
        if piName=='X__Track':
          theta__alpha = theta[0]
          pbar_t_1 = S_t.pbar_t
          pbar_t = (1 - theta__alpha)*pbar_t_1 + theta__alpha*p_t #SDAM-eq2.15, RLSO-12.2.4
        else:
          pbar_t = 0

        S_tt1 = self.build_state({
            'R_t': R_tt1,
            'p_t': p_tt1,
            'pbar_t': pbar_t,
        })
        return S_tt1

    def C_fn(self, S_t, x_t, W_tt1):
        C_t = -S_t.p_t*x_t.x_t if S_t.R_t > 0 else 0 #x_t = -1 for sell
        return C_t

    def step(self, x_t, theta, piName):
        W_tt1 = self.W_fn()
        C = self.C_fn(self.S_t, x_t, W_tt1)
        self.Ccum += C
        self.S_t = self.S__M_fn(self.S_t, x_t, W_tt1, theta, piName)
        return (self.S_t, self.Ccum, x_t, W_tt1['b_t_val']) #. for plotting

4.4 Uncertainty Model

We will simulate the share price \(p_{t+1} = p_t + \hat{p}_{t+1} = p_t + W_{t+1}\) as described in section 2.
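
The simulator in section 2 picks the next bias by comparing a uniform random number with a row of cumulative probabilities. A compact way to express the same comparison is a sorted search over that row. The sketch below is an alternative formulation, not the author's code; the dictionary `CUM_PROBS` copies the cumulative table shown earlier, and the name `next_bias` is hypothetical.

```python
import numpy as np

BIASES = ['Up', 'Neutral', 'Down']  # column order of the cumulative table
CUM_PROBS = {
    'Up':      [0.9, 1.0, 1.0],
    'Neutral': [0.2, 0.8, 1.0],
    'Down':    [0.0, 0.1, 1.0],
}

def next_bias(b_t, coin):
    """Map a uniform draw in [0, 1) to the next bias via the cumulative row of b_t."""
    # side='right' returns the first column whose cumulative value exceeds coin
    idx = int(np.searchsorted(CUM_PROBS[b_t], coin, side='right'))
    return BIASES[idx]

print(next_bias('Neutral', 0.10))  # Up
print(next_bias('Neutral', 0.50))  # Neutral
print(next_bias('Neutral', 0.90))  # Down
print(next_bias('Up', 0.95))       # Neutral
```

The resulting bias then sets the mean of the normally distributed price change (the up step, zero, or the down step).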

4.5 Policy Design

There are two main meta-classes of policy design. Each of these has two subclasses:

  • Policy Search
    • Policy Function Approximations (PFAs)
    • Cost Function Approximations (CFAs)
  • Lookahead
    • Value Function Approximations (VFAs)
    • Direct Lookaheads (DLAs)

In this project we will make use of 3 policies, all from the PFA class:

  • X__HighLow
  • X__SellLow
  • X__Track

where

\[ X^{HighLow}(S_t|\theta^{HighLow}) = \begin{cases} -1 & \text{if } p_t < \theta^{low} \text{ or } p_t > \theta^{high} \\ -1 & \text{if } t = T \text{ and } R_t = 1 \\ 0 & \text{otherwise } \end{cases} \]

\[ X^{SellLow}(S_t|\theta^{SellLow}) = \begin{cases} -1 & \text{if } p_t < \theta^{low} \\ -1 & \text{if } t = T \text{ and } R_t = 1 \\ 0 & \text{otherwise } \end{cases} \]

\[ X^{Track}(S_t|\theta^{Track}) = \begin{cases} -1 & \text{if } p_t \ge \bar{p}_t + \theta^{track} \\ -1 & \text{if } t = T \text{ and } R_t = 1 \\ 0 & \text{otherwise } \end{cases} \]
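
Read literally, the three case definitions translate into small pure functions. The sketch below is a simplified illustration: the function names and flat argument lists are assumptions, and it does not use the State and Decision named tuples of the implementation that follows.

```python
def x_high_low(t, T, R_t, p_t, theta_low, theta_high):
    """Sell when the price leaves the band [theta_low, theta_high], or at t = T."""
    if p_t < theta_low or p_t > theta_high:
        return -1
    if t == T and R_t == 1:
        return -1
    return 0

def x_sell_low(t, T, R_t, p_t, theta_low):
    """Sell when the price drops below theta_low, or at t = T."""
    if p_t < theta_low:
        return -1
    if t == T and R_t == 1:
        return -1
    return 0

def x_track(t, T, R_t, p_t, pbar_t, theta_track):
    """Sell when the price exceeds the smoothed price by at least theta_track, or at t = T."""
    if p_t >= pbar_t + theta_track:
        return -1
    if t == T and R_t == 1:
        return -1
    return 0

# Illustrative calls with T = 40 and made-up prices and parameters.
print(x_high_low(3, 40, 1, 12.0, theta_low=4.0, theta_high=10.0))  # -1
print(x_sell_low(3, 40, 1, 12.0, theta_low=2.0))                   # 0
print(x_track(3, 40, 1, 20.0, pbar_t=16.0, theta_track=4.0))       # -1
```

All three are PFAs: each maps the state directly to a decision through a simple rule with tunable parameters.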

4.5.1 Implementation of Policy Design

class AssetSellingPolicy():
    def __init__(self, model):
        self.model = model
        self.Policy = namedtuple('Policy', piNAMES) #. 'class'

    def build_policy(self, info):
        return self.Policy(*[info[pn] for pn in piNAMES])

    ######################################################################
    ############################## POLICIES ##############################
    ######################################################################
    def X__HighLow(self, t, S_t, theta):
        lower_limit = theta[0]
        upper_limit = theta[1]
        x_t_info = \
          {'x_t': -1} \
            if (S_t.p_t < lower_limit) or (S_t.p_t > upper_limit) else \
          {'x_t': 0} #hold
        return self.model.build_decision(x_t_info)

    def X__SellLow(self, t, S_t, theta):
        lower_limit = theta[0]
        # upper_limit = theta[1]
        x_t_info = \
          {'x_t': -1} if S_t.p_t < lower_limit else \
          {'x_t': 0}
        return self.model.build_decision(x_t_info)

    def X__Track(self, t, S_t, theta):
        if t == 0:
            return self.model.build_decision({'x_t': 0}) #hold
        else:
            # theta__alpha = theta[0] #not used here
            theta__track = theta[1]
            x_t_info = \
              {'x_t': -1} \
              if S_t.p_t >= S_t.pbar_t + theta__track \
              else {'x_t': 0} #hold
            return self.model.build_decision(x_t_info)

    # def X__Random(self, t, S_t, theta):
    #     # sell_time = theta[0]
    #     sell_time = np.random.choice(np.arange(T))
    #     x_t_info = \
    #       {'x_t': -1} if t > sell_time else \
    #       {'x_t': 0}
    #     return self.model.build_decision(x_t_info)

    ######################################################################
    ############################## TRAIN #################################
    ######################################################################
    def grid_search_thetas_2(self, thetas1, thetas2):
        thetas = [(th1, th2) for th1 in thetas1 for th2 in thetas2]
        return thetas

    def grid_search_thetas_1(self, thetas1):
        thetas = [(th1,) for th1 in thetas1]
        return thetas

    def run_policy_sample_paths2(self, theta, piName, record):
        CcumIomega__lI = []
        for l in list(range(L)): #for each sample-path
            # print(f'%%% {l=}')
            model_copy = copy(self.model)
            record_l = [piName, theta, l]
            for t in range(T): #for each transition/step
                # print(f'\t%%% {t=}')
                x_t = getattr(self, piName)(t, model_copy.S_t, theta)

                if (t == T - 1):
                    x_t = model_copy.build_decision({'x_t': -1}) #sell
                    # print(f'%%% TRAINING TERMINAL SELL decision at {t=}')

                S_t, Ccum, x_t, b_t_val = model_copy.step(x_t, theta, piName)

                record_t = [t] + \
                  [S_t.R_t] + [S_t.p_t] + [S_t.pbar_t] + \
                  [Ccum] + \
                  [x_t.x_t] + \
                  [b_t_val] #rather than b_t which is text and not ordered
                record.append(record_l + record_t)
            CcumIomega__lI.append(model_copy.Ccum)
        return CcumIomega__lI

    def perform_grid_search_sample_paths2(self, piName, thetas):
        Cbarcum = defaultdict(float)
        Ctilcum = defaultdict(float)
        expCbarcum = defaultdict(float)
        expCtilcum = defaultdict(float)
        num_thetas = len(thetas)
        record = []
        print(f'{num_thetas=}'); print(f'{thetas=}')
        i = 0; print(f'... printing every 10th theta if considered ...')
        for theta in thetas:
            # print(f'\n=== {theta=} ===')
            if i%10 == 0: print(f'=== ({i:,} / {num_thetas:,}), {theta=} ===')

            CcumIomega__lI = self.run_policy_sample_paths2(theta, piName, record)

            Cbarcum_tmp = np.array(CcumIomega__lI).mean()
            Ctilcum_tmp = np.sum(np.square(np.array(CcumIomega__lI) - Cbarcum_tmp))/(L - 1)
            Cbarcum[theta] = Cbarcum_tmp
            Ctilcum[theta] = np.sqrt(Ctilcum_tmp/L)
            best_theta = max(Cbarcum, key=Cbarcum.get)
            worst_theta = min(Cbarcum, key=Cbarcum.get)

            expCbarcum_tmp = pd.Series(CcumIomega__lI).expanding().mean()
            expCbarcum[theta] = expCbarcum_tmp

            expCtilcum_tmp = pd.Series(CcumIomega__lI).expanding().std()
            expCtilcum[theta] = expCtilcum_tmp
            i += 1
        best_Cbarcum = Cbarcum[best_theta]
        best_Ctilcum = Ctilcum[best_theta]
        print(f'{best_theta=}, {best_Cbarcum=:.2f}, {best_Ctilcum=:.2f}')

        worst_Cbarcum = Cbarcum[worst_theta]
        worst_Ctilcum = Ctilcum[worst_theta]
        print(f'{worst_theta=}, {worst_Cbarcum=:.2f}, {worst_Ctilcum=:.2f}')

        thetaStar_expCbarcum = expCbarcum[best_theta]
        thetaStar_expCtilcum = expCtilcum[best_theta]
        thetaStar_expCtilcum[0] = 0 #set NaN to 0
        return \
          thetaStar_expCbarcum, thetaStar_expCtilcum, \
          Cbarcum, Ctilcum, \
          best_theta, worst_theta, \
          best_Cbarcum, worst_Cbarcum, \
          best_Ctilcum, worst_Ctilcum, \
          record

    def plot_Fhat_map(self, FhatI_theta_I, thetasX, thetasY, labelX, labelY, title):
        Fhat_values = [FhatI_theta_I[(thetaX,thetaY)] for thetaY in thetasY for thetaX in thetasX]
        Fhats = np.array(Fhat_values)
        increment_count = len(thetasX)
        Fhats = np.reshape(Fhats, (-1, increment_count))#.

        fig, ax = plt.subplots()
        im = ax.imshow(Fhats, cmap='hot', origin='lower', aspect='auto')
        # create colorbar
        cbar = ax.figure.colorbar(im, ax=ax)
        # cbar.ax.set_ylabel(cbarlabel, rotation=-90, va="bottom")

        # ax.set_xticks(np.arange(0, len(thetasX), 5))#.
        ax.set_xticks(np.arange(len(thetasX)))

        # ax.set_yticks(np.arange(0, len(thetasY), 5))#.
        ax.set_yticks(np.arange(len(thetasY)))

        # NOTE: round tick labels, else very messy
        # function round() does not work, have to do this way
        thetasX_form = [f'{th:.1f}' for th in thetasX]
        thetasY_form = [f'{th:.1f}' for th in thetasY]

        # ax.set_xticklabels(thetasX[::5])#.
        # ax.set_xticklabels(thetasX)
        ax.set_xticklabels(thetasX_form)

        # ax.set_yticklabels(thetasY[::5])#.
        # ax.set_yticklabels(thetasY)
        ax.set_yticklabels(thetasY_form)

        # rotate the tick labels and set their alignment.
        #plt.setp(ax.get_xticklabels(), rotation=45, ha="right",rotation_mode="anchor")

        ax.set_title(title)
        ax.set_xlabel(labelX)
        ax.set_ylabel(labelY)

        #fig.tight_layout()
        plt.show()
        return True

    # color_style examples: 'r-', 'b:', 'g--'
    def plot_Fhat_chart(self, FhatI_theta_I, thetasX, labelX, labelY, title, color_style, thetaStar):
        mpl.rcParams['lines.linewidth'] = 1.2
        xylabelsize = 16
        # plt.figure(figsize=(13, 9))
        plt.figure(figsize=(13, 4))
        plt.title(title, fontsize=20)
        Fhats = FhatI_theta_I.values()
        plt.plot(thetasX, Fhats, color_style)
        plt.axvline(x=thetaStar, color='k', linestyle=':')
        plt.xlabel(labelX, rotation=0, labelpad=10, ha='right', va='center', fontweight='bold', size=xylabelsize)
        plt.ylabel(labelY, rotation=0, labelpad=1, ha='right', va='center', fontweight='normal', size=xylabelsize)
        plt.show()

    # expanding Fhat chart
    def plot_expFhat_chart(self, df, labelX, labelY, title, color_style):
      mpl.rcParams['lines.linewidth'] = 1.2
      xylabelsize = 16
      plt.figure(figsize=(13, 4))
      plt.title(title, fontsize=20)
      plt.plot(df, color_style)
      plt.xlabel(labelX, rotation=0, labelpad=10, ha='right', va='center', fontweight='bold', size=xylabelsize)
      plt.ylabel(labelY, rotation=0, labelpad=1, ha='right', va='center', fontweight='normal', size=xylabelsize)
      plt.show()

    # expanding Fhat charts
    def plot_expFhat_charts(self, means, stdvs, labelX, labelY, suptitle, pars=defaultdict(str)):
      n_charts = 2
      xlabelsize = 14
      ylabelsize = 14
      mpl.rcParams['lines.linewidth'] = 1.2
      default_colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
      fig, axs = plt.subplots(n_charts, sharex=True)
      fig.set_figwidth(13); fig.set_figheight(9)
      fig.suptitle(suptitle, fontsize=18)

      xi = 0
      legendlabels = []
      axs[xi].set_title(r"$exp\bar{C}^{cum}(\theta^*)$", loc='right', fontsize=16)
      for i,itm in enumerate(means.items()):
        axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
        leg = axs[xi].plot(itm[1], color=pars['colors'][i]); #print(f"{pars['colors'][i]}")
        legendlabels.append(itm[0])
      axs[xi].set_ylabel(labelY, rotation=0, ha='right', va='center', fontweight='normal', size=ylabelsize)

      xi = 1
      axs[xi].set_title(r"$exp\tilde{C}^{cum}(\theta^*)$", loc='right', fontsize=16)
      for i,itm in enumerate(stdvs.items()):
        axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
        # leg = axs[xi].plot(itm[1], default_colors[i], linestyle='--')
        leg = axs[xi].plot(itm[1], pars['colors'][i], linestyle='--')
      axs[xi].set_ylabel(labelY, rotation=0, ha='right', va='center', fontweight='normal', size=ylabelsize)

      fig.legend(
            # [leg],
            labels=legendlabels,
            title="Policies",
            loc='upper right',
            fancybox=True,
            shadow=True,
            ncol=1)
      plt.xlabel(labelX, rotation=0, labelpad=10, ha='right', va='center', fontweight='normal', size=xlabelsize)
      plt.show()

    # FUTURE: def plot_record(df_non, df, pars=defaultdict(str)):
    # FUTURE: def plot_train_evalu(df_non, df, pars=defaultdict(str)):
    def plot_train(self, df, df_non, pars=defaultdict(str)):
      # legendlabels = []
      n_charts = 5
      ylabelsize = 14
      mpl.rcParams['lines.linewidth'] = 1.2
      fig, axs = plt.subplots(n_charts, sharex=True)
      fig.set_figwidth(13); fig.set_figheight(9)
      thetaStarStr = []
      for cmp in pars["thetaStar"]: thetaStarStr.append(f'{cmp:.1f}')
      thetaStarStr = '(' + ', '.join(thetaStarStr) + ')'
      fig.suptitle(pars['suptitle'], fontsize=14)

      xi = 0
      axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
      axs[xi].step(df[f'x_t'], 'm-', where='post')
      if df_non is not None: axs[xi].step(df_non['x_t'], 'c-.', where='post')
      axs[xi].axhline(y=0, color='k', linestyle=':')
      y1ab = '$x_{t}$'
      axs[xi].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
      for j in range(df.shape[0]//T): axs[xi].axvline(x=(j+1)*T, color='grey', ls=':')

      xi = 1
      axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
      axs[xi].step(df[f'R_t'], 'm-', where='post')
      if df_non is not None: axs[xi].step(df_non['R_t'], 'c-.', where='post')
      axs[xi].axhline(y=0, color='k', linestyle=':')
      y1ab = '$R_{t}$'
      axs[xi].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
      for j in range(df.shape[0]//T): axs[xi].axvline(x=(j+1)*T, color='grey', ls=':')

      xi = 2
      axs[xi].set_ylim(auto=True); axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
      axs[xi].step(df[f'p_t'], 'g', where='post')
      if pars["policy"]=='X__Track': axs[xi].step(df[f'pbar_t'], 'g:')
      axs[xi].axhline(y=0, color='k', linestyle=':')

      if(pars['lower_non']): axs[xi].text(-4, pars['lower_non'], r'$\theta^{lower}$' + f"={pars['lower_non']:.1f}", size=10, color='c')
      if(pars['lower_non']): axs[xi].axhline(y=pars['lower_non'], color='c', linestyle=':')

      if(pars['upper_non']): axs[xi].text(-4, pars['upper_non'], r'$\theta^{upper}$' + f"={pars['upper_non']:.1f}", size=10, color='c')
      if(pars['upper_non']): axs[xi].axhline(y=pars['upper_non'], color='c', linestyle=':')

      if(pars['lower']): axs[xi].text(-4, pars['lower'], r'$\theta^{lower}$' + f"={pars['lower']:.1f}", size=10, color='m')
      if(pars['lower']): axs[xi].axhline(y=pars['lower'], color='m', linestyle=':')

      if(pars['upper']): axs[xi].text(-4, pars['upper'], r'$\theta^{upper}$' + f"={pars['upper']:.1f}", size=10, color='m')
      if(pars['upper']): axs[xi].axhline(y=pars['upper'], color='m', linestyle=':')

      if(pars['alpha_non']): axs[xi].text(-4, pars['alpha_non'], r'$\theta^{\alpha}$' + f"={pars['alpha_non']:.1f}", size=10, color='c')
      if(pars['alpha_non']): axs[xi].axhline(y=pars['alpha_non'], color='c', linestyle=':')

      if(pars['trackSignal_non']): axs[xi].text(-4, pars['trackSignal_non'], r'$\theta^{trackSignal}$' + f"={pars['trackSignal_non']:.1f}", size=10, color='c')
      if(pars['trackSignal_non']): axs[xi].axhline(y=pars['trackSignal_non'], color='c', linestyle=':')

      if(pars['alpha']): axs[xi].text(-4, pars['alpha'], r'$\theta^{\alpha}$' + f"={pars['alpha']:.1f}", size=10, color='m')
      if(pars['alpha']): axs[xi].axhline(y=pars['alpha'], color='m', linestyle=':')

      if(pars['trackSignal']): axs[xi].text(-4, pars['trackSignal'], r'$\theta^{trackSignal}$' + f"={pars['trackSignal']:.1f}", size=10, color='m')
      if(pars['trackSignal']): axs[xi].axhline(y=pars['trackSignal'], color='m', linestyle=':')

      y1ab_p = '$p_{t}$'
      y1ab_p_pbar = '$p_{t}$'+'\n'+r'$\overline{p}_{t}$'  # raw string: '\o' is not a valid escape sequence
      y1ab = y1ab_p_pbar if pars['policy']=='X__Track' else y1ab_p
      axs[xi].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
      for j in range(df.shape[0]//T): axs[xi].axvline(x=(j+1)*T, color='grey', ls=':')

      xi = 3
      axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
      axs[xi].step(df['b_t_val'], 'b', where='post')
      axs[xi].axhline(y=0, color='k', linestyle=':')
      y1ab = '$b_{t,val}$'
      axs[xi].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
      for j in range(df.shape[0]//T): axs[xi].axvline(x=(j+1)*T, color='grey', ls=':')

      xi = 4
      axs[xi].set_ylim(auto=True); axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
      axs[xi].step(df['Ccum'], 'm-', where='post')
      if df_non is not None: axs[xi].step(df_non['Ccum'], 'c-.', where='post')
      axs[xi].axhline(y=0, color='k', linestyle=':')
      # raw strings keep the LaTeX backslashes without invalid-escape warnings
      axs[xi].set_ylabel('$C^{cum}$'+'\n'+r'$\mathrm{(Profit)}$'+'\n'+r'$\mathrm{[\$]}$', rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
      axs[xi].set_xlabel(r'$t\ \mathrm{[days]}$', rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
      for j in range(df.shape[0]//T): axs[xi].axvline(x=(j+1)*T, color='grey', ls=':')
      # fig.legend(labels=legendlabels, loc='lower left', fontsize=16)


    ######################################################################
    ############################## EVALU #################################
    ######################################################################
    def plot_experiment(self):
      df_L = pd.DataFrame({
        'best_theta': [7.00, 9.90, 11.60, 9.70, 7.50, 11.70, 10.20, 11.70, 7.60],
        'best_Cbarcum': [35.83, 24.23, 22.02, 19.30, 17.98, 17.97, 18.19, 16.92, 16.55],
        'best_Ctilcum': [15.93, 5.46, 2.49, 2.16, 1.43, 0.82, 0.61, 0.37, 0.29],
        'worst_theta': [7.80, 7.30, 10.60, 8.70, 12.20, 9.00, 8.00, 11.80, 10.50],
        'worst_Cbarcum': [5.70, 7.00, 9.80, 10.83, 13.15, 13.95, 14.38, 15.35, 15.24],
        'worst_Ctilcum': [1.19, 0.90, 0.69, 1.58, 0.76, 0.75, 0.61, 0.32, 0.24],
      })
      fig, ax = plt.subplots()
      fig.set_figwidth(8); fig.set_figheight(6)
      ax.plot(df_L['best_Cbarcum'], label='best_Cbarcum')
      ax.plot(df_L['best_Ctilcum'], label='best_Ctilcum')
      ax.plot(df_L['worst_Cbarcum'], label='worst_Cbarcum')
      ax.set_title('Increasing L from 2 to 1000\n T=20 \n thetasLo = np.arange(7.0, 14.0, 0.10) \n X__SellLow')
      ax.set_xlabel('L')
      # one tick per row of df_L, labelled with the number of sample paths L used for that row
      ax.set_xticks(range(len(df_L)))
      ax.set_xticklabels(['2', '5', '10', '20', '50', '100', '200', '500', '1000'])
      fig.legend(loc='upper left')

    def plot_evalu_comparison(self, df1, df2, df3, pars=defaultdict(str)):
      legendlabels = ['X__HighLow', 'X__SellLow', 'X__Track']
      n_charts = 5
      ylabelsize = 14
      mpl.rcParams['lines.linewidth'] = 1.2
      fig, axs = plt.subplots(n_charts, sharex=True)
      fig.set_figwidth(13); fig.set_figheight(9)
      thetaStarStr = []
      for cmp in pars["thetaStar"]: thetaStarStr.append(f'{cmp:.1f}')
      thetaStarStr = '(' + ', '.join(thetaStarStr) + ')'
      fig.suptitle(pars['suptitle'], fontsize=14)

      xi = 0
      axs[xi].set_ylim(auto=True); axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
      axs[xi].step(df1[f'x_t'], 'r-', where='post')
      axs[xi].step(df2[f'x_t'], 'g-.', where='post')
      axs[xi].step(df3[f'x_t'], 'b:', where='post')
      axs[xi].axhline(y=0, color='k', linestyle=':')
      y1ab = '$x_{t}$'
      axs[xi].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
      for j in range(df1.shape[0]//T): axs[xi].axvline(x=(j+1)*T, color='grey', ls=':')

      xi = 1
      axs[xi].set_ylim(auto=True); axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
      axs[xi].step(df1[f'R_t'], 'r-', where='post')
      axs[xi].step(df2[f'R_t'], 'g-.', where='post')
      axs[xi].step(df3[f'R_t'], 'b:', where='post')
      axs[xi].axhline(y=0, color='k', linestyle=':')
      y1ab = '$R_{t}$'
      axs[xi].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
      for j in range(df1.shape[0]//T): axs[xi].axvline(x=(j+1)*T, color='grey', ls=':')

      xi = 2
      axs[xi].set_ylim(auto=True); axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
      axs[xi].step(df1[f'p_t'], 'r-', where='post')
      axs[xi].step(df2[f'p_t'], 'g-.', where='post')
      axs[xi].step(df3[f'p_t'], 'b:', where='post')
      axs[xi].axhline(y=0, color='k', linestyle=':')

      if(pars['lower_non']): axs[xi].text(-4, pars['lower_non'], r'$\theta^{lower}$' + f"={pars['lower_non']:.1f}", size=10, color='c')
      if(pars['lower_non']): axs[xi].axhline(y=pars['lower_non'], color='c', linestyle=':')

      if(pars['upper_non']): axs[xi].text(-4, pars['upper_non'], r'$\theta^{upper}$' + f"={pars['upper_non']:.1f}", size=10, color='c')
      if(pars['upper_non']): axs[xi].axhline(y=pars['upper_non'], color='c', linestyle=':')

      if(pars['lower']): axs[xi].text(-4, pars['lower'], r'$\theta^{lower}$' + f"={pars['lower']:.1f}", size=10, color='m')
      if(pars['lower']): axs[xi].axhline(y=pars['lower'], color='m', linestyle=':')

      if(pars['upper']): axs[xi].text(-4, pars['upper'], r'$\theta^{upper}$' + f"={pars['upper']:.1f}", size=10, color='m')
      if(pars['upper']): axs[xi].axhline(y=pars['upper'], color='m', linestyle=':')

      if(pars['alpha_non']): axs[xi].text(-4, pars['alpha_non'], r'$\theta^{\alpha}$' + f"={pars['alpha_non']:.1f}", size=10, color='c')
      if(pars['alpha_non']): axs[xi].axhline(y=pars['alpha_non'], color='c', linestyle=':')

      if(pars['trackSignal_non']): axs[xi].text(-4, pars['trackSignal_non'], r'$\theta^{trackSignal}$' + f"={pars['trackSignal_non']:.1f}", size=10, color='c')
      if(pars['trackSignal_non']): axs[xi].axhline(y=pars['trackSignal_non'], color='c', linestyle=':')

      if(pars['alpha']): axs[xi].text(-4, pars['alpha'], r'$\theta^{\alpha}$' + f"={pars['alpha']:.1f}", size=10, color='m')
      if(pars['alpha']): axs[xi].axhline(y=pars['alpha'], color='m', linestyle=':')

      if(pars['trackSignal']): axs[xi].text(-4, pars['trackSignal'], r'$\theta^{trackSignal}$' + f"={pars['trackSignal']:.1f}", size=10, color='m')
      if(pars['trackSignal']): axs[xi].axhline(y=pars['trackSignal'], color='m', linestyle=':')

      y1ab = '$p_{t}$'
      axs[xi].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
      for j in range(df1.shape[0]//T): axs[xi].axvline(x=(j+1)*T, color='grey', ls=':')

      xi = 3
      axs[xi].set_ylim(auto=True); axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
      axs[xi].step(df1['b_t_val'], 'r-', where='post')
      axs[xi].step(df2['b_t_val'], 'g-.', where='post')
      axs[xi].step(df3['b_t_val'], 'b:', where='post')
      axs[xi].axhline(y=0, color='k', linestyle=':')
      y1ab = '$b_{t,val}$'
      axs[xi].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
      for j in range(df1.shape[0]//T): axs[xi].axvline(x=(j+1)*T, color='grey', ls=':')

      xi = 4
      axs[xi].set_ylim(auto=True); axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
      axs[xi].step(df1['Ccum'], 'r-', where='post')
      axs[xi].step(df2['Ccum'], 'g-.', where='post')
      axs[xi].step(df3['Ccum'], 'b:', where='post')
      axs[xi].axhline(y=0, color='k', linestyle=':')
      # raw strings keep the LaTeX backslashes; label matches the C^{cum} notation used in plot_train
      axs[xi].set_ylabel('$C^{cum}$'+'\n'+r'$\mathrm{(Profit)}$'+'\n'+r'$\mathrm{[\$]}$', rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
      axs[xi].set_xlabel(r'$t\ \mathrm{[decision\ windows]}$', rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
      for j in range(df1.shape[0]//T): axs[xi].axvline(x=(j+1)*T, color='grey', ls=':')

      fig.legend(
            # [leg],
            labels=legendlabels,
            # labels=labels,
            title="Policies",
            loc='upper right',
            fontsize=16,
            fancybox=True,
            shadow=True,
            ncol=1)
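
The eight `text`/`axhline` pairs that annotate the price chart are repeated verbatim in `plot_train` and `plot_evalu_comparison`. As a sketch only, they could be driven from a small table. The helper name `draw_theta_lines` and the table `THETA_STYLES` are hypothetical and not part of the original class. Like the original code, the helper skips any threshold whose value is falsy.

```python
# Hypothetical helper (not in the original class): one dotted horizontal line
# plus a text label for every threshold that is present in `pars`.
THETA_STYLES = [
    # (key in pars, mathtext label, colour)
    ('lower_non', r'$\theta^{lower}$', 'c'),
    ('upper_non', r'$\theta^{upper}$', 'c'),
    ('lower', r'$\theta^{lower}$', 'm'),
    ('upper', r'$\theta^{upper}$', 'm'),
    ('alpha_non', r'$\theta^{\alpha}$', 'c'),
    ('trackSignal_non', r'$\theta^{trackSignal}$', 'c'),
    ('alpha', r'$\theta^{\alpha}$', 'm'),
    ('trackSignal', r'$\theta^{trackSignal}$', 'm'),
]

def draw_theta_lines(ax, pars, x_text=-4):
    """Annotate `ax` with each threshold in `pars`; return the keys drawn."""
    drawn = []
    for key, label, color in THETA_STYLES:
        value = pars.get(key)  # .get() does not insert defaults into a defaultdict
        if not value:          # same truthiness test as the original `if(pars[key])`
            continue
        ax.text(x_text, value, label + f"={value:.1f}", size=10, color=color)
        ax.axhline(y=value, color=color, linestyle=':')
        drawn.append(key)
    return drawn
```

Inside either plotting method, the block of `if(pars[...])` lines would then reduce to `draw_theta_lines(axs[xi], pars)`.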

4.6 Policy Evaluation

N_SAMPLEPATHS = 1_000
N_TRANSITIONS = 40

4.6.1 Training/Tuning

# setup labels to plot info
R_t_labels = ['R_t']
p_t_labels = ['p_t']
pbar_t_labels = ['pbar_t']
b_t_labels = ['b_t']
x_t_labels = ['x_t']
b_t_val_labels = ['b_t_val']
labels = ['piName', 'theta', 'l'] + \
  ['t'] + R_t_labels + p_t_labels + pbar_t_labels + ['Ccum'] + x_t_labels + b_t_val_labels
labels
['piName', 'theta', 'l', 't', 'R_t', 'p_t', 'pbar_t', 'Ccum', 'x_t', 'b_t_val']
4.6.1.1 X__HighLow
%%time
L = N_SAMPLEPATHS
T = N_TRANSITIONS
S_0_info = {'R_t': 1, 'p_t': INIT_PRICE, 'pbar_t': INIT_PRICE, 'b_t': INIT_BIAS}
first_n_t = 160
last_n_t = 160

M = AssetSellingModel(
    S_0_info,
)
P = AssetSellingPolicy(M)
SIM = PriceSimulator(seed=SEED_TRAIN)

thetasLo = np.arange(14.0, 15.0, 0.10)
thetasHi = np.arange(31.0, 32.0, 0.10)
thetas = P.grid_search_thetas_2(thetasLo, thetasHi)

thetaStar_expCbarcum_HighLow, thetaStar_expCtilcum_HighLow, \
Cbarcum_HighLow, Ctilcum_HighLow, \
best_theta_HighLow, worst_theta_HighLow, \
best_Cbarcum_HighLow, worst_Cbarcum_HighLow, \
best_Ctilcum_HighLow, worst_Ctilcum_HighLow, \
record_HighLow = \
  P.perform_grid_search_sample_paths2("X__HighLow", thetas)
f'{thetaStar_expCbarcum_HighLow.iloc[-1]=:.2f}'
df_first_n = pd.DataFrame.from_records(record_HighLow[:first_n_t], columns=labels)
df_last_n = pd.DataFrame.from_records(record_HighLow[-last_n_t:], columns=labels)
num_thetas=100
thetas=[(14.0, 31.0), (14.0, 31.1), (14.0, 31.200000000000003), (14.0, 31.300000000000004), (14.0, 31.400000000000006), (14.0, 31.500000000000007), (14.0, 31.60000000000001), (14.0, 31.70000000000001), (14.0, 31.80000000000001), (14.0, 31.900000000000013), (14.1, 31.0), (14.1, 31.1), (14.1, 31.200000000000003), (14.1, 31.300000000000004), (14.1, 31.400000000000006), (14.1, 31.500000000000007), (14.1, 31.60000000000001), (14.1, 31.70000000000001), (14.1, 31.80000000000001), (14.1, 31.900000000000013), (14.2, 31.0), (14.2, 31.1), (14.2, 31.200000000000003), (14.2, 31.300000000000004), (14.2, 31.400000000000006), (14.2, 31.500000000000007), (14.2, 31.60000000000001), (14.2, 31.70000000000001), (14.2, 31.80000000000001), (14.2, 31.900000000000013), (14.299999999999999, 31.0), (14.299999999999999, 31.1), (14.299999999999999, 31.200000000000003), (14.299999999999999, 31.300000000000004), (14.299999999999999, 31.400000000000006), (14.299999999999999, 31.500000000000007), (14.299999999999999, 31.60000000000001), (14.299999999999999, 31.70000000000001), (14.299999999999999, 31.80000000000001), (14.299999999999999, 31.900000000000013), (14.399999999999999, 31.0), (14.399999999999999, 31.1), (14.399999999999999, 31.200000000000003), (14.399999999999999, 31.300000000000004), (14.399999999999999, 31.400000000000006), (14.399999999999999, 31.500000000000007), (14.399999999999999, 31.60000000000001), (14.399999999999999, 31.70000000000001), (14.399999999999999, 31.80000000000001), (14.399999999999999, 31.900000000000013), (14.499999999999998, 31.0), (14.499999999999998, 31.1), (14.499999999999998, 31.200000000000003), (14.499999999999998, 31.300000000000004), (14.499999999999998, 31.400000000000006), (14.499999999999998, 31.500000000000007), (14.499999999999998, 31.60000000000001), (14.499999999999998, 31.70000000000001), (14.499999999999998, 31.80000000000001), (14.499999999999998, 31.900000000000013), (14.599999999999998, 31.0), (14.599999999999998, 31.1), (14.599999999999998, 
31.200000000000003), (14.599999999999998, 31.300000000000004), (14.599999999999998, 31.400000000000006), (14.599999999999998, 31.500000000000007), (14.599999999999998, 31.60000000000001), (14.599999999999998, 31.70000000000001), (14.599999999999998, 31.80000000000001), (14.599999999999998, 31.900000000000013), (14.699999999999998, 31.0), (14.699999999999998, 31.1), (14.699999999999998, 31.200000000000003), (14.699999999999998, 31.300000000000004), (14.699999999999998, 31.400000000000006), (14.699999999999998, 31.500000000000007), (14.699999999999998, 31.60000000000001), (14.699999999999998, 31.70000000000001), (14.699999999999998, 31.80000000000001), (14.699999999999998, 31.900000000000013), (14.799999999999997, 31.0), (14.799999999999997, 31.1), (14.799999999999997, 31.200000000000003), (14.799999999999997, 31.300000000000004), (14.799999999999997, 31.400000000000006), (14.799999999999997, 31.500000000000007), (14.799999999999997, 31.60000000000001), (14.799999999999997, 31.70000000000001), (14.799999999999997, 31.80000000000001), (14.799999999999997, 31.900000000000013), (14.899999999999997, 31.0), (14.899999999999997, 31.1), (14.899999999999997, 31.200000000000003), (14.899999999999997, 31.300000000000004), (14.899999999999997, 31.400000000000006), (14.899999999999997, 31.500000000000007), (14.899999999999997, 31.60000000000001), (14.899999999999997, 31.70000000000001), (14.899999999999997, 31.80000000000001), (14.899999999999997, 31.900000000000013)]
... printing every 10th theta if considered ...
=== (0 / 100), theta=(14.0, 31.0) ===
=== (10 / 100), theta=(14.1, 31.0) ===
=== (20 / 100), theta=(14.2, 31.0) ===
=== (30 / 100), theta=(14.299999999999999, 31.0) ===
=== (40 / 100), theta=(14.399999999999999, 31.0) ===
=== (50 / 100), theta=(14.499999999999998, 31.0) ===
=== (60 / 100), theta=(14.599999999999998, 31.0) ===
=== (70 / 100), theta=(14.699999999999998, 31.0) ===
=== (80 / 100), theta=(14.799999999999997, 31.0) ===
=== (90 / 100), theta=(14.899999999999997, 31.0) ===
best_theta=(14.0, 31.70000000000001), best_Cbarcum=16.45, best_Ctilcum=0.24
worst_theta=(14.699999999999998, 31.200000000000003), worst_Cbarcum=15.48, worst_Ctilcum=0.18
CPU times: user 50min 46s, sys: 1min 40s, total: 52min 27s
Wall time: 52min 10s
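
The theta values in the log above (for example `14.299999999999999` and `31.200000000000003`) show the floating-point drift that comes from stepping by 0.1. A minimal sketch of a grid builder that avoids the drift follows. It computes each point from an integer index and rounds it. `make_grid` and `grid_pairs` are hypothetical stand-ins; the actual `P.grid_search_thetas_2` is not shown in this section and may work differently.

```python
from itertools import product

def make_grid(start, stop, step, ndigits=2):
    """Half-open grid [start, stop): each point is start + i*step, then rounded."""
    n = int(round((stop - start) / step))
    return [round(start + i * step, ndigits) for i in range(n)]

def grid_pairs(xs, ys):
    """Cartesian product of two 1-D grids; the first coordinate varies slowest."""
    return list(product(xs, ys))

# Same ranges as the X__HighLow search above.
thetas_lo = make_grid(14.0, 15.0, 0.10)
thetas_hi = make_grid(31.0, 32.0, 0.10)
thetas_sketch = grid_pairs(thetas_lo, thetas_hi)
print(len(thetas_sketch), thetas_sketch[0], thetas_sketch[32])
```

Clean values such as `(14.3, 31.2)` are also easier to paste into the evaluation cells and to read in plot titles.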
P.plot_Fhat_map(
  FhatI_theta_I=Cbarcum_HighLow,
  thetasX=thetasLo,
  thetasY=thetasHi,
  labelX=r'$\theta^{lower}$',
  labelY=r'$\theta^{upper}$',
  title="Sample mean of the cumulative reward"+f"\n $L={L}, T={T}$, "+ \
    r"$\theta^* =$"+f"({best_theta_HighLow[0]:.2f}, {best_theta_HighLow[1]:.2f}), "+ \
    r"$\bar{C}^{cum}(\theta^*) =$"+f"{best_Cbarcum_HighLow:.2f}",
)
print()
P.plot_Fhat_map(
  FhatI_theta_I=Ctilcum_HighLow,
  thetasX=thetasLo,
  thetasY=thetasHi,
  labelX=r'$\theta^{lower}$',
  labelY=r'$\theta^{upper}$',
  title="Standard error of the cumulative reward"+f"\n $L={L}, T={T}$, "+ \
    r"$\theta^* =$"+f"({best_theta_HighLow[0]:.2f}, {best_theta_HighLow[1]:.2f}), "+ \
    r"$\tilde{C}^{cum}(\theta^*) =$"+f"{best_Ctilcum_HighLow:.2f}",
);

P.plot_expFhat_chart(
  thetaStar_expCbarcum_HighLow,
  r'$\ell$',
  r"$exp\bar{C}^{cum}(\theta^*)$"+"\n(Profit)\n[$]",
  "Expanding sample mean of the cumulative reward"+f"\n $L={L}, T={T}$, "+ \
    r"$\theta^* =$"+f"({best_theta_HighLow[0]:.2f}, {best_theta_HighLow[1]:.2f}), "+ \
    r"$\bar{C}^{cum}(\theta^*) =$"+f"{best_Cbarcum_HighLow:.2f}",
  'b-'
)
print()
P.plot_expFhat_chart(
  thetaStar_expCtilcum_HighLow,
  r'$\ell$',
  r"$exp\tilde{C}^{cum}(\theta^*)$"+"\n(Profit)\n[$]",
  "Expanding standard error of the cumulative reward"+f"\n $L={L}, T={T}$, "+ \
    r"$\theta^* =$"+f"({best_theta_HighLow[0]:.2f}, {best_theta_HighLow[1]:.2f}), "+ \
    r"$\tilde{C}^{cum}(\theta^*) =$"+f"{best_Ctilcum_HighLow:.2f}",
  'b--'
);
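
The two charts above show the expanding sample mean and the expanding standard error of the cumulative reward as the number of sample paths grows. The code that produces `thetaStar_expCbarcum_HighLow` and `thetaStar_expCtilcum_HighLow` is not shown in this section. The sketch below is therefore one plausible way to compute such series, not the notebook's implementation. It assumes the standard error is s/sqrt(n) with the unbiased (n - 1) sample variance.

```python
import math

def expanding_mean_and_se(values):
    """Return (means, ses): expanding sample mean and its standard error.

    Uses Welford's online update. The standard error is s / sqrt(n) with the
    (n - 1) sample variance; it is NaN for n = 1, where s is undefined.
    """
    means, ses = [], []
    mean, m2 = 0.0, 0.0
    for n, x in enumerate(values, start=1):
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # running sum of squared deviations
        means.append(mean)
        ses.append(math.sqrt(m2 / (n - 1) / n) if n > 1 else float('nan'))
    return means, ses

# Toy cumulative rewards for four sample paths (illustrative numbers only).
means, ses = expanding_mean_and_se([1.0, 2.0, 3.0, 4.0])
```

The last element of each list is the estimate over all sample paths, which corresponds to reading `.iloc[-1]` from the expanding series.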

f'{len(record_HighLow):,}', L, T
('4,000,000', 1000, 40)
best_theta_HighLow
(14.0, 31.70000000000001)
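
The best lower threshold, 14.0, is the first value of `thetasLo`, so the optimum may lie outside the searched range. A small check of this kind can flag when a grid should be widened. The function `on_grid_edge` is a hypothetical helper, and the grids are rebuilt here only to keep the sketch self-contained.

```python
import math

def on_grid_edge(best, grids):
    """For each coordinate of `best`, True if it equals the grid's min or max."""
    flags = []
    for value, grid in zip(best, grids):
        lo, hi = min(grid), max(grid)
        flags.append(math.isclose(value, lo) or math.isclose(value, hi))
    return flags

# Same ranges as the X__HighLow search; best theta copied from the output above.
grid_lo = [14.0 + 0.1 * i for i in range(10)]
grid_hi = [31.0 + 0.1 * i for i in range(10)]
edge_flags = on_grid_edge((14.0, 31.70000000000001), [grid_lo, grid_hi])
print(edge_flags)
```

A `True` in the first position suggests rerunning the search with `thetasLo` extended below 14.0 before treating this theta as the tuned value.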
P.plot_train(
  df=df_first_n,
  df_non=None,
  pars=defaultdict(str, {
    'policy': 'X__HighLow',
    'lower': best_theta_HighLow[0],
    'upper': best_theta_HighLow[1],
    'suptitle': f'TRAINING OF X__HighLow POLICY'+'\n'+f'(first {first_n_t} records)'+'\n'+ \
    f'L = {L}, T = {T}',
  }),
)

P.plot_train(
  df=df_last_n,
  df_non=None,
  pars=defaultdict(str, {
    'policy': 'X__HighLow',
    'lower': best_theta_HighLow[0],
    'upper': best_theta_HighLow[1],
    'suptitle': f'TRAINING OF X__HighLow POLICY'+'\n'+f'(last {last_n_t} records)'+'\n'+ \
    f'L = {L}, T = {T}',
  }),
)
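
The two calls above plot `record_HighLow[:first_n_t]` and `record_HighLow[-last_n_t:]`. The progress log suggests that the grid search loops over thetas in its outer loop. If the records are appended in that theta-major order, the first 160 records are 4 sample paths of the first theta and the last 160 belong to the last theta, not necessarily to the best theta. That ordering is an assumption, because `perform_grid_search_sample_paths2` is not shown here. The sketch below works out the index range for any theta under the assumption; `record_slice_for` is a hypothetical helper.

```python
def record_slice_for(theta_index, n_paths, n_steps, first_paths=None):
    """(start, stop) record indices for one theta, assuming theta-major order.

    With `first_paths` set, only that many leading sample paths are covered.
    """
    per_theta = n_paths * n_steps
    start = theta_index * per_theta
    n_records = per_theta if first_paths is None else first_paths * n_steps
    return start, start + n_records

NUM_THETAS, N_PATHS, N_STEPS = 100, 1000, 40  # values from the run above
total_records = NUM_THETAS * N_PATHS * N_STEPS  # matches the 4,000,000 reported

# (14.0, 31.7) is the 8th grid point (index 7) when the second coordinate varies fastest.
best_start, best_stop = record_slice_for(7, N_PATHS, N_STEPS, first_paths=4)
```

Under the same assumption, `record_HighLow[best_start:best_stop]` would give the first four sample paths for the best theta, so the plotted trajectories would match the threshold lines drawn on the price chart.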

4.6.1.2 X__SellLow
%%time
L = N_SAMPLEPATHS
T = N_TRANSITIONS
first_n_t = 160
last_n_t = 160
S_0_info = {'R_t': 1, 'p_t': INIT_PRICE, 'pbar_t': INIT_PRICE, 'b_t': INIT_BIAS}

M = AssetSellingModel(S_0_info)
P = AssetSellingPolicy(M)
SIM = PriceSimulator(seed=SEED_TRAIN)

thetasLo = np.arange(7.0, 14.0, 0.10)
thetas = P.grid_search_thetas_1(thetasLo)
thetaStar_expCbarcum_SellLow, thetaStar_expCtilcum_SellLow, \
Cbarcum_SellLow, Ctilcum_SellLow, \
best_theta_SellLow, worst_theta_SellLow, \
best_Cbarcum_SellLow, worst_Cbarcum_SellLow, \
best_Ctilcum_SellLow, worst_Ctilcum_SellLow, \
record_SellLow = \
  P.perform_grid_search_sample_paths2("X__SellLow", thetas)
f'{thetaStar_expCbarcum_SellLow.iloc[-1]=:.2f}'
df_first_n = pd.DataFrame.from_records(record_SellLow[:first_n_t], columns=labels)
df_last_n = pd.DataFrame.from_records(record_SellLow[-last_n_t:], columns=labels)
num_thetas=70
thetas=[(7.0,), (7.1,), (7.199999999999999,), (7.299999999999999,), (7.399999999999999,), (7.499999999999998,), (7.599999999999998,), (7.6999999999999975,), (7.799999999999997,), (7.899999999999997,), (7.9999999999999964,), (8.099999999999996,), (8.199999999999996,), (8.299999999999995,), (8.399999999999995,), (8.499999999999995,), (8.599999999999994,), (8.699999999999994,), (8.799999999999994,), (8.899999999999993,), (8.999999999999993,), (9.099999999999993,), (9.199999999999992,), (9.299999999999992,), (9.399999999999991,), (9.499999999999991,), (9.59999999999999,), (9.69999999999999,), (9.79999999999999,), (9.89999999999999,), (9.99999999999999,), (10.099999999999989,), (10.199999999999989,), (10.299999999999988,), (10.399999999999988,), (10.499999999999988,), (10.599999999999987,), (10.699999999999987,), (10.799999999999986,), (10.899999999999986,), (10.999999999999986,), (11.099999999999985,), (11.199999999999985,), (11.299999999999985,), (11.399999999999984,), (11.499999999999984,), (11.599999999999984,), (11.699999999999983,), (11.799999999999983,), (11.899999999999983,), (11.999999999999982,), (12.099999999999982,), (12.199999999999982,), (12.299999999999981,), (12.39999999999998,), (12.49999999999998,), (12.59999999999998,), (12.69999999999998,), (12.79999999999998,), (12.899999999999979,), (12.999999999999979,), (13.099999999999978,), (13.199999999999978,), (13.299999999999978,), (13.399999999999977,), (13.499999999999977,), (13.599999999999977,), (13.699999999999976,), (13.799999999999976,), (13.899999999999975,)]
... printing every 10th theta if considered ...
=== (0 / 70), theta=(7.0,) ===
=== (10 / 70), theta=(7.9999999999999964,) ===
=== (20 / 70), theta=(8.999999999999993,) ===
=== (30 / 70), theta=(9.99999999999999,) ===
=== (40 / 70), theta=(10.999999999999986,) ===
=== (50 / 70), theta=(11.999999999999982,) ===
=== (60 / 70), theta=(12.999999999999979,) ===
best_theta=(12.999999999999979,), best_Cbarcum=16.96, best_Ctilcum=0.30
worst_theta=(12.199999999999982,), worst_Cbarcum=15.16, worst_Ctilcum=0.27
CPU times: user 35min 22s, sys: 1min 11s, total: 36min 34s
Wall time: 36min 22s
P.plot_Fhat_chart(
  Cbarcum_SellLow,
  thetasLo,
  r'$\theta^{lower}$',
  r"$\bar{C}^{cum}(\theta)$"+"\n(Profit)\n[$]",
  "Sample mean of the cumulative reward"+f"\n $L={L}, T={T}$, "+ \
    r"$\theta^* =$"+f"({best_theta_SellLow[0]:.2f},), "+ \
    r"$\bar{C}^{cum}(\theta^*) =$"+f"{best_Cbarcum_SellLow:.2f}",
  'g-',
  best_theta_SellLow,
)
print()
P.plot_Fhat_chart(
  Ctilcum_SellLow,
  thetasLo,
  r'$\theta^{lower}$',
  r"$\tilde{C}^{cum}(\theta)$"+"\n(Profit)\n[$]",
  "Standard error of the cumulative reward"+f"\n $L={L}, T={T}$, "+ \
    r"$\theta^* =$"+f"({best_theta_SellLow[0]:.2f},), "+ \
    r"$\tilde{C}^{cum}(\theta^*) =$"+f"{best_Ctilcum_SellLow:.2f}",
  'g--',
  best_theta_SellLow,
);

P.plot_expFhat_chart(
  thetaStar_expCbarcum_SellLow,
  r'$\ell$',
  r"$exp\bar{C}^{cum}(\theta^*)$"+"\n(Profit)\n[$]",
  "Expanding sample mean of the cumulative reward"+f"\n $L={L}, T={T}$, "+ \
    r"$\theta^* =$"+f"({best_theta_SellLow[0]:.2f},), "+ \
    r"$\bar{C}^{cum}(\theta^*) =$"+f"{best_Cbarcum_SellLow:.2f}",
  'b-'
)
print()
P.plot_expFhat_chart(
  thetaStar_expCtilcum_SellLow,
  r'$\ell$',
  r"$exp\tilde{C}^{cum}(\theta^*)$"+"\n(Profit)\n[$]",
  "Expanding standard error of the cumulative reward"+f"\n $L={L}, T={T}$, "+ \
    r"$\theta^* =$"+f"({best_theta_SellLow[0]:.2f},), "+ \
    r"$\tilde{C}^{cum}(\theta^*) =$"+f"{best_Ctilcum_SellLow:.2f}",
  'b--'
);

f'{len(record_SellLow):,}', L, T
('2,800,000', 1000, 40)
best_theta_SellLow
(12.999999999999979,)
P.plot_train(
  df=df_first_n,
  df_non=None,
  pars=defaultdict(str, {
    'policy': 'X__SellLow',
    'lower': best_theta_SellLow[0],
    'suptitle': f'TRAINING OF X__SellLow POLICY'+'\n'+f'(first {first_n_t} records)'+'\n'+ \
    f'L = {L}, T = {T}'
  }),
)

P.plot_train(
  df=df_last_n,
  df_non=None,
  pars=defaultdict(str, {
    'policy': 'X__SellLow',
    'lower': best_theta_SellLow[0],
    'suptitle': f'TRAINING OF X__SellLow POLICY'+'\n'+f'(last {last_n_t} records)'+'\n'+ \
    f'L = {L}, T = {T}'
  }),
)

4.6.1.3 X__Track
%%time
L = N_SAMPLEPATHS
T = N_TRANSITIONS
S_0_info = {'R_t': 1, 'p_t': INIT_PRICE, 'pbar_t': INIT_PRICE, 'b_t': INIT_BIAS}
first_n_t = 160
last_n_t = 160

M = AssetSellingModel(S_0_info)
P = AssetSellingPolicy(M)
SIM = PriceSimulator(seed=SEED_TRAIN)

# thetasAlpha = np.arange(2.0, 6.0, 1.0) #explodes
thetasAlpha = np.arange(.2, .6, .1)
thetasTrackSignal = np.arange(-1.0, 1.0, 0.1)
thetas = P.grid_search_thetas_2(thetasAlpha, thetasTrackSignal)
thetaStar_expCbarcum_Track, thetaStar_expCtilcum_Track, \
Cbarcum_Track, Ctilcum_Track, \
best_theta_Track, worst_theta_Track, \
best_Cbarcum_Track, worst_Cbarcum_Track, \
best_Ctilcum_Track, worst_Ctilcum_Track, \
record_Track = \
  P.perform_grid_search_sample_paths2("X__Track", thetas)
f'{thetaStar_expCbarcum_Track.iloc[-1]=:.2f}'
df_first_n = pd.DataFrame.from_records(record_Track[:first_n_t], columns=labels)
df_last_n = pd.DataFrame.from_records(record_Track[-last_n_t:], columns=labels)
num_thetas=80
thetas=[(0.2, -1.0), (0.2, -0.9), (0.2, -0.8), (0.2, -0.7000000000000001), (0.2, -0.6000000000000001), (0.2, -0.5000000000000001), (0.2, -0.40000000000000013), (0.2, -0.30000000000000016), (0.2, -0.20000000000000018), (0.2, -0.1000000000000002), (0.2, -2.220446049250313e-16), (0.2, 0.09999999999999964), (0.2, 0.19999999999999973), (0.2, 0.2999999999999998), (0.2, 0.3999999999999997), (0.2, 0.49999999999999956), (0.2, 0.5999999999999996), (0.2, 0.6999999999999997), (0.2, 0.7999999999999996), (0.2, 0.8999999999999995), (0.30000000000000004, -1.0), (0.30000000000000004, -0.9), (0.30000000000000004, -0.8), (0.30000000000000004, -0.7000000000000001), (0.30000000000000004, -0.6000000000000001), (0.30000000000000004, -0.5000000000000001), (0.30000000000000004, -0.40000000000000013), (0.30000000000000004, -0.30000000000000016), (0.30000000000000004, -0.20000000000000018), (0.30000000000000004, -0.1000000000000002), (0.30000000000000004, -2.220446049250313e-16), (0.30000000000000004, 0.09999999999999964), (0.30000000000000004, 0.19999999999999973), (0.30000000000000004, 0.2999999999999998), (0.30000000000000004, 0.3999999999999997), (0.30000000000000004, 0.49999999999999956), (0.30000000000000004, 0.5999999999999996), (0.30000000000000004, 0.6999999999999997), (0.30000000000000004, 0.7999999999999996), (0.30000000000000004, 0.8999999999999995), (0.4000000000000001, -1.0), (0.4000000000000001, -0.9), (0.4000000000000001, -0.8), (0.4000000000000001, -0.7000000000000001), (0.4000000000000001, -0.6000000000000001), (0.4000000000000001, -0.5000000000000001), (0.4000000000000001, -0.40000000000000013), (0.4000000000000001, -0.30000000000000016), (0.4000000000000001, -0.20000000000000018), (0.4000000000000001, -0.1000000000000002), (0.4000000000000001, -2.220446049250313e-16), (0.4000000000000001, 0.09999999999999964), (0.4000000000000001, 0.19999999999999973), (0.4000000000000001, 0.2999999999999998), (0.4000000000000001, 0.3999999999999997), (0.4000000000000001, 
0.49999999999999956), (0.4000000000000001, 0.5999999999999996), (0.4000000000000001, 0.6999999999999997), (0.4000000000000001, 0.7999999999999996), (0.4000000000000001, 0.8999999999999995), (0.5000000000000001, -1.0), (0.5000000000000001, -0.9), (0.5000000000000001, -0.8), (0.5000000000000001, -0.7000000000000001), (0.5000000000000001, -0.6000000000000001), (0.5000000000000001, -0.5000000000000001), (0.5000000000000001, -0.40000000000000013), (0.5000000000000001, -0.30000000000000016), (0.5000000000000001, -0.20000000000000018), (0.5000000000000001, -0.1000000000000002), (0.5000000000000001, -2.220446049250313e-16), (0.5000000000000001, 0.09999999999999964), (0.5000000000000001, 0.19999999999999973), (0.5000000000000001, 0.2999999999999998), (0.5000000000000001, 0.3999999999999997), (0.5000000000000001, 0.49999999999999956), (0.5000000000000001, 0.5999999999999996), (0.5000000000000001, 0.6999999999999997), (0.5000000000000001, 0.7999999999999996), (0.5000000000000001, 0.8999999999999995)]
... printing every 10th theta if considered ...
=== (0 / 80), theta=(0.2, -1.0) ===
=== (10 / 80), theta=(0.2, -2.220446049250313e-16) ===
=== (20 / 80), theta=(0.30000000000000004, -1.0) ===
=== (30 / 80), theta=(0.30000000000000004, -2.220446049250313e-16) ===
=== (40 / 80), theta=(0.4000000000000001, -1.0) ===
=== (50 / 80), theta=(0.4000000000000001, -2.220446049250313e-16) ===
=== (60 / 80), theta=(0.5000000000000001, -1.0) ===
=== (70 / 80), theta=(0.5000000000000001, -2.220446049250313e-16) ===
best_theta=(0.30000000000000004, 0.7999999999999996), best_Cbarcum=16.32, best_Ctilcum=0.12
worst_theta=(0.4000000000000001, 0.8999999999999995), worst_Cbarcum=15.61, worst_Ctilcum=0.13
CPU times: user 40min 48s, sys: 1min 21s, total: 42min 10s
Wall time: 41min 53s
P.plot_Fhat_map(
  FhatI_theta_I=Cbarcum_Track,
  thetasX=thetasAlpha,
  thetasY=thetasTrackSignal,
  labelX=r'$\theta^{\alpha}$',
  labelY=r'$\theta^{trackSignal}$',
  title="Sample mean of the cumulative reward"+f"\n $L={L}, T={T}$, "+ \
    r"$\theta^* =$"+f"({best_theta_Track[0]:.2f}, {best_theta_Track[1]:.2f}), "+ \
    r"$\bar{C}^{cum}(\theta^*) =$"+f"{best_Cbarcum_Track:.2f}",
)
print()
P.plot_Fhat_map(
  FhatI_theta_I=Ctilcum_Track,
  thetasX=thetasAlpha,
  thetasY=thetasTrackSignal,
  labelX=r'$\theta^{\alpha}$',
  labelY=r'$\theta^{trackSignal}$',
  title="Standard error of the cumulative reward"+f"\n $L={L}, T={T}$, "+ \
    r"$\theta^* =$"+f"({best_theta_Track[0]:.2f}, {best_theta_Track[1]:.2f}), "+ \
    r"$\tilde{C}^{cum}(\theta^*) =$"+f"{best_Ctilcum_Track:.2f}",
);

P.plot_expFhat_chart(
  df=thetaStar_expCbarcum_Track,
  labelX=r'$\ell$',
  labelY=r"$exp\bar{C}^{cum}(\theta^*)$"+"\n(Profit)\n[$]",
  title="Expanding sample mean of the cumulative reward"+f"\n $L={L}, T={T}$, "+ \
    r"$\theta^* =$"+f"({best_theta_Track[0]:.2f}, {best_theta_Track[1]:.2f}), "+ \
    r"$\bar{C}^{cum}(\theta^*) =$"+f"{best_Cbarcum_Track:.2f}",
  color_style='b-'
)
print()
P.plot_expFhat_chart(
  df=thetaStar_expCtilcum_Track,
  labelX=r'$\ell$',
  labelY=r"$exp\tilde{C}^{cum}(\theta^*)$"+"\n(Profit)\n[$]",
  title="Expanding standard error of the cumulative reward"+f"\n $L={L}, T={T}$, "+ \
    r"$\theta^* =$"+f"({best_theta_Track[0]:.2f}, {best_theta_Track[1]:.2f}), "+ \
    r"$\tilde{C}^{cum}(\theta^*) =$"+f"{best_Ctilcum_Track:.2f}",
  color_style='b--'
);

f'{len(record_Track):,}', L, T
('3,200,000', 1000, 40)
best_theta_Track
(0.30000000000000004, 0.7999999999999996)
P.plot_train(
  df=df_first_n,
  df_non=None,
  pars=defaultdict(str, {
    'policy': 'X__Track',
    'alpha': best_theta_Track[0],
    'trackSignal': best_theta_Track[1],
    'suptitle': f'TRAINING OF X__Track POLICY'+'\n'+f'(first {first_n_t} records)'+'\n'+ \
    f'L = {L}, T = {T}',
  }),
)

P.plot_train(
  df=df_last_n,
  df_non=None,
  pars=defaultdict(str, {
    'policy': 'X__Track',
    'alpha': best_theta_Track[0],
    'trackSignal': best_theta_Track[1],
    'suptitle': f'TRAINING OF X__Track POLICY'+'\n'+f'(last {last_n_t} records)'+'\n'+ \
    f'L = {L}, T = {T}',
  }),
)

Comparison of Policies

last_n_l = int(.95*L)
P.plot_expFhat_charts(
  means={
      'HighLow': thetaStar_expCbarcum_HighLow[-last_n_l:],
      'SellLow': thetaStar_expCbarcum_SellLow[-last_n_l:],
      'Track': thetaStar_expCbarcum_Track[-last_n_l:]
  },
  stdvs={
      'HighLow': thetaStar_expCtilcum_HighLow[-last_n_l:],
      'SellLow': thetaStar_expCtilcum_SellLow[-last_n_l:],
      'Track': thetaStar_expCtilcum_Track[-last_n_l:]
  },
  labelX='Sample paths, ' + r'$\ell$',
  labelY='Profit\n[$]',
  suptitle=f"Comparison of Policies after Training\n \
    L = {L}, T = {T}\n \
    last {last_n_l} records\n \
    ('exp' refers to expanding)",
  pars=defaultdict(str, {
    'colors': ['r', 'g', 'b']
  }),
)

4.6.2 Evaluation

4.6.2.1 X__HighLow
best_theta_HighLow, worst_theta_HighLow #paste into next cell (vars won't work)
((14.0, 31.70000000000001), (14.699999999999998, 31.200000000000003))
L = N_SAMPLEPATHS
T = N_TRANSITIONS
first_n = 160
S_0_info = {'R_t': 1, 'p_t': INIT_PRICE, 'pbar_t': INIT_PRICE, 'b_t': INIT_BIAS}

M = AssetSellingModel(S_0_info)
P = AssetSellingPolicy(M)
# SIM = PriceSimulator(seed=SEED_TRAIN)
SIM = PriceSimulator(seed=SEED_EVALU)
thetasOpt = [(14.0, 31.70000000000001)]; print(f'{thetasOpt=}')
thetaStar_expCbarcum_HighLow_evalu, thetaStar_expCtilcum_HighLow_evalu, \
Cbarcum_HighLow_evalu, Ctilcum_HighLow_evalu, \
best_theta_HighLow_evalu, worst_theta_HighLow_evalu, \
best_Cbarcum_HighLow_evalu, worst_Cbarcum_HighLow_evalu, \
best_Ctilcum_HighLow_evalu, worst_Ctilcum_HighLow_evalu, \
record_HighLow_evalu = \
  P.perform_grid_search_sample_paths2("X__HighLow", thetasOpt)
df_X__HighLow_evalu = pd.DataFrame.from_records(record_HighLow_evalu[:first_n], columns=labels)

M = AssetSellingModel(S_0_info)
P = AssetSellingPolicy(M)
# SIM = PriceSimulator(seed=SEED_TRAIN)
SIM = PriceSimulator(seed=SEED_EVALU)
thetasNon = [(14.699999999999998, 31.200000000000003)]; print(f'{thetasNon=}')
thetaStar_expCbarcum_HighLow_evalu_non, thetaStar_expCtilcum_HighLow_evalu_non, \
Cbarcum_HighLow_evalu_non, Ctilcum_HighLow_evalu_non, \
best_theta_HighLow_evalu_non, worst_theta_HighLow_evalu_non, \
best_Cbarcum_HighLow_evalu_non, worst_Cbarcum_HighLow_evalu_non, \
best_Ctilcum_HighLow_evalu_non, worst_Ctilcum_HighLow_evalu_non, \
record_HighLow_evalu_non = \
  P.perform_grid_search_sample_paths2("X__HighLow", thetasNon)
print(
  f'{thetaStar_expCbarcum_HighLow_evalu.iloc[-1]=:.2f}, \
    {thetaStar_expCbarcum_HighLow_evalu_non.iloc[-1]=:.2f}')
df_X__HighLow_evalu_non = pd.DataFrame.from_records(record_HighLow_evalu_non[:first_n], columns=labels)
thetasOpt=[(14.0, 31.70000000000001)]
num_thetas=1
thetas=[(14.0, 31.70000000000001)]
... printing every 10th theta if considered ...
=== (0 / 1), theta=(14.0, 31.70000000000001) ===
best_theta=(14.0, 31.70000000000001), best_Cbarcum=15.92, best_Ctilcum=0.22
worst_theta=(14.0, 31.70000000000001), worst_Cbarcum=15.92, worst_Ctilcum=0.22
thetasNon=[(14.699999999999998, 31.200000000000003)]
num_thetas=1
thetas=[(14.699999999999998, 31.200000000000003)]
... printing every 10th theta if considered ...
=== (0 / 1), theta=(14.699999999999998, 31.200000000000003) ===
best_theta=(14.699999999999998, 31.200000000000003), best_Cbarcum=16.00, best_Ctilcum=0.20
worst_theta=(14.699999999999998, 31.200000000000003), worst_Cbarcum=16.00, worst_Ctilcum=0.20
thetaStar_expCbarcum_HighLow_evalu.iloc[-1]=15.92,     thetaStar_expCbarcum_HighLow_evalu_non.iloc[-1]=16.00
thetasOpt
[(14.0, 31.70000000000001)]
P.plot_train(
  df=df_X__HighLow_evalu,
  df_non=df_X__HighLow_evalu_non,
  pars=defaultdict(str, {
    'policy': 'X__HighLow',
    'thetaStar': best_theta_HighLow_evalu,
    'lower': thetasOpt[0][0],
    'upper': thetasOpt[0][1],
    'lower_non': thetasNon[0][0],
    'upper_non': thetasNon[0][1],
    'suptitle': f'EVALUATION OF X__HighLow POLICY\n(first {first_n} records)\n'
                f'L = {L}, T = {T}'
  }),
)

last_n_l = int(.95*L)
P.plot_expFhat_charts(
  means={
      'HighLow optimal': thetaStar_expCbarcum_HighLow_evalu[-last_n_l:],
      'HighLow non-optimal': thetaStar_expCbarcum_HighLow_evalu_non[-last_n_l:],
  },
  stdvs={
      'HighLow optimal': thetaStar_expCtilcum_HighLow_evalu[-last_n_l:],
      'HighLow non-optimal': thetaStar_expCtilcum_HighLow_evalu_non[-last_n_l:],
  },
  labelX='Sample paths, ' + r'$\ell$',
  labelY='Profit\n[$]',
  suptitle=f"Comparison of Optimal/Non-optimal Policies after Evaluation\n \
    L = {L}, T = {T}\n \
    last {last_n_l} records\n \
    ('exp' refers to expanding)",
  pars=defaultdict(str, {
    'colors': ['m', 'c']
  }),
)
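
The suptitle in the plotting call above notes that 'exp' refers to expanding. A minimal sketch of what an expanding mean and standard deviation look like, using pandas' `Series.expanding()`. It assumes the `thetaStar_expCbarcum_*` and `thetaStar_expCtilcum_*` series are running statistics over sample paths; the notebook's internals are not shown here. The data and the names `Cbarcum`, `exp_mean`, `exp_std` and `tail` are illustrative only.

```python
import pandas as pd

# Hypothetical per-sample-path cumulative profits (made-up numbers).
Cbarcum = pd.Series([15.0, 17.0, 16.0, 16.0])

# Expanding mean: entry l is the mean of all values up to and including path l.
exp_mean = Cbarcum.expanding().mean()
# Expanding sample standard deviation (ddof=1); the first entry is NaN.
exp_std = Cbarcum.expanding().std()

print(exp_mean.tolist())
print(exp_std.round(3).tolist())

# Keep only the tail, as the plotting calls do with [-last_n_l:];
# .iloc makes the positional slicing explicit.
L = len(Cbarcum)
last_n_l = int(.95 * L)
tail = exp_mean.iloc[-last_n_l:]
print(len(tail))
```

Dropping the first few entries in this way hides the noisy start of the expanding statistics, where each value is based on only a handful of sample paths.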

4.6.2.2 X__SellLow
best_theta_SellLow, worst_theta_SellLow #paste into next cell (vars won't work)
((12.999999999999979,), (12.199999999999982,))
L = N_SAMPLEPATHS
T = N_TRANSITIONS
first_n = 160
S_0_info = {'R_t': 1, 'p_t': INIT_PRICE, 'pbar_t': INIT_PRICE, 'b_t': INIT_BIAS}

M = AssetSellingModel(S_0_info)
P = AssetSellingPolicy(M)
SIM = PriceSimulator(seed=SEED_EVALU)
# SIM = PriceSimulator(seed=SEED_TRAIN)
thetasOpt = [(12.999999999999979,)]; print(f'{thetasOpt=}')
thetaStar_expCbarcum_SellLow_evalu, thetaStar_expCtilcum_SellLow_evalu, \
Cbarcum_SellLow_evalu, Ctilcum_SellLow_evalu, \
best_theta_SellLow_evalu, worst_theta_SellLow_evalu, \
best_Cbarcum_SellLow_evalu, worst_Cbarcum_SellLow_evalu, \
best_Ctilcum_SellLow_evalu, worst_Ctilcum_SellLow_evalu, \
record_SellLow_evalu = \
  P.perform_grid_search_sample_paths2("X__SellLow", thetasOpt)
df_X__SellLow_evalu = pd.DataFrame.from_records(record_SellLow_evalu[:first_n], columns=labels)

M = AssetSellingModel(S_0_info)
P = AssetSellingPolicy(M)
SIM = PriceSimulator(seed=SEED_EVALU)
# SIM = PriceSimulator(seed=SEED_TRAIN)
thetasNon = [(12.199999999999982,)]; print(f'{thetasNon=}')
thetaStar_expCbarcum_SellLow_evalu_non, thetaStar_expCtilcum_SellLow_evalu_non, \
Cbarcum_SellLow_evalu_non, Ctilcum_SellLow_evalu_non, \
best_theta_SellLow_evalu_non, worst_theta_SellLow_evalu_non, \
best_Cbarcum_SellLow_evalu_non, worst_Cbarcum_SellLow_evalu_non, \
best_Ctilcum_SellLow_evalu_non, worst_Ctilcum_SellLow_evalu_non, \
record_SellLow_evalu_non = \
  P.perform_grid_search_sample_paths2("X__SellLow", thetasNon)
print(
  f'{thetaStar_expCbarcum_SellLow_evalu.iloc[-1]=:.2f}, \
    {thetaStar_expCbarcum_SellLow_evalu_non.iloc[-1]=:.2f}')
df_X__SellLow_evalu_non = pd.DataFrame.from_records(record_SellLow_evalu_non[:first_n], columns=labels)
thetasOpt=[(12.999999999999979,)]
num_thetas=1
thetas=[(12.999999999999979,)]
... printing every 10th theta if considered ...
=== (0 / 1), theta=(12.999999999999979,) ===
best_theta=(12.999999999999979,), best_Cbarcum=16.24, best_Ctilcum=0.29
worst_theta=(12.999999999999979,), worst_Cbarcum=16.24, worst_Ctilcum=0.29
thetasNon=[(12.199999999999982,)]
num_thetas=1
thetas=[(12.199999999999982,)]
... printing every 10th theta if considered ...
=== (0 / 1), theta=(12.199999999999982,) ===
best_theta=(12.199999999999982,), best_Cbarcum=16.19, best_Ctilcum=0.31
worst_theta=(12.199999999999982,), worst_Cbarcum=16.19, worst_Ctilcum=0.31
thetaStar_expCbarcum_SellLow_evalu.iloc[-1]=16.24,     thetaStar_expCbarcum_SellLow_evalu_non.iloc[-1]=16.19
P.plot_train(
  df=df_X__SellLow_evalu,
  df_non=df_X__SellLow_evalu_non,
  pars=defaultdict(str, {
    'policy': 'X__SellLow',
    'thetaStar': best_theta_SellLow_evalu,
    'lower': thetasOpt[0][0],
    'lower_non': thetasNon[0][0],
    'suptitle': f'EVALUATION OF X__SellLow POLICY\n(first {first_n} records)\n'
                f'L = {L}, T = {T}'
  }),
)

last_n_l = int(.95*L)
P.plot_expFhat_charts(
  means={
      'SellLow optimal': thetaStar_expCbarcum_SellLow_evalu[-last_n_l:],
      'SellLow non-optimal': thetaStar_expCbarcum_SellLow_evalu_non[-last_n_l:],
  },
  stdvs={
      'SellLow optimal': thetaStar_expCtilcum_SellLow_evalu[-last_n_l:],
      'SellLow non-optimal': thetaStar_expCtilcum_SellLow_evalu_non[-last_n_l:],
  },
  labelX='Sample paths, ' + r'$\ell$',
  labelY='Profit\n[$]',
  suptitle=f"Comparison of Optimal/Non-optimal Policies after Evaluation\n \
    L = {L}, T = {T}\n \
    last {last_n_l} records\n \
    ('exp' refers to expanding)",
  pars=defaultdict(str, {
    'colors': ['m', 'c']
  }),
)

4.6.2.3 X__Track
best_theta_Track, worst_theta_Track #paste into next cell (vars won't work)
((0.30000000000000004, 0.7999999999999996),
 (0.4000000000000001, 0.8999999999999995))
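
Values such as 0.30000000000000004 are ordinary binary floating-point residue, not a modelling artifact. A minimal sketch of how such values arise and how a one-decimal grid could be kept clean. It assumes a step of 0.1 over the range 0.0 to 1.0; how the notebook actually builds its theta grid is not shown here, and the names `naive` and `clean` are hypothetical.

```python
# Assumed grid: 0.0 to 1.0 in steps of 0.1 (illustrative only).
step = 0.1
n = 11

# Multiplying the index by the step still leaves binary rounding residue.
naive = [i * step for i in range(n)]

# Rounding each point to one decimal gives the intended grid values.
clean = [round(i * step, 1) for i in range(n)]

print(naive[3], clean[3])  # 0.30000000000000004 0.3
```

If thetas are used as dictionary keys or compared for equality, rounded and unrounded values count as different keys, so any rounding would have to be applied consistently in training and evaluation.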
L = N_SAMPLEPATHS
T = N_TRANSITIONS
first_n = 160
S_0_info = {'R_t': 1, 'p_t': INIT_PRICE, 'pbar_t': INIT_PRICE, 'b_t': INIT_BIAS}

M = AssetSellingModel(S_0_info)
P = AssetSellingPolicy(M)
SIM = PriceSimulator(seed=SEED_EVALU)
thetasOpt = [(0.30000000000000004, 0.7999999999999996)]; print(f'{thetasOpt=}')
thetaStar_expCbarcum_Track_evalu, thetaStar_expCtilcum_Track_evalu, \
Cbarcum_Track_evalu, Ctilcum_Track_evalu, \
best_theta_Track_evalu, worst_theta_Track_evalu, \
best_Cbarcum_Track_evalu, worst_Cbarcum_Track_evalu, \
best_Ctilcum_Track_evalu, worst_Ctilcum_Track_evalu, \
record_Track_evalu = \
  P.perform_grid_search_sample_paths2("X__Track", thetasOpt)
df_X__Track_evalu = pd.DataFrame.from_records(record_Track_evalu[:first_n], columns=labels)

M = AssetSellingModel(S_0_info)
P = AssetSellingPolicy(M)
SIM = PriceSimulator(seed=SEED_EVALU)
thetasNon = [(0.4000000000000001, 0.8999999999999995)]; print(f'{thetasNon=}')
thetaStar_expCbarcum_Track_evalu_non, thetaStar_expCtilcum_Track_evalu_non, \
Cbarcum_Track_evalu_non, Ctilcum_Track_evalu_non, \
best_theta_Track_evalu_non, worst_theta_Track_evalu_non, \
best_Cbarcum_Track_evalu_non, worst_Cbarcum_Track_evalu_non, \
best_Ctilcum_Track_evalu_non, worst_Ctilcum_Track_evalu_non, \
record_Track_evalu_non = \
  P.perform_grid_search_sample_paths2("X__Track", thetasNon)
print(
  f'{thetaStar_expCbarcum_Track_evalu.iloc[-1]=:.2f}, \
    {thetaStar_expCbarcum_Track_evalu_non.iloc[-1]=:.2f}')
df_X__Track_evalu_non = pd.DataFrame.from_records(record_Track_evalu_non[:first_n], columns=labels)
thetasOpt=[(0.30000000000000004, 0.7999999999999996)]
num_thetas=1
thetas=[(0.30000000000000004, 0.7999999999999996)]
... printing every 10th theta if considered ...
=== (0 / 1), theta=(0.30000000000000004, 0.7999999999999996) ===
best_theta=(0.30000000000000004, 0.7999999999999996), best_Cbarcum=16.15, best_Ctilcum=0.12
worst_theta=(0.30000000000000004, 0.7999999999999996), worst_Cbarcum=16.15, worst_Ctilcum=0.12
thetasNon=[(0.4000000000000001, 0.8999999999999995)]
num_thetas=1
thetas=[(0.4000000000000001, 0.8999999999999995)]
... printing every 10th theta if considered ...
=== (0 / 1), theta=(0.4000000000000001, 0.8999999999999995) ===
best_theta=(0.4000000000000001, 0.8999999999999995), best_Cbarcum=16.10, best_Ctilcum=0.13
worst_theta=(0.4000000000000001, 0.8999999999999995), worst_Cbarcum=16.10, worst_Ctilcum=0.13
thetaStar_expCbarcum_Track_evalu.iloc[-1]=16.15,     thetaStar_expCbarcum_Track_evalu_non.iloc[-1]=16.10
P.plot_train(
  df=df_X__Track_evalu,
  df_non=df_X__Track_evalu_non,
  pars=defaultdict(str, {
    'policy': 'X__Track',
    'thetaStar': best_theta_Track_evalu,

    'alpha': thetasOpt[0][0],
    'trackSignal': thetasOpt[0][1],

    'alpha_non': thetasNon[0][0],
    'trackSignal_non': thetasNon[0][1],

    'suptitle': f'EVALUATION OF X__Track POLICY\n(first {first_n} records)\n'
                f'L = {L}, T = {T}'
  }),
)

last_n_l = int(.95*L)
P.plot_expFhat_charts(
  means={
      'Track optimal': thetaStar_expCbarcum_Track_evalu[-last_n_l:],
      'Track non-optimal': thetaStar_expCbarcum_Track_evalu_non[-last_n_l:],
  },
  stdvs={
      'Track optimal': thetaStar_expCtilcum_Track_evalu[-last_n_l:],
      'Track non-optimal': thetaStar_expCtilcum_Track_evalu_non[-last_n_l:],
  },
  labelX='Sample paths, ' + r'$\ell$',
  labelY='Profit\n[$]',
  suptitle=f"Comparison of Optimal/Non-optimal Policies after Evaluation\n \
    L = {L}, T = {T}\n \
    last {last_n_l} records\n \
    ('exp' refers to expanding)",
  pars=defaultdict(str, {
    'colors': ['m', 'c']
  }),
)

Comparison of Policies

last_n_l = int(.95*L)
P.plot_expFhat_charts(
  means={
      'HighLow': thetaStar_expCbarcum_HighLow_evalu[-last_n_l:],
      'SellLow': thetaStar_expCbarcum_SellLow_evalu[-last_n_l:],
      'Track': thetaStar_expCbarcum_Track_evalu[-last_n_l:]
  },
  stdvs={
      'HighLow': thetaStar_expCtilcum_HighLow_evalu[-last_n_l:],
      'SellLow': thetaStar_expCtilcum_SellLow_evalu[-last_n_l:],
      'Track': thetaStar_expCtilcum_Track_evalu[-last_n_l:]
  },
  labelX='Sample paths, ' + r'$\ell$',
  labelY='Profit\n[$]',
  suptitle=f"Comparison of Optimal Policies after Evaluation\n \
    L = {L}, T = {T}\n \
    last {last_n_l} records\n \
    ('exp' refers to expanding)",
  pars=defaultdict(str, {
    'colors': ['r', 'g', 'b']
  }),
)

last_n_l = int(.95*L)
P.plot_expFhat_charts(
  means={
      'HighLow': thetaStar_expCbarcum_HighLow_evalu_non[-last_n_l:],
      'SellLow': thetaStar_expCbarcum_SellLow_evalu_non[-last_n_l:],
      'Track': thetaStar_expCbarcum_Track_evalu_non[-last_n_l:]
  },
  stdvs={
      'HighLow': thetaStar_expCtilcum_HighLow_evalu_non[-last_n_l:],
      'SellLow': thetaStar_expCtilcum_SellLow_evalu_non[-last_n_l:],
      'Track': thetaStar_expCtilcum_Track_evalu_non[-last_n_l:]
  },
  labelX='Sample paths, ' + r'$\ell$',
  labelY='Profit\n[$]',
  suptitle=f"Comparison of Non-optimal Policies after Evaluation\n \
    L = {L}, T = {T}\n \
    last {last_n_l} records\n \
    ('exp' refers to expanding)",
  pars=defaultdict(str, {
    'colors': ['r', 'g', 'b']
  }),
)

P.plot_evalu_comparison(
  df1=df_X__HighLow_evalu,
  df2=df_X__SellLow_evalu,
  df3=df_X__Track_evalu,
  pars=defaultdict(str, {
    'suptitle': f'EVALUATION OF ALL POLICIES (first {first_n} records)\n \
    L={L}, T={T}',
  }),
)
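
The final expanding-mean profits printed in the evaluation cells above can be collected into one table for a side-by-side reading. This is a small sketch: the numbers are copied from those printouts, where "optimal" and "non-optimal" are the best and worst thetas from training. The DataFrame layout and the names `results` and `gap` are illustrative, not part of the notebook.

```python
import pandas as pd

# Final expanding-mean profits copied from the evaluation printouts.
results = pd.DataFrame(
    {
        "optimal": {"HighLow": 15.92, "SellLow": 16.24, "Track": 16.15},
        "non_optimal": {"HighLow": 16.00, "SellLow": 16.19, "Track": 16.10},
    }
)

# Positive gap: the training-optimal theta also did better on the evaluation seed.
results["gap"] = (results["optimal"] - results["non_optimal"]).round(2)

# Rank the policies by the profit of their training-optimal theta.
print(results.sort_values("optimal", ascending=False))
```

On these numbers SellLow ranks first, followed by Track and HighLow. For HighLow the training-optimal theta scores slightly below the non-optimal one on the evaluation seed (15.92 versus 16.00). All gaps are smaller than the reported `Ctilcum` values (0.12 to 0.31), so the differences should be read with caution.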