Car Rental Operations using the Powell Unified Framework (Part 1)

Back to Portfolio of Projects | LearnableLoopAI.com | Blog |

0 INTRODUCTION

In this project the client had a need to be convinced of the benefits of formal optimized sequential decision making. The client’s need relates to human resource scheduling. The current project, although from the car rental industry, lays a foundation for the development of the client’s specific need.

The overall structure of this project and report follows the traditional CRISP-DM format. However, instead of the CRISP-DM’S “4 Modeling” section, we inserted the “6 step modeling process” of Dr. Warren Powell in section 4 of this document. Dr Powell’s universal framework shows great promise for unifying the formalisms of at least a dozen different fields. Using his framework enables easier access to thinking patterns in these other fields that might be beneficial and informative to the sequential decision problem at hand. Traditionally, this kind of problem would be approached from the reinforcement learning perspective. However, using Dr. Powell’s wider and more comprehensive perspective almost certainly provides additional value.

Here is information on Dr. Powell’s perspective on Sequential Decision Analytics.

In order to make a strong mapping between the code in this notebook and the mathematics in the Powell Universal Framework (PUF), we follow the following convention for naming Python identifier names:

How to read/say
- var name & flavor first
- at t/n
- for entity OR of/with attribute
- ${\hat{R}}_{t + 1, a}^{f a i l}$ has code Rhat__fail_tt1_a which is read: “Rhatfail at t+1 of/with (attribute) a”
Superscripts
- variable names have a double underscore to indicate a superscript
- $X^{π}$ : has code X__pi, is read X pi
- when there is a ‘natural’ distinction between the variable symbol and the superscript (e.g. a change in case), the double underscore is sometimes omitted: Xpi instead of X__pi, or MSpend_t instead of M__Spend_t
Subscripts
- variable names have a single underscore to indicate a subscript
- $S_{t}$ : has code S_t, is read ‘S at t’
- $M_{t}^{S p e n d}$ has code M__Spend_t which is read: “MSpend at t”
- ${\hat{R}}_{t + 1, a}^{f a i l}$ has code Rhat__fail_tt1_a which is read: “Rhatfail at t+1 of/with (attribute) a” [RLSO-p436]
Arguments
- collection variable names may have argument information added
- $X^{π} (S_{t})$ : has code X__piIS_tI, is read ‘X pi in S at t’
- the surrounding I’s are used to imitate the parentheses around the argument
Next time/iteration
- variable names that indicate one step in the future are quite common
- $R_{t + 1}$ : has code R_tt1, is read ‘R at t+1’
- $R^{n + 1}$ : has code R__nt1, is read ‘R at n+1’
Rewards
- State-independent terminal reward and cumulative reward
  - $F$ : has code F for terminal reward
  - $\sum_{n} F$ : has code cumF for cumulative reward
- State-dependent terminal reward and cumulative reward
  - $C$ : has code C for terminal reward
  - $\sum_{t} C$ : has code cumC for cumulative reward
Vectors where components use different names
- $S_{t} (R_{t}, p_{t})$ : has code S_t.R_t and S_t.p_t, is read ‘S at t in R at t, and, S at t in p at t’
- the code implementation is by means of a named tuple
  - self.State = namedtuple('State', SVarNames) for the ‘class’ of the vector
  - self.S_t for the ‘instance’ of the vector
Vectors where components reuse names
- $x_{t} (x_{t, G B}, x_{t, B L})$ : has code x_t.x_t_GB and x_t.x_t_BL, is read ‘x at t in x at t for GB, and, x at t in x at t for BL’
- the code implementation is by means of a named tuple
  - self.Decision = namedtuple('Decision', xVarNames) for the ‘class’ of the vector
  - self.x_t for the ‘instance’ of the vector
Use of mixed-case variable names
- to reduce confusion, sometimes the use of mixed-case variable names are preferred (even though it is not a best practice in the Python community), reserving the use of underscores and double underscores for math-related variables

1 BUSINESS UNDERSTANDING

In this POC the car rental company (“1-3 Rentals”) has access to wholesale renting from a key partner. 1-3 Rentals then rents cars (either Elantras or Sonatas) to their customers. Rentals can be either for a day, or for 3 days. The critical decisions are the number of wholesale rentals to perform at the beginning of a day for each of the car types. This decision is partly influenced by the most recent demands for each rental type as well as each car type.

2 DATA UNDERSTANDING

Based on recent market research, the 1-day demand may be modeled by two Poisson distributions with means: $\begin{aligned} μ^{E L A_1} & = SIM_MU_D [^{'} ELA_1^{'}] \\ μ^{S O N_1} & = SIM_MU_D [^{'} SON_1^{'}] \end{aligned}$

Similarly, the 3-day demands are: $\begin{aligned} μ^{E L A_3} & = SIM_MU_D [^{'} ELA_3^{'}] \\ μ^{S O N_3} & = SIM_MU_D [^{'} SON_3^{'}] \end{aligned}$

So we have: $D_{t + 1}^{E L A_1} \sim P o i s (μ^{E L A_1}) D_{t + 1}^{E L A_3} \sim P o i s (μ^{E L A_3}) D_{t + 1}^{S O N_1} \sim P o i s (μ^{S O N_1}) D_{t + 1}^{S O N_3} \sim P o i s (μ^{S O N_3})$

The decision window is 1 day and these simulations are for the daily demands.

## import pdb
from collections import namedtuple, defaultdict
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from copy import copy
import time
from scipy.ndimage.interpolation import shift
import pickle
from bisect import bisect
import math
from pprint import pprint
import matplotlib as mpl
from certifi.core import where
pd.options.display.float_format = '{:,.4f}'.format
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', None)
! python --version

Python 3.10.12

DeprecationWarning: Please use `shift` from the `scipy.ndimage` namespace, the `scipy.ndimage.interpolation` namespace is deprecated.
  from scipy.ndimage.interpolation import shift

We will have the learnable paramters:

$((θ_{E L A}^{A d j 1}, θ_{S O N}^{A d j 1}), (θ_{E L A}^{A d j 3}, θ_{S O N}^{A d j 3}))$

## PARAMETERS
SNAMES = [ #state variable names
    'RAvail_t', #available resource
    'R_t',      #resource
    'D_t',      #demand
]
xNAMES = ['x_t'] #decision variable names

CAR_TYPES = ['ELA', 'SON']
RENTAL_TYPES = ['1', '3'] #1-day, 3-day
DAYS_OUT = [0, 1, 2, 3]

## all possible *resource* attribute vectors
aNAMES = []
for i in CAR_TYPES:
  for j in RENTAL_TYPES:
    for k in DAYS_OUT:
      ##skip if days out > rental type, can only be out for up to rental type
      if k > int(j): continue
      an = '_'.join([str(i),str(j),str(k)])
      aNAMES.append(an)
print(f'{len(aNAMES)=}')
print(aNAMES)
a1NAMES = CAR_TYPES
print(f'{len(a1NAMES)=}')
print(a1NAMES)

## all possible *demand* attribute vectors
bNAMES = []
for i in CAR_TYPES:
  for j in RENTAL_TYPES:
    bn = '_'.join([str(i),str(j)])
    bNAMES.append(bn)
print(f'\n{len(bNAMES)=}')
print(bNAMES)

## all possible *decision* 'attribute' vectors
baNAMES = [] ##from resource vector a to demand vector b
for bn in bNAMES:
  b = bn.split('_')
  for an in aNAMES:
    a = an.split('_')
    ban = (bn + '___' + an)
    if not a[2] == '0': continue ##skip if not a paid bench pool
    if not bn == '_'.join([a[0], a[1]]): continue ##skip if cross supply
    baNAMES.append(ban)
print(f'\n{len(baNAMES)=}')
print(baNAMES)

piNAMES = ['X__AdjBelow'] #policy names
thNAMES = [ #theta names
  'thAdj1', #adjust 1-day pool size when resource level go below/above this value
  'thAdj3', #adjust 3-day pool size when resource level go below/above this value
]
print(f'\n{len(thNAMES)=}')
print(thNAMES)

SEED_TRAIN = 77777777
SEED_EVALU = 88888888
N_SAMPLEPATHS = 100; L = N_SAMPLEPATHS
N_TRANSITIONS = 100; T = N_TRANSITIONS

AVAIL_POOLS = {'ELA_1': 100, 'ELA_3': 100, 'SON_1': 100, 'SON_3': 100} #avail for pools; unpaid bench
STANDBY_POOLS = {'ELA_1_0': 0, 'ELA_3_0': 0, 'SON_1_0': 0, 'SON_3_0': 0} #avail for rental; paid bench

##              'ELA'                     'SON'
TH_ADJ1_SPEC = {CAR_TYPES[0]: (26, 30, 1), CAR_TYPES[1]: (9, 13, 1)}
TH_ADJ3_SPEC = {CAR_TYPES[0]: (34, 38, 1), CAR_TYPES[1]: (12, 16, 1)}

SIM_T = 60
SIM_MU_D = {'ELA_1': 30, 'ELA_3': 20, 'SON_1': 25, 'SON_3': 15}
SIM_EVENT_TIME_D = {'ELA_1': None, 'ELA_3': None, 'SON_1': None, 'SON_3': None}
SIM_MU_DELTA_D = {'ELA_1': None, 'ELA_3': None, 'SON_1': None, 'SON_3': None}

# math parameters use 'math/small case' (as opposed to code parameters):

## p__recr = {'CEO': 30_000, 'CFO': 25_000} #dollars ##eventually for HR
p__rent = {'ELA_1': 46, 'ELA_3': 100, 'SON_1': 54, 'SON_3': 160} #dollars

##dollars per item to rent from wholesaler/paid bench cost
c__upkeep = {'ELA': 18.43, 'SON': 24.72}

## CONTRIB_MATRIX = {}
# for an in aNAMES:
#     contribs = {}
#     for bn in bNAMES:
#       contribs[bn] = contribution(an, bn)
#     CONTRIB_MATRIX[an] = contribs
# CONTRIB_MATRIX

len(aNAMES)=12
['ELA_1_0', 'ELA_1_1', 'ELA_3_0', 'ELA_3_1', 'ELA_3_2', 'ELA_3_3', 'SON_1_0', 'SON_1_1', 'SON_3_0', 'SON_3_1', 'SON_3_2', 'SON_3_3']
len(a1NAMES)=2
['ELA', 'SON']

len(bNAMES)=4
['ELA_1', 'ELA_3', 'SON_1', 'SON_3']

len(baNAMES)=4
['ELA_1___ELA_1_0', 'ELA_3___ELA_3_0', 'SON_1___SON_1_0', 'SON_3___SON_3_0']

len(thNAMES)=2
['thAdj1', 'thAdj3']

class DemandSimulator():
  def __init__(self,
    T__sim=SIM_T,
    muD=SIM_MU_D,
    eventTimeD=SIM_EVENT_TIME_D,
    muDeltaD=SIM_MU_DELTA_D,
    seed=None):
    self.time = 0
    self.T__sim = SIM_T
    self.muD = SIM_MU_D
    self.eventTimeD = SIM_EVENT_TIME_D
    self.muDeltaD = SIM_MU_DELTA_D
    self.prng = np.random.RandomState(seed)

  def simulate(self):
    if self.time > self.T__sim - 1:
      self.time = 0
    D_tt1 = {}
    for bn in bNAMES:
      if self.eventTimeD[bn] and self.time > self.eventTimeD[bn]: #event for entity
        D_tt1[bn] = self.muDeltaD[bn] + self.prng.poisson(self.muD[bn]) #after event
      else:
        D_tt1[bn] = self.prng.poisson(self.muD[bn])
    self.time += 1
    return {bn: max(0, D_tt1[bn]) for bn in bNAMES} #always positive

dem_sim = DemandSimulator(seed=1234)
DemandData = []
for i in range(SIM_T):
  d_e = list(dem_sim.simulate().values())
  DemandData.append(d_e)
labels = [f'{bn}_demand' for bn in bNAMES]
df = pd.DataFrame.from_records(data=DemandData, columns=labels); df[:10]

	ELA_1_demand	ELA_3_demand	SON_1_demand	SON_3_demand
0	24	19	29	12
1	28	22	23	15
2	28	12	34	14
3	27	26	30	17
4	29	11	26	13
5	22	21	22	15
6	28	14	29	15
7	29	19	33	17
8	34	22	19	15
9	31	11	29	19

## PUT BACK: COMMENTED OUT TO BE FASTER
import random
def plot_output(df1, df2):
  n_charts = len(bNAMES)
  ylabelsize = 16
  mpl.rcParams['lines.linewidth'] = 1.2
  default_colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
  fig, axs = plt.subplots(n_charts, sharex=True)
  fig.set_figwidth(13); fig.set_figheight(9)
  fig.suptitle('Demand Simulation', fontsize=20)

  for i,bn in enumerate(bNAMES):
    axs[i].set_title(f'Demanded {bn}')
    axs[i].set_ylim(auto=True); axs[i].spines['top'].set_visible(False); axs[i].spines['right'].set_visible(True); axs[i].spines['bottom'].set_visible(False)
    axs[i].step(df1[f'{bn}_demand'], random.choice(default_colors))
    ## axs[i].axhline(y=dem_sim.muD[e], color='k', linestyle=':')
    axs[i].axhline(y=0, color='k', linestyle=':')

  axs[i].set_xlabel('$t\ \mathrm{[daily\ windows]}$', rotation=0, ha='center', va='center', fontweight='bold', size=ylabelsize)
plot_output(df, None)

3 DATA PREPARATION

We will use the data provided by the simulator directly. There is no need to perform additional data preparation.

4 MODELING

4.1 Narrative

Please review the narrative in section 1.

In addition, unsatisfied demands are lost, i.e. there will be no ability to backlog unsatisfied demands. However, a stockout cost is incurred when demand is unsatisfied. Moreover, existing unrented vehicles in the standby pool brings about a holding cost for each item. The latter cost consists of an upkeep cost.

4.2 Core Elements

This section attempts to answer three important questions: - What metrics are we going to track? - What decisions do we intend to make? - What are the sources of uncertainty?

For this problem, the only metric we are interested in is the amount of profit we make after each decision window. A single type of decision needs to be made at the start of each window - how many cars of each type and rental type to rent from the wholesaler partner. The only source of uncertainty are the levels of demand for the four combinations.

4.3 Mathematical Model | SUS Design

A Python class is used to implement the model for the SUS (System Under Steer):

class Model():
  def __init__(self, S_0_info):
    ...
    ...

4.3.1 State variables

The state variables represent what we need to know. - $R_{t}^{A v a i l} = (R_{t b}^{A v a i l})_{b \in B}$ where $B = {β_{1}, β_{2}, β_{3}, β_{4}}$ - $R_{t b}^{A v a i l}$ = Number of resources at $t$ with attribute $b$ (in available pool) - $β_{1}$ = ELA_1 (1-day Elantras) - $β_{2}$ = ELA_3 (3-day Elantras) - $β_{3}$ = SON_1 (1-day Sonatas) - $β_{4}$ = SON_3 (3-day Sonatas)
- $R_{t} = (R_{t a})_{a \in A}$ where $A = {α_{1}, α_{2}, α_{3}, . . ., α_{12}}$ - $R_{t a}$ = Number of resources at $t$ with attribute $a$ - $α_{1}$ = ELA_1_0 (1-day Elantras in standby pool) - $α_{2}$ = ELA_1_1 (1-day Elantras out for 1 day; number of sales) - $α_{3}$ = ELA_3_0 (3-day Elantras in standby pool) - $α_{4}$ = ELA_3_1 (3-day Elantras out for 1 day; number of sales) - $α_{5}$ = ELA_3_2 (3-day Elantras out for 2 days) - $α_{6}$ = ELA_3_3 (3-day Elantras out for 3 days) - $α_{7}$ = SON_1_0 (1-day Sonatas in standby pool) - $α_{8}$ = SON_1_1 (1-day Sonatas out for 1 day; number of sales) - $α_{9}$ = SON_3_0 (3-day Sonatas in standby pool) - $α_{10}$ = SON_3_1 (3-day Sonatas out for 1 day; number of sales) - $α_{11}$ = SON_3_2 (3-day Sonatas out for 2 days) - $α_{12}$ = SON_3_3 (3-day Sonatas out for 3 days)
- $D_{t} = (D_{t b})_{b \in B}$ where $B = {β_{1}, β_{2}, β_{3}, β_{4}}$ - $D_{t b}$ = Number of demands at $t$ with attribute $b$

The state is:

$S_{t} = (R_{t}^{A v a i l}, R_{t}, D_{t}) = ((R_{t b}^{A v a i l})_{b \in B}, (R_{t a})_{a \in A}, (D_{t b})_{b \in B})$

The state variables are represented by the following variables in the Model class:

self.State = namedtuple('State', SNAMES) # 'class'
self.S_t = self.build_state(self.S_0) # 'instance'

where

SNAMES = [ #state variable names
    'RAvail_t', #available resource          
    'R_t',      #resource
    'D_t',      #demand
]

4.3.2 Decision variables

The decision variables represent what we control.

$x_{t} = (x_{t b a})_{b \in B, a \in A}$
- number of resources with attributes $b$ to be moved from the available pool of the resource with attributes $b$ , to the standby pool with the same attributes.
- $x_{t} > 0$ when resources should move to the standby pool
- $x_{t} < 0$ when resources should move from the standby pool
Decisions are made with a policy (TBD below):
- $X^{π} (S_{t})$

The decision variables are represented by the following variables in the Model class:

self.Decision = namedtuple('Decision', xNAMES) # 'class'

where

xNAMES = ['x_t'] #decision variable names

4.3.3 Exogenous information variables

The exogenous information variables represent what we did not know (when we made a decision). These are the variables that we cannot control directly. The information in these variables become available after we make the decision $x_{t}$ .

When we assume that the demand in each time period is revealed, without any model to predict the demand based on past demands, we have, using approach 1:

$\begin{aligned} D_{t + 1} & = W_{t + 1} \\ = {\hat{D}}_{t + 1} \end{aligned}$

Alternatively, when we assume that we observe the change in demand ${\hat{D}}_{t + 1} = p_{t + 1} - p_{t}$ , we have, using approach 2:

$\begin{aligned} D_{t + 1} & = D_{t} + W_{t + 1} \\ = D_{t} + {\hat{D}}_{t + 1} \end{aligned}$

We will make use of approach 1 which means that the exogenous information, $W_{t + 1}$ , is the directly observed demand of the resource.

The exogenous information is obtained by a call to

DemandSimulator.simulate(...)

4.3.4 Transition function

The transition function describe how the state variables evolve over time. We have the equations:

$\begin{aligned} R_{t + 1}^{A v a i l} & = max (0, (R_{t b}^{A v a i l} - x_{t b a})) (E q . 1) \\ R_{t + 1} & = R_{t} + x_{t b a} (E q . 2) \\ D_{t + 1} & = {\hat{D}}_{t + 1, b} \\ R_{t + 1} & = R_{t + 1} - min (R_{t + 1} + D_{t + 1}) (E q . 3) \end{aligned}$

Eq. 1 handles the update of the available pool levels. Eq. 2 handles the updating of the standby pools. Eq. 3 updates the standby pools due to sales.

There are some more equations to handle the updating of the $a$ attribute vectors at the end of the day that are not shown here.

Collectively, they represent the general transition function:

$S_{t + 1} = S^{M} (S_{t}, X^{π} (S_{t}))$

4.3.5 Objective function

The objective function captures the performance metrics of the solution to the problem.

First, let us state the stockout and holding costs:

$\begin{aligned} c_{b}^{s o u t} & = p_{b}^{r e n t} max {0, D_{t b} - R_{t, b_0}} \\ c_{b}^{h o l d} & = c_{b}^{u p k e e p} R_{t, b_0}} \end{aligned}$ Each of these costs will have to be subtracted from the contribution, $C$ .

We can write the state-dependant reward (also called contribution):

$\begin{array}{r} C_{b} (S_{t}, x_{t}) = p_{b}^{r e n t} min (R_{t + 1, b_{1}_0} + D_{t + 1, b_0}) - c_{b}^{s o u t} - c_{b}^{h o l d} \end{array}$

We have the objective function:

$max_{π} E {\sum_{t = 0}^{T} C (S_{t}, x_{t}, W_{t + 1})}$

The learned parameters are: - $θ_{a 1}^{A d j 1}$ - $a_{1}$ = CAR_TYPE - value around which the 1-day rental standby pool level is adjusted - $θ_{a 1}^{A d j 3}$ - $a_{1}$ = CAR_TYPE - value around which the 3-day rental standby pool level is adjusted

4.3.6 Implementation of the System Under Steer (SUS) Model

class Model():
  def __init__(self, W_fn=None, S__M_fn=None, C_fn=None):
    self.S_0_info = {
      'RAvail_t': {
          'ELA_1': AVAIL_POOLS['ELA_1'] - STANDBY_POOLS['ELA_1_0'],
          'ELA_3': AVAIL_POOLS['ELA_3'] - STANDBY_POOLS['ELA_3_0'],
          'SON_1': AVAIL_POOLS['SON_1'] - STANDBY_POOLS['SON_1_0'],
          'SON_3': AVAIL_POOLS['SON_3'] - STANDBY_POOLS['SON_3_0']},
      'R_t': {
          'ELA_1_0': STANDBY_POOLS['ELA_1_0'],
          'ELA_1_1': 0,
          'ELA_3_0': STANDBY_POOLS['ELA_3_0'],
          'ELA_3_1': 0,
          'ELA_3_2': 0,
          'ELA_3_3': 0,
          'SON_1_0': STANDBY_POOLS['SON_1_0'],
          'SON_1_1': 0,
          'SON_3_0': STANDBY_POOLS['SON_3_0'],
          'SON_3_1': 0,
          'SON_3_2': 0,
          'SON_3_3': 0},
      'D_t': {'ELA_1': 0, 'ELA_3': 0, 'SON_1': 0, 'SON_3': 0}
    }
    self.State = namedtuple('State', SNAMES) #. 'class'
    self.S_t = self.build_state(self.S_0_info) #. 'instance'
    self.Decision = namedtuple('Decision', xNAMES) #. 'class'
    self.x_t = self.build_decision( #. 'instance'
      info = {
        'x_t': {
          'ELA_1___ELA_1_0': 0,
          'ELA_3___ELA_3_0': 0,
          'SON_1___SON_1_0': 0,
          'SON_3___SON_3_0': 0}
      })
    self.Ccum = 0.0 #. cumulative reward

  def reset(self):
    self.Ccum = 0.0
    self.S_t = self.build_state(self.S_0)

  def build_state(self, info):
    return self.State(*[info[sn] for sn in SNAMES])

  def build_decision(self, info):
    return self.Decision(*[info[xn] for xn in xNAMES])

  # exogenous information, dependent on a random process,
  # the directly observed demands
  def W_fn(self, t):
    return SIM.simulate()

  def S__M_fn(self, t, S_t, x_t, W_tt1):
    ## D_t #direct approach
    for bn in bNAMES:
      S_t.D_t[bn] = W_tt1[bn]

    ## R_t & Ccum
    if S_t.R_t['ELA_1_0'] > 0:
      sales = min(S_t.R_t['ELA_1_0'], S_t.D_t['ELA_1'])
      S_t.R_t['ELA_1_1'] = sales; #print(f'%%% ELA_1_1 sales= {sales}')
      S_t.R_t['ELA_1_0'] -= sales
      self.Ccum += p__rent['ELA_1']*sales
      c__sout = p__rent['ELA_1']*max(0, S_t.D_t['ELA_1'] - S_t.R_t['ELA_1_0'])
      self.Ccum -= c__sout
      c__hold = c__upkeep['ELA']*S_t.R_t['ELA_1_0'] #bench cost
      self.Ccum -= c__hold
    if S_t.R_t['ELA_3_0'] > 0:
      sales = min(S_t.R_t['ELA_3_0'], S_t.D_t['ELA_3'])
      S_t.R_t['ELA_3_1'] = sales; #print(f'%%% ELA_3_1 sales= {sales}')
      S_t.R_t['ELA_3_0'] -= sales
      self.Ccum += p__rent['ELA_3']*sales
      c__sout = p__rent['ELA_3']*max(0, S_t.D_t['ELA_3'] - S_t.R_t['ELA_3_0'])
      self.Ccum -= c__sout
      c__hold = c__upkeep['ELA']*S_t.R_t['ELA_3_0'] #bench cost
      self.Ccum -= c__hold
    if S_t.R_t['SON_1_0'] > 0:
      sales = min(S_t.R_t['SON_1_0'], S_t.D_t['SON_1'])
      S_t.R_t['SON_1_1'] = sales; #print(f'%%% SON_1_1 sales= {sales}')
      S_t.R_t['SON_1_0'] -= sales
      self.Ccum += p__rent['SON_1']*sales
      c__sout = p__rent['SON_1']*max(0, S_t.D_t['SON_1'] - S_t.R_t['SON_1_0'])
      self.Ccum -= c__sout
      c__hold = c__upkeep['SON']*S_t.R_t['SON_1_0'] #bench cost
      self.Ccum -= c__hold
    if S_t.R_t['SON_3_0'] > 0:
      sales = min(S_t.R_t['SON_3_0'], S_t.D_t['SON_3'])
      S_t.R_t['SON_3_1'] = sales; #print(f'%%% SON_3_1 sales= {sales}')
      self.Ccum += p__rent['SON_3']*sales
      S_t.R_t['SON_3_0'] -= sales
      c__sout = p__rent['SON_3']*max(0, S_t.D_t['SON_3'] - S_t.R_t['SON_3_0'])
      self.Ccum -= c__sout
      c__hold = c__upkeep['SON']*S_t.R_t['SON_3_0'] #bench cost
      self.Ccum -= c__hold

    # a capture here shows pool level below actuals coz have not returned
    # to base yet, but does show sales
    # showing later, after return to base, give proper pool level, but
    # does not show any sales
    record_t = [t] + \
      [S_t.RAvail_t[bn] for bn in bNAMES] + \
      [S_t.R_t[an] for an in aNAMES] + \
      [S_t.D_t[bn] for bn in bNAMES] + \
      [self.Ccum] + \
      [x_t.x_t[ban] for ban in baNAMES]

    S_t.R_t['ELA_1_0'] += S_t.R_t['ELA_1_1'] #cars returning
    S_t.R_t['ELA_1_1'] = 0
    S_t.R_t['ELA_3_0'] = S_t.R_t['ELA_3_3'] #cars returning
    S_t.R_t['ELA_3_3'] = S_t.R_t['ELA_3_2']
    S_t.R_t['ELA_3_2'] = S_t.R_t['ELA_3_1']
    S_t.R_t['ELA_3_1'] = 0

    S_t.R_t['SON_1_0'] += S_t.R_t['SON_1_1'] #cars returning
    S_t.R_t['SON_1_1'] = 0
    S_t.R_t['SON_3_0'] = S_t.R_t['SON_3_3'] #cars returning
    S_t.R_t['SON_3_3'] = S_t.R_t['SON_3_2']
    S_t.R_t['SON_3_2'] = S_t.R_t['SON_3_1']
    S_t.R_t['SON_3_1'] = 0

    # overlay standby pool values after returned to pool
    # note that the 3-day pools will still under-read as some cars have
    # not been returned yet
    record_t[5] = S_t.R_t['ELA_1_0']
    record_t[7] = S_t.R_t['ELA_3_0']
    record_t[11] = S_t.R_t['SON_1_0']
    record_t[13] = S_t.R_t['SON_3_0']
    return record_t

  def C_fn(self, S_t, x_t, W_tt1, theta):
    return

  def step(self, t, x_t, theta):
    ## IND = '\t\t'
    ## print(f"{IND}..... M. step() .....\n{t=}\n{theta=}")
    ## for abn in abNAMES: print(f'x_t.x_t[{abn}]= {x_t.x_t[abn]}')
    W_tt1 = self.W_fn(t)

    ## update state & reward
    record_t = self.S__M_fn(t, self.S_t, x_t, W_tt1)
    return record_t

4.4 Uncertainty Model

We will simulate the rental demand vector $D_{t + 1} = (D_{t + 1, E L A_1}, D_{t + 1, E L A_3}, D_{t + 1, S O N_1}, D_{t + 1, S O N_3})$ as described in section 2.

4.5 Policy Design

There are two main meta-classes of policy design. Each of these has two subclasses: - Policy Search - Policy Function Approximations (PFAs) - Cost Function Approximations (CFAs) - Lookahead - Value Function Approximations (VFAs) - Direct Lookaheads (DLAs)

In this project we will only use one approach: - A simple adjust-around parameterized policy (from the PFA class) which can be summarized as:

if $R_{t, b_0} > θ_{b 1}^{A d j 1}$ :
- make decision to reduce 1-day standby pool by 10
else:
- make decision to increase 1-day standby pool level at previous demand
if $R_{t, b_0} > θ_{b 1}^{A d j 3}$ :
- make decision to reduce 3-day standby pool by 10
else:
- make decision to increase 3-day standby pool level at previous demand

Note that: - $b_{1}$ = CAR_TYPE

4.5.1 Implementation of Policy Design

import random
class Policy():
  def __init__(self, model):
    self.model = model
    self.Policy = namedtuple('Policy', piNAMES) #. 'class'
    self.Theta = namedtuple('Theta', thNAMES) #. 'class'

  def build_policy(self, info):
    return self.Policy(*[info[pin] for pin in piNAMES])

  def build_theta(self, info):
    return self.Theta(*[info[thn] for thn in thNAMES])

  ## Each demand comes from its own pool, no cross supplies
  def X__AdjAround(self, t, S_t, theta):
    info = {
      'x_t': {'ELA_1___ELA_1_0': 0, 'ELA_3___ELA_3_0': 0,
              'SON_1___SON_1_0': 0, 'SON_3___SON_3_0': 0}
    }
    if t >= T:
      print(f"ERROR: t={t} should not reach or exceed the max steps ({T})")
      return self.model.build_decision(info)

    if S_t.R_t['ELA_1_0'] > theta.thAdj1['ELA']:
      ##return from standby pool: excess above theta
      info['x_t']['ELA_1___ELA_1_0'] = -(10)
      ## info['x_t']['ELA_1___ELA_1_0'] = -(S_t.R_t['ELA_1_0'] - theta.thAdj1['ELA'])
    else:
      ##move to standby pool: best est. of demand up to tonight is yesterday's demand
      info['x_t']['ELA_1___ELA_1_0'] = S_t.D_t['ELA_1']

    if S_t.R_t['ELA_3_0'] > theta.thAdj3['ELA']:
      ##return from standby pool: excess above theta
      info['x_t']['ELA_3___ELA_3_0'] = -10
      ## info['x_t']['ELA_3___ELA_3_0'] = -(S_t.R_t['ELA_3_0'] - theta.thAdj3['ELA'])
    else:
      ##move to standby pool: best est. of demand up to tonight is yesterday's demand
      info['x_t']['ELA_3___ELA_3_0'] = S_t.D_t['ELA_3']

    if S_t.R_t['SON_1_0'] > theta.thAdj1['SON']:
      ##return from standby pool: excess above theta
      info['x_t']['SON_1___SON_1_0'] = -10
      ## info['x_t']['SON_1___SON_1_0'] = -(S_t.R_t['SON_1_0'] - theta.thAdj1['SON'])
    else:
      ##move to standby pool: best est. of demand up to tonight is yesterday's demand
      info['x_t']['SON_1___SON_1_0'] = S_t.D_t['SON_1']

    if S_t.R_t['SON_3_0'] > theta.thAdj3['SON']:
      ##return from standby pool: excess above theta
      info['x_t']['SON_3___SON_3_0'] = -10
      ## info['x_t']['SON_3___SON_3_0'] = -(S_t.R_t['SON_3_0'] - theta.thAdj3['SON'])
    else:
      ##move to standby pool: best est. of demand up to tonight is yesterday's demand
      info['x_t']['SON_3___SON_3_0'] = S_t.D_t['SON_3']

    return self.model.build_decision(info)

  def perform_decision(self, x_t):
    self.model.S_t.RAvail_t['ELA_1'] -= x_t.x_t['ELA_1___ELA_1_0']
    self.model.S_t.RAvail_t['ELA_1'] = max(0, self.model.S_t.RAvail_t['ELA_1'])
    self.model.S_t.R_t['ELA_1_0'] += x_t.x_t['ELA_1___ELA_1_0']

    self.model.S_t.RAvail_t['ELA_3'] -= x_t.x_t['ELA_3___ELA_3_0']
    self.model.S_t.RAvail_t['ELA_3'] = max(0, self.model.S_t.RAvail_t['ELA_3'])
    self.model.S_t.R_t['ELA_3_0'] += x_t.x_t['ELA_3___ELA_3_0']

    self.model.S_t.RAvail_t['SON_1'] -= x_t.x_t['SON_1___SON_1_0']
    self.model.S_t.RAvail_t['SON_1'] = max(0, self.model.S_t.RAvail_t['SON_1'])
    self.model.S_t.R_t['SON_1_0'] += x_t.x_t['SON_1___SON_1_0']

    self.model.S_t.RAvail_t['SON_3'] -= x_t.x_t['SON_3___SON_3_0']
    self.model.S_t.RAvail_t['SON_3'] = max(0, self.model.S_t.RAvail_t['SON_3'])
    self.model.S_t.R_t['SON_3_0'] += x_t.x_t['SON_3___SON_3_0']

  def run_grid_sample_paths(self, theta, piName, record):
    CcumIomega__lI = []
    for l in range(1, L + 1): #for each sample-path
      ## print(f'%%% {l=}')
      ## M = copy(self.model)
      ## M = Model(S_0_INFO)
      M = Model()
      ## P = Policy(M) #NO!, overwrite existing global P
      self.model = M
      record_l = [piName, theta, l]
      for t in range(T): #for each transition/step
        ## print(f'\t%%% {t=}')
        ## >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
        x_t = getattr(self, piName)(t, self.model.S_t, theta) #lookup (new) today's decision
        self.perform_decision(x_t) #implement (new) today's decision

        # sit in post-decision state until end of cycle (evening)

        ## S_t, Ccum, x_t = self.model.step(t, x_t, theta)
        record_t = self.model.step(t, x_t, theta) #change from today to tomorrow
        ## >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
        record.append(record_l + record_t)
      CcumIomega__lI.append(self.model.Ccum) #just above (SDAM-eq2.9)
    return CcumIomega__lI

  def perform_grid_search_sample_paths(self, piName, thetas):
    Cbarcum = defaultdict(float)
    Ctilcum = defaultdict(float)
    expCbarcum = defaultdict(float)
    expCtilcum = defaultdict(float)
    numThetas = len(thetas)
    record = []
    print(f'{numThetas=:,}'); #print(f'{thetas=}')
    nth = 100
    i = 0; print(f'... printing every {nth}th theta (if considered) ...')
    for theta in thetas:
      if True: ##in case relationships between thetas can be exploited
      ## if((theta.thAdj['ELA'] < theta.thMax['ELA']) and \
      ##    (theta.thAdj['SON'] < theta.thMax['SON'])):
        ## a dict cannot be used as a key, so we define theta_key, e.g.
        ## theta_key = ((168.0, 72.0), (200.0, 90.0)):
        theta_key = tuple(tuple(itm.values()) for itm in theta)
        ## theta_key = theta ##if theta is not a dict
        if i%nth == 0: print(f'=== ({i:,} / {numThetas:,}), theta={theta} ===')

        ## >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
        CcumIomega__lI = self.run_grid_sample_paths(theta, piName, record)
        ## >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

        Cbarcum_tmp = np.array(CcumIomega__lI).mean() #(SDAM-eq2.9)
        Ctilcum_tmp = np.sum(np.square(np.array(CcumIomega__lI) - Cbarcum_tmp))/(L - 1)

        Cbarcum[theta_key] = Cbarcum_tmp
        Ctilcum[theta_key] = np.sqrt(Ctilcum_tmp/L)

        expCbarcum_tmp = pd.Series(CcumIomega__lI).expanding().mean()
        expCbarcum[theta_key] = expCbarcum_tmp

        expCtilcum_tmp = pd.Series(CcumIomega__lI).expanding().std()
        expCtilcum[theta_key] = expCtilcum_tmp
        i += 1
      ##endif
    best_theta = max(Cbarcum, key=Cbarcum.get)
    worst_theta = min(Cbarcum, key=Cbarcum.get)

    best_Cbarcum = Cbarcum[best_theta]
    best_Ctilcum = Ctilcum[best_theta]

    worst_Cbarcum = Cbarcum[worst_theta]
    worst_Ctilcum = Ctilcum[worst_theta]

    thetaStar_expCbarcum = expCbarcum[best_theta]
    thetaStar_expCtilcum = expCtilcum[best_theta]
    thetaStar_expCtilcum[0] = 0 ##set NaN to 0

    best_theta_w_names = tuple((
      ({
        a1NAMES[0]: subvec[0],
        a1NAMES[1]: subvec[1]
      })) for subvec in best_theta)
    ## best_theta_n = self.build_theta({'thAdj': best_theta_w_names[0]})
    best_theta_n = self.build_theta({'thAdj1': best_theta_w_names[0], 'thAdj3': best_theta_w_names[1]})
    print(f'best_theta_n:\n{best_theta_n}\n{best_Cbarcum=:.2f}\n{best_Ctilcum=:.2f}')

    worst_theta_w_names = tuple((
      ({
        a1NAMES[0]: subvec[0],
        a1NAMES[1]: subvec[1]})) for subvec in worst_theta)
    ## worst_theta_n = self.build_theta({'thAdj': worst_theta_w_names[0]})
    worst_theta_n = self.build_theta({'thAdj1': worst_theta_w_names[0], 'thAdj3': worst_theta_w_names[1]})
    print(f'worst_theta_n:\n{worst_theta_n}\n{worst_Cbarcum=:.2f}\n{worst_Ctilcum=:.2f}')

    return \
      thetaStar_expCbarcum, thetaStar_expCtilcum, \
      Cbarcum, Ctilcum, \
      best_theta_n, worst_theta_n, \
      best_Cbarcum, worst_Cbarcum, \
      best_Ctilcum, worst_Ctilcum, \
      record

  ## dispatch {prepend @}
  # def grid_search_theta_values(self, thetas0): #. using vectors reduces loops in perform_grid_search_sample_paths()
  #     thetas = [(th0,) for th0 in thetas0]
  #     return thetas

  ## dispatch {prepend @}
  # def grid_search_theta_values(self, thetas0, thetas1): #. using vectors reduces loops in perform_grid_search_sample_paths()
  #     thetas = [(th0, th1) for th0 in thetas0 for th1 in thetas1]
  #     return thetas

  ## dispatch {prepend @}
  # def grid_search_theta_values(self, thetas0, thetas1, thetas2): #. using vectors reduces loops in perform_grid_search_sample_paths()
  #     thetas = [(th0, th1, th2) for th0 in thetas0 for th1 in thetas1 for th2 in thetas2]
  #     return thetas

  ## EXAMPLE:
  ## thetasA: Buy
  ## thetasA_name: 'thBuy'
  ## names: ELA
  ## 1_1: 1 theta sub-vectors, each having 1 theta
  ## thetas = grid_search_thetas_1_2(thetasBuy 'thBuy', CAR_TYPES)
  def grid_search_thetas_1_1(self, thetasA, thetasA_name, names):
    thetas = [
    self.build_theta({thetasA_name: {names[0]: thA0}})
    for thA0 in thetasA[names[0]]
    ]
    return thetas

  ## EXAMPLE:
  ## thetasA: Buy
  ## thetasA_name: 'thBuy'
  ## names: ELA, SON
  ## 1_2: 1 theta sub-vectors, each having 2 thetas
  ## thetas = grid_search_thetas_1_2(thetasBuy 'thBuy', CAR_TYPES)
  def grid_search_thetas_1_2(self, thetasA, thetasA_name, names):
    thetas = [
    self.build_theta({thetasA_name: {names[0]: thA0, names[1]: thA1}})
    for thA0 in thetasA[names[0]]
      for thA1 in thetasA[names[1]]
    ]
    return thetas

  ## EXAMPLE:
  ## thetasA: Adj
  ## thetasA_name: 'thAdj'
  ## names: ELA, SON
  ## 1_4: 1 theta sub-vectors, each having 4 thetas
  ## thetas = grid_search_thetas_1_4(thetasBuy 'thAdj', bNAMES)
  def grid_search_thetas_1_4(self, thetasA, thetasA_name, names):
    thetas = [
    self.build_theta({thetasA_name: {names[0]: thA0, names[1]: thA1, names[2]: thA2, names[3]: thA3}})
    for thA0 in thetasA[names[0]]
      for thA1 in thetasA[names[1]]
        for thA2 in thetasA[names[2]]
          for thA3 in thetasA[names[3]]
    ]
    return thetas

  ## EXAMPLE:
  ## thetasA: Buy
  ## thetasB: Max
  ## thetasA_name: 'thBuy'
  ## thetasB_name: 'thMax'
  ## names: ELA
  ## 2_1: 2 theta sub-vectors, each having 1 theta
  ## thetas = grid_search_thetas_2_1(thetasBuy, thetasMax, 'thBuy', 'thMax', CAR_TYPES)
  def grid_search_thetas_2_1(self, thetasA, thetasB, thetasA_name, thetasB_name, names):
    thetas = [
    self.build_theta({thetasA_name: {names[0]: thA0}, thetasB_name: {names[0]: thB0}})
    for thA0 in thetasA[names[0]]
      for thB0 in thetasB[names[0]]
    ]
    return thetas

  ## EXAMPLE:
  ## thetasA: Buy
  ## thetasB: Max
  ## thetasA_name: 'thBuy'
  ## thetasB_name: 'thMax'
  ## names: ELA, SON
  ## 2_2: 2 theta sub-vectors, each having 2 thetas
  ## thetas = grid_search_thetas_4(thetasBuy, thetasMax, 'thBuy', 'thMax', CAR_TYPES)
  def grid_search_thetas_2_2(self, thetasA, thetasB, thetasA_name, thetasB_name, names):
    thetas = [
    self.build_theta({thetasA_name: {names[0]: thA0, names[1]: thA1}, thetasB_name: {names[0]: thB0, names[1]: thB1}})
    for thA0 in thetasA[names[0]]
      for thA1 in thetasA[names[1]]
        for thB0 in thetasB[names[0]]
          for thB1 in thetasB[names[1]]
    ]
    return thetas


  ############################################################################
  ### PLOTTING
  ############################################################################
  def round_theta(self, complex_theta):
    thetas_rounded = []
    for theta in complex_theta:
      evalues_rounded = []
      for _, evalue in theta.items():
        evalues_rounded.append(float(f"{evalue:f}"))
      thetas_rounded.append(tuple(evalues_rounded))
    return str(tuple(thetas_rounded))

  def plot_Fhat_map_2(self,
      FhatI_theta_I,
      thetasX, thetasY, labelX, labelY, title):
      Fhat_values = [
        FhatI_theta_I[
          ## (thetaX,thetaY)
          ((thetaX,),(thetaY,))
        ]
          for thetaY in thetasY for thetaX in thetasX
      ]
      Fhats = np.array(Fhat_values)
      increment_count = len(thetasX)
      Fhats = np.reshape(Fhats, (-1, increment_count))#.

      fig, ax = plt.subplots()
      im = ax.imshow(Fhats, cmap='hot', origin='lower', aspect='auto')
      ## create colorbar
      cbar = ax.figure.colorbar(im, ax=ax)
      ## cbar.ax.set_ylabel(cbarlabel, rotation=-90, va="bottom")

      ax.set_xticks(np.arange(0, len(thetasX), 5))#.
      ## ax.set_xticks(np.arange(len(thetasX)))

      ax.set_yticks(np.arange(0, len(thetasY), 5))#.
      ## ax.set_yticks(np.arange(len(thetasY)))

      ## NOTE: round tick labels, else very messy
      ## function round() does not work, have to do this way
      thetasX_form = [f'{th:.0f}' for th in thetasX]
      thetasY_form = [f'{th:.0f}' for th in thetasY]

      ax.set_xticklabels(thetasX[::5])
      ## ax.set_xticklabels(thetasX); ax.set_xticklabels(thetasX_form)

      ax.set_yticklabels(thetasY[::5])
      ## ax.set_yticklabels(thetasY); ax.set_yticklabels(thetasY_form)

      ## rotate the tick labels and set their alignment.
      ## plt.setp(ax.get_xticklabels(), rotation=45, ha="right",rotation_mode="anchor")

      ax.set_title(title)
      ax.set_xlabel(labelX)
      ax.set_ylabel(labelY)

      ## fig.tight_layout()
      plt.show()
      return True

  def plot_Fhat_map_4(self,
      FhatI_theta_I,
      thetasX, thetasY, labelX, labelY, title,
      thetaFixed1, thetaFixed2):
      ## Fhat_values = [FhatI_theta_I[(thetaX,thetaY)] for thetaY in thetasY for thetaX in thetasX]
      Fhat_values = [
        FhatI_theta_I[((thetaX,thetaY), (thetaFixed1,thetaFixed2))]
        for thetaY in thetasY
          for thetaX in thetasX]
      Fhats = np.array(Fhat_values)
      increment_count = len(thetasX)
      Fhats = np.reshape(Fhats, (-1, increment_count))#.

      fig, ax = plt.subplots()
      im = ax.imshow(Fhats, cmap='hot', origin='lower', aspect='auto')
      ## create colorbar
      cbar = ax.figure.colorbar(im, ax=ax)
      ## cbar.ax.set_ylabel(cbarlabel, rotation=-90, va="bottom")

      ax.set_xticks(np.arange(0, len(thetasX), 5))#.
      ## ax.set_xticks(np.arange(len(thetasX)))

      ax.set_yticks(np.arange(0, len(thetasY), 5))#.
      ## ax.set_yticks(np.arange(len(thetasY)))

      ## NOTE: round tick labels, else very messy
      ## function round() does not work, have to do this way
      ## thetasX_form = [f'{th:.1f}' for th in thetasX]
      ## thetasY_form = [f'{th:.1f}' for th in thetasY]

      ax.set_xticklabels(thetasX[::5])#.
      ## ax.set_xticklabels(thetasX)
      ## ax.set_xticklabels(thetasX_form)

      ax.set_yticklabels(thetasY[::5])#.
      ## ax.set_yticklabels(thetasY)
      ## ax.set_yticklabels(thetasY_form)

      ## rotate the tick labels and set their alignment.
      ## plt.setp(ax.get_xticklabels(), rotation=45, ha="right",rotation_mode="anchor")

      ax.set_title(title)
      ax.set_xlabel(labelX)
      ax.set_ylabel(labelY)

      ## fig.tight_layout()
      plt.show()
      return True

  ## color_style examples: 'r-', 'b:', 'g--'
  def plot_Fhat_chart(self, FhatI_theta_I, thetasX, labelX, labelY, title, color_style, thetaStar):
      mpl.rcParams['lines.linewidth'] = 1.2
      xylabelsize = 16
      ## plt.figure(figsize=(13, 9))
      plt.figure(figsize=(13, 4))
      plt.title(title, fontsize=20)
      Fhats = FhatI_theta_I.values()
      plt.plot(thetasX, Fhats, color_style)
      plt.axvline(x=thetaStar, color='k', linestyle=':')
      plt.xlabel(labelX, rotation=0, labelpad=10, ha='right', va='center', fontweight='bold', size=xylabelsize)
      plt.ylabel(labelY, rotation=0, labelpad=1, ha='right', va='center', fontweight='normal', size=xylabelsize)
      plt.show()

  ## expanding Fhat chart
  def plot_expFhat_chart(self, df, labelX, labelY, title, color_style):
    mpl.rcParams['lines.linewidth'] = 1.2
    xylabelsize = 16
    plt.figure(figsize=(13, 4))
    plt.title(title, fontsize=20)
    plt.plot(df, color_style)
    plt.xlabel(labelX, rotation=0, labelpad=10, ha='right', va='center', fontweight='bold', size=xylabelsize)
    plt.ylabel(labelY, rotation=0, labelpad=1, ha='right', va='center', fontweight='normal', size=xylabelsize)
    plt.show()

  ## expanding Fhat charts
  def plot_expFhat_charts(self, means, stdvs, labelX, labelY, suptitle, pars=defaultdict(str)):
    n_charts = 2
    xlabelsize = 14
    ylabelsize = 14
    mpl.rcParams['lines.linewidth'] = 1.2
    default_colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
    fig, axs = plt.subplots(n_charts, sharex=True)
    fig.set_figwidth(13); fig.set_figheight(9)
    fig.suptitle(suptitle, fontsize=18)

    xi = 0
    legendlabels = []
    axs[xi].set_title(r"$exp\bar{C}^{cum}(\theta^*)$", loc='right', fontsize=16)
    for i,itm in enumerate(means.items()):
      axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
      leg = axs[xi].plot(itm[1], color=pars['colors'][i])
      legendlabels.append(itm[0])
    axs[xi].set_ylabel(labelY, rotation=0, ha='right', va='center', fontweight='normal', size=ylabelsize)

    xi = 1
    axs[xi].set_title(r"$exp\tilde{C}^{cum}(\theta^*)$", loc='right', fontsize=16)
    for i,itm in enumerate(stdvs.items()):
      axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
      ## leg = axs[xi].plot(itm[1], default_colors[i], linestyle='--')
      leg = axs[xi].plot(itm[1], pars['colors'][i], linestyle='--')
    axs[xi].set_ylabel(labelY, rotation=0, ha='right', va='center', fontweight='normal', size=ylabelsize)

    fig.legend(
      ## [leg],
      labels=legendlabels,
      title="Policies",
      loc='upper right',
      fancybox=True,
      shadow=True,
      ncol=1)
    plt.xlabel(labelX, rotation=0, labelpad=10, ha='right', va='center', fontweight='normal', size=xlabelsize)
    plt.show()

  def plot_records(self, df, df_non, pars=defaultdict(str)):
    legendlabels = [r'$\mathrm{opt}$', r'$\mathrm{non}$']
    n_ba = len(baNAMES)
    n_a = len(aNAMES)
    n_b = len(bNAMES)
    n_charts = n_ba + n_b + n_b + n_a + 1
    ylabelsize = 14
    mpl.rcParams['lines.linewidth'] = 1.2
    mycolors = ['g', 'b']
    fig, axs = plt.subplots(n_charts, sharex=True)
    ## fig.set_figwidth(13); fig.set_figheight(9)
    fig.set_figwidth(13); fig.set_figheight(20)
    fig.suptitle(pars['suptitle'], fontsize=14)

    xi = 0
    for i,ba in enumerate(baNAMES):
      axs[xi+i].set_ylim(auto=True); axs[xi+i].spines['top'].set_visible(False); axs[xi+i].spines['right'].set_visible(True); axs[xi+i].spines['bottom'].set_visible(False)
      axs[xi+i].step(df[f'x_t_{ba}'], 'm-', where='post')
      if not df_non is None: axs[xi+i].step(df_non[f'x_t_{ba}'], 'c-.', where='post')
      axs[xi+i].axhline(y=0, color='k', linestyle=':')
      bal = ba.split("___");
      bl = bal[0].split('_'); bl = bl[0]+'\_'+bl[1]; al = bal[1].split('_'); al = al[0]+'\_'+al[1]+'\_'+al[2]
      y1ab = '$x_{t,'+f'{",".join([bl, al])}'+'}$'
      axs[xi+i].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
      for j in range(df.shape[0]//T): axs[xi+i].axvline(x=(j+1)*T, color='grey', ls=':')

    xi = n_ba
    for i,b in enumerate(bNAMES):
      bl = b.split('_'); bl = bl[0]+'\_'+bl[1];
      y1ab = '$D_{t,'+f'{bl}'+'}$'; p1ab = r'$\mu^{'+f'{bl}'+'}$'
      axs[xi+i].set_ylim(auto=True); axs[xi+i].spines['top'].set_visible(False); axs[xi+i].spines['right'].set_visible(True); axs[xi+i].spines['bottom'].set_visible(False)
      axs[xi+i].step(df[f'D_t_{b}'], mycolors[xi%len(mycolors)], where='post')
      if not df_non is None: axs[xi+i].step(df_non[f'D_t_{b}'], 'c-.', where='post')
      axs[xi+i].axhline(y=SIM.muD[b], color='k', linestyle=':')
      axs[xi+i].text(-4, SIM.muD[b], p1ab, size=16, color='k')
      ## if(pars['thetaAdjNon']): axs[xi+i].text(-4, pars['thetaAdjNon'][aNAMES[b]], r'$\theta^{Adj}$' + f"={pars['thetaAdjNon'][aNAMES[b]]:.1f}", size=10, color='c')
      ## if(pars['thetaAdjNon']): axs[xi+i].axhline(y=pars['thetaAdjNon'][aNAMES[b]], color='c', linestyle=':')
      ## if(pars['thetaMaxNon']): axs[xi+i].text(-4, pars['thetaMaxNon'][aNAMES[b]], r'$\theta^{Max}$' + f"={pars['thetaMaxNon'][aNAMES[b]]:.1f}", size=10, color='c')
      ## if(pars['thetaMaxNon']): axs[xi+i].axhline(y=pars['thetaMaxNon'][aNAMES[b]], color='c', linestyle=':')
      axs[xi+i].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
      for j in range(df.shape[0]//T): axs[xi+i].axvline(x=(j+1)*T, color='grey', ls=':')

    xi = n_ba + n_b
    for i,b in enumerate(bNAMES):
      axs[xi+i].set_ylim(auto=True); axs[xi+i].spines['top'].set_visible(False); axs[xi+i].spines['right'].set_visible(True); axs[xi+i].spines['bottom'].set_visible(False)
      axs[xi+i].step(df[f'RAvail_t_{b}'], 'm-', where='post')
      ## axs[xi+i].step(df[f"R_t_{b+'_0'}"], 'k-', where='post') #show standby pool
      if not df_non is None: axs[xi+i].step(df_non[f'RAvail_t_{b}'], 'c-.', where='post')
      axs[xi+i].axhline(y=0, color='k', linestyle=':')
      bl = b.split('_'); bl = bl[0]+'\_'+bl[1];
      y1ab = '$R^{Avail}_{t,'+f'{bl}'+'}$'; p1ab = r'$\mu^{'+f'{bl}'+'}$'
      axs[xi+i].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
      for j in range(df.shape[0]//T): axs[xi+i].axvline(x=(j+1)*T, color='grey', ls=':')

    xi = n_ba + n_b + n_b
    for i,a in enumerate(aNAMES):
      axs[xi+i].set_ylim(auto=True); axs[xi+i].spines['top'].set_visible(False); axs[xi+i].spines['right'].set_visible(True); axs[xi+i].spines['bottom'].set_visible(False)
      axs[xi+i].step(df[f'R_t_{a}'], 'm-', where='post')
      if(pars['thetaAdj1']) and (pars['thetaAdj3']):
        acomps = a.split('_')
        if acomps[2] == '0':
          if acomps[0] == 'ELA' and acomps[1] == '1':
            axs[xi+i].text(-4, pars['thetaAdj1']['ELA'], r'$\theta^{Adj1}_{ELA}$' + f"={pars['thetaAdj1']['ELA']:.1f}", size=10, color='r')
            axs[xi+i].axhline(y=pars['thetaAdj1']['ELA'], color='r', linestyle=':')
          elif acomps[0] == 'ELA' and acomps[1] == '3':
            axs[xi+i].text(-4, pars['thetaAdj3']['ELA'], r'$\theta^{Adj3}_{ELA}$' + f"={pars['thetaAdj3']['ELA']:.1f}", size=10, color='r')
            axs[xi+i].axhline(y=pars['thetaAdj3']['ELA'], color='r', linestyle=':')
          elif acomps[0] == 'SON' and acomps[1] == '1':
            axs[xi+i].text(-4, pars['thetaAdj1']['SON'], r'$\theta^{Adj1}_{SON}$' + f"={pars['thetaAdj1']['SON']:.1f}", size=10, color='r')
            axs[xi+i].axhline(y=pars['thetaAdj1']['SON'], color='r', linestyle=':')
          elif acomps[0] == 'SON' and acomps[1] == '3':
            axs[xi+i].axhline(y=pars['thetaAdj3']['SON'], color='r', linestyle=':')
            axs[xi+i].text(-4, pars['thetaAdj3']['SON'], r'$\theta^{Adj3}_{SON}$' + f"={pars['thetaAdj3']['SON']:.1f}", size=10, color='r')
          else: print('ERROR in plot_records!')
      if not df_non is None: axs[xi+i].step(df_non[f'R_t_{a}'], 'c-.', where='post')
      axs[xi+i].axhline(y=0, color='k', linestyle=':')
      al = a.split('_'); al = al[0]+'\_'+al[1]+'\_'+al[2]; y1ab = '$R_{t,'+f'{al}'+'}$'
      axs[xi+i].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
      for j in range(df.shape[0]//T): axs[xi+i].axvline(x=(j+1)*T, color='grey', ls=':')

    xi = n_ba + n_b + n_b + n_a
    axs[xi].set_ylim(auto=True); axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
    axs[xi].step(df['Ccum'], 'm-', where='post')
    if not df_non is None: axs[xi].step(df_non['Ccum'], 'c-.', where='post')
    axs[xi].axhline(y=0, color='k', linestyle=':')
    axs[xi].set_ylabel('$C^{cum}$'+'\n'+'$\mathrm{(Profit)}$'+'\n'+''+'$\mathrm{[\$]}$', rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize);
    for j in range(df.shape[0]//T): axs[xi].axvline(x=(j+1)*T, color='grey', ls=':')

    axs[xi].set_xlabel('$t\ \mathrm{[decision\ windows]}$', rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize);
    fig.legend(labels=legendlabels, loc='lower left', fontsize=16)

  def plot_evalu_comparison(self, df1, df2, df3, pars=defaultdict(str)):
    legendlabels = ['X__BuyBelow', 'X__Bellman']
    n_charts = 5
    ylabelsize = 14
    mpl.rcParams['lines.linewidth'] = 1.2
    fig, axs = plt.subplots(n_charts, sharex=True)
    fig.set_figwidth(13); fig.set_figheight(9)
    thetaStarStr = []
    for cmp in pars["thetaStar"]: thetaStarStr.append(f'{cmp:.1f}')
    thetaStarStr = '(' + ', '.join(thetaStarStr) + ')'
    fig.suptitle(pars['suptitle'], fontsize=14)

    xi = 0
    axs[xi].set_ylim(auto=True); axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
    axs[xi].step(df1[f'x_t'], 'r-', where='post')
    axs[xi].step(df2[f'x_t'], 'g-.', where='post')
    axs[xi].step(df3[f'x_t'], 'b:', where='post')
    axs[xi].axhline(y=0, color='k', linestyle=':')
    y1ab = '$x_{t}$'
    axs[xi].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
    for j in range(df1.shape[0]//T): axs[xi].axvline(x=(j+1)*T, color='grey', ls=':')

    xi = 1
    axs[xi].set_ylim(auto=True); axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
    axs[xi].step(df1[f'R_t'], 'r-', where='post')
    axs[xi].step(df2[f'R_t'], 'g-.', where='post')
    axs[xi].step(df3[f'R_t'], 'b:', where='post')
    axs[xi].axhline(y=0, color='k', linestyle=':')
    y1ab = '$R_{t}$'
    axs[xi].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
    for j in range(df1.shape[0]//T): axs[xi].axvline(x=(j+1)*T, color='grey', ls=':')

    xi = 2
    axs[xi].set_ylim(auto=True); axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
    axs[xi].step(df1[f'p_t'], 'r-', where='post')
    axs[xi].step(df2[f'p_t'], 'g-.', where='post')
    axs[xi].step(df3[f'p_t'], 'b:', where='post')
    axs[xi].axhline(y=0, color='k', linestyle=':')

    if(pars['lower_non']): axs[xi].text(-4, pars['lower_non'], r'$\theta^{lower}$' + f"={pars['lower_non']:.1f}", size=10, color='c')
    if(pars['lower_non']): axs[xi].axhline(y=pars['lower_non'], color='c', linestyle=':')

    if(pars['upper_non']): axs[xi].text(-4, pars['upper_non'], r'$\theta^{upper}$' + f"={pars['upper_non']:.1f}", size=10, color='c')
    if(pars['upper_non']): axs[xi].axhline(y=pars['upper_non'], color='c', linestyle=':')

    if(pars['lower']): axs[xi].text(-4, pars['lower'], r'$\theta^{lower}$' + f"={pars['lower']:.1f}", size=10, color='m')
    if(pars['lower']): axs[xi].axhline(y=pars['lower'], color='m', linestyle=':')

    if(pars['upper']): axs[xi].text(-4, pars['upper'], r'$\theta^{upper}$' + f"={pars['upper']:.1f}", size=10, color='m')
    if(pars['upper']): axs[xi].axhline(y=pars['upper'], color='m', linestyle=':')

    if(pars['alpha_non']): axs[xi].text(-4, pars['alpha_non'], r'$\theta^{alpha}$' + f"={pars['alpha_non']:.1f}", size=10, color='c')
    if(pars['alpha_non']): axs[xi].axhline(y=pars['alpha_non'], color='c', linestyle=':')

    if(pars['trackSignal_non']): axs[xi].text(-4, pars['trackSignal_non'], r'$\theta^{trackSignal}$' + f"={pars['trackSignal_non']:.1f}", size=10, color='c')
    if(pars['trackSignal_non']): axs[xi].axhline(y=pars['trackSignal_non'], color='c', linestyle=':')

    if(pars['alpha']): axs[xi].text(-4, pars['alpha'], r'$\theta^{alpha}$' + f"={pars['alpha']:.1f}", size=10, color='m')
    if(pars['alpha']): axs[xi].axhline(y=pars['alpha'], color='m', linestyle=':')

    if(pars['trackSignal']): axs[xi].text(-4, pars['trackSignal'], r'$\theta^{trackSignal}$' + f"={pars['trackSignal']:.1f}", size=10, color='m')
    if(pars['trackSignal']): axs[xi].axhline(y=pars['trackSignal'], color='m', linestyle=':')

    y1ab = '$p_{t}$'
    axs[xi].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
    for j in range(df1.shape[0]//T): axs[xi].axvline(x=(j+1)*T, color='grey', ls=':')

    xi = 3
    axs[xi].set_ylim(auto=True); axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
    axs[xi].step(df1['b_t_val'], 'r-', where='post')
    axs[xi].step(df2['b_t_val'], 'g-.', where='post')
    axs[xi].step(df3['b_t_val'], 'b:', where='post')
    axs[xi].axhline(y=0, color='k', linestyle=':')
    y1ab = '$b_{t,val}$'
    axs[xi].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
    for j in range(df1.shape[0]//T): axs[xi].axvline(x=(j+1)*T, color='grey', ls=':')

    xi = 4
    axs[xi].set_ylim(auto=True); axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
    axs[xi].step(df1['Ccum'], 'r-', where='post')
    axs[xi].step(df2['Ccum'], 'g-.', where='post')
    axs[xi].step(df3['Ccum'], 'b:', where='post')
    axs[xi].axhline(y=0, color='k', linestyle=':')
    axs[xi].set_ylabel('$\mathrm{cumC}$'+'\n'+'$\mathrm{(Profit)}$'+'\n'+''+'$\mathrm{[\$]}$', rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize);
    axs[xi].set_xlabel('$t\ \mathrm{[decision\ windows]}$', rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize);
    for j in range(df1.shape[0]//T): axs[xi].axvline(x=(j+1)*T, color='grey', ls=':')

    fig.legend(
      ## [leg],
      labels=legendlabels,
      title="Policies",
      loc='upper right',
      fontsize=16,
      fancybox=True,
      shadow=True,
      ncol=1)

4.6 Policy Evaluation

4.6.1 Training/Tuning

## setup labels to plot info
RAvail_t_labels = ['RAvail_t_'+bn for bn in bNAMES]
R_t_labels = ['R_t_'+an for an in aNAMES]
D_t_labels = ['D_t_'+bn for bn in bNAMES]
## x_t_labels = ['x_t_'+abn for abn in abNAMES]
x_t_labels = ['x_t_'+ban for ban in baNAMES]
labels = ['piName', 'theta', 'l'] + \
  ['t'] + RAvail_t_labels + R_t_labels + D_t_labels + \
  ['Ccum'] + x_t_labels

%%time
L = 100 #100pub #50db #N_SAMPLEPATHS
T = 100 #100pub #100db #N_TRANSITIONS
first_n_t = 200
last_n_t = 200

M = Model()
P = Policy(M)
SIM = DemandSimulator(seed=SEED_TRAIN)

thetasAdj1 = {
  a1n: np.arange(
    TH_ADJ1_SPEC[a1n][0],
    TH_ADJ1_SPEC[a1n][1],
    TH_ADJ1_SPEC[a1n][2]) for a1n in a1NAMES}
thetasAdj3 = {
  a1n: np.arange(
    TH_ADJ3_SPEC[a1n][0],
    TH_ADJ3_SPEC[a1n][1],
    TH_ADJ3_SPEC[a1n][2]) for a1n in a1NAMES}
thetas = P.grid_search_thetas_2_2(
  thetasA=thetasAdj1,
  thetasB=thetasAdj3,
  thetasA_name='thAdj1',
  thetasB_name='thAdj3',
  names=a1NAMES)

thetaStar_expCbarcum_AdjAround, thetaStar_expCtilcum_AdjAround, \
Cbarcum_AdjAround, Ctilcum_AdjAround, \
best_theta_AdjAround, worst_theta_AdjAround, \
best_Cbarcum_AdjAround, worst_Cbarcum_AdjAround, \
best_Ctilcum_AdjAround, worst_Ctilcum_AdjAround, \
record_AdjAround = \
  P.perform_grid_search_sample_paths('X__AdjAround', thetas)
f'{thetaStar_expCbarcum_AdjAround.iloc[-1]=:.2f}'
df_first_n_t = pd.DataFrame.from_records(record_AdjAround[:first_n_t], columns=labels)
df_last_n_t = pd.DataFrame.from_records(record_AdjAround[-last_n_t:], columns=labels)

numThetas=256
... printing every 100th theta (if considered) ...
=== (0 / 256), theta=Theta(thAdj1={'ELA': 26, 'SON': 9}, thAdj3={'ELA': 34, 'SON': 12}) ===
=== (100 / 256), theta=Theta(thAdj1={'ELA': 27, 'SON': 11}, thAdj3={'ELA': 35, 'SON': 12}) ===
=== (200 / 256), theta=Theta(thAdj1={'ELA': 29, 'SON': 9}, thAdj3={'ELA': 36, 'SON': 12}) ===
best_theta_n:
Theta(thAdj1={'ELA': 29, 'SON': 12}, thAdj3={'ELA': 35, 'SON': 15})
best_Cbarcum=148238.72
best_Ctilcum=1217.23
worst_theta_n:
Theta(thAdj1={'ELA': 26, 'SON': 10}, thAdj3={'ELA': 34, 'SON': 12})
worst_Cbarcum=70529.09
worst_Ctilcum=1014.79
CPU times: user 1min 25s, sys: 860 ms, total: 1min 26s
Wall time: 1min 28s

best_theta_AdjAround

Theta(thAdj1={'ELA': 29, 'SON': 12}, thAdj3={'ELA': 35, 'SON': 15})

P.plot_Fhat_map_4(
  FhatI_theta_I=Cbarcum_AdjAround,
  thetasX=thetasAdj1[a1NAMES[0]],
  thetasY=thetasAdj1[a1NAMES[1]],
  labelX=r'$\theta^{Adj}_{ELA\_1}$',
  labelY=r'$\theta^{Adj}_{SON\_1}$',
  title="Sample mean of the cumulative reward"+f"\n $L={L}, T={T}$, "+ \
    r"$\theta^* =$"+ P.round_theta(best_theta_AdjAround)+", " \
    r"$\bar{C}^{cum}(\theta^*) =$"+f"{best_Cbarcum_AdjAround:,.0f}\n",
  thetaFixed1=best_theta_AdjAround.thAdj3[a1NAMES[0]],
  thetaFixed2=best_theta_AdjAround.thAdj3[a1NAMES[1]],
)
print()
P.plot_Fhat_map_4(
  FhatI_theta_I=Ctilcum_AdjAround,
  thetasX=thetasAdj1[a1NAMES[0]],
  thetasY=thetasAdj1[a1NAMES[1]],
  labelX=r'$\theta^{Adj}_{ELA\_1}$',
  labelY=r'$\theta^{Adj}_{SON\_1}$',
  title="Standard error of the cumulative reward"+f"\n $L={L}, T={T}$, "+ \
    r"$\theta^* =$"+ P.round_theta(best_theta_AdjAround)+", " \
    r"$\tilde{C}^{cum}(\theta^*) =$"+f"{best_Ctilcum_AdjAround:,.0f}\n",
  thetaFixed1=best_theta_AdjAround.thAdj3[a1NAMES[0]],
  thetaFixed2=best_theta_AdjAround.thAdj3[a1NAMES[1]],
);

P.plot_expFhat_chart(
  df=thetaStar_expCbarcum_AdjAround,
  labelX=r'$\ell$',
  labelY=r"$exp\bar{C}^{cum}(\theta^*)$"+"\n(Profit)\n[$]",
  title="Expanding sample mean of the cumulative reward"+f"\n $L={L}, T={T}$, "+ \
    r"$\theta^* =$"+ P.round_theta(best_theta_AdjAround)+", " \
    r"$\bar{C}^{cum}(\theta^*) =$"+f"{best_Cbarcum_AdjAround:,.0f}",
  color_style='b-'
)
print()
P.plot_expFhat_chart(
  df=thetaStar_expCtilcum_AdjAround,
  labelX=r'$\ell$',
  labelY=r"$exp\tilde{C}^{cum}(\theta^*)$"+"\n(Profit)\n[$]",
  title="Expanding standard error of the cumulative reward"+f"\n $L={L}, T={T}$, "+ \
    r"$\theta^* =$"+ P.round_theta(best_theta_AdjAround)+", " \
    r"$\tilde{C}^{cum}(\theta^*) =$"+f"{best_Ctilcum_AdjAround:,.0f}",
  color_style='b--'
);

f'{len(record_AdjAround):,}', L, T

('2,560,000', 100, 100)

best_theta_AdjAround

Theta(thAdj1={'ELA': 29, 'SON': 12}, thAdj3={'ELA': 35, 'SON': 15})

P.plot_records(
  df=df_first_n_t,
  df_non=None,
  pars=defaultdict(str, {
    'thetaAdj1': {a1n: best_theta_AdjAround.thAdj1[a1n] for a1n in a1NAMES},
    'thetaAdj3': {a1n: best_theta_AdjAround.thAdj3[a1n] for a1n in a1NAMES},
    'suptitle': f'TRAINING OF X__AdjBelow POLICY'+'\n'+f'(first {first_n_t} records)'+'\n'+ \
    f'L = {L}, T = {T}, '+ \
    r'$\theta^*=$'+f'{P.round_theta(best_theta_AdjAround)}',
  }),
)

P.plot_records(
  df=df_last_n_t,
  df_non=None,
  pars=defaultdict(str, {
    'thetaAdj1': {a1n: best_theta_AdjAround.thAdj1[a1n] for a1n in a1NAMES},
    'thetaAdj3': {a1n: best_theta_AdjAround.thAdj3[a1n] for a1n in a1NAMES},
    'suptitle': f'TRAINING OF X__AdjBelow POLICY'+'\n'+f'(last {last_n_t} records)'+'\n'+ \
    f'L = {L}, T = {T}, '+ \
    r'$\theta^*=$'+f'{P.round_theta(best_theta_AdjAround)}',
  }),
)

4.6.1.2 Comparison of Policies

## last_n_l = int(.95*L)
## P.plot_expFhat_charts(
#   means={
#       'HighLow': thetaStar_expCbarcum_HighLow[-last_n_l:],
#       'SellLow': thetaStar_expCbarcum_SellLow[-last_n_l:],
#       'Track': thetaStar_expCbarcum_Track[-last_n_l:]
#   },
#   stdvs={
#       'HighLow': thetaStar_expCtilcum_HighLow[-last_n_l:],
#       'SellLow': thetaStar_expCtilcum_SellLow[-last_n_l:],
#       'Track': thetaStar_expCtilcum_Track[-last_n_l:]
#   },
#   labelX='Sample paths, ' + r'$\ell$',
#   labelY='Profit\n[$]',
#   suptitle=f"Comparison of Policies after Training\n \
#     L = {L}, T = {T}\n \
#     last {last_n_l} records\n \
#     ('exp' refers to expanding)",
#   pars=defaultdict(str, {
#     'colors': ['r', 'g', 'b']
#   }),
# )

4.6.2 Evaluation

4.6.2.1 X__AdjAround

best_theta_AdjAround

Theta(thAdj1={'ELA': 29, 'SON': 12}, thAdj3={'ELA': 35, 'SON': 15})

%%time
L = 100 #50db #N_SAMPLEPATHS
T = 100 #100db #N_TRANSITIONS
first_n_t = int(.01*L*T)+10 #pub

M = Model()
P = Policy(M)
SIM = DemandSimulator(seed=SEED_EVALU)
thetasOpt = []; thetasOpt.append(best_theta_AdjAround)
thetaStar_expCbarcum_AdjAround_evalu_opt, thetaStar_expCtilcum_AdjAround_evalu_opt, \
_, _, \
best_theta_AdjAround_evalu_opt, worst_theta_AdjAround_evalu_opt, \
_, _, \
_, _, \
record_AdjAround_evalu_opt = \
  P.perform_grid_search_sample_paths('X__AdjAround', thetasOpt)
df_AdjAround_evalu_opt = pd.DataFrame.from_records(
    record_AdjAround_evalu_opt[:first_n_t], columns=labels)

M = Model()
P = Policy(M)
SIM = DemandSimulator(seed=SEED_EVALU)
thetasNon = []; thetasNon.append(worst_theta_AdjAround)
## thetasNon = []; thetasNon.append(
#     P.build_theta(
#         {'thAdj1': {'ELA': 20, 'SON': 10}, 'thAdj3': {'ELA': 40, 'SON': 17}}
# ))
thetaStar_expCbarcum_AdjAround_evalu_non, thetaStar_expCtilcum_AdjAround_evalu_non, \
_, _, \
best_theta_AdjAround_evalu_non, worst_theta_AdjAround_evalu_non, \
_, _, \
_, _, \
record_AdjAround_evalu_non = \
  P.perform_grid_search_sample_paths('X__AdjAround', thetasNon)
df_AdjAround_evalu_non = pd.DataFrame.from_records(
    record_AdjAround_evalu_non[:first_n_t], columns=labels)

print(
  f'{thetaStar_expCbarcum_AdjAround_evalu_opt.iloc[-1]=:.2f}, \
    {thetaStar_expCbarcum_AdjAround_evalu_non.iloc[-1]=:.2f}')

numThetas=1
... printing every 100th theta (if considered) ...
=== (0 / 1), theta=Theta(thAdj1={'ELA': 29, 'SON': 12}, thAdj3={'ELA': 35, 'SON': 15}) ===
best_theta_n:
Theta(thAdj1={'ELA': 29, 'SON': 12}, thAdj3={'ELA': 35, 'SON': 15})
best_Cbarcum=146283.34
best_Ctilcum=1105.06
worst_theta_n:
Theta(thAdj1={'ELA': 29, 'SON': 12}, thAdj3={'ELA': 35, 'SON': 15})
worst_Cbarcum=146283.34
worst_Ctilcum=1105.06
numThetas=1
... printing every 100th theta (if considered) ...
=== (0 / 1), theta=Theta(thAdj1={'ELA': 26, 'SON': 10}, thAdj3={'ELA': 34, 'SON': 12}) ===
best_theta_n:
Theta(thAdj1={'ELA': 26, 'SON': 10}, thAdj3={'ELA': 34, 'SON': 12})
best_Cbarcum=70094.38
best_Ctilcum=1042.78
worst_theta_n:
Theta(thAdj1={'ELA': 26, 'SON': 10}, thAdj3={'ELA': 34, 'SON': 12})
worst_Cbarcum=70094.38
worst_Ctilcum=1042.78
thetaStar_expCbarcum_AdjAround_evalu_opt.iloc[-1]=146283.34,     thetaStar_expCbarcum_AdjAround_evalu_non.iloc[-1]=70094.38
CPU times: user 628 ms, sys: 19.8 ms, total: 647 ms
Wall time: 656 ms

P.plot_records(
  df=df_AdjAround_evalu_opt,
  df_non=df_AdjAround_evalu_non,
  pars=defaultdict(str, {
    'thetaAdj1': {a1n: thetasOpt[0].thAdj1[a1n] for a1n in a1NAMES},
    'thetaAdj3': {a1n: thetasOpt[0].thAdj3[a1n] for a1n in a1NAMES},
    'thetaAdj1Non': {a1n: thetasNon[0].thAdj1[a1n] for a1n in a1NAMES},
    'thetaAdj3Non': {a1n: thetasNon[0].thAdj3[a1n] for a1n in a1NAMES},
    'suptitle': f'EVALUATION OF X__AdjBelow POLICY'+'\n'+f'(first {first_n_t} records)'+'\n'+ \
    f'L = {L}, T = {T}, '+ \
    r'$\theta^*=$'+f'{P.round_theta(best_theta_AdjAround_evalu_opt)}',
  }),
)

## last_n_l = int(.99*L)
last_n_l = int(1.0*L)
P.plot_expFhat_charts(
  means={
      'AdjBelow optimal': thetaStar_expCbarcum_AdjAround_evalu_opt[-last_n_l:],
      'AdjBelow non-optimal': thetaStar_expCbarcum_AdjAround_evalu_non[-last_n_l:],
  },
  stdvs={
      'AdjBelow optimal': thetaStar_expCtilcum_AdjAround_evalu_opt[-last_n_l:],
      'AdjBelow non-optimal': thetaStar_expCtilcum_AdjAround_evalu_non[-last_n_l:],
  },
  labelX='Sample paths, ' + r'$\ell$',
  labelY='Profit\n[$]',
  suptitle=f"Comparison of Optimal/Non-optimal Policies after Evaluation\n \
    L = {L}, T = {T}\n \
    last {last_n_l} records\n \
    ('exp' refers to expanding)",
  pars=defaultdict(str, {
    'colors': ['m', 'c']
  }),
)

Next, we evaluate with a single, very long sample-path:

%%time
L = 1 #N_SAMPLEPATHS
T = 1_000 #N_TRANSITIONS
first_n_t = int(1*L*T) #pub

M = Model()
P = Policy(M)
SIM = DemandSimulator(seed=SEED_EVALU)
thetasOpt = []; thetasOpt.append(best_theta_AdjAround)
thetaStar_expCbarcum_AdjAround_evalu_opt, thetaStar_expCtilcum_AdjAround_evalu_opt, \
_, _, \
best_theta_AdjAround_evalu_opt, worst_theta_AdjAround_evalu_opt, \
_, _, \
_, _, \
record_AdjAround_evalu_opt = \
  P.perform_grid_search_sample_paths('X__AdjAround', thetasOpt)
df_AdjAround_evalu_opt = pd.DataFrame.from_records(
    record_AdjAround_evalu_opt[:first_n_t], columns=labels)

M = Model()
P = Policy(M)
SIM = DemandSimulator(seed=SEED_EVALU)
thetasNon = []; thetasNon.append(worst_theta_AdjAround)
## thetasNon = []; thetasNon.append(
#     P.build_theta(
#         {'thAdj1': {'ELA': 20, 'SON': 10}, 'thAdj3': {'ELA': 40, 'SON': 17}}
# ))
thetaStar_expCbarcum_AdjAround_evalu_non, thetaStar_expCtilcum_AdjAround_evalu_non, \
_, _, \
best_theta_AdjAround_evalu_non, worst_theta_AdjAround_evalu_non, \
_, _, \
_, _, \
record_AdjAround_evalu_non = \
  P.perform_grid_search_sample_paths('X__AdjAround', thetasNon)
df_AdjAround_evalu_non = pd.DataFrame.from_records(
    record_AdjAround_evalu_non[:first_n_t], columns=labels)

print(
  f'{thetaStar_expCbarcum_AdjAround_evalu_opt.iloc[-1]=:.2f}, \
    {thetaStar_expCbarcum_AdjAround_evalu_non.iloc[-1]=:.2f}')

numThetas=1
... printing every 100th theta (if considered) ...
=== (0 / 1), theta=Theta(thAdj1={'ELA': 29, 'SON': 12}, thAdj3={'ELA': 35, 'SON': 15}) ===
best_theta_n:
Theta(thAdj1={'ELA': 29, 'SON': 12}, thAdj3={'ELA': 35, 'SON': 15})
best_Cbarcum=1467678.40
best_Ctilcum=nan
worst_theta_n:
Theta(thAdj1={'ELA': 29, 'SON': 12}, thAdj3={'ELA': 35, 'SON': 15})
worst_Cbarcum=1467678.40
worst_Ctilcum=nan
numThetas=1
... printing every 100th theta (if considered) ...
=== (0 / 1), theta=Theta(thAdj1={'ELA': 26, 'SON': 10}, thAdj3={'ELA': 34, 'SON': 12}) ===
best_theta_n:
Theta(thAdj1={'ELA': 26, 'SON': 10}, thAdj3={'ELA': 34, 'SON': 12})
best_Cbarcum=768072.05
best_Ctilcum=nan
worst_theta_n:
Theta(thAdj1={'ELA': 26, 'SON': 10}, thAdj3={'ELA': 34, 'SON': 12})
worst_Cbarcum=768072.05
worst_Ctilcum=nan
thetaStar_expCbarcum_AdjAround_evalu_opt.iloc[-1]=1467678.40,     thetaStar_expCbarcum_AdjAround_evalu_non.iloc[-1]=768072.05
CPU times: user 89.9 ms, sys: 37 µs, total: 90 ms
Wall time: 95.6 ms

RuntimeWarning: invalid value encountered in double_scalars
  Ctilcum_tmp = np.sum(np.square(np.array(CcumIomega__lI) - Cbarcum_tmp))/(L - 1)
<ipython-input-10-39b98f98afdf>:125: RuntimeWarning: invalid value encountered in double_scalars
  Ctilcum_tmp = np.sum(np.square(np.array(CcumIomega__lI) - Cbarcum_tmp))/(L - 1)

P.plot_records(
  df=df_AdjAround_evalu_opt,
  df_non=df_AdjAround_evalu_non,
  pars=defaultdict(str, {
    'thetaAdj1': {a1n: thetasOpt[0].thAdj1[a1n] for a1n in a1NAMES},
    'thetaAdj3': {a1n: thetasOpt[0].thAdj3[a1n] for a1n in a1NAMES},
    'thetaAdj1Non': {a1n: thetasNon[0].thAdj1[a1n] for a1n in a1NAMES},
    'thetaAdj3Non': {a1n: thetasNon[0].thAdj3[a1n] for a1n in a1NAMES},
    'suptitle': f'EVALUATION OF X__AdjBelow POLICY'+'\n'+f'(first {first_n_t} records)'+'\n'+ \
    f'L = {L}, T = {T}, '+ \
    r'$\theta^*=$'+f'{P.round_theta(best_theta_AdjAround_evalu_opt)}',
  }),
)

df_AdjAround_evalu_opt[df_AdjAround_evalu_opt['t']==T-1][['Ccum']]

	Ccum
999	1,467,678.4000

df_AdjAround_evalu_non[df_AdjAround_evalu_non['t']==T-1][['Ccum']]

	Ccum
999	768,072.0500

From the Ccum plot we see that the cumulative reward for the optimal policy keeps on rising. The non-optimal policy keeps losing money.

4.6.3 Comparison of Optimized Policies

## P.plot_evalu_comparison(
#   df1=df_BuyBelow_evalu_opt,
#   df2=df_Bellman_evalu_opt,
#   df3=None,
#   pars= defaultdict(str, {
#     'suptitle': f'EVALUATION OF ALL POLICIES (first {first_n_t} records)\n \
#     L={L}, T={T}',
#   }),
# )