Car Rental Operations using the Powell Unified Framework (Part 2)

Using Sequential Decision Analytics to find ongoing optimal decisions

Automotive Industry
Car Rental Industry
Powell Unified Framework
Reinforcement Learning
Python
Author

Kobus Esterhuysen

Published

October 13, 2023

Modified

October 14, 2023

0 INTRODUCTION

In this project, the client needed to be convinced of the benefits of formal, optimized sequential decision making. The client’s need relates to human resource scheduling. Although the current project comes from the car rental industry, it lays a foundation for addressing the client’s specific need.

In the previous part, we modeled the operations of the “1-3 Rentals” company, which rented out Elantras and Sonatas. In this part, we deal with another company called “Vintage Rentals”. Its business connects owners of operational vintage cars who want to rent them out with consumers who want the thrill of driving a vintage car for a few hours.

Vintage Rentals’ current pool of available cars consists of:

  • 7 Volkswagen Beetles (1960 - 1965)
  • 3 Ford Mustangs (1965 - 1990)
  • 2 BMWs (1984 - 1990)
  • 4 Corvettes (1980 - 1995)

This pool is expected to grow as more vintage car owners sign contracts to provide this service.

The overall structure of this project and report follows the traditional CRISP-DM format. However, instead of CRISP-DM’s “4 Modeling” section, we inserted the “6 step modeling process” of Dr. Warren Powell in section 4 of this document. Dr. Powell’s universal framework shows great promise for unifying the formalisms of at least a dozen different fields. Using his framework gives easier access to thinking patterns from these other fields that may be beneficial and informative for the sequential decision problem at hand. Traditionally, this kind of problem would be approached from the reinforcement learning perspective. However, Dr. Powell’s wider and more comprehensive perspective is likely to provide additional value.

Here is information on Dr. Powell’s perspective on Sequential Decision Analytics.

To map the code in this notebook closely to the mathematics of the Powell Universal Framework (PUF), we use the following convention for naming Python identifiers:

  • How to read/say
    • var name & flavor first
    • at t/n
    • for entity OR of/with attribute
    • \(\hat{R}^{fail}_{t+1,a}\) has code Rhat__fail_tt1_a which is read: “Rhatfail at t+1 of/with (attribute) a”
  • Superscripts
    • variable names have a double underscore to indicate a superscript
    • \(X^{\pi}\): has code X__pi, is read X pi
    • when there is a ‘natural’ distinction between the variable symbol and the superscript (e.g. a change in case), the double underscore is sometimes omitted: Xpi instead of X__pi, or MSpend_t instead of M__Spend_t
  • Subscripts
    • variable names have a single underscore to indicate a subscript
    • \(S_t\): has code S_t, is read ‘S at t’
    • \(M^{Spend}_t\) has code M__Spend_t which is read: “MSpend at t”
    • \(\hat{R}^{fail}_{t+1,a}\) has code Rhat__fail_tt1_a which is read: “Rhatfail at t+1 of/with (attribute) a” [RLSO-p436]
  • Arguments
    • collection variable names may have argument information added
    • \(X^{\pi}(S_t)\): has code X__piIS_tI, is read ‘X pi in S at t’
    • the surrounding I’s are used to imitate the parentheses around the argument
  • Next time/iteration
    • variable names that indicate one step in the future are quite common
    • \(R_{t+1}\): has code R_tt1, is read ‘R at t+1’
    • \(R^{n+1}\): has code R__nt1, is read ‘R at n+1’
  • Rewards
    • State-independent terminal reward and cumulative reward
      • \(F\): has code F for terminal reward
      • \(\sum_{n}F\): has code cumF for cumulative reward
    • State-dependent terminal reward and cumulative reward
      • \(C\): has code C for terminal reward
      • \(\sum_{t}C\): has code cumC for cumulative reward
  • Vectors where components use different names
    • \(S_t(R_t, p_t)\): has code S_t.R_t and S_t.p_t, is read ‘S at t in R at t, and, S at t in p at t’
    • the code implementation is by means of a named tuple
      • self.State = namedtuple('State', SVarNames) for the ‘class’ of the vector
      • self.S_t for the ‘instance’ of the vector
  • Vectors where components reuse names
    • \(x_t(x_{t,GB}, x_{t,BL})\): has code x_t.x_t_GB and x_t.x_t_BL, is read ‘x at t in x at t for GB, and, x at t in x at t for BL’
    • the code implementation is by means of a named tuple
      • self.Decision = namedtuple('Decision', xVarNames) for the ‘class’ of the vector
      • self.x_t for the ‘instance’ of the vector
  • Use of mixed-case variable names
    • to reduce confusion, mixed-case variable names are sometimes preferred (even though this is not a best practice in the Python community), reserving underscores and double underscores for math-related variables

1 BUSINESS UNDERSTANDING

Each rental request specifies the number of hours the rental car is needed. Rental durations range from 1 hour to 24 hours; no other durations are allowed.

As rental requests come in, they are placed in one of two queues:

  • a short-duration queue
  • a long-duration queue

The threshold that separates short-duration from long-duration rentals will be learned by the AI agent. In the code, this threshold is the parameter thShortDuration; mathematically, it is \(\theta^{ShortDuration}\). A request whose duration is less than the threshold joins the short-duration queue; every other request joins the long-duration queue.
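As a minimal sketch of this routing rule, the hypothetical helper split_requests below uses the same comparison as the model's transition code (duration < thShortDuration goes to the short-duration queue). The sample requests are made up for illustration; note that a request whose duration equals the threshold lands in the long-duration queue.

```python
# Hypothetical helper: route rental requests into the two queues.
# Same comparison as the model code: duration < thShortDuration -> short queue.
def split_requests(requests, thShortDuration):
  """requests: list of (resourceType, durationHours) tuples."""
  shortDurationQueue, longDurationQueue = [], []
  for resourceType, duration in requests:
    if duration < thShortDuration:
      shortDurationQueue.append((resourceType, duration))
    else:
      longDurationQueue.append((resourceType, duration))
  return shortDurationQueue, longDurationQueue

# Made-up requests: (resource type, duration in hours)
requests = [('VWB', 2), ('COR', 12), ('MUS', 5), ('BMW', 5)]
short_q, long_q = split_requests(requests, thShortDuration=5)
print(short_q)  # [('VWB', 2)]
print(long_q)   # [('COR', 12), ('MUS', 5), ('BMW', 5)]
```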

An underlying assumption is that short-duration rentals will be more profitable. A consumer will often be prepared to pay an unusually high premium for the thrill of driving an old car, maybe during a parade or a special occasion like a wedding. The company decided to capitalize on this by having a pricing structure based on a linear sliding scale:

  • Volkswagen Beetles
    • $300/hour for 1 hour
    • $50/hour for 24 hours
  • Ford Mustangs
    • $350/hour for 1 hour
    • $55/hour for 24 hours
  • BMWs
    • $370/hour for 1 hour
    • $70/hour for 24 hours
  • Corvettes
    • $380/hour for 1 hour
    • $80/hour for 24 hours

Between these endpoints, the hourly rate for each car type decreases linearly with the duration of the rental.
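The slope and intercept of each line follow from its two endpoints: \(m_b = (p_{24} - p_1)/(24 - 1)\) and \(c_b = p_1 - m_b\). The sketch below recomputes them from the price list; the names ENDPOINTS and line_through_endpoints are illustrative, not taken from the notebook. Its printed values match the RENT_PARS dictionary used in the code to three decimals.

```python
# Endpoint hourly rates ($/hour) from the price list: (1-hour rate, 24-hour rate)
ENDPOINTS = {
  'VWB': (300.0, 50.0),
  'MUS': (350.0, 55.0),
  'BMW': (370.0, 70.0),
  'COR': (380.0, 80.0),
}
DURATION_LO, DURATION_HI = 1, 24

def line_through_endpoints(p_lo, p_hi, d_lo=DURATION_LO, d_hi=DURATION_HI):
  """Return (m, c) so that rate(d) = m*d + c passes through (d_lo, p_lo) and (d_hi, p_hi)."""
  m = (p_hi - p_lo)/(d_hi - d_lo)
  c = p_lo - m*d_lo
  return m, c

rent_pars = {
  b: dict(zip(('m', 'c'), line_through_endpoints(*ends)))
  for b, ends in ENDPOINTS.items()
}
for b, pars in rent_pars.items():
  print(f"{b}: m={pars['m']:.3f}, c={pars['c']:.3f}")
```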

Another important dynamic decision is how the two queues should be served. This decision is handled by having the AI agent learn a second parameter, represented in the code as thServiceAbove and mathematically as \(\theta^{ServiceAbove}\). The short-duration queue is serviced each time its length exceeds \(\theta^{ServiceAbove}\); otherwise (length at or below this value), the long-duration queue is serviced.
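A minimal sketch of this service-above rule as a standalone function; service_above_decision is a hypothetical name (the notebook implements the rule in Policy.X__ServiceAbove). It makes the tie case explicit: a short-duration queue length equal to the threshold sends the decision to the long-duration queue.

```python
def service_above_decision(shortQueueLength, thServiceAbove):
  """Hypothetical helper: x_t = 1 services the short-duration queue, x_t = 0 the long one."""
  return 1 if shortQueueLength > thServiceAbove else 0

for n in (2, 3, 4):
  print(n, service_above_decision(n, thServiceAbove=3))
# 2 0
# 3 0
# 4 1
```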

The overall objective will be to maximize profit, even if it leads to growing queue lengths over time.

2 DATA UNDERSTANDING

Based on recent market research, the demand may be modeled by a Poisson distribution for each resource type: \[ \begin{aligned} \mu^{ResourceType} &= \mathrm{SIM\_MU\_D[RESOURCE\_TYPE]} \end{aligned} \]

So we have: \[ D^{ResourceType}_{t+1} \sim Pois(\mu^{ResourceType}) \]

The decision window is 1 hour and these simulations are for the hourly demands (during business hours).
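A sketch of one hour of demand under this Poisson model, assuming the SIM_MU_D means defined in the PARAMETERS cell below. The helper draw_hourly_demand is hypothetical, and the sketch uses NumPy's Generator API (np.random.default_rng) rather than the RandomState object that the notebook's DemandSimulator uses.

```python
import numpy as np

# Hourly demand means per resource type (same values as SIM_MU_D in the code)
SIM_MU_D = {'VWB': 4, 'MUS': 2, 'BMW': 2, 'COR': 3}

def draw_hourly_demand(rng, muD=SIM_MU_D):
  """Hypothetical helper: draw D_{t+1}^b ~ Pois(mu^b) for each resource type b."""
  return {b: int(rng.poisson(mu)) for b, mu in muD.items()}

rng = np.random.default_rng(1234)  # Generator API, not the notebook's RandomState
print(draw_hourly_demand(rng))

# Over many hours, the sample mean per type should be close to its mu
samples = [draw_hourly_demand(rng) for _ in range(10_000)]
print({b: round(float(np.mean([s[b] for s in samples])), 2) for b in SIM_MU_D})
```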

## import pdb
from collections import namedtuple, defaultdict
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from copy import copy
import time
from scipy.ndimage import shift #the scipy.ndimage.interpolation namespace is deprecated
import pickle
from bisect import bisect
import math
from pprint import pprint
import matplotlib as mpl
from certifi.core import where
pd.options.display.float_format = '{:,.4f}'.format
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', None)
! python --version
Python 3.10.12

The learnable parameters are:

\[(\theta^{ShortDuration}, \theta^{ServiceAbove})\]
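A grid search evaluates every combination of these two parameters. The sketch below builds such a grid as Theta namedtuples, assuming that each *_SPEC tuple in the PARAMETERS cell is a (start, stop, step) triple with an exclusive stop, as for range(); the notebook's own grid-building code is not shown in this section.

```python
from collections import namedtuple
from itertools import product

# Assumption: each *_SPEC tuple is (start, stop, step) with an exclusive stop,
# i.e. the arguments of range(); the values are copied from the PARAMETERS cell.
TH_ShortDuration_SPEC = (3, 11, 1)
TH_ServiceAbove_SPEC = (1, 9, 1)

Theta = namedtuple('Theta', ['thShortDuration', 'thServiceAbove'])

# Full-factorial grid: every (thShortDuration, thServiceAbove) combination
thetas = [
  Theta(sd, sa)
  for sd, sa in product(range(*TH_ShortDuration_SPEC), range(*TH_ServiceAbove_SPEC))
]
print(len(thetas))  # 64
print(thetas[0])    # Theta(thShortDuration=3, thServiceAbove=1)
print(thetas[-1])   # Theta(thShortDuration=10, thServiceAbove=8)
```

Because namedtuples are hashable, each Theta can serve directly as a dictionary key, which is how the grid search code stores its per-theta results (theta_key = theta).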

## PARAMETERS
SNAMES = [ #state variable names
    'RAvail_t', #available resource
    'R_t',      #resource
    'D_t',      #demand
]
xNAMES = ['x_t'] #decision variable names

TYPES = ['VWB']*7 + ['MUS']*3 + ['BMW']*2 + ['COR']*4
RESOURCE_IDS = [str(i+1) for i in range(len(TYPES))]
RESOURCE_TYPES = ['VWB', 'MUS', 'BMW', 'COR']

## *resource* attribute vectors
aNAMES = [tup[0]+'_'+tup[1] for tup in zip(RESOURCE_IDS, TYPES)]
print(f'{len(aNAMES)=}')
print(aNAMES)

## *demand* attribute vectors
bNAMES = RESOURCE_TYPES
print(f'\n{len(bNAMES)=}')
print(bNAMES)

## *decision* 'attribute' vectors

piNAMES = ['X__ServiceAbove'] #policy names
thNAMES = [ #theta names
  'thShortDuration', #put in shortDuration queue if duration is less than thShortDuration
  'thServiceAbove', #service shortDuration queue if its length exceeds thServiceAbove
]
print(f'\n{len(thNAMES)=}')
print(f'{thNAMES=}')

SEED_TRAIN = 77777777
SEED_EVALU = 88888888
N_SAMPLEPATHS = 100; L = N_SAMPLEPATHS
N_TRANSITIONS = 100; T = N_TRANSITIONS

TH_ShortDuration_SPEC = (3, 11, 1)
TH_ServiceAbove_SPEC = (1, 9, 1)

SIM_T = 60
SIM_MU_D = {bNAMES[0]: 4, bNAMES[1]: 2, bNAMES[2]: 2, bNAMES[3]: 3}
print(f'\n{SIM_MU_D=}')
assert len(SIM_MU_D.items())==len(bNAMES)

SIM_EVENT_TIME_D = {bNAMES[0]: None, bNAMES[1]: None, bNAMES[2]: None, bNAMES[3]: None}
print(f'\n{SIM_EVENT_TIME_D=}')
assert len(SIM_EVENT_TIME_D.items())==len(bNAMES)

SIM_MU_DELTA_D = {bNAMES[0]: None, bNAMES[1]: None, bNAMES[2]: None, bNAMES[3]: None}
print(f'\n{SIM_MU_DELTA_D=}')
assert len(SIM_MU_DELTA_D.items())==len(bNAMES)

# math parameters use 'math/small case' (as opposed to code parameters):

## ['VWB', 'MUS', 'BMW', 'COR']
RENT_PARS = { ##dollars/hour
  bNAMES[0]: {'m': -10.870, 'c': 310.870},
  bNAMES[1]: {'m': -12.826, 'c': 362.826},
  bNAMES[2]: {'m': -13.043, 'c': 383.043},
  bNAMES[3]: {'m': -13.043, 'c': 393.043},
}
print(f'\n{RENT_PARS=}')
assert len(RENT_PARS.items())==len(bNAMES)

## sliding scale rent
## for example: $300/hour for 1 hour, $50/hour for 24 hours
## https://www.omnicalculator.com/math/line-equation-from-two-points
def p__rent(duration, m, c):
  if (1 <= duration <= 24): #at least 1 hour but at most 24 hours
    return(m*duration + c) #$/interval, i.e. $/hour
  else:
    print(f'ERROR in p__rent(): duration should be (1 <= duration <= 24)')
    return None

DURATION_LO = 1; DURATION_HI = 24
len(aNAMES)=16
['1_VWB', '2_VWB', '3_VWB', '4_VWB', '5_VWB', '6_VWB', '7_VWB', '8_MUS', '9_MUS', '10_MUS', '11_BMW', '12_BMW', '13_COR', '14_COR', '15_COR', '16_COR']

len(bNAMES)=4
['VWB', 'MUS', 'BMW', 'COR']

len(thNAMES)=2
thNAMES=['thShortDuration', 'thServiceAbove']

SIM_MU_D={'VWB': 4, 'MUS': 2, 'BMW': 2, 'COR': 3}

SIM_EVENT_TIME_D={'VWB': None, 'MUS': None, 'BMW': None, 'COR': None}

SIM_MU_DELTA_D={'VWB': None, 'MUS': None, 'BMW': None, 'COR': None}

RENT_PARS={'VWB': {'m': -10.87, 'c': 310.87}, 'MUS': {'m': -12.826, 'c': 362.826}, 'BMW': {'m': -13.043, 'c': 383.043}, 'COR': {'m': -13.043, 'c': 393.043}}
class DemandSimulator():
  def __init__(self,
    T__sim=SIM_T,
    muD=SIM_MU_D,
    eventTimeD=SIM_EVENT_TIME_D,
    muDeltaD=SIM_MU_DELTA_D,
    seed=None):
    self.time = 0
    self.T__sim = T__sim #use the constructor arguments, not the globals
    self.muD = muD
    self.eventTimeD = eventTimeD
    self.muDeltaD = muDeltaD
    self.prng = np.random.RandomState(seed)

  def simulate(self):
    if self.time > self.T__sim - 1: #wrap around after T__sim steps
      self.time = 0
    D_tt1 = {}
    for bn in bNAMES:
      if self.eventTimeD[bn] is not None and self.time > self.eventTimeD[bn]: #event for entity
        D_tt1[bn] = self.muDeltaD[bn] + self.prng.poisson(self.muD[bn]) #after event: shifted demand
      else:
        D_tt1[bn] = self.prng.poisson(self.muD[bn])
    self.time += 1
    return {bn: max(0, D_tt1[bn]) for bn in bNAMES} #clip at 0: demand is never negative
dem_sim = DemandSimulator(seed=1234)
DemandData = []
for i in range(SIM_T):
  d_e = list(dem_sim.simulate().values())
  DemandData.append(d_e)
labels = [f'{bn}_demand' for bn in bNAMES]
df = pd.DataFrame.from_records(data=DemandData, columns=labels); df[:10]
VWB_demand MUS_demand BMW_demand COR_demand
0 5 4 3 2
1 4 3 3 3
2 5 1 3 3
3 2 6 2 1
4 3 3 2 1
5 6 4 4 2
6 7 1 2 4
7 7 1 1 1
8 4 2 1 3
9 4 3 6 10
import random
def plot_output(df1, df2): #df2 is currently unused
  n_charts = len(bNAMES)
  ylabelsize = 16
  mpl.rcParams['lines.linewidth'] = 1.2
  default_colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
  fig, axs = plt.subplots(n_charts, sharex=True)
  fig.set_figwidth(13); fig.set_figheight(9)
  fig.suptitle('Demand Simulation', fontsize=20)

  for i,bn in enumerate(bNAMES):
    axs[i].set_title(f'Demanded {bn}')
    axs[i].set_ylim(auto=True); axs[i].spines['top'].set_visible(False); axs[i].spines['right'].set_visible(True); axs[i].spines['bottom'].set_visible(False)
    axs[i].step(df1.index, df1[f'{bn}_demand'], color=random.choice(default_colors)) #random color per panel
    ## axs[i].axhline(y=dem_sim.muD[bn], color='k', linestyle=':')
    axs[i].axhline(y=0, color='k', linestyle=':')

  ## raw string avoids invalid '\ ' escapes; decision windows are hourly (see above)
  axs[-1].set_xlabel(r'$t\ \mathrm{[hourly\ windows]}$', rotation=0, ha='center', va='center', fontweight='bold', size=ylabelsize)
plot_output(df, None)

3 DATA PREPARATION

We will use the data provided by the simulator directly. There is no need to perform additional data preparation.

4 MODELING

4.1 Narrative

Please review the narrative in section 1.

4.2 Core Elements

This section attempts to answer three important questions:

  • What metrics are we going to track?
  • What decisions do we intend to make?
  • What are the sources of uncertainty?

For this problem, the only metric we track is the profit made after each decision window. A single type of decision needs to be made at the start of each window: which queue to serve (short-duration or long-duration). The main source of uncertainty is the level of demand for each resource type; in the simulation, the duration of each requested rental is also drawn at random.

4.3 Mathematical Model | SUS Design

A Python class is used to implement the model for the SUS (System Under Steer):

class Model():
  def __init__(self, S_0_info):
    ...
    ...

4.3.1 State variables

The state variables represent what we need to know.

  • \(R^{InvsOut}_t = (R^{InvsOut}_{ta})_{a \in \cal A}\) where \(\cal{A} = \{\alpha_1, \alpha_2, \dots, \alpha_{16}\}\)
    • \(R^{InvsOut}_{ta}\) = number of hours the resource with attribute \(a\) has already been out on rental at \(t\)
    • \(\alpha_1\) = 1_VWB, \(\alpha_2\) = 2_VWB, \(\alpha_3\) = 3_VWB, …, \(\alpha_{15}\) = 15_COR, \(\alpha_{16}\) = 16_COR
  • \(R^{Avail}_t = (R^{Avail}_{ta})_{a \in \cal A}\), with \(\cal A\) as above
    • \(R^{Avail}_{ta}\) = Boolean indicator of whether the resource with attribute \(a\) is available for rental at \(t\)
  • \(R^{Duration}_t = (R^{Duration}_{ta})_{a \in \cal A}\), with \(\cal A\) as above
    • \(R^{Duration}_{ta}\) = number of hours the resource with attribute \(a\) is tied up in its current rental at \(t\)
  • \(D_t = (D_{tb})_{b \in \cal B}\) where \(\cal{B} = \{\beta_1, \beta_2, \beta_3, \beta_4\}\)
    • \(D_{tb}\) = number of demands for resources with attribute \(b\) at \(t\)
    • \(\beta_1\) = VWB, \(\beta_2\) = MUS, \(\beta_3\) = BMW, \(\beta_4\) = COR

The state is:

\[ \begin{aligned} S_t &= (R^{InvsOut}_t, R^{Avail}_t, R^{Duration}_t, D_t) \\ &= ((R^{InvsOut}_{ta})_{a \in \cal A}, (R^{Avail}_{ta})_{a \in \cal A}, (R^{Duration}_{ta})_{a \in \cal A}, (D_{tb})_{b \in \cal B}) \end{aligned} \]

4.3.2 Decision variables

The decision variables represent what we control.

  • \(x_t\)
    • \(x_t = 1\) when the short-duration queue should be processed
    • \(x_t = 0\) when the long-duration queue should be processed
  • Decisions are made with a policy (TBD below):
    • \(X^{\pi}(S_t)\)

The decision variables are represented by the following variables in the Model class:

self.Decision = namedtuple('Decision', xNAMES) # 'class'

where

xNAMES = ['x_t'] #decision variable names

4.3.3 Exogenous information variables

The exogenous information variables represent what we did not know when we made a decision. These are the variables that we cannot control directly. The information in these variables becomes available only after we make the decision \(x_t\).

If we assume that the demand in each time period is simply revealed, with no model to predict it from past demands, we have (approach 1):

\[ \begin{aligned} D_{t+1} &= W_{t+1} \\ &= \hat{D}_{t+1} \end{aligned} \]

Alternatively, if we assume that we observe the change in demand, \(\hat{D}_{t+1}=D_{t+1}-D_{t}\), we have (approach 2):

\[ \begin{aligned} D_{t+1} &= D_t + W_{t+1} \\ &= D_t + \hat{D}_{t+1} \end{aligned} \]

We use approach 1, which means that the exogenous information, \(W_{t+1}\), consists of the directly observed demands for the resources.

The exogenous information is obtained by a call to

DemandSimulator.simulate(...)

4.3.4 Transition function

The transition function describes how the state variables evolve over time. We have the equations:

\[ R^{Avail}_{t+1,a} = \begin{cases} 1 & \text{if resource $a$ is unallocated or its rental has been completed} \\ 0 & \text{if resource $a$ is allocated and its rental has not been completed} \end{cases} \]

\[ R^{InvsOut}_{t+1,a} = \begin{cases} R^{InvsOut}_{ta} + 1 & \text{if resource $a$ is allocated and } R^{InvsOut}_{ta} < R^{Duration}_{ta} \\ 0 & \text{otherwise} \end{cases} \]

Here \(R^{InvsOut}_{ta}\) and \(R^{Duration}_{ta}\) are taken after any allocation made by the decision \(x_t\).

Collectively, they represent the general transition function:

\[ S_{t+1} = S^M(S_t,X^{\pi}(S_t)) \]
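The sketch below applies these equations to a single resource, following the order of operations in the model's S__M_fn: increment the hours-out counter of an engaged resource, then return the resource once the counter exceeds its duration. transition_resource is a hypothetical helper, and the (RInvsOut, RAvail, RDuration) values it receives are assumed to be those after the decision at \(t\) has been applied.

```python
def transition_resource(RInvsOut, RAvail, RDuration):
  """Hypothetical helper: return (RInvsOut, RAvail, RDuration) at t+1 for one resource."""
  if not RAvail:             # out on rental: one more hour out
    RInvsOut += 1
  else:                      # available: counter stays at 0
    RInvsOut = 0
  if RInvsOut > RDuration:   # rental completed: resource returns and counters reset
    return 0, True, 0
  return RInvsOut, RAvail, RDuration

# A resource allocated for a 3-hour rental at t=0
state = (0, False, 3)
history = []
for t in range(5):
  state = transition_resource(*state)
  history.append(state)
print(history)
# [(1, False, 3), (2, False, 3), (3, False, 3), (0, True, 0), (0, True, 0)]
```

With this ordering, a resource allocated for \(d\) hours stays out for exactly \(d\) transitions before it becomes available again.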

4.3.5 Objective function

The objective function captures the performance metrics of the solution to the problem.

We can write the state-dependent reward (also called the contribution) due to a rental of a resource with attribute \(b\) for \(d\) hours, i.e. the hourly rate times the duration:

\[ C_b(S_t,x_t) = d \cdot p_b^{rent}(d, m_b, c_b) = d\,(m_b d + c_b) \]

where

  • \(d\) = the duration of the rental
  • \(m_b\) = the slope of the price sliding function for the demand with attribute \(b\)
  • \(c_b\) = the intercept of the price sliding function for the demand with attribute \(b\)
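A worked example of this contribution for a Volkswagen Beetle, using the RENT_PARS values from the code. The helper contribution is hypothetical, and p_rent mirrors the notebook's p__rent except that it raises an error (instead of returning None) for durations outside 1 to 24 hours.

```python
# RENT_PARS values from the PARAMETERS cell (slope m and intercept c, in $/hour)
RENT_PARS = {
  'VWB': {'m': -10.870, 'c': 310.870},
  'MUS': {'m': -12.826, 'c': 362.826},
  'BMW': {'m': -13.043, 'c': 383.043},
  'COR': {'m': -13.043, 'c': 393.043},
}

def p_rent(duration, m, c):
  """Hourly rate ($/hour) on the linear sliding scale, for 1 <= duration <= 24."""
  if not 1 <= duration <= 24:
    raise ValueError('duration must be between 1 and 24 hours')
  return m*duration + c

def contribution(b, duration):
  """Hypothetical helper: revenue ($) of one rental, C_b = d * p_b^rent(d, m_b, c_b)."""
  pars = RENT_PARS[b]
  return duration*p_rent(duration, pars['m'], pars['c'])

for d in (1, 4, 24):
  print(d, round(contribution('VWB', d), 2))
```

For the Beetle, the hourly rate falls from about $300 to about $50 across the allowed range, while the revenue of a single rental still grows from about $300 (1 hour) to about $1,200 (24 hours).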

We have the objective function:

\[ \max_{\pi}\mathbb{E}\{\sum_{t=0}^{T}C(S_t,x_t,W_{t+1}) \} \]

The learned parameters are:

\[(\theta^{ShortDuration}, \theta^{ServiceAbove})\]

  • \(\theta^{ShortDuration}\)
    • the rental duration (in hours) at or above which a rental is classified as a long-duration rental; shorter rentals go into the short-duration queue
  • \(\theta^{ServiceAbove}\)
    • the short-duration queue length above which that queue is processed; at or below this value, the long-duration queue is processed instead
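In the grid search code, the expectation in the objective is approximated by averaging the cumulative contribution over the \(L\) simulated sample paths, and its precision is reported as the standard error of that average (the Cbarcum and Ctilcum quantities in perform_grid_search_sample_paths). The best parameter vector is the one with the largest average:

\[ \begin{aligned} \bar{C}^{cum}(\theta) &= \frac{1}{L}\sum_{l=1}^{L} C^{cum}(\theta, \omega^l) \\ \tilde{C}^{cum}(\theta) &= \sqrt{\frac{1}{L}\cdot\frac{1}{L-1}\sum_{l=1}^{L}\left(C^{cum}(\theta, \omega^l) - \bar{C}^{cum}(\theta)\right)^2} \\ \theta^* &= \arg\max_{\theta} \bar{C}^{cum}(\theta) \end{aligned} \]

where \(C^{cum}(\theta, \omega^l)\) is the cumulative contribution accumulated over the \(T\) transitions of sample path \(\omega^l\).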

4.3.6 Implementation of the System Under Steer (SUS) Model

class Model():
  def __init__(self, W_fn=None, S__M_fn=None, C_fn=None):
    self.S_t = {
      'R_t': pd.DataFrame({
        'ResourceId': RESOURCE_IDS,
        'Type': TYPES,
        'RInvsOut_t': [0]*len(TYPES),
        'RAvail_t': [True]*len(TYPES),
        'RDuration_t': [0]*len(TYPES),
      }),
      'D_t': pd.DataFrame({
        'Type': RESOURCE_TYPES,
        'D_t': [0]*len(RESOURCE_TYPES),
        'Dcum_t': [0]*len(RESOURCE_TYPES),
      }),
    }

    self.Decision = namedtuple('Decision', xNAMES) #. 'class'
    self.Ccum = 0.0 #. cumulative reward
    self.shortDurationQueue = []
    self.longDurationQueue = []

  def reset(self):
    ## re-create the initial state, empty the queues and zero the cumulative reward
    ## (the state is a dict of DataFrames, so there is no State namedtuple to rebuild)
    self.__init__()

  def build_decision(self, info):
    return self.Decision(*[info[xn] for xn in xNAMES])

  # exogenous information, dependent on a random process:
  # the directly observed demands (approach 1)
  ## relies on a global SIM, assumed to be a DemandSimulator instance
  def W_fn(self, t):
    return SIM.simulate()

  def performServiceAboveDecision(self, x_t, theta):
    ## x_t=1: service shortDurationQueue; x_t=0: service longDurationQueue
    if x_t.x_t == 1:
      queue = self.shortDurationQueue
    elif x_t.x_t == 0:
      queue = self.longDurationQueue
    else:
      print(f'ERROR in performServiceAboveDecision(). Invalid {x_t=}')
      return
    ## iterate without popping from the queue inside the loop (popping while
    ## iterating skips demands, and pop(0) may remove an unserved demand instead)
    unserved = []
    for demand in queue:
      resourceType, duration = demand
      avails = self.S_t['R_t'].loc[
        (self.S_t['R_t']['Type']==resourceType)&(self.S_t['R_t']['RAvail_t']==True),
        ['ResourceId', 'Type', 'RAvail_t', 'RDuration_t']
      ]
      if len(avails) > 0:
        alloc = avails.iloc[0, :] #allocate the first available resource of this type
        isAlloc = self.S_t['R_t']['ResourceId']==alloc['ResourceId']
        self.S_t['R_t'].loc[isAlloc, 'RAvail_t'] = False
        self.S_t['R_t'].loc[isAlloc, 'RDuration_t'] = int(duration)
        m, c = RENT_PARS[resourceType]['m'], RENT_PARS[resourceType]['c']
        self.Ccum += p__rent(int(duration), m, c)*int(duration) #sales: ($/hour)*hours
        ## update Dcum: 1 resource allocated
        self.S_t['D_t'].loc[self.S_t['D_t']['Type']==resourceType, 'Dcum_t'] -= 1
      else:
        ## no resource of this type available: keep the demand in the queue
        ## self.Ccum -= c__sout #stockout costs might be calced here
        unserved.append(demand)
    ## the serviced queue now holds only the demands that are still waiting
    if x_t.x_t == 1:
      self.shortDurationQueue = unserved
    else:
      self.longDurationQueue = unserved

  def S__M_fn(self, t, S_t, x_t, W_tt1, theta):
    ## perform decision taken
    self.performServiceAboveDecision(x_t, theta)

    ## D_t #direct approach
    for rt in RESOURCE_TYPES:
      number = W_tt1[rt]
      S_t['D_t'].loc[S_t['D_t']['Type']==rt, 'D_t'] = number
      S_t['D_t'].loc[
        S_t['D_t']['Type']==rt, #row-indexor
        ['Dcum_t']              #col-indexor
      ] += number

    ## Simulate the arrival of demands throughout interval
    for resourceType in RESOURCE_TYPES:
      n_demands = int(S_t['D_t'].loc[S_t['D_t']['Type']==resourceType, 'D_t'].iloc[0])
      for demand in range(n_demands):
        ## randint's upper bound is exclusive, so add 1 to allow 24-hour rentals
        duration = np.random.randint(low=DURATION_LO, high=DURATION_HI + 1)
        if duration < theta.thShortDuration:
          self.shortDurationQueue.append([resourceType, duration])
        else:
          self.longDurationQueue.append([resourceType, duration])
    ## shuffle each queue by permuting indices; np.random.permutation on the list
    ## itself would build a string array and turn the durations into strings
    self.shortDurationQueue = [self.shortDurationQueue[i] for i in np.random.permutation(len(self.shortDurationQueue))]
    self.longDurationQueue = [self.longDurationQueue[i] for i in np.random.permutation(len(self.longDurationQueue))]

    ## Increment RInvsOut_t for all engaged resources; reset it to 0 for available ones
    engaged = ~S_t['R_t']['RAvail_t'].astype(bool)
    S_t['R_t']['RInvsOut_t'] = np.where(engaged, S_t['R_t']['RInvsOut_t'] + 1, 0)

    ## Return all engaged resources when completed with engagement
    S_t['R_t'].loc[
      S_t['R_t']['RInvsOut_t'] > S_t['R_t']['RDuration_t'].astype(float),
      ['RInvsOut_t', 'RAvail_t', 'RDuration_t']
    ] = 0,True,0

    record_t = [t] + \
      list(S_t['R_t']['RInvsOut_t']) + \
      list(S_t['R_t']['RDuration_t']) + \
      list(S_t['D_t']['D_t']) + \
      list(S_t['D_t']['Dcum_t']) + \
      [len(self.shortDurationQueue)] + \
      [len(self.longDurationQueue)] + \
      [self.Ccum] + \
      [x_t.x_t]
    return record_t

  def C_fn(self, S_t, x_t, W_tt1, theta):
    ## placeholder: the contribution is accumulated in self.Ccum by performServiceAboveDecision()
    return

  def step(self, t, x_t, theta):
    W_tt1 = self.W_fn(t)

    ## update state & reward
    record_t = self.S__M_fn(t, self.S_t, x_t, W_tt1, theta)
    return record_t

4.4 Uncertainty Model

We will simulate the rental demand vector \(D_{t+1}\) as described in section 2.

4.5 Policy Design

There are two main meta-classes of policy design, each with two subclasses:

  • Policy Search
    • Policy Function Approximations (PFAs)
    • Cost Function Approximations (CFAs)
  • Lookahead
    • Value Function Approximations (VFAs)
    • Direct Lookaheads (DLAs)

In this project we use only one approach, a simple service-above parameterized policy (from the PFA class), which can be summarized as:

  • if the length of shortDurationQueue > \(\theta^{ServiceAbove}\):
    • process the short-duration queue
  • else:
    • process the long-duration queue

4.5.1 Implementation of Policy Design

import random
class Policy():
  def __init__(self, model):
    self.model = model
    self.Policy = namedtuple('Policy', piNAMES) #. 'class'
    self.Theta = namedtuple('Theta', thNAMES) #. 'class'

  def build_policy(self, info):
    return self.Policy(*[info[pin] for pin in piNAMES])

  def build_theta(self, info):
    return self.Theta(*[info[thn] for thn in thNAMES])

  ## Service the shortDurationQueue when its length exceeds thServiceAbove,
  ## else, the longDurationQueue.
  ## x_t=1: service shortDurationQueue; x_t=0: service longDurationQueue
  ## Assumption is that shortDurationQueue is more profitable, and the
  ## decision should center around its status; once it's been taken care of,
  ## the longDurationQueue can be serviced.
  def X__ServiceAbove(self, t, S_t, theta):
    info = {
      'x_t': 0
    }
    if t >= T:
      print(f"ERROR: t={t} should not reach or exceed the max steps ({T})")
      return self.model.build_decision(info)
    if len(self.model.shortDurationQueue) > theta.thServiceAbove:
      ##service shortDuration queue
      info['x_t'] = 1
    else:
      ##service longDuration queue
      info['x_t'] = 0
    return self.model.build_decision(info)

  def run_grid_sample_paths(self, theta, piName, record):
    CcumIomega__lI = []
    for l in range(1, L + 1): #for each sample-path
      M = Model()
      ## P = Policy(M) #NO!, overwrite existing global P
      self.model = M
      record_l = [piName, theta, l]
      for t in range(T): #for each transition/step
        ## print(f'\t%%% {t=}')
        ## >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
        x_t = getattr(self, piName)(t, self.model.S_t, theta) #lookup (new) today's decision

        # sit in post-decision state until end of cycle

        ## S_t, Ccum, x_t = self.model.step(t, x_t, theta)
        record_t = self.model.step(t, x_t, theta)
        ## >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
        record.append(record_l + record_t)
      CcumIomega__lI.append(self.model.Ccum) #just above (SDAM-eq2.9)
    return CcumIomega__lI

  def perform_grid_search_sample_paths(self, piName, thetas):
    Cbarcum = defaultdict(float)
    Ctilcum = defaultdict(float)
    expCbarcum = defaultdict(float)
    expCtilcum = defaultdict(float)
    numThetas = len(thetas)
    record = []
    print(f'{numThetas=:,}')
    nth = 1
    i = 0; print(f'... printing every {nth}th theta (if considered) ...')
    for theta in thetas:
      if True: ##in case relationships between thetas can be exploited
        ## a dict cannot be used as a key, so we define theta_key, e.g.
        ## theta_key = ((168.0, 72.0), (200.0, 90.0)):
        ## theta_key = tuple(tuple(itm.values()) for itm in theta)
        theta_key = theta ##if theta is not a dict
        if i%nth == 0: print(f'=== ({i:,} / {numThetas:,}), theta={theta} ===')

        ## >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
        CcumIomega__lI = self.run_grid_sample_paths(theta, piName, record)
        ## >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

        Cbarcum_tmp = np.array(CcumIomega__lI).mean() #(SDAM-eq2.9)
        Ctilcum_tmp = np.sum(np.square(np.array(CcumIomega__lI) - Cbarcum_tmp))/(L - 1)

        Cbarcum[theta_key] = Cbarcum_tmp
        Ctilcum[theta_key] = np.sqrt(Ctilcum_tmp/L)

        expCbarcum_tmp = pd.Series(CcumIomega__lI).expanding().mean()
        expCbarcum[theta_key] = expCbarcum_tmp

        expCtilcum_tmp = pd.Series(CcumIomega__lI).expanding().std()
        expCtilcum[theta_key] = expCtilcum_tmp
        i += 1
      ##endif
    best_theta = max(Cbarcum, key=Cbarcum.get)
    worst_theta = min(Cbarcum, key=Cbarcum.get)

    best_Cbarcum = Cbarcum[best_theta]
    best_Ctilcum = Ctilcum[best_theta]

    worst_Cbarcum = Cbarcum[worst_theta]
    worst_Ctilcum = Ctilcum[worst_theta]

    thetaStar_expCbarcum = expCbarcum[best_theta]
    thetaStar_expCtilcum = expCtilcum[best_theta]
    thetaStar_expCtilcum[0] = 0 ##set NaN to 0

    return \
      thetaStar_expCbarcum, thetaStar_expCtilcum, \
      Cbarcum, Ctilcum, \
      best_theta, worst_theta, \
      best_Cbarcum, worst_Cbarcum, \
      best_Ctilcum, worst_Ctilcum, \
      record

  ## EXAMPLE:
  ## thetasA: Buy
  ## thetasA_name: 'thBuy'
  ## names: ELA
  ## 1_1: 1 theta sub-vectors, each having 1 theta
  ## thetas = grid_search_thetas_1_1(thetasBuy, 'thBuy', CAR_TYPES)
  def grid_search_thetas_1_1(self, thetasA, thetasA_name, names):
    thetas = [
    self.build_theta({thetasA_name: {names[0]: thA0}})
    for thA0 in thetasA[names[0]]
    ]
    return thetas

  ## EXAMPLE:
  ## thetasA: Buy
  ## thetasA_name: 'thBuy'
  ## names: ELA, SON
  ## 1_2: 1 theta sub-vectors, each having 2 thetas
  ## thetas = grid_search_thetas_1_2(thetasBuy, 'thBuy', CAR_TYPES)
  def grid_search_thetas_1_2(self, thetasA, thetasA_name, names):
    thetas = [
    self.build_theta({thetasA_name: {names[0]: thA0, names[1]: thA1}})
    for thA0 in thetasA[names[0]]
      for thA1 in thetasA[names[1]]
    ]
    return thetas

  ## EXAMPLE:
  ## thetasA: Adj
  ## thetasA_name: 'thAdj'
  ## names: VWB, MUS, BMW, COR
  ## 1_4: 1 theta sub-vectors, each having 4 thetas
  ## thetas = grid_search_thetas_1_4(thetasAdj, 'thAdj', bNAMES)
  def grid_search_thetas_1_4(self, thetasA, thetasA_name, names):
    thetas = [
    self.build_theta({thetasA_name: {names[0]: thA0, names[1]: thA1, names[2]: thA2, names[3]: thA3}})
    for thA0 in thetasA[names[0]]
      for thA1 in thetasA[names[1]]
        for thA2 in thetasA[names[2]]
          for thA3 in thetasA[names[3]]
    ]
    return thetas

  ## EXAMPLE:
  ## thetasA: Buy
  ## thetasB: Max
  ## thetasA_name: 'thBuy'
  ## thetasB_name: 'thMax'
  ## names: ELA
  ## 2_1: 2 theta sub-vectors, each having 1 theta
  ## thetas = grid_search_thetas_2_1(thetasBuy, thetasMax, 'thBuy', 'thMax', CAR_TYPES)
  def grid_search_thetas_2_1(self, thetasA, thetasB, thetasA_name, thetasB_name, names):
    thetas = [
    self.build_theta({thetasA_name: {names[0]: thA0}, thetasB_name: {names[0]: thB0}})
    for thA0 in thetasA[names[0]]
      for thB0 in thetasB[names[0]]
    ]
    return thetas

  ## EXAMPLE:
  ## thetasA: Buy
  ## thetasB: Max
  ## thetasA_name: 'thBuy'
  ## thetasB_name: 'thMax'
  ## names: ELA, SON
  ## 2_2: 2 theta sub-vectors, each having 2 thetas
  ## thetas = grid_search_thetas_2_2(thetasBuy, thetasMax, 'thBuy', 'thMax', CAR_TYPES)
  def grid_search_thetas_2_2(self, thetasA, thetasB, thetasA_name, thetasB_name, names):
    thetas = [
    self.build_theta({thetasA_name: {names[0]: thA0, names[1]: thA1}, thetasB_name: {names[0]: thB0, names[1]: thB1}})
    for thA0 in thetasA[names[0]]
      for thA1 in thetasA[names[1]]
        for thB0 in thetasB[names[0]]
          for thB1 in thetasB[names[1]]
    ]
    return thetas


  ############################################################################
  ### PLOTTING
  ############################################################################
  def round_theta(self, complex_theta):
    thetas_rounded = []
    for theta in complex_theta:
      evalues_rounded = []
      for _, evalue in theta.items():
        evalues_rounded.append(float(f"{evalue:f}"))
      thetas_rounded.append(tuple(evalues_rounded))
    return str(tuple(thetas_rounded))

  def plot_Fhat_map_2(self,
      FhatI_theta_I,
      thetasX, thetasY, labelX, labelY, title):
      Fhat_values = [
        FhatI_theta_I[
          (thetaX,thetaY)
          ## ((thetaX,),(thetaY,))
        ]
          for thetaY in thetasY for thetaX in thetasX
      ]
      Fhats = np.array(Fhat_values)
      increment_count = len(thetasX)
      Fhats = np.reshape(Fhats, (-1, increment_count))#.

      fig, ax = plt.subplots()
      im = ax.imshow(Fhats, cmap='hot', origin='lower', aspect='auto')
      ## create colorbar
      cbar = ax.figure.colorbar(im, ax=ax)
      ## cbar.ax.set_ylabel(cbarlabel, rotation=-90, va="bottom")

      ax.set_xticks(np.arange(0, len(thetasX), 5))#.
      ## ax.set_xticks(np.arange(len(thetasX)))

      ax.set_yticks(np.arange(0, len(thetasY), 5))#.
      ## ax.set_yticks(np.arange(len(thetasY)))

      ## NOTE: round tick labels, else very messy
      ## function round() does not work, have to do this way
      thetasX_form = [f'{th:.0f}' for th in thetasX]
      thetasY_form = [f'{th:.0f}' for th in thetasY]

      ax.set_xticklabels(thetasX[::5])
      ## ax.set_xticklabels(thetasX); ax.set_xticklabels(thetasX_form)

      ax.set_yticklabels(thetasY[::5])
      ## ax.set_yticklabels(thetasY); ax.set_yticklabels(thetasY_form)

      ## rotate the tick labels and set their alignment.
      ## plt.setp(ax.get_xticklabels(), rotation=45, ha="right",rotation_mode="anchor")

      ax.set_title(title)
      ax.set_xlabel(labelX)
      ax.set_ylabel(labelY)

      ## fig.tight_layout()
      plt.show()
      return True

  def plot_Fhat_map_4(self,
      FhatI_theta_I,
      thetasX, thetasY, labelX, labelY, title,
      thetaFixed1, thetaFixed2):
      ## Fhat_values = [FhatI_theta_I[(thetaX,thetaY)] for thetaY in thetasY for thetaX in thetasX]
      Fhat_values = [
        FhatI_theta_I[((thetaX,thetaY), (thetaFixed1,thetaFixed2))]
        for thetaY in thetasY
          for thetaX in thetasX]
      Fhats = np.array(Fhat_values)
      increment_count = len(thetasX)
      Fhats = np.reshape(Fhats, (-1, increment_count))#.

      fig, ax = plt.subplots()
      im = ax.imshow(Fhats, cmap='hot', origin='lower', aspect='auto')
      ## create colorbar
      cbar = ax.figure.colorbar(im, ax=ax)
      ## cbar.ax.set_ylabel(cbarlabel, rotation=-90, va="bottom")

      ax.set_xticks(np.arange(0, len(thetasX), 5))#.
      ## ax.set_xticks(np.arange(len(thetasX)))

      ax.set_yticks(np.arange(0, len(thetasY), 5))#.
      ## ax.set_yticks(np.arange(len(thetasY)))

      ## NOTE: round tick labels, else very messy
      ## function round() does not work, have to do this way
      ## thetasX_form = [f'{th:.1f}' for th in thetasX]
      ## thetasY_form = [f'{th:.1f}' for th in thetasY]

      ax.set_xticklabels(thetasX[::5])#.
      ## ax.set_xticklabels(thetasX)
      ## ax.set_xticklabels(thetasX_form)

      ax.set_yticklabels(thetasY[::5])#.
      ## ax.set_yticklabels(thetasY)
      ## ax.set_yticklabels(thetasY_form)

      ## rotate the tick labels and set their alignment.
      ## plt.setp(ax.get_xticklabels(), rotation=45, ha="right",rotation_mode="anchor")

      ax.set_title(title)
      ax.set_xlabel(labelX)
      ax.set_ylabel(labelY)

      ## fig.tight_layout()
      plt.show()
      return True

  ## color_style examples: 'r-', 'b:', 'g--'
  def plot_Fhat_chart(self, FhatI_theta_I, thetasX, labelX, labelY, title, color_style, thetaStar):
      mpl.rcParams['lines.linewidth'] = 1.2
      xylabelsize = 16
      ## plt.figure(figsize=(13, 9))
      plt.figure(figsize=(13, 4))
      plt.title(title, fontsize=20)
      Fhats = list(FhatI_theta_I.values()) ## materialize the dict view so it plots as a sequence
      plt.plot(thetasX, Fhats, color_style)
      plt.axvline(x=thetaStar, color='k', linestyle=':')
      plt.xlabel(labelX, rotation=0, labelpad=10, ha='right', va='center', fontweight='bold', size=xylabelsize)
      plt.ylabel(labelY, rotation=0, labelpad=1, ha='right', va='center', fontweight='normal', size=xylabelsize)
      plt.show()

  ## expanding Fhat chart
  def plot_expFhat_chart(self, df, labelX, labelY, title, color_style):
    mpl.rcParams['lines.linewidth'] = 1.2
    xylabelsize = 16
    plt.figure(figsize=(13, 4))
    plt.title(title, fontsize=20)
    plt.plot(df, color_style)
    plt.xlabel(labelX, rotation=0, labelpad=10, ha='right', va='center', fontweight='bold', size=xylabelsize)
    plt.ylabel(labelY, rotation=0, labelpad=1, ha='right', va='center', fontweight='normal', size=xylabelsize)
    plt.show()

  ## expanding Fhat charts
  def plot_expFhat_charts(self, means, stdvs, labelX, labelY, suptitle, pars=defaultdict(str)):
    n_charts = 2
    xlabelsize = 14
    ylabelsize = 14
    mpl.rcParams['lines.linewidth'] = 1.2
    default_colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
    fig, axs = plt.subplots(n_charts, sharex=True)
    fig.set_figwidth(13); fig.set_figheight(9)
    fig.suptitle(suptitle, fontsize=18)

    xi = 0
    legendlabels = []
    axs[xi].set_title(r"$exp\bar{C}^{cum}(\theta^*)$", loc='right', fontsize=16)
    for i,itm in enumerate(means.items()):
      axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
      leg = axs[xi].plot(itm[1], color=pars['colors'][i])
      legendlabels.append(itm[0])
    axs[xi].set_ylabel(labelY, rotation=0, ha='right', va='center', fontweight='normal', size=ylabelsize)

    xi = 1
    axs[xi].set_title(r"$exp\tilde{C}^{cum}(\theta^*)$", loc='right', fontsize=16)
    for i,itm in enumerate(stdvs.items()):
      axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
      ## leg = axs[xi].plot(itm[1], default_colors[i], linestyle='--')
      leg = axs[xi].plot(itm[1], pars['colors'][i], linestyle='--')
    axs[xi].set_ylabel(labelY, rotation=0, ha='right', va='center', fontweight='normal', size=ylabelsize)

    fig.legend(
      ## [leg],
      labels=legendlabels,
      title="Policies",
      loc='upper right',
      fancybox=True,
      shadow=True,
      ncol=1)
    plt.xlabel(labelX, rotation=0, labelpad=10, ha='right', va='center', fontweight='normal', size=xlabelsize)
    plt.show()

  def plot_records(self, df, df_non, pars=defaultdict(str)):
    n_a = len(aNAMES)
    n_b = len(bNAMES)
    n_x = 1
    n_charts = n_x + n_b + n_b + 2 + n_a + 1
    ylabelsize = 14
    mpl.rcParams['lines.linewidth'] = 1.2
    mycolors = ['g', 'b']
    fig, axs = plt.subplots(n_charts, sharex=True)
    ## fig.set_figwidth(13); fig.set_figheight(9)
    fig.set_figwidth(13); fig.set_figheight(20)
    fig.suptitle(pars['suptitle'], fontsize=14)

    xi = 0
    i = 0
    axs[xi+i].set_ylim(auto=True); axs[xi+i].spines['top'].set_visible(False); axs[xi+i].spines['right'].set_visible(True); axs[xi+i].spines['bottom'].set_visible(False)
    axs[xi+i].step(df[f'x_t'], 'm-', where='post')
    if not df_non is None: axs[xi+i].step(df_non[f'x_t'], 'c-.', where='post')
    axs[xi+i].axhline(y=0, color='k', linestyle=':')
    y1ab = '$x_{t}$'
    axs[xi+i].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
    for j in range(df.shape[0]//T): axs[xi+i].axvline(x=(j+1)*T, color='grey', ls=':')

    xi = n_x
    for i,rt in enumerate(RESOURCE_TYPES):
      y1ab = '$D_{t,'+f'{rt}'+'}$'
      axs[xi+i].set_ylim(auto=True); axs[xi+i].spines['top'].set_visible(False); axs[xi+i].spines['right'].set_visible(True); axs[xi+i].spines['bottom'].set_visible(False)
      axs[xi+i].step(df[f'D_t_{rt}'], mycolors[xi%len(mycolors)], where='post')
      if not df_non is None: axs[xi+i].step(df_non[f'D_t_{rt}'], 'c-.', where='post')
      axs[xi+i].axhline(y=SIM.muD[rt], color='k', linestyle=':')
      axs[xi+i].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
      for j in range(df.shape[0]//T): axs[xi+i].axvline(x=(j+1)*T, color='grey', ls=':')

    xi = n_x + n_b
    for i,b in enumerate(bNAMES):
      y1ab = '$Dcum_{t,'+f'{b}'+'}$'
      axs[xi+i].set_ylim(auto=True); axs[xi+i].spines['top'].set_visible(False); axs[xi+i].spines['right'].set_visible(True); axs[xi+i].spines['bottom'].set_visible(False)
      axs[xi+i].step(df[f'Dcum_t_{b}'], mycolors[xi%len(mycolors)], where='post')
      if not df_non is None: axs[xi+i].step(df_non[f'Dcum_t_{b}'], 'c-.', where='post')
      axs[xi+i].axhline(y=0, color='k', linestyle=':')
      axs[xi+i].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
      for j in range(df.shape[0]//T): axs[xi+i].axvline(x=(j+1)*T, color='grey', ls=':')

    xi = n_x + n_b + n_b
    y1ab = '$L^{Short}_{t}$'
    axs[xi].set_ylim(auto=True); axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
    axs[xi].step(df[f'LShort_t'], 'r-', where='post')
    if not df_non is None: axs[xi].step(df_non[f'LShort_t'], 'c-.', where='post')
    axs[xi].axhline(y=0, color='k', linestyle=':')
    axs[xi].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
    for j in range(df.shape[0]//T): axs[xi].axvline(x=(j+1)*T, color='grey', ls=':')

    xi = n_x + n_b + n_b + 1
    y1ab = '$L^{Long}_{t}$'
    axs[xi].set_ylim(auto=True); axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
    axs[xi].step(df[f'LLong_t'], 'r-', where='post')
    if not df_non is None: axs[xi].step(df_non[f'LLong_t'], 'c-.', where='post')
    axs[xi].axhline(y=0, color='k', linestyle=':')
    axs[xi].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
    for j in range(df.shape[0]//T): axs[xi].axvline(x=(j+1)*T, color='grey', ls=':')

    xi = n_x + n_b + n_b + 2
    for i,a in enumerate(aNAMES):
      axs[xi+i].set_ylim(auto=True); axs[xi+i].spines['top'].set_visible(False); axs[xi+i].spines['right'].set_visible(True); axs[xi+i].spines['bottom'].set_visible(False)
      axs[xi+i].step(df[f'RInvsOut_t_{a}'], 'm-', where='post')
      axs[xi+i].step(df[f'RDuration_t_{a}'], 'k:', where='post')
      if not df_non is None: axs[xi+i].step(df_non[f'RInvsOut_t_{a}'], 'c-.', where='post')
      if not df_non is None: axs[xi+i].step(df_non[f'RDuration_t_{a}'], 'c-.', where='post')
      axs[xi+i].axhline(y=0, color='k', linestyle=':')
      al = a.split('_'); al = al[0]+r'\_'+al[1]; y1ab = '$RInvsOut_{t,'+f'{al}'+'}$'
      axs[xi+i].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
      for j in range(df.shape[0]//T): axs[xi+i].axvline(x=(j+1)*T, color='grey', ls=':')

    xi = n_x + n_b + n_b + 2 + n_a
    axs[xi].set_ylim(auto=True); axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
    axs[xi].step(df['Ccum'], 'm-', where='post')
    if not df_non is None: axs[xi].step(df_non['Ccum'], 'c-.', where='post')
    axs[xi].axhline(y=0, color='k', linestyle=':')
    axs[xi].set_ylabel(r'$C^{cum}$'+'\n'+r'$\mathrm{(Profit)}$'+'\n'+r'$\mathrm{[\$]}$', rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize);
    for j in range(df.shape[0]//T): axs[xi].axvline(x=(j+1)*T, color='grey', ls=':')

    axs[xi].set_xlabel(r'$t\ \mathrm{[decision\ windows]}$', rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize);
    if(pars['legendLabels']): fig.legend(labels=pars['legendLabels'], loc='lower left', fontsize=16)

  def plot_evalu_comparison(self, df1, df2, df3, pars=defaultdict(str)):
    legendlabels = ['X__BuyBelow', 'X__Bellman']
    n_charts = 5
    ylabelsize = 14
    mpl.rcParams['lines.linewidth'] = 1.2
    fig, axs = plt.subplots(n_charts, sharex=True)
    fig.set_figwidth(13); fig.set_figheight(9)
    thetaStarStr = []
    for cmp in pars["thetaStar"]: thetaStarStr.append(f'{cmp:.1f}')
    thetaStarStr = '(' + ', '.join(thetaStarStr) + ')'
    fig.suptitle(pars['suptitle'], fontsize=14)

    xi = 0
    axs[xi].set_ylim(auto=True); axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
    axs[xi].step(df1[f'x_t'], 'r-', where='post')
    axs[xi].step(df2[f'x_t'], 'g-.', where='post')
    axs[xi].step(df3[f'x_t'], 'b:', where='post')
    axs[xi].axhline(y=0, color='k', linestyle=':')
    y1ab = '$x_{t}$'
    axs[xi].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
    for j in range(df1.shape[0]//T): axs[xi].axvline(x=(j+1)*T, color='grey', ls=':')

    xi = 1
    axs[xi].set_ylim(auto=True); axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
    axs[xi].step(df1[f'R_t'], 'r-', where='post')
    axs[xi].step(df2[f'R_t'], 'g-.', where='post')
    axs[xi].step(df3[f'R_t'], 'b:', where='post')
    axs[xi].axhline(y=0, color='k', linestyle=':')
    y1ab = '$R_{t}$'
    axs[xi].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
    for j in range(df1.shape[0]//T): axs[xi].axvline(x=(j+1)*T, color='grey', ls=':')

    xi = 2
    axs[xi].set_ylim(auto=True); axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
    axs[xi].step(df1[f'p_t'], 'r-', where='post')
    axs[xi].step(df2[f'p_t'], 'g-.', where='post')
    axs[xi].step(df3[f'p_t'], 'b:', where='post')
    axs[xi].axhline(y=0, color='k', linestyle=':')

    if(pars['lower_non']): axs[xi].text(-4, pars['lower_non'], r'$\theta^{lower}$' + f"={pars['lower_non']:.1f}", size=10, color='c')
    if(pars['lower_non']): axs[xi].axhline(y=pars['lower_non'], color='c', linestyle=':')

    if(pars['upper_non']): axs[xi].text(-4, pars['upper_non'], r'$\theta^{upper}$' + f"={pars['upper_non']:.1f}", size=10, color='c')
    if(pars['upper_non']): axs[xi].axhline(y=pars['upper_non'], color='c', linestyle=':')

    if(pars['lower']): axs[xi].text(-4, pars['lower'], r'$\theta^{lower}$' + f"={pars['lower']:.1f}", size=10, color='m')
    if(pars['lower']): axs[xi].axhline(y=pars['lower'], color='m', linestyle=':')

    if(pars['upper']): axs[xi].text(-4, pars['upper'], r'$\theta^{upper}$' + f"={pars['upper']:.1f}", size=10, color='m')
    if(pars['upper']): axs[xi].axhline(y=pars['upper'], color='m', linestyle=':')

    if(pars['alpha_non']): axs[xi].text(-4, pars['alpha_non'], r'$\theta^{alpha}$' + f"={pars['alpha_non']:.1f}", size=10, color='c')
    if(pars['alpha_non']): axs[xi].axhline(y=pars['alpha_non'], color='c', linestyle=':')

    if(pars['trackSignal_non']): axs[xi].text(-4, pars['trackSignal_non'], r'$\theta^{trackSignal}$' + f"={pars['trackSignal_non']:.1f}", size=10, color='c')
    if(pars['trackSignal_non']): axs[xi].axhline(y=pars['trackSignal_non'], color='c', linestyle=':')

    if(pars['alpha']): axs[xi].text(-4, pars['alpha'], r'$\theta^{alpha}$' + f"={pars['alpha']:.1f}", size=10, color='m')
    if(pars['alpha']): axs[xi].axhline(y=pars['alpha'], color='m', linestyle=':')

    if(pars['trackSignal']): axs[xi].text(-4, pars['trackSignal'], r'$\theta^{trackSignal}$' + f"={pars['trackSignal']:.1f}", size=10, color='m')
    if(pars['trackSignal']): axs[xi].axhline(y=pars['trackSignal'], color='m', linestyle=':')

    y1ab = '$p_{t}$'
    axs[xi].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
    for j in range(df1.shape[0]//T): axs[xi].axvline(x=(j+1)*T, color='grey', ls=':')

    xi = 3
    axs[xi].set_ylim(auto=True); axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
    axs[xi].step(df1['b_t_val'], 'r-', where='post')
    axs[xi].step(df2['b_t_val'], 'g-.', where='post')
    axs[xi].step(df3['b_t_val'], 'b:', where='post')
    axs[xi].axhline(y=0, color='k', linestyle=':')
    y1ab = '$b_{t,val}$'
    axs[xi].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
    for j in range(df1.shape[0]//T): axs[xi].axvline(x=(j+1)*T, color='grey', ls=':')

    xi = 4
    axs[xi].set_ylim(auto=True); axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
    axs[xi].step(df1['Ccum'], 'r-', where='post')
    axs[xi].step(df2['Ccum'], 'g-.', where='post')
    axs[xi].step(df3['Ccum'], 'b:', where='post')
    axs[xi].axhline(y=0, color='k', linestyle=':')
    axs[xi].set_ylabel(r'$\mathrm{cumC}$'+'\n'+r'$\mathrm{(Profit)}$'+'\n'+r'$\mathrm{[\$]}$', rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize);
    axs[xi].set_xlabel(r'$t\ \mathrm{[decision\ windows]}$', rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize);
    for j in range(df1.shape[0]//T): axs[xi].axvline(x=(j+1)*T, color='grey', ls=':')

    fig.legend(
      ## [leg],
      labels=legendlabels,
      title="Policies",
      loc='upper right',
      fontsize=16,
      fancybox=True,
      shadow=True,
      ncol=1)

4.6 Policy Evaluation

4.6.1 Training/Tuning

## setup labels to plot info
RInvsOut_t_labels = ['RInvsOut_t_'+an for an in aNAMES]
RDuration_t_labels = ['RDuration_t_'+an for an in aNAMES]
D_t_labels = ['D_t_'+rt for rt in RESOURCE_TYPES]
Dcum_t_labels = ['Dcum_t_'+rt for rt in RESOURCE_TYPES]
LShort_t_labels = ['LShort_t']
LLong_t_labels = ['LLong_t']
x_t_labels = ['x_t']
labels = ['piName', 'theta', 'l'] + \
  ['t'] + \
  RInvsOut_t_labels + RDuration_t_labels + \
  D_t_labels + Dcum_t_labels + \
  LShort_t_labels + \
  LLong_t_labels + \
  ['Ccum'] + \
  x_t_labels
def grid_search_thetas(thetas1, thetas2, thetas1_name, thetas2_name):
  thetas = [
    P.build_theta({thetas1_name: thA0, thetas2_name: thB0})
    # (thA0, thB0)
    for thA0 in thetas1
      for thB0 in thetas2
  ]
  return thetas
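The helper `grid_search_thetas` enumerates the Cartesian product of the two θ ranges with `thetas1` as the outer loop, which is why the training log below steps through `thServiceAbove` = 1 to 8 for each value of `thShortDuration` (3 to 10), giving 8 × 8 = 64 candidates. The same ordering can be obtained with `itertools.product`. In this sketch `Theta` is a hypothetical `NamedTuple` stand-in for whatever `P.build_theta` returns, and the ranges are read off the log rather than from the `TH_*_SPEC` constants, whose values are not shown here.

```python
from itertools import product
from typing import NamedTuple

# Hypothetical stand-in for the notebook's Theta type; field names are taken
# from the printed log, but the real object is created by P.build_theta().
class Theta(NamedTuple):
    thShortDuration: int
    thServiceAbove: int

def grid_search_thetas_product(thetas1, thetas2):
    """Every (thetas1 x thetas2) combination, thetas1 varying slowest."""
    return [Theta(a, b) for a, b in product(thetas1, thetas2)]

# Ranges as they appear in the training log (step 1).
thetasShortDuration = range(3, 11)
thetasServiceAbove = range(1, 9)
thetas = grid_search_thetas_product(thetasShortDuration, thetasServiceAbove)

print(len(thetas))  # 64
print(thetas[8])    # Theta(thShortDuration=4, thServiceAbove=1)
```

`product` iterates its last argument fastest, exactly like the nested comprehension, so index 8 is the first candidate with `thShortDuration=4`.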
%%time
L = 30 #30pub #20db #N_SAMPLEPATHS
T = 20 #20pub #10db #N_TRANSITIONS
first_n_t = 500
last_n_t = 500

M = Model()
P = Policy(M)
SIM = DemandSimulator(seed=SEED_TRAIN)

thetasShortDuration = np.arange(
  TH_ShortDuration_SPEC[0],
  TH_ShortDuration_SPEC[1],
  TH_ShortDuration_SPEC[2])
thetasServiceAbove = np.arange(
  TH_ServiceAbove_SPEC[0],
  TH_ServiceAbove_SPEC[1],
  TH_ServiceAbove_SPEC[2])
thetas = grid_search_thetas(
  thetasShortDuration,
  thetasServiceAbove,
  'thShortDuration',
  'thServiceAbove')

thetaStar_expCbarcum_ServiceAbove, thetaStar_expCtilcum_ServiceAbove, \
Cbarcum_ServiceAbove, Ctilcum_ServiceAbove, \
best_theta_ServiceAbove, worst_theta_ServiceAbove, \
best_Cbarcum_ServiceAbove, worst_Cbarcum_ServiceAbove, \
best_Ctilcum_ServiceAbove, worst_Ctilcum_ServiceAbove, \
record_ServiceAbove = \
  P.perform_grid_search_sample_paths('X__ServiceAbove', thetas)
f'{thetaStar_expCbarcum_ServiceAbove.iloc[-1]=:.2f}'
df_first_n_t = pd.DataFrame.from_records(record_ServiceAbove[:first_n_t], columns=labels)
df_last_n_t = pd.DataFrame.from_records(record_ServiceAbove[-last_n_t:], columns=labels)
numThetas=64
... printing every 1th theta (if considered) ...
=== (0 / 64), theta=Theta(thShortDuration=3, thServiceAbove=1) ===
=== (1 / 64), theta=Theta(thShortDuration=3, thServiceAbove=2) ===
=== (2 / 64), theta=Theta(thShortDuration=3, thServiceAbove=3) ===
=== (3 / 64), theta=Theta(thShortDuration=3, thServiceAbove=4) ===
=== (4 / 64), theta=Theta(thShortDuration=3, thServiceAbove=5) ===
=== (5 / 64), theta=Theta(thShortDuration=3, thServiceAbove=6) ===
=== (6 / 64), theta=Theta(thShortDuration=3, thServiceAbove=7) ===
=== (7 / 64), theta=Theta(thShortDuration=3, thServiceAbove=8) ===
=== (8 / 64), theta=Theta(thShortDuration=4, thServiceAbove=1) ===
=== (9 / 64), theta=Theta(thShortDuration=4, thServiceAbove=2) ===
=== (10 / 64), theta=Theta(thShortDuration=4, thServiceAbove=3) ===
=== (11 / 64), theta=Theta(thShortDuration=4, thServiceAbove=4) ===
=== (12 / 64), theta=Theta(thShortDuration=4, thServiceAbove=5) ===
=== (13 / 64), theta=Theta(thShortDuration=4, thServiceAbove=6) ===
=== (14 / 64), theta=Theta(thShortDuration=4, thServiceAbove=7) ===
=== (15 / 64), theta=Theta(thShortDuration=4, thServiceAbove=8) ===
=== (16 / 64), theta=Theta(thShortDuration=5, thServiceAbove=1) ===
=== (17 / 64), theta=Theta(thShortDuration=5, thServiceAbove=2) ===
=== (18 / 64), theta=Theta(thShortDuration=5, thServiceAbove=3) ===
=== (19 / 64), theta=Theta(thShortDuration=5, thServiceAbove=4) ===
=== (20 / 64), theta=Theta(thShortDuration=5, thServiceAbove=5) ===
=== (21 / 64), theta=Theta(thShortDuration=5, thServiceAbove=6) ===
=== (22 / 64), theta=Theta(thShortDuration=5, thServiceAbove=7) ===
=== (23 / 64), theta=Theta(thShortDuration=5, thServiceAbove=8) ===
=== (24 / 64), theta=Theta(thShortDuration=6, thServiceAbove=1) ===
=== (25 / 64), theta=Theta(thShortDuration=6, thServiceAbove=2) ===
=== (26 / 64), theta=Theta(thShortDuration=6, thServiceAbove=3) ===
=== (27 / 64), theta=Theta(thShortDuration=6, thServiceAbove=4) ===
=== (28 / 64), theta=Theta(thShortDuration=6, thServiceAbove=5) ===
=== (29 / 64), theta=Theta(thShortDuration=6, thServiceAbove=6) ===
=== (30 / 64), theta=Theta(thShortDuration=6, thServiceAbove=7) ===
=== (31 / 64), theta=Theta(thShortDuration=6, thServiceAbove=8) ===
=== (32 / 64), theta=Theta(thShortDuration=7, thServiceAbove=1) ===
=== (33 / 64), theta=Theta(thShortDuration=7, thServiceAbove=2) ===
=== (34 / 64), theta=Theta(thShortDuration=7, thServiceAbove=3) ===
=== (35 / 64), theta=Theta(thShortDuration=7, thServiceAbove=4) ===
=== (36 / 64), theta=Theta(thShortDuration=7, thServiceAbove=5) ===
=== (37 / 64), theta=Theta(thShortDuration=7, thServiceAbove=6) ===
=== (38 / 64), theta=Theta(thShortDuration=7, thServiceAbove=7) ===
=== (39 / 64), theta=Theta(thShortDuration=7, thServiceAbove=8) ===
=== (40 / 64), theta=Theta(thShortDuration=8, thServiceAbove=1) ===
=== (41 / 64), theta=Theta(thShortDuration=8, thServiceAbove=2) ===
=== (42 / 64), theta=Theta(thShortDuration=8, thServiceAbove=3) ===
=== (43 / 64), theta=Theta(thShortDuration=8, thServiceAbove=4) ===
=== (44 / 64), theta=Theta(thShortDuration=8, thServiceAbove=5) ===
=== (45 / 64), theta=Theta(thShortDuration=8, thServiceAbove=6) ===
=== (46 / 64), theta=Theta(thShortDuration=8, thServiceAbove=7) ===
=== (47 / 64), theta=Theta(thShortDuration=8, thServiceAbove=8) ===
=== (48 / 64), theta=Theta(thShortDuration=9, thServiceAbove=1) ===
=== (49 / 64), theta=Theta(thShortDuration=9, thServiceAbove=2) ===
=== (50 / 64), theta=Theta(thShortDuration=9, thServiceAbove=3) ===
=== (51 / 64), theta=Theta(thShortDuration=9, thServiceAbove=4) ===
=== (52 / 64), theta=Theta(thShortDuration=9, thServiceAbove=5) ===
=== (53 / 64), theta=Theta(thShortDuration=9, thServiceAbove=6) ===
=== (54 / 64), theta=Theta(thShortDuration=9, thServiceAbove=7) ===
=== (55 / 64), theta=Theta(thShortDuration=9, thServiceAbove=8) ===
=== (56 / 64), theta=Theta(thShortDuration=10, thServiceAbove=1) ===
=== (57 / 64), theta=Theta(thShortDuration=10, thServiceAbove=2) ===
=== (58 / 64), theta=Theta(thShortDuration=10, thServiceAbove=3) ===
=== (59 / 64), theta=Theta(thShortDuration=10, thServiceAbove=4) ===
=== (60 / 64), theta=Theta(thShortDuration=10, thServiceAbove=5) ===
=== (61 / 64), theta=Theta(thShortDuration=10, thServiceAbove=6) ===
=== (62 / 64), theta=Theta(thShortDuration=10, thServiceAbove=7) ===
=== (63 / 64), theta=Theta(thShortDuration=10, thServiceAbove=8) ===
CPU times: user 21min 23s, sys: 23.9 s, total: 21min 47s
Wall time: 22min 2s
best_theta_ServiceAbove
Theta(thShortDuration=10, thServiceAbove=2)
P.plot_Fhat_map_2(
  FhatI_theta_I=Cbarcum_ServiceAbove,
  thetasX=thetasShortDuration,
  thetasY=thetasServiceAbove,
  labelX=r'$\theta^{ShortDuration}$',
  labelY=r'$\theta^{ServiceAbove}$',
  title="Sample mean of the cumulative reward"+f"\n $L={L}, T={T}$, "+ \
    r"$\theta^* =$("+ str(best_theta_ServiceAbove[0])+", "+ str(best_theta_ServiceAbove[1])+"), " \
    r"$\bar{C}^{cum}(\theta^*) =$"+f"{best_Cbarcum_ServiceAbove:,.0f}\n"
)
print()
P.plot_Fhat_map_2(
  FhatI_theta_I=copy(Ctilcum_ServiceAbove),
  thetasX=thetasShortDuration,
  thetasY=thetasServiceAbove,
  labelX=r'$\theta^{ShortDuration}$',
  labelY=r'$\theta^{ServiceAbove}$',
  title="Standard error of the cumulative reward"+f"\n $L={L}, T={T}$, "+ \
    r"$\theta^* =$("+ str(best_theta_ServiceAbove[0])+", "+ str(best_theta_ServiceAbove[1])+"), " \
    r"$\tilde{C}^{cum}(\theta^*) =$"+f"{best_Ctilcum_ServiceAbove:,.0f}\n"
);
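`plot_Fhat_map_2` looks up each value in a dict keyed by `(thetaX, thetaY)`, builds a flat list with `thetaY` as the outer loop, and reshapes it to `len(thetasX)` columns before calling `imshow(..., origin='lower')`. A minimal sketch of that layout, using a hypothetical helper `fhat_grid` and made-up values, shows that row j holds `thetasY[j]` and column i holds `thetasX[i]`, so the smallest θ^ServiceAbove is drawn at the bottom of the heat map.

```python
import numpy as np

def fhat_grid(F, thetasX, thetasY):
    """Arrange a {(thetaX, thetaY): value} dict into a 2-D array.

    Same ordering as plot_Fhat_map_2: thetaY is the outer loop and thetaX
    the inner loop, so rows follow thetasY and columns follow thetasX.
    """
    values = [F[(x, y)] for y in thetasY for x in thetasX]
    return np.reshape(np.array(values), (-1, len(thetasX)))

# Toy dictionary on a 3 x 2 grid; values encode (x, y) as 10*x + y.
thetasX = [3, 4, 5]
thetasY = [1, 2]
F = {(x, y): 10 * x + y for x in thetasX for y in thetasY}
grid = fhat_grid(F, thetasX, thetasY)
print(grid.shape)  # (2, 3)
```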

P.plot_expFhat_chart(
  df=thetaStar_expCbarcum_ServiceAbove,
  labelX=r'$\ell$',
  labelY=r"$exp\bar{C}^{cum}(\theta^*)$"+"\n(Profit)\n[$]",
  title="Expanding sample mean of the cumulative reward"+f"\n $L={L}, T={T}$, "+ \
    r"$\theta^* =$("+ str(best_theta_ServiceAbove[0])+", "+ str(best_theta_ServiceAbove[1])+"), " \
    r"$\bar{C}^{cum}(\theta^*) =$"+f"{best_Cbarcum_ServiceAbove:,.0f}\n",
  color_style='b-'
)
print()
P.plot_expFhat_chart(
  df=thetaStar_expCtilcum_ServiceAbove,
  labelX=r'$\ell$',
  labelY=r"$exp\tilde{C}^{cum}(\theta^*)$"+"\n(Profit)\n[$]",
  title="Expanding standard error of the cumulative reward"+f"\n $L={L}, T={T}$, "+ \
    r"$\theta^* =$("+ str(best_theta_ServiceAbove[0])+", "+ str(best_theta_ServiceAbove[1])+"), " \
    r"$\tilde{C}^{cum}(\theta^*) =$"+f"{best_Ctilcum_ServiceAbove:,.0f}\n",
  color_style='b--'
)
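The two charts above track how the sample mean $\bar{C}^{cum}(\theta^*)$ and its spread $\tilde{C}^{cum}(\theta^*)$ settle as more sample paths ℓ are added. The code of `perform_grid_search_sample_paths` is not shown in this section, so the sketch below is only one common way to compute such expanding statistics: the standard error is taken as the sample standard deviation (divisor n - 1) over √n. The function name `expanding_mean_and_stderr` and the input values are hypothetical, and the notebook's own formula may differ.

```python
import numpy as np
import pandas as pd

def expanding_mean_and_stderr(Ccum_per_path):
    """Expanding sample mean and standard error of the cumulative reward.

    Ccum_per_path[l] is the cumulative reward C^cum of sample path l.
    After n paths the mean averages the first n values and the standard
    error is the sample standard deviation (divisor n - 1) over sqrt(n).
    With a single path the standard error is NaN.
    """
    s = pd.Series(Ccum_per_path, dtype=float)
    expanding = s.expanding()
    n = expanding.count()
    mean = expanding.mean()
    stderr = expanding.std(ddof=1) / np.sqrt(n)
    return mean, stderr

# Made-up cumulative rewards for four sample paths.
mean, stderr = expanding_mean_and_stderr([100.0, 120.0, 110.0, 130.0])
print(mean.tolist())    # [100.0, 110.0, 110.0, 115.0]
print(stderr.tolist())  # first entry is nan (only one path so far)
```

A flattening expanding mean and a shrinking standard error suggest that L is large enough for the ranking of θ values to be trusted.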

f'{len(record_ServiceAbove):,}', L, T
('38,400', 100, 100)
best_theta_ServiceAbove
Theta(thShortDuration=10, thServiceAbove=2)
P.plot_records(
  df=df_first_n_t,
  df_non=None,
  pars=defaultdict(str, {
    'thShortDuration': best_theta_ServiceAbove.thShortDuration,
    'thServiceAbove': best_theta_ServiceAbove.thServiceAbove,
    # 'legendLabels': [r'$\mathrm{opt}$', r'$\mathrm{non}$'],
    'suptitle': f'TRAINING OF X__ServiceAbove POLICY'+'\n'+f'(first {first_n_t} records)'+'\n'+ \
    f'L = {L}, T = {T}, '+ \
    r"$\theta^*=$("+ str(best_theta_ServiceAbove[0])+", "+ str(best_theta_ServiceAbove[1])+")"
  }),
)

P.plot_records(
  df=df_last_n_t,
  df_non=None,
  pars=defaultdict(str, {
    'thShortDuration': best_theta_ServiceAbove.thShortDuration,
    'thServiceAbove': best_theta_ServiceAbove.thServiceAbove,
    # 'legendLabels': [r'$\mathrm{opt}$', r'$\mathrm{non}$'],
    'suptitle': f'TRAINING OF X__ServiceAbove POLICY'+'\n'+f'(last {last_n_t} records)'+'\n'+ \
    f'L = {L}, T = {T}, '+ \
    r"$\theta^*=$("+ str(best_theta_ServiceAbove[0])+", "+ str(best_theta_ServiceAbove[1])+")"
  }),
)

4.6.2 Evaluation

4.6.2.1 X__ServiceAbove
best_theta_ServiceAbove
Theta(thShortDuration=10, thServiceAbove=2)
worst_theta_ServiceAbove
Theta(thShortDuration=5, thServiceAbove=1)
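The evaluation that follows pits the best θ from the training grid search against the worst one, which serves as the non-optimal baseline. Assuming the training results are held in a dict that maps each θ to its sample mean of the cumulative reward (as `plot_Fhat_map_2` expects), picking that pair can be sketched as follows. The helper name `best_and_worst` and the toy values are hypothetical.

```python
def best_and_worst(Cbarcum):
    """Return (best_theta, worst_theta) from a {theta: sample mean} dict.

    The objective is profit, so the best theta has the largest sample mean
    of the cumulative reward and the worst has the smallest. Ties resolve
    to the first key in insertion order.
    """
    best = max(Cbarcum, key=Cbarcum.get)
    worst = min(Cbarcum, key=Cbarcum.get)
    return best, worst

# Toy sample means keyed by (thShortDuration, thServiceAbove) tuples.
Cbarcum = {(5, 1): 1.0, (10, 2): 3.0, (7, 4): 2.0}
print(best_and_worst(Cbarcum))  # ((10, 2), (5, 1))
```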
%%time
L = 30 #30pub #20db #N_SAMPLEPATHS
T = 20 #20pub #10db #N_TRANSITIONS
first_n_t = int(1*L*T)

M = Model()
P = Policy(M)
SIM = DemandSimulator(seed=SEED_EVALU)
thetasOpt = []; thetasOpt.append(best_theta_ServiceAbove)
thetaStar_expCbarcum_ServiceAbove_evalu_opt, thetaStar_expCtilcum_ServiceAbove_evalu_opt, \
_, _, \
best_theta_ServiceAbove_evalu_opt, worst_theta_ServiceAbove_evalu_opt, \
_, _, \
_, _, \
record_ServiceAbove_evalu_opt = \
  P.perform_grid_search_sample_paths('X__ServiceAbove', thetasOpt)
df_ServiceAbove_evalu_opt = pd.DataFrame.from_records(
    record_ServiceAbove_evalu_opt[:first_n_t], columns=labels)

M = Model()
P = Policy(M)
SIM = DemandSimulator(seed=SEED_EVALU)
thetasNon = []; thetasNon.append(worst_theta_ServiceAbove)
## thetasNon = []; thetasNon.append(
#   P.build_theta(
#     {'thShortDuration': 12, 'thServiceAbove': 5}
#   )
# )
thetaStar_expCbarcum_ServiceAbove_evalu_non, thetaStar_expCtilcum_ServiceAbove_evalu_non, \
_, _, \
best_theta_ServiceAbove_evalu_non, worst_theta_ServiceAbove_evalu_non, \
_, _, \
_, _, \
record_ServiceAbove_evalu_non = \
  P.perform_grid_search_sample_paths('X__ServiceAbove', thetasNon)
df_ServiceAbove_evalu_non = pd.DataFrame.from_records(
    record_ServiceAbove_evalu_non[:first_n_t], columns=labels)

print(
  f'{thetaStar_expCbarcum_ServiceAbove_evalu_opt.iloc[-1]=:.2f}, \
    {thetaStar_expCbarcum_ServiceAbove_evalu_non.iloc[-1]=:.2f}')
numThetas=1
... printing every 1th theta (if considered) ...
=== (0 / 1), theta=Theta(thShortDuration=10, thServiceAbove=2) ===
numThetas=1
... printing every 1th theta (if considered) ...
=== (0 / 1), theta=Theta(thShortDuration=5, thServiceAbove=1) ===
thetaStar_expCbarcum_ServiceAbove_evalu_opt.iloc[-1]=69352.99,     thetaStar_expCbarcum_ServiceAbove_evalu_non.iloc[-1]=45748.10
CPU times: user 37.2 s, sys: 519 ms, total: 37.7 s
Wall time: 41.1 s
P.plot_records(
  df=df_ServiceAbove_evalu_opt,
  df_non=df_ServiceAbove_evalu_non,
  pars=defaultdict(str, {
    'thShortDuration': best_theta_ServiceAbove_evalu_opt.thShortDuration,
    'thServiceAbove': best_theta_ServiceAbove_evalu_opt.thServiceAbove,
    'thShortDurationNon': best_theta_ServiceAbove_evalu_non.thShortDuration,
    'thServiceAboveNon': best_theta_ServiceAbove_evalu_non.thServiceAbove,
    'legendLabels': [r'$\mathrm{opt}$', r'$\mathrm{non}$'],
    'suptitle': f'EVALUATION OF X__ServiceAbove POLICY'+'\n'+f'(first {first_n_t} records)'+'\n'+ \
    f'L = {L}, T = {T}, '+ \
    r"$\theta^*=$("+ str(best_theta_ServiceAbove_evalu_opt[0])+", "+ str(best_theta_ServiceAbove_evalu_opt[1])+")"
  }),
)

## last_n_l = int(.99*L)
last_n_l = int(1.0*L)
P.plot_expFhat_charts(
  means={
      'ServiceAbove optimal': thetaStar_expCbarcum_ServiceAbove_evalu_opt[-last_n_l:],
      'ServiceAbove non-optimal': thetaStar_expCbarcum_ServiceAbove_evalu_non[-last_n_l:],
  },
  stdvs={
      'ServiceAbove optimal': thetaStar_expCtilcum_ServiceAbove_evalu_opt[-last_n_l:],
      'ServiceAbove non-optimal': thetaStar_expCtilcum_ServiceAbove_evalu_non[-last_n_l:],
  },
  labelX='Sample paths, ' + r'$\ell$',
  labelY='Profit\n[$]',
  suptitle=f"Comparison of Optimal/Non-optimal Policies after Evaluation\n \
    L = {L}, T = {T}\n \
    last {last_n_l} records\n \
    ('exp' refers to expanding)",
  pars=defaultdict(str, {
    'colors': ['m', 'c']
  }),
)

Next, we evaluate the same two policies on a single, longer sample path (L = 1, T = 100):

%%time
L = 1 #N_SAMPLEPATHS
T = 100 #N_TRANSITIONS
first_n_t = int(1*L*T)

M = Model()
P = Policy(M)
SIM = DemandSimulator(seed=SEED_EVALU)
thetasOpt = []; thetasOpt.append(best_theta_ServiceAbove)
thetaStar_expCbarcum_ServiceAbove_evalu_opt, thetaStar_expCtilcum_ServiceAbove_evalu_opt, \
_, _, \
best_theta_ServiceAbove_evalu_opt, worst_theta_ServiceAbove_evalu_opt, \
_, _, \
_, _, \
record_ServiceAbove_evalu_opt = \
  P.perform_grid_search_sample_paths('X__ServiceAbove', thetasOpt)
df_ServiceAbove_evalu_opt = pd.DataFrame.from_records(
    record_ServiceAbove_evalu_opt[:first_n_t], columns=labels)

M = Model()
P = Policy(M)
SIM = DemandSimulator(seed=SEED_EVALU)
thetasNon = []; thetasNon.append(worst_theta_ServiceAbove)
## thetasNon = []; thetasNon.append(
#   P.build_theta(
#     {'thShortDuration': 9, 'thServiceAbove': 50}
#   )
# )
thetaStar_expCbarcum_ServiceAbove_evalu_non, thetaStar_expCtilcum_ServiceAbove_evalu_non, \
_, _, \
best_theta_ServiceAbove_evalu_non, worst_theta_ServiceAbove_evalu_non, \
_, _, \
_, _, \
record_ServiceAbove_evalu_non = \
  P.perform_grid_search_sample_paths('X__ServiceAbove', thetasNon)
df_ServiceAbove_evalu_non = pd.DataFrame.from_records(
    record_ServiceAbove_evalu_non[:first_n_t], columns=labels)

print(
  f'{thetaStar_expCbarcum_ServiceAbove_evalu_opt.iloc[-1]=:.2f}, \
    {thetaStar_expCbarcum_ServiceAbove_evalu_non.iloc[-1]=:.2f}')
numThetas=1
... printing every 1th theta (if considered) ...
=== (0 / 1), theta=Theta(thShortDuration=10, thServiceAbove=2) ===
numThetas=1
... printing every 1th theta (if considered) ...
=== (0 / 1), theta=Theta(thShortDuration=5, thServiceAbove=1) ===
thetaStar_expCbarcum_ServiceAbove_evalu_opt.iloc[-1]=366863.14,     thetaStar_expCbarcum_ServiceAbove_evalu_non.iloc[-1]=240891.71
CPU times: user 13.7 s, sys: 259 ms, total: 14 s
Wall time: 16.9 s
RuntimeWarning: invalid value encountered in double_scalars
  Ctilcum_tmp = np.sum(np.square(np.array(CcumIomega__lI) - Cbarcum_tmp))/(L - 1)
<ipython-input-36-c6dabb6c7054>:158: RuntimeWarning: invalid value encountered in double_scalars
  Ctilcum_tmp = np.sum(np.square(np.array(CcumIomega__lI) - Cbarcum_tmp))/(L - 1)
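These warnings are expected for this run. With L = 1 the quoted expression divides by L - 1 = 0, and because a single path never deviates from its own mean the numerator is also zero, so NumPy evaluates 0/0, returns NaN and warns. The sample mean is still valid; only the spread across paths is undefined. A guarded variant, sketched here under the hypothetical name `sample_variance`, returns NaN explicitly for a single path instead of relying on the floating-point warning.

```python
import numpy as np

def sample_variance(Ccum_values):
    """Unbiased sample variance of per-path cumulative rewards.

    Hypothetical guarded version of the expression in the warning: with a
    single path the divisor L - 1 is zero, so NaN is returned explicitly
    instead of letting NumPy evaluate 0/0 and emit a RuntimeWarning.
    """
    x = np.asarray(Ccum_values, dtype=float)
    L = x.size
    if L < 2:
        return float("nan")
    Cbar = x.mean()
    return float(np.sum(np.square(x - Cbar)) / (L - 1))

print(sample_variance([1.0, 3.0]))  # 2.0
print(sample_variance([5.0]))       # nan
```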
P.plot_records(
  df=df_ServiceAbove_evalu_opt,
  df_non=df_ServiceAbove_evalu_non,
  pars=defaultdict(str, {
    'thShortDuration': best_theta_ServiceAbove_evalu_opt.thShortDuration,
    'thServiceAbove': best_theta_ServiceAbove_evalu_opt.thServiceAbove,
    'thShortDurationNon': best_theta_ServiceAbove_evalu_non.thShortDuration,
    'thServiceAboveNon': best_theta_ServiceAbove_evalu_non.thServiceAbove,
    'legendLabels': [r'$\mathrm{opt}$', r'$\mathrm{non}$'],
    'suptitle': f'EVALUATION OF X__ServiceAbove POLICY'+'\n'+f'(first {first_n_t} records)'+'\n'+ \
    f'L = {L}, T = {T}, '+ \
    r"$\theta^*=$("+ str(best_theta_ServiceAbove_evalu_opt[0])+", "+ str(best_theta_ServiceAbove_evalu_opt[1])+")"
  }),
)

df_ServiceAbove_evalu_opt[df_ServiceAbove_evalu_opt['t']==T-1][['Ccum']]
Ccum
99 366,863.1400
df_ServiceAbove_evalu_non[df_ServiceAbove_evalu_non['t']==T-1][['Ccum']]
Ccum
99 240,891.7120

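From the Ccum plot we see that both policies end with a positive cumulative reward, so the non-optimal policy does not lose money outright. The optimal policy, however, accumulates profit faster and keeps pulling ahead: after T = 100 transitions it reaches about $366,863, compared with about $240,892 for the non-optimal policy.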
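The size of the advantage can be computed directly from the final `Ccum` values printed by the two evaluation runs. A small helper, hypothetically named `relative_gain`, gives an improvement of about 51.6% for the L = 30, T = 20 evaluation (sample mean over paths) and about 52.3% for the single T = 100 path.

```python
def relative_gain(opt, non):
    """Fractional improvement of the optimal over the non-optimal policy."""
    return (opt - non) / non

# Final Ccum values printed by the two evaluation runs in this section.
runs = {
    "L=30, T=20": (69352.99, 45748.10),
    "L=1, T=100": (366863.14, 240891.71),
}
for name, (opt, non) in runs.items():
    print(f"{name}: +{opt - non:,.2f} ({relative_gain(opt, non):.1%})")
```

The two settings agree closely, which supports the conclusion that θ* = (10, 2) outperforms the worst training θ by roughly half again in profit.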