Automatic and optimal shift scheduling using Reinforcement Learning (Part 4)

Back to Portfolio of Projects | LearnableLoopAI.com | Blog |

0 INTRODUCTION

In the previous part we added tracking of the cumulative unmet resource demands (Ucum) by resource type. We now also track the overall or total unmet demands. We are also keeping track of the associated pandas datetime for each integer value of the time $t$ . Additionally we added another parameter to be learned by the agent: $θ^{S i c k P r o b}$ . We are also making use of past employee data to calculate, for each month of the year, the probability that a particular resource will call in sick or not show up for work. To be considered a candidate for allocation to a shift a resource’s sick probability for the month must be lower than $θ^{S i c k P r o b}$ .

In this part we change the policy. First, we renamed it from X__AllocBelow to X__Alloc. We now have the policy consider costs/contributions instead of excluding candidates below thCumShifts & thSickProb. We found that the agent exploits the situation by having thCumShifts and thSickProb near their max values so that it can use the complete pool. So, instead of using thresholds below which resources are included as candidates, we will always have all (available) resources in the candidates pool. However, we will have the agent earn rewards or incur penalties to discourage allocation of less ideal resources. The new parameters will be:

$θ^{C u m S h i f t s}$
$θ^{S i c k P r o b}$

Each of these will operate as a weighting factor to weight the associated component of the cumulative reward $C^{c u m}$ :

$C^{c u m} = θ^{C u m S h i f t s} . C_{C u m S h i f t s}^{c u m} + θ^{S i c k P r o b} . C_{S i c k P r o b}^{c u m}$

To keep the visualizations manageable, we will only have the resources:

Courtesy Clerk (7 resources)
Stocker (3 resources)

In a later part we will also add:

Cleaner (2 resources)
Curbsider (4 resources)

Daily demands are provided by a stochastic simulator based on past needs and trends.

The overall structure of this project and report follows the traditional CRISP-DM format. However, instead of the CRISP-DM’S “4 Modeling” section, we inserted the “6 step modeling process” of Dr. Warren Powell in section 4 of this document. Dr Powell’s universal framework shows great promise for unifying the formalisms of at least a dozen different fields. Using his framework enables easier access to thinking patterns in these other fields that might be beneficial and informative to the sequential decision problem at hand. Traditionally, this kind of problem would be approached from the reinforcement learning perspective. However, using Dr. Powell’s wider and more comprehensive perspective almost certainly provides additional value.

Here is information on Dr. Powell’s perspective on Sequential Decision Analytics.

In order to make a strong mapping between the code in this notebook and the mathematics in the Powell Universal Framework (PUF), we follow the following convention for naming Python identifier names:

How to read/say
- var name & flavor first
- at t/n
- for entity OR of/with attribute
- ${\hat{R}}_{t + 1, a}^{f a i l}$ has code Rhat__fail_tt1_a which is read: “Rhatfail at t+1 of/with (attribute) a”
Superscripts
- variable names have a double underscore to indicate a superscript
- $X^{π}$ : has code X__pi, is read X pi
- when there is a ‘natural’ distinction between the variable symbol and the superscript (e.g. a change in case), the double underscore is sometimes omitted: Xpi instead of X__pi, or MSpend_t instead of M__Spend_t
Subscripts
- variable names have a single underscore to indicate a subscript
- $S_{t}$ : has code S_t, is read ‘S at t’
- $M_{t}^{S p e n d}$ has code M__Spend_t which is read: “MSpend at t”
- ${\hat{R}}_{t + 1, a}^{f a i l}$ has code Rhat__fail_tt1_a which is read: “Rhatfail at t+1 of/with (attribute) a” [RLSO-p436]
Arguments
- collection variable names may have argument information added
- $X^{π} (S_{t})$ : has code X__piIS_tI, is read ‘X pi in S at t’
- the surrounding I’s are used to imitate the parentheses around the argument
Next time/iteration
- variable names that indicate one step in the future are quite common
- $R_{t + 1}$ : has code R_tt1, is read ‘R at t+1’
- $R^{n + 1}$ : has code R__nt1, is read ‘R at n+1’
Rewards
- State-independent terminal reward and cumulative reward
  - $F$ : has code F for terminal reward
  - $\sum_{n} F$ : has code cumF for cumulative reward
- State-dependent terminal reward and cumulative reward
  - $C$ : has code C for terminal reward
  - $\sum_{t} C$ : has code cumC for cumulative reward
Vectors where components use different names
- $S_{t} (R_{t}, p_{t})$ : has code S_t.R_t and S_t.p_t, is read ‘S at t in R at t, and, S at t in p at t’
- the code implementation is by means of a named tuple
  - self.State = namedtuple('State', SVarNames) for the ‘class’ of the vector
  - self.S_t for the ‘instance’ of the vector
Vectors where components reuse names
- $x_{t} (x_{t, G B}, x_{t, B L})$ : has code x_t.x_t_GB and x_t.x_t_BL, is read ‘x at t in x at t for GB, and, x at t in x at t for BL’
- the code implementation is by means of a named tuple
  - self.Decision = namedtuple('Decision', xVarNames) for the ‘class’ of the vector
  - self.x_t for the ‘instance’ of the vector
Use of mixed-case variable names
- to reduce confusion, sometimes the use of mixed-case variable names are preferred (even though it is not a best practice in the Python community), reserving the use of underscores and double underscores for math-related variables

1 BUSINESS UNDERSTANDING

The HR manager has to schedule resources for 3 shifts: Shift1, Shift2, and Shift3. For now, only Shift1 will be handled by the AI agent.

The number of resources of each type for each schedule slots for each day will be provided by the simulator. Only two resource types will be handled:

Courtesy
Stocker

The HR manager typically runs the AI Shift Scheduler 2 weeks into the future to produce a tentative schedule to publish for the team.

As demands for shift slot allocations come in, they are handled in the following way:

the candidates for the resource type must have their:
- $R_{t}^{A v a i l}$ be True
the specific resources are then marked for allocation considering the number of resources needed for the type
the state of the resources are then updated including the number of accumulated shifts
at the end of the shift all resources are made available again

The overall objective will be to maximize the cumulative reward.

2 DATA UNDERSTANDING

Based on recent market research, the demand may be modeled by a Poisson distribution for each resource type: $\begin{aligned} μ^{R e s o u r c e T y p e} & = SIM_MU_D [RESOURCE_TYPE] \end{aligned}$

So we have: $D_{t + 1}^{R e s o u r c e T y p e} \sim P o i s (μ^{R e s o u r c e T y p e})$

The decision window is 1 day and these simulations are for the daily demands for Shift1.

## import pdb
from collections import namedtuple, defaultdict
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from copy import copy
import time
from scipy.ndimage.interpolation import shift
import pickle
from bisect import bisect
import math
from pprint import pprint
import matplotlib as mpl
from certifi.core import where
pd.options.display.float_format = '{:,.4f}'.format
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', None)
! python --version

Python 3.10.12

DeprecationWarning: Please use `shift` from the `scipy.ndimage` namespace, the `scipy.ndimage.interpolation` namespace is deprecated.
  from scipy.ndimage.interpolation import shift

We will have the learnable parameters:

$(θ^{C u m S h i f t s}, θ^{S i c k P r o b})$

## PARAMETERS
## SNAMES = [ #state variable names
#     'RAvail_t', #available resource
#     'R_t',      #resource
#     'D_t',      #demand
# ]
## xNAMES = ['x_t'] #decision variable names

## TYPES = ['Courtesy']*7 + ['Stocker']*3 + ['Cleaner']*2 + ['Curbsider']*4
TYPES = ['Courtesy']*7 + ['Stocker']*3
print(f'\n{len(TYPES)=}')
print(TYPES)

RESOURCE_IDS = [str(i+1) for i in range(len(TYPES))]
## RESOURCE_TYPES = ['Courtesy', 'Stocker', 'Cleaner', 'Curbsider']
RESOURCE_TYPES = ['Courtesy', 'Stocker']
print(f'\n{len(RESOURCE_TYPES)=}')
print(RESOURCE_TYPES)

## *resource* attribute vectors
aNAMES = [tup[0]+'_'+tup[1] for tup in zip(RESOURCE_IDS, TYPES)]
print(f'{len(aNAMES)=}')
print(aNAMES)

## *demand* attribute vectors
bNAMES = RESOURCE_TYPES
print(f'\n{len(bNAMES)=}')
print(bNAMES)

## *decision* 'attribute' vectors
abNAMES = [] #to DEMAND b
for a in aNAMES:
  a0,a1 = a.split('_')
  for b in bNAMES:
    if(a1==b):
      abn = (a + '___' + b)
      abNAMES.append(abn)
print(f'\n{len(abNAMES)=}')
print(abNAMES)

piNAMES = ['X__Alloc'] #policy names
thNAMES = [ #theta names
  'thCumShifts',
  'thSickProb',
  ## 'thAvail'
]
print(f'\n{len(thNAMES)=}')
print(f'{thNAMES=}')

AVAILABILITIES_DOW_Shift1 = pd.DataFrame({
  'ResourceId': RESOURCE_IDS,
  'Type': TYPES,
  '0_Sunday':    [True,  False, False, False, False, True, False, True, True, True],
  '1_Monday':    [False, True,  True, True, True, False, True, True, True, False],
  '2_Tuesday':   [True, True, True,  True, False, True, True, True, True, False],
  '3_Wednesday': [False, True, True, True,  False, True, True, True, True, False],
  '4_Thursday':  [False, True, True, True, True,  False, False, False, True, True],
  '5_Friday':    [False, True, False, False, True, True,  False, True, True, False],
  '6_Saturday':  [False, False, True, True, False, False, True,  True, False, True],
})
print(f"AVAILABILITIES_DOW_Shift1:\n{AVAILABILITIES_DOW_Shift1}")

CAPACITIES = pd.DataFrame({
  'ResourceId': RESOURCE_IDS,
  'Type': TYPES,
  'Capacity': 2*AVAILABILITIES_DOW_Shift1.iloc[:, 2:].sum(axis=1) #2 weeks
})
print(f"CAPACITIES:\n{CAPACITIES}")

SEED_TRAIN = 77777777
SEED_EVALU = 88888888
N_SAMPLEPATHS = 100; L = N_SAMPLEPATHS
N_TRANSITIONS = 100; T = N_TRANSITIONS

TH_CumShifts_SPEC = (0, 1, .02)
TH_SickProb_SPEC = (0, 1, .02)
## TH_Avail_SPEC = (0, 1, .1)

SIM_T = 60
SIM_MU_D = {bNAMES[0]: 4, bNAMES[1]: 2}
print(f'\n{SIM_MU_D=}')
assert len(SIM_MU_D.items())==len(bNAMES)

## SIM_EVENT_TIME_D = {bNAMES[0]: None, bNAMES[1]: None, bNAMES[2]: None, bNAMES[3]: None}
SIM_EVENT_TIME_D = {bNAMES[0]: None, bNAMES[1]: None}
print(f'\n{SIM_EVENT_TIME_D=}')
assert len(SIM_EVENT_TIME_D.items())==len(bNAMES)

## SIM_MU_DELTA_D = {bNAMES[0]: None, bNAMES[1]: None, bNAMES[2]: None, bNAMES[3]: None}
SIM_MU_DELTA_D = {bNAMES[0]: None, bNAMES[1]: None}
print(f'\n{SIM_MU_DELTA_D=}')
assert len(SIM_MU_DELTA_D.items())==len(bNAMES)

START_DATE_TIME = '2023-10-29'
sd = pd.to_datetime(START_DATE_TIME)
assert sd.strftime('%a')=='Sun'
DATE_TIME_DELTA = '1day' ##also '1D', '1H', etc

SICK_PROBS = pd.DataFrame({
  'ResourceId': RESOURCE_IDS,
  'Type': TYPES,
  'January':   [.03, .10, .13, .09, .02, .08, .06, .09, .03, .11],
  'February':  [.03, .03, .01, .08, .02, .08, .01, .09, .09, .01],
  'March':     [.01, .10, .03, .01, .09, .01, .02, .01, .03, .11],
  'April':     [.02, .08, .03, .02, .09, .01, .08, .03, .09, .08],
  'May':       [.02, .01, .03, .07, .09, .02, .07, .07, .08, .09],
  'June':      [.09, .06, .02, .08, .04, .11, .07, .08, .03, .06],
  'July':      [.02, .06, .02, .03, .03, .07, .06, .03, .05, .05],
  'August':    [.01, .09, .09, .04, .05, .08, .03, .03, .02, .07],
  'September': [.01, .07, .00, .07, .02, .05, .04, .07, .07, .01],
  'October':   [.00, .03, .02, .02, .10, .01, .01, .03, .07, .03],
  'November':  [.00, .02, .03, .03, .11, .01, .01, .09, .08, .03],
  'December':  [.03, .10, .05, .09, .07, .02, .05, .03, .03, .06],
})
print(f"SICK_PROBS:\n{SICK_PROBS}")

## math parameters use 'math/small case' (as opposed to code parameters):

## CONTRIB_MATRIX = {}
# for an in aNAMES:
#     contribs = {}
#     for bn in bNAMES:
#       contribs[bn] = contribution(an, bn)
#     CONTRIB_MATRIX[an] = contribs
# CONTRIB_MATRIX


len(TYPES)=10
['Courtesy', 'Courtesy', 'Courtesy', 'Courtesy', 'Courtesy', 'Courtesy', 'Courtesy', 'Stocker', 'Stocker', 'Stocker']

len(RESOURCE_TYPES)=2
['Courtesy', 'Stocker']
len(aNAMES)=10
['1_Courtesy', '2_Courtesy', '3_Courtesy', '4_Courtesy', '5_Courtesy', '6_Courtesy', '7_Courtesy', '8_Stocker', '9_Stocker', '10_Stocker']

len(bNAMES)=2
['Courtesy', 'Stocker']

len(abNAMES)=10
['1_Courtesy___Courtesy', '2_Courtesy___Courtesy', '3_Courtesy___Courtesy', '4_Courtesy___Courtesy', '5_Courtesy___Courtesy', '6_Courtesy___Courtesy', '7_Courtesy___Courtesy', '8_Stocker___Stocker', '9_Stocker___Stocker', '10_Stocker___Stocker']

len(thNAMES)=2
thNAMES=['thCumShifts', 'thSickProb']
AVAILABILITIES_DOW_Shift1:
  ResourceId      Type  0_Sunday  1_Monday  2_Tuesday  3_Wednesday  \
0          1  Courtesy      True     False       True        False   
1          2  Courtesy     False      True       True         True   
2          3  Courtesy     False      True       True         True   
3          4  Courtesy     False      True       True         True   
4          5  Courtesy     False      True      False        False   
5          6  Courtesy      True     False       True         True   
6          7  Courtesy     False      True       True         True   
7          8   Stocker      True      True       True         True   
8          9   Stocker      True      True       True         True   
9         10   Stocker      True     False      False        False   

   4_Thursday  5_Friday  6_Saturday  
0       False     False       False  
1        True      True       False  
2        True     False        True  
3        True     False        True  
4        True      True       False  
5       False      True       False  
6       False     False        True  
7       False      True        True  
8        True      True       False  
9        True     False        True  
CAPACITIES:
  ResourceId      Type  Capacity
0          1  Courtesy         4
1          2  Courtesy        10
2          3  Courtesy        10
3          4  Courtesy        10
4          5  Courtesy         6
5          6  Courtesy         8
6          7  Courtesy         8
7          8   Stocker        12
8          9   Stocker        12
9         10   Stocker         6

SIM_MU_D={'Courtesy': 4, 'Stocker': 2}

SIM_EVENT_TIME_D={'Courtesy': None, 'Stocker': None}

SIM_MU_DELTA_D={'Courtesy': None, 'Stocker': None}
SICK_PROBS:
  ResourceId      Type  January  February  March  April    May   June   July  \
0          1  Courtesy   0.0300    0.0300 0.0100 0.0200 0.0200 0.0900 0.0200   
1          2  Courtesy   0.1000    0.0300 0.1000 0.0800 0.0100 0.0600 0.0600   
2          3  Courtesy   0.1300    0.0100 0.0300 0.0300 0.0300 0.0200 0.0200   
3          4  Courtesy   0.0900    0.0800 0.0100 0.0200 0.0700 0.0800 0.0300   
4          5  Courtesy   0.0200    0.0200 0.0900 0.0900 0.0900 0.0400 0.0300   
5          6  Courtesy   0.0800    0.0800 0.0100 0.0100 0.0200 0.1100 0.0700   
6          7  Courtesy   0.0600    0.0100 0.0200 0.0800 0.0700 0.0700 0.0600   
7          8   Stocker   0.0900    0.0900 0.0100 0.0300 0.0700 0.0800 0.0300   
8          9   Stocker   0.0300    0.0900 0.0300 0.0900 0.0800 0.0300 0.0500   
9         10   Stocker   0.1100    0.0100 0.1100 0.0800 0.0900 0.0600 0.0500   

   August  September  October  November  December  
0  0.0100     0.0100   0.0000    0.0000    0.0300  
1  0.0900     0.0700   0.0300    0.0200    0.1000  
2  0.0900     0.0000   0.0200    0.0300    0.0500  
3  0.0400     0.0700   0.0200    0.0300    0.0900  
4  0.0500     0.0200   0.1000    0.1100    0.0700  
5  0.0800     0.0500   0.0100    0.0100    0.0200  
6  0.0300     0.0400   0.0100    0.0100    0.0500  
7  0.0300     0.0700   0.0300    0.0900    0.0300  
8  0.0200     0.0700   0.0700    0.0800    0.0300  
9  0.0700     0.0100   0.0300    0.0300    0.0600

class DemandSimulator():
  def __init__(self,
    T__sim=SIM_T,
    muD=SIM_MU_D,
    eventTimeD=SIM_EVENT_TIME_D,
    muDeltaD=SIM_MU_DELTA_D,
    seed=None):
    self.time = 0
    self.T__sim = SIM_T
    self.muD = SIM_MU_D
    self.eventTimeD = SIM_EVENT_TIME_D
    self.muDeltaD = SIM_MU_DELTA_D
    self.prng = np.random.RandomState(seed)

  def simulate(self):
    if self.time > self.T__sim - 1:
      self.time = 0
    D_tt1 = {}
    for bn in bNAMES:
      if self.eventTimeD[bn] and self.time > self.eventTimeD[bn]: #event for entity
        D_tt1[bn] = self.muDeltaD[bn] + self.prng.poisson(self.muD[bn]) #after event
      else:
        D_tt1[bn] = self.prng.poisson(self.muD[bn])
    self.time += 1
    return {bn: max(0, D_tt1[bn]) for bn in bNAMES} #always positive

dem_sim = DemandSimulator(seed=1234)
DemandData = []
for i in range(SIM_T):
  d_sh1 = list(dem_sim.simulate().values())
  d_sh2 = list(dem_sim.simulate().values())
  DemandData.append(d_sh1 + d_sh2)
# labels = [f'{bn}_demand' for bn in bNAMES]
labels = [f'sh1_{bn}_dem' for bn in bNAMES] + [f'sh2_{bn}_dem' for bn in bNAMES]
df = pd.DataFrame.from_records(data=DemandData, columns=labels); df[:10]

	sh1_Courtesy_dem	sh1_Stocker_dem	sh2_Courtesy_dem	sh2_Stocker_dem
0	5	4	6	4
1	6	3	6	1
2	3	2	3	6
3	4	2	4	2
4	1	1	7	1
5	7	6	1	0
6	6	1	7	2
7	1	3	3	1
8	4	1	5	6
9	11	3	8	2

## PUT BACK: COMMENTED OUT TO BE FASTER
import random
def plot_output(df1, df2):
  n_charts = len(bNAMES)
  ylabelsize = 16
  mpl.rcParams['lines.linewidth'] = 1.2
  default_colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
  fig, axs = plt.subplots(n_charts, sharex=True)
  fig.set_figwidth(13); fig.set_figheight(9)
  fig.suptitle('Demand Simulation', fontsize=20)

  for i,bn in enumerate(bNAMES):
    axs[i].set_title(f'Demanded {bn}')
    axs[i].set_ylim(auto=True); axs[i].spines['top'].set_visible(False); axs[i].spines['right'].set_visible(True); axs[i].spines['bottom'].set_visible(False)
    axs[i].step(df1[f'sh1_{bn}_dem'], 'r-')
    axs[i].step(df1[f'sh2_{bn}_dem'], 'r:')
    ## axs[i].axhline(y=dem_sim.muD[e], color='k', linestyle=':')
    axs[i].axhline(y=0, color='k', linestyle=':')

  axs[i].set_xlabel('$t\ \mathrm{[daily\ windows]}$', rotation=0, ha='center', va='center', fontweight='bold', size=ylabelsize)
plot_output(df, None)

3 DATA PREPARATION

We will use the data provided by the simulator directly. There is no need to perform additional data preparation.

4 MODELING

4.1 Narrative

Please review the narrative in section 1. The next figure is a representation of the solution to the problem:

4.2 Core Elements

This section attempts to answer three important questions:

What metrics are we going to track?
What decisions do we intend to make?
What are the sources of uncertainty?

For this problem, the only metric we are interested in is the number of allocated shift slots for Shift1 after each decision window. The only source of uncertainty is the levels of demand for each of the resource types.

4.3 Mathematical Model | SUS Design

A Python class is used to implement the model for the SUS (System Under Steer):

class Model():
  def __init__(self, S_0_info):
    ...
    ...

4.3.1 State variables

The state variables represent what we need to know.

$R_{t}^{A v a i l} = (R_{t a}^{A v a i l})_{a \in A}$ where $A = {α_{1}, α_{2}, . . . α_{10}}$
- $R_{t a}^{A v a i l}$ = Boolean indicator for whether this resource (with attribute $a$ ), is available at $t$ for rental
- $α_{1}$ = 1_Courtesy
- $α_{2}$ = 2_Courtesy
- $α_{3}$ = 3_Courtesy
- …
- $α_{15}$ = 9_Stocker
- $α_{16}$ = 10_Stocker
$R_{t}^{C u m S h i f t s} = (R_{t a}^{C u m S h i f t s})_{a \in A}$ where $A = {α_{1}, α_{2}, . . . α_{10}}$
- $R_{t a}^{C u m S h i f t s}$ = Number of hours this resource (with attribute $a$ ), is tied up for rental at $t$
- $α_{1}$ = 1_Courtesy
- $α_{2}$ = 2_Courtesy
- $α_{3}$ = 3_Courtesy
- …
- $α_{15}$ = 9_Stocker
- $α_{16}$ = 10_Stocker
$D_{t}^{S h i f t 1} = (D_{t b}^{S h i f t 1})_{b \in B}$ where $B = {β_{1}, β_{2}}$
- $D_{t b}^{S h i f t 1}$ = Number of demands for this resource (with attribute $b$ ), at $t$
- $β_{1}$ = Courtesy
- $β_{1}$ = Stocker

The state is:

$\begin{aligned} S_{t} & = (R_{t}^{A v a i l}, R_{t}^{C u m S h i f t s}, D_{t}^{S h i f t 1}) \\ = (_{t a}^{A v a i l})_{a \in A}, (R_{t a}^{C u m S h i f t s})_{a \in A}, (D_{t b}^{S h i f t 1})_{b \in B}) \end{aligned}$

4.3.2 Decision variables

The decision variables represent what we control.

The decision vector is given by:

$x_{t} = (x_{t a b})_{a \in A, b \in B}$ where
- $A = {α_{1}, α_{2}, . . . α_{10}}$
- $B = {β_{1}, β_{2}}$
- $x_{t a b}$ is a boolean vector that indicates whether a specific resource is to be allocated to a demand
Decisions are made with a policy (TBD below):
- $X^{π} (S_{t})$

4.3.3 Exogenous information variables

The exogenous information variables represent what we did not know (when we made a decision). These are the variables that we cannot control directly. The information in these variables become available after we make the decision $x_{t}$ .

When we assume that the demand in each time period is revealed, without any model to predict the demand based on past demands, we have, using approach 1:

$\begin{aligned} D_{t + 1} & = W_{t + 1} \\ = {\hat{D}}_{t + 1} \end{aligned}$

Alternatively, when we assume that we observe the change in demand ${\hat{D}}_{t + 1} = p_{t + 1} - p_{t}$ , we have, using approach 2:

$\begin{aligned} D_{t + 1} & = D_{t} + W_{t + 1} \\ = D_{t} + {\hat{D}}_{t + 1} \end{aligned}$

We will make use of approach 1 which means that the exogenous information, $W_{t + 1}$ , is the directly observed demands of the resources.

The exogenous information is obtained by a call to

DemandSimulator.simulate(...)

4.3.4 Transition function

The transition function describe how the state variables evolve over time. We have the equations:

$R_{t + 1}^{A v a i l} = {\begin{cases} 1 & if resource with attribute a has not been allocated \\ 0 & if resource with attribute a has been allocated \end{cases}$

$R_{t + 1}^{C u m S h i f t s} = {\begin{cases} R_{t}^{C u m S h i f t s} + 1 & if resource was allocated \\ R_{t}^{C u m S h i f t s} & if resource was not allocated \end{cases}$

Collectively, they represent the general transition function:

$S_{t + 1} = S^{M} (S_{t}, X^{π} (S_{t}))$

4.3.5 Objective function

The objective function captures the performance metrics of the solution to the problem.

We can write the state-dependant reward (also called contribution due to the allocation of a resource with attribute $b$ ):

$C (S_{t}, x_{t}) = {\begin{cases} 1 & if resource was allocated \\ - 1 & if resource was not allocated \end{cases}$

We have the objective function:

$max_{π} E {\sum_{t = 0}^{T} C (S_{t}, x_{t}, W_{t + 1})}$

The learned parameters are:

$(θ^{C u m S h i f t s}, θ^{S i c k P r o b})$

Both these are weight factors between 0 and 1.

4.3.6 Implementation of the System Under Steer (SUS) Model

class Model():
  def __init__(self, W_fn=None, S__M_fn=None, C_fn=None):
    self.S_t = {
      'R_t': pd.DataFrame({
        'ResourceId': RESOURCE_IDS,
        'Type': TYPES,
        'RAvail_t': AVAILABILITIES_DOW_Shift1['0_Sunday'],
        'RCumShifts_t': [0]*len(TYPES), ##cumulative allocs (for T)
      }),
      'D_t': pd.DataFrame({
        'Type': RESOURCE_TYPES,
        'DShift1_t': [1]*len(RESOURCE_TYPES),
        'DShift2_t': [1]*len(RESOURCE_TYPES),
      }),
    }
    self.x_t = {
      'xAlloc_t': pd.DataFrame({
        'Comb': abNAMES, ##Combination
        'Allocd_t': [False]*len(abNAMES), ##Allocated
      }),
    }
    self.Ccum = 0.0 ##cumulative reward
    self.Ccum_CumShifts = 0.0
    self.Ccum_SickProb = 0.0

    self.Ucum_Total = 0 ##cumulative unallocated/unmet demands
    ##cumulative unallocated/unmet demands
    self.Ucum = {rt: 0 for rt in RESOURCE_TYPES}

  ## def reset(self):
  #   self.Ccum = 0.0
  #   self.Ucum = 0

  ## exogenous information, dependent on a random process,
  ## the directly observed demands
  def W_fn(self, t):
    return {'shift1': SIM.simulate(), 'shift2': SIM.simulate()}

  def update_Ccum(self, dt, S_t, x_t, theta):
    m1 = x_t['xAlloc_t'].merge(S_t['R_t'][['RCumShifts_t', 'RAvail_t']], left_index=True, right_index=True)

    m2 = m1.merge(SICK_PROBS[[dt.month_name()]], left_index=True, right_index=True)
    m2.rename(columns={dt.month_name(): 'SickProb'}, inplace=True)

    m2['Capacity'] = CAPACITIES.loc[:, ['Capacity']]
    m2['Ccum_CumShifts'] = -(m2['RCumShifts_t'] - m2['Capacity'])

    ## m2['Ccum_SickProb'] = -100*m2['SickProb']
    m2['Ccum_SickProb'] = -100*(m2['SickProb'] - m2['SickProb'].mean()) ##100 makes values more comparable with other component of Ccum

    ## m2['Ccum_Avail'] = m2['RAvail_t'].replace({True: 0, False: -10})

    summables = m2.loc[
      m2['Allocd_t'] == True,
      ['Ccum_CumShifts', 'Ccum_SickProb'],
      ## ['Ccum_CumShifts', 'Ccum_SickProb', 'Ccum_Avail']
    ]
    Ccum_CumShifts, Ccum_SickProb = summables.sum(axis=0)
    ## Ccum_CumShifts, Ccum_SickProb, Ccum_Avail = summables.sum(axis=0)
    Ccum = \
      theta.thCumShifts*Ccum_CumShifts + \
      theta.thSickProb*Ccum_SickProb
      ## theta.thAvail*Ccum_Avail
    self.Ccum_CumShifts += Ccum_CumShifts
    self.Ccum_SickProb += Ccum_SickProb
    self.Ccum += Ccum

  def performAllocDecision(self, S_t, x_t, theta):
    ## find list of ResourceIds for allocs from x_t
    resourceIds = x_t['xAlloc_t'].loc[
      x_t['xAlloc_t']['Allocd_t']==True,
      ['Comb']
    ]['Comb'].str.split('_').str[:1].tolist(); ##print(f'{resourceIds=}')
    resourceIds_flat = [e[0] for e in resourceIds]; ##print(f'{resourceIds_flat=}')

    ## update state of allocs
    S_t['R_t'].loc[
      S_t['R_t']['ResourceId'].isin(resourceIds_flat),
      ['RAvail_t']
    ] = False
    S_t['R_t'].loc[
      S_t['R_t']['ResourceId'].isin(resourceIds_flat),
      ['RCumShifts_t']
    ] += 1

    ## update Ccum with allocations
    ## self.Ccum += len(resourceIds_flat) #number of allocations

  def S__M_fn(self, t, dt, S_t, x_t, W_tt1, theta):
    ## perform decision taken this morning
    self.performAllocDecision(S_t, x_t, theta)

    ## D_t #direct approach
    for rt in RESOURCE_TYPES:
      sh1_demands = W_tt1['shift1'][rt]
      sh2_demands = W_tt1['shift2'][rt]
      S_t['D_t'].loc[S_t['D_t']['Type']==rt, 'DShift1_t'] = sh1_demands
      S_t['D_t'].loc[S_t['D_t']['Type']==rt, 'DShift2_t'] = sh2_demands

    ## Update availabilities of all resources
    dow = (t + 1)%7
    mask = AVAILABILITIES_DOW_Shift1.columns.str.contains(str(dow) + '_')
    dow_col = AVAILABILITIES_DOW_Shift1.loc[:, mask]
    S_t['R_t']['RAvail_t'] = AVAILABILITIES_DOW_Shift1[dow_col.columns[0]]

    record_t = [t, dt] + \
      list(S_t['R_t']['RAvail_t']) + \
      list(S_t['R_t']['RCumShifts_t']) + \
      list(S_t['D_t']['DShift1_t']) + \
      [self.Ucum[rt] for rt in RESOURCE_TYPES] + \
      [self.Ucum_Total] + \
      [self.Ccum_CumShifts] + \
      [self.Ccum_SickProb] + \
      [self.Ccum] + \
      list(x_t['xAlloc_t']['Allocd_t'])
    return record_t

  def C_fn(self, S_t, x_t, W_tt1, theta):
    return

  def step(self, t, dt, theta):
    ## IND = '\t\t'
    ## print(f"{IND}..... M. step() .....\n{t=}\n{theta=}")
    W_tt1 = self.W_fn(t); ##print(f'{W_tt1=}')

    ## update state & reward
    record_t = self.S__M_fn(t, dt, self.S_t, self.x_t, W_tt1, theta)
    return record_t

4.4 Uncertainty Model

We will simulate the rental demand vector $D_{t + 1}^{S h i f t 1}$ as described in section 2.

4.5 Policy Design

There are two main meta-classes of policy design. Each of these has two subclasses: - Policy Search - Policy Function Approximations (PFAs) - Cost Function Approximations (CFAs) - Lookahead - Value Function Approximations (VFAs) - Direct Lookaheads (DLAs)

In this project we will only use one approach: - A simple allocate parameterized policy (from the PFA class), called X__Alloc.

4.5.1 Implementation of Policy Design

import random
class Policy():
  def __init__(self, model):
    self.model = model
    self.Policy = namedtuple('Policy', piNAMES) # 'class'
    self.Theta = namedtuple('Theta', thNAMES) # 'class'

  def build_policy(self, info):
    return self.Policy(*[info[pin] for pin in piNAMES])

  def build_theta(self, info):
    return self.Theta(*[info[thn] for thn in thNAMES])

  def X__Alloc(self, t, dt, S_t, x_t, theta):
    ## print(f"\n..... Policy.X__Alloc() .....\n{t=}")
    sh1_demandsToService = []
    sh2_demandsToService = []
    for rt in RESOURCE_TYPES:
      number = S_t['D_t'].loc[
        S_t['D_t']['Type']==rt,
        ['DShift1_t']
      ].squeeze()
      sh1_demandsToService.append((rt, number))
      number = S_t['D_t'].loc[
        S_t['D_t']['Type']==rt,
        ['DShift2_t']
      ].squeeze()
      sh2_demandsToService.append((rt, number))
    ## print(f"sh1_demandsToService:\n{sh1_demandsToService}")
    for demand in sh1_demandsToService:
      resourceType, number = demand; ##print(f'{resourceType=}, {number=}')
      candidates = S_t['R_t'].loc[
        (S_t['R_t']['Type'] == resourceType) & \
        (S_t['R_t']['RAvail_t'] == True),
        ['ResourceId', 'Type', 'RAvail_t', 'RCumShifts_t']
      ]; ##print(f'candidates=\n{candidates}')

      candidates = candidates.sample(frac = 1) #shuffle rows; now opt/non have different x's

      if len(candidates) > 0 and number > 0:
        if len(candidates) >= number:
          allocs = candidates.iloc[0:number, :]; ##print(f'allocs=\n{allocs}') #pick first 'number' avails
        elif len(candidates) < number:
          ##print('%%% NOT ENOUGH candidates')
          allocs = copy(candidates); ##print(f'allocs=\n{allocs}') #pick all candidates
          unallocated_demands = number - len(candidates) ##unmet demands
          ## self.model.Ccum -= unallocated_demands
          self.model.Ucum_Total += unallocated_demands
          self.model.Ucum[resourceType] += unallocated_demands
        if len(allocs) == 0:
          x_t['xAlloc_t'].loc[
            x_t['xAlloc_t']['Comb'].str.contains(resourceType),
            ['Allocd_t']
          ] = False
        else:
          x_t['xAlloc_t'].loc[ ##remove all allocs before adding new allocs
            x_t['xAlloc_t']['Comb'].str.contains(resourceType),
            ['Allocd_t']
          ] = False
          x_t['xAlloc_t'].loc[
            x_t['xAlloc_t']['Comb'].apply(lambda x: x.split("_")[0]).isin(allocs['ResourceId']),
            ['Allocd_t']
          ] = True
        self.model.update_Ccum(dt, S_t, x_t, theta)
      else:
        ## print(f'%%% Shift has no resource for demand {demand}, or demand is 0.')
        x_t['xAlloc_t'].loc[
          x_t['xAlloc_t']['Comb'].str.contains(resourceType),
          ['Allocd_t']
        ] = False
      ## print(f"x_t['xAlloc_t']:\n{x_t['xAlloc_t']}")

  def run_grid_sample_paths(self, theta, piName, record):
    CcumIomega__lI = []
    for l in range(1, L + 1): #for each sample-path
      M = Model()
      ## P = Policy(M) #NO!, overwrite existing global P
      self.model = M
      record_l = [piName, theta, l]
      dt = pd.to_datetime(START_DATE_TIME)
      dt_delta = pd.Timedelta(DATE_TIME_DELTA)
      for t in range(T): #for each transition/step
        ## print(f'\t%%% {t=}')
        ## >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
        getattr(self, piName)(t, dt, self.model.S_t, self.model.x_t, theta) #lookup (new) today's decision

        ## sit in post-decision state until end of cycle (evening)

        ## S_t, Ccum, x_t = self.model.step(t, x_t, theta)
        record_t = self.model.step(t, dt, theta) #change from today to tomorrow
        ## >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
        record.append(record_l + record_t)
        dt = dt + dt_delta
      CcumIomega__lI.append(self.model.Ccum) #just above (SDAM-eq2.9)
    return CcumIomega__lI

  def perform_grid_search_sample_paths(self, piName, thetas):
    Cbarcum = defaultdict(float)
    Ctilcum = defaultdict(float)
    expCbarcum = defaultdict(float)
    expCtilcum = defaultdict(float)
    numThetas = len(thetas)
    record = []
    print(f'{numThetas=:,}')
    nth = 1
    i = 0; print(f'... printing every {nth}th theta (if considered) ...')
    for theta in thetas:
      if True: ##in case relationships between thetas can be exploited
        ## a dict cannot be used as a key, so we define theta_key, e.g.
        ## theta_key = ((168.0, 72.0), (200.0, 90.0)):
        ## theta_key = tuple(tuple(itm.values()) for itm in theta)
        theta_key = theta ##if theta is not a dict
        if i%nth == 0: print(f'=== ({i:,} / {numThetas:,}), theta={theta} ===')

        ## >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
        CcumIomega__lI = self.run_grid_sample_paths(theta, piName, record)
        ## >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

        Cbarcum_tmp = np.array(CcumIomega__lI).mean() #(SDAM-eq2.9)
        Ctilcum_tmp = np.sum(np.square(np.array(CcumIomega__lI) - Cbarcum_tmp))/(L - 1)

        Cbarcum[theta_key] = Cbarcum_tmp
        Ctilcum[theta_key] = np.sqrt(Ctilcum_tmp/L)

        expCbarcum_tmp = pd.Series(CcumIomega__lI).expanding().mean()
        expCbarcum[theta_key] = expCbarcum_tmp

        expCtilcum_tmp = pd.Series(CcumIomega__lI).expanding().std()
        expCtilcum[theta_key] = expCtilcum_tmp
        i += 1
      ##endif
    best_theta = max(Cbarcum, key=Cbarcum.get)
    worst_theta = min(Cbarcum, key=Cbarcum.get)

    best_Cbarcum = Cbarcum[best_theta]
    best_Ctilcum = Ctilcum[best_theta]

    worst_Cbarcum = Cbarcum[worst_theta]
    worst_Ctilcum = Ctilcum[worst_theta]

    thetaStar_expCbarcum = expCbarcum[best_theta]
    thetaStar_expCtilcum = expCtilcum[best_theta]
    thetaStar_expCtilcum[0] = 0 ##set NaN to 0

    ## best_theta_w_names = tuple((
    #   ({
    #     a1NAMES[0]: subvec[0],
    #     a1NAMES[1]: subvec[1]
    #   })) for subvec in best_theta)
    ## best_theta_n = self.build_theta({'thAdj': best_theta_w_names[0]})
    ## best_theta_n = self.build_theta({'thAdj1': best_theta_w_names[0], 'thAdj3': best_theta_w_names[1]})
    ## print(f'best_theta_n:\n{best_theta_n}\n{best_Cbarcum=:.2f}\n{best_Ctilcum=:.2f}')

    ## worst_theta_w_names = tuple((
    #   ({
    #     a1NAMES[0]: subvec[0],
    #     a1NAMES[1]: subvec[1]})) for subvec in worst_theta)
    ## worst_theta_n = self.build_theta({'thAdj': worst_theta_w_names[0]})
    ## worst_theta_n = self.build_theta({'thAdj1': worst_theta_w_names[0], 'thAdj3': worst_theta_w_names[1]})
    ## print(f'worst_theta_n:\n{worst_theta_n}\n{worst_Cbarcum=:.2f}\n{worst_Ctilcum=:.2f}')

    return \
      thetaStar_expCbarcum, thetaStar_expCtilcum, \
      Cbarcum, Ctilcum, \
      best_theta, worst_theta, \
      best_Cbarcum, worst_Cbarcum, \
      best_Ctilcum, worst_Ctilcum, \
      record

  ## dispatch {prepend @}
  # def grid_search_theta_values(self, thetas0): #. using vectors reduces loops in perform_grid_search_sample_paths()
  #     thetas = [(th0,) for th0 in thetas0]
  #     return thetas

  ## dispatch {prepend @}
  # def grid_search_theta_values(self, thetas0, thetas1): #. using vectors reduces loops in perform_grid_search_sample_paths()
  #     thetas = [(th0, th1) for th0 in thetas0 for th1 in thetas1]
  #     return thetas

  ## dispatch {prepend @}
  # def grid_search_theta_values(self, thetas0, thetas1, thetas2): #. using vectors reduces loops in perform_grid_search_sample_paths()
  #     thetas = [(th0, th1, th2) for th0 in thetas0 for th1 in thetas1 for th2 in thetas2]
  #     return thetas

  ## EXAMPLE:
  ## thetasA: Buy
  ## thetasA_name: 'thBuy'
  ## names: ELA
  ## 1_1: 1 theta sub-vectors, each having 1 theta
  ## thetas = grid_search_thetas_1_2(thetasBuy 'thBuy', CAR_TYPES)
  def grid_search_thetas_1_1(self, thetasA, thetasA_name, names):
    thetas = [
    self.build_theta({thetasA_name: {names[0]: thA0}})
    for thA0 in thetasA[names[0]]
    ]
    return thetas

  ## EXAMPLE:
  ## thetasA: Buy
  ## thetasA_name: 'thBuy'
  ## names: ELA, SON
  ## 1_2: 1 theta sub-vectors, each having 2 thetas
  ## thetas = grid_search_thetas_1_2(thetasBuy 'thBuy', CAR_TYPES)
  def grid_search_thetas_1_2(self, thetasA, thetasA_name, names):
    thetas = [
    self.build_theta({thetasA_name: {names[0]: thA0, names[1]: thA1}})
    for thA0 in thetasA[names[0]]
      for thA1 in thetasA[names[1]]
    ]
    return thetas

  ## EXAMPLE:
  ## thetasA: Adj
  ## thetasA_name: 'thAdj'
  ## names: ELA, SON
  ## 1_4: 1 theta sub-vectors, each having 4 thetas
  ## thetas = grid_search_thetas_1_4(thetasBuy 'thAdj', bNAMES)
  def grid_search_thetas_1_4(self, thetasA, thetasA_name, names):
    thetas = [
    self.build_theta({thetasA_name: {names[0]: thA0, names[1]: thA1, names[2]: thA2, names[3]: thA3}})
    for thA0 in thetasA[names[0]]
      for thA1 in thetasA[names[1]]
        for thA2 in thetasA[names[2]]
          for thA3 in thetasA[names[3]]
    ]
    return thetas

  ## EXAMPLE:
  ## thetasA: Buy
  ## thetasB: Max
  ## thetasA_name: 'thBuy'
  ## thetasB_name: 'thMax'
  ## names: ELA
  ## 2_1: 2 theta sub-vectors, each having 1 theta
  ## thetas = grid_search_thetas_2_1(thetasBuy, thetasMax, 'thBuy', 'thMax', CAR_TYPES)
  def grid_search_thetas_2_1(self, thetasA, thetasB, thetasA_name, thetasB_name, names):
    thetas = [
    self.build_theta({thetasA_name: {names[0]: thA0}, thetasB_name: {names[0]: thB0}})
    for thA0 in thetasA[names[0]]
      for thB0 in thetasB[names[0]]
    ]
    return thetas

  ## EXAMPLE:
  ## thetasA: Buy
  ## thetasB: Max
  ## thetasA_name: 'thBuy'
  ## thetasB_name: 'thMax'
  ## names: ELA, SON
  ## 2_2: 2 theta sub-vectors, each having 2 thetas
  ## thetas = grid_search_thetas_4(thetasBuy, thetasMax, 'thBuy', 'thMax', CAR_TYPES)
  def grid_search_thetas_2_2(self, thetasA, thetasB, thetasA_name, thetasB_name, names):
    thetas = [
    self.build_theta({thetasA_name: {names[0]: thA0, names[1]: thA1}, thetasB_name: {names[0]: thB0, names[1]: thB1}})
    for thA0 in thetasA[names[0]]
      for thA1 in thetasA[names[1]]
        for thB0 in thetasB[names[0]]
          for thB1 in thetasB[names[1]]
    ]
    return thetas


  ############################################################################
  ### PLOTTING
  ############################################################################
  def round_theta(self, complex_theta):
    thetas_rounded = []
    for theta in complex_theta:
      evalues_rounded = []
      for _, evalue in theta.items():
        evalues_rounded.append(float(f"{evalue:f}"))
      thetas_rounded.append(tuple(evalues_rounded))
    return str(tuple(thetas_rounded))

  def plot_Fhat_map_2(self,
      FhatI_theta_I,
      thetasX, thetasY, labelX, labelY, title):
      Fhat_values = [
        FhatI_theta_I[
          (thetaX,thetaY)
          ## ((thetaX,),(thetaY,))
        ]
          for thetaY in thetasY for thetaX in thetasX
      ]
      Fhats = np.array(Fhat_values)
      increment_count = len(thetasX)
      Fhats = np.reshape(Fhats, (-1, increment_count))#.

      fig, ax = plt.subplots()
      im = ax.imshow(Fhats, cmap='hot', origin='lower', aspect='auto')
      ## create colorbar
      cbar = ax.figure.colorbar(im, ax=ax)
      ## cbar.ax.set_ylabel(cbarlabel, rotation=-90, va="bottom")

      ax.set_xticks(np.arange(0, len(thetasX), 5))#.
      ## ax.set_xticks(np.arange(len(thetasX)))

      ax.set_yticks(np.arange(0, len(thetasY), 5))#.
      ## ax.set_yticks(np.arange(len(thetasY)))

      ## NOTE: round tick labels, else very messy
      ## function round() does not work, have to do this way
      thetasX_form = [f'{th:.0f}' for th in thetasX]
      thetasY_form = [f'{th:.0f}' for th in thetasY]

      ax.set_xticklabels(thetasX[::5])
      ## ax.set_xticklabels(thetasX); ax.set_xticklabels(thetasX_form)

      ax.set_yticklabels(thetasY[::5])
      ## ax.set_yticklabels(thetasY); ax.set_yticklabels(thetasY_form)

      ## rotate the tick labels and set their alignment.
      ## plt.setp(ax.get_xticklabels(), rotation=45, ha="right",rotation_mode="anchor")

      ax.set_title(title)
      ax.set_xlabel(labelX)
      ax.set_ylabel(labelY)

      ## fig.tight_layout()
      plt.show()
      return True

  def plot_Fhat_map_4(self,
      FhatI_theta_I,
      thetasX, thetasY, labelX, labelY, title,
      thetaFixed1, thetaFixed2):
      ## Fhat_values = [FhatI_theta_I[(thetaX,thetaY)] for thetaY in thetasY for thetaX in thetasX]
      Fhat_values = [
        FhatI_theta_I[((thetaX,thetaY), (thetaFixed1,thetaFixed2))]
        for thetaY in thetasY
          for thetaX in thetasX]
      Fhats = np.array(Fhat_values)
      increment_count = len(thetasX)
      Fhats = np.reshape(Fhats, (-1, increment_count))#.

      fig, ax = plt.subplots()
      im = ax.imshow(Fhats, cmap='hot', origin='lower', aspect='auto')
      ## create colorbar
      cbar = ax.figure.colorbar(im, ax=ax)
      ## cbar.ax.set_ylabel(cbarlabel, rotation=-90, va="bottom")

      ax.set_xticks(np.arange(0, len(thetasX), 5))#.
      ## ax.set_xticks(np.arange(len(thetasX)))

      ax.set_yticks(np.arange(0, len(thetasY), 5))#.
      ## ax.set_yticks(np.arange(len(thetasY)))

      ## NOTE: round tick labels, else very messy
      ## function round() does not work, have to do this way
      ## thetasX_form = [f'{th:.1f}' for th in thetasX]
      ## thetasY_form = [f'{th:.1f}' for th in thetasY]

      ax.set_xticklabels(thetasX[::5])#.
      ## ax.set_xticklabels(thetasX)
      ## ax.set_xticklabels(thetasX_form)

      ax.set_yticklabels(thetasY[::5])#.
      ## ax.set_yticklabels(thetasY)
      ## ax.set_yticklabels(thetasY_form)

      ## rotate the tick labels and set their alignment.
      ## plt.setp(ax.get_xticklabels(), rotation=45, ha="right",rotation_mode="anchor")

      ax.set_title(title)
      ax.set_xlabel(labelX)
      ax.set_ylabel(labelY)

      ## fig.tight_layout()
      plt.show()
      return True

  ## color_style examples: 'r-', 'b:', 'g--'
  def plot_Fhat_chart(self, FhatI_theta_I, thetasX, labelX, labelY, title, color_style, thetaStar):
      mpl.rcParams['lines.linewidth'] = 1.2
      xylabelsize = 16
      ## plt.figure(figsize=(13, 9))
      plt.figure(figsize=(13, 4))
      plt.title(title, fontsize=20)
      Fhats = FhatI_theta_I.values()
      plt.plot(thetasX, Fhats, color_style)
      plt.axvline(x=thetaStar, color='k', linestyle=':')
      plt.xlabel(labelX, rotation=0, labelpad=10, ha='right', va='center', fontweight='bold', size=xylabelsize)
      plt.ylabel(labelY, rotation=0, labelpad=1, ha='right', va='center', fontweight='normal', size=xylabelsize)
      plt.show()

  ## expanding Fhat chart
  def plot_expFhat_chart(self, df, labelX, labelY, title, color_style):
    mpl.rcParams['lines.linewidth'] = 1.2
    xylabelsize = 16
    plt.figure(figsize=(13, 4))
    plt.title(title, fontsize=20)
    plt.plot(df, color_style)
    plt.xlabel(labelX, rotation=0, labelpad=10, ha='right', va='center', fontweight='bold', size=xylabelsize)
    plt.ylabel(labelY, rotation=0, labelpad=1, ha='right', va='center', fontweight='normal', size=xylabelsize)
    plt.show()

  ## expanding Fhat charts
  def plot_expFhat_charts(self, means, stdvs, labelX, labelY, suptitle, pars=defaultdict(str)):
    n_charts = 2
    xlabelsize = 14
    ylabelsize = 14
    mpl.rcParams['lines.linewidth'] = 1.2
    default_colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
    fig, axs = plt.subplots(n_charts, sharex=True)
    fig.set_figwidth(13); fig.set_figheight(9)
    fig.suptitle(suptitle, fontsize=18)

    xi = 0
    legendlabels = []
    axs[xi].set_title(r"$exp\bar{C}^{cum}(\theta^*)$", loc='right', fontsize=16)
    for i,itm in enumerate(means.items()):
      axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
      leg = axs[xi].plot(itm[1], color=pars['colors'][i])
      legendlabels.append(itm[0])
    axs[xi].set_ylabel(labelY, rotation=0, ha='right', va='center', fontweight='normal', size=ylabelsize)

    xi = 1
    axs[xi].set_title(r"$exp\tilde{C}^{cum}(\theta^*)$", loc='right', fontsize=16)
    for i,itm in enumerate(stdvs.items()):
      axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
      ## leg = axs[xi].plot(itm[1], default_colors[i], linestyle='--')
      leg = axs[xi].plot(itm[1], pars['colors'][i], linestyle='--')
    axs[xi].set_ylabel(labelY, rotation=0, ha='right', va='center', fontweight='normal', size=ylabelsize)

    fig.legend(
      ## [leg],
      labels=legendlabels,
      title="Policies",
      loc='upper right',
      fancybox=True,
      shadow=True,
      ncol=1)
    plt.xlabel(labelX, rotation=0, labelpad=10, ha='right', va='center', fontweight='normal', size=xlabelsize)
    plt.show()

  def plot_records(self, df, df_non, pars=defaultdict(str)):
    n_a = len(aNAMES)
    n_b = len(bNAMES)
    n_ab = len(abNAMES)
    n_x = 1
    n_charts = n_ab + n_b + n_a + n_b + 1 + 1 + 2
    ylabelsize = 14
    mpl.rcParams['lines.linewidth'] = 1.2
    mycolors = ['g', 'b']
    fig, axs = plt.subplots(n_charts, sharex=True)
    ## fig.set_figwidth(13); fig.set_figheight(9)
    fig.set_figwidth(13); fig.set_figheight(20)
    fig.suptitle(pars['suptitle'], fontsize=14)
    WHERE = 'post' #'pre' #'post'

    xi = 0
    for i,ab in enumerate(abNAMES):
      axs[xi+i].set_ylim(auto=True); axs[xi+i].spines['top'].set_visible(False); axs[xi+i].spines['right'].set_visible(True); axs[xi+i].spines['bottom'].set_visible(False)
      axs[xi+i].step(df[f'Allocd_t_{ab}'], 'm-', where=WHERE)
      if not df_non is None: axs[xi+i].step(df_non[f'Allocd_t_{ab}'], 'c-.', where=WHERE)
      axs[xi+i].axhline(y=0, color='k', linestyle=':')
      abl = ab.split("___")
      al = abl[0].split('_'); al = al[0]+'\_'+al[1]; bl = abl[1]
      y1ab = '$x^{Allocd}_{t,'+f'{",".join([al, bl])}'+'}$'
      axs[xi+i].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
      for j in range(df.shape[0]//T): axs[xi+i].axvline(x=(j+1)*T, color='grey', ls=':')

    xi = n_ab
    for i,b in enumerate(bNAMES):
      y1ab = '$D^{Shift1}_{t,'+f'{b}'+'}$'
      axs[xi+i].set_ylim(auto=True); axs[xi+i].spines['top'].set_visible(False); axs[xi+i].spines['right'].set_visible(True); axs[xi+i].spines['bottom'].set_visible(False)
      axs[xi+i].step(df[f'DShift1_t_{b}'], mycolors[xi%len(mycolors)], where=WHERE)
      if not df_non is None: axs[xi+i].step(df_non[f'DShift1_t_{b}'], 'c-.', where=WHERE)
      axs[xi+i].axhline(y=SIM.muD[b], color='k', linestyle=':')
      axs[xi+i].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
      for j in range(df.shape[0]//T): axs[xi+i].axvline(x=(j+1)*T, color='grey', ls=':')

    xi = n_ab + n_b
    for i,a in enumerate(aNAMES):
      axs[xi+i].set_ylim(auto=True); axs[xi+i].spines['top'].set_visible(False); axs[xi+i].spines['right'].set_visible(True); axs[xi+i].spines['bottom'].set_visible(False)
      axs[xi+i].step(df[f'RAvail_t_{a}'], 'r:', where=WHERE)
      axs[xi+i].step(df[f'RCumShifts_t_{a}'], 'm-', where=WHERE)
      ## if not df_non is None: axs[xi+i].step(df_non[f'RAvail_t_{a}'], 'c-.', where=WHERE)
      if not df_non is None: axs[xi+i].step(df_non[f'RCumShifts_t_{a}'], 'c-.', where=WHERE)
      axs[xi+i].axhline(y=0, color='k', linestyle=':')
      cap = CAPACITIES.loc[CAPACITIES['ResourceId'] == a.split('_')[0], 'Capacity'].values[0]
      axs[xi+i].axhline(y=cap, color='g', linestyle=':')
      al = a.split('_'); al = al[0]+'\_'+al[1]; y1ab = '$R^{CumShifts}_{t,'+f'{al}'+'}$'
      axs[xi+i].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
      for j in range(df.shape[0]//T): axs[xi+i].axvline(x=(j+1)*T, color='grey', ls=':')

    xi = n_ab + n_b + n_a
    Ucum_cols = ['Ucum_'+bn for bn in bNAMES]
    ymax = max(list(df[Ucum_cols].max()))
    for i,b in enumerate(bNAMES):
      y1ab = '$U^{cum}_{'+f'{b}'+'}$'
      axs[xi+i].set_ylim(0, ymax); axs[xi+i].spines['top'].set_visible(False); axs[xi+i].spines['right'].set_visible(True); axs[xi+i].spines['bottom'].set_visible(False)
      axs[xi+i].step(df[f'Ucum_{b}'], 'r-', where=WHERE)
      if not df_non is None: axs[xi+i].step(df_non[f'Ucum_{b}'], 'c-.', where=WHERE)
      axs[xi+i].axhline(y=ymax, color='r', linestyle=':')
      axs[xi+i].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
      for j in range(df.shape[0]//T): axs[xi+i].axvline(x=(j+1)*T, color='grey', ls=':')

    xi = n_ab + n_b + n_a + n_b
    axs[xi].set_ylim(auto=True); axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
    axs[xi].step(df['Ucum_Total'], 'k-', where=WHERE)
    if not df_non is None: axs[xi].step(df_non['Ucum_Total'], 'c-.', where=WHERE)
    axs[xi].axhline(y=0, color='k', linestyle=':')
    axs[xi].set_ylabel('$U^{cum}_{Total}$'+'\n'+''+'$\mathrm{[Allocs]}$', rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize);
    for j in range(df.shape[0]//T): axs[xi].axvline(x=(j+1)*T, color='grey', ls=':')

    xi = n_ab + n_b + n_a + n_b + 1
    axs[xi].set_ylim(auto=True); axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
    axs[xi].step(df['Ccum_CumShifts'], 'm-', where=WHERE)
    if not df_non is None: axs[xi].step(df_non['Ccum_CumShifts'], 'c-.', where=WHERE)
    axs[xi].axhline(y=0, color='k', linestyle=':')
    axs[xi].set_ylabel('$C^{cum}_{CumShifts}$', rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize);
    for j in range(df.shape[0]//T): axs[xi].axvline(x=(j+1)*T, color='grey', ls=':')

    xi = n_ab + n_b + n_a + n_b + 2
    axs[xi].set_ylim(auto=True); axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
    axs[xi].step(df['Ccum_SickProb'], 'm-', where=WHERE)
    if not df_non is None: axs[xi].step(df_non['Ccum_SickProb'], 'c-.', where=WHERE)
    axs[xi].axhline(y=0, color='k', linestyle=':')
    axs[xi].set_ylabel('$C^{cum}_{SickProb}$', rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize);
    for j in range(df.shape[0]//T): axs[xi].axvline(x=(j+1)*T, color='grey', ls=':')

    xi = n_ab + n_b + n_a + n_b + 3
    axs[xi].set_ylim(auto=True); axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
    axs[xi].step(df['Ccum'], 'm-', where=WHERE)
    ## axs[xi].step(df['Ucum_Total'], 'k-', where=WHERE)
    if not df_non is None: axs[xi].step(df_non['Ccum'], 'c-.', where=WHERE)
    axs[xi].axhline(y=0, color='k', linestyle=':')
    axs[xi].set_ylabel('$C^{cum}$', rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize);
    for j in range(df.shape[0]//T): axs[xi].axvline(x=(j+1)*T, color='grey', ls=':')

    axs[xi].set_xlabel('$t\ \mathrm{[decision\ windows]}$', rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize);
    if(pars['legendLabels']): fig.legend(labels=pars['legendLabels'], loc='lower left', fontsize=16)

  def plot_evalu_comparison(self, df1, df2, df3, pars=defaultdict(str)):
    legendlabels = ['X__BuyBelow', 'X__Bellman']
    n_charts = 5
    ylabelsize = 14
    mpl.rcParams['lines.linewidth'] = 1.2
    fig, axs = plt.subplots(n_charts, sharex=True)
    fig.set_figwidth(13); fig.set_figheight(9)
    thetaStarStr = []
    for cmp in pars["thetaStar"]: thetaStarStr.append(f'{cmp:.1f}')
    thetaStarStr = '(' + ', '.join(thetaStarStr) + ')'
    fig.suptitle(pars['suptitle'], fontsize=14)

    xi = 0
    axs[xi].set_ylim(auto=True); axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
    axs[xi].step(df1[f'x_t'], 'r-', where='post')
    axs[xi].step(df2[f'x_t'], 'g-.', where='post')
    axs[xi].step(df3[f'x_t'], 'b:', where='post')
    axs[xi].axhline(y=0, color='k', linestyle=':')
    y1ab = '$x_{t}$'
    axs[xi].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
    for j in range(df1.shape[0]//T): axs[xi].axvline(x=(j+1)*T, color='grey', ls=':')

    xi = 1
    axs[xi].set_ylim(auto=True); axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
    axs[xi].step(df1[f'R_t'], 'r-', where='post')
    axs[xi].step(df2[f'R_t'], 'g-.', where='post')
    axs[xi].step(df3[f'R_t'], 'b:', where='post')
    axs[xi].axhline(y=0, color='k', linestyle=':')
    y1ab = '$R_{t}$'
    axs[xi].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
    for j in range(df1.shape[0]//T): axs[xi].axvline(x=(j+1)*T, color='grey', ls=':')

    xi = 2
    axs[xi].set_ylim(auto=True); axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
    axs[xi].step(df1[f'p_t'], 'r-', where='post')
    axs[xi].step(df2[f'p_t'], 'g-.', where='post')
    axs[xi].step(df3[f'p_t'], 'b:', where='post')
    axs[xi].axhline(y=0, color='k', linestyle=':')

    if(pars['lower_non']): axs[xi].text(-4, pars['lower_non'], r'$\theta^{lower}$' + f"={pars['lower_non']:.1f}", size=10, color='c')
    if(pars['lower_non']): axs[xi].axhline(y=pars['lower_non'], color='c', linestyle=':')

    if(pars['upper_non']): axs[xi].text(-4, pars['upper_non'], r'$\theta^{upper}$' + f"={pars['upper_non']:.1f}", size=10, color='c')
    if(pars['upper_non']): axs[xi].axhline(y=pars['upper_non'], color='c', linestyle=':')

    if(pars['lower']): axs[xi].text(-4, pars['lower'], r'$\theta^{lower}$' + f"={pars['lower']:.1f}", size=10, color='m')
    if(pars['lower']): axs[xi].axhline(y=pars['lower'], color='m', linestyle=':')

    if(pars['upper']): axs[xi].text(-4, pars['upper'], r'$\theta^{upper}$' + f"={pars['upper']:.1f}", size=10, color='m')
    if(pars['upper']): axs[xi].axhline(y=pars['upper'], color='m', linestyle=':')

    if(pars['alpha_non']): axs[xi].text(-4, pars['alpha_non'], r'$\theta^{alpha}$' + f"={pars['alpha_non']:.1f}", size=10, color='c')
    if(pars['alpha_non']): axs[xi].axhline(y=pars['alpha_non'], color='c', linestyle=':')

    if(pars['trackSignal_non']): axs[xi].text(-4, pars['trackSignal_non'], r'$\theta^{trackSignal}$' + f"={pars['trackSignal_non']:.1f}", size=10, color='c')
    if(pars['trackSignal_non']): axs[xi].axhline(y=pars['trackSignal_non'], color='c', linestyle=':')

    if(pars['alpha']): axs[xi].text(-4, pars['alpha'], r'$\theta^{alpha}$' + f"={pars['alpha']:.1f}", size=10, color='m')
    if(pars['alpha']): axs[xi].axhline(y=pars['alpha'], color='m', linestyle=':')

    if(pars['trackSignal']): axs[xi].text(-4, pars['trackSignal'], r'$\theta^{trackSignal}$' + f"={pars['trackSignal']:.1f}", size=10, color='m')
    if(pars['trackSignal']): axs[xi].axhline(y=pars['trackSignal'], color='m', linestyle=':')

    y1ab = '$p_{t}$'
    axs[xi].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
    for j in range(df1.shape[0]//T): axs[xi].axvline(x=(j+1)*T, color='grey', ls=':')

    xi = 3
    axs[xi].set_ylim(auto=True); axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
    axs[xi].step(df1['b_t_val'], 'r-', where='post')
    axs[xi].step(df2['b_t_val'], 'g-.', where='post')
    axs[xi].step(df3['b_t_val'], 'b:', where='post')
    axs[xi].axhline(y=0, color='k', linestyle=':')
    y1ab = '$b_{t,val}$'
    axs[xi].set_ylabel(y1ab, rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize)
    for j in range(df1.shape[0]//T): axs[xi].axvline(x=(j+1)*T, color='grey', ls=':')

    xi = 4
    axs[xi].set_ylim(auto=True); axs[xi].spines['top'].set_visible(False); axs[xi].spines['right'].set_visible(True); axs[xi].spines['bottom'].set_visible(False)
    axs[xi].step(df1['Ccum'], 'r-', where='post')
    axs[xi].step(df2['Ccum'], 'g-.', where='post')
    axs[xi].step(df3['Ccum'], 'b:', where='post')
    axs[xi].axhline(y=0, color='k', linestyle=':')
    axs[xi].set_ylabel('$\mathrm{cumC}$'+'\n'+'$\mathrm{(Profit)}$'+'\n'+''+'$\mathrm{[\$]}$', rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize);
    axs[xi].set_xlabel('$t\ \mathrm{[decision\ windows]}$', rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize);
    for j in range(df1.shape[0]//T): axs[xi].axvline(x=(j+1)*T, color='grey', ls=':')

    fig.legend(
      ## [leg],
      labels=legendlabels,
      title="Policies",
      loc='upper right',
      fontsize=16,
      fancybox=True,
      shadow=True,
      ncol=1)

4.6 Policy Evaluation

4.6.1 Training/Tuning

## setup labels to plot info
RAvail_t_labels = ['RAvail_t_'+an for an in aNAMES]
RCumShifts_t_labels = ['RCumShifts_t_'+an for an in aNAMES]
DShift1_t_labels = ['DShift1_t_'+rt for rt in RESOURCE_TYPES]
xAlloc_t_labels = ['Allocd_t_'+abn for abn in abNAMES]
labels = ['piName', 'theta', 'l'] + \
  ['t', 'dt'] + \
  RAvail_t_labels + RCumShifts_t_labels + \
  DShift1_t_labels + \
  ['Ucum_'+rt for rt in RESOURCE_TYPES] + \
  ['Ucum_Total'] + \
  ['Ccum_CumShifts'] + \
  ['Ccum_SickProb'] + \
  ['Ccum'] + \
  xAlloc_t_labels

## def grid_search_thetas(thetas1, thetas2, thetas1_name, thetas2_name):
#   thetas = [
#     P.build_theta({thetas1_name: thA0, thetas2_name: thB0})
#     for thA0 in thetas1
#       for thB0 in thetas2
#   ]
#   return thetas

## def grid_search_thetas(thetas1, thetas1_name):
#   thetas = [
#     P.build_theta({thetas1_name: thA0})
#     for thA0 in thetas1
#   ]
#   return thetas

%%time
L = 50 #20pub #20db #N_SAMPLEPATHS
T = 14 #14pub #14db #N_TRANSITIONS
first_n_t = 10*T #200
last_n_t = 10*T #200

M = Model()
P = Policy(M)
SIM = DemandSimulator(seed=SEED_TRAIN)

thetasCumShifts = np.arange(
  TH_CumShifts_SPEC[0],
  TH_CumShifts_SPEC[1],
  TH_CumShifts_SPEC[2])
thetasSickProb = np.arange(
  TH_SickProb_SPEC[0],
  TH_SickProb_SPEC[1],
  TH_SickProb_SPEC[2])
## thetasAvail = np.arange(
#   TH_Avail_SPEC[0],
#   TH_Avail_SPEC[1],
#   TH_Avail_SPEC[2])
thetas_list = [(th0, 1-(th0)) for th0 in thetasCumShifts if (th0) <= 1.0]
## thetas_list = [(th0, th1, 1-(th0+th1)) for th0 in thetasCumShifts for th1 in thetasSickProb if (th0 + th1) <= 1.0]
thetas = [
  P.build_theta({'thCumShifts': tup[0], 'thSickProb': tup[1]})
  ## P.build_theta({'thCumShifts': tup[0], 'thSickProb': tup[1], 'thAvail': tup[2]})
  for tup in thetas_list
]

thetaStar_expCbarcum_Alloc, thetaStar_expCtilcum_Alloc, \
Cbarcum_Alloc, Ctilcum_Alloc, \
best_theta_Alloc, worst_theta_Alloc, \
best_Cbarcum_Alloc, worst_Cbarcum_Alloc, \
best_Ctilcum_Alloc, worst_Ctilcum_Alloc, \
record_Alloc = \
  P.perform_grid_search_sample_paths('X__Alloc', thetas)
f'{thetaStar_expCbarcum_Alloc.iloc[-1]=:.2f}'
df_first_n_t = pd.DataFrame.from_records(record_Alloc[:first_n_t], columns=labels)
df_last_n_t = pd.DataFrame.from_records(record_Alloc[-last_n_t:], columns=labels)

numThetas=50
... printing every 1th theta (if considered) ...
=== (0 / 50), theta=Theta(thCumShifts=0.0, thSickProb=1.0) ===
=== (1 / 50), theta=Theta(thCumShifts=0.02, thSickProb=0.98) ===
=== (2 / 50), theta=Theta(thCumShifts=0.04, thSickProb=0.96) ===
=== (3 / 50), theta=Theta(thCumShifts=0.06, thSickProb=0.94) ===
=== (4 / 50), theta=Theta(thCumShifts=0.08, thSickProb=0.92) ===
=== (5 / 50), theta=Theta(thCumShifts=0.1, thSickProb=0.9) ===
=== (6 / 50), theta=Theta(thCumShifts=0.12, thSickProb=0.88) ===
=== (7 / 50), theta=Theta(thCumShifts=0.14, thSickProb=0.86) ===
=== (8 / 50), theta=Theta(thCumShifts=0.16, thSickProb=0.84) ===
=== (9 / 50), theta=Theta(thCumShifts=0.18, thSickProb=0.8200000000000001) ===
=== (10 / 50), theta=Theta(thCumShifts=0.2, thSickProb=0.8) ===
=== (11 / 50), theta=Theta(thCumShifts=0.22, thSickProb=0.78) ===
=== (12 / 50), theta=Theta(thCumShifts=0.24, thSickProb=0.76) ===
=== (13 / 50), theta=Theta(thCumShifts=0.26, thSickProb=0.74) ===
=== (14 / 50), theta=Theta(thCumShifts=0.28, thSickProb=0.72) ===
=== (15 / 50), theta=Theta(thCumShifts=0.3, thSickProb=0.7) ===
=== (16 / 50), theta=Theta(thCumShifts=0.32, thSickProb=0.6799999999999999) ===
=== (17 / 50), theta=Theta(thCumShifts=0.34, thSickProb=0.6599999999999999) ===
=== (18 / 50), theta=Theta(thCumShifts=0.36, thSickProb=0.64) ===
=== (19 / 50), theta=Theta(thCumShifts=0.38, thSickProb=0.62) ===
=== (20 / 50), theta=Theta(thCumShifts=0.4, thSickProb=0.6) ===
=== (21 / 50), theta=Theta(thCumShifts=0.42, thSickProb=0.5800000000000001) ===
=== (22 / 50), theta=Theta(thCumShifts=0.44, thSickProb=0.56) ===
=== (23 / 50), theta=Theta(thCumShifts=0.46, thSickProb=0.54) ===
=== (24 / 50), theta=Theta(thCumShifts=0.48, thSickProb=0.52) ===
=== (25 / 50), theta=Theta(thCumShifts=0.5, thSickProb=0.5) ===
=== (26 / 50), theta=Theta(thCumShifts=0.52, thSickProb=0.48) ===
=== (27 / 50), theta=Theta(thCumShifts=0.54, thSickProb=0.45999999999999996) ===
=== (28 / 50), theta=Theta(thCumShifts=0.56, thSickProb=0.43999999999999995) ===
=== (29 / 50), theta=Theta(thCumShifts=0.58, thSickProb=0.42000000000000004) ===
=== (30 / 50), theta=Theta(thCumShifts=0.6, thSickProb=0.4) ===
=== (31 / 50), theta=Theta(thCumShifts=0.62, thSickProb=0.38) ===
=== (32 / 50), theta=Theta(thCumShifts=0.64, thSickProb=0.36) ===
=== (33 / 50), theta=Theta(thCumShifts=0.66, thSickProb=0.33999999999999997) ===
=== (34 / 50), theta=Theta(thCumShifts=0.68, thSickProb=0.31999999999999995) ===
=== (35 / 50), theta=Theta(thCumShifts=0.7000000000000001, thSickProb=0.29999999999999993) ===
=== (36 / 50), theta=Theta(thCumShifts=0.72, thSickProb=0.28) ===
=== (37 / 50), theta=Theta(thCumShifts=0.74, thSickProb=0.26) ===
=== (38 / 50), theta=Theta(thCumShifts=0.76, thSickProb=0.24) ===
=== (39 / 50), theta=Theta(thCumShifts=0.78, thSickProb=0.21999999999999997) ===
=== (40 / 50), theta=Theta(thCumShifts=0.8, thSickProb=0.19999999999999996) ===
=== (41 / 50), theta=Theta(thCumShifts=0.8200000000000001, thSickProb=0.17999999999999994) ===
=== (42 / 50), theta=Theta(thCumShifts=0.84, thSickProb=0.16000000000000003) ===
=== (43 / 50), theta=Theta(thCumShifts=0.86, thSickProb=0.14) ===
=== (44 / 50), theta=Theta(thCumShifts=0.88, thSickProb=0.12) ===
=== (45 / 50), theta=Theta(thCumShifts=0.9, thSickProb=0.09999999999999998) ===
=== (46 / 50), theta=Theta(thCumShifts=0.92, thSickProb=0.07999999999999996) ===
=== (47 / 50), theta=Theta(thCumShifts=0.9400000000000001, thSickProb=0.05999999999999994) ===
=== (48 / 50), theta=Theta(thCumShifts=0.96, thSickProb=0.040000000000000036) ===
=== (49 / 50), theta=Theta(thCumShifts=0.98, thSickProb=0.020000000000000018) ===
CPU times: user 21min 23s, sys: 15.9 s, total: 21min 39s
Wall time: 21min 44s

best_theta_Alloc

Theta(thCumShifts=0.98, thSickProb=0.020000000000000018)

P.plot_Fhat_map_2(
  FhatI_theta_I=Cbarcum_Alloc,
  thetasX=thetasCumShifts,
  thetasY=thetasSickProb,
  labelX=r'$\theta^{CumShifts}$',
  labelY=r'$\theta^{SickProb}$',
  title="Sample mean of the cumulative reward"+f"\n $L={L}, T={T}$, "+ \
    r"$\theta^* =$("+ str(f'{best_theta_Alloc[0]:,.2f}')+", "+ str(f'{best_theta_Alloc[1]:,.2f}')+"), " \
    r"$\bar{C}^{cum}(\theta^*) =$"+f"{best_Cbarcum_Alloc:,.0f}\n"
)
print()
P.plot_Fhat_map_2(
  FhatI_theta_I=copy(Ctilcum_Alloc),
  thetasX=thetasCumShifts,
  thetasY=thetasSickProb,
  labelX=r'$\theta^{CumShifts}$',
  labelY=r'$\theta^{SickProb}$',
  title="Standard error of the cumulative reward"+f"\n $L={L}, T={T}$, "+ \
    r"$\theta^* =$("+ str(f'{best_theta_Alloc[0]:,.2f}')+", "+ str(f'{best_theta_Alloc[1]:,.2f}')+"), " \
    r"$\tilde{C}^{cum}(\theta^*) =$"+f"{best_Ctilcum_Alloc:,.0f}\n"
);

P.plot_expFhat_chart(
  df=thetaStar_expCbarcum_Alloc,
  labelX=r'$\ell$',
  labelY=r"$exp\bar{C}^{cum}(\theta^*)$"+"\n(Profit)\n[$]",
  title="Expanding sample mean of the cumulative reward"+f"\n $L={L}, T={T}$, "+ \
    r"$\theta^* =$("+ str(f'{best_theta_Alloc[0]:,.2f}')+", "+ str(f'{best_theta_Alloc[1]:,.2f}')+"), " \
    r"$\bar{C}^{cum}(\theta^*) =$"+f"{best_Cbarcum_Alloc:,.0f}\n",
  color_style='b-'
)
print()
P.plot_expFhat_chart(
  df=thetaStar_expCtilcum_Alloc,
  labelX=r'$\ell$',
  labelY=r"$exp\bar{C}^{cum}(\theta^*)$"+"\n(Profit)\n[$]",
  title="Expanding standard error of the cumulative reward"+f"\n $L={L}, T={T}$, "+ \
    r"$\theta^* =$("+ str(f'{best_theta_Alloc[0]:,.2f}')+", "+ str(f'{best_theta_Alloc[1]:,.2f}')+"), " \
    r"$\tilde{C}^{cum}(\theta^*) =$"+f"{best_Ctilcum_Alloc:,.0f}\n",
  color_style='b--'
)

f'{len(record_Alloc):,}', L, T

('35,000', 50, 14)

best_theta_Alloc

Theta(thCumShifts=0.98, thSickProb=0.020000000000000018)

P.plot_records(
  df=df_first_n_t,
  df_non=None,
  pars=defaultdict(str, {
    'thCumShifts': best_theta_Alloc.thCumShifts,
    'thSickProb': best_theta_Alloc.thSickProb,
    ## 'legendLabels': [r'$\mathrm{opt}$', r'$\mathrm{non}$'],
    'suptitle': f'TRAINING OF X__Alloc POLICY'+'\n'+f'(first {first_n_t} records)'+'\n'+ \
    f'L = {L}, T = {T}, '+ \
    r"$\theta^*=$("+ str(f'{best_theta_Alloc[0]:,.2f}')+", "+ str(f'{best_theta_Alloc[1]:,.2f}')+")"
  }),
);

P.plot_records(
  df=df_last_n_t,
  df_non=None,
  pars=defaultdict(str, {
    'thCumShifts': best_theta_Alloc.thCumShifts,
    'thSickProb': best_theta_Alloc.thSickProb,
    ## 'legendLabels': [r'$\mathrm{opt}$', r'$\mathrm{non}$'],
    'suptitle': f'TRAINING OF X__Alloc POLICY'+'\n'+f'(last {last_n_t} records)'+'\n'+ \
    f'L = {L}, T = {T}, '+ \
    r"$\theta^*=$("+ str(f'{best_theta_Alloc[0]:,.2f}')+", "+ str(f'{best_theta_Alloc[1]:,.2f}')+")"
  }),
)

4.6.2 Evaluation

4.6.2.1 X__AllocBelow

best_theta_Alloc

Theta(thCumShifts=0.98, thSickProb=0.020000000000000018)

worst_theta_Alloc

Theta(thCumShifts=0.0, thSickProb=1.0)

%%time
L = 20 #20pub #20db #N_SAMPLEPATHS
T = 14 #20pub #10db #N_TRANSITIONS
first_n_t = int(.2*L*T)

M = Model()
P = Policy(M)
SIM = DemandSimulator(seed=SEED_EVALU)
thetasOpt = []; thetasOpt.append(best_theta_Alloc)
thetaStar_expCbarcum_Alloc_evalu_opt, thetaStar_expCtilcum_Alloc_evalu_opt, \
_, _, \
best_theta_Alloc_evalu_opt, worst_theta_Alloc_evalu_opt, \
_, _, \
_, _, \
record_Alloc_evalu_opt = \
  P.perform_grid_search_sample_paths('X__Alloc', thetasOpt)
df_Alloc_evalu_opt = pd.DataFrame.from_records(
    record_Alloc_evalu_opt[:first_n_t], columns=labels)

M = Model()
P = Policy(M)
SIM = DemandSimulator(seed=SEED_EVALU)
thetasNon = []; thetasNon.append(worst_theta_Alloc)
## thetasNon = []; thetasNon.append(
#   P.build_theta(
#     {'thCumShifts': 1.0, 'thSickProb': 1.0}
#   )
# )
thetaStar_expCbarcum_Alloc_evalu_non, thetaStar_expCtilcum_Alloc_evalu_non, \
_, _, \
best_theta_Alloc_evalu_non, worst_theta_Alloc_evalu_non, \
_, _, \
_, _, \
record_Alloc_evalu_non = \
  P.perform_grid_search_sample_paths('X__Alloc', thetasNon)
df_Alloc_evalu_non = pd.DataFrame.from_records(
    record_Alloc_evalu_non[:first_n_t], columns=labels)

print(
  f'{thetaStar_expCbarcum_Alloc_evalu_opt.iloc[-1]=:.2f}, \
    {thetaStar_expCbarcum_Alloc_evalu_non.iloc[-1]=:.2f}')

numThetas=1
... printing every 1th theta (if considered) ...
=== (0 / 1), theta=Theta(thCumShifts=0.98, thSickProb=0.020000000000000018) ===
numThetas=1
... printing every 1th theta (if considered) ...
=== (0 / 1), theta=Theta(thCumShifts=0.0, thSickProb=1.0) ===
thetaStar_expCbarcum_Alloc_evalu_opt.iloc[-1]=725.49,     thetaStar_expCbarcum_Alloc_evalu_non.iloc[-1]=-42.50
CPU times: user 21.3 s, sys: 199 ms, total: 21.5 s
Wall time: 21.6 s

P.plot_records(
  df=df_Alloc_evalu_opt,
  df_non=df_Alloc_evalu_non,
  pars=defaultdict(str, {
    'thCumShifts': best_theta_Alloc_evalu_opt.thCumShifts,
    'thSickProb': best_theta_Alloc_evalu_opt.thSickProb,
    'thCumShiftsNon': best_theta_Alloc_evalu_non.thCumShifts,
    'thSickProbNon': best_theta_Alloc_evalu_non.thSickProb,
    'legendLabels': [r'$\mathrm{opt}$', r'$\mathrm{non}$'],
    'suptitle': f'EVALUATION OF X__Alloc POLICY'+'\n'+f'(first {first_n_t} records)'+'\n'+ \
    f'L = {L}, T = {T}, '+ \
    r"$\theta^*=$("+ str(f'{best_theta_Alloc_evalu_opt[0]:,.2f}')+", "+ str(f'{best_theta_Alloc_evalu_opt[1]:,.2f}')+")"
  }),
)

## last_n_l = int(.99*L)
last_n_l = int(1.0*L)
P.plot_expFhat_charts(
  means={
      'AllocBelow optimal': thetaStar_expCbarcum_Alloc_evalu_opt[-last_n_l:],
      'AllocBelow non-optimal': thetaStar_expCbarcum_Alloc_evalu_non[-last_n_l:],
  },
  stdvs={
      'AllocBelow optimal': thetaStar_expCtilcum_Alloc_evalu_opt[-last_n_l:],
      'AllocBelow non-optimal': thetaStar_expCtilcum_Alloc_evalu_non[-last_n_l:],
  },
  labelX='Sample paths, ' + r'$\ell$',
  labelY='Profit\n[Allocs]',
  suptitle=f"Comparison of Optimal/Non-optimal Policies after Evaluation\n \
    L = {L}, T = {T}\n \
    last {last_n_l} records\n \
    ('exp' refers to expanding)",
  pars=defaultdict(str, {
    'colors': ['m', 'c']
  }),
)

Next, we evaluate with a single sample-path:

%%time
L = 1 #N_SAMPLEPATHS
T = 2*7 #N_TRANSITIONS
first_n_t = int(1*L*T)

M = Model()
P = Policy(M)
SIM = DemandSimulator(seed=SEED_EVALU)
thetasOpt = []; thetasOpt.append(best_theta_Alloc)
thetaStar_expCbarcum_Alloc_evalu_opt, thetaStar_expCtilcum_Alloc_evalu_opt, \
_, _, \
best_theta_Alloc_evalu_opt, worst_theta_Alloc_evalu_opt, \
_, _, \
_, _, \
record_Alloc_evalu_opt = \
  P.perform_grid_search_sample_paths('X__Alloc', thetasOpt)
df_Alloc_evalu_opt = pd.DataFrame.from_records(
    record_Alloc_evalu_opt[:first_n_t], columns=labels)

M = Model()
P = Policy(M)
SIM = DemandSimulator(seed=SEED_EVALU)
thetasNon = []; thetasNon.append(worst_theta_Alloc)
## thetasNon = []; thetasNon.append(
#   P.build_theta(
#     {'thHi': 9}
#   )
# )
thetaStar_expCbarcum_Alloc_evalu_non, thetaStar_expCtilcum_Alloc_evalu_non, \
_, _, \
best_theta_Alloc_evalu_non, worst_theta_Alloc_evalu_non, \
_, _, \
_, _, \
record_Alloc_evalu_non = \
  P.perform_grid_search_sample_paths('X__Alloc', thetasNon)
df_Alloc_evalu_non = pd.DataFrame.from_records(
    record_Alloc_evalu_non[:first_n_t], columns=labels)

print(
  f'{thetaStar_expCbarcum_Alloc_evalu_opt.iloc[-1]=:.2f}, \
    {thetaStar_expCbarcum_Alloc_evalu_non.iloc[-1]=:.2f}')

numThetas=1
... printing every 1th theta (if considered) ...
=== (0 / 1), theta=Theta(thCumShifts=0.98, thSickProb=0.020000000000000018) ===
numThetas=1
... printing every 1th theta (if considered) ...
=== (0 / 1), theta=Theta(thCumShifts=0.0, thSickProb=1.0) ===
thetaStar_expCbarcum_Alloc_evalu_opt.iloc[-1]=720.00,     thetaStar_expCbarcum_Alloc_evalu_non.iloc[-1]=-72.80
CPU times: user 1.02 s, sys: 11.1 ms, total: 1.03 s
Wall time: 1.04 s

RuntimeWarning: invalid value encountered in double_scalars
  Ctilcum_tmp = np.sum(np.square(np.array(CcumIomega__lI) - Cbarcum_tmp))/(L - 1)
<ipython-input-13-eee40a8d5048>:136: RuntimeWarning: invalid value encountered in double_scalars
  Ctilcum_tmp = np.sum(np.square(np.array(CcumIomega__lI) - Cbarcum_tmp))/(L - 1)

P.plot_records(
  df=df_Alloc_evalu_opt,
  ## df_non=df_Alloc_evalu_non,
  df_non=None,
  pars=defaultdict(str, {
    'thCumShifts': best_theta_Alloc_evalu_opt.thCumShifts,
    'thSickProb': best_theta_Alloc_evalu_opt.thSickProb,
    'thCumShiftsNon': best_theta_Alloc_evalu_non.thCumShifts,
    'thSickProbNon': best_theta_Alloc_evalu_non.thSickProb,
    'legendLabels': [r'$\mathrm{opt}$', r'$\mathrm{non}$'],
    'suptitle': f'EVALUATION OF X__Alloc POLICY'+'\n'+f'(first {first_n_t} records)'+'\n'+ \
    f'L = {L}, T = {T}, '+ \
    r"$\theta^*=$("+ str(f'{best_theta_Alloc_evalu_opt[0]:,.2f}')+", "+ str(f'{best_theta_Alloc_evalu_opt[1]:,.2f}')+")"
  }),
)

df_Alloc_evalu_opt[df_Alloc_evalu_opt['t']==T-1][['Ccum']]

	Ccum
13	720.0040

df_Alloc_evalu_non[df_Alloc_evalu_non['t']==T-1][['Ccum']]

	Ccum
13	-72.8000

From the Ccum plot we see that the cumulative reward for the optimal policy keeps on rising. The non-optimal policy does not do as well.

5 EVALUATION

mask = df_Alloc_evalu_opt.columns.str.contains('Allocd_t')
resource_allocs = list(df_Alloc_evalu_opt.columns[mask])
resource_allocs

['Allocd_t_1_Courtesy___Courtesy',
 'Allocd_t_2_Courtesy___Courtesy',
 'Allocd_t_3_Courtesy___Courtesy',
 'Allocd_t_4_Courtesy___Courtesy',
 'Allocd_t_5_Courtesy___Courtesy',
 'Allocd_t_6_Courtesy___Courtesy',
 'Allocd_t_7_Courtesy___Courtesy',
 'Allocd_t_8_Stocker___Stocker',
 'Allocd_t_9_Stocker___Stocker',
 'Allocd_t_10_Stocker___Stocker']

sched = copy(df_Alloc_evalu_opt)
schedule = sched[['dt']+resource_allocs]
schedule

	dt	Allocd_t_1_Courtesy___Courtesy	Allocd_t_2_Courtesy___Courtesy	Allocd_t_3_Courtesy___Courtesy	Allocd_t_4_Courtesy___Courtesy	Allocd_t_5_Courtesy___Courtesy	Allocd_t_6_Courtesy___Courtesy	Allocd_t_7_Courtesy___Courtesy	Allocd_t_8_Stocker___Stocker	Allocd_t_9_Stocker___Stocker	Allocd_t_10_Stocker___Stocker
0	2023-10-29	True	False	False	False	False	False	False	True	False	False
1	2023-10-30	False	True	False	False	False	False	False	True	True	False
2	2023-10-31	True	True	False	True	False	False	False	True	True	False
3	2023-11-01	False	True	True	True	False	True	True	True	True	False
4	2023-11-02	False	True	True	True	True	False	False	False	True	True
5	2023-11-03	False	True	False	False	True	True	False	True	True	False
6	2023-11-04	False	False	True	False	False	False	False	True	False	True
7	2023-11-05	True	False	False	False	False	True	False	True	True	True
8	2023-11-06	False	True	True	True	True	False	True	True	True	False
9	2023-11-07	False	False	True	False	False	False	False	True	True	False
10	2023-11-08	False	True	False	True	False	False	True	False	False	False
11	2023-11-09	False	True	True	True	False	False	False	False	False	False
12	2023-11-10	False	True	False	False	True	True	False	False	True	False
13	2023-11-11	False	False	True	True	False	False	True	True	False	True

## print out the 14-day (2-week) schedule
print(f"SCHEDULE:")
for res_alloc in resource_allocs:
  _,_,id,resType,_,_,_ = res_alloc.split('_')
  resName = id+'_'+resType
  print(f'\n{resName}:')
  sched_list = list(schedule.loc[
    schedule[res_alloc] == True,
    ['dt', res_alloc]
  ]['dt'])
  print(f"{[ts.strftime('%a %b %d %Y') for ts in sched_list]}")

SCHEDULE:

1_Courtesy:
['Sun Oct 29 2023', 'Tue Oct 31 2023', 'Sun Nov 05 2023']

2_Courtesy:
['Mon Oct 30 2023', 'Tue Oct 31 2023', 'Wed Nov 01 2023', 'Thu Nov 02 2023', 'Fri Nov 03 2023', 'Mon Nov 06 2023', 'Wed Nov 08 2023', 'Thu Nov 09 2023', 'Fri Nov 10 2023']

3_Courtesy:
['Wed Nov 01 2023', 'Thu Nov 02 2023', 'Sat Nov 04 2023', 'Mon Nov 06 2023', 'Tue Nov 07 2023', 'Thu Nov 09 2023', 'Sat Nov 11 2023']

4_Courtesy:
['Tue Oct 31 2023', 'Wed Nov 01 2023', 'Thu Nov 02 2023', 'Mon Nov 06 2023', 'Wed Nov 08 2023', 'Thu Nov 09 2023', 'Sat Nov 11 2023']

5_Courtesy:
['Thu Nov 02 2023', 'Fri Nov 03 2023', 'Mon Nov 06 2023', 'Fri Nov 10 2023']

6_Courtesy:
['Wed Nov 01 2023', 'Fri Nov 03 2023', 'Sun Nov 05 2023', 'Fri Nov 10 2023']

7_Courtesy:
['Wed Nov 01 2023', 'Mon Nov 06 2023', 'Wed Nov 08 2023', 'Sat Nov 11 2023']

8_Stocker:
['Sun Oct 29 2023', 'Mon Oct 30 2023', 'Tue Oct 31 2023', 'Wed Nov 01 2023', 'Fri Nov 03 2023', 'Sat Nov 04 2023', 'Sun Nov 05 2023', 'Mon Nov 06 2023', 'Tue Nov 07 2023', 'Sat Nov 11 2023']

9_Stocker:
['Mon Oct 30 2023', 'Tue Oct 31 2023', 'Wed Nov 01 2023', 'Thu Nov 02 2023', 'Fri Nov 03 2023', 'Sun Nov 05 2023', 'Mon Nov 06 2023', 'Tue Nov 07 2023', 'Fri Nov 10 2023']

10_Stocker:
['Thu Nov 02 2023', 'Sat Nov 04 2023', 'Sun Nov 05 2023', 'Sat Nov 11 2023']