Dealership Inventory Management using the Powell Unified Framework (Part 1)
Using Sequential Decision Analytics to find ongoing optimal decisions
Automotive Industry
Inventory Management
Powell Unified Framework
Reinforcement Learning
Python
Author
Kobus Esterhuysen
Published
September 21, 2022
0 STRUCTURE & FRAMEWORK
The overall structure of this project and report follows the traditional CRISP-DM format. However, instead of CRISP-DM's "4 Modeling" section, we insert the "6 step modeling process" of Dr. Warren Powell in section 4 of this document. Dr. Powell's unified framework shows great promise for unifying the formalisms of at least a dozen different fields. Using his framework gives easier access to thinking patterns in these other fields that might be beneficial and informative for the sequential decision problem at hand. Traditionally, this kind of problem would be approached from the reinforcement learning perspective, but Dr. Powell's wider and more comprehensive perspective almost certainly provides additional value.
In order to make a strong mapping between the code in this notebook and the mathematics in the Powell Unified Framework (PUF), we use the following conventions for Python identifier names:
Superscripts
variable names have a double underscore to indicate a superscript
\(X^{\pi}\): has code X__pi, is read 'X pi'
Subscripts
variable names have a single underscore to indicate a subscript
\(S_t\): has code S_t, is read ‘S at t’
\(M^{Spend}_t\): has code M__Spend_t, is read 'MSpend at t'
Arguments
collection variable names may have argument information added
\(X^{\pi}(S_t)\): has code X__piIS_tI, is read ‘X pi in S at t’
the surrounding I’s are used to imitate the parentheses around the argument
Next time/iteration
variable names that indicate one step in the future are quite common
\(R_{t+1}\): has code R_tt1, is read ‘R at t+1’
\(R^{n+1}\): has code R__nt1, is read ‘R at n+1’
Rewards
State-independent terminal reward and cumulative reward
\(F\): has code F for terminal reward
\(\sum_{n}F\): has code cumF for cumulative reward
State-dependent terminal reward and cumulative reward
\(C\): has code C for terminal reward
\(\sum_{t}C\): has code cumC for cumulative reward
Vectors where components use different names
\(S_t(R_t, p_t)\): has code S_t.R_t and S_t.p_t, is read ‘S at t in R at t, and, S at t in p at t’
the code implementation is by means of a named tuple
self.State = namedtuple('State', SVarNames) for the ‘class’ of the vector
self.S_t for the ‘instance’ of the vector
Vectors where components reuse names
\(x_t(x_{t,GB}, x_{t,BL})\): has code x_t.x_t_GB and x_t.x_t_BL, is read ‘x at t in x at t for GB, and, x at t in x at t for BL’
the code implementation is by means of a named tuple
self.Decision = namedtuple('Decision', xVarNames) for the ‘class’ of the vector
self.x_t for the ‘instance’ of the vector
Use of mixed-case variable names
to reduce confusion, the use of mixed-case variable names is sometimes preferred (even though this is not best practice in the Python community), reserving underscores and double underscores for math-related variables
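To make these conventions concrete, here is a small standalone illustration (made-up values; this is not code from the model below):

from collections import namedtuple

# S_t = (R_t, D_t): inventory level and demand at time t
State = namedtuple('State', ['R_t', 'D_t'])
S_t = State(R_t=12, D_t=7)

# x_t has a single component, x_t_ELA (number of cars ordered)
Decision = namedtuple('Decision', ['x_t_ELA'])
x_t = Decision(x_t_ELA=5)

# X__piIS_tI is read 'X pi in S at t', i.e. X^pi(S_t)
def X__piIS_tI(S_t):
    return Decision(x_t_ELA=max(0, S_t.D_t - S_t.R_t))

print(S_t.R_t, x_t.x_t_ELA, X__piIS_tI(S_t))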
1 BUSINESS UNDERSTANDING
Inventory management is a critical component of any business, whether it be a small retail store or a multinational corporation. At its core, inventory management is the process of tracking and controlling a company’s inventory, from raw materials to finished products. Proper inventory management is important for several reasons.
First and foremost, inventory management helps businesses avoid overstocks and stockouts. By tracking inventory levels and forecasting demand, businesses can ensure that they always have the right amount of product on hand to meet customer needs without overbuying and tying up capital in excess inventory. This helps businesses maintain a healthy cash flow and avoid costly stockouts that can result in lost sales and dissatisfied customers.
In addition, effective inventory management can help businesses streamline their operations and improve their overall efficiency. By reducing excess inventory and optimizing order quantities and lead times, businesses can minimize waste and improve their supply chain management. This can lead to cost savings, improved profitability, and increased customer satisfaction.
Finally, inventory management is critical for businesses that need to comply with regulatory requirements, such as those in the pharmaceutical or food industries. Proper inventory tracking and documentation can help businesses meet these requirements and avoid costly fines and penalties.
Overall, inventory management is an essential function for any business that wants to operate efficiently, meet customer demand, and maximize profitability. Effective inventory management requires careful planning, accurate data, and the right tools and processes to ensure that businesses always have the right amount of product on hand, at the right time, and at the right cost.
In this project, the client needed to be convinced of the benefits of formal, optimized sequential decision making. This was provided in the form of a series of proofs of concept (POCs).
2 DATA UNDERSTANDING
Next, we will look at how we will simulate the data for this problem.
from collections import namedtuple, defaultdict
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from copy import copy
import time
from scipy.ndimage import shift # the scipy.ndimage.interpolation namespace is deprecated
import pickle
from bisect import bisect
import math
from pprint import pprint
import matplotlib as mpl
pd.options.display.float_format = '{:,.4f}'.format
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', None)
! python --version
Python 3.10.11
We will simulate the inventory demand, \(D\), by: \[
D_{t+1} = \theta_0 D_t + \theta_1 D_{t-1} + \theta_2 D_{t-2} + \epsilon_{pois(5)}
\] where \[
\begin{aligned}
\theta_0 &= 0.14 \\
\theta_1 &= 0.31 \\
\theta_2 &= 0.25 \\
\end{aligned}
\]
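The DemandSimulator class itself is not shown in this notebook excerpt; the following is a minimal sketch of what it could look like, given the equation above and the simulate() call used in later cells (the class internals and constructor arguments are assumptions):

import numpy as np

class DemandSimulator:
    # SKETCH (assumed implementation): third-order autoregressive demand
    # with additive Poisson(5) noise, as in the equation above
    def __init__(self, thetas=(0.14, 0.31, 0.25), lam=5, seed=None):
        self.thetas = thetas
        self.lam = lam
        self.prng = np.random.RandomState(seed)
        self.history = [0, 0, 0]  # D_t, D_{t-1}, D_{t-2}

    def simulate(self):
        th0, th1, th2 = self.thetas
        eps = self.prng.poisson(self.lam)  # epsilon ~ pois(5)
        D_tt1 = int(round(th0*self.history[0] + th1*self.history[1] + th2*self.history[2] + eps))
        self.history = [D_tt1] + self.history[:2]
        return D_tt1

dem_sim = DemandSimulator(seed=189654913)  # seed value taken from the params cell below
print([dem_sim.simulate() for _ in range(10)])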
# NOTE:
# R__max: maximum number of inventory units
# R_0: initial number of inventory units
parDf = pd.read_excel(f'{base_dir}/{file}', sheet_name='ParamsModel', index_col=0); print(f'{parDf}')
parDict = parDf.T.to_dict('list') #.
params = {key:v for key, value in parDict.items() for v in value}
params['seed'] = seed
params['T'] = min(params['T'], 192); print(f'{params=}')
                    0
Index
Algorithm  GridSearch
T                 195
eta                 1
R__max             57
R_0                 0
params={'Algorithm': 'GridSearch', 'T': 192, 'eta': 1, 'R__max': 57, 'R_0': 0, 'seed': 189654913}
parDf = pd.read_excel(f'{base_dir}/{file}', sheet_name='GridSearch', index_col=0); print(parDf)
parDict = parDf.T.to_dict('list')
paramsPolicy = {key:v for key, value in parDict.items() for v in value}; print(f'{paramsPolicy=}')
params.update(paramsPolicy); pprint(f'{params=}')
3 DATA PREPARATION
We will use the data provided by the simulator directly. There is no need to perform additional data preparation.
4 MODELING
4.1 Narrative
This first project in the Inventory Series tackles the simplest of inventory problems.
We have the following setting: Mr. Optimal is the car-lot inventory manager for the largest dealership in a big city. For now, he is only responsible for managing the inventory level of a single car model, the Hyundai Elantra. He has a maximum number of lot spaces assigned to him (57). One option is to strive to keep these spaces occupied by new cars at all times. This way he is unlikely to run out of stock and lose a sale. However, capital is tied up by the unsold inventory in his lot spaces.
At the other extreme, he may choose to work on a just-in-time principle: each time a potential customer expresses interest in this model, the customer has to wait until a new car is obtained from the supplier. He will likely lose the sale, but the upside is that no capital is tied up in inventory.
It seems intuitive that the optimal inventory level lies somewhere between these extremes. The challenge is to find that level. For now, we assume that the buy and sell prices remain constant. The only random variable is the demand for this specific model. We also assume that ordered inventory arrives immediately, and that unsatisfied demand is lost, i.e. there is no ability to backlog unsatisfied demand.
4.2 Core Elements
This section attempts to answer three important questions:
- What metrics are we going to track?
- What decisions do we intend to make?
- What are the sources of uncertainty?
For this problem, the only metric we are interested in is the amount of profit we make after each decision window. A single type of decision needs to be made at the start of each window: how many new cars to order. The only source of uncertainty is the level of demand for this model.
4.3 Mathematical Model | SUM Design
A Python class is used to implement the model for the SUM (System Under Management); the complete implementation appears at the end of this section.
4.3.3 Exogenous information
The exogenous information variables represent what we did not know when we made a decision. These are the variables that we cannot control directly. The information in these variables becomes available only after we make the decision \(x_t\).
We assume that any unsatisfied demand is lost. Additionally, we assume that the demand in each time period is revealed, so that we have:
\[
W_{t+1} = \hat{D}_{t+1}=D_{t+1}
\]
The exogenous information is obtained by a call to DemandSimulator.simulate(...) on the module-level instance dem_sim.
The latest exogenous information can be accessed by calling the following method from class InventoryStorageModel():
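def W_fn(self, t): #. sample the next demand from the demand simulator
    W_tt1 = dem_sim.simulate()
    return W_tt1

(This method is excerpted from the complete InventoryStorageModel listing at the end of this section.)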
4.3.4 Transition function
The transition function describes how the state variables evolve over time. Because the state currently has two variables, \(S_t=(R_t,D_t)\), we have the equations:
\[
\begin{aligned}
R_{t+1} &= R_t - \min\{R_t,D_t\} + x_t \quad &\mathrm{(Eq.\ 1)} \\
D_{t+1} &= \hat{D}_{t+1} \quad &\mathrm{(Eq.\ 2)}
\end{aligned}
\]
Note that x_t.x_t_ELA is currently the only component of \(x_t\). The ELA stands for Elantra.
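The transition is implemented by the following method, excerpted from the complete class listing at the end of this section; note the max with 0 that keeps the inventory level non-negative:

def S__M_fn(self, t, x_t): #. transition function
    D_tt1 = self.W_fn(t)
    R_tt1 = max(0, self.S_t.R_t - min(self.S_t.R_t, self.S_t.D_t) + x_t.x_t_ELA) #max to keep >= 0
    if len(self.SVarNames) == 2:
        S_tt1 = self.build_state({'R_t': R_tt1, 'D_t': D_tt1})
    elif len(self.SVarNames) == 3:
        S_tt1 = self.build_state({'R_t': R_tt1, 'D_t': D_tt1, 'D_t_1': self.S_t.D_t})
    return S_tt1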
4.3.5 Objective function
The objective function captures the performance metrics of the solution to the problem.
We can write the state-dependent reward (also called the contribution) based on what we will receive between \(t-1\) and \(t\) (i.e. looking backward relative to \((S_t,x_t)\)):
\[
\begin{aligned}
C(S_t,x_t) = p^{sell}\min\{R_t,D_t\} - p^{buy}x_t
\end{aligned}
\] This is a deterministic expression.
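A quick numeric check of this backward form, using the prices set in section 4.6 and illustrative state and decision values:

p__sell, p__buy = 23_470, 19_300  # dollars, from the parameter cell in section 4.6
R_t, D_t, x_t_ELA = 10, 7, 5      # illustrative values
C = p__sell*min(R_t, D_t) - p__buy*x_t_ELA
print(C)  # 23,470*7 - 19,300*5 = 67,790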
Alternatively, we can write the state-dependent reward based on what we will receive between \(t\) and \(t+1\) (i.e. looking forward relative to \((S_t,x_t)\)):
\[
\begin{aligned}
C(S_t,x_t,\hat{D}_{t+1}) &= p^{sell}\min\{R_{t+1},D_{t+1}\} - p^{buy}x_t \\
&= p^{sell}\min\{(R_t-\min\{R_t,D_t\}+x_t), \hat{D}_{t+1}\} - p^{buy}x_t \quad \mathrm{(RLSO\text{-}Eq8.5)}
\end{aligned}
\]
because, from (Eq. 1) and (Eq. 2) above:
\[
\begin{aligned}
R_{t+1} &= R_t - \min\{R_t,D_t\}+x_t \quad &\mathrm{(Eq.\ 1)} \\
D_{t+1} &= \hat{D}_{t+1} \quad &\mathrm{(Eq.\ 2)}
\end{aligned}
\]
This is a stochastic expression due to its dependence on the random variable \(\hat{D}_{t+1}\): it comes from a stochastic process, and it also lies in the future.
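Such a stochastic expression can only be evaluated in expectation. A minimal Monte Carlo sketch, reusing the hypothetical DemandSimulator sketched in section 2 (values again illustrative):

p__sell, p__buy = 23_470, 19_300
R_t, D_t, x_t_ELA = 10, 7, 5
R_tt1 = R_t - min(R_t, D_t) + x_t_ELA  # (Eq. 1)
sim = DemandSimulator(seed=42)
samples = [p__sell*min(R_tt1, sim.simulate()) - p__buy*x_t_ELA for _ in range(10_000)]
print(np.mean(samples))  # estimate of E[C(S_t, x_t, Dhat_{t+1})]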
Here is the complete implementation of the InventoryStorageModel class:
class InventoryStorageModel():
    def __init__(self, SVarNames, xVarNames, S_0, params, exogParams, possibleDecisions,
                 p__buy, p__sell, W_fn=None, S__M_fn=None, C_fn=None):
        self.initArgs = params
        self.prng = np.random.RandomState(params['seed'])
        self.exogParams = exogParams
        self.S_0 = S_0
        self.SVarNames = SVarNames
        self.xVarNames = xVarNames
        self.possibleDecisions = possibleDecisions
        self.p__buy = p__buy
        self.p__sell = p__sell
        self.State = namedtuple('State', SVarNames) #. 'class'
        self.S_t = self.build_state(self.S_0) #. 'instance'
        self.Decision = namedtuple('Decision', xVarNames) #. 'class'
        self.cumC = 0.0 #. cumulative reward; use F or cumF for final (i.e. non-cumulative) reward

    def reset(self):
        self.cumC = 0.0
        self.S_t = self.build_state(self.S_0)

    def build_state(self, info):
        return self.State(*[info[sn] for sn in self.SVarNames])

    def build_decision(self, info):
        return self.Decision(*[info[xn] for xn in self.xVarNames])

    def W_fn(self, t): #. exogenous information: sample the next demand
        W_tt1 = dem_sim.simulate()
        return W_tt1

    def S__M_fn(self, t, x_t): #. transition function
        D_tt1 = self.W_fn(t)
        R_tt1 = max(0, self.S_t.R_t - min(self.S_t.R_t, self.S_t.D_t) + x_t.x_t_ELA) #max to keep >= 0
        if len(self.SVarNames) == 2:
            S_tt1 = self.build_state({'R_t': R_tt1, 'D_t': D_tt1})
        elif len(self.SVarNames) == 3:
            S_tt1 = self.build_state({'R_t': R_tt1, 'D_t': D_tt1, 'D_t_1': self.S_t.D_t})
        return S_tt1

    # based on what we will receive between t and t+1 (i.e. looking *forward* relative to (S_t,x_t)) #.
    # RLSO-Eq8.5
    def C_fn(self, x_t):
        Dhat_tt1 = dem_sim.simulate() # NOTE: draws its own demand sample, separate from the draw in S__M_fn()
        C = \
            self.p__sell*min((self.S_t.R_t - min(self.S_t.R_t, self.S_t.D_t) + x_t.x_t_ELA), Dhat_tt1) \
            - self.p__buy*x_t.x_t_ELA
        return C

    def step(self, t, x_t):
        self.cumC += self.C_fn(x_t)
        self.S_t = self.S__M_fn(t, x_t)
        return (self.S_t, self.cumC, x_t) #. for plotting
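As a quick smoke test of the class (this cell is not part of the original notebook; it assumes a dem_sim instance exists, e.g. the hypothetical DemandSimulator sketched in section 2):

M_test = InventoryStorageModel(
    SVarNames=['R_t', 'D_t'], xVarNames=['x_t_ELA'],
    S_0={'R_t': 0, 'D_t': dem_sim.simulate()},
    params={'seed': 0}, exogParams={}, possibleDecisions=None,
    p__buy=19_300, p__sell=23_470)
x_t = M_test.build_decision({'x_t_ELA': 10})
print(M_test.step(0, x_t))  # -> (S_1, cumulative C after one step, x_0)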
4.4 Uncertainty Model
As described in section 2, we will simulate the inventory demand, \(D\), by: \[
D_{t+1} = \theta_0 D_t + \theta_1 D_{t-1} + \theta_2 D_{t-2} + \epsilon_{pois(5)}
\]
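Since \(\theta_0+\theta_1+\theta_2=0.70<1\), the demand process is stable, and taking expectations of both sides gives its steady-state mean:
\[
\mathbb{E}[D] = \frac{\mathbb{E}[\epsilon_{pois(5)}]}{1-(\theta_0+\theta_1+\theta_2)} = \frac{5}{1-0.70} \approx 16.7
\]
so we should expect demand of roughly 17 cars per decision window, comfortably below the lot capacity \(R^{max}=57\).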
4.5 Policy Design
There are two main meta-classes of policy design, and each has two subclasses:
- Policy Search
  - Policy Function Approximations (PFAs)
  - Cost Function Approximations (CFAs)
- Lookahead
  - Value Function Approximations (VFAs)
  - Direct Lookaheads (DLAs)
In this project we will only use one approach:
- A simple buy-below parameterized policy (from the PFA class)
The buy-below policy is implemented by the following method in class InventoryStoragePolicy():
def X__BuyBelow(self, t, S_t, theta, T):
info = {
'x_t_ELA': 0, #number of Elantras ordered
}
if t >= T:
print(f"ERROR: t={t} should not reach or exceed the max steps ({T})")
return self.model.build_decision(info)
theta__buy = theta[0] # theta__buy
if S_t.R_t <= theta__buy: # BUY if R_t <= theta__buy
info['x_t_ELA'] = self.model.initArgs['R__max'] - S_t.R_t
return self.model.build_decision(info)
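In other words, this is an order-up-to rule: whenever the inventory level falls to or below \(\theta^{buy}\), it is replenished to \(R^{max}\). A small illustrative check (assumed values):

R__max, theta__buy = 57, 30  # illustrative
for R_t in (20, 30, 35):
    x_t_ELA = R__max - R_t if R_t <= theta__buy else 0
    print(f'R_t={R_t} -> order {x_t_ELA}')
# R_t=20 -> order 37; R_t=30 -> order 27; R_t=35 -> order 0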
4.5.1 Implementation of Policy Design
The InventoryStoragePolicy() class implements the policy design.
import random

class InventoryStoragePolicy():
    def __init__(self, model, piNames):
        self.model = model
        self.piNames = piNames
        self.Policy = namedtuple('Policy', piNames)

    def X__BuyBelow(self, t, S_t, theta, T):
        info = {
            'x_t_ELA': 0, #number of Elantras ordered
        }
        if t >= T:
            print(f"ERROR: t={t} should not reach or exceed the max steps ({T})")
            return self.model.build_decision(info)
        theta__buy = theta[0]
        if S_t.R_t <= theta__buy: # BUY if R_t <= theta__buy
            info['x_t_ELA'] = self.model.initArgs['R__max'] - S_t.R_t
        return self.model.build_decision(info)

    def run_policy(self, piInfo, piName, params):
        model_copy = copy(self.model)
        T = params['T']
        for t in range(T): #for each transition/step
            x_t = getattr(self, piName)(t, model_copy.S_t, piInfo, T) # piInfo is the theta value
            _, _, _ = model_copy.step(t, x_t)
        cumC = model_copy.cumC
        return cumC

    def perform_grid_search(self, params, thetas):
        tS = time.time()
        cumCI_theta_I = {}
        best_theta = None
        i = 0; print(f'... printing every 100th theta ...')
        for theta in thetas:
            if i%100 == 0: print(f'=== {theta=} ===')
            cumC = self.run_policy(theta, "X__BuyBelow", params)
            cumCI_theta_I[theta] = cumC
            best_theta = max(cumCI_theta_I, key=cumCI_theta_I.get)
            # print(f"Finishing theta {theta} with cumC {cumC:.2f}. Best theta so far {best_theta}. Best cumC {cumCI_theta_I[best_theta]:,}")
            i += 1
        print(f"Finishing GridSearch in {time.time() - tS:.2f} secs")
        print(f"Best theta: {best_theta}. Best cumC: {cumCI_theta_I[best_theta]:,}")
        return cumCI_theta_I, best_theta

    def run_policy_sample_paths(self, theta, piName, params):
        FhatIomega__lI = []
        for l in range(1, params['L'] + 1): #for each sample-path
            model_copy = copy(self.model)
            record_l = [piName, theta, l]
            T = params['T']
            for t in range(T): #for each transition/step
                x_t = getattr(self, piName)(t, model_copy.S_t, theta, T)
                _, _, _ = model_copy.step(t, x_t)
            FhatIomega__lI.append(model_copy.cumC) # just above (SDAM-eq2.9); Fhat for this sample-path is in model_copy.cumC
        return FhatIomega__lI

    def perform_grid_search_sample_paths(self, params, thetas):
        tS = time.time()
        Fhat__meanI_th_I = {}
        Fhat__stdvI_th_I = {}
        best_theta = None
        i = 0; print(f'... printing every 100th theta ...')
        for theta in thetas:
            if i%100 == 0: print(f'=== {theta=} ===')
            FhatIomega__lI = self.run_policy_sample_paths(theta, "X__BuyBelow", params)
            Fhat_mean = np.array(FhatIomega__lI).mean() #. (SDAM-eq2.9); call Fbar in future
            Fhat_var = np.sum(np.square(np.array(FhatIomega__lI) - Fhat_mean))/(params['L'] - 1)
            Fhat__meanI_th_I[theta] = Fhat_mean
            Fhat__stdvI_th_I[theta] = np.sqrt(Fhat_var/params['L'])
            best_theta = max(Fhat__meanI_th_I, key=Fhat__meanI_th_I.get)
            # print(f"Finishing theta {theta}. Best theta so far {best_theta}. Best cumC {Fhat__meanI_th_I[best_theta]:,}")
            i += 1
        print(f"Finishing GridSearch in {time.time() - tS:.2f} secs")
        print(f"Best theta: {best_theta}. Best cumC: {Fhat__meanI_th_I[best_theta]:,}")
        return Fhat__meanI_th_I, Fhat__stdvI_th_I, best_theta

    def grid_search_theta_values(self, thetas1): #. using vectors reduces loops in perform_grid_search_sample_paths()
        thetas = [(th1,) for th1 in thetas1]
        return thetas

    def plot_Fhat_map(self, FhatI_theta_I, thetasX, thetasY, labelX, labelY, title):
        Fhat_values = [FhatI_theta_I[(thetaX, thetaY)] for thetaY in thetasY for thetaX in thetasX]
        Fhats = np.array(Fhat_values)
        increment_count = len(thetasX)
        Fhats = np.reshape(Fhats, (-1, increment_count))
        fig, ax = plt.subplots()
        im = ax.imshow(Fhats, cmap='hot', origin='lower', aspect='auto')
        # create colorbar
        cbar = ax.figure.colorbar(im, ax=ax)
        # cbar.ax.set_ylabel(cbarlabel, rotation=-90, va="bottom")
        # we want to show all ticks...
        ax.set_xticks(np.arange(0, len(thetasX), 5))
        ax.set_yticks(np.arange(0, len(thetasY), 5))
        # ... and label them with the respective list entries
        ax.set_xticklabels(thetasX[::5])
        ax.set_yticklabels(thetasY[::5])
        # rotate the tick labels and set their alignment.
        # plt.setp(ax.get_xticklabels(), rotation=45, ha="right", rotation_mode="anchor")
        ax.set_title(title)
        ax.set_xlabel(labelX)
        ax.set_ylabel(labelY)
        # fig.tight_layout()
        plt.show()
        return True

    def plot_Fhat_chart(self, FhatI_theta_I, thetasX, labelX, labelY, title, color_style):
        mpl.rcParams['lines.linewidth'] = 1.2
        xylabelsize = 18
        plt.figure(figsize=(25, 8))
        plt.title(title, fontsize=20)
        Fhats = FhatI_theta_I.values()
        plt.plot(thetasX, Fhats, color_style)
        plt.xlabel(labelX, rotation=0, ha='right', va='center', fontweight='bold', size=xylabelsize)
        plt.ylabel(labelY, rotation=0, ha='right', va='center', fontweight='bold', size=xylabelsize)
        plt.show()
4.6 Policy Evaluation
4.6.1 Training/Tuning
# UPDATE PARAMETERS
params.update({'Algorithm': 'GridSearch'}); pprint(f'{params=}')
params.update({'R__max': 57})
params.update({'R_0': 0})
params.update({'eta': None})
params.update({'theta_buy_min': 10}) #order level
params.update({'theta_buy_max': 50}) #order level
params.update({'theta_sell_min': None})
params.update({'theta_sell_max': None})
params.update({'L': 2}) #number of sample-paths
params.update({'T': 100_000}) #number of transitions/steps in each sample-path

# ADDITIONAL PARAMETERS
piNames = ['X__BuyBelow']
SVarNames = ['R_t', 'D_t']
S_0 = {
    'R_t': params['R_0'],
    'D_t': dem_sim.simulate(),
}
xVarNames = ['x_t_ELA']
possibleDecisions = None
p__buy = 19_300 #dollars
p__sell = 23_470 #dollars
exogParams = {} # we use simulation
params
# create a model and a policy
M = InventoryStorageModel(
    SVarNames, xVarNames, S_0, params, exogParams, possibleDecisions,
    p__buy, p__sell)
P = InventoryStoragePolicy(M, piNames)
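The grid search itself would then be invoked along these lines (a sketch of a plausible next cell, not shown in this excerpt; it assumes the theta grid is built from theta_buy_min and theta_buy_max set above):

# sweep theta__buy over the configured range, evaluating each policy over L sample-paths
thetas = P.grid_search_theta_values(np.arange(params['theta_buy_min'], params['theta_buy_max'] + 1))
Fhat__meanI_th_I, Fhat__stdvI_th_I, best_theta = P.perform_grid_search_sample_paths(params, thetas)
P.plot_Fhat_chart(Fhat__meanI_th_I, [th[0] for th in thetas], r'$\theta^{buy}$', r'$\bar{F}$',
                  'Mean cumulative profit vs. theta for the X__BuyBelow policy', 'b-')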