Dealership Inventory Management using the Powell Unified Framework (Part 2)

Back to Portfolio of Projects | LearnableLoopAI.com | Blog |

0 INTRODUCTION

In part 1 we had Mr. Optimal manage the number of Elantras on his dealership lot. In this project we expand his responsibility to include the number of Sonatas as well.

The overall structure of this project and report follows the traditional CRISP-DM format. However, in place of CRISP-DM’s “4 Modeling” section, section 4 of this document uses the “6 step modeling process” of Dr. Warren Powell. Dr. Powell’s unified framework shows great promise for unifying the formalisms of at least a dozen different fields. Using his framework gives easier access to thinking patterns from these other fields that may be informative for the sequential decision problem at hand. Traditionally, this kind of problem would be approached from the reinforcement learning perspective; Dr. Powell’s wider and more comprehensive perspective is likely to add value beyond that.

Here is information on Dr. Powell’s perspective on Sequential Decision Analytics.

To make a strong mapping between the code in this notebook and the mathematics of the Powell Unified Framework (PUF), we use the following naming convention for Python identifiers:

Superscripts
- variable names have a double underscore to indicate a superscript
- $X^{π}$ : has code X__pi, is read X pi
Subscripts
- variable names have a single underscore to indicate a subscript
- $S_{t}$ : has code S_t, is read ‘S at t’
- $M_{t}^{Spend}$ : has code M__Spend_t, is read ‘MSpend at t’
Arguments
- collection variable names may have argument information added
- $X^{π} (S_{t})$ : has code X__piIS_tI, is read ‘X pi in S at t’
- the surrounding I’s are used to imitate the parentheses around the argument
Next time/iteration
- variable names that indicate one step in the future are quite common
- $R_{t + 1}$ : has code R_tt1, is read ‘R at t+1’
- $R^{n + 1}$ : has code R__nt1, is read ‘R at n+1’
Rewards
- State-independent terminal reward and cumulative reward
  - $F$ : has code F for terminal reward
  - $\sum_{n} F$ : has code cumF for cumulative reward
- State-dependent terminal reward and cumulative reward
  - $C$ : has code C for terminal reward
  - $\sum_{t} C$ : has code cumC for cumulative reward
Vectors where components use different names
- $S_{t} (R_{t}, p_{t})$ : has code S_t.R_t and S_t.p_t, is read ‘S at t in R at t, and, S at t in p at t’
- the code implementation is by means of a named tuple
  - self.State = namedtuple('State', SVarNames) for the ‘class’ of the vector
  - self.S_t for the ‘instance’ of the vector
Vectors where components reuse names
- $x_{t} (x_{t,GB}, x_{t,BL})$ : has code x_t.x_t_GB and x_t.x_t_BL, is read ‘x at t in x at t for GB, and, x at t in x at t for BL’
- the code implementation is by means of a named tuple
  - self.Decision = namedtuple('Decision', xVarNames) for the ‘class’ of the vector
  - self.x_t for the ‘instance’ of the vector
Use of mixed-case variable names
- to reduce confusion, mixed-case variable names are sometimes preferred (even though this is not a best practice in the Python community), reserving underscores and double underscores for math-related variables
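A minimal sketch of how these conventions look in code. The state and decision names mirror those used later in the InventoryStorageModel class; the numeric values are arbitrary and only illustrate the notation, they are not simulation results.

```python
from collections import namedtuple

# Vector whose components have different names: S_t.R_t_ELA is read 'S at t in R at t for ELA'
SVarNames = ['R_t_ELA', 'R_t_SON', 'D_t_ELA', 'D_t_SON']
State = namedtuple('State', SVarNames)                    # the 'class' of the vector
S_t = State(R_t_ELA=5, R_t_SON=3, D_t_ELA=7, D_t_SON=2)   # the 'instance' of the vector

# Vector whose components reuse the name: x_t.x_t_ELA is read 'x at t in x at t for ELA'
Decision = namedtuple('Decision', ['x_t_ELA', 'x_t_SON'])
x_t = Decision(x_t_ELA=35, x_t_SON=14)

# One step in the future: R_tt1_ELA is read 'R at t+1 for ELA'
R_tt1_ELA = S_t.R_t_ELA - min(S_t.R_t_ELA, S_t.D_t_ELA) + x_t.x_t_ELA

# Superscript via double underscore: p__sellELA stands for p^{sellELA}
p__sellELA = 23_470

# Argument via surrounding I's: cumCI_theta_I is read 'cumC in theta',
# i.e. a collection of cumulative rewards indexed by theta
cumCI_theta_I = {}
cumCI_theta_I[(34, 16)] = 0.0  # placeholder value for illustration only

print(S_t.R_t_ELA, x_t.x_t_SON, R_tt1_ELA)
```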

1 BUSINESS UNDERSTANDING

Inventory management is a critical component of any business, whether it be a small retail store or a multinational corporation. At its core, inventory management is the process of tracking and controlling a company’s inventory, from raw materials to finished products. Proper inventory management is important for several reasons.

First and foremost, inventory management helps businesses avoid stock overages and underages (overstocks and stockouts). By tracking inventory levels and forecasting demand, businesses can ensure that they always have the right amount of product on hand to meet customer needs without overbuying and tying up capital in excess inventory. This helps businesses maintain a healthy cash flow and avoid costly stockouts that can result in lost sales and dissatisfied customers.

In addition, effective inventory management can help businesses streamline their operations and improve their overall efficiency. By reducing excess inventory and optimizing order quantities and lead times, businesses can minimize waste and improve their supply chain management. This can lead to cost savings, improved profitability, and increased customer satisfaction.

Finally, inventory management is critical for businesses that need to comply with regulatory requirements, such as those in the pharmaceutical or food industries. Proper inventory tracking and documentation can help businesses meet these requirements and avoid costly fines and penalties.

Overall, inventory management is an essential function for any business that wants to operate efficiently, meet customer demand, and maximize profitability. Effective inventory management requires careful planning, accurate data, and the right tools and processes to ensure that businesses always have the right amount of product on hand, at the right time, and at the right cost.

In this project, the client needed to be convinced of the benefits of formal, optimized sequential decision making. This was done through a series of proofs of concept (POCs).

2 DATA UNDERSTANDING

Next, we look at how we will simulate the data for this problem.

We will simulate the inventory demand for Elantras, $D^{ELA}$, by: $D_{t+1}^{ELA} = \epsilon_{pois(17)}$

Similarly, the inventory demand for Sonatas, $D^{SON}$, is given by: $D_{t+1}^{SON} = \epsilon_{pois(8)}$

Here $\epsilon_{pois(\lambda)}$ denotes a draw from a Poisson distribution with mean $\lambda$.

The order window is one month, so these simulated values are monthly demands.

from collections import namedtuple, defaultdict
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from copy import copy
import time
from scipy.ndimage import shift
import pickle
from bisect import bisect
import math
from pprint import pprint
import matplotlib as mpl
pd.options.display.float_format = '{:,.4f}'.format
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', None)
! python --version

Python 3.10.11

class DemandSimulator():
    def __init__(self):
        pass

    def simulate(self):
        D_tt1_ELA = np.random.poisson(17)
        D_tt1_SON = np.random.poisson(8)
        return (D_tt1_ELA, D_tt1_SON)

dem_sim = DemandSimulator()
DemandData = []
for i in range(100):
  d_ELA, d_SON = dem_sim.simulate()
  DemandData.append([d_ELA, d_SON])

labels = ['ELA_demand', 'SON_demand']
df = pd.DataFrame.from_records(data=DemandData, columns=labels); df[:10]

	ELA_demand	SON_demand
0	14	9
1	21	9
2	22	15
3	29	12
4	9	11
5	14	13
6	11	8
7	14	10
8	13	10
9	18	8
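The simulator above draws from NumPy's global random state, and as far as this section shows it is never seeded, so its output changes on every run and is not tied to the `seed` parameter read below. A minimal sketch of a hypothetical `SeededDemandSimulator` that owns its own generator (`np.random.default_rng`) and uses the same Poisson means as `DemandSimulator.simulate()` (17 Elantras and 8 Sonatas per month), so that a fixed seed reproduces the same demand path:

```python
import numpy as np

class SeededDemandSimulator:
    """Hypothetical variant of DemandSimulator that owns its random generator."""

    def __init__(self, seed=None, mean_ELA=17, mean_SON=8):
        self.rng = np.random.default_rng(seed)
        self.mean_ELA = mean_ELA
        self.mean_SON = mean_SON

    def simulate(self):
        # one month of demand for each model: Poisson(17) Elantras, Poisson(8) Sonatas
        D_tt1_ELA = int(self.rng.poisson(self.mean_ELA))
        D_tt1_SON = int(self.rng.poisson(self.mean_SON))
        return (D_tt1_ELA, D_tt1_SON)

# two simulators with the same seed produce the same demand path
sim_a = SeededDemandSimulator(seed=189654913)
sim_b = SeededDemandSimulator(seed=189654913)
path_a = [sim_a.simulate() for _ in range(5)]
path_b = [sim_b.simulate() for _ in range(5)]
print(path_a == path_b)

# over many months the sample means should be close to the Poisson means
demands = np.array([sim_a.simulate() for _ in range(100_000)])
print(demands.mean(axis=0))
```

Keeping the generator inside the object also means a model can be handed its own simulator instead of relying on a module-level `dem_sim`.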

def plot_output(df1, df2):
  n_charts = 2
  ylabelsize = 16
  mpl.rcParams['lines.linewidth'] = 1.2
  fig, axs = plt.subplots(n_charts, sharex=True)
  fig.set_figwidth(13); fig.set_figheight(9)
  fig.suptitle('Demand Simulation', fontsize=20)

  i = 0 #ELA Demand
  axs[i].set_title('Demanded Elantras')
  axs[i].set_ylim(auto=True); axs[i].spines['top'].set_visible(False); axs[i].spines['right'].set_visible(True); axs[i].spines['bottom'].set_visible(False)
  axs[i].step(df1['ELA_demand'], 'g')
  axs[i].axhline(y=17, color='g', linestyle=':')

  i = 1 #SON Demand
  axs[i].set_title('Demanded Sonatas')
  axs[i].set_ylim(auto=True); axs[i].spines['top'].set_visible(False); axs[i].spines['right'].set_visible(True); axs[i].spines['bottom'].set_visible(False)
  axs[i].step(df1['SON_demand'], 'b')
  axs[i].axhline(y=8, color='b', linestyle=':')
  axs[i].set_xlabel(r'$t\ \mathrm{[monthly\ order\ windows]}$', rotation=0, ha='center', va='center', fontweight='bold', size=ylabelsize) # raw string avoids invalid escape sequences
plot_output(df, None)

seed = 189654913
file = 'Parameters.xlsx'

# NOTE:
# R__max: maximum number of inventory units
# R_0:    initial number of inventory units
# base_dir: folder that contains Parameters.xlsx (assumed to be set in an earlier cell)
parDf = pd.read_excel(f'{base_dir}/{file}', sheet_name='ParamsModel', index_col=0); print(f'{parDf}')
parDict = parDf.T.to_dict('list') #.
params = {key:v for key, value in parDict.items() for v in value}
params['seed'] = seed
params['T'] = min(params['T'], 192); print(f'{params=}')

                    0
Index                
Algorithm  GridSearch
T                 195
eta                 1
R__max             57
R_0                 0
params={'Algorithm': 'GridSearch', 'T': 192, 'eta': 1, 'R__max': 57, 'R_0': 0, 'seed': 189654913}
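The conversion from the spreadsheet to `params` can be traced without the Excel file. This sketch builds an in-memory DataFrame with the same index and values as the `ParamsModel` printout above (a stand-in for `pd.read_excel`) and applies the same transpose, `to_dict('list')`, and flattening steps:

```python
import pandas as pd

# in-memory stand-in for the 'ParamsModel' sheet (values copied from the printout above)
parDf = pd.DataFrame(
    {0: ['GridSearch', 195, 1, 57, 0]},
    index=pd.Index(['Algorithm', 'T', 'eta', 'R__max', 'R_0'], name='Index'),
)

# transpose so each parameter becomes a column, then map column name -> list of values
parDict = parDf.T.to_dict('list')   # {'Algorithm': ['GridSearch'], 'T': [195], ...}

# unwrap the single-element lists into scalars
params = {key: v for key, value in parDict.items() for v in value}

# cap the horizon, as in the cell above
params['T'] = min(params['T'], 192)
print(params)
```

The flattening comprehension only works as intended because the sheet has a single value column; with more columns, later values would overwrite earlier ones for the same key.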

parDf = pd.read_excel(f'{base_dir}/{file}', sheet_name='GridSearch', index_col=0); print(parDf)
parDict = parDf.T.to_dict('list')
paramsPolicy = {key:v for key, value in parDict.items() for v in value}; print(f'{paramsPolicy=}')
params.update(paramsPolicy); pprint(f'{params=}')

                  0
Index              
theta_sell_min   10
theta_sell_max  100
theta_buy_min    10
theta_buy_max   100
theta_inc         1
paramsPolicy={'theta_sell_min': 10, 'theta_sell_max': 100, 'theta_buy_min': 10, 'theta_buy_max': 100, 'theta_inc': 1}
("params={'Algorithm': 'GridSearch', 'T': 192, 'eta': 1, 'R__max': 57, 'R_0': "
 "0, 'seed': 189654913, 'theta_sell_min': 10, 'theta_sell_max': 100, "
 "'theta_buy_min': 10, 'theta_buy_max': 100, 'theta_inc': 1}")


3 DATA PREPARATION

We will use the data provided by the simulator directly. There is no need to perform additional data preparation.

4 MODELING

4.1 Narrative

As pointed out in the introduction, this second project in the Inventory Series extends the problem of part 1 to handle the inventory management of Sonatas as well as Elantras.

To remind the reader of the setting: Mr. Optimal is the car-lot inventory manager for the largest dealership in a big city. He is responsible for managing the inventory levels of the two Hyundai models mentioned above. He has 57 lot spaces assigned to him, of which he reserves at most 40 for Elantras; the remaining 17 spaces are used for Sonatas. At one extreme, he can strive to keep these spaces occupied by new cars at all times. This way he is unlikely to run out of stock and lose a sale as a result. However, capital is tied up in the unsold inventory on his lot.

At the other extreme, he may choose to work on a just-in-time principle: each time a potential customer expresses interest in a model, the customer has to wait until Mr. Optimal obtains a new car from the supplier. He will likely lose the sale, but the upside is that no capital is tied up in his inventory.

It seems intuitive that the optimal inventory levels lie somewhere between these extremes. The challenge is to find these optimal levels. For now, we assume that the buy and sell prices remain constant, so the only random variables are the demands for the two models. We also assume that ordered inventory arrives immediately and that unsatisfied demand is lost, i.e. there is no backlogging of unsatisfied demand.

4.2 Core Elements

This section attempts to answer three important questions:
- What metrics are we going to track?
- What decisions do we intend to make?
- What are the sources of uncertainty?

For this problem, the only metric we are interested in is the profit we make after each decision window. A single type of decision needs to be made at the start of each window: how many new cars of each model to order. The only source of uncertainty is the demand for each model.

4.3 Mathematical Model | SUM Design

A Python class is used to implement the model for the SUM (System Under Management):

class InventoryStorageModel():
    def __init__(
        self, SVarNames, xVarNames, S_0, params, exogParams, possibleDecisions,
        p__buyELA, p__sellELA, p__buySON, p__sellSON, W_fn=None, S__M_fn=None, C_fn=None):
        ...
        ...

4.3.1 State variables

The state variables represent what we need to know.

$R_{t} = (R_{t,ELA}, R_{t,SON})$
- the inventory on hand at time $t$ before we make a new ordering decision, and before we have satisfied any demands arising in time interval $t$
- measured in inventory units
$D_{t} = (D_{t,ELA}, D_{t,SON})$
- the demand arising in time interval $t$
- measured in inventory units

The state is:

$S_{t} = (R_{t}, D_{t}) = ((R_{t,ELA}, R_{t,SON}), (D_{t,ELA}, D_{t,SON}))$

The state variables are represented by the following variables in the InventoryStorageModel class:

        self.SVarNames = SVarNames
        self.State = namedtuple('State', SVarNames) # 'class'
        self.S_t = self.build_state(self.S_0) # 'instance'

where

SVarNames = ['R_t_ELA', 'R_t_SON', 'D_t_ELA', 'D_t_SON']

4.3.2 Decision variables

The decision variables represent what we control.

$x_{t} = (x_{t,ELA}, x_{t,SON})$
- number of Elantras and Sonatas ordered, where each component of $x_{t}$ is a non-negative integer ($x_{t} \geq 0$)
Constraints
- $x_{t,ELA} \leq R^{maxELA} - R_{t,ELA}$
- $x_{t,SON} \leq R^{maxSON} - R_{t,SON}$
- $R^{max}$ is the number of lot units (i.e. parking spaces) assigned to Mr. Optimal
- $R^{max} = R^{maxELA} + R^{maxSON}$ [here $57 = 40 + 17$]
Decisions are made with a policy (TBD below):
- $X^{π} (S_{t})$

The decision variables are represented by the following variables in the InventoryStorageModel class:

self.Decision = namedtuple('Decision', xVarNames) # 'class'

where

xVarNames = ['x_t_ELA', 'x_t_SON']
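The decision constraints can be sketched as an explicit check. `is_feasible` is a hypothetical helper: the model class does not enforce these constraints itself (the buy-below policy defined later satisfies them by construction), and the sketch does not test integrality.

```python
from collections import namedtuple

Decision = namedtuple('Decision', ['x_t_ELA', 'x_t_SON'])

R__maxELA, R__maxSON = 40, 17   # lot spaces reserved per model (40 + 17 = 57)

def is_feasible(R_t_ELA, R_t_SON, x_t):
    """Check 0 <= x_t <= R^max - R_t componentwise (hypothetical helper)."""
    ok_ELA = 0 <= x_t.x_t_ELA <= R__maxELA - R_t_ELA
    ok_SON = 0 <= x_t.x_t_SON <= R__maxSON - R_t_SON
    return ok_ELA and ok_SON

print(is_feasible(10, 5, Decision(30, 12)))  # fills both lots exactly
print(is_feasible(10, 5, Decision(31, 0)))   # one Elantra more than the free spaces
print(is_feasible(10, 5, Decision(-1, 0)))   # negative order
```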

4.3.3 Exogenous information variables

The exogenous information variables represent what we did not know when we made a decision. These are variables that we cannot control directly. The information in these variables becomes available after we make the decision $x_{t}$.

We assume that any unsatisfied demand is lost. Additionally, we assume that the demand in each time period is revealed, so that we have:

$W_{t + 1} = {\hat{D}}_{t + 1} = D_{t + 1}$

The exogenous information is obtained by a call to

DemandSimulator.simulate(...)

The latest exogenous information can be accessed by calling the following method from class InventoryStorageModel():

def W_fn(self, t):
    # DemandSimulator only provides simulate(), which returns both monthly demands
    W_tt1_ELA, W_tt1_SON = dem_sim.simulate()
    return (W_tt1_ELA, W_tt1_SON)

4.3.4 Transition function

The transition function describes how the state variables evolve over time. Because the state has two components, $S_{t} = (R_{t}, D_{t})$, we have the equations:

$\begin{aligned} R_{t+1} &= (R_{t,ELA} - \min\{R_{t,ELA}, D_{t,ELA}\} + x_{t,ELA},\ R_{t,SON} - \min\{R_{t,SON}, D_{t,SON}\} + x_{t,SON}) && \text{(Eq. 1)} \\ D_{t+1} &= (\hat{D}_{t+1,ELA}, \hat{D}_{t+1,SON}) && \text{(Eq. 2)} \end{aligned}$

Collectively, they represent the general transition function:

$S_{t+1} = S^{M}(S_{t}, X^{π}(S_{t}), W_{t+1})$

The transition function is implemented by the following method in class InventoryStorageModel():

def S__M_fn(self, t, x_t):
    R_tt1_ELA = max( 0, self.S_t.R_t_ELA - min(self.S_t.R_t_ELA, self.S_t.D_t_ELA) + x_t.x_t_ELA ) #max to keep >=0
    R_tt1_SON = max( 0, self.S_t.R_t_SON - min(self.S_t.R_t_SON, self.S_t.D_t_SON) + x_t.x_t_SON ) #max to keep >=0

    D_tt1_ELA, D_tt1_SON = self.W_fn(t) 

    S_tt1 = self.build_state({'R_t_ELA':R_tt1_ELA,'R_t_SON':R_tt1_SON, 'D_t_ELA':D_tt1_ELA,'D_t_SON':D_tt1_SON})
    return S_tt1
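A worked instance of Eq. 1 and Eq. 2, written as a standalone function `transition` (a hypothetical name) that takes the exogenous information $W_{t+1}$ as an explicit argument instead of sampling it internally as `S__M_fn` does. The numbers are illustrative only.

```python
from collections import namedtuple

State = namedtuple('State', ['R_t_ELA', 'R_t_SON', 'D_t_ELA', 'D_t_SON'])
Decision = namedtuple('Decision', ['x_t_ELA', 'x_t_SON'])

def transition(S_t, x_t, W_tt1):
    """S_{t+1} = S^M(S_t, x_t, W_{t+1}); W_tt1 is the revealed demand pair."""
    # Eq. 1: leftover inventory after serving D_t (unmet demand is lost), plus the new order
    R_tt1_ELA = S_t.R_t_ELA - min(S_t.R_t_ELA, S_t.D_t_ELA) + x_t.x_t_ELA
    R_tt1_SON = S_t.R_t_SON - min(S_t.R_t_SON, S_t.D_t_SON) + x_t.x_t_SON
    # Eq. 2: the next demand is the exogenous information W_{t+1}
    D_tt1_ELA, D_tt1_SON = W_tt1
    return State(R_tt1_ELA, R_tt1_SON, D_tt1_ELA, D_tt1_SON)

S_t = State(R_t_ELA=5, R_t_SON=3, D_t_ELA=7, D_t_SON=2)
x_t = Decision(x_t_ELA=35, x_t_SON=14)
print(transition(S_t, x_t, W_tt1=(16, 9)))
# ELA: 5 - min(5, 7) + 35 = 35 (two Elantra sales are lost); SON: 3 - min(3, 2) + 14 = 15
```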

4.3.5 Objective function

The objective function captures the performance metrics of the solution to the problem.

We can write the state-dependent reward (also called contribution) based on what we receive between $t - 1$ and $t$ (i.e. looking backward relative to $(S_{t}, x_{t})$):

$C(S_{t}, x_{t}) = p^{sellELA} \min\{R_{t,ELA}, D_{t,ELA}\} - p^{buyELA} x_{t,ELA} + p^{sellSON} \min\{R_{t,SON}, D_{t,SON}\} - p^{buySON} x_{t,SON}$

This is a deterministic expression.
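A numeric check of the backward-looking contribution, using the prices set in the training section. The inventory, demand, and order values are made up for illustration, and `C_backward` is a hypothetical helper name.

```python
# prices in dollars, as set in the training section
p__buyELA, p__sellELA = 19_300, 23_470
p__buySON, p__sellSON = 22_100, 27_250

def C_backward(R_t, D_t, x_t):
    """C(S_t, x_t): revenue from sales min(R, D) minus the cost of the new order."""
    (R_ELA, R_SON), (D_ELA, D_SON), (x_ELA, x_SON) = R_t, D_t, x_t
    return (p__sellELA * min(R_ELA, D_ELA) - p__buyELA * x_ELA
            + p__sellSON * min(R_SON, D_SON) - p__buySON * x_SON)

# 5 Elantras and 2 Sonatas are sold; 35 Elantras and 14 Sonatas are bought
print(C_backward(R_t=(5, 3), D_t=(7, 2), x_t=(35, 14)))
```

A single period can be strongly negative when a large order is placed; the revenue from those cars is only credited in later periods.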

Alternatively, we can write the state-dependent reward based on what we will receive between $t$ and $t + 1$ (i.e. looking forward relative to $(S_{t}, x_{t})$):

$\begin{aligned} C(S_{t}, x_{t}, \hat{D}_{t+1}) &= p^{sellELA} \min\{R_{t+1,ELA}, D_{t+1,ELA}\} - p^{buyELA} x_{t,ELA} + p^{sellSON} \min\{R_{t+1,SON}, D_{t+1,SON}\} - p^{buySON} x_{t,SON} \\ &= p^{sellELA} \min\{R_{t,ELA} - \min\{R_{t,ELA}, D_{t,ELA}\} + x_{t,ELA},\ \hat{D}_{t+1,ELA}\} - p^{buyELA} x_{t,ELA} \\ &\quad + p^{sellSON} \min\{R_{t,SON} - \min\{R_{t,SON}, D_{t,SON}\} + x_{t,SON},\ \hat{D}_{t+1,SON}\} - p^{buySON} x_{t,SON} \end{aligned}$

because, from (Eq. 1) and (Eq. 2) above:

$\begin{aligned} R_{t+1} &= (R_{t,ELA} - \min\{R_{t,ELA}, D_{t,ELA}\} + x_{t,ELA},\ R_{t,SON} - \min\{R_{t,SON}, D_{t,SON}\} + x_{t,SON}) && \text{(Eq. 1)} \\ D_{t+1} &= (\hat{D}_{t+1,ELA}, \hat{D}_{t+1,SON}) && \text{(Eq. 2)} \end{aligned}$

This is a stochastic expression because it depends on the random variable $\hat{D}_{t+1}$, which comes from a stochastic process and is not yet known when the decision $x_{t}$ is made.

This second form leads to the objective function:

$\max_{π} \mathbb{E} \left\{ \sum_{t=0}^{T} C(S_{t}, x_{t}, W_{t+1}) \right\}$, with $x_{t} = X^{π}(S_{t})$

The contribution (reward) function is implemented by the following method in class InventoryStorageModel():

def C_fn(self, x_t):
    Dhat_tt1_ELA, Dhat_tt1_SON = dem_sim.simulate()

    C = \
    self.p__sellELA*min((self.S_t.R_t_ELA - min(self.S_t.R_t_ELA, self.S_t.D_t_ELA) + x_t.x_t_ELA), Dhat_tt1_ELA) - self.p__buyELA*x_t.x_t_ELA +\
    self.p__sellSON*min((self.S_t.R_t_SON - min(self.S_t.R_t_SON, self.S_t.D_t_SON) + x_t.x_t_SON), Dhat_tt1_SON) - self.p__buySON*x_t.x_t_SON
    return C

4.3.6 Implementation of SUM Model

Here is the complete implementation of the InventoryStorageModel class:

class InventoryStorageModel():
    def __init__(
        self, SVarNames, xVarNames, S_0, params, exogParams, possibleDecisions,
        p__buyELA, p__sellELA, p__buySON, p__sellSON, W_fn=None, S__M_fn=None, C_fn=None):
        self.initArgs = params
        self.prng = np.random.RandomState(params['seed'])
        self.exogParams = exogParams
        self.S_0 = S_0
        self.SVarNames = SVarNames
        self.xVarNames = xVarNames
        self.possibleDecisions = possibleDecisions
        self.p__buyELA = p__buyELA
        self.p__sellELA = p__sellELA
        self.p__buySON = p__buySON
        self.p__sellSON = p__sellSON        
        self.State = namedtuple('State', SVarNames) #. 'class'
        self.S_t = self.build_state(self.S_0) #. 'instance'
        self.Decision = namedtuple('Decision', xVarNames) #. 'class'
        self.cumC = 0.0 #. cumulative reward; use F or cumF for final (i.e. no cumulative) reward

    def reset(self):
        self.cumC = 0.0
        self.S_t = self.build_state(self.S_0)

    def build_state(self, info):
        return self.State(*[info[sn] for sn in self.SVarNames])

    def build_decision(self, info):
        return self.Decision(*[info[xn] for xn in self.xVarNames])

    def W_fn(self, t):
        W_tt1_ELA, W_tt1_SON = dem_sim.simulate()
        return (W_tt1_ELA, W_tt1_SON)

    def S__M_fn(self, t, x_t):
        R_tt1_ELA = max( 0, self.S_t.R_t_ELA - min(self.S_t.R_t_ELA, self.S_t.D_t_ELA) + x_t.x_t_ELA ) #max to keep >=0
        R_tt1_SON = max( 0, self.S_t.R_t_SON - min(self.S_t.R_t_SON, self.S_t.D_t_SON) + x_t.x_t_SON ) #max to keep >=0

        D_tt1_ELA, D_tt1_SON = self.W_fn(t) 

        S_tt1 = self.build_state({'R_t_ELA':R_tt1_ELA,'R_t_SON':R_tt1_SON, 'D_t_ELA':D_tt1_ELA,'D_t_SON':D_tt1_SON})
        return S_tt1

    # based on what we will receive between t and t+1 (i.e. looking *forward* relative to (S_t,x_t)) #.
    # RLSO-Eq8.5
    def C_fn(self, x_t):
        Dhat_tt1_ELA, Dhat_tt1_SON = dem_sim.simulate()
        C = \
        self.p__sellELA*min((self.S_t.R_t_ELA - min(self.S_t.R_t_ELA, self.S_t.D_t_ELA) + x_t.x_t_ELA), Dhat_tt1_ELA) - self.p__buyELA*x_t.x_t_ELA +\
        self.p__sellSON*min((self.S_t.R_t_SON - min(self.S_t.R_t_SON, self.S_t.D_t_SON) + x_t.x_t_SON), Dhat_tt1_SON) - self.p__buySON*x_t.x_t_SON
        return C

    def step(self, t, x_t):
        self.cumC += self.C_fn(x_t)
        self.S_t = self.S__M_fn(t, x_t)
        return (self.S_t, self.cumC, x_t) #. for plotting
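In `step()`, `C_fn()` and `S__M_fn()` each call `dem_sim.simulate()`, so the demand used to credit sales in the reward and the demand stored as $D_{t+1}$ in the next state (which the next transition uses to deplete inventory) are two independent draws. A minimal sketch of one way to make them the same draw, as the forward-looking expression $C(S_t, x_t, \hat{D}_{t+1})$ intends. `SharedDrawModel` is a hypothetical, stripped-down class, not the notebook's model, and it samples $W_{t+1}$ once per step from its own seeded generator.

```python
import numpy as np
from collections import namedtuple

State = namedtuple('State', ['R_t_ELA', 'R_t_SON', 'D_t_ELA', 'D_t_SON'])
Decision = namedtuple('Decision', ['x_t_ELA', 'x_t_SON'])

class SharedDrawModel:
    """Hypothetical model whose step() uses one W_{t+1} for both reward and transition."""

    def __init__(self, S_0, prices, seed=None):
        self.rng = np.random.default_rng(seed)
        self.S_t = State(**S_0)
        self.p = prices            # keys: 'buyELA', 'sellELA', 'buySON', 'sellSON'
        self.cumC = 0.0

    def W_fn(self):
        # monthly demands with the same Poisson means as DemandSimulator
        return (int(self.rng.poisson(17)), int(self.rng.poisson(8)))

    def post_decision_inventory(self, x_t):
        # R_t - min(R_t, D_t) + x_t for each model (Eq. 1)
        S = self.S_t
        return (S.R_t_ELA - min(S.R_t_ELA, S.D_t_ELA) + x_t.x_t_ELA,
                S.R_t_SON - min(S.R_t_SON, S.D_t_SON) + x_t.x_t_SON)

    def step(self, x_t):
        W_tt1 = self.W_fn()                                   # sample W_{t+1} once
        R_ELA, R_SON = self.post_decision_inventory(x_t)
        p = self.p
        C = (p['sellELA'] * min(R_ELA, W_tt1[0]) - p['buyELA'] * x_t.x_t_ELA
             + p['sellSON'] * min(R_SON, W_tt1[1]) - p['buySON'] * x_t.x_t_SON)
        self.cumC += C
        self.S_t = State(R_ELA, R_SON, *W_tt1)               # same W_{t+1} stored in S_{t+1}
        return C

prices = {'buyELA': 19_300, 'sellELA': 23_470, 'buySON': 22_100, 'sellSON': 27_250}
model = SharedDrawModel({'R_t_ELA': 0, 'R_t_SON': 0, 'D_t_ELA': 0, 'D_t_SON': 0}, prices, seed=1)
C = model.step(Decision(40, 17))
print(C, model.S_t)
```

Because the sales credited in `C` and the demand that the next transition subtracts now come from the same draw, each simulated car is counted exactly once.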

4.4 Uncertainty Model

We will simulate the inventory demand vector $D_{t + 1} = (D_{t + 1, E L A}, D_{t + 1, S O N})$ as described in section 2.

4.5 Policy Design

There are two main meta-classes of policy design, each with two subclasses:
- Policy Search
  - Policy Function Approximations (PFAs)
  - Cost Function Approximations (CFAs)
- Lookahead
  - Value Function Approximations (VFAs)
  - Direct Lookaheads (DLAs)

In this project we use only one approach:
- A simple parameterized buy-below policy (from the PFA class)

The buy below policy is implemented by the following method in class InventoryStoragePolicy():

def X__BuyBelow(self, t, S_t, theta, T):
    info = {
        'x_t_ELA': 0, #number of Elantras ordered
        'x_t_SON': 0, #number of Sonatas ordered
    }
    if t >= T:
        print(f"ERROR: t={t} should not reach or exceed the max steps ({T})")
        return self.model.build_decision(info)
    theta__buy_ELA = theta[0]
    if S_t.R_t_ELA <= theta__buy_ELA: # BUY if R_t_ELA <= theta__buy_ELA
        info['x_t_ELA'] = self.model.initArgs['R__maxELA'] - S_t.R_t_ELA
    theta__buy_SON = theta[1]
    if S_t.R_t_SON <= theta__buy_SON: # BUY if R_t_SON <= theta__buy_SON
        info['x_t_SON'] = self.model.initArgs['R__maxSON'] - S_t.R_t_SON
    return self.model.build_decision(info)
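A standalone sketch of the same buy-below rule, with the time-horizon check removed and the lot capacities passed as arguments (defaults 40 and 17 from the narrative). It is a reorder-point rule that orders up to full capacity whenever inventory falls to the threshold or below; the state values are illustrative.

```python
from collections import namedtuple

State = namedtuple('State', ['R_t_ELA', 'R_t_SON', 'D_t_ELA', 'D_t_SON'])
Decision = namedtuple('Decision', ['x_t_ELA', 'x_t_SON'])

def X__BuyBelow(S_t, theta, R__maxELA=40, R__maxSON=17):
    """Buy-below PFA: if R_t <= theta, order enough to fill the lot; otherwise order nothing."""
    theta__buy_ELA, theta__buy_SON = theta
    x_t_ELA = R__maxELA - S_t.R_t_ELA if S_t.R_t_ELA <= theta__buy_ELA else 0
    x_t_SON = R__maxSON - S_t.R_t_SON if S_t.R_t_SON <= theta__buy_SON else 0
    return Decision(x_t_ELA, x_t_SON)

S_t = State(R_t_ELA=10, R_t_SON=17, D_t_ELA=15, D_t_SON=6)
print(X__BuyBelow(S_t, theta=(34, 16)))
# 10 <= 34 triggers an Elantra order of 40 - 10 = 30; 17 > 16, so no Sonatas are ordered
```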

4.5.1 Implementation of Policy Design

The InventoryStoragePolicy() class implements the policy design.

import random
class InventoryStoragePolicy():
    def __init__(self, model, piNames):
        self.model = model
        self.piNames = piNames
        self.Policy = namedtuple('Policy', piNames)

    def X__BuyBelow(self, t, S_t, theta, T): #theta is a vector
        info = {
            'x_t_ELA': 0, #number of Elantras to order
            'x_t_SON': 0, #number of Sonatas to order
        }
        if t >= T:
            print(f"ERROR: t={t} should not reach or exceed the max steps ({T})")
            return self.model.build_decision(info)
        theta__buy_ELA = theta[0]
        if S_t.R_t_ELA <= theta__buy_ELA: # BUY if R_t_ELA <= theta__buy_ELA
            info['x_t_ELA'] = self.model.initArgs['R__maxELA'] - S_t.R_t_ELA
        theta__buy_SON = theta[1]
        if S_t.R_t_SON <= theta__buy_SON: # BUY if R_t_SON <= theta__buy_SON
            info['x_t_SON'] = self.model.initArgs['R__maxSON'] - S_t.R_t_SON
        return self.model.build_decision(info)

    def run_policy(self, piInfo, piName, params):
        model_copy = copy(self.model)
        T = params['T']
        for t in range(T): #for each transition/step
            x_t = getattr(self, piName)(t, model_copy.S_t, piInfo, T) # piInfo is theta value
            _, _, _ = model_copy.step(t, x_t)
        cumC = model_copy.cumC        
        return cumC

    def perform_grid_search(self, params, thetas):
        tS = time.time()
        cumCI_theta_I = {}
        bestTheta = None
        i = 0; print(f'... printing every 100th theta ...')
        for theta in thetas:
            if i%100 == 0: print(f'=== {theta=} ===')
            cumC = self.run_policy(theta, "X__BuyBelow", params)
            cumCI_theta_I[theta] = cumC
            best_theta = max(cumCI_theta_I, key=cumCI_theta_I.get)
            # print(f"Finishing theta {theta} with cumC {cumC:,}. Best theta so far {best_theta}. Best cumC {cumCI_theta_I[best_theta]:,}")
            i += 1
        print(f"Finishing GridSearch in {time.time() - tS:.2f} secs")
        print(f"Best theta: {best_theta}. Best cumC: {cumCI_theta_I[best_theta]:,}")
        return cumCI_theta_I, best_theta

    def run_policy_sample_paths(self, theta, piName, params): #theta could be a vector
        FhatIomega__lI = []
        for l in range(1, params['L'] + 1): #for each sample-path
            model_copy = copy(self.model)
            record_l = [piName, theta, l]
            T = params['T']
            for t in range(T): #for each transition/step
                x_t = getattr(self, piName)(t, model_copy.S_t, theta, T)
                _, _, _ = model_copy.step(t, x_t)
            FhatIomega__lI.append(model_copy.cumC) # just above (SDAM-eq2.9); Fhat for this sample-path is in model_copy.cumC
        return FhatIomega__lI

    def perform_grid_search_sample_paths(self, params, thetas):
        tS = time.time()
        Fhat_mean = None
        Fhat_var = None
        Fhat__meanI_th_I = {}
        Fhat__stdvI_th_I = {}
        Fhat_mean = None
        i = 0; print(f'... printing every 100th theta ...')
        for theta in thetas:
            if i%100 == 0: print(f'=== {theta=} ===')
            FhatIomega__lI = self.run_policy_sample_paths(theta, "X__BuyBelow", params)
            Fhat_mean = np.array(FhatIomega__lI).mean() #. (SDAM-eq2.9); call Fbar in future
            Fhat_var = np.sum(np.square(np.array(FhatIomega__lI) - Fhat_mean))/(params['L'] - 1)
            Fhat__meanI_th_I[theta] = Fhat_mean
            Fhat__stdvI_th_I[theta] = np.sqrt(Fhat_var/params['L']) #. standard error of the mean estimate
            best_theta = max(Fhat__meanI_th_I, key=Fhat__meanI_th_I.get)
            # print(f"Finishing theta {theta} with cumC {Fhat__meanI_th_I[best_theta]:,}. Best theta so far {best_theta}. Best cumC {Fhat__meanI_th_I[best_theta]:,}")
            i += 1
        print(f"Finishing GridSearch in {time.time() - tS:.2f} secs")
        print(f"Best theta: {best_theta}. Best cumC: {Fhat__meanI_th_I[best_theta]:,}")
        return Fhat__meanI_th_I, Fhat__stdvI_th_I, best_theta

    def grid_search_theta_values(self, thetas1, thetas2=None): #. using vectors reduces loops in perform_grid_search_sample_paths()
        # Python has no method overloading (a second def with the same name replaces the first),
        # so a single method handles both the 1-D and the 2-D theta grid
        if thetas2 is None:
            return [(th1,) for th1 in thetas1]
        return [(th1, th2) for th1 in thetas1 for th2 in thetas2]

    def plot_Fhat_map(self, FhatI_theta_I, thetasX, thetasY, labelX, labelY, title):
        Fhat_values = [FhatI_theta_I[(thetaX,thetaY)] for thetaY in thetasY for thetaX in thetasX]
        Fhats = np.array(Fhat_values)
        increment_count = len(thetasX)
        Fhats = np.reshape(Fhats, (-1, increment_count))

        fig, ax = plt.subplots()
        im = ax.imshow(Fhats, cmap='hot', origin='lower', aspect='auto')
        # create colorbar
        cbar = ax.figure.colorbar(im, ax=ax)
        # cbar.ax.set_ylabel(cbarlabel, rotation=-90, va="bottom")
        # we want to show all ticks...
        ax.set_xticks(np.arange(0,len(thetasX), 5))
        ax.set_yticks(np.arange(0,len(thetasY), 5))
        # ... and label them with the respective list entries
        ax.set_xticklabels(thetasX[::5])
        ax.set_yticklabels(thetasY[::5])
        # rotate the tick labels and set their alignment.
        #plt.setp(ax.get_xticklabels(), rotation=45, ha="right",rotation_mode="anchor")

        ax.set_title(title, fontsize=16)

        ax.set_xlabel(labelX)
        ax.set_ylabel(labelY)

        #fig.tight_layout()
        plt.show()
        return True

    def plot_Fhat_chart(self, FhatI_theta_I, thetasX, labelX, labelY, title, color_style):
        mpl.rcParams['lines.linewidth'] = 1.2
        xylabelsize = 18
        plt.figure(figsize=(25, 8))
        plt.title(title, fontsize=20)
        Fhats = list(FhatI_theta_I.values()) # plot needs a sequence, not a dict view; assumes keys are in thetasX order
        plt.plot(thetasX, Fhats, color_style)
        plt.xlabel(labelX, rotation=0, ha='right', va='center', fontweight='bold', size=xylabelsize)
        plt.ylabel(labelY, rotation=0, ha='right', va='center', fontweight='bold', size=xylabelsize)
        plt.show()
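The two statistics computed per theta in `perform_grid_search_sample_paths()` can be reproduced in isolation. Note that `np.sqrt(Fhat_var/params['L'])` is the standard error of the sample mean rather than the standard deviation of individual path totals. `summarize_sample_paths` is a hypothetical helper:

```python
import numpy as np

def summarize_sample_paths(FhatIomega__lI):
    """Sample mean and standard error of the per-path cumulative rewards."""
    Fhats = np.asarray(FhatIomega__lI, dtype=float)
    L = len(Fhats)
    Fbar = Fhats.mean()                                    # estimate of E{sum_t C}
    Fhat_var = np.sum(np.square(Fhats - Fbar)) / (L - 1)   # unbiased sample variance (needs L >= 2)
    std_err = np.sqrt(Fhat_var / L)                        # standard error of the mean
    return Fbar, std_err

print(summarize_sample_paths([1.0, 2.0, 3.0, 4.0]))
```

With only L = 2 sample paths, the standard error itself is a very noisy estimate, which is worth keeping in mind when reading the stdv heat map.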

4.6 Policy Evaluation

4.6.1 Training/Tuning

# UPDATE PARAMETERS
params.update({'Algorithm': 'GridSearch'}); pprint(f'{params=}')
# params.update({'R__max': 57})
params.update({'R__maxELA': 40})
params.update({'R__maxSON': 17})
# params.update({'R_0': 0})
params.update({'R_0': (0,0)})
params.update({'eta': None})

# params.update({'theta_buy_min': 10}) #order level
# params.update({'theta_buy_max': 50}) #order level
params.update({'theta_sell_min': None})
params.update({'theta_sell_max': None})

# params.update({'L': 2}) #number of sample-paths
# params.update({'T': 100_000}) #number of transitions/steps in each sample-path

# ADDITIONAL PARAMETERS
params.update({'theta_buy_min_ELA': 1}) #order level
params.update({'theta_buy_max_ELA': 40}) #order level

params.update({'theta_buy_min_SON': 1}) #order level
params.update({'theta_buy_max_SON': 17}) #order level

piNames = ['X__BuyBelow']
SVarNames = ['R_t_ELA', 'R_t_SON', 'D_t_ELA', 'D_t_SON'] #expand each scalar into a vector
dem_ELA, dem_SON = dem_sim.simulate()
S_0 = {
    'R_t_ELA': params['R_0'][0],
    'R_t_SON': params['R_0'][1],
    'D_t_ELA': dem_ELA,
    'D_t_SON': dem_SON,
}
xVarNames = ['x_t_ELA', 'x_t_SON'] #expand each scalar into a vector
possibleDecisions = None
# p__buy = 19_300 #dollars
# p__sell = 23_470 #dollars
p__buyELA = 19_300 #dollars
p__sellELA = 23_470 #dollars
p__buySON = 22_100 #dollars
p__sellSON = 27_250 #dollars
exogParams = {} # we use simulation

params

("params={'Algorithm': 'GridSearch', 'T': 192, 'eta': 1, 'R__max': 57, 'R_0': "
 "0, 'seed': 189654913, 'theta_sell_min': 10, 'theta_sell_max': 100, "
 "'theta_buy_min': 10, 'theta_buy_max': 100, 'theta_inc': 1}")

{'Algorithm': 'GridSearch',
 'T': 192,
 'eta': None,
 'R__max': 57,
 'R_0': (0, 0),
 'seed': 189654913,
 'theta_sell_min': None,
 'theta_sell_max': None,
 'theta_buy_min': 10,
 'theta_buy_max': 100,
 'theta_inc': 1,
 'R__maxELA': 40,
 'R__maxSON': 17,
 'theta_buy_min_ELA': 1,
 'theta_buy_max_ELA': 40,
 'theta_buy_min_SON': 1,
 'theta_buy_max_SON': 17}

4.6.1.1 With a few long sample-paths

# UPDATE PARAMETERS
# params.update({'L': 2}) #number of sample-paths
# params.update({'T': 100_000}) #number of transitions/steps in each sample-path
params.update({'L': 2}) #number of sample-paths
params.update({'T': 10_000}) #number of transitions/steps in each sample-path
params

{'Algorithm': 'GridSearch',
 'T': 10000,
 'eta': None,
 'R__max': 57,
 'R_0': (0, 0),
 'seed': 189654913,
 'theta_sell_min': None,
 'theta_sell_max': None,
 'theta_buy_min': 10,
 'theta_buy_max': 100,
 'theta_inc': 1,
 'R__maxELA': 40,
 'R__maxSON': 17,
 'theta_buy_min_ELA': 1,
 'theta_buy_max_ELA': 40,
 'theta_buy_min_SON': 1,
 'theta_buy_max_SON': 17,
 'L': 2}

SVarNames

['R_t_ELA', 'R_t_SON', 'D_t_ELA', 'D_t_SON']

# create a model and a policy
M = InventoryStorageModel(
    SVarNames, 
    xVarNames, 
    S_0, 
    params, 
    exogParams,
    possibleDecisions,
    p__buyELA,
    p__sellELA,
    p__buySON,
    p__sellSON,
)
P = InventoryStoragePolicy(M, piNames)

%%time
##########################################################################
#GridSearch #. SDAM-9.4.1
if params['Algorithm'] == 'GridSearch':
    thetasBuyELA = np.arange(params['theta_buy_min_ELA'], params['theta_buy_max_ELA'], params['theta_inc'])
    thetasBuySON = np.arange(params['theta_buy_min_SON'], params['theta_buy_max_SON'], params['theta_inc'])
    thetas = P.grid_search_theta_values(thetasBuyELA, thetasBuySON)
    # cumCI_theta_I, thetaStar_few = \
    #   P.perform_grid_search(params, thetas)
    Fhat__meanI_th_I, Fhat__stdvI_th_I, thetaStar_few = \
      P.perform_grid_search_sample_paths(params, thetas)
##################################################################################

... printing every 100th theta ...
=== theta=(1, 1) ===
=== theta=(7, 5) ===
=== theta=(13, 9) ===
=== theta=(19, 13) ===
=== theta=(26, 1) ===
=== theta=(32, 5) ===
=== theta=(38, 9) ===
Finishing GridSearch in 166.64 secs
Best theta: (34, 16). Best cumC: 1,089,429,400.0
CPU times: user 2min 44s, sys: 374 ms, total: 2min 45s
Wall time: 2min 46s
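The grid above is built with `np.arange`, which excludes its upper bound. With `theta_buy_max_ELA = 40` and `theta_buy_max_SON = 17`, the candidates therefore run from 1 to 39 and from 1 to 16. The sketch below assumes `grid_search_theta_values` returns the Cartesian product with the Elantra threshold in the outer loop. That is an assumption, but it reproduces the seven thetas printed above.

```python
from itertools import product

# Python's range, like np.arange with an integer step, excludes the stop value
thetas_buy_ela = range(1, 40)   # theta_buy_min_ELA .. theta_buy_max_ELA - 1
thetas_buy_son = range(1, 17)   # theta_buy_min_SON .. theta_buy_max_SON - 1

# Cartesian product: ELA threshold in the outer loop, SON threshold in the inner loop
thetas = list(product(thetas_buy_ela, thetas_buy_son))

# Every 100th candidate, mirroring the "printing every 100th theta" progress output
printed = thetas[::100]

print(len(thetas))
print(printed)
```

Each grid search therefore evaluates 39 × 16 = 624 policies. It also explains why this run and the L = 400, T = 50 run take roughly the same time: both simulate L × T = 20,000 steps per candidate.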

P.plot_Fhat_map(
    Fhat__meanI_th_I, 
    thetasBuyELA, 
    thetasBuySON, 
    'thetaBuyELA', 
    'thetaBuySON', 
    r"$\hat{F}^{mean}(\theta)$"+f"\n L = {params['L']}, T = {params['T']}, thetaStar = {thetaStar_few}"
)
P.plot_Fhat_map(
    Fhat__stdvI_th_I, 
    thetasBuyELA, 
    thetasBuySON, 
    'thetaBuyELA', 
    'thetaBuySON', 
    r"$\hat{F}^{stdv}(\theta)$"+f"\n L = {params['L']}, T = {params['T']}, thetaStar = {thetaStar_few}"
)

True
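`perform_grid_search_sample_paths` is defined earlier in the notebook. Judging by its outputs, for every theta it returns two quantities, the sample mean $\hat{F}^{mean}(\theta)$ and the sample standard deviation $\hat{F}^{stdv}(\theta)$. Both are computed from the cumulative contribution over L sample-paths of length T. It also returns the theta with the best mean.

The sketch below illustrates that idea. It uses loudly hypothetical prices and a hypothetical uniform demand model, so it is not the notebook's implementation and its numbers will not match the heat maps.

```python
import random
import statistics

# Hypothetical stand-ins: the notebook's real prices and demand process come from
# exogParams and p__buyELA, p__sellELA, p__buySON, p__sellSON, which are not shown here.
R_MAX = (40, 17)              # R__maxELA, R__maxSON from params
P_BUY = (1.0, 1.0)            # hypothetical unit purchase cost per model
P_SELL = (1.5, 1.6)           # hypothetical unit sale price per model
DEMAND = ((10, 25), (3, 12))  # hypothetical uniform demand range per model


def simulate_path(theta, T, rng):
    """Cumulative contribution of one sample-path of T steps under buy-below(theta)."""
    R = [0, 0]  # R_0 = (0, 0)
    cumC = 0.0
    for _ in range(T):
        for i in range(2):
            x = R_MAX[i] - R[i] if R[i] < theta[i] else 0  # refill to capacity below theta
            D = rng.randint(*DEMAND[i])                     # demand in this window
            sold = min(R[i], D)                             # unmet demand is lost
            cumC += P_SELL[i] * sold - P_BUY[i] * x
            R[i] = R[i] - sold + x                          # order arrives for next window
    return cumC


def estimate_F(theta, L, T, seed=189654913):
    """Sample mean and standard deviation of cumC over L sample-paths (needs L >= 2)."""
    rng = random.Random(seed)  # same seed for every theta: common random numbers
    paths = [simulate_path(theta, T, rng) for _ in range(L)]
    return statistics.mean(paths), statistics.stdev(paths)


def grid_search(thetas, L, T):
    """Return {theta: (Fhat_mean, Fhat_stdv)} and the theta with the largest mean."""
    results = {theta: estimate_F(theta, L, T) for theta in thetas}
    theta_star = max(results, key=lambda th: results[th][0])
    return results, theta_star


# A coarse grid keeps the example fast
thetas = [(a, b) for a in range(1, 40, 6) for b in range(1, 17, 5)]
results, theta_star = grid_search(thetas, L=20, T=50)
print(theta_star, results[theta_star])
```

Reusing the same seed for every theta (common random numbers) makes every candidate face identical demand. Differences in $\hat{F}^{mean}$ then reflect the policy rather than the luck of the draw. Whether the notebook's grid search does this is not visible in this section.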

4.6.1.2 With many shorter sample-paths

# UPDATE PARAMETERS
# params.update({'L': 2_000}) #add sample-paths
# params.update({'T': 100}) #optimal theta not consistent, so raise T
params.update({'L': 400})
params.update({'T': 50})

params

{'Algorithm': 'GridSearch',
 'T': 50,
 'eta': None,
 'R__max': 57,
 'R_0': (0, 0),
 'seed': 189654913,
 'theta_sell_min': None,
 'theta_sell_max': None,
 'theta_buy_min': 10,
 'theta_buy_max': 100,
 'theta_inc': 1,
 'R__maxELA': 40,
 'R__maxSON': 17,
 'theta_buy_min_ELA': 1,
 'theta_buy_max_ELA': 40,
 'theta_buy_min_SON': 1,
 'theta_buy_max_SON': 17,
 'L': 400}

# create a model and a policy
M = InventoryStorageModel(
    SVarNames, 
    xVarNames, 
    S_0, 
    params, 
    exogParams,
    possibleDecisions,
    p__buyELA,
    p__sellELA,
    p__buySON,
    p__sellSON,
)
P = InventoryStoragePolicy(M, piNames)

%%time
##########################################################################
#GridSearch #. SDAM-9.4.1
if params['Algorithm'] == 'GridSearch':
    thetasBuyELA = np.arange(params['theta_buy_min_ELA'], params['theta_buy_max_ELA'], params['theta_inc'])
    thetasBuySON = np.arange(params['theta_buy_min_SON'], params['theta_buy_max_SON'], params['theta_inc'])
    thetas = P.grid_search_theta_values(thetasBuyELA, thetasBuySON)
    # cumCI_theta_I, thetaStar_many = \
    #   P.perform_grid_search(params, thetas)
    Fhat__meanI_th_I, Fhat__stdvI_th_I, thetaStar_many = \
      P.perform_grid_search_sample_paths(params, thetas)
##################################################################################

... printing every 100th theta ...
=== theta=(1, 1) ===
=== theta=(7, 5) ===
=== theta=(13, 9) ===
=== theta=(19, 13) ===
=== theta=(26, 1) ===
=== theta=(32, 5) ===
=== theta=(38, 9) ===
Finishing GridSearch in 171.65 secs
Best theta: (35, 16). Best cumC: 5,256,724.3
CPU times: user 2min 49s, sys: 341 ms, total: 2min 50s
Wall time: 2min 51s

P.plot_Fhat_map(
    Fhat__meanI_th_I, 
    thetasBuyELA, 
    thetasBuySON, 
    'thetaBuyELA', 
    'thetaBuySON', 
    r"$\hat{F}^{mean}(\theta)$"+f"\n L = {params['L']}, T = {params['T']}, thetaStar = {thetaStar_many}"
)
print()
P.plot_Fhat_map(
    Fhat__stdvI_th_I, 
    thetasBuyELA, 
    thetasBuySON, 
    'thetaBuyELA', 
    'thetaBuySON', 
    r"$\hat{F}^{stdv}(\theta)$"+f"\n L = {params['L']}, T = {params['T']}, thetaStar = {thetaStar_many}"
)

True
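The two "Best cumC" figures are not directly comparable. cumC accumulates over a sample-path, and the two runs use T = 10,000 and T = 50. Assuming the printed value is the mean cumulative contribution of a single sample-path, dividing by T puts both on a per-step (per order window) basis.

```python
# "Best cumC" and thetaStar as printed by the two grid searches above
runs = {
    "L=2, T=10,000": (1_089_429_400.0, 10_000, (34, 16)),
    "L=400, T=50": (5_256_724.3, 50, (35, 16)),
}

per_step = {}
for name, (best_cumC, T, theta_star) in runs.items():
    per_step[name] = best_cumC / T  # average contribution per order window
    print(f"{name}: thetaStar={theta_star}, cumC per step = {per_step[name]:,.2f}")
```

On that basis the runs are close, at about 108,943 versus about 105,134 per order window. Their thetaStar values differ by one unit in the Elantra threshold.

One plausible contributor to the lower short-path figure is the initial fill-up purchase at t = 0, visible as the negative cumC in the first row of the evaluation tables. That purchase weighs more heavily in a 50-step path than in a 10,000-step one.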

4.6.2 Evaluation

M_evalu = InventoryStorageModel(
    SVarNames, 
    xVarNames, 
    S_0, 
    params, 
    exogParams,
    possibleDecisions,
    p__buyELA,
    p__sellELA,
    p__buySON,
    p__sellSON,
)
P_evalu = InventoryStoragePolicy(M_evalu, piNames)
params

{'Algorithm': 'GridSearch',
 'T': 50,
 'eta': None,
 'R__max': 57,
 'R_0': (0, 0),
 'seed': 189654913,
 'theta_sell_min': None,
 'theta_sell_max': None,
 'theta_buy_min': 10,
 'theta_buy_max': 100,
 'theta_inc': 1,
 'R__maxELA': 40,
 'R__maxSON': 17,
 'theta_buy_min_ELA': 1,
 'theta_buy_max_ELA': 40,
 'theta_buy_min_SON': 1,
 'theta_buy_max_SON': 17,
 'L': 400}

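from copy import copy  # shallow copy (likely already imported earlier in the notebook)

def run_policy_evalu(piInfo_evalu, piName_evalu, stop_time_evalu):
    """Run policy piName_evalu with parameters piInfo_evalu for stop_time_evalu steps.

    Returns the final cumulative contribution and one record per step.
    """
    model_copy = copy(M_evalu)  # simulate on a copy of the evaluation model
    record = []
    for t in range(stop_time_evalu):
        # look up the policy method by name (e.g. 'X__BuyBelow') and compute the decision x_t
        x_t = getattr(P_evalu, piName_evalu)(t, model_copy.S_t, piInfo_evalu, stop_time_evalu)
        res = model_copy.step(t, x_t)  # step the model forward one iteration
        # res[0]: state, res[1]: cumulative contribution, res[2]: decision
        record.append([res[0].R_t_ELA, res[0].R_t_SON, res[0].D_t_ELA, res[0].D_t_SON, res[1], res[2].x_t_ELA, res[2].x_t_SON])
    cumC = model_copy.cumC
    return cumC, record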
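`run_policy_evalu` works on `copy(M_evalu)`, which is a shallow copy. Attributes that `step` rebinds (for example a new state tuple) stay independent. A mutable attribute modified in place (a list or dict) would instead be shared with `M_evalu`. Whether that matters depends on how `InventoryStorageModel.step` updates its state, which this section does not show.

Here is a minimal illustration with a hypothetical `ToyModel` class.

```python
from copy import copy, deepcopy


class ToyModel:
    """Hypothetical stand-in for a model with one rebound and one mutated attribute."""

    def __init__(self):
        self.cumC = 0.0    # rebound on every step
        self.history = []  # mutated in place on every step

    def step(self, contribution):
        self.cumC = self.cumC + contribution  # binds a new float to this instance
        self.history.append(contribution)     # mutates the (possibly shared) list


base = ToyModel()
shallow = copy(base)
shallow.step(5.0)
print(base.cumC, base.history)    # the appended value shows up in base.history too

base2 = ToyModel()
deep = deepcopy(base2)
deep.step(5.0)
print(base2.cumC, base2.history)  # base2 is untouched
```

If the model keeps any state that is mutated in place, swapping `copy` for `deepcopy` in `run_policy_evalu` would make each evaluation start from the same initial conditions.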

# EVALUATION
piName_evalu = 'X__BuyBelow'
stop_time_evalu = 100 #180
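The `X__BuyBelow` policy is defined earlier in the notebook and is not repeated here. The rows printed in the evaluation tables below are consistent with a simple rule, applied to each model separately. If the inventory $R_t$ is below $\theta^{buy}$, order enough to refill the space to its maximum (`R__maxELA` = 40, `R__maxSON` = 17); otherwise order nothing.

The sketch below implements that rule. It assumes a strict inequality, since none of the printed rows has $R_t = \theta^{buy}$ and the tables therefore cannot confirm it. `buy_below` is a hypothetical name.

```python
R_MAX = {"ELA": 40, "SON": 17}  # R__maxELA and R__maxSON from params


def buy_below(R_t, theta, R_max=R_MAX):
    """Hypothetical buy-below rule: refill to capacity when stock drops below theta.

    R_t and theta are dicts keyed by model ('ELA', 'SON').
    """
    x_t = {}
    for m in R_t:
        if R_t[m] < theta[m]:
            x_t[m] = R_max[m] - R_t[m]  # order up to the space available for model m
        else:
            x_t[m] = 0                  # enough stock on the lot, order nothing
    return x_t


theta_star = {"ELA": 34, "SON": 16}
print(buy_below({"ELA": 27, "SON": 9}, theta_star))
print(buy_below({"ELA": 40, "SON": 17}, theta_star))
```

For example, with $\theta = (34, 16)$ and stock (27, 9), the rule orders (13, 8). The same order appears between rows 1 and 2 of the few-sample-paths table.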

4.6.2.1 Non-optimal policy

# theta_evalu_non=(10, None)
theta_evalu_non=(10, 2)
cumC, record = run_policy_evalu(theta_evalu_non, piName_evalu, stop_time_evalu)
labels = ['R_t_ELA', 'R_t_SON', 'D_t_ELA', 'D_t_SON', "cumC", 'x_t_ELA', 'x_t_SON']
print(f'{theta_evalu_non=}')
df_non = pd.DataFrame.from_records(data=record, columns=labels); df_non[:10]

theta_evalu_non=(10, 2)

	R_t_ELA	R_t_SON	D_t_ELA	D_t_SON	cumC	x_t_ELA	x_t_SON
0	40	17	25	8	-483,770.0000	40	17
1	15	9	11	5	59,030.0000	0	0
2	4	4	15	10	207,410.0000	0	0
3	36	0	19	9	5,480.0000	36	0
4	17	17	15	6	437,520.0000	0	17
5	2	11	20	6	620,710.0000	0	0
6	38	5	12	5	492,960.0000	38	0
7	26	0	15	10	821,540.0000	0	0
8	11	17	17	7	813,010.0000	0	17
9	0	10	18	9	1,085,510.0000	0	0
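The printed rows also pin down how inventory moves between order windows:

- Each row's stock serves that row's demand.
- Unmet demand is lost rather than backordered.
- The next order arrives on top of whatever is left.

In the table's row indexing this is $R_t = \max(0, R_{t-1} - D_{t-1}) + x_t$, with $x_t$ decided from $R_{t-1}$. This is reverse-engineered from the table, not taken from `InventoryStorageModel.step`.

The sketch below uses hypothetical helper names. It replays the recorded demands under the buy-below rule and rebuilds the `R_t` and `x_t` columns. It reproduces both product columns of this non-optimal run, and the same holds for the optimized runs below.

```python
def order_quantity(R_prev, theta, R_max):
    """Hypothetical buy-below rule for one model: refill to R_max when stock < theta."""
    return R_max - R_prev if R_prev < theta else 0


def replay(demands, theta, R_max, R_0=0):
    """Replay recorded demands and rebuild the R_t and x_t columns of the table.

    Row t: x_t is decided from the previous row's stock R_{t-1}, then
    R_t = max(0, R_{t-1} - D_{t-1}) + x_t (unmet demand is lost, not backordered).
    """
    R_prev, D_prev = R_0, 0  # before row 0: empty lot, no demand yet
    R_col, x_col = [], []
    for D in demands:
        x = order_quantity(R_prev, theta, R_max)
        R = max(0, R_prev - D_prev) + x
        R_col.append(R)
        x_col.append(x)
        R_prev, D_prev = R, D
    return R_col, x_col


# Elantra columns of the non-optimal run above: theta_buy = 10, R__maxELA = 40
D_ELA = [25, 11, 15, 19, 15, 20, 12, 15, 17, 18]
R_col, x_col = replay(D_ELA, theta=10, R_max=40)
print(R_col)
print(x_col)

# Sonata columns: theta_buy = 2, R__maxSON = 17
D_SON = [8, 5, 10, 9, 6, 6, 5, 10, 7, 9]
print(replay(D_SON, theta=2, R_max=17))
```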

4.6.2.2 Optimal policy

def plot_output(df_non, df, thetaStar):
  """Compare the optimized (df, magenta) and non-optimal (df_non, cyan) runs in six stacked panels."""
  legendlabels = [r'$\mathrm{opt}$', r'$\mathrm{non}$']
  n_charts = 6
  ylabelsize = 16
  mpl.rcParams['lines.linewidth'] = 1.2
  fig, axs = plt.subplots(n_charts, sharex=True)
  # fig.set_figwidth(50); fig.set_figheight(10)
  fig.set_figwidth(13); fig.set_figheight(9)
  fig.suptitle(f'PERFORMANCE OF OPTIMIZED Buy-Below POLICY\nOptimal (magenta), Non-optimal (cyan), thetaStar = {thetaStar}', fontsize=20)

  i = 0 #x_t_ELA
  axs[i].set_ylim(auto=True); axs[i].spines['top'].set_visible(False); axs[i].spines['right'].set_visible(True); axs[i].spines['bottom'].set_visible(False)
  axs[i].step(df['x_t_ELA'], 'm')
  axs[i].step(df_non['x_t_ELA'], 'c')
  axs[i].axhline(y=0, color='k', linestyle=':')
  axs[i].set_ylabel('$x_{t,ELA}$', rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize);

  i = 1 #x_t_SON
  axs[i].set_ylim(auto=True); axs[i].spines['top'].set_visible(False); axs[i].spines['right'].set_visible(True); axs[i].spines['bottom'].set_visible(False)
  axs[i].step(df['x_t_SON'], 'm')
  axs[i].step(df_non['x_t_SON'], 'c')
  axs[i].axhline(y=0, color='k', linestyle=':')
  axs[i].set_ylabel('$x_{t,SON}$', rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize);

  i = 2 #D_t
  axs[i].set_ylim(auto=True); axs[i].spines['top'].set_visible(False); axs[i].spines['right'].set_visible(True); axs[i].spines['bottom'].set_visible(False)
  axs[i].step(df['D_t_ELA'], 'k')
  axs[i].step(df['D_t_SON'], 'b')
  axs[i].set_ylabel(r'$D_{t,ELA}$'+'\n'+r'$D_{t,SON}$'+'\n'+r'$\mathrm{[units]}$', rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize);

  i = 3 #R_t_ELA
  axs[i].set_ylim(auto=True); axs[i].spines['top'].set_visible(False); axs[i].spines['right'].set_visible(True); axs[i].spines['bottom'].set_visible(False)
  axs[i].step(df['R_t_ELA'], 'm')
  axs[i].axhline(y=thetaStar[0], color='m', linestyle=':')
  axs[i].text(0, thetaStar[0], r'$\theta^{buy}$', size=16)
  axs[i].step(df_non['R_t_ELA'], 'c')
  axs[i].axhline(y=theta_evalu_non[0], color='c', linestyle=':')
  axs[i].text(0, theta_evalu_non[0], r'$\theta^{buy}$', size=16)
  axs[i].set_ylabel(r'$R_{t,ELA}$'+'\n'+r'$\mathrm{[units]}$', rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize);

  i = 4 #R_t_SON
  axs[i].set_ylim(auto=True); axs[i].spines['top'].set_visible(False); axs[i].spines['right'].set_visible(True); axs[i].spines['bottom'].set_visible(False)
  axs[i].step(df['R_t_SON'], 'm')
  axs[i].axhline(y=thetaStar[1], color='m', linestyle=':')
  axs[i].text(0, thetaStar[1], r'$\theta^{buy}$', size=16)
  axs[i].step(df_non['R_t_SON'], 'c')
  axs[i].axhline(y=theta_evalu_non[1], color='c', linestyle=':')
  axs[i].text(0, theta_evalu_non[1], r'$\theta^{buy}$', size=16)
  axs[i].set_ylabel(r'$R_{t,SON}$'+'\n'+r'$\mathrm{[units]}$', rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize);

  i = 5 #cumC
  axs[i].set_ylim(auto=True); axs[i].spines['top'].set_visible(False); axs[i].spines['right'].set_visible(True); axs[i].spines['bottom'].set_visible(False)
  axs[i].step(df['cumC'], 'm')
  axs[i].step(df_non['cumC'], 'c')
  axs[i].set_ylabel(r'$\mathrm{cumC}$'+'\n'+r'$\mathrm{(Profit)}$'+'\n'+r'$\mathrm{[\$]}$', rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize);
  axs[i].set_xlabel(r'$t\ \mathrm{[order\ windows]}$', rotation=0, ha='right', va='center', fontweight='bold', size=ylabelsize);

  fig.legend(labels=legendlabels, loc='center', fontsize=18)

4.6.2.2.1 Optimal $θ$ from training with a few sample-paths

# theta_evalu = (46, None)
# theta_evalu = (45, None)
# theta_evalu = (43, None)
# theta_evalu=(36, 14)
theta_evalu = thetaStar_few
cumC, record = run_policy_evalu(theta_evalu, piName_evalu, stop_time_evalu)
labels = ['R_t_ELA', 'R_t_SON', 'D_t_ELA', 'D_t_SON', "cumC", 'x_t_ELA', 'x_t_SON']
print(f'{theta_evalu=}')
df = pd.DataFrame.from_records(data=record, columns=labels); df[:10]

theta_evalu=(34, 16)

	R_t_ELA	R_t_SON	D_t_ELA	D_t_SON	cumC	x_t_ELA	x_t_SON
0	40	17	13	8	-507,240.0000	40	17
1	27	9	17	11	145,350.0000	0	0
2	23	8	17	12	381,580.0000	13	8
3	23	9	10	7	514,730.0000	17	9
4	30	10	15	5	592,010.0000	17	8
5	25	12	15	9	767,420.0000	10	7
6	25	8	15	11	765,620.0000	15	5
7	25	9	10	7	820,020.0000	15	9
8	30	10	17	9	1,053,250.0000	15	8
9	23	8	22	12	1,205,190.0000	10	7

plot_output(df_non, df, thetaStar_few)

4.6.2.2.2 Optimal $θ$ from training with many sample-paths

# theta_evalu = (37, None)
# theta_evalu = (38, None)
# theta_evalu=(34, 15)
theta_evalu = thetaStar_many
cumC, record = run_policy_evalu(theta_evalu, piName_evalu, stop_time_evalu)
labels = ['R_t_ELA', 'R_t_SON', 'D_t_ELA', 'D_t_SON', "cumC", 'x_t_ELA', 'x_t_SON']
print(f'{theta_evalu=}')
df = pd.DataFrame.from_records(data=record, columns=labels); df[:10]

theta_evalu=(35, 16)

	R_t_ELA	R_t_SON	D_t_ELA	D_t_SON	cumC	x_t_ELA	x_t_SON
0	40	17	26	3	-687,440.0000	40	17
1	14	14	20	7	-195,360.0000	0	0
2	26	10	23	11	-197,190.0000	26	3
3	17	7	12	9	-79,290.0000	14	7
4	28	10	25	9	-228,640.0000	23	10
5	15	8	17	4	-99,390.0000	12	7
6	25	13	20	4	-315,960.0000	25	9
7	20	13	19	10	-60,960.0000	15	4
8	21	7	17	6	7,440.0000	20	4
9	23	11	18	11	141,950.0000	19	10

# plot_output(df, df_non)
plot_output(df_non, df, thetaStar_many)
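Under the lost-sales dynamics that the tables above follow, the demand a policy fails to serve in window t is $\max(0, D_t - R_t)$. The small hypothetical helper `unmet_demand` below computes it from the `R_t_*` and `D_t_*` columns. It works on plain lists, such as `df['R_t_ELA'].tolist()`, and here it is applied to the first ten Elantra rows printed above.

```python
def unmet_demand(R_col, D_col):
    """Per-window unmet (lost) demand, assuming stock on hand R_t serves demand D_t."""
    return [max(0, D - R) for R, D in zip(R_col, D_col)]


# First ten Elantra rows of the non-optimal run, theta = (10, 2)
R_non = [40, 15, 4, 36, 17, 2, 38, 26, 11, 0]
D_non = [25, 11, 15, 19, 15, 20, 12, 15, 17, 18]

# First ten Elantra rows of the run with thetaStar_many = (35, 16)
R_opt = [40, 14, 26, 17, 28, 15, 25, 20, 21, 23]
D_opt = [26, 20, 23, 12, 25, 17, 20, 19, 17, 18]

lost_non = unmet_demand(R_non, D_non)
lost_opt = unmet_demand(R_opt, D_opt)
print(lost_non, sum(lost_non))
print(lost_opt, sum(lost_opt))
```

Over these ten windows the non-optimal policy leaves 53 Elantra sales unserved, against 8 for thetaStar_many. The two runs drew different demand sequences, so this is an indication rather than a controlled comparison.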