Using Sequential Decision Analytics to find the optimal treatment
Healthcare Industry
Optimal Treatment
Powell Unified Framework
Reinforcement Learning
Multi-Armed Bandits
Python
Author
Kobus Esterhuysen
Published
January 3, 2023
0 INTRODUCTION
In this project the client had a need to investigate the use of the NBA (Next Best Action) technique to optimize marketing effort and to have a better handle on customer journeys. An example project (from Dr. Warren Powell) was used as a starting point to approach this need. Although this example is from the medical industry, the concepts are directly transferable to the marketing industry.
The overall structure of this project and report follows the traditional CRISP-DM format. However, instead of the CRISP-DM's "4 Modeling" section, we inserted the "6 step modeling process" of Dr. Warren Powell in section 4 of this document.
The example explored (but also modified in many ways) in this report comes from Dr. Warren Powell (formerly at Princeton). It was chosen as a template to create a POC for the client. It was important to understand this example thoroughly before embarking on creating the POC. Dr. Powell's unified framework shows great promise for unifying the formalisms of at least a dozen different fields. Using his framework enables easier access to thinking patterns in these other fields that might be beneficial and informative to the sequential decision problem at hand. Traditionally, this kind of problem would be approached from the reinforcement learning perspective. However, using Dr. Powell's wider and more comprehensive perspective almost certainly provides additional value.
The original code for this example can be found here.
In order to make a strong mapping between the code in this notebook and the mathematics in the Powell Unified Framework (PUF), we follow the following convention for naming Python identifier names:
Superscripts
variable names have a double underscore to indicate a superscript
\(X^{\pi}\): has code X__pi, is read X pi
Subscripts
variable names have a single underscore to indicate a subscript
\(S_t\): has code S_t, is read ‘S at t’
\(M^{Spend}_t\) has code M__Spend_t which is read: “MSpend at t”
Arguments
collection variable names may have argument information added
\(X^{\pi}(S_t)\): has code X__piIS_tI, is read ‘X pi in S at t’
the surrounding I’s are used to imitate the parentheses around the argument
Next time/iteration
variable names that indicate one step in the future are quite common
\(R_{t+1}\): has code R_tt1, is read ‘R at t+1’
\(R^{n+1}\): has code R__nt1, is read ‘R at n+1’
Rewards
State-independent terminal reward and cumulative reward
\(F\): has code F for terminal reward
\(\sum_{n}F\): has code cumF for cumulative reward
State-dependent terminal reward and cumulative reward
\(C\): has code C for terminal reward
\(\sum_{t}C\): has code cumC for cumulative reward
Vectors where components use different names
\(S_t(R_t, p_t)\): has code S_t.R_t and S_t.p_t, is read ‘S at t in R at t, and, S at t in p at t’
the code implementation is by means of a named tuple
self.State = namedtuple('State', SVarNames) for the ‘class’ of the vector
self.S_t for the ‘instance’ of the vector
Vectors where components reuse names
\(x_t(x_{t,GB}, x_{t,BL})\): has code x_t.x_t_GB and x_t.x_t_BL, is read ‘x at t in x at t for GB, and, x at t in x at t for BL’
the code implementation is by means of a named tuple
self.Decision = namedtuple('Decision', xVarNames) for the ‘class’ of the vector
self.x_t for the ‘instance’ of the vector
Use of mixed-case variable names
to reduce confusion, the use of mixed-case variable names is sometimes preferred (even though this is not a best practice in the Python community), reserving the use of underscores and double underscores for math-related variables
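To make the convention concrete, here is a minimal sketch (the names R_t and p_t and their values are purely illustrative, not part of the project code):

from collections import namedtuple

# hypothetical state vector S_t with components R_t and p_t
State = namedtuple('State', ['R_t', 'p_t'])   # the 'class' of the vector
S_t = State(R_t=100, p_t=3.75)                # the 'instance' of the vector

print(S_t.R_t)   # read 'S at t in R at t'
print(S_t.p_t)   # read 'S at t in p at t'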
1 BUSINESS UNDERSTANDING
When people find they have high blood sugar, typically evaluated using a metric called the “A1C” level, there are several dozen drugs that fall into four major groups:
- Sensitizers - These target liver, muscle, and fat cells to directly increase insulin sensitivity, but may cause fluid retention and therefore should not be used for patients with a history of kidney failure.
- Secretagogues - These drugs increase insulin secretion by targeting the pancreas but often cause hypoglycemia and weight gain.
- Alpha-glucosidase inhibitors - These slow the rate of starch metabolism in the intestine, but can cause digestive problems.
- Peptide analogs - These mimic natural hormones in the body that stimulate insulin production.
The most popular drug is a type of sensitizer called metformin, which is almost always the first medication prescribed for a new diabetic, but it does not always work. Prior to working with a particular patient, a physician may have a belief about the potential of metformin, and of drugs from each of the four groups, to reduce blood sugar, as illustrated in figure 4.1.
A physician will typically start with metformin, but this only works for about 70 percent of patients. Often, patients simply cannot tolerate a medication (it may cause severe digestive problems). When this is the case, physicians have to begin experimenting with different drugs. This is a slow process, since it takes several weeks before it is possible to assess the effect a drug is having on a patient. After testing a drug on a patient for a period of time, we observe the reduction in the A1C level, and then use this observation to update our estimate of how well the drug works on the patient.
2 DATA UNDERSTANDING
# import pdb
from collections import namedtuple, Counter, defaultdict
import numpy as np
import pandas as pd
import math
from random import randint
import matplotlib.pyplot as plt
from copy import copy
import time
import matplotlib as mpl
from certifi.core import where
pd.options.display.float_format = '{:,.4f}'.format
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', None)
! python --version
Python 3.10.11
3 DATA PREPARATION
def formatFloatList(L, p):
    sFormat = "{{:.{}f}} ".format(p)*len(L)
    outL = sFormat.format(*L)
    return outL.split()

def normalizeCounter(counter):
    total = sum(counter.values(), 0.0)
    for key in counter:
        counter[key] /= total
    return counter

# This function returns the precision (beta), given the s.d. (sigma)
def Beta(sigma):
    return 1/sigma**2
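A few illustrative calls to these helpers (the numeric values are made up for the example):

from collections import Counter

print(formatFloatList([0.3200, 0.2800, 0.3000], 2))    # ['0.32', '0.28', '0.30']
print(normalizeCounter(Counter({'M': 2, 'Sens': 3})))  # counts scaled to fractions: M 0.4, Sens 0.6
print(Beta(0.05))                                      # precision for s.d. 0.05: 1/0.05**2 = 400.0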
      mubar__0  sigbar__0  mu_truth  sig_truth  mu_fixed  fixed_uniform_a  fixed_uniform_b  prior_mult_a  prior_mult_b  mu_prior_0  sigma_prior_0
M       0.3200     0.1200    0.2500          0    0.3000          -0.1500           0.1500       -0.5000        0.5000      0.3200         0.1200
Sens    0.2800     0.1900    0.3000          0    0.3000          -0.1500           0.1500       -0.5000        0.5000      0.2800         0.1900
Secr    0.3000     0.1700    0.2800          0    0.3000          -0.1500           0.1500       -0.5000        0.5000      0.3000         0.1700
AGI     0.2600     0.1500    0.3400          0    0.3000          -0.1500           0.1500       -0.5000        0.5000      0.2600         0.1500
PA      0.2100     0.2100    0.2400          0    0.3000          -0.1500           0.1500       -0.5000        0.5000      0.2100         0.2100
Notes:
- Adjust columns mubar__0 and sigbar__0 to set up the initial parameters of the prior. Those columns are used for ALL types of "truth_type" that you might select in sheet "parameters2".
- Adjust columns mu_truth and sig_truth when the "truth_type" is "known" or "normal".
- Adjust columns mu_fixed, fixed_uniform_a and fixed_uniform_b when the "truth_type" is "fixed_uniform".
- Adjust columns prior_mult_a and prior_mult_b when the "truth_type" is "prior_uniform".

The settings in sheet "parameters2" are:

theta_end     2.1000          The interval that is going to be considered in the algorithm is [theta_start, theta_start + increment, …, theta_end)
increment     0.2000
truth_type    fixed_uniform   Possible values are "known", "fixed_uniform", "prior_uniform" or "normal". Pick one.
policy        IE              The possible policies are IE, UCB, PureExploitation, and PureExploration. You can select more than one by typing more than one policy name (separated by blank space)
4 MODELING
4.1 Narrative
For our basic model, we are going to assume that we have five choices of medications: metformin, or a drug (other than metformin) drawn from one of the four major drug groups. Let \(\cal{X} = \{x_\mathrm{M}, x_\mathrm{Sens}, x_\mathrm{Secr}, x_\mathrm{AGI}, x_\mathrm{PA}\}\) be the five choices. From observing the performance of each drug over many patients (that is, millions), it is possible to construct a probability distribution of the reduction in A1C levels across all patients. The results of this analysis are shown in table 4.1, which reports the average reduction and the standard deviation across all patients. We assume that the distribution of reductions in A1C across the population is normally distributed, with means and standard deviations as given in the table:
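As a minimal sketch (not the model's internal representation), the choice set and the prior parameters loaded from the spreadsheet above could be held as plain Python dictionaries:

# the five medication choices
eNames = ['M', 'Sens', 'Secr', 'AGI', 'PA']

# prior mean and standard deviation of the A1C reduction for each choice
# (values taken from the mubar__0 and sigbar__0 columns of the parameter table above)
mubar__0 = {'M': 0.32, 'Sens': 0.28, 'Secr': 0.30, 'AGI': 0.26, 'PA': 0.21}
sigbar__0 = {'M': 0.12, 'Sens': 0.19, 'Secr': 0.17, 'AGI': 0.15, 'PA': 0.21}

These are the values the belief state is initialized with in the model class defined later in this section.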
4.2 Core Elements
This section attempts to answer three important questions:
- What metrics are we going to track?
- What decisions do we intend to make?
- What are the sources of uncertainty?
For this problem, the only metric we are interested in is the amount of reduction in blood sugar at the end of N trials. A single type of decision needs to be made at the start of each trial - which medication should be prescribed next. The only source of uncertainty is the level of reduction for each medication.
4.3 Mathematical Model | SUM Design
To build the model, we assume we are given
\(\bar{\mu}^0_e\) = the mean reduction in A1C for drug \(e\) across the population
\(\bar{\sigma}^0_e\) = the standard deviation of the reduction in A1C for drug \(e\) across the population
Our interest is learning the best drug for a particular individual. Although we can describe the patient using a set of attributes, for now we are only going to assume that the characteristics of the patient do not change our belief about the performance of each drug for an individual patient.
We do not know the reduction we can expect from each drug, so we represent it as a random variable \(\mu_e\), where we assume that \(\mu_e\) is normally distributed, which we write as

\[
\mu_e \sim N(\bar{\mu}^0_e, \bar{\sigma}^0_e)
\]
We refer to the normal distribution \(N(\bar{\mu}^0_e, \bar{\sigma}^0_e)\) as the prior distribution of belief about \(\mu_e\).
We index each iteration of prescribing a medication by \(n\) which starts at 0. Assume that we always observe a patient for a fixed period of time (say, a month). If we try a drug \(e\) on a patient, we make a noisy observation of the truth value \(\mu_e\). Assume we make a choice of drug \(x^n\) using what we know after \(n\) trials, after which we observe the outcome of the \(n + 1\)st trial, which we denote \(W^{n+1}\) (this is the reduction in the A1C level). This can be written
\(W^{n+1} = \mu_e + \epsilon^{n+1}\).
Remember that we do not know \(\mu_e\); this is a random variable, where \(\bar{\mu}^n_e\) is our current estimate of the mean of \(\mu_e\).
4.3.1 State variables
The state variables represent what we need to know. We are using a Bayesian belief model wherein we treat the unknown value of a drug, \(\mu_e\), as a random variable with initial prior distribution given by \(S^0\). After \(n\) experiments with different drugs, we obtain the posterior distribution of belief \(S^n\).
4.3.3 Exogenous information variables
The exogenous information variables represent what we did not know (when we made a decision). These are the variables that we cannot control directly. The information in these variables becomes available after we make the decision \(x^n\).
After we make the decision \(x^n\), we observe
\[
W^{n+1}_x = \text{the reduction in A1C level resulting from }x = x^n \text{ for the }n+1 \text{ st trial.}
\]
The latest exogenous information can be accessed by calling the following method from class MedicalDecisionDiabetesModel():
def W_fn(self, x__n):
    # W^n+1 ~ N(mu_x, beta^W_x)
    hot_e = [k for k, v in x__n.x__n.items() if v == 1][0]
    W__nt1 = self.prng.normal(self.mu[hot_e], self.sigmaW)
    betaW = Beta(self.sigmaW)
    return {"W": W__nt1, "mu": self.mu[hot_e], "betaW": betaW}
4.3.4 Transition function
The transition function describes how the state variables evolve over time. Because we currently have three state variables in the state, \(S^n=(\bar{\mu}^n,\beta^n,N^n)\), we have the equations (written for the drug \(x = x^n\) that was prescribed; the beliefs about the other drugs are unchanged):

\[
\beta^{n+1}_x = \beta^n_x + \beta^W
\]
\[
\bar{\mu}^{n+1}_x = \frac{\beta^n_x\bar{\mu}^n_x + \beta^W W^{n+1}_x}{\beta^{n+1}_x}
\]
\[
N^{n+1}_x = N^n_x + 1
\]

Collectively, they represent the general transition function:
\[
S_{t+1} = S^M(S_t,X^{\pi}(S_t))
\]
The transition function is implemented by the following method in class MedicalDecisionDiabetesModel():
def S__M_fn(self, x__n, exog_info):
hot_e = [k for k,v in x__n.x__n.items() if v == 1][0]
mub__n = getattr(self.S__n, 'mub__n')[hot_e]
bet__n = getattr(self.S__n, 'bet__n')[hot_e]
betaW = exog_info["betaW"]
W__nt1 = exog_info["W"]
N__n = getattr(self.S__n, 'N__n')[hot_e]
bet__nt1 = bet__n + betaW
mub__nt1 = (bet__n*mub__n + betaW*W__nt1)/bet__nt1
N__nt1 = N__n + 1
tmp = {
'mub__n': {hot_e: mub__nt1},
'bet__n': {hot_e: bet__nt1},
'N__n': {hot_e: N__nt1}
}
exog_info.update(tmp)
S__n_info = {s: getattr(self.S__n, s) for s in self.SNames}
for s in self.SNames:
S__n_info[s][hot_e] = exog_info[s][hot_e]
S__nt1 = self.build_state(S__n_info)
return S__nt1
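To make the update concrete, here is a small worked example (the numbers are illustrative only, assuming a prior \(\bar{\mu}^n = 0.30\) and \(\bar{\sigma}^n = 0.17\), observation noise with s.d. 0.05, and an observed reduction \(W^{n+1} = 0.20\)):

import math

mub__n = 0.30                 # prior mean for the chosen drug
bet__n = 1 / 0.17**2          # prior precision (about 34.6)
betaW  = 1 / 0.05**2          # observation precision (400.0)
W__nt1 = 0.20                 # observed A1C reduction (illustrative)

bet__nt1 = bet__n + betaW                               # updated precision
mub__nt1 = (bet__n*mub__n + betaW*W__nt1) / bet__nt1    # precision-weighted updated mean
sgb__nt1 = 1 / math.sqrt(bet__nt1)                      # updated standard deviation

print(round(mub__nt1, 4), round(sgb__nt1, 4))

Because the observation precision (400) is much larger than the prior precision (about 35), the posterior mean lands close to the observation, and the posterior standard deviation shrinks well below the prior's.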
4.3.5 Objective function
The objective function captures the performance metrics of the solution to the problem.
Each time a drug (entity) is prescribed by a decision \(x = x^n\), we observe a reduction in A1C, \(W^{n+1}_{x^n}\). We need to find a policy that prescribes a treatment \(x^n = X^{\pi}(S^n)\) that maximizes the expected total reduction in A1C. Our performance metric is therefore

\[
\max_{\pi} \mathbb{E} \left\{ \sum_{n=0}^{N-1} W^{n+1}_{X^{\pi}(S^n)} \mid S^0 \right\}
\]
The reward function is implemented by the following method in class MedicalDecisionDiabetesModel:
def F_fn(self, x__n, exog_info):
mu = exog_info["mu"]
# W__nt1 = exog_info["W"]
return mu
# return W__nt1
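Note that F_fn returns the sampled truth mu for the chosen drug rather than the noisy observation W (the commented-out lines show the alternative). Accumulating mu in cumF evaluates a policy on the expected reduction it achieves, which removes the observation noise from the performance comparison.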
4.3.6 Implementation of SUM Model
Here is the complete implementation of the MedicalDecisionDiabetesModel class:
class MedicalDecisionDiabetesModel():
    def __init__(self, SNames, xNames, eNames, params, additional_params,
                 W_fn=None, S__M_fn=None, F_fn=None, seed=20180529):
        self.initArgs = {'seed': seed}
        self.prng = np.random.RandomState(seed)
        self.S__0 = {
            'mub__n': {e: params.loc[e]['mubar__0'] for e in eNames},
            'bet__n': {e: Beta(params.loc[e]['sigbar__0']) for e in eNames},
            'N__n': {e: 0 for e in eNames}}
        self.SNames = SNames
        self.xNames = xNames
        self.eNames = eNames
        self.State = namedtuple('State', SNames) # 'class'
        self.S__n = self.build_state(self.S__0) # 'instance'
        self.Decision = namedtuple('Decision', xNames)
        self.cumF = 0.0
        self.sigmaW = additional_params.loc['sig__W', 0]
        self.truth_type = additional_params.loc['truth_type', 0]
        self.mu = {} #updated using W_sample_mu() at the beginning of each sample path #.truth
        self.t = 0 #time counter (in months)
        self.truth_params = {}
        if self.truth_type == 'fixed_uniform':
            self.truth_params = {e: [
                params.loc[e, 'mu_fixed'],
                params.loc[e, 'fixed_uniform_a'],
                params.loc[e, 'fixed_uniform_b']] for e in self.eNames}
        elif self.truth_type == 'prior_uniform':
            self.truth_params = {e: [
                params.loc[e, 'mu_0'],
                params.loc[e, 'prior_mult_a'],
                params.loc[e, 'prior_mult_b']] for e in self.eNames}
        else:
            self.truth_params = {e: [
                params.loc[e, 'mu_truth'],
                params.loc[e, 'sigma_truth'], 0] for e in self.eNames}

    def printState(self):
        print("Current state ")
        for sn in self.SNames:
            if sn == 'mub__n':
                mubars = getattr(self.S__n, 'mub__n')
                mubars_f = {k: f'{v:.2f}' for k, v in mubars.items()}
                print(f'mub__n={mubars_f}')
            elif sn == 'bet__n':
                betas = getattr(self.S__n, 'bet__n')
                betas_f = {k: f'{v:.2f}' for k, v in betas.items()}
                print(f'bet__n={betas_f}')
                sigmas = {k: 1/math.sqrt(v) for k, v in betas.items()}
                sigmas_f = {k: f'{v:.2f}' for k, v in sigmas.items()}
                print(f'sgb__n={sigmas_f}')
            else:
                print(f'{sn}={getattr(self.S__n, sn)}')
        print("\n\n")

    def printTruth(self):
        print("Model truth_type {}. Measurement noise sigmaW {} ".format(self.truth_type, self.sigmaW))
        for e in self.eNames:
            print("Treatment {}: par1 {:.2f}, par2 {:.2f} and par3 {}".format(
                e, self.truth_params[e][0], self.truth_params[e][1], self.truth_params[e][2]))
        print("\n\n")

    def build_state(self, info):
        return self.State(*[info[sn] for sn in self.SNames])

    def build_decision(self, info):
        return self.Decision(*[info[xn] for xn in self.xNames])

    def W_sample_mu(self):
        if self.truth_type == "known":
            self.mu = {e: self.truth_params[e][0] for e in self.eNames} #.
        elif self.truth_type == "fixed_uniform": #. all mu btw (.30 - .15) and (.30 + .15), i.e. reductions always btw .15 and .45
            self.mu = {e: self.truth_params[e][0] + self.prng.uniform(
                self.truth_params[e][1], self.truth_params[e][2]) for e in self.eNames} #.
        elif self.truth_type == "prior_uniform":
            self.mu = {e: self.truth_params[e][0] + self.prng.uniform(
                self.truth_params[e][1]*self.truth_params[e][0],
                self.truth_params[e][2]*self.truth_params[e][0]) for e in self.eNames} #.
        else:
            self.mu = {e: self.prng.normal(
                self.truth_params[e][0], self.truth_params[e][1]) for e in self.eNames} #.

    # Gives the exogenous information that is dependent on a random process
    # In our case, exogenous information: W^(n+1) = mu_x + eps^(n+1),
    # where:
    #   eps^n+1 is normally distributed with mean 0 and known variance (here s.d. 0.05)
    #   W^n+1_x is the reduction in A1C level
    def W_fn(self, x__n):
        # W^n+1 ~ N(mu_x, beta^W_x)
        hot_e = [k for k, v in x__n.x__n.items() if v == 1][0]
        W__nt1 = self.prng.normal(self.mu[hot_e], self.sigmaW)
        betaW = Beta(self.sigmaW)
        return {"W": W__nt1, "mu": self.mu[hot_e], "betaW": betaW}

    # Takes in the decision and exogenous information to return
    # the new mu_empirical and beta values corresponding to the decision
    def S__M_fn(self, x__n, exog_info):
        hot_e = [k for k, v in x__n.x__n.items() if v == 1][0]
        mub__n = getattr(self.S__n, 'mub__n')[hot_e]
        bet__n = getattr(self.S__n, 'bet__n')[hot_e]
        betaW = exog_info["betaW"]
        W__nt1 = exog_info["W"]
        N__n = getattr(self.S__n, 'N__n')[hot_e]
        bet__nt1 = bet__n + betaW #. beta^n+1_x = beta^n_x + beta^W (SDAM-4.2), (OL-2.7)
        mub__nt1 = (bet__n*mub__n + betaW*W__nt1)/bet__nt1 #. (beta^n_x*mubar^n_x + beta^W_x*W^n+1_x)/beta^n+1_x (SDAM-4.1), (OL-2.6)
        N__nt1 = N__n + 1 #. count of no. times given, for drug x
        tmp = {
            'mub__n': {hot_e: mub__nt1},
            'bet__n': {hot_e: bet__nt1},
            'N__n': {hot_e: N__nt1}}
        exog_info.update(tmp)
        S__n_info = {s: getattr(self.S__n, s) for s in self.SNames}
        for s in self.SNames:
            S__n_info[s][hot_e] = exog_info[s][hot_e]
        S__nt1 = self.build_state(S__n_info)
        return S__nt1

    # Calculates the contribution (reduction in A1C level)
    def F_fn(self, x__n, exog_info):
        mu = exog_info["mu"]
        # W__nt1 = exog_info["W"]
        return mu #. ORIG
        # return W__nt1 #.

    # Steps the process forward by one time increment by updating the sum of
    # the contributions, the exogenous information and the state variable
    def step(self, x__n):
        # compute new mu_empirical and beta for the decision
        exog_info = self.W_fn(x__n)
        self.S__n = self.S__M_fn(x__n, exog_info)
        # update reward (add new W to previous obj)
        F = self.F_fn(x__n, exog_info)
        self.cumF += F
        self.t_update()
        return (self.S__n, self.cumF, x__n, exog_info)

    # Update method for time counter
    def t_update(self):
        self.t += 1
        return self.t

    def mubs_by_eNames(self, withForNames=True):
        stats = {}
        for e in self.eNames:
            ss = []
            for s in self.SNames:
                ss.append(getattr(self.S__n, s)[e])
            value = ss[0]
            stats.update({e: value})
        if withForNames:
            return stats
        else:
            return [stats[key] for key in stats]

    def bets_by_eNames(self, withForNames=True):
        stats = {}
        for e in self.eNames:
            ss = []
            for s in self.SNames:
                ss.append(getattr(self.S__n, s)[e])
            value = ss[1]
            stats.update({e: value})
        if withForNames:
            return stats
        else:
            return [stats[key] for key in stats]

    def sgbs_by_eNames(self, withForNames=True):
        stats = {}
        for e in self.eNames:
            ss = []
            for s in self.SNames:
                ss.append(getattr(self.S__n, s)[e])
            value = 1/math.sqrt(ss[1])
            stats.update({e: value})
        if withForNames:
            return stats
        else:
            return [stats[key] for key in stats]

    def Ns_by_eNames(self, withForNames=True):
        stats = {}
        for e in self.eNames:
            ss = []
            for s in self.SNames:
                ss.append(getattr(self.S__n, s)[e])
            value = ss[2]
            stats.update({e: value})
        if withForNames:
            return stats
        else:
            return [stats[key] for key in stats]
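As a hedged usage sketch only (the abbreviated parameter tables, the choice of xNames=['x__n'], and the hand-built one-hot decision below are assumptions made for illustration; the notebook itself builds these objects from the spreadsheet and from a policy class):

import pandas as pd

eNames = ['M', 'Sens', 'Secr', 'AGI', 'PA']
params = pd.DataFrame({
    'mubar__0':        [0.32, 0.28, 0.30, 0.26, 0.21],
    'sigbar__0':       [0.12, 0.19, 0.17, 0.15, 0.21],
    'mu_fixed':        [0.30]*5,
    'fixed_uniform_a': [-0.15]*5,
    'fixed_uniform_b': [0.15]*5,
}, index=eNames)
additional_params = pd.DataFrame(
    {0: [0.05, 'fixed_uniform']}, index=['sig__W', 'truth_type'])

M = MedicalDecisionDiabetesModel(
    SNames=['mub__n', 'bet__n', 'N__n'],
    xNames=['x__n'],
    eNames=eNames,
    params=params,
    additional_params=additional_params)

M.W_sample_mu()  # draw a truth for this sample path
x__n = M.build_decision({'x__n': {e: int(e == 'M') for e in eNames}})  # prescribe metformin
S__n, cumF, x__n, exog_info = M.step(x__n)
print(cumF)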
4.4 Uncertainty Model
As stated in 4.3.3, we have
After we make the decision \(x^n\), we observe
\[
W^{n+1}_x = \text{the reduction in A1C level resulting from }x = x^n \text{ for the }n+1 \text{ st trial.}
\]
4.5 Policy Design
There are two main meta-classes of policy design. Each of these has two subclasses:
- Policy Search
  - Policy Function Approximations (PFAs)
  - Cost Function Approximations (CFAs)
- Lookahead
  - Value Function Approximations (VFAs)
  - Direct Lookaheads (DLAs)
In this project we will use four policies from the PFA class: Interval Estimation (IE), Upper Confidence Bound (UCB), Pure Exploitation, and Pure Exploration.
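The policy class itself (P_evalu, with methods such as X__IE, X__UCB, X__PureExploitation and X__PureExploration) is defined later in the notebook. Purely as an illustrative sketch under the model above (not the notebook's actual implementation), an interval-estimation style policy could look like the following, assuming theta is a 1-tuple as in the evaluation cells below and that the decision is the one-hot x__n dictionary used elsewhere in this notebook:

import math

def X__IE_sketch(model, theta):
    # interval estimation: choose the drug with the largest mub__n_e + theta*sgb__n_e
    mubs = getattr(model.S__n, 'mub__n')
    bets = getattr(model.S__n, 'bet__n')
    scores = {e: mubs[e] + theta[0]*(1/math.sqrt(bets[e])) for e in model.eNames}
    best = max(scores, key=scores.get)
    # return a one-hot Decision namedtuple, as expected by model.step()
    return model.build_decision({'x__n': {e: int(e == best) for e in model.eNames}})

Larger values of theta push the policy toward drugs whose value is still uncertain (exploration), while theta = 0 reduces it to pure exploitation of the current belief.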
# data structures to output the algorithm details
mu_star_labels = ["mu_"+e for e in eNames]
Scum_mu_x_labels = ["Scum_mu_"+e for e in eNames]
Savg_mu_x_labels = ["Savg_mu_"+e for e in eNames]
mub_labels = ["mub__n_"+e for e in eNames]
sgb_labels = ["sgb__n_"+e for e in eNames]
N_labels = ["N__n_"+e for e in eNames]
x_labels = ["x__n_"+e for e in eNames]
labels = ["Policy","Truth_type","Theta","Fhat_mean","l"] +\
    mu_star_labels + Scum_mu_x_labels + Savg_mu_x_labels + ["best_x", "n"] +\
    mub_labels + sgb_labels + N_labels + x_labels + ["W","CumReward","isBest"]
def run_policy_evalu(piInfo_evalu, piName_evalu, stop_time_evalu, model_copy):
    record = []
    model_copy.W_sample_mu()
    for n in range(stop_time_evalu):
        x__n = getattr(P_evalu, piName_evalu)(model_copy, piInfo_evalu)
        S__n, F, x__n, exog_info = model_copy.step(x__n) # step the model forward one iteration
        record_n =\
            [model_copy.mu[e] for e in eNames] +\
            [S__n.mub__n[e] for e in eNames] +\
            [1/math.sqrt(S__n.bet__n[e]) for e in eNames] +\
            [S__n.N__n[e] for e in eNames] +\
            [x__n.x__n[e] for e in eNames] +\
            [exog_info['W']] +\
            [F]
        record.append(record_n)
    cumF = model_copy.cumF
    return cumF, record
4.6.2.1 Evaluate with data similar to train data
4.6.2.1.1 Non-optimal policy
theta_evalu_non = thetaStar_PureExploration
# theta_evalu_non = thetaStar_PureExploitation
# piName_evalu_non = 'X__IE'
# piName_evalu_non = 'X__UCB'
piName_evalu_non = 'X__PureExploration'
# piName_evalu_non = 'X__PureExploitation'
F, record = run_policy_evalu(theta_evalu_non, piName_evalu_non, stop_time_evalu, copy(M_evalu))
labels_evalu =\
    ["mu__n_"+e for e in eNames] +\
    ["mub__n_"+e for e in eNames] +\
    ["sgb__n_"+e for e in eNames] +\
    ["N__n_"+e for e in eNames] +\
    ["x__n_"+e for e in eNames] +\
    ["W", "CumReward"]
print(f'{theta_evalu_non=}')
print(f'{int(F)=:,}')
df_non = pd.DataFrame.from_records(data=record, columns=labels_evalu); df_non[:10]
theta_evalu_non=(0.0,)
int(F)=142
   mu__n_M  mu__n_Sens  mu__n_Secr  mu__n_AGI  mu__n_PA  mub__n_M  mub__n_Sens  mub__n_Secr  mub__n_AGI  mub__n_PA  \
0   0.2742      0.1832      0.2893     0.4046    0.2856    0.3200       0.2800       0.2012      0.2600     0.2100
1   0.2742      0.1832      0.2893     0.4046    0.2856    0.3492       0.2800       0.2012      0.2600     0.2100
2   0.2742      0.1832      0.2893     0.4046    0.2856    0.3492       0.2800       0.2012      0.2913     0.2100
3   0.2742      0.1832      0.2893     0.4046    0.2856    0.3492       0.2512       0.2012      0.2913     0.2100
4   0.2742      0.1832      0.2893     0.4046    0.2856    0.3492       0.2512       0.1646      0.2913     0.2100
5   0.2742      0.1832      0.2893     0.4046    0.2856    0.3492       0.2512       0.1646      0.2913     0.3149
6   0.2742      0.1832      0.2893     0.4046    0.2856    0.3492       0.2009       0.1646      0.2913     0.3149
7   0.2742      0.1832      0.2893     0.4046    0.2856    0.3492       0.2009       0.1646      0.2911     0.3149
8   0.2742      0.1832      0.2893     0.4046    0.2856    0.3492       0.2009       0.1646      0.3442     0.3149
9   0.2742      0.1832      0.2893     0.4046    0.2856    0.3116       0.2009       0.1646      0.3442     0.3149

   sgb__n_M  sgb__n_Sens  sgb__n_Secr  sgb__n_AGI  sgb__n_PA  N__n_M  N__n_Sens  N__n_Secr  N__n_AGI  N__n_PA  \
0    0.1200       0.1900       0.1610      0.1500     0.2100       0          0          1         0        0
1    0.1167       0.1900       0.1610      0.1500     0.2100       1          0          1         0        0
2    0.1167       0.1900       0.1610      0.1437     0.2100       1          0          1         1        0
3    0.1167       0.1776       0.1610      0.1437     0.2100       1          1          1         1        0
4    0.1167       0.1776       0.1532      0.1437     0.2100       1          1          2         1        0
5    0.1167       0.1776       0.1532      0.1437     0.1936       1          1          2         1        1
6    0.1167       0.1674       0.1532      0.1437     0.1936       1          2          2         1        1
7    0.1167       0.1674       0.1532      0.1381     0.1936       1          2          2         2        1
8    0.1167       0.1674       0.1532      0.1331     0.1936       1          2          2         3        1
9    0.1136       0.1674       0.1532      0.1331     0.1936       2          2          2         3        1

   x__n_M  x__n_Sens  x__n_Secr  x__n_AGI  x__n_PA        W  CumReward
0       0          0          1         0        0  -0.6532     0.2893
1       1          0          0         0        0   0.8555     0.5635
2       0          0          0         1        0   0.6388     0.9681
3       0          1          0         0        0   0.0516     1.1513
4       0          0          1         0        0  -0.1886     1.4406
5       0          0          0         0        1   0.9095     1.7262
6       0          1          0         0        0  -0.1972     1.9094
7       0          0          0         1        0   0.2887     2.3140
8       0          0          0         1        0   1.0409     2.7186
9       1          0          0         0        0  -0.3782     2.9928
4.6.2.1.2 Optimal policy
# theta_evalu = thetaStar_IE
theta_evalu = thetaStar_UCB
# piName_evalu = 'X__IE'
piName_evalu = 'X__UCB'
F, record = run_policy_evalu(theta_evalu, piName_evalu, stop_time_evalu, copy(M_evalu))
labels_evalu =\
    ["mu__n_"+e for e in eNames] +\
    ["mub__n_"+e for e in eNames] +\
    ["sgb__n_"+e for e in eNames] +\
    ["N__n_"+e for e in eNames] +\
    ["x__n_"+e for e in eNames] +\
    ["W", "CumReward"]
print(f'{theta_evalu=}')
print(f'{int(F)=:,}')
df = pd.DataFrame.from_records(data=record, columns=labels_evalu); df[:10]