Dynamic Trading using Active Inference (Part 2)

Back to Portfolio of Projects | LearnableLoopAI.com | Blog |

In Part 2 we switch to Julia and setup the PythonCall and CondaPkg Julia packages to interoperate with the reused Python code. We also collect the code to be reused in a Python module called env_simulator.py. Then we load this Python module into this Julia notebook. After this we follow the usual structure followed for RxInfer projects (even though RxInfer is not yet used in this part).

0 Active Inference: Bridging Minds and Machines

In recent years, the landscape of machine learning has undergone a profound transformation with the emergence of active inference, a novel paradigm that draws inspiration from the principles of biological systems to inform intelligent decision-making processes. Unlike traditional approaches to machine learning, which often passively receive data and adjust internal parameters to optimize performance, active inference represents a dynamic and interactive framework where agents actively engage with their environment to gather information and make decisions in real-time.

At its core, active inference is rooted in the notion of agents as embodied entities situated within their environments, constantly interacting with and influencing their surroundings. This perspective mirrors the fundamental processes observed in living organisms, where perception, action, and cognition are deeply intertwined to facilitate adaptive behavior. By leveraging this holistic view of intelligence, active inference offers a unified framework that seamlessly integrates perception, decision-making, and action, thereby enabling agents to navigate complex and uncertain environments more effectively.

One of the defining features of active inference is its emphasis on the active acquisition of information. Rather than waiting passively for sensory inputs, agents proactively select actions that are expected to yield the most informative outcomes, thus guiding their interactions with the environment. This active exploration not only enables agents to reduce uncertainty and make more informed decisions but also allows them to actively shape their environments to better suit their goals and objectives.

Furthermore, active inference places a strong emphasis on the hierarchical organization of decision-making processes, recognizing that complex behaviors often emerge from the interaction of multiple levels of abstraction. At each level, agents engage in a continuous cycle of prediction, inference, and action, where higher-level representations guide lower-level processes while simultaneously being refined and updated based on incoming sensory information.

The applications of active inference span a wide range of domains, including robotics, autonomous systems, neuroscience, and cognitive science. In robotics, active inference offers a promising approach for developing robots that can adapt and learn in real-time, even in unpredictable and dynamic environments. In neuroscience and cognitive science, active inference provides a theoretical framework for understanding the computational principles underlying perception, action, and decision-making in biological systems.

In conclusion, active inference represents a paradigm shift in machine learning, offering a principled and unified framework for understanding and implementing intelligent behavior in artificial systems. By drawing inspiration from the principles of biological systems, active inference holds the promise of revolutionizing our approach to building intelligent machines and understanding the nature of intelligence itself.

1 BUSINESS UNDERSTANDING

This project deals with a client need relating to making optimal investment decisions regarding a given portfolio. Although we only provide for two financial instruments (stocks, bonds, funds, etc.) in the present example, the code can easily be expanded to provide for multiple financial instruments. A decision can handle buy, hold, or sell options.

versioninfo() ## Julia version
# VERSION ## Julia version

Julia Version 1.11.3
Commit d63adeda50d (2025-01-21 19:42 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × Intel(R) Core(TM) i7-8700B CPU @ 3.20GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, skylake)
Threads: 1 default, 0 interactive, 1 GC (on 12 virtual cores)
Environment:
  JULIA_NUM_THREADS =

import Pkg
Pkg.add(Pkg.PackageSpec(;name="PythonCall"))
Pkg.add(Pkg.PackageSpec(;name="CondaPkg"))

using PythonCall
using CondaPkg

   Resolving package versions...
  No Changes to `~/.julia/environments/v1.11/Project.toml`
  No Changes to `~/.julia/environments/v1.11/Manifest.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.11/Project.toml`
  No Changes to `~/.julia/environments/v1.11/Manifest.toml`

## Pkg.add(Pkg.PackageSpec(;name="RxInfer", version="3.0.0"))
Pkg.add(Pkg.PackageSpec(;name="Plots"))
Pkg.add(Pkg.PackageSpec(;name="LaTeXStrings"))

## using RxInfer
using Plots
## default(label="", margin=10Plots.pt)
using LaTeXStrings

## import RxInfer.ReactiveMP: getrecent, messageout

   Resolving package versions...
  No Changes to `~/.julia/environments/v1.11/Project.toml`
  No Changes to `~/.julia/environments/v1.11/Manifest.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.11/Project.toml`
  No Changes to `~/.julia/environments/v1.11/Manifest.toml`

Pkg.status()

Status `~/.julia/environments/v1.11/Project.toml`
  [992eb4ea] CondaPkg v0.2.24
  [b964fa9f] LaTeXStrings v1.4.0
  [91a5bcdd] Plots v1.40.9
  [6099a3de] PythonCall v0.9.24

CondaPkg.add("numpy")
CondaPkg.add("pandas")
CondaPkg.add("matplotlib"; version="3.9.1")
CondaPkg.resolve() ## apply changes
CondaPkg.status()

    CondaPkg Found dependencies: /home/vscode/.julia/environments/v1.11/CondaPkg.toml
    CondaPkg Found dependencies: /home/vscode/.julia/packages/PythonCall/WMWY0/CondaPkg.toml
    CondaPkg Dependencies already up to date
    CondaPkg Found dependencies: /home/vscode/.julia/environments/v1.11/CondaPkg.toml
    CondaPkg Found dependencies: /home/vscode/.julia/packages/PythonCall/WMWY0/CondaPkg.toml
    CondaPkg Dependencies already up to date
    CondaPkg Found dependencies: /home/vscode/.julia/environments/v1.11/CondaPkg.toml
    CondaPkg Found dependencies: /home/vscode/.julia/packages/PythonCall/WMWY0/CondaPkg.toml
    CondaPkg Dependencies already up to date
CondaPkg Status /home/vscode/.julia/environments/v1.11/CondaPkg.toml
Environment
  /home/vscode/.julia/environments/v1.11/.CondaPkg/env
Packages
  matplotlib v3.9.1 (3.9.1)
  numpy v2.1.0
  pandas v2.2.2

2 DATA UNDERSTANDING

There is no pre-existing data to be analyzed.

3 DATA PREPARATION

There is no pre-existing data to be prepared.

4 MODELING

4.1 Narrative

Please review the narrative in section 1.

4.2 Core Elements

This section attempts to answer three important questions:

What metrics are we going to track?
What decisions do we intend to make?
What are the sources of uncertainty?

For this problem, the only metric we are interested in is the amount of profit we make after each decision window. A single type of decision needs to be made at the start of each window - whether for each asset we want to buy, hold, or sell the asset at its current price. The only source of uncertainty is the prices of the assets.

4.3 System-Under-Steer / Environment / Generative Process

The simulation of the sustr/envir/genpr will be handled by the reused Python code included in the env_simulator.py Python module.

sys = pyimport("sys")
sys.path.append(".")

env_simulator = pyimport("env_simulator") ## import module
PriceSimulator = pyimport("env_simulator").PriceSimulator ## import PriceSimulator
create_data = pyimport("env_simulator").create_data

Model = pyimport("env_simulator").Model
S_0_INFO = pyimport("env_simulator").S_0_INFO
eNAMES = pyimport("env_simulator").eNAMES
SEED_TRAIN = pyimport("env_simulator").SEED_TRAIN

Policy = pyimport("env_simulator").Policy

Python: <class 'env_simulator.Policy'>

Restart if packages are not found in previous step

Could happen when rebuilding container

The system-under-steer/environment/generative process is an investment portfolio with a 5-dimensional state vector.

4.3.1 State variables

The state variables represent what we need to know. The state at time $t$ of the system-under-steer (sustr), also referred to as the environment (envir), or the generative process (genpr) will be given by (in the case of the Python project):

$S_{t} = (R_{t}, R_{t}^{0}, p_{t})$ where

$E = {AAA, BBB}$
$R_{t} = (R_{t e})_{e \in E}$
- $R_{t e}$ = Our position (in shares) in a stock $e \in E$ , where $R_{t e}$ can be either
  - positive (for a long position) or
  - negative (for a short position), and where
$R_{t}^{0}$ is the amount in cash
$p_{t} = (p_{t e})_{e \in E}$
- $p_{t e}$ = The price at $t$ for stock $e$

This means:

$S_{t} = (R_{t, A A A}, R_{t, B B B}, R_{t}^{0}, p_{t, A A A}, p_{t, B B B})$

The state variables are represented by the following variables in the Model class:

self.State = namedtuple('State', SNAMES) # 'class'
self.S_t = self.build_state(info) # 'instance'

where

SNAMES = [ #state variable names
    'R_t',   #resource
    'R0_t',  #cash
    'p_t',   #price
]

In Julia, we will use ${\tilde{s}}_{t}$ for the state, resulting in:

${\tilde{s}}_{t} = (R_{t, A A A}, R_{t, B B B}, R_{t}^{0}, p_{t, A A A}, p_{t, B B B})$

The tilde above the $s$ indicates that this is the true state rather than the inferred state.

4.3.2 Decision variables

The decision variables represent what we control.

In the Python project we have:

$x_{t} = (x_{t e})_{e \in E}$
- $x_{t e}$ = number of shares that we trade
  - $x_{t e} > 0$ : number of shares we buy of stock $e$
  - $x_{t e} < 0$ : number of shares we sell of stock $e$
Constraints
- $\sum_{e = 1}^{M} x_{t e} p_{t e} \leq R_{t, 0}$
Decisions are made with a policy (TBD below):
- $X^{π} (S_{t})$

The decision variables are represented by the following variables in the Model class:

self.Decision = namedtuple('Decision', xNAMES) # 'class'

where

xNAMES = ['x_t']
eNAMES = ['AAA', 'BBB']

In Julia, we will use $a_{t}$ for an action:

$a_{t} = (a_{t, A A A}, a_{t, B B B})$

4.3.3 Exogenous information variables

The exogenous information variables represent what we did not know (when we made a decision). These are the variables that we cannot control directly. The information in these variables becomes available after we make the decision $x_{t}$ .

When we assume that the price in each time period is revealed, without any model to predict the price based on past prices, we have, using approach 1:

$p_{t + 1} = W_{t + 1}$

Alternatively, when we assume that we observe the change in price ${\hat{p}}_{t + 1} = p_{t + 1} - p_{t}$ , we have, using approach 2:

$\begin{aligned} p_{t + 1} & = p_{t} + W_{t + 1} \\ = p_{t} + {\hat{p}}_{t + 1} \end{aligned}$

We will make use of approach 2 which means that the exogenous information, $W_{t + 1}$ , is the observed change in price of the share.

The exogenous information is obtained by a call to

SIM = PriceSimulator.simulate(...)

where SIM is a global variable.

The latest exogenous information can be accessed by calling the following method from class Model(), which returns a simulated price change for each asset:

def W_fn(self, t):
    W_tt1 = SIM.simulate()
    return W_ttl

4.3.4 Next-state function

The next-state function captures the dynamics of the environment/system-under-steer/generative process.

Because we currently have three state variables in the state, $S_{t} = (R_{t}, R_{t}^{0}, p_{t})$ , we have three equations.

For a position in a stock $R_{t e}$ : $\begin{aligned} R_{t + 1, e} & = R_{t e} + x_{t e} (RLSO - eq 13.13) \end{aligned}$

For a position in cash $R_{t, 0}$ : $\begin{aligned} R_{t + 1}^{0} & = R_{t}^{0} - \sum_{e = 1}^{M} x_{t e} p_{t e} (RLSO - eq 13.14) \end{aligned}$

For the price $p_{t}$ : $\begin{aligned} p_{t + 1, e} & = p_{t e} + {\hat{p}}_{t + 1, e} (RLSO - eq 13.15) \end{aligned}$

Collectively, they represent the general next-state function:

$S_{t + 1} = S^{M} (S_{t}, X^{π} (S_{t} | θ), W_{t + 1})$

In Julia, we will represent the next-state function as:

${\tilde{s}}_{t} = g ({\tilde{s}}_{t - 1}, a_{t}, W_{t})$

4.3.5 Observation function

The observation function can be represented by $y_{t} = {\tilde{s}}_{t}$

because we will not have any observation noise.

4.3.6 Implementation of the System-Under-Steer / Environment / Generative Process

The agent and the environment interact through a Markov blanket. Because states of the agent are unknown to the world, we wrap them in a comprehension that only returns functions for interacting with the agent. Internal beliefs cannot be directly observed, and interaction is only allowed through the Markov blanket of the agent (i.e. the sensors and actuators).

In Part 2 of this project we will not use RxInfer yet.

The Python code for the implementation can be found in the env_simulator.py Python module:

from collections import namedtuple, defaultdict
import pandas as pd
import numpy as np
import random
import matplotlib.pyplot as plt
import matplotlib as mpl

## PARAMETERS
SNAMES = [ ## state variable names
    'R_t',   ## resource
    'R0_t',  ## cash
    'p_t',   ## price
]
xNAMES = ['x_t'] ## decision variable names
eNAMES = ['AAA', 'BBB']
piNAMES = ['X__HighLow'] ## policy names
thNAMES = ['thLo', 'thHi'] ## theta names
SEED_TRAIN = 77777777
x = ['Up', 'Neutral', 'Down']
W_BIAS_CDFS = pd.DataFrame(
  [[.9, 1., 1.], ## 'Up' cdf
   [.2, .8, 1.], ## 'Neutral' cdf
   [0., .1, 1.]], ## 'Down' cdf
  index=x,
  columns=x,
)
INIT_PRICE = {eNAMES[0]: 100.0, eNAMES[1]: 50.0}
W_UP_STEP = 1
W_DOWN_STEP = -1
W_VARIANCE = 2
INIT_RESOURCE = {eNAMES[0]: 0, eNAMES[1]: 0}
INIT_CASH = 1_000.00
S_0_INFO = {
  'R_t': {en: INIT_RESOURCE[en] for en in eNAMES},
  'R0_t': INIT_CASH,
  'p_t': {en: INIT_PRICE[en] for en in eNAMES},
}
##        'AAA'                         'BBB'
TH_HI = {eNAMES[0]: (200.0, 200.5, .1), eNAMES[1]: (90.0, 90.5, .1)}
TH_LO = {eNAMES[0]: (168.0, 168.5, .1), eNAMES[1]: (72.0, 72.5, .1)}

class PriceSimulator():
  def __init__(self,
    biasCdfs=W_BIAS_CDFS,
    upStep=W_UP_STEP,
    downStep=W_DOWN_STEP,
    variance=W_VARIANCE,
    seed=None):

    self.biasCdfs = biasCdfs
    self.upStep = upStep
    self.downStep = downStep
    self.variance = variance
    self.prng = np.random.RandomState(seed)
    self.bias = 'Neutral'

  def simulate(self): ## assume the change in price is normal with mean bias and variance 2
    phat_tt1_dict = {}
    b_tt1_dict = {}
    b_tt1_val_dict = {}
    for e in eNAMES:
      b_t = self.prng.choice(['Down', 'Neutral', 'Up'])
      biasCdf = self.biasCdfs.loc[[b_t]]
      coin = self.prng.random_sample()
      if (coin < float(biasCdf['Up'].iloc[0])):  
        b_tt1 = 'Up' ## new bias
        b_tt1_val = self.upStep ## bias
      elif (coin >= float(biasCdf['Up'].iloc[0]) and coin < float(biasCdf['Neutral'].iloc[0])): #.
        b_tt1 = 'Neutral' ## new bias
        b_tt1_val = 0 ## bias
      else:
        b_tt1 = 'Down' ## new bias
        b_tt1_val = self.downStep ## bias
      self.bias = b_tt1
      phat_tt1_dict[e] = self.prng.normal(b_tt1_val, self.variance) ## change in price
      b_tt1_dict[e] = b_tt1
      b_tt1_val_dict[e] = b_tt1_val
    W_tt1 = {
        "p_t": {e: phat_tt1_dict[e] for e in eNAMES},
        "b_t": {e: b_tt1_dict[e] for e in eNAMES}, ## just for display
        "b_t_val": {e: b_tt1_val_dict[e] for e in eNAMES} ## just for display
    }
    return W_tt1

def create_data(T__sim):
  price_sim = PriceSimulator(seed=SEED_TRAIN)
  PriceData = []
  for i in range(T__sim):
    res = price_sim.simulate()
    entry = [itm[1][e] for itm in list(res.items()) for e in eNAMES]
    PriceData.append(entry)
  labels = [f'{itm[0]}_{e}' for itm in list(res.items()) for e in eNAMES]
  df = pd.DataFrame.from_records(data=PriceData, columns=labels); df[:10]
  ## df = pd.DataFrame.from_records(data=PriceData); df[:10]
  return df

class Model():
    def __init__(self, S_0_info):
      self.S_0_info = S_0_info
      self.State = namedtuple('State', SNAMES) ## 'class'
      self.S_t = self.build_state(S_0_info) ## 'instance'
      self.Decision = namedtuple('Decision', xNAMES) ## 'class'
      self.Ccum = 0.0 ## cumulative reward

    def build_state(self, info):
      return self.State(*[info[sn] for sn in SNAMES])

    def build_decision(self, info):
        return self.Decision(*[info[xn] for xn in xNAMES])

    ## exogenous information, dependent on a random process (the change in price)
    def W_fn(self, SIM):
        W_tt1 = SIM.simulate()
        return W_tt1

    def S__M_fn(self, S_t, x_t, W_tt1, theta, piName):
        ## print(f'...in S__M_fn()...\n\t{S_t=}\n\t{x_t=}\n\t{W_tt1=}\n\t{theta=}')

        ## R_t
        R_tt1 = {}
        for en in eNAMES:
          R_tt1[en] = S_t.R_t[en] + x_t.x_t[en]

        ## R0_t
        cost = 0.0
        for en in eNAMES:
          cost += x_t.x_t[en]*S_t.p_t[en]
        R0_tt1 = S_t.R0_t - cost

        ## p_t
          ## W_tt1['p_t'] has CHANGE in price
          ## clipped at a penny, else division by zero in X__HighLow
        p_t = S_t.p_t
        p_tt1 = {}
        for en in eNAMES:
          p_tt1[en] = max(0.01, p_t[en] + W_tt1['p_t'][en])

        S_tt1 = self.build_state({
            'R_t': R_tt1,
            'R0_t': R0_tt1,
            'p_t': p_tt1,
        })
        return S_tt1

    def C_fn(self, S_t, x_t, W_tt1):
        ## print(f'...in C_fn()...\n\t{S_t=}\n\t{x_t=}\n\t{W_tt1=}')
        C_t = 0.0
        for en in eNAMES:
          C_t += -S_t.p_t[en]*x_t.x_t[en]
        return C_t

    def step(self, x_t, theta, piName, SIM):
        ## print(f'...in step()...\n\t{x_t=}\n\t{theta=}')
        W_tt1 = self.W_fn(SIM)
        C = self.C_fn(self.S_t, x_t, W_tt1)
        self.Ccum += C
        self.S_t = self.S__M_fn(self.S_t, x_t, W_tt1, theta, piName)
        return (self.S_t, self.Ccum, x_t, W_tt1['b_t_val']) ## for plotting

function create_envir(; s̃₀, theta, N, SIM, M, P)
    s̃ₜ₋₁ = s̃₀
    s̃ₜ = s̃ₜ₋₁
    yₜ = s̃ₜ
    ## execute = (t, a_vector) -> begin
    next_ = (t, a_vector) -> begin
        aₜ = _M.build_decision(Dict("x_t"=> Dict("AAA"=> a_vector[1], "BBB"=> a_vector[2])))
        s̃ₜ, Ccum, x_t, b_t_val = M.step(aₜ, theta, "X__HighLow", SIM)

        yₜ = s̃ₜ ## No observation noise
        s̃ₜ₋₁ = s̃ₜ

        return Ccum, x_t
    end

    observe_ = () -> begin
        return yₜ
    end
    
    return (next_, observe_)
end

create_envir (generic function with 1 method)

4.4 Uncertainty Model

We will simulate the share price $p_{t + 1} = p_{t} + {\hat{p}}_{t + 1} = p_{t} + W_{t + 1}$ as described in section 2, for each asset.

4.5 Agent / Generative Model

4.5.1 State variables

According to the agent the state of the system-under-steer/environment/generative process will be $s_{t}$ , rather than ${\tilde{s}}_{t}$ which will then be given by

$s_{t} = (R_{t, A A A}, R_{t, B B B}, R_{t}^{0}, p_{t, A A A}, p_{t, B B B})$

4.5.2 Decision variables

According to the agent the action on the environment at time $t$ will be represented by $u_{t}$ , also known as the control state of the agent.

4.5.3 Implementation of the Agent / Generative Model / Internal Model

We will not have a probabilistic model for the agent yet. In this part of the project the agent will behave according to a rule-based policy. For the Python implementation, the rule is given by:

$X^{H i g h L o w} (S_{t e} | θ^{H i g h L o w}) = {\begin{cases} - 1 & if p_{t e} < θ_{e}^{l o w} or p_{t e} > θ_{e}^{h i g h} \\ - 1 & if t = T and R_{t e} = 1 \\ 0 & otherwise \end{cases}$ for each asset $e \in E$

A slight change in symbols represents its form for the Julia case:

$π^{H i g h L o w} (s_{t, e} | θ^{H i g h L o w}) = {\begin{cases} - 1 & if p_{t e} < θ_{e}^{l o w} or p_{t e} > θ_{e}^{h i g h} \\ - 1 & if t = T and R_{t e} = 1 \\ 0 & otherwise \end{cases}$ for each asset $e \in E$

4.5.3.1 Generative Model for the portfolio

In Part 2 of this project we will not use RxInfer yet.

The Python code for the implementation can be found in the env_simulator.py Python module:

class Policy():
  def __init__(self, model):
    self.model = model
    self.Policy = namedtuple('Policy', piNAMES) ## 'class'
    self.Theta = namedtuple('Theta', thNAMES) ## 'class'

  def build_policy(self, info):
    return self.Policy(*[info[pin] for pin in piNAMES])

  def build_theta(self, info):
    return self.Theta(*[info[thn] for thn in thNAMES])

  def X__HighLow(self, t, S_t, theta, N): ## T is for lookahead horizon in AIF
    ## print(f'...in X__HighLow()...\n\t{t=}\n\t{S_t=}\n\t{theta=}')
    x_t_info = {
        'x_t': {en: 0 for en in eNAMES} ## default is hold
    }
    ## print(f'\t%%% {S_t.R0_t=}, {S_t.R_t=}, {S_t.p_t=}')
    tickersToSell = []
    tickersToBuy = []

    ## sell all at end
    if (t == N - 1):
      tickersToSell = [en for en in eNAMES]
      for ticker in tickersToSell:
        nShares = S_t.R_t[ticker]
        x_t_info['x_t'][ticker] = -nShares
      return self.model.build_decision(x_t_info)
    ## identify buys and sells
    for en in eNAMES:
      if (S_t.p_t[en] < theta.thLo[en]): ## buy
        tickersToBuy.append(en)
      elif (S_t.p_t[en] > theta.thHi[en]): ## sell
        tickersToSell.append(en)
    totalFunds = S_t.R0_t; ##print(f'\t%%% {totalFunds=}')
    ## sell
    ## print(f'\t%%% {tickersToSell=}')
    if len(tickersToSell) > 0:
      for ticker in tickersToSell:
        nShares = S_t.R_t[ticker]
        x_t_info['x_t'][ticker] = -nShares
        totalFunds += nShares*S_t.p_t[ticker]
      ## print(f'\t%%% totalFunds after selling: {totalFunds}')
    ## buy
    ## print(f'\t%%% {tickersToBuy=}')
    if len(tickersToBuy) > 0:
      availFundsPerTicker = totalFunds/len(tickersToBuy); ##print(f'{availFundsPerTicker=}')
      for ticker in tickersToBuy:
        nShares = int(availFundsPerTicker/S_t.p_t[ticker])
        x_t_info['x_t'][ticker] = +nShares
        totalFunds -= nShares*S_t.p_t[ticker]
      ## print(f'\t%%% totalFunds after buying: {totalFunds}')
    return self.model.build_decision(x_t_info)

  def run_policy_sample_paths(self, theta, piName, N, SIM): ## T is for lookahead horizon in AIF
      record = []
      M = Model(S_0_INFO)
      P = Policy(M)
      for t in range(N): ## for each transition/step
          ## print(f'\t%%% {t=}')
          x_t = getattr(self, piName)(t, M.S_t, theta, N)
          S_t, Ccum, x_t, b_t_val = M.step(x_t, theta, piName, SIM)
          record_t = [t] + \
            [S_t.R_t[en] for en in eNAMES] + [S_t.R0_t] + [S_t.p_t[en] for en in eNAMES] + \
            [Ccum] + \
            [x_t.x_t[en] for en in eNAMES] + \
            [b_t_val[en] for en in eNAMES] ## rather than b_t which is text and not ordered
          record.append(record_t)
      return record

Next, we define the agent:

function create_agent(; s̃₀, theta, N, SIM, M, P)

    ## Bayesian inference by message passing
    ## The `infer` function is the heart of the agent
    ## It calls the `RxInfer.infer` function to perform Bayesian inference by message passing
    ## compute = (υₜ::Float64, ŷₜ::Vector{Float64}) -> begin
    infer_ = (υₜ::Float64, ŷₜ::Vector{Float64}) -> begin
    end
    
    ## The `act` function returns the inferred best possible action
    act_ = (t, S_t, theta, N) -> begin
        S_t_INFO = Dict(
            "R_t"=> Dict("AAA"=> S_t[1], "BBB"=> S_t[2]), 
            "R0_t"=> S_t[3], 
            "p_t"=> Dict("AAA"=> S_t[4], "BBB"=> S_t[5])
        )
        s̃ₜ₋₁ = _M.build_state(S_t_INFO)
        aₜ = P.X__HighLow(t, s̃ₜ₋₁, theta, N)
        return aₜ
    end
    
    ## The `future` function returns the inferred future states
    future_ = () -> begin 
    end

    slide_ = () -> begin
    end

    return (act_, future_,   infer_, slide_)
end

create_agent (generic function with 1 method)

4.6 Agent Policy Evaluation

4.6.1 Training/Tuning

4.6.1.1 No actions

Just to setup the RxInfer procedure, we create an environment but do not apply any actions. The only dynamics will come from the exogenous variables. The name decoration naive is used for this case.

_M = Model(S_0_INFO)
_P = Policy(_M)
_SIM = PriceSimulator(seed=SEED_TRAIN)
_theta = _P.build_theta(
    Dict("thLo"=> Dict("AAA"=> 100, "BBB"=> 50), "thHi"=> Dict("AAA"=> 110, "BBB"=> 60))
)

_Nⁿᵃⁱᵛᵉ  = 100 ## Total simulation time
_s̃₀ = _M.build_state(S_0_INFO)
(next_naive, observe_naive) = create_envir(;
    s̃₀=    _s̃₀,
    theta= _theta,
    N=     _Nⁿᵃⁱᵛᵉ,
    SIM=   _SIM,
    M=     _M,
    P=     _P
);

_yⁿᵃⁱᵛᵉ = Vector{Vector{Float64}}(undef, _Nⁿᵃⁱᵛᵉ) ## Observations
_Ccum = Vector{Float64}(undef, _Nⁿᵃⁱᵛᵉ)
_x = Vector{Vector{Float64}}(undef, _Nⁿᵃⁱᵛᵉ) ## Actions
for t = 1:_Nⁿᵃⁱᵛᵉ
    ## 3. Next
    pytmp = _M.build_decision(Dict("x_t"=> Dict("AAA"=> 0, "BBB"=> 0))) ## dummy action
    v = [pyconvert(Integer, pytmp.x_t[i]) for i in eNAMES]
    pytmp1, pytmp2 = next_naive(t, v)

    _Ccum[t] = pyconvert(Float64, pytmp1)

    v = [pyconvert(Integer, pytmp2.x_t[i]) for i in eNAMES]
    _x[t] = v

    ## 4. Observe
    pytmp = observe_naive() ## Observe external states
    v = vcat(
        [pyconvert(Integer, pytmp.R_t[i]) for i in eNAMES],
        [pyconvert(Float64, pytmp.R0_t)],
        [pyconvert(Float64, pytmp.p_t[i]) for i in eNAMES])
    _yⁿᵃⁱᵛᵉ[t] = v
end

_thLo_AAA = pyconvert(Float64, _theta.thLo["AAA"])
_thHi_AAA = pyconvert(Float64, _theta.thHi["AAA"])
_thLo_BBB = pyconvert(Float64, _theta.thLo["BBB"])
_thHi_BBB = pyconvert(Float64, _theta.thHi["BBB"])

_R_AAA = [_yⁿᵃⁱᵛᵉ[i][1] for i in 1:length(_yⁿᵃⁱᵛᵉ)]
_R_BBB = [_yⁿᵃⁱᵛᵉ[i][2] for i in 1:length(_yⁿᵃⁱᵛᵉ)]

_R0 = [_yⁿᵃⁱᵛᵉ[i][3] for i in 1:length(_yⁿᵃⁱᵛᵉ)]

_p_AAA = [_yⁿᵃⁱᵛᵉ[i][4] for i in 1:length(_yⁿᵃⁱᵛᵉ)]
_p_BBB = [_yⁿᵃⁱᵛᵉ[i][5] for i in 1:length(_yⁿᵃⁱᵛᵉ)]

_x_AAA = [_x[i][1] for i in 1:length(_x)]
_x_BBB = [_x[i][2] for i in 1:length(_x)];

_p1 = plot(
    _x_AAA,
    title= "No actions",
    size=(700, 800),
    seriestype=:step,
    xaxis= false,
    ylabel= L"x_{t,AAA}", ##"action\n"*L"x_{t,AAA}",
    tickfont=6,
    yguidefontrotation=-90,
    left_margin=15Plots.mm,
    color= "magenta",
    legend=:none,)

_p2 = plot(
    _x_BBB,
    title= "",
    seriestype=:step,
    xaxis= false,
    ylabel= L"x_{t,BBB}", ##"action\n"*L"x_{t,BBB}",
    tickfont=6,
    yguidefontrotation=-90,
    left_margin=15Plots.mm,
    color= "magenta",
    legend=:none,)

_p3 = plot(
    _p_AAA,
    title= "",
    seriestype=:step,
    xaxis= false,
    ylabel= L"p_{t,AAA}",
    tickfont=6,
    yguidefontrotation=-90,
    left_margin=15Plots.mm,
    color= "green",
    legend=:none,)
_p3 = hline!([_thLo_AAA], linestyle=:dash, label="Horizontal Line", color="magenta")
_p3 = annotate!(10, 0.95*_thLo_AAA, text(L"\theta^{Lo}="*"$(_thLo_AAA)", :center, 8, "magenta"))
_p3 = hline!([_thHi_AAA], linestyle=:dash, label="Horizontal Line", color="magenta")
_p3 = annotate!(10, 1.05*_thHi_AAA, text(L"\theta^{Hi}="*"$(_thHi_AAA)", :center, 8, "magenta"))

_p4 = plot(
    _p_BBB,
    title= "",
    seriestype=:step,
    xlabel= L"$t$",
    ylabel= L"p_{t,BBB}",
    tickfont=6,
    yguidefontrotation=-90,
    left_margin=15Plots.mm,
    color= "green",
    legend=:none,)
_p4 = hline!([_thLo_BBB], linestyle=:dash, label="Horizontal Line", color="magenta")
_p4 = annotate!(10, 0.95*_thLo_BBB, text(L"\theta^{Lo}="*"$(_thLo_BBB)", :center, 8, "magenta"))
_p4 = hline!([_thHi_BBB], linestyle=:dash, label="Horizontal Line", color="magenta")
_p4 = annotate!(10, 1.05*_thHi_BBB, text(L"\theta^{Hi}="*"$(_thHi_BBB)", :center, 8, "magenta"))

_p5 = plot(
    _R_AAA,
    seriestype=:step,
    xaxis= false,
    ylabel= L"R_{t,AAA}",
    tickfont=6,
    yguidefontrotation=-90,
    left_margin=15Plots.mm,
    color= "magenta",
    legend=:none,)

_p6 = plot(
    _R_BBB,
    seriestype=:step,
    xaxis= false,
    ylabel= L"R_{t,BBB}",
    tickfont=6,
    yguidefontrotation=-90,
    left_margin=15Plots.mm,
    color= "magenta",
    legend=:none,)

_p7 = plot(
    _R0,
    seriestype=:step,
    xaxis= false,
    ylabel= L"R^0_t",
    tickfont=6,
    yguidefontrotation=-90,
    left_margin=15Plots.mm,
    color= "magenta",
    legend=:none,)

_p8 = plot(
    _Ccum,
    seriestype=:step,
    xaxis= false,
    ylabel= "Profit\n"*L"C^{cum}",
    tickfont=6,
    yguidefontrotation=-90,
    left_margin=15Plots.mm,
    color= "magenta",
    legend=:none,)
        
plot(_p1, _p2, _p3, _p4, _p5, _p6, _p7, _p8, layout=(8,1))

4.6.1.2 Rule-based policy

Now we are going to apply the HighLow rule-based policy that was mentioned above. Actions will be generated according to the rule. Note that the name decoration ai (for active inference) is used even though this principle is not yet applied.

_M = Model(S_0_INFO)
_P = Policy(_M)
_SIM = PriceSimulator(seed=SEED_TRAIN)
_theta = _P.build_theta(
    Dict("thLo"=> Dict("AAA"=> 100, "BBB"=> 50), "thHi"=> Dict("AAA"=> 110, "BBB"=> 60))
)

_Nᵃⁱ = 100 ## Total simulation time
_s̃₀ = _M.build_state(S_0_INFO)
(next_ai, observe_ai) = create_envir(;
    s̃₀=    _s̃₀,
    theta= _theta,
    N=     _Nᵃⁱ,
    SIM=   _SIM,
    M=     _M,
    P=     _P
);

(act_ai, future_ai,   infer_ai, slide_ai) = create_agent(;
    s̃₀=    _s̃₀,
    theta= _theta,
    N=     _Nᵃⁱ,
    SIM=   _SIM,
    M=     _M,
    P=     _P    
) 

_yᵃⁱ = Vector{Vector{Float64}}(undef, _Nᵃⁱ) ## Observations
_yᵃⁱ_init = [0.1, 0.1, 0.1, 0.1, 0.1]
_Ccum = Vector{Float64}(undef, _Nᵃⁱ)
_x = Vector{Vector{Float64}}(undef, _Nᵃⁱ) ## Actions
for t = 1:_Nᵃⁱ
    println("t = $t")
    ## 1. Act
    if t > 1
        pytmp = act_ai(t, _yᵃⁱ[t-1], _theta, _Nᵃⁱ)
    else
        pytmp = act_ai(t, _yᵃⁱ_init, _theta, _Nᵃⁱ)
    end
    v = [pyconvert(Integer, pytmp.x_t[i]) for i in eNAMES]
    _x[t] = v; println("_x[t] = $v")

    ## 2. Future
    ## _fs[t] = future_ai()  ## Fetch the predicted future states

    ## 3. Next
    pytmp1, pytmp2 = next_ai(t, _x[t]) ## The action influences hidden external states
    _Ccum[t] = pyconvert(Float64, pytmp1)
    v = [pyconvert(Integer, pytmp2.x_t[i]) for i in eNAMES]
    _x[t] = v; println("_x[t] = $v")

    ## 4. Observe
    pytmp = observe_ai() ## Observe external states
    v = vcat(
        [pyconvert(Integer, pytmp.R_t[i]) for i in eNAMES],
        [pyconvert(Float64, pytmp.R0_t)],
        [pyconvert(Float64, pytmp.p_t[i]) for i in eNAMES])
    _yᵃⁱ[t] = v; println("_yᵃⁱ[t] = $v") 

    ## 5. Infer:
    ## infer_ai(_as[t], _ys[t]) ## Infer beliefs from current model state (update q)
    
    ## 6. Slide:
    ## slide_ai() ## Prepare for next iteration
end

t = 1
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1000.0, 100.33651452994886, 53.108396080634904]
t = 2
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1000.0, 102.43229326418141, 54.24527309690561]
t = 3
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1000.0, 103.15484733626705, 53.10192423454886]
t = 4
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1000.0, 107.0945004863362, 52.61612115117965]
t = 5
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1000.0, 109.27557763765795, 53.03796910256657]
t = 6
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1000.0, 106.89706430578963, 53.0217548694274]
t = 7
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1000.0, 102.90608859021901, 52.69438145980463]
t = 8
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1000.0, 99.37395609782197, 56.68078368760216]
t = 9
_x[t] = [10, 0]
_x[t] = [10, 0]
_yᵃⁱ[t] = [10.0, 0.0, 6.2604390217802575, 96.18672569028955, 57.90458427208675]
t = 10
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [10.0, 0.0, 6.2604390217802575, 95.99473473298212, 58.62122086534626]
t = 11
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [10.0, 0.0, 6.2604390217802575, 101.91135972641926, 59.377433139058006]
t = 12
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [10.0, 0.0, 6.2604390217802575, 105.86626046039807, 61.94163571164044]
t = 13
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [10.0, 0.0, 6.2604390217802575, 106.52262118899564, 62.45991120250871]
t = 14
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [10.0, 0.0, 6.2604390217802575, 106.54958954228502, 65.7716072578473]
t = 15
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [10.0, 0.0, 6.2604390217802575, 106.440419220082, 68.23462642039826]
t = 16
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [10.0, 0.0, 6.2604390217802575, 109.08670663723032, 70.17341809692495]
t = 17
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [10.0, 0.0, 6.2604390217802575, 114.30164133140408, 75.18794010924515]
t = 18
_x[t] = [-10, 0]
_x[t] = [-10, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 114.20914036424523, 72.06684505057645]
t = 19
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 118.42176769216863, 70.02354565262273]
t = 20
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 120.36143839161926, 70.23057170086635]
t = 21
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 114.9106930650196, 68.45886064686566]
t = 22
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 115.00494104625695, 68.01267680186778]
t = 23
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 115.15415899407195, 71.26424455613683]
t = 24
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 117.18603953456744, 67.47590675082026]
t = 25
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 116.6368734675044, 67.0578489971041]
t = 26
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 114.74146909131831, 67.51997194943966]
t = 27
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 116.15874879609996, 64.46170335421051]
t = 28
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 120.36118932511313, 64.45207200030988]
t = 29
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 121.27687502615771, 61.588838810958194]
t = 30
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 124.26714022210504, 60.41718957239019]
t = 31
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 128.09210318881094, 60.913051649764725]
t = 32
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 127.17229329688912, 61.850287938104415]
t = 33
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 126.26449057519575, 61.221336111027]
t = 34
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 124.29515359675447, 62.087063482107894]
t = 35
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 125.04257487929556, 62.96866215005461]
t = 36
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 126.16438783004165, 59.82634508711346]
t = 37
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 126.48989865972587, 61.318778954166135]
t = 38
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 127.78068564108096, 64.60031901223502]
t = 39
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 125.6700360628291, 65.23896031959282]
t = 40
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 126.70177319198189, 66.0448092736948]
t = 41
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 125.27732538036592, 66.36339445520396]
t = 42
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 126.13465127479725, 64.63031396649905]
t = 43
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 119.99305582554679, 60.84642702165881]
t = 44
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 116.33746792222318, 61.6290030909189]
t = 45
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 113.9425106049464, 60.1953695166889]
t = 46
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 117.1174534933926, 60.99332642443635]
t = 47
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 115.791263684756, 57.201477054462366]
t = 48
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 117.58108653883382, 58.13181953690186]
t = 49
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 114.5739558769336, 56.37729288101637]
t = 50
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 110.73315805735614, 59.888920413698536]
t = 51
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 110.93353105745962, 59.28630689379172]
t = 52
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 110.02315804491136, 60.46425348487098]
t = 53
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 109.93112140956582, 61.1454184708522]
t = 54
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 108.16966504977418, 61.76255915678868]
t = 55
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 108.30897990183563, 63.486761722890556]
t = 56
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 109.51457603291497, 63.715006198382596]
t = 57
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 113.94610908264754, 60.341103219817114]
t = 58
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 114.95964269390392, 64.01743534010143]
t = 59
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 113.88560760142656, 66.80534064270073]
t = 60
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 114.54297603620427, 65.10688873201428]
t = 61
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 112.39023364714656, 66.87173838993743]
t = 62
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 110.40595852169919, 64.69717116979282]
t = 63
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 112.35406718623649, 65.93305141363605]
t = 64
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 109.37425881226618, 67.24842770558054]
t = 65
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 108.40603444833809, 69.83737244338931]
t = 66
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 112.57076183888891, 67.41697946022045]
t = 67
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 113.36066383812467, 68.41630249980832]
t = 68
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 109.66379980614506, 64.89086153290532]
t = 69
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 112.58690874859712, 65.58772049313286]
t = 70
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 111.5879687061103, 63.62203137002021]
t = 71
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 109.62619348635569, 63.76490106362522]
t = 72
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 110.78503228550557, 58.62786624173392]
t = 73
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 113.45378978061383, 55.90721054890252]
t = 74
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 116.77585745502346, 60.70164922223957]
t = 75
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 118.52729327440726, 61.90276838249821]
t = 76
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 121.04455735922997, 59.17854696320783]
t = 77
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 119.33118229101022, 51.39222098051578]
t = 78
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 119.64298276715287, 50.13476001662142]
t = 79
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 120.52964770842728, 50.691434157710596]
t = 80
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 119.84777224200393, 52.48002291319415]
t = 81
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 119.92945125266648, 51.64957841628771]
t = 82
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 124.47751596394714, 50.331252022239546]
t = 83
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 127.2120767136787, 51.25982373597712]
t = 84
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 123.88405194671431, 52.620515299403095]
t = 85
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 123.68716907014885, 51.02820528651759]
t = 86
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1149.276852335821, 121.26748689768284, 48.76538552642805]
t = 87
_x[t] = [0, 23]
_x[t] = [0, 23]
_yᵃⁱ[t] = [0.0, 23.0, 27.672985227975914, 118.33216064257856, 49.39094785007215]
t = 88
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 23.0, 27.672985227975914, 118.51971777071711, 49.76229157302677]
t = 89
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 23.0, 27.672985227975914, 120.59309374293454, 50.15191105748859]
t = 90
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 23.0, 27.672985227975914, 119.37322540323431, 47.96586998534207]
t = 91
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 23.0, 27.672985227975914, 119.1764593361756, 47.86806764282508]
t = 92
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 23.0, 27.672985227975914, 115.71874813560981, 47.45979814001823]
t = 93
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 23.0, 27.672985227975914, 116.33253462957488, 49.41274155447528]
t = 94
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 23.0, 27.672985227975914, 114.01957582083942, 47.1853657918281]
t = 95
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 23.0, 27.672985227975914, 113.32912001686634, 51.693337050060656]
t = 96
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 23.0, 27.672985227975914, 113.18577796475202, 49.77622295361456]
t = 97
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 23.0, 27.672985227975914, 117.0450944702334, 51.3546654045134]
t = 98
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 23.0, 27.672985227975914, 115.05536015943943, 51.638420113855055]
t = 99
_x[t] = [0, -23]
_x[t] = [0, -23]
_yᵃⁱ[t] = [0.0, 0.0, 1215.356647846642, 115.23605517544617, 54.3945982846744]
t = 100
_x[t] = [0, 0]
_x[t] = [0, 0]
_yᵃⁱ[t] = [0.0, 0.0, 1215.356647846642, 115.53683585980697, 58.088119103109406]

_thLo_AAA = pyconvert(Float64, _theta.thLo["AAA"])
_thHi_AAA = pyconvert(Float64, _theta.thHi["AAA"])
_thLo_BBB = pyconvert(Float64, _theta.thLo["BBB"])
_thHi_BBB = pyconvert(Float64, _theta.thHi["BBB"])

_R_AAA = [_yᵃⁱ[i][1] for i in 1:length(_yᵃⁱ)]
_R_BBB = [_yᵃⁱ[i][2] for i in 1:length(_yᵃⁱ)]

_R0 = [_yᵃⁱ[i][3] for i in 1:length(_yᵃⁱ)]

_p_AAA = [_yᵃⁱ[i][4] for i in 1:length(_yᵃⁱ)]
_p_BBB = [_yᵃⁱ[i][5] for i in 1:length(_yᵃⁱ)]

_x_AAA = [_x[i][1] for i in 1:length(_x)]
_x_BBB = [_x[i][2] for i in 1:length(_x)];

_p1 = plot(
    _x_AAA,
    title= "HighLow policy",
    size=(700, 800),
    seriestype=:step,
    xaxis= false,
    ylabel= L"x_{t,AAA}", ##"action\n"*L"x_{t,AAA}",
    tickfont=6,
    yguidefontrotation=-90,
    left_margin=15Plots.mm,
    color= "magenta",
    legend=:none,)

_p2 = plot(
    _x_BBB,
    title= "",
    seriestype=:step,
    xaxis= false,
    ylabel= L"x_{t,BBB}", ##"action\n"*L"x_{t,BBB}",
    tickfont=6,
    yguidefontrotation=-90,
    left_margin=15Plots.mm,
    color= "magenta",
    legend=:none,)

_p3 = plot(
    _p_AAA,
    title= "",
    seriestype=:step,
    xaxis= false,
    ylabel= L"p_{t,AAA}",
    tickfont=6,
    yguidefontrotation=-90,
    left_margin=15Plots.mm,
    color= "green",
    legend=:none,)
_p3 = hline!([_thLo_AAA], linestyle=:dash, label="Horizontal Line", color="magenta")
_p3 = annotate!(10, 0.95*_thLo_AAA, text(L"\theta^{Lo}="*"$(_thLo_AAA)", :center, 8, "magenta"))
_p3 = hline!([_thHi_AAA], linestyle=:dash, label="Horizontal Line", color="magenta")
_p3 = annotate!(10, 1.05*_thHi_AAA, text(L"\theta^{Hi}="*"$(_thHi_AAA)", :center, 8, "magenta"))

_p4 = plot(
    _p_BBB,
    title= "",
    seriestype=:step,
    xlabel= L"$t$",
    ylabel= L"p_{t,BBB}",
    tickfont=6,
    yguidefontrotation=-90,
    left_margin=15Plots.mm,
    color= "green",
    legend=:none,)
_p4 = hline!([_thLo_BBB], linestyle=:dash, label="Horizontal Line", color="magenta")
_p4 = annotate!(10, 0.95*_thLo_BBB, text(L"\theta^{Lo}="*"$(_thLo_BBB)", :center, 8, "magenta"))
_p4 = hline!([_thHi_BBB], linestyle=:dash, label="Horizontal Line", color="magenta")
_p4 = annotate!(10, 1.05*_thHi_BBB, text(L"\theta^{Hi}="*"$(_thHi_BBB)", :center, 8, "magenta"))

_p5 = plot(
    _R_AAA,
    seriestype=:step,
    xaxis= false,
    ylabel= L"R_{t,AAA}",
    tickfont=6,
    yguidefontrotation=-90,
    left_margin=15Plots.mm,
    color= "magenta",
    legend=:none,)

_p6 = plot(
    _R_BBB,
    seriestype=:step,
    xaxis= false,
    ylabel= L"R_{t,BBB}",
    tickfont=6,
    yguidefontrotation=-90,
    left_margin=15Plots.mm,
    color= "magenta",
    legend=:none,)

_p7 = plot(
    _R0,
    seriestype=:step,
    xaxis= false,
    ylabel= L"R^0_t",
    tickfont=6,
    yguidefontrotation=-90,
    left_margin=15Plots.mm,
    color= "magenta",
    legend=:none,)

_p8 = plot(
    _Ccum,
    seriestype=:step,
    xaxis= false,
    ylabel= "Profit\n"*L"C^{cum}",
    tickfont=6,
    yguidefontrotation=-90,
    left_margin=15Plots.mm,
    color= "magenta",
    legend=:none,)
        
plot(_p1, _p2, _p3, _p4, _p5, _p6, _p7, _p8, layout=(8,1))

We see that some actions are being taken. There is even a slight profit shown in the bottom chart.