District Manager Engagements Analyzed with Active Inference

Back to Portfolio of Projects | LearnableLoopAI.com | Blog |

In this project the client is a newly appointed district manager that is responsible for the optimal running of 4 retail stores in Washington state at the locations:

Bellingham
Ferndale
Lynden
Blaine

Symbols/Nomenclature/Notation (KUF)

[informed by Powell Universal Framework (PUF), Bert DeVries, AIF literature]

Taxonomy of Machine Learning

Supervised Learning (Regression, Classification)
- next state function
  - next state starts with provision: aquisition of another state/datapoint
    - $p_{i} = f_{p} (i)$
  - next state
- observation function
  - observation starts with response to next state
    - $r_{i} = f_{r} ({\overset{˘}{s}}_{i})$
  - combine with covariate noise (optional)
  - combine with observation noise $v_{i} = N ({\overset{˘}{m}}_{i}, {\overset{˘}{Σ}}_{V})$
  - observation
    - $y_{i} = {\overset{˘}{s}}_{t} + v_{t}$
- set of (unordered) independent observations
  - $(\overset{˘}{s}, y) no state noise$
  - $(z, y) with state noise$
Unsupervised Learning (Reduction of dimensionality, Clustering)
- next state function
  - next state starts with provision: aquisition of another state/datapoint
    - $p_{i} = f_{p} (i)$
  - next state
- observation function
  - observation starts with response to next state
    - $r_{i} = f_{r} ({\overset{˘}{s}}_{i})$
  - observation
    - $y_{t} = {\overset{˘}{s}}_{t}$
- set of (unordered) independent observations $y$
Sequential/Series Learning (Reinforcement Learning, Active Inference)
- next state function
  - next state starts with provision: transition from previous state/datapoint
    - $p_{t} = f_{p} ({\overset{˘}{s}}_{t - 1})$
  - combine with optional action $f_{E} (a_{t})$
  - combine with system noise $w_{t} = N (p_{t}, {\overset{˘}{Σ}}_{W})$
  - combine with optional disturbance/exogenous information $w_{d t}$
  - next state
    - ${\overset{˘}{s}}_{t} = f_{B} ({\overset{˘}{s}}_{t - 1}) + f_{E} (a_{t}) + w_{d t} + w_{t}$
- observation function
  - observation starts with response to next state
    - $r_{t} = f_{r} ({\overset{˘}{s}}_{t}) = f_{A} ({\overset{˘}{s}}_{t}) = \overset{˘}{A} {\overset{˘}{s}}_{t}$
  - combine with observation noise $v_{t} = N (r_{t}, {\overset{˘}{Σ}}_{V})$
  - observation
    - $y_{t} = f_{A} ({\overset{˘}{s}}_{t}) + v_{t}$
- sequence of (ordered) correlated observations $y$ (time/spatial)

Overall Structure

Experiment has one-to-many Batches
- Batch (into the page) has one-to-many Sequences
  - Sequence (down the page) has one-to-many Datapoints
    - Datapoint (into the page) has one-to-many Matrices
      - Matrix (down the page) has one-to-many Vectors
        
        Vector (towards right) has one-to-many Components
        
        Component/Element of type
        
        Numerical [continuous/proportional]
        
        int/real/float (continuous)
        
        Categorical [non-continuous/non-formal]
        
        AIF calls it ‘discrete’
        
        ordinal (ordered)
        
        nominal (no order)
        
        for computers, elements need to be numbers, so categoricals encoded as numbers too
Most complex Datapoint handled is a multispectral image, i.e. 3D

Generative Model

$A$ : Observation matrix
$B$ : Transition matrix
$C$ : Preference matrix
$D$ : Initial state matrix
$E$ : Behavior matrix

0 Active Inference: Bridging Minds and Machines

In recent years, the landscape of machine learning has undergone a profound transformation with the emergence of active inference, a novel paradigm that draws inspiration from the principles of biological systems to inform intelligent decision-making processes. Unlike traditional approaches to machine learning, which often passively receive data and adjust internal parameters to optimize performance, active inference represents a dynamic and interactive framework where agents actively engage with their environment to gather information and make decisions in real-time.

At its core, active inference is rooted in the notion of agents as embodied entities situated within their environments, constantly interacting with and influencing their surroundings. This perspective mirrors the fundamental processes observed in living organisms, where perception, action, and cognition are deeply intertwined to facilitate adaptive behavior. By leveraging this holistic view of intelligence, active inference offers a unified framework that seamlessly integrates perception, decision-making, and action, thereby enabling agents to navigate complex and uncertain environments more effectively.

One of the defining features of active inference is its emphasis on the active acquisition of information. Rather than waiting passively for sensory inputs, agents proactively select actions that are expected to yield the most informative outcomes, thus guiding their interactions with the environment. This active exploration not only enables agents to reduce uncertainty and make more informed decisions but also allows them to actively shape their environments to better suit their goals and objectives.

Furthermore, active inference places a strong emphasis on the hierarchical organization of decision-making processes, recognizing that complex behaviors often emerge from the interaction of multiple levels of abstraction. At each level, agents engage in a continuous cycle of prediction, inference, and action, where higher-level representations guide lower-level processes while simultaneously being refined and updated based on incoming sensory information.

The applications of active inference span a wide range of domains, including robotics, autonomous systems, neuroscience, and cognitive science. In robotics, active inference offers a promising approach for developing robots that can adapt and learn in real-time, even in unpredictable and dynamic environments. In neuroscience and cognitive science, active inference provides a theoretical framework for understanding the computational principles underlying perception, action, and decision-making in biological systems.

In conclusion, active inference represents a paradigm shift in machine learning, offering a principled and unified framework for understanding and implementing intelligent behavior in artificial systems. By drawing inspiration from the principles of biological systems, active inference holds the promise of revolutionizing our approach to building intelligent machines and understanding the nature of intelligence itself.

1 BUSINESS UNDERSTANDING

In this project the client is a newly appointed district manager that is responsible for the optimal running of 4 retail stores in Washington state at the locations:

Bellingham
Ferndale
Lynden
Blaine

His engagement with each of these branches will happen in terms of full weeks. The decision to engage with a specific store is up to the district manager. What is expected is for the district manager to engage with a store and to help improve and optimize operations. From past engagement data it is not clear how engagements decisions were made. Sometimes they appear to be random. There are also quick changes rather than longer running engagements, likely to put out sudden fires. The client wants to minimize the effect of his appointment to the role and would like, at least for a while, to follow the past pattern of engagements as closely as possible.

This analysis will make use of Bayesian inference within the larger scope of an approach known as Active Inference. In particular, the past manager engagements will be modeled as a Hidden Markov Model. This model can then be used by the client for suggestions on how to engage with the stores.

versioninfo() ## Julia version

Julia Version 1.11.1
Commit 8f5b7ca12ad (2024-10-16 10:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × Intel(R) Core(TM) i7-8700B CPU @ 3.20GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, skylake)
Threads: 1 default, 0 interactive, 1 GC (on 12 virtual cores)
Environment:
  JULIA_NUM_THREADS =

import Pkg
Pkg.add(Pkg.PackageSpec(;name="RxInfer"))
Pkg.add("Random")
Pkg.add(Pkg.PackageSpec(;name="Plots"))
Pkg.add(Pkg.PackageSpec(;name="LaTeXStrings"))
Pkg.add("XLSX")
Pkg.add("DataFrames")
Pkg.add("BenchmarkTools")
Pkg.add("Distributions")
Pkg.add("PrettyPrinting")

using RxInfer, Random, BenchmarkTools, Distributions, Plots, XLSX, DataFrames, PrettyPrinting

   Resolving package versions...
  No Changes to `~/.julia/environments/v1.11/Project.toml`
  No Changes to `~/.julia/environments/v1.11/Manifest.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.11/Project.toml`
  No Changes to `~/.julia/environments/v1.11/Manifest.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.11/Project.toml`
  No Changes to `~/.julia/environments/v1.11/Manifest.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.11/Project.toml`
  No Changes to `~/.julia/environments/v1.11/Manifest.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.11/Project.toml`
  No Changes to `~/.julia/environments/v1.11/Manifest.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.11/Project.toml`
  No Changes to `~/.julia/environments/v1.11/Manifest.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.11/Project.toml`
  No Changes to `~/.julia/environments/v1.11/Manifest.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.11/Project.toml`
  No Changes to `~/.julia/environments/v1.11/Manifest.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.11/Project.toml`
  No Changes to `~/.julia/environments/v1.11/Manifest.toml`

Pkg.status()

Status `~/.julia/environments/v1.11/Project.toml`
  [6e4b80f9] BenchmarkTools v1.5.0
  [a93c6f00] DataFrames v1.7.0
  [31c24e10] Distributions v0.25.113
  [b964fa9f] LaTeXStrings v1.4.0
⌃ [91a5bcdd] Plots v1.40.8
  [54e16d92] PrettyPrinting v0.4.2
  [86711068] RxInfer v3.7.1
  [fdbf4ff8] XLSX v0.10.4
  [9a3f8284] Random v1.11.0
Info Packages marked with ⌃ have new versions available and may be upgradable.

2 DATA UNDERSTANDING

The historical data spans a period of 9 years’ worth of weekly engagement data. Each weekly record consists of a 1-hot encoded engagement state vector. When the associated component of the state vector is a 1, the district manager was engaged with the associated store:

Similarly, we also have the observation vectors for each week:

The observations do not always agree with the states due to administrative errors. For example, a district manager might have failed to capture a specific engagement and eventually the capture was done with a misremenbered date.

3 DATA PREPARATION

We will use the data from the simulator and from the field directly. There is no need to perform additional data preparation.

4 MODELING

4.1 Narrative

Please review the narrative in section 1.

4.2 Core Elements

This section attempts to answer three important questions:

What metrics are we going to track?
What decisions do we intend to make?
What are the sources of uncertainty?

The only metric we are interested in is the engagement store of the district manager, i.e. at which of the 4 stores the manager is working for the week.

There are no control/steering decisions to be made. We are simply interested in understanding and modeling the past engagement behavior.

There are two sources of uncertainty. The first has to do with the fact that the state transitions are not deterministic but rather stochastic. This will be captured in the transition matrix $B$ . The second relates to observations. Engagements were not always recorded accurately. This means observations of past engagements sometimes differ from what they really were. This uncertainty will be captured in the observation matrix $A$ .

4.3 Environment Model (Generative Process)

To get insight into the engagement behavior, we need to formulate a model. Since we have a finite (and small number of) stores, we can use a categorical distribution to represent the district manager’s engagement. There are four stores which means we will have four distinct values with associated probabilities in the categorical distribution. For time we will use $t$ . The state of the engagement at time $t$ will be indicated by $s_{t}$ . The associated observation will be indicated by $y_{t}$ .

The probabilities for switching between stores will be captured in a transition matrix $B$ . To test our approach we will assign these probabilities manually in a simulation and then see to what extent the model can learn these. From the past data it appears that transition probabilities can vary quite a bit. Once we are confident in our approach we will use the past data from the field to learn the transition probabilities without a reference to compare them against.

We will also learn the observation matrix $A$ which reflects how well observations of the true state were made. At time $t$ , the observation will be indicated by $y_{t}$ . The observation matrix encodes the likelihood that the record of an engagement was a mistake so that a wrong observation is made. The model may be specified as follows:

$\begin{aligned} s_{t} & \sim C a t (B s_{t - 1}), \\ y_{t} & \sim C a t (A s_{t}) . \end{aligned}$

This type of discrete state space model is known as a Hidden Markov Model (HMM). In summary, our goal is to learn the matrices $B$ and $A$ so we can use them to understand the past engagement behavior of district managers.

"""
Returns a one-hot encoding of a random sample from a categorical distribution. 
The sample is drawn with the `rng` random number generator.
"""
function rand_1hot_vec(rng, distribution::Categorical)
    k = ncategories(distribution)
    s = zeros(k)
    drawn_category = rand(rng, distribution)
    s[drawn_category] = 1.0
    return s
end

rand_1hot_vec

4.3.1 State variables

The state variables represent what we need to know. These are captured in a state vector $s_{t}$ .

4.3.2 Decision variables

There are no decisions to be made. We are simply observing the past behavior of district manager engagements, i.e. there is no control/steering applied to the engagement environment.

4.3.3 Exogenous information variables

There are no exogenous information variables.

4.3.4 Next State function

The provision function, $f_{p} ()$ , provides another state/datapoint, called the provision/pre-state. Because this is a sequential system, the provision function transitions to the next state/datapoint from the previous state making use of a simulation or a data set.

$p_{t} = f_{p} ({\overset{˘}{s}}_{t - 1}) = f_{B} ({\overset{˘}{s}}_{t - 1}) = \overset{˘}{B} {\overset{˘}{s}}_{t - 1}$

## provision function, provides another state/datapoint from simulation
function fˢⁱᵐₚ(s̆ₜ₋₁; B̆, 𝙼, 𝚅, 𝙲, rng)
    dp = Vector{Vector{Vector{Float64}}}(undef, 𝙼)
    for m in 1:𝙼 ## Matrices
        dp[m] = Vector{Vector{Float64}}(undef, 𝚅)
        for v in 1:𝚅 ## Vectors
            dp[m][v] = Vector{Float64}(undef, 𝙲)
            ## for c in 1:𝙲 ## Components
            ##    dp[m][v][c] = float(rand(rng, Poisson(θ̆)))
            ## end
            sdis_t = B̆*first(first(s̆ₜ₋₁))
            dp[m][v] = rand_1hot_vec(rng, Categorical(sdis_t))
        end
    end
    s̆ = dp
    return s̆
end
_s̆ₜ₋₁ = [[[0.0, 1.0, 0.0, 0.0]]]
_B̆ˢⁱᵐ = [
    0.90 0.05 0.10 0.05;
    0.00 0.85 0.05 0.00;
    0.05 0.05 0.80 0.05;
    0.05 0.05 0.05 0.90]
_rng = MersenneTwister(42)
## _s̆ = fˢⁱᵐₚ(_s̆ₜ₋₁, B̆=_B̆ˢⁱᵐ, 𝙼=3, 𝚅=4, 𝙲=5, rng=_rng) ## color image with 3 colors, 4 rows, 5 cols of elements
## _s̆ = fˢⁱᵐₚ(_s̆ₜ₋₁, B̆=_B̆ˢⁱᵐ, 𝙼=1, 𝚅=4, 𝙲=5, rng=_rng) ## b/w image with 4 rows, 5 cols of elements
_s̆ = fˢⁱᵐₚ(_s̆ₜ₋₁, B̆=_B̆ˢⁱᵐ, 𝙼=1, 𝚅=1, 𝙲=4, rng=_rng) ## vector with 4 elements
pprint(IOContext(stdout, :displaysize => (24, 30)), _s̆)

[[[0.0, 1.0, 0.0, 0.0]]]

## provision function, provides another state/datapoint from field
function fᶠˡᵈₚ(d; 𝙼, 𝚅, 𝙲, df)
    dp = Vector{Vector{Vector{Float64}}}(undef, 𝙼)
    for m in 1:𝙼 ## Matrices
        dp[m] = Vector{Vector{Float64}}(undef, 𝚅)
        for v in 1:𝚅 ## Vectors    
            dp[m][v] = Vector{Float64}(undef, 𝙲)
            ## for c in 1:𝙲 ## Components
            ##     e[m][v][c] = df[s, :incidents]
            ## end
            dp[m][v] = Vector(df[d, :])
        end
    end
    s̆ = dp
    return s̆
end
## dp = fᶠˡᵈₚ(1, 𝙼=3, 𝚅=4, 𝙲=4, df=_fld_df) ## color image with 3 colors, 4 rows, 5 cols of elements
## dp = fᶠˡᵈₚ(1, 𝙼=1, 𝚅=4, 𝙲=4, df=_fld_df) ## b/w image with 4 rows, 5 cols of elements
## dp = fᶠˡᵈₚ(1, 𝙼=1, 𝚅=1, 𝙲=4, df=_fld_df) ## vector with 5 elements

fᶠˡᵈₚ (generic function with 1 method)

There is no action $f_{E} (a_{t})$ to be combined with.

When we combine with system noise $w_{t} = N (p_{t}, {\overset{˘}{Σ}}_{W})$ :

${\overset{˘}{s}}_{t} = \overset{˘}{B} {\overset{˘}{s}}_{t - 1} + w_{t}$

There is no optional disturbance/exogenous information $w_{d t}$ to be combined with.

For this problem we can summarize the next state as:

${\overset{˘}{s}}_{t} \sim C a t (\overset{˘}{B} {\overset{˘}{s}}_{t - 1})$

The breve/bowls indicate that the parameters and variables are hidden and not observed.

4.3.5 Observation function

The response function, $f_{r} ()$ , provides the response to the state/datapoint, called the response: $r_{t} = f_{r} ({\overset{˘}{s}}_{t}) = f_{A} ({\overset{˘}{s}}_{t}) = \overset{˘}{A} {\overset{˘}{s}}_{t}$

## response function, provides the response to a state/datapoint
function fᵣ(s̆, Ă, rng)
    odis_t = Ă*first(first(s̆))
    o_t = rand_1hot_vec(rng, Categorical(odis_t))
    return [[o_t]]
end
_Ăˢⁱᵐ = [
    0.94 0.02 0.02 0.02;
    0.02 0.94 0.02 0.02;
    0.02 0.02 0.94 0.02;
    0.02 0.02 0.02 0.94]
fᵣ(_s̆, _Ăˢⁱᵐ, _rng)

1-element Vector{Vector{Vector{Float64}}}:
 [[0.0, 1.0, 0.0, 0.0]]

Because the noise is already captured in $_{\overset{˘}{A}}^{sim}$ , the next observation becomes

$y_{i} = {\overset{˘}{s}}_{i}$

For this problem we can summarize the next state as:

$y_{t} \sim C a t (\overset{˘}{A} {\overset{˘}{s}}_{t})$

The breve/bowl indicates that the parameters and variables are hidden and not observed.

4.3.6 Implementation of the Environment Model (Generative Process)

_STORES = ["Bellingham", "Ferndale", "Lynden", "Blaine"]

4-element Vector{String}:
 "Bellingham"
 "Ferndale"
 "Lynden"
 "Blaine"

To simulate the engagement behavior, we need to specify:

the actual transition probabilities between the states (how likely is the manager to move from one store to another)
the observation probabilities (i.e., how reliably will engagements be captured)

These specifications will allow us to generate observations from the hidden Markov model (HMM).

Here are the steps to generate observation data:

Assume an initial engagement state (store) for the district manager. We will have the initial engagement at Bellingham.
Determine where the manager went next by drawing from a Categorical distribution with the transition probabilities between the different stores.
Determine the observation encountered in the capture records by drawing from a Categorical distribution with the corresponding observation probabilities.
Repeat steps 2 and 3 for as many samples as needed.

The following code implements the above process and generates our simulated observation data:

## Data comes from either a simulation/lab (sim|lab) OR from the field (fld)
## Data are handled either in batches (batch) OR online as individual points (point)
function sim_data(rng, 𝚂, 𝙳, 𝙼, 𝚅, 𝙲, B̆, Ă, s₀)
    p = Vector{Vector{Vector{Vector{Vector{Float64}}}}}(undef, 𝚂)
    s̆ = Vector{Vector{Vector{Vector{Vector{Float64}}}}}(undef, 𝚂)
    r = Vector{Vector{Vector{Vector{Vector{Float64}}}}}(undef, 𝚂)
    y = Vector{Vector{Vector{Vector{Vector{Float64}}}}}(undef, 𝚂)
    for s in 1:𝚂 ## Sequences
        p[s] = Vector{Vector{Vector{Vector{Float64}}}}(undef, 𝙳)
        s̆[s] = Vector{Vector{Vector{Vector{Float64}}}}(undef, 𝙳)
        r[s] = Vector{Vector{Vector{Vector{Float64}}}}(undef, 𝙳)
        y[s] = Vector{Vector{Vector{Vector{Float64}}}}(undef, 𝙳)
        s̆ₜ₋₁ = s₀ ## keep previous state
        for d in 1:𝙳 ## Datapoints
            p[s][d] = fˢⁱᵐₚ(s̆ₜ₋₁; B̆=B̆, 𝙼=𝙼, 𝚅=𝚅, 𝙲=𝙲, rng=rng)
            s̆[s][d] = p[s][d] ## no system noise
            
            r[s][d] = fᵣ(s̆[s][d], Ă, rng)
            y[s][d] = r[s][d]

            s̆ₜ₋₁ = s̆[s][d]
        end
    end
    return y, s̆
end;

## Not used because the state is not available
function fld_data(df, 𝚂, 𝙳, 𝙼, 𝚅, 𝙲)
    p = Vector{Vector{Vector{Vector{Vector{Float64}}}}}(undef, 𝚂)
    s̆ = Vector{Vector{Vector{Vector{Vector{Float64}}}}}(undef, 𝚂)
    r = Vector{Vector{Vector{Vector{Vector{Float64}}}}}(undef, 𝚂)
    y = Vector{Vector{Vector{Vector{Vector{Float64}}}}}(undef, 𝚂)
    for s in 1:𝚂 ## Sequences
        p[s] = Vector{Vector{Vector{Vector{Float64}}}}(undef, 𝙳)
        s̆[s] = Vector{Vector{Vector{Vector{Float64}}}}(undef, 𝙳)
        r[s] = Vector{Vector{Vector{Vector{Float64}}}}(undef, 𝙳)
        y[s] = Vector{Vector{Vector{Vector{Float64}}}}(undef, 𝙳)
        for d in 1:𝙳 ## Datapoints
            ## p[s][d] = fᶠˡᵈₚ(d; 𝙼=𝙼, 𝚅=𝚅, 𝙲=𝙲, df=df)
            ## s̆[s][d] = p[s][d] ## no system noise
            ## r[s][d] = fᵣ(s̆[s][d])
            ## y[s][d] = r[s][d]

            y[s][d] = [[Vector(df[d, :])]]
        end
    end
    return y
end;

Next we will generate a number of weekly data points to simulate engagements. xSim will contain the measurements (capture records) and sSim will contain information on the actual engagements.

## number of Batches in an experiment
## _𝙱 = 1 ## not used yet

## number of Sequences/examples in a batch
_𝚂 = 1

## number of Datapoints in a sequence
# _𝙳 = 600
# _𝙳 = 700
# _𝙳 = 1000
# _𝙳 = 1500 #usable
# _𝙳 = 1600 #better
# _𝙳 = 1700 #worse
# _𝙳 = 2000 #worse
_𝙳 = 1600 #better

## number of Matrices in a datapoint
_𝙼 = 1

## number of Vectors in a matrix
_𝚅 = 1

## number of Components in a vector
_𝙲 = 4

_B̆ˢⁱᵐ = [
    0.90 0.05 0.10 0.05;
    0.00 0.85 0.05 0.00;
    0.05 0.05 0.80 0.05;
    0.05 0.05 0.05 0.90]
_Ăˢⁱᵐ = [
    0.94 0.02 0.02 0.02;
    0.02 0.94 0.02 0.02;
    0.02 0.02 0.94 0.02;
    0.02 0.02 0.02 0.94]
_s₀ = [[[1.0, 0.0, 0.0, 0.0]]] ## initial state
_seed = 42
_rng = MersenneTwister(_seed)

MersenneTwister(42)

_yˢⁱᵐ, _s̆ˢⁱᵐ = sim_data(_rng, _𝚂, _𝙳, _𝙼, _𝚅, _𝙲, _B̆ˢⁱᵐ, _Ăˢⁱᵐ, _s₀) ## simulated data

_s̆ˢⁱᵐ = _s̆ˢⁱᵐ[1] ## first (and only) sequence
_s̆ˢⁱᵐ = first.(first.(_s̆ˢⁱᵐ))

_yˢⁱᵐ = _yˢⁱᵐ[1] ## first (and only) sequence
_yˢⁱᵐ = first.(first.(_yˢⁱᵐ));

_s̆ˢⁱᵐ

1600-element Vector{Vector{Float64}}:
 [1.0, 0.0, 0.0, 0.0]
 [1.0, 0.0, 0.0, 0.0]
 [1.0, 0.0, 0.0, 0.0]
 [1.0, 0.0, 0.0, 0.0]
 [1.0, 0.0, 0.0, 0.0]
 [1.0, 0.0, 0.0, 0.0]
 [1.0, 0.0, 0.0, 0.0]
 [1.0, 0.0, 0.0, 0.0]
 [1.0, 0.0, 0.0, 0.0]
 [1.0, 0.0, 0.0, 0.0]
 ⋮
 [1.0, 0.0, 0.0, 0.0]
 [1.0, 0.0, 0.0, 0.0]
 [1.0, 0.0, 0.0, 0.0]
 [0.0, 0.0, 1.0, 0.0]
 [0.0, 0.0, 1.0, 0.0]
 [0.0, 0.0, 1.0, 0.0]
 [0.0, 0.0, 1.0, 0.0]
 [0.0, 0.0, 1.0, 0.0]
 [0.0, 0.0, 1.0, 0.0]

_yˢⁱᵐ

1600-element Vector{Vector{Float64}}:
 [1.0, 0.0, 0.0, 0.0]
 [1.0, 0.0, 0.0, 0.0]
 [1.0, 0.0, 0.0, 0.0]
 [1.0, 0.0, 0.0, 0.0]
 [1.0, 0.0, 0.0, 0.0]
 [1.0, 0.0, 0.0, 0.0]
 [1.0, 0.0, 0.0, 0.0]
 [1.0, 0.0, 0.0, 0.0]
 [1.0, 0.0, 0.0, 0.0]
 [1.0, 0.0, 0.0, 0.0]
 ⋮
 [0.0, 1.0, 0.0, 0.0]
 [1.0, 0.0, 0.0, 0.0]
 [1.0, 0.0, 0.0, 0.0]
 [0.0, 0.0, 0.0, 1.0]
 [0.0, 0.0, 1.0, 0.0]
 [0.0, 0.0, 1.0, 0.0]
 [0.0, 0.0, 1.0, 0.0]
 [0.0, 0.0, 1.0, 0.0]
 [0.0, 0.0, 1.0, 0.0]

plot(title="States & Observations for Simulated Data")
plot!(argmax.(_yˢⁱᵐ), linetype=:steppre, linestyle=:solid, leg=:right, label="observation", xlabel="Weeks", yticks=([1,2,3,4], _STORES))
plot!(argmax.(_s̆ˢⁱᵐ), linetype=:steppre, linestyle=:dashdot, leg=:right, label="state", xlabel="Weeks", yticks=([1,2,3,4], _STORES))

4.4 Uncertainty Model

The uncertainty model is captured in the transition matrix $B$ and the observation matrix $A$ .

4.5 Agent Model (Generative Model)

4.5.1 Implementation of the Agent Model (Generative Model)

We will use the RxInfer Julia package. RxInfer stands at the forefront of Bayesian inference tools within the Julia ecosystem, offering a powerful and versatile platform for probabilistic modeling and analysis. Built upon the robust foundation of the Julia programming language, RxInfer provides researchers, data scientists, and practitioners with a streamlined workflow for conducting Bayesian inference tasks with unprecedented speed and efficiency.

At its core, RxInfer leverages cutting-edge techniques from the realm of reactive programming to enable dynamic and interactive model specification and estimation. This unique approach empowers users to define complex probabilistic models with ease, seamlessly integrating prior knowledge, data, and domain expertise into the modeling process.

With RxInfer, conducting Bayesian inference tasks becomes a seamless and intuitive experience. The package offers a rich set of tools for performing parameter estimation, model comparison, and uncertainty quantification, all while leveraging the high-performance capabilities of Julia to deliver results in a fraction of the time required by traditional methods.

Whether tackling problems in machine learning, statistics, finance, or any other field where uncertainty reigns supreme, RxInfer equips users with the tools they need to extract meaningful insights from their data and make informed decisions with confidence.

RxInfer represents a paradigm shift in the world of Bayesian inference, combining the expressive power of Julia with the flexibility of reactive programming to deliver a state-of-the-art toolkit for probabilistic modeling and analysis. With its focus on speed, simplicity, and scalability, RxInfer is poised to become an indispensable tool for researchers and practitioners seeking to harness the power of Bayesian methods in their work.

To configure the model in RxInfer, we will use Categorical distributions for the states and observations. To learn the $B$ and $A$ matrices we can use MatrixDirichlet priors. Since we have no apriori idea how the engagements will play out we will assume that it happens randomly. For the $B$ -matrix, we can represent this by filling our MatrixDirichlet prior on $B$ with all ones. These values will get updated once learning starts.

For the observations, it is reasonable to assume that record keeping for engagements are very accurate. To configure this, we will have large values on the diagonal of $A$ ’s prior. However, to allow for the odd errors in record keeping, we will add some noise on the off-diagonal entries.

We will use Variational Inference. This means we will have to specify inference constraints. Using a structured variational approximation to the true posterior distribution, we will decouple the variational posterior over the states (q(s_0, s)) from the posteriors over the transition matrices (q(B) and q(A)). This dependency decoupling in the approximate posterior distribution ensures that inference is tractable.

Next, we specify this model in RxInfer. We will make use of the following nodes in the Forney Factor Graph (FFG):

MatrixDirichlet
Categorical
Transition
Kronecker- $δ$ nodes (for the N observations)

## Model specification
@model function hidden_markov_model(x, N)    
    B ~ MatrixDirichlet(ones(4, 4)) ## transition matrix
    A ~ MatrixDirichlet([10.0  1.0  1.0  1.0;  ## observation matrix
                          1.0 10.0  1.0  1.0; 
                          1.0  1.0 10.0  1.0;
                          1.0  1.0  1.0 10.0])
    s₀ ~ Categorical(fill(1.0/4.0, 4)) ## initial state
    ## s = randomvar(N)
    ## x = datavar(Vector{Float64}, N)
    
    sₜ₋₁ = s₀ ## initialize the previous state
    for t in 1:N
        s[t] ~ Transition(sₜ₋₁, B)
        x[t] ~ Transition(s[t], A)
        sₜ₋₁ = s[t] ## keep the previous state
    end
end

## Constraints specification
@constraints function hidden_markov_model_constraints()
    q(s₀, s, B, A) = q(s₀, s)q(B)q(A)
end

hidden_markov_model_constraints (generic function with 1 method)

4.6 Agent Evaluation

Next we will perform inference to see how engagements changed.

Using Variational Inference to perform inference, means we need to set some initial marginals as a starting point. This is made easy in RxInfer by using the vague function, which provides an uninformative guess. Different initial guesses can also be tried.

We are only interested in the final result - the best guess about the engagement store. So we will only keep the last results.

4.6.1 Evaluate with simulated data

## _imarginals = (    
##     B=vague(MatrixDirichlet, 4, 4), 
##     A=vague(MatrixDirichlet, 4, 4), 
##     s=vague(Categorical, 4)
## )
_imarginals = @initialization begin
    q(A) = vague(MatrixDirichlet, 4, 4)
    q(B) = vague(MatrixDirichlet, 4, 4) 
    q(s) = vague(Categorical, 4)
end
_resultSim = infer(
    model = hidden_markov_model(N=_𝙳),
    data = (x=_yˢⁱᵐ,),
    constraints  =hidden_markov_model_constraints(),
    initialization=_imarginals,
    returnvars = (    
        B=KeepLast(),
        A=KeepLast(),
        s=KeepLast()
    ),
    iterations   =30,
    free_energy  =true
);

We obtain the estimated posteriors for $B$ and $A$ . Then we compare them with the $B$ and $A$ matrices we setup for the simulation. For both matrices the estimates are quite good. This gives us confidence that we might get useful results for the field data as well.

println("Posterior Marginal for B:")
_Bˢⁱᵐ = mean(_resultSim.posteriors[:B])

Posterior Marginal for B:

4×4 Matrix{Float64}:
 0.895173    0.0358848  0.10032    0.0675268
 0.00167094  0.844302   0.0573432  0.00285976
 0.0457856   0.0342896  0.784693   0.0473008
 0.0573708   0.085523   0.0576435  0.882313

This compares well with:

_BSim = [
    0.90 0.05 0.10 0.05;
    0.00 0.85 0.05 0.00;
    0.05 0.05 0.80 0.05;
    0.05 0.05 0.05 0.90]

println("Posterior Marginal for A:")
_Aˢⁱᵐ = mean(_resultSim.posteriors[:A])

Posterior Marginal for A:

4×4 Matrix{Float64}:
 0.944798   0.0134634  0.0353147   0.0153189
 0.0268523  0.942494   0.00741187  0.0203064
 0.0180555  0.0309541  0.952193    0.00859556
 0.0102937  0.013089   0.00508017  0.955779

This compares well with:

_ASim = [
    0.94 0.02 0.02 0.02;
    0.02 0.94 0.02 0.02;
    0.02 0.02 0.94 0.02;
    0.02 0.02 0.02 0.94]

Next we visualize the results by comparing the real states with the inferred states. We also verify if the model has converged by looking at the Free Energy.

_p1 = scatter(argmax.(_s̆ˢⁱᵐ), 
    title="Simulation results for manager engagement", 
    label="Real state", 
    legend=:right,
    xlabel="Weeks" ,
    yticks=([1,2,3,4], _STORES),
    markercolor=:cyan, #.
    markersize=6,
    ## markerstrokecolor=:red,
    ## markeralpha=0,
    markerstrokewidth=0,
    ## markerstrokealpha=.5,
    alpha=.5,
    size=(900,550)
)
_p1 = scatter!(_p1, argmax.(ReactiveMP.probvec.(_resultSim.posteriors[:s])),
    label="Inferred state",
    markercolor=:green,
    markersize=3,
    ## markerstrokecolor=:blue,
    ## markeralpha=0,
    markerstrokewidth=0,
    ## markerstrokealpha=.5,
    alpha=.5
)
_p2 = plot(_resultSim.free_energy, 
    label="Free energy",
    xlabel="Iteration"
)
plot(_p1, _p2, layout=@layout([ a; b ]))

5 Evaluation

Now it is time to evaluate the agent/model with the historical field data. There are 468 weekly records. So 468 weeks = 9 years * 52 weeks/year.

_yᶠˡᵈ_df = DataFrame(XLSX.readtable("ManagerEngagements_observations.xlsx", "Sheet1"));
_yᶠˡᵈ_mat = Float64.(Matrix(_yᶠˡᵈ_df))
_yᶠˡᵈ = [_yᶠˡᵈ_mat[r,1:end] for r in 1:size(_yᶠˡᵈ_mat)[1]]

## Not used because the state is not available:
## _yᶠˡᵈ = fld_data(_yᶠˡᵈ_df, _𝚂, _𝙳, _𝙼, _𝚅, _𝙲) ## field data
## _yᶠˡᵈ = _yᶠˡᵈ[1] ## first (and only) sequence
## _yᶠˡᵈ = first.(first.(_yᶠˡᵈ));

## pprint(IOContext(stdout, :displaysize => (24, 30)), _yᶠˡᵈ)

468-element Vector{Vector{Float64}}:
 [1.0, 0.0, 0.0, 0.0]
 [1.0, 0.0, 0.0, 0.0]
 [0.0, 0.0, 0.0, 1.0]
 [0.0, 0.0, 0.0, 1.0]
 [0.0, 0.0, 0.0, 1.0]
 [0.0, 0.0, 0.0, 1.0]
 [0.0, 0.0, 0.0, 1.0]
 [0.0, 0.0, 0.0, 1.0]
 [0.0, 0.0, 0.0, 1.0]
 [0.0, 0.0, 1.0, 0.0]
 ⋮
 [0.0, 1.0, 0.0, 0.0]
 [0.0, 1.0, 0.0, 0.0]
 [0.0, 1.0, 0.0, 0.0]
 [0.0, 1.0, 0.0, 0.0]
 [0.0, 1.0, 0.0, 0.0]
 [0.0, 1.0, 0.0, 0.0]
 [0.0, 1.0, 0.0, 0.0]
 [0.0, 1.0, 0.0, 0.0]
 [0.0, 0.0, 1.0, 0.0]

_s̆ᶠˡᵈ_df = DataFrame(XLSX.readtable("ManagerEngagements_states.xlsx", "Sheet1"));
_s̆ᶠˡᵈ_mat = Float64.(Matrix(_s̆ᶠˡᵈ_df))
_s̆ᶠˡᵈ = [_s̆ᶠˡᵈ_mat[r,1:end] for r in 1:size(_s̆ᶠˡᵈ_mat)[1]]

468-element Vector{Vector{Float64}}:
 [1.0, 0.0, 0.0, 0.0]
 [1.0, 0.0, 0.0, 0.0]
 [0.0, 0.0, 0.0, 1.0]
 [0.0, 0.0, 0.0, 1.0]
 [0.0, 0.0, 0.0, 1.0]
 [0.0, 0.0, 0.0, 1.0]
 [0.0, 0.0, 0.0, 1.0]
 [0.0, 0.0, 0.0, 1.0]
 [0.0, 0.0, 0.0, 1.0]
 [0.0, 0.0, 1.0, 0.0]
 ⋮
 [0.0, 1.0, 0.0, 0.0]
 [0.0, 1.0, 0.0, 0.0]
 [0.0, 1.0, 0.0, 0.0]
 [0.0, 1.0, 0.0, 0.0]
 [0.0, 1.0, 0.0, 0.0]
 [0.0, 1.0, 0.0, 0.0]
 [0.0, 1.0, 0.0, 0.0]
 [0.0, 1.0, 0.0, 0.0]
 [0.0, 0.0, 1.0, 0.0]

plot(title="States & Observations for Field Data")
plot!(argmax.(_yᶠˡᵈ), linetype=:steppre, linestyle=:solid, leg=:right, label="observation", xlabel="Weeks", yticks=([1,2,3,4], _STORES))
plot!(argmax.(_s̆ᶠˡᵈ), linetype=:steppre, linestyle=:dashdot, leg=:right, label="state", xlabel="Weeks", yticks=([1,2,3,4], _STORES))

## _imarginals = (    
##     A=vague(MatrixDirichlet, 4, 4), 
##     B=vague(MatrixDirichlet, 4, 4), 
##     s=vague(Categorical, 4)
## )
_imarginals = @initialization begin
    q(A) = vague(MatrixDirichlet, 4, 4)
    q(B) = vague(MatrixDirichlet, 4, 4) 
    q(s) = vague(Categorical, 4)
end
_𝙳 = 468 ## 468 weeks = 9 years * 52 weeks/year
_resultFld = infer(
    model = hidden_markov_model(N=_𝙳),
    data = (x=_yᶠˡᵈ,),
    constraints  =hidden_markov_model_constraints(),
    initialization=_imarginals,
    returnvars = (    
        B=KeepLast(),
        A=KeepLast(),
        s=KeepLast()
    ),
    iterations   =25,
    free_energy  =true
);

println("Posterior Marginal for B:")
_Bᶠˡᵈ = mean(_resultFld.posteriors[:B])

Posterior Marginal for B:

4×4 Matrix{Float64}:
 0.663299  0.0424594  0.126407   0.0759948
 0.165723  0.8885     0.049017   0.107181
 0.108625  0.0215407  0.748939   0.13033
 0.062353  0.0475     0.0756369  0.686494

println("Posterior Marginal for A:")
_Aᶠˡᵈ = mean(_resultFld.posteriors[:A])

Posterior Marginal for A:

4×4 Matrix{Float64}:
 0.855279   0.0243141  0.0159978  0.0379516
 0.046414   0.927419   0.0135922  0.0187958
 0.0465656  0.0196343  0.92911    0.0435644
 0.0517412  0.0286326  0.0413003  0.899688

We visualize the results again:

_p1 = scatter(argmax.(_s̆ᶠˡᵈ),
    title="Field results for manager engagement", 
    label="Real state", 
    legend=:right,
    xlabel="Weeks",
    yticks=([1,2,3,4], _STORES),
    markercolor=:cyan,
    markersize=6,
    ## markerstrokecolor=:red,
    ## markeralpha=0,
    markerstrokewidth=0,
    ## markerstrokealpha=.5,
    alpha=.5,
    size=(900,550)
)
_p1 = scatter!(_p1, argmax.(ReactiveMP.probvec.(_resultFld.posteriors[:s])),
    label="Inferred state",
    markercolor=:green, #.
    markersize=3,
    ## markerstrokecolor=:blue,
    ## markeralpha=0,
    markerstrokewidth=0,
    ## markerstrokealpha=.5,
    alpha=.5
)
_p2 = plot(_resultFld.free_energy, 
    label="Free energy",
    xlabel="Iteration"
)
plot(_p1, _p2, layout=@layout([ a; b ]))

The model showed good conversion based on the free energy. However, we have less data for the agent to learn from (468 data points) than we had in the case of the simulation (1,600 data points). Our field agent model will probably be not as good as the simulation agent. We also do not have access to the $B$ and $A$ matrices that generated the field data. However, we do take comfort in the good declining behavior of the free energy chart.

Let’s look at the matrices (here rounded to 2 decimal places):

$B = [\begin{matrix} 0.66 & 0.04 & 0.13 & 0.07 \\ 0.17 & 0.89 & 0.05 & 0.11 \\ 0.11 & 0.02 & 0.75 & 0.13 \\ 0.06 & 0.05 & 0.07 & 0.69 \end{matrix}]$

$A = [\begin{matrix} 0.86 & 0.02 & 0.02 & 0.04 \\ 0.04 & 0.93 & 0.01 & 0.02 \\ 0.05 & 0.02 & 0.93 & 0.04 \\ 0.05 & 0.03 & 0.04 & 0.90 \end{matrix}]$

In table form this is:

$B$ matrix	Bellingham	Ferndale	Lynden	Blaine
Bellingham	0.66	0.04	0.13	0.07
Ferndale	0.17	0.89	0.05	0.11
Lynden	0.11	0.02	0.75	0.13
Blaine	0.06	0.05	0.07	0.69

$A$ matrix	Bellingham	Ferndale	Lynden	Blaine
Bellingham	0.86	0.02	0.02	0.04
Ferndale	0.04	0.93	0.01	0.02
Lynden	0.05	0.02	0.93	0.04
Blaine	0.05	0.03	0.04	0.90

Consider the $B$ matrix and assume the manager is currently engaged at the Lynden store. This means there is a 75% probability that he will still be at this store next week, a 13% probability that he will move to the Bellingham store, a 5% probability that he will move to the Ferndal store, and a 7% probability that he will move to the Blaine store next week. Sampling the categorical distribution of each store and changing engagement accordingly will ensure that the district manager’s engagement behavior is consistent with past engagement behavior.