T-maze Active Inference - Policy Inference (Part 1b)
Navigate a simple T-maze using Bayesian Inference and RxInfer
Bayesian Inference
Active Inference
RxInfer
Julia
Author
Kobus Esterhuysen
Published
April 30, 2024
Modified
May 24, 2024
The overall purpose of the project is to lay the groundwork for another project that makes use of a categorical control space. The purpose of the Part 1b project is to experiment with the code from the paper:
Koudahl, M. T., van de Laar, T. W., & de Vries, B. (2023). Realising Synthetic Active Inference Agents, Part I: Epistemic Objectives and Graphical Specification Language arXiv:2306.08014
1 BUSINESS UNDERSTANDING
A mouse lives in a T-maze as shown in the next figure.
Either the left arm (L) or the right arm (R) of the maze contains a reward in each trial. A trial always starts with the mouse in the origin position, O. The mouse can go directly to L or to R to try to find the reward. The position C contains a cue indicating where the reward is for the particular trial. The mouse can choose to first go to C, observe the cue, and then go to the reward location (L or R). Optimal navigation is to first go to C and then to the arm with the reward, either L or R. Moving in this way delays the reward, which means that a greedy policy will lead to suboptimal behavior. When the mouse reaches either L or R it is mandated to return to the origin O, indicating the end of the trial.
This project is in the form of an analysis of two papers:
Koudahl, M. T., van de Laar, T. W., & de Vries, B. (2023). Realising Synthetic Active Inference Agents, Part I: Epistemic Objectives and Graphical Specification Language arXiv:2306.08014
van de Laar, T. W., Koudahl, M. T., & de Vries, B. (2023). Realising Synthetic Active Inference Agents, Part II: Variational Message Passing Updates arXiv:2306.02733
The analysis in this notebook is mostly based on the first paper. The purpose of the current project is to lay the groundwork for another project that makes use of a categorical control space. In general, the diagrams have been reproduced from the mentioned papers.
versioninfo() ##. Julia version
Julia Version 1.8.2
Commit 36034abf260 (2022-09-29 15:21 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 12 × Intel(R) Core(TM) i7-8700B CPU @ 3.20GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
Threads: 1 on 12 virtual cores
Environment:
JULIA_NUM_THREADS =
Updating registry at `~/.julia/registries/General.toml`
Resolving package versions...
No Changes to `~/.julia/environments/v1.8/Project.toml`
No Changes to `~/.julia/environments/v1.8/Manifest.toml`
Pkg.status() ##.
Status `~/.julia/environments/v1.8/Project.toml`
[31c24e10] Distributions v0.25.108
⌅ [5b8099bc] DomainSets v0.6.6
⌃ [f6369f11] ForwardDiff v0.10.35
⌅ [a194aa59] ReactiveMP v3.8.1
⌅ [86711068] RxInfer v2.10.4
⌅ [2913bbd2] StatsBase v0.33.21
⌃ [4c63d2b9] StatsFuns v1.3.0
⌃ [9d95972d] TupleTools v1.3.0
Info Packages marked with ⌃ and ⌅ have new versions available, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated`
Random.seed!(1909)
TaskLocalRNG()
2 DATA UNDERSTANDING
There is no pre-existing data to be analyzed.
3 DATA PREPARATION
There is no pre-existing data to be prepared.
4 MODELING
4.1 Narrative
The next figure (from Bert de Vries at Eindhoven University of Technology) shows the interactions between the agent and the environment:
The grey area shows the Markov blanket of the agent. The interaction between the agent and the environment can be summarized as \(a_t \sim q(u_t)\): actions on the environment are sampled from the posterior over control signals. We explain this in more detail below.
4.2 Core Elements
This section attempts to answer three important questions:
What metrics are we going to track?
What decisions do we intend to make?
What are the sources of uncertainty?
For this problem, we will track:
the position of the mouse
the position of the reward
whether a reward is obtained
Decisions will be in the form of agent-prescribed actions to go to one of the four positions.
The sources of uncertainty relating to the environment will be
the accuracy of the cue in position C
4.3 System-Under-Steer / Environment / Generative Process
The system-under-steer/environment/generative process is the mouse within the T-maze. The brain of the mouse, although embedded within the mouse which is part of the environment, is considered distinct from the environment and plays the role of the agent. The state of the environment will have a component for the position of the mouse, and another component for the position of the reward. The mouse will be steered (by its brain) by means of commands to go to specific positions in the maze.
4.3.1 State and Observation variables
The state at time \(t\) of the mouse will be given by:
\[
\tilde{s}_t = (\cal P)
\]
where
\(\cal P = \mathrm{\{O,C,L,R \}}\): the position of the mouse
The observation made by the mouse at time \(t\) will be given by:
\[
y_t = (\cal O)
\]
where
\(\mathrm{CL}\): the cue points to arm L
\(\mathrm{CR}\): the cue points to arm R
\(\mathrm{RW}\): the reward is won
\(\mathrm{NR}\): no reward is obtained
4.3.2 Decision variables
The decision variables represent what we control.
The decision or action vector is given by:
\(a_t = (a_{t})_{a\in \cal A}\) where
\(\cal{A} = \mathrm{\{O,C,L,R \}}\), the position to move to
4.3.3 Exogenous information / Autonomous variables
The exogenous information variables, also known as the autonomous state, represent what we did not know when we made a decision. These are the variables that we cannot control directly. The information in these variables becomes available only after we make the decision \(a_t\). For this problem the exogenous information \(w_t\) is given by:
\(w_t = (w_{t})_{w\in \cal R}\) where
\(\cal{R} = \mathrm{\{RL,RR\}}\), the position of the reward
4.3.4 Transition and Observation functions
After combining the state of the mouse with the exogenous information, the resultant state of the environment at time \(t\) is now given by:
\[
\tilde{s}_t = (\cal P, \cal R)
\]
where
\(\cal P = \mathrm{\{O,C,L,R \}}\): the position of the mouse
\(\cal R = \mathrm{\{RL,RR\}}\): the position of the reward, left arm or right arm
The agent is allowed two moves (\(T=2\)). After each move the agent observes an outcome \(\cal O ∈ \mathrm{\{CL, CR, RW, NR\}}\). The observation, emitted at time \(t\), by the system-under-steer (sustr), will be given by:
\[
y_t = (\cal O)
\]
where
\(\mathrm{CL}\): the cue points to arm L
\(\mathrm{CR}\): the cue points to arm R
\(\mathrm{RW}\): the reward is won
\(\mathrm{NR}\): no reward is obtained
The agent/environment interaction may be expressed in terms of:
\(w_t\), the exogenous information / autonomous state
\(y_t\), the outcome or observation
Observation model \(p(y \mid \tilde s)\): states \(\tilde{s} \in \cal P \times \cal R\) index the columns, observations \(y\) the rows, and blank entries are \(0\).

| | \((O,RL)\) | \((O,RR)\) | \((L,RL)\) | \((L,RR)\) | \((R,RL)\) | \((R,RR)\) | \((C,RL)\) | \((C,RR)\) |
|---|---|---|---|---|---|---|---|---|
| \(CL\) | \(0.5\) | \(0.5\) | | | | | \(1\) | |
| \(CR\) | \(0.5\) | \(0.5\) | | | | | | \(1\) |
| \(RW\) | | | \(\alpha\) | \(1-\alpha\) | \(1-\alpha\) | \(\alpha\) | | |
| \(NR\) | | | \(1-\alpha\) | \(\alpha\) | \(\alpha\) | \(1-\alpha\) | | |
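As a quick sanity check, the table can be typed in directly and each column verified to be a proper categorical distribution. This is a throwaway sketch: the value α = 0.9 and the name P are illustrative only and are not used elsewhere in the notebook.

α = 0.9 ## illustrative cue/reward accuracy, not a notebook variable
## Rows: CL, CR, RW, NR; columns ordered as in the table header
P = [0.5 0.5 0.0 0.0 0.0 0.0 1.0 0.0;
     0.5 0.5 0.0 0.0 0.0 0.0 0.0 1.0;
     0.0 0.0 α   1-α 1-α α   0.0 0.0;
     0.0 0.0 1-α α   α   1-α 0.0 0.0]
sum(P, dims=1) ## every column sums to 1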
4.3.5 Objective function
The objective function is such that the Bethe free energy (BFE) or Generalized free energy (GFE) is minimized. This aspect will be handled by the RxInfer Julia package.
4.3.6 Implementation of the System-Under-Steer / Environment / Generative Process
N/A
4.4 Uncertainty Model
As noted above, the sources of uncertainty relating to the environment will be:
the accuracy of the cue in position C
4.5 Agent / Generative Model
The agent consists of:
A free energy functional \(F[q] = \mathbb E_q \left[\mathrm{log} \frac{q(z)}{p(x,z)}\right]\) where
\(p(x, z) = \prod_k p(x_k,z_k \mid z_{k-1})\) is a generative model whose components are specified below.
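4.5.1 State and Observation variables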
On the agent’s side, the state at time \(t\) will be given by a 1-hot encoded vector \(\mathbf s_t\). The initial state prior is given by \[
p(\mathbf s_{0}) = \mathcal{Cat}(\mathbf s_{0} ∣ \mathbf d)
\]
where \(\mathbf d\) parameterizes the categorical distribution of \(\mathbf s_0\).
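To make this concrete: in the implementation further down (constructABCD), \(\mathbf d\) places the mouse at position O with the reward side unknown. The snippet below only illustrates that prior; the names d and s0 here are throwaway.

using Distributions
import LinearAlgebra: kron

d = kron([1.0, 0.0, 0.0, 0.0], [0.5, 0.5]) ## 8-dim prior over (position, reward) pairs
## All mass sits on states 1 and 2: position O, reward in L or R with equal probability
s0 = rand(Categorical(d)) ## sample an initial state index (1 or 2)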
The observation made by the mouse at time \(t\) will be given by \(\mathbf x_t\).
4.5.2 Decision variables
On the agent’s side, the action on the environment at time \(t\) will be represented by a 1-hot encoded vector \(\mathbf u_t\). The control prior is given by \[
p(\mathbf u_k) = \mathcal{Cat}(\mathbf u_k ∣ \mathbf e_k)
\]
where \(\mathbf e_k\) parameterizes the categorical distribution of \(\mathbf u_k\).
The observation model is \(p(\mathbf x_k ∣ \mathbf s_k) = \mathcal{Cat}(\mathbf x_k ∣ \mathbf A \mathbf s_k)\), where \(\mathbf A \mathbf s_k\) parameterizes the categorical distribution of \(\mathbf x_k\). An entry in \(\mathbf A\) captures the probability of a specific observation given a specific state. Each column in \(\mathbf A\) contains a categorical distribution, and a specific column is selected by multiplying with the one-hot \(\mathbf s_k\).
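A toy illustration of this column selection, with a hypothetical \(3 \times 2\) matrix (not the T-maze \(\mathbf A\)):

A = [0.7 0.1;
     0.2 0.3;
     0.1 0.6]  ## columns are categorical distributions over 3 observations
s = [0.0, 1.0] ## one-hot state vector selecting state 2
A * s          ## == [0.1, 0.3, 0.6], the second column of A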
4.5.5 Implementation of the Agent / Generative Model / Internal Model
We start by specifying a probabilistic model for the agent that describes the agent's internal beliefs over the external dynamics of the environment (assuming the current time is \(t\) with \(t=1\)).
To infer goal-driven (i.e. purposeful) behavior, we add prior beliefs \(p^+(x)\) about desired future observations. This leads to an extended agent model.
In general, if \(\mathbf a\) is a 1-hot encoded random variable, and has a categorical (aka multinoulli) distribution, then
\[p(\mathbf a ∣ \boldsymbol{\rho}) = \mathcal{Cat}(\mathbf a ∣ \boldsymbol \rho) = \prod_{i} \rho_i^{a_i}\]
This means the \(i\) th component of vector \(\mathbf a\) selects the \(i\) th component of the probability vector \(\boldsymbol \rho\) of the distribution.
If the probability vector is \(\boldsymbol{\rho} = \begin{bmatrix} 0.05 \\ 0.05 \\ 0.50 \\ 0.10 \\ 0.10 \\ 0.20 \end{bmatrix}\) and the random variable \(\mathbf a\) is \(\mathbf{a} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}\), then \(p(\mathbf a ∣ \boldsymbol \rho) = \prod_{i} \rho_i^{a_i} = \rho_4 = 0.10\).
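A one-line check of this selection rule in Julia, using the values from the example above:

ρ = [0.05, 0.05, 0.50, 0.10, 0.10, 0.20] ## probability vector
a = [0, 0, 0, 1, 0, 0]                   ## one-hot sample
prod(ρ .^ a)                             ## 0.10 == ρ[4]: a selects the 4th entry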
Similarly, the scalar \(u_{\kappa k}\) picks out the \(\kappa\)th entry of \(\mathbf u_k\).
4.5.5.1 Generative Model for the T-maze
The next figure is a representation of the Constrained Forney-style Factor Graph (CFFG) of the problem:
The red boxes indicate the dimensions of the vectors and matrices. Transitions are indicated by \(\cal T\) and Transition-Mixtures by \(\cal TM\). The dashed box represents the Goal-Observation submodel. Next, we would like to define the generative model for the T-maze agent in RxInfer. However, we first need code for the Transition-Mixture as well as the Goal-Observation submodel.
4.5.5.2 Transition-Mixture
## include("transition_mixture/transition_mixture_ANNO^v1.jl")## ContingencyTensor is defined hereimportDistributions: mean, entropyimportStatsBase: xlogx ## This makes entropy calculations consistent with Distributions.jlconst Tensorvariate = ArrayLikeVariate{3}const DiscreteTensorvariateDistribution = Distribution{Tensorvariate, Discrete}struct ContingencyTensor{T<: Real, P <: AbstractArray{T}} <: DiscreteTensorvariateDistribution p::Pend## Only use normalised tensors for now! Or baby dies....Distributions.mean(dist::ContingencyTensor) = dist.p## Clamplog meansmean(::typeof(ReactiveMP.clamplog), dist::MatrixDirichlet) =digamma.(ReactiveMP.clamplog.(dist.a)) .-digamma.(sum(ReactiveMP.clamplog.(dist.a)); dims =1)Distributions.entropy(dist::ContingencyTensor) =-sum(xlogx.(dist.p))struct TransitionMixture end@node TransitionMixture Stochastic [out, in, s, B1, B2, B3, B4,] #.@average_energy TransitionMixture ( q_out_in_s::ContingencyTensor, #. q_B1::PointMass, q_B2::PointMass, q_B3::PointMass, q_B4::PointMass) =begin## Need to make this generic log_A_bar = [mean(ReactiveMP.clamplog, q_B1);;; mean(ReactiveMP.clamplog, q_B2);;; mean(ReactiveMP.clamplog, q_B3);;; mean(ReactiveMP.clamplog, q_B4)] B =mean(q_out_in_s) #.sum(-tr.(transpose.(eachslice(B, dims=3)) .*eachslice(log_A_bar, dims=3)))end## Used when input state is clamped@average_energy TransitionMixture ( #. q_out_s::ContingencyTensor, #. q_in::PointMass, q_B1::PointMass, q_B2::PointMass, q_B3::PointMass, q_B4::PointMass) =begin## Need to make this generic log_A_bar = [mean(ReactiveMP.clamplog,q_B1);;; mean(ReactiveMP.clamplog,q_B2);;; mean(ReactiveMP.clamplog,q_B3);;; mean(ReactiveMP.clamplog,q_B4)] B =mean(q_out_s) #.sum(-tr.(transpose.(eachslice(B, dims=3)) .*eachslice(log_A_bar, dims=3)))end
## include("helpers_ANNO^v1.jl")usingReactiveMPimportLinearAlgebra: Ifunctionsoftmax(x::Vector) r = x .-maximum(x)clamp!(r, -100, 0.0)exp.(r) ./sum(exp.(r))end## Alias for safe logarithmconst safelog = ReactiveMP.clamplog## Kronecker product## https://www.statlect.com/matrix-algebra/Kronecker-product#:~:text=The%20Kronecker%20product%20is%20an,linear%20algebra%20and%20its%20applications.## https://en.wikipedia.org/wiki/Kronecker_product## https://www.math.uwaterloo.ca/~hwolkowi/henry/reports/kronthesisschaecke04.pdffunctionconstructABCD(α::Float64, Cs, T)## Observation model A_1 = [0.50.5;0.50.5;0.00.0;0.00.0] A_2 = [0.00.0;0.00.0; α 1-α;1-α α ] A_3 = [0.00.0;0.00.0;1-α α ; α 1-α] A_4 = [1.00.0;0.01.0;0.00.0;0.00.0] A =zeros(16, 8) A[1:4, 1:2] = A_1 A[5:8, 3:4] = A_2 A[9:12, 5:6] = A_3 A[13:16, 7:8] = A_4## 0's violate the domain of the Dirichlet distribution and breaks FE calculation A .+= tiny## Transition model B_1 =kron([1111; ## Row: can I move to 1?0000;0000;0000], I(2)) B_2 =kron([0110;1001; ## Row: can I move to 2?0000;0000], I(2)) B_3 =kron([0110;0000;1001; ## Row: can I move to 3?0000], I(2)) B_4 =kron([0110;0000;0000;1001], I(2)) ## Row: can I move to 4? B = [B_1, B_2, B_3, B_4] C = [softmax(kron(ones(4), [0.0, 0.0, c, -c])) for c in Cs] ## Goal prior D =kron([1.0, 0.0, 0.0, 0.0], [0.5, 0.5]) ## Initial state priorreturn (A, B, C, D)end
constructABCD (generic function with 1 method)
The following diagram shows the numerical values associated with the categorical states of the agent’s position:
## Create the model
@model function t_maze(A, d, B₁, B₂, B₃, B₄, T)
    u = randomvar(T) #.
    s = randomvar(T) ## Latent states #.
    c = datavar(Vector{Float64}, T) ## Goal prior
    s₀ ~ Categorical(d) ## State prior #.
    sₜ₋₁ = s₀ #.
    for t in 1:T ##. T=2; assume current time t=0
        u[t] ~ Categorical(fill(1.0 / 4.0, 4)) #.
        s[t] ~ TransitionMixture(sₜ₋₁, u[t], B₁, B₂, B₃, B₄) #.
        c[t] ~ GoalObservation(s[t], A) where {
            pipeline = GeneralizedPipeline(vague(Categorical, 8)) }
        sₜ₋₁ = s[t] #.
    end
end

## Pointmass constraints
@constraints function pointmass_q()
    ## q(switch) :: PointMass
    q(u) :: PointMass #.
end

## Node constraints
@meta function t_maze_meta()
    GoalObservation(c, s) -> GeneralizedMeta() #.
end
t_maze_meta (generic function with 1 method)
## NOT USED IN THIS NOTEBOOK
# ## We need to make pointmass constraints for discrete vars by hand
# import RxInfer.default_point_mass_form_constraint_optimizer
# import RxInfer.PointMassFormConstraint
# function default_point_mass_form_constraint_optimizer(
#     ::Type{Univariate},
#     ::Type{Discrete},
#     constraint::PointMassFormConstraint,
#     distribution)
#     out = zeros(length(probvec(distribution)))
#     out[argmax(probvec(distribution))] = 1.
#     PointMass(out)
# end
4.6 Agent Evaluation
4.6.1 Evaluate with simulated data
## Configure experiment
_T = 2; ## Planning horizon
_α = 0.9; _cᵁᵗⁱˡ = 2.0 ##. Reward probability and utility
_its = 10; ## Number of inference iterations to run
_initmarginals = (s = [Categorical(fill(1.0 / 8.0, 8)) for t in 1:_T],) ## Initial marginals #.
_A, _B, _c, _d = constructABCD(_α, [_cᵁᵗⁱˡ for t in 1:_T], _T); ## Generate the matrices we need
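The inference call that produced the results below is not shown in this notebook. The sketch that follows is a hypothetical reconstruction using the `inference` entry point and keyword names of RxInfer v2; treat the exact argument list as an assumption.

## Hypothetical reconstruction -- not the verbatim call from the notebook
_result = inference(
    model         = t_maze(_A, _d, _B[1], _B[2], _B[3], _B[4], _T),
    data          = (c = _c,),
    initmarginals = _initmarginals,
    meta          = t_maze_meta(),
    iterations    = _its,
)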
Inference results:
Posteriors | available for (s, s₀, u)
## Inspect results
println("Posterior s₀, ", probvec.(_result.posteriors[:s₀][end]), "\n") #.
println("Posterior s as t=1, ", probvec.(_result.posteriors[:s][end][1])) #.
println("Posterior s as t=2, ", probvec.(_result.posteriors[:s][end][2]), "\n") #.
println("Posterior u as t=1, ", probvec.(_result.posteriors[:u][end][1])) #.
println("Posterior u as t=2, ", probvec.(_result.posteriors[:u][end][2])) #.
Posterior s₀, [0.49999999997599515, 0.4999999999759952, 8.001571707637846e-12, 8.001571707637846e-12, 8.001571707637846e-12, 8.001571707637846e-12, 8.001571707637846e-12, 8.001571707637846e-12]
Posterior s as t=1, [0.12504911587568743, 0.12504911587568743, 0.17545999968929593, 0.02264472880305665, 0.02264472880305665, 0.17545999968929593, 0.1768461556319599, 0.17684615563195993]
Posterior s as t=2, [0.21451619046215178, 0.21451619046215178, 0.17853389987901877, 0.023041443930652612, 0.023041443930652605, 0.1785338998790188, 0.08390846572817681, 0.08390846572817681]
Posterior u as t=1, [0.2500982317513749, 0.1981047284923526, 0.1981047284923526, 0.3536923112639199]
Posterior u as t=2, [0.13187528818577468, 0.3006277080558477, 0.30062770805584776, 0.26686929570253]
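The second set of results below shows one-hot posteriors over \(u\), which suggests this run added the point-mass constraint on \(q(u)\) defined earlier. A hedged guess at the call, under the same assumptions as the sketch above:

## Hypothetical reconstruction -- presumably the same call plus the constraints
_result = inference(
    model         = t_maze(_A, _d, _B[1], _B[2], _B[3], _B[4], _T),
    data          = (c = _c,),
    initmarginals = _initmarginals,
    constraints   = pointmass_q(),
    meta          = t_maze_meta(),
    iterations    = _its,
)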
Inference results:
Posteriors | available for (s, s₀, u)
## Inspect results
println("Posterior s₀, ", probvec.(_result.posteriors[:s₀][end]), "\n") #.
println("Posterior s as t=1, ", probvec.(_result.posteriors[:s][end][1])) #.
println("Posterior s as t=2, ", probvec.(_result.posteriors[:s][end][2]), "\n") #.
println("Posterior u as t=1, ", probvec(_result.posteriors[:u][end][1])) ##. no dot after probvec!
println("Posterior u as t=2, ", probvec(_result.posteriors[:u][end][2])) ##. no dot after probvec!
Posterior s₀, [0.49999999997599515, 0.4999999999759952, 8.001571707637846e-12, 8.001571707637846e-12, 8.001571707637846e-12, 8.001571707637846e-12, 8.001571707637846e-12, 8.001571707637846e-12]
Posterior s as t=1, [0.12504911587568743, 0.12504911587568743, 0.17545999968929593, 0.02264472880305665, 0.02264472880305665, 0.17545999968929593, 0.1768461556319599, 0.17684615563195993]
Posterior s as t=2, [0.21451619046215178, 0.21451619046215178, 0.17853389987901877, 0.023041443930652612, 0.023041443930652605, 0.1785338998790188, 0.08390846572817681, 0.08390846572817681]
Posterior u as t=1, [0.0, 0.0, 0.0, 1.0]
Posterior u as t=2, [0.0, 0.0, 1.0, 0.0]
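Comparing the two runs: the unconstrained posteriors over \(u\) spread probability over the four moves, while the point-mass constrained run commits to a single action per step. Read against the position ordering implied by constructABCD (1 = O, 2 = L, 3 = R, 4 = C), the one-hot vectors say: first move to the cue position C, then to an arm, which is the cue-first, epistemic policy described in section 1. With a symmetric prior over the reward side, the particular arm selected at planning time is arbitrary and would be resolved once the cue is actually observed.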