Watching Paint Dry with Active Inference

Navigate a cart to the optimal location using Bayesian Inference and RxInfer

Specialty Trade Contractors Industry
Bayesian Inference
Active Inference
RxInfer
Julia
Author

Kobus Esterhuysen

Published

May 3, 2024

Modified

October 31, 2024


1 BUSINESS UNDERSTANDING

The client that approached us for a solution is in the spray-painting business. The objects to spray-paint are of various kinds. In addition, various kinds of paint are used which require different drying rates. To enable a suitable drying condition for a specific object surface and paint type combination, the client will provide a specific looked-up drying temperature.

The client has a heat source at the center of the drying space and needs optimal guidance on how far away from the heat source each spray-painted object needs to be placed to comply with the looked-up drying temperature. Objects that need to be spray-painted are placed on a radio-controlled cart with a protected heat sensor. Once spray-painting is completed, a remote controller (containing the in-silico agent) is called upon to position the cart at the optimal distance from the heat source so that the drying paint is held at the specified temperature.

We simulate an agent that can relocate the cart in a temperature gradient field. The agent aims to position the cart at a desired temperature relative to the heat source. Our simulation setup is an adaptation of the setup described by Buckley et al. (2017).

versioninfo() ##. Julia version
Julia Version 1.10.4
Commit 48d4fd48430 (2024-06-04 10:41 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × Intel(R) Core(TM) i7-8700B CPU @ 3.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
Threads: 1 default, 0 interactive, 1 GC (on 12 virtual cores)
Environment:
  JULIA_NUM_THREADS = 
# import Pkg; Pkg.activate(".."); Pkg.instantiate();
import Pkg
# Pkg.add(Pkg.PackageSpec(;name="RxInfer", version="3.0.0"))
Pkg.add(Pkg.PackageSpec(;name="RxInfer", version="3.6.0"))
Pkg.add(Pkg.PackageSpec(;name="Plots"))
Pkg.add(Pkg.PackageSpec(;name="LaTeXStrings"))

## Pkg.resolve() #.
using RxInfer, Plots
using Random; Random.seed!(51233) # Set random seed
using LaTeXStrings

import RxInfer.ReactiveMP: getrecent, messageout
    Updating registry at `~/.julia/registries/General.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.10/Project.toml`
  No Changes to `~/.julia/environments/v1.10/Manifest.toml`
Precompiling project...
  ✓ Unitful
  ✓ Unitful → ConstructionBaseUnitfulExt
  ✓ UnitfulLatexify
  ✓ Plots
  ✓ Plots → FileIOExt
  ✓ Plots → UnitfulExt
  6 dependencies successfully precompiled in 92 seconds. 260 already precompiled.
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.10/Project.toml`
  No Changes to `~/.julia/environments/v1.10/Manifest.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.10/Project.toml`
  No Changes to `~/.julia/environments/v1.10/Manifest.toml`
Pkg.status()
Status `~/.julia/environments/v1.10/Project.toml`
  [b964fa9f] LaTeXStrings v1.3.1
  [91a5bcdd] Plots v1.40.8
  [86711068] RxInfer v3.6.0

2 DATA UNDERSTANDING

There is no pre-existing data to be analyzed.

3 DATA PREPARATION

There is no pre-existing data to be prepared.

4 MODELING

4.1 Narrative

The next figure (from Bert de Vries at Eindhoven University of Technology) shows the interactions between the agent and the environment:

[Figure: agent–environment interaction loop]

4.2 Core Elements

This section attempts to answer three important questions:

  • What metrics are we going to track?
  • What decisions do we intend to make?
  • What are the sources of uncertainty?

For this problem, we will only track the temperature.

Decisions will be in the form of agent-prescribed velocity actions.

The only source of uncertainty relating to the environment will be the noise in the measurements of the temperature.

4.3 System-Under-Steer / Environment / Generative Process

The system-under-steer is the radio-controlled paint drying cart. The cart is subject to a temperature gradient field. The temperature \(\cal T\) is a function of the position \(\tilde{s}\) and follows the profile

\[ \cal T(\tilde{s}) = \frac{\cal T_0}{\tilde{s}^2 + 1} \]

where

  • \(\cal T_0 = 100\) is the temperature at the location of the heat source
  • \(\cal T(\tilde{s})\) is the temperature at a distance \(\tilde{s}\) away from the heat source
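
For example, at the initial cart position \(\tilde{s} = 2\) the profile gives \(\cal T(2) = 100/(2^2 + 1) = 20\), which is the temperature the simulations below start from.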

Please see the next code and chart.

## Environmental process parameters

## Temperature at the heat source (z=0)
_𝚃₀ = 100.0 ## [\]mttT[tab] 'math teletype', closest for now 

## Actual temperature profile; this function is hidden from the agent
𝚃(s̃) = [_𝚃₀/(s̃[1]^2 + 1.0)] ## return a 1-element vector
𝚃 (generic function with 1 method)
_d = 0.0:0.01:6.0 ## distance range #.
_y⁰ = [𝚃([s̃ₖ])[1] for s̃ₖ in _d] ## observation range (noise-free) #.
_s̃₀ = [2.0] ## initial position #.
_x₊ = [4.0] ## target/goal temperature
plot(
    _d, _y⁰, color="black", xlabel="Position", ylabel="Temperature", #.
    label="Temperature")
scatter!(_s̃₀, [𝚃(_s̃₀)], label="Initial position temperature") #.

4.3.1 State variables

The state at time \(t\) of the system-under-steer/environment (envir) will be given by \(\tilde{s}_t\), the position of the cart relative to the heat source.

4.3.2 Decision variables

The decision variables represent what we control.

The environment is steered by decisions/actions \(a_t\) that reflect the velocity of the cart. Decisions/actions are in the form of agent-prescribed velocity actions. An action adjusts the radial velocity to move the cart either away from or towards the heat source. This velocity is limited to the interval \((-V^{max}, V^{max})\) and is given by:

\[ \begin{align} V^a &= V^{max} ⋅ \mathrm{tanh}(a) \\ &= 0.5 ⋅ \mathrm{tanh}(a) \end{align} \] where \(a_t\) is the velocity action at time \(t\).

Because the velocity has limits, we use the \(\tanh(\cdot)\) function to squash the velocity action into the interval \((-V^{max}, V^{max})\).

Let us visualize how the velocity action, \(V^a\), varies with the amount of action \(a\).

_Vᵐᵃˣ = 0.5
Vᵃ = (a::Real) -> _Vᵐᵃˣ*tanh(a)
#13 (generic function with 1 method)
_a = range(-10, 10, length=400) #.
_Vᵃ = [ _Vᵐᵃˣ*tanh(xs) for xs in _a ] #.
plot(
    _a, _Vᵃ, #.
    title="Limits on velocity actions", 
    label="Landscape", 
    color="black", 
    xlabel=L"Action, $a$", 
    ylabel=L"Action velocity, $V^a$", 
    legend=nothing,
    ylimits=(-0.5, 0.5)
)

4.3.3 Exogenous information variables

There are no exogenous information variables for this problem.

4.3.4 Transition and Observation functions

The transition function captures the dynamics of the environment/system-under-steer/generative process:

\[ \begin{aligned} \tilde{s}_t &= R_{t}(\tilde{s}_{t-1}, Vᵃ_t) \\ &= \tilde{s}_{t-1} + Vᵃ_t \\ &= \tilde{s}_{t-1} + Vᵐᵃˣ\mathrm{tanh}(a_t) \end{aligned} \]

The environment generates outcomes as noisy observations of the current state with an observation noise variance

\[ \vartheta = 10^{-4} \]

The observation function can be represented by: \[ y_t \sim \cal N(\cal T(\tilde{s}_t), \vartheta) \]

_γ = 1e4 ## transition precision (system noise)
_ϑ = 1e-4 ## observation variance (observation noise)
0.0001
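
To make the generative process concrete, a single simulated step under these equations might look like the following sketch (reusing 𝚃, Vᵃ, and _ϑ from above; the position and action values are arbitrary illustration values, not part of the simulation below):

s̃ₜ₋₁ = [2.0]                        ## an arbitrary current position
aₜ   = 0.3                          ## an arbitrary velocity action
s̃ₜ   = s̃ₜ₋₁ .+ Vᵃ(aₜ)               ## transition: s̃ₜ = s̃ₜ₋₁ + Vᵐᵃˣ⋅tanh(aₜ)
yₜ   = 𝚃(s̃ₜ)[1] + sqrt(_ϑ)*randn()  ## observation: yₜ ~ N(𝚃(s̃ₜ), ϑ)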

4.3.5 Objective function

The objective function is such that the Bethe free energy is minimized. This aspect will be handled by the RxInfer Julia package.

4.3.6 Implementation of the System-Under-Steer / Environment / Generative Process

Because the internal states of the environment are hidden from the agent, we wrap them in a closure that only exposes functions for interacting with the environment. Likewise, the agent's internal beliefs cannot be directly observed, and interaction is only allowed through the Markov blanket of the agent (i.e., the sensors and actuators).

function create_envir(; 𝚃, s̃₀)
    s̃ₜ₋₁ = s̃₀
    s̃ₜ = s̃ₜ₋₁
    yₜ = 𝚃(s̃ₜ)[1] + sqrt(_ϑ)*randn() ##Report noisy temperature at current position; maybe this is repeated to keep place in the records
    execute = (aₜ::Float64) -> begin
        s̃ₜ = s̃ₜ₋₁ + [Vᵃ(aₜ)] ##Compute next state #.
        yₜ = 𝚃(s̃ₜ)[1] + sqrt(_ϑ)*randn() ##Report noisy temperature at current position; maybe this is repeated to keep place in the records
        s̃ₜ₋₁ = s̃ₜ ##Reset state
    end

    observe = () -> begin 
        return [yₜ]
    end

    return (execute, observe)
end
create_envir (generic function with 1 method)
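
As a quick usage check (the naive simulation below runs the same pattern in a loop), the closure can be exercised as follows; execute_demo and observe_demo are just illustrative names, and the action value 0.5 is arbitrary:

(execute_demo, observe_demo) = create_envir(𝚃=𝚃, s̃₀=_s̃₀)
execute_demo(0.5) ## apply a velocity action; the hidden position is updated
observe_demo()    ## returns the noisy temperature at the new position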

4.4 Uncertainty Model

The only uncertainty we have for the environment is the noise associated with an observation as captured in the observation function above: \[ y_t \sim \cal N(\cal T(\tilde{s}_t), \vartheta) \]

4.5 Agent / Generative Model

The agent consists of:

  • A free energy functional \(F[q] = \mathbb E_q \left[\mathrm{log} \frac{q(z)}{p(x,z)}\right]\) where
    • \(p(x, z) = \Pi_k p(x_k,z_k \mid z_{k-1})\) is a generative model with:
      • observations \(\{x_k\}\)
      • latent variables \(\{z_k\} = \{ \{s_k\}, \{\theta_k\}, \{u_k\} \}\)
      • \(k\) is a time index
    • \(q(z)\) is a recognition model
  • A procedure to minimize the free energy \(F[q]\)

4.5.1 State variables

According to the agent, the state of the system-under-steer/environment/generative process will be \(s_t\), rather than \(\tilde{s}_t\), which is the distance of the cart from the heat source.

4.5.2 Decision variables

According to the agent the action on the environment at time \(t\) will be represented by \(u_t\), also known as the control state of the agent.

4.5.3 Implementation of the Agent / Generative Model / Internal Model

We start by specifying a probabilistic model for the agent that describes the agent's internal beliefs over the external dynamics of the environment. Assuming the current time is \(t\) (with \(t=1\) initially), the generative model is defined as follows:

\[\begin{aligned} p'(x,s,\theta_A,\theta_B,u) = \underbrace{p(s_{t-1})}_{\substack{ \text{Initial} \\ \text{state} \\ \text{prior}}} \underbrace{p(\theta_A)}_{\substack{ \text{Parameter} \\ \theta_{A} \\ \text{prior}}} \underbrace{p(\theta_B)}_{\substack{ \text{Parameter} \\ \theta_{B} \\ \text{prior}}} \prod_{k=t}^{t+T} \underbrace{p(x_k \mid s_k,\theta_A)}_{\substack{ \text{Observation} \\ \text{model}}} \, \underbrace{p(s_k \mid s_{k-1},\theta_B,u_k)}_{\substack{\text{Transition} \\ \text{model}}} \, \underbrace{p(u_k)}_{\substack{ \text{Control} \\ \text{prior}}} \nonumber \end{aligned}\]

The generative model includes future time steps.

Omitting parameters \(\theta_A\) and \(\theta_B\), the generative model simplifies to:

\[\begin{aligned} p'(x,s,u) = \underbrace{p(s_{t-1})}_{\substack{ \text{Initial} \\ \text{state} \\ \text{prior}}} \prod_{k=t}^{t+T} \underbrace{p(x_k \mid s_k)}_{\substack{ \text{Observation} \\ \text{model}}} \, \underbrace{p(s_k \mid s_{k-1},u_k)}_{\substack{\text{Transition} \\ \text{model}}} \, \underbrace{p(u_k)}_{\substack{ \text{Control} \\ \text{prior}}} \nonumber \end{aligned}\]

To infer goal-driven (i.e. purposeful) behavior, we add prior beliefs \(p^+(x)\) about desired future observations. This leads to an extended agent model:

\[\begin{aligned} p(x,s,u) &= \frac{p'(x,s,u)\, p^+(x)}{\int_x p'(x,s,u)\, p^+(x)\, dx} \\ &\propto \underbrace{p(s_{t-1})\prod_{k=t}^{t+T} p(x_k \mid s_k)\, p(s_k \mid s_{k-1},u_k)\, p(u_k)}_{\text{original generative model}} \, \underbrace{p^+(x_k)}_{\substack{ \text{extension} \\ \text{Goal} \\ \text{prior}}} \\ &\propto \underbrace{p(s_{t-1})}_{\substack{ \text{Initial} \\ \text{state} \\ \text{prior}}} \prod_{k=t}^{t+T} \underbrace{p(x_k \mid s_k)}_{\substack{ \text{Observation} \\ \text{model}}} \, \underbrace{p(s_k \mid s_{k-1},u_k)}_{\substack{\text{Transition} \\ \text{model}}} \, \underbrace{p(u_k)}_{\substack{ \text{Control} \\ \text{prior}}} \, \underbrace{p^+(x_k)}_{\substack{ \text{Goal} \\ \text{prior}}} \nonumber \end{aligned}\]

The factors are defined as:

observation:

\[ \begin{align} p(x_k \mid s_k) &= \mathcal{N}(x_k \mid -s_k, \vartheta) \\ &= \mathcal{N}(x_k \mid -s_k, 10^{-4}) \end{align} \] where \(x_k\) denotes observations of the agent after interacting with the environment. Note that this observation model is deliberately hampered: instead of the actual temperature profile, it assumes that the observed temperature decreases linearly with position.

state transition:

\[ \begin{aligned} p(s_k \mid s_{k-1},u_k) &= \mathcal{N}(s_k \mid s_{k-1} + u_k, ϑ) \\ p(s_{t-1}) &= \mathcal{N}(s_{t-1} \mid m_{t-1},\,V_{t-1}) \end{aligned} \]

The current state is a linear function of the previous state and action. We have endowed the agent with an accurate model of the system dynamics.

control:

\[ \begin{aligned} p(u_t) &= \prod_{k=t}^{t+T} \mathcal{N}(u_k \mid 0,\,\Xi) \\ &= \prod_{k=t}^{t+T} \mathcal{N}(u_k \mid 0,\,ξ ⋅ \mathbf{I}) \\ &= \prod_{k=t}^{t+T} \mathcal{N}(u_k \mid 0,\,0.5) \\ &= \mathcal{N}(u_t \mid m_u,\,V_u)\\ \end{aligned} \]

This represents the control priors.

goal/target/preference:

\[ \begin{aligned} p^+(x_t) &= \prod_{k=t}^{t+(T-1)} \mathcal{N}(x_k \mid 0,\,\sigma^{huge}) \cdot \mathcal{N}(x_T \mid x_+, \, \Sigma)\\ &= \prod_{k=t}^{t+(T-1)} \mathcal{N}(x_k \mid 0,\,\sigma^{huge}) \cdot \mathcal{N}(x_T \mid x_+, \, \sigma ⋅ \mathbf I)\\ &= \prod_{k=t}^{t+(T-1)} \mathcal{N}(x_k \mid 0,\,10^{12}) \cdot \mathcal{N}(x_T \mid 4.0, \, 10^{-4} ⋅ \mathbf I)\\ &= \mathcal{N}(x_t \mid m_x,\,V_x) \\ \end{aligned} \]

This represents the target/goal priors and encodes a belief about a preferred temperature \(x_+ = 4.0\).

initial state:

Setting \(t=1\), \[ p(s_0) = \mathcal{N}(s_0 \mid 0, 10^{12}) \]

This means we set a vague prior for the initial state.
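
Before turning to the implementation, here is a minimal sketch of how these priors translate into concrete vectors of means and (co)variances; the names mirror those used in create_agent further down (with T=20, ξ=0.5, σ=1e-4, and x₊=4.0 as above), and this block is purely illustrative:

T  = 20                             ## lookahead horizon
Ξ  = fill(0.5, 1, 1)                ## control prior covariance ξ⋅I
mᵤ = [ [0.0] for k=1:T ]            ## control prior means, p(uₖ) = N(0, Ξ)
Vᵤ = [ Ξ for k=1:T ]
mₓ = [ zeros(1) for k=1:T ]         ## goal prior means: vague at every step ...
mₓ[end] = [4.0]                     ## ... except the final step, which prefers x₊
Vₓ = [ 1e12*diageye(1) for k=1:T ]  ## vague goal variances (σ^huge)
Vₓ[end] = 1e-4*diageye(1)           ## tight goal variance Σ at k = T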

4.5.3.1 Generative Model for the thermostat

The code in the next block defines the agent's internal beliefs over the external dynamics, i.e., its probabilistic model of the environment, reusing the temperature profile 𝚃 defined above as the nonlinear state-transition function. We use the @model macro from RxInfer to define the probabilistic model, together with a DeltaMeta/Unscented meta specification to approximate messages through this nonlinear function.

In the model specification we include the agent's beliefs over its future states (up to T steps ahead), in addition to its current state:

## @model function thermostat_model(; T, 𝚃)
@model function thermostat_model(mᵤ, Vᵤ, mₓ, Vₓ, mₛ₍ₜ₋₁₎, Vₛ₍ₜ₋₁₎, T, 𝚃)
    ## Transition function
    g = (sₜ₋₁::AbstractVector) -> begin
        sₜ = similar(sₜ₋₁) ## Next state
        sₜ = 𝚃(sₜ₋₁)
        return sₜ
    end
    
    Γ = _γ*diageye(1) ## Transition precision
    𝚯 = _ϑ*diageye(1)  ## Observation variance
    
    sₜ₋₁ ~ MvNormal(mean=mₛ₍ₜ₋₁₎, cov=Vₛ₍ₜ₋₁₎)
    sₖ₋₁ = sₜ₋₁
    
    local s
    
    for k in 1:T
        ## Control
        u[k]    ~ MvNormal(mean=mᵤ[k], cov=Vᵤ[k])
        hIuI[k] ~ MvNormal(mean=sₖ₋₁ + u[k], precision=Γ)
        
        ## State transition
        s[k] ~ g(hIuI[k]) where { meta=DeltaMeta(method=Unscented(alpha=1.9)) }
        
        ## Likelihood of future observations
        x[k] ~ MvNormal(mean=s[k], cov=𝚯)
        
        ## Target/Goal prior
        x[k] ~ MvNormal(mean=mₓ[k], cov=Vₓ[k])
        
        sₖ₋₁ = s[k]
    end
    return (s, )
end

Next, we define the agent and the time-stepping procedure.

function create_agent(; T=20, 𝚃, x₊, s₀, ξ=0.5, σ=1e-4)
    ## Set control priors
    Ξ  = fill(ξ, 1, 1) ##Control prior variance
    mᵤ = Vector{Float64}[ [0.0] for k=1:T ] ##Set control priors
    Vᵤ = Matrix{Float64}[ Ξ for k=1:T ]

    ## Set target/goal priors
    Σ       = σ*diageye(1) ##Target/Goal prior variance
    mₓ      = [zeros(1) for k=1:T] ##mean for x [vector]
    mₓ[end] = x₊ ##Set prior mean to reach goal at t=T
    Vₓ      = [huge*diageye(1) for k=1:T] ##Variance for x [matrix]
    Vₓ[end] = Σ ##Set prior variance to reach goal at t=T

    ## Set initial brain state prior
    mₛ₍ₜ₋₁₎ = s₀
    Vₛ₍ₜ₋₁₎ = tiny*diageye(1)
    ## Vₛ₍ₜ₋₁₎ = huge*diageye(1) ##in writeup
    
    ## Set current inference results
    result = nothing

    ## Bayesian inference by message passing
    ## The `infer` function is the heart of the agent
    ## It calls the `RxInfer.infer` function to perform Bayesian inference by message passing
    compute = (υₜ::Float64, ŷₜ::Vector{Float64}) -> begin ##.align with mountain car
        mᵤ[1] = [υₜ] ## Register action with the generative model
        Vᵤ[1] = fill(tiny, 1, 1) ## Clamp control prior to performed action

        mₓ[1] = ŷₜ ## Register observation with the generative model
        Vₓ[1] = tiny*diageye(1) ## Clamp target/goal prior to observation

        result = infer(
            model=thermostat_model(T=T, 𝚃=𝚃),
            data=Dict(
                :mᵤ     => mᵤ, 
                :Vᵤ     => Vᵤ, 
                :mₓ     => mₓ, 
                :Vₓ     => Vₓ,
                :mₛ₍ₜ₋₁₎ => mₛ₍ₜ₋₁₎,
                :Vₛ₍ₜ₋₁₎ => Vₛ₍ₜ₋₁₎))
    end
    
    ## The `act` function returns the inferred best possible action
    act = () -> begin
        if result !== nothing
            return mode(result.posteriors[:u][2])[1]
        else
            return 0.0 ## Without inference result we return some 'random' action
        end
    end
    
    ## The `future` function returns the inferred future states
    future = () -> begin 
        if result !== nothing 
            return getindex.(mode.(result.posteriors[:s]), 1)
        else
            return zeros(T)
        end
    end
    
    ## The `slide` function modifies the `(mₛ₍ₜ₋₁₎, Vₛ₍ₜ₋₁₎` for the next step
    ## and shifts (or slides) the array of future goals `(mₓ, Vₓ)` 
    ## and inferred actions `(mᵤ, Vᵤ)`
    slide = () -> begin
        model  = RxInfer.getmodel(result.model)
        (s, )  = RxInfer.getreturnval(model)
        varref = RxInfer.getvarref(model, s) 
        var    = RxInfer.getvariable(varref)
        
        slide_msg_idx = 3 ##This index is model dependent
        (mₛ₍ₜ₋₁₎, Vₛ₍ₜ₋₁₎) = mean_cov(getrecent(messageout(var[2], slide_msg_idx)))

        mᵤ = circshift(mᵤ, -1)
        mᵤ[end] = [0.0]
        Vᵤ = circshift(Vᵤ, -1)
        Vᵤ[end] = Ξ

        mₓ = circshift(mₓ, -1)
        mₓ[end] = x₊ ##x_target
        Vₓ = circshift(Vₓ, -1)
        Vₓ[end] = Σ
    end

    return (act, future,   compute, slide)
end
create_agent (generic function with 1 method)

4.6 Agent Evaluation

4.6.1 Evaluate with simulated data

4.6.1.1 Naive approach

In this simulation we apply a naive action policy: a constant positive velocity action at every step. With this open-loop policy the cart simply keeps moving away from the heat source, so the agent is not able to achieve its goal:

_Nⁿᵃⁱᵛᵉ  = 100 ## Total simulation time
_πⁿᵃⁱᵛᵉ = 0.5*ones(_Nⁿᵃⁱᵛᵉ) ## Naive policy: constant positive velocity action

(execute_naive, observe_naive) = create_envir(; ## Let there be a world
    𝚃=𝚃,
    s̃₀=_s̃₀
);

_yⁿᵃⁱᵛᵉ = Vector{Vector{Float64}}(undef, _Nⁿᵃⁱᵛᵉ)
for t = 1:_Nⁿᵃⁱᵛᵉ
    execute_naive(_πⁿᵃⁱᵛᵉ[t]) ## Execute environmental process
    _yⁿᵃⁱᵛᵉ[t] = observe_naive() ## Observe external states
end
_yⁿᵃⁱᵛᵉ
100-element Vector{Vector{Float64}}:
 [16.71681084161624]
 [14.162811043613832]
 [12.130010375269212]
 [10.477036308444694]
 [9.129242440313945]
 [8.014134634991786]
 [7.11502449829092]
 [6.335537417093641]
 [5.668086254476602]
 [5.09459637411509]
 ⋮
 [0.18373948326761114]
 [0.18838046098425149]
 [0.18520102232614685]
 [0.17734073536314304]
 [0.16833059836318884]
 [0.17676855961853477]
 [0.16856284746341046]
 [0.12573434370159794]
 [0.1461178658812947]
_pa=plot(
    map(x -> x[1], _πⁿᵃⁱᵛᵉ), label="actions", 
    xlabel="t", ylabel="velocity",
    )
_py=plot(
    map(x -> x[1], _yⁿᵃⁱᵛᵉ), label="observations", 
    color="red", xlabel="t", ylabel="temperature", 
    )
_py=plot!(
    [0, _Nⁿᵃⁱᵛᵉ], [_x₊[1], _x₊[1]],
    label="target/goal", color="green", xlabel="t", ylabel="temperature")
plot(_pa, _py, layout=@layout([ a; b ]))
4.6.1.2 Active inference approach

In the active inference approach we create an agent that models the environment around itself, as well as the best possible actions, in a probabilistic manner. This should allow the agent to realise that the naive constant-velocity policy is not effective and to infer velocity actions that steer the cart to the position where the observed temperature matches the target.

### Simulation parameters
## Total simulation time
_Nᵃⁱ = 100

## Lookahead time horizon
_Tᵃⁱ = 20

## Initial position
_s₀ = [2.0]

## Control prior variance value
# _Εᵛᵃˡ = .5
_ξ = 0.5

## Target prior variance value
_σ = 1e-4

## Target/Goal temperature
_x₊ = [4.0]
1-element Vector{Float64}:
 4.0
# ## OVERRIDES
## Total simulation time
# # _Nᵃⁱ = 50 #100 
# # _Nᵃⁱ = 100
# _Nᵃⁱ = 200
# # _Nᵃⁱ = 500
# # _Nᵃⁱ = 1000

## Lookahead time horizon
# # _Tᵃⁱ = 20#200 #100 #50 #20 
# _Tᵃⁱ = 50
# # _Tᵃⁱ = 100

# ## Initial position
# # _s₀ = [2.0]#[2.0]
# # _s₀ = [1.0]

# ## Control prior variance value
# _ξ = 0.01
# _ξ = 0.1
# _ξ = 1.0
# _ξ = 10.0
# _ξ = 100.0
# _ξ = 1000.0
# _ξ = 10000.0
# _ξ = 1e12

## Target prior variance value
# _σ = 1e-4

# ## Target/Goal temperature
# # _x₊ = [25.0]#[5.0]#[4.0] #[25.0]#[4.0]
# # _x₊ = [50.0]
# # _x₊ = [20.0]
# _x₊ = [15.0]
(execute_ai, observe_ai) = create_envir(;
    𝚃=𝚃, #.
    s̃₀=_s₀
)
(act_ai, future_ai,   compute_ai, slide_ai) = create_agent(;
    T =_Tᵃⁱ,
    𝚃=𝚃,
    x₊=_x₊,
    s₀=_s₀,
    ξ = _ξ,
    σ = _σ
) 

## Step through experimental protocol
_as = Vector{Float64}(undef, _Nᵃⁱ)         ## Actions
_fs = Vector{Vector{Float64}}(undef, _Nᵃⁱ) ## Predicted future
_ys = Vector{Vector{Float64}}(undef, _Nᵃⁱ) ## Observations #.
for t = 1:_Nᵃⁱ
    ## 1. Act-Execute-Observe: #.execute() & observe() from create_envir() 
    _as[t] = act_ai()            ## Invoke an action from the agent
    _fs[t] = future_ai()         ## Fetch the predicted future states
             execute_ai(_as[t]) ## The action influences hidden external states
    _ys[t] = observe_ai() ## Observe the current environmental outcome (update p) #.
    ## 2. Infer:    
            compute_ai(_as[t], _ys[t]) ## Infer beliefs from current model state (update q)
    ## 3. Slide:
             slide_ai() ## Prepare for next iteration
end
_as
100-element Vector{Float64}:
  0.0
 -6.343207000354588e-12
 -6.300452608128645e-12
 -6.325567301730797e-12
 -6.32821323632722e-12
 -6.2999967981599965e-12
 -6.32114312180386e-12
 -6.301798495771919e-12
 -6.3000011465939514e-12
 -6.328374396370409e-12
  ⋮
 -0.003942522330673979
 -0.01943646046556582
  0.0038905952643777204
  0.02400063974579091
 -0.00826943373310313
 -0.010698347037095936
  0.005816027556693469
  0.013641346868345587
 -0.0053588741909303935
_fs
100-element Vector{Vector{Float64}}:
 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
 [19.98732212295191, 0.2506316816377108, 78.2434752339421, 0.024188291168147837, 82.12774910911314, 0.023343250705071573, 82.12968241977089, 0.023344835490577497, 82.1296742163674, 0.02334484547066802, 82.12967417433826, 0.02334484550648787, 82.12967385800283, 0.02334484585602179, 82.12902950374514, 0.023345557931323712, 80.81592674266264, 0.024796666594149613, -2595.101616631002, 2.9819507015123]
 [20.012033919899157, 0.2500123051075404, 78.26164934260241, 0.024183478944838355, 82.12776174705, 0.023343255901790226, 82.12968238984718, 0.023344835531708123, 82.12967421619298, 0.023344845470820063, 82.12967417448836, 0.023344845506321828, 82.12967416511802, 0.023344845516629097, 82.12965536098811, 0.02334486629705572, 82.09133520704768, 0.023387213864201162, 4.000286103257123, 4.000077822209044]
 [19.996940396244486, 0.25039033624168633, 78.25056091547398, 0.024186414040750052, 82.12775404036796, 0.023343252728926473, 82.1296824081125, 0.02334483550660844, 82.12967421614421, 0.0233448454708988, 82.12967385812027, 0.023344845855939968, 82.12902945105341, 0.023345557989553442, 80.81581936353287, 0.024796785258727912, -2595.3204400968066, 2.982192523124645, 4.000301064749503, 4.000077822472365]
 [19.995717590896437, 0.2504210000770538, 78.24966093480106, 0.02418665239557995, 82.12775341431059, 0.023343252471713243, 82.12968240959393, 0.02334483550457186, 82.12967421630357, 0.0233448454707247, 82.12967416526138, 0.023344845516518695, 82.12965536098852, 0.02334486629705542, 82.09133520703394, 0.02338721386421635, 4.000286075247082, 4.000339273281338, 4.000339302230682, 4.000077823145877]
 [20.010872666050012, 0.25004135935223804, 78.2607975653654, 0.02418370430375841, 82.12776115549383, 0.023343255657811535, 82.12968239109593, 0.02334483552995106, 82.12967389982823, 0.02334484582043638, 82.12902945119639, 0.023345557989443336, 80.8158193635329, 0.024796785258727985, -2595.3204400975956, 2.982192523125517, 4.000301036739258, 4.000339273544424, 4.000339302230687, 4.000077823145877]
 [19.998324719865813, 0.2503556288722998, 78.25157947438923, 0.024186144304839383, 82.1277547488123, 0.023343253020085154, 82.12968240643134, 0.023344835508918315, 82.12967420705772, 0.023344845480937956, 82.12965536113182, 0.023344866296945047, 82.09133520703435, 0.023387213864216043, 4.0002860752470815, 4.0003392732843395, 4.000339274220319, 4.000339274217335, 4.0003393022306994, 4.000077823145877]
 [20.013175547842916, 0.24998374683549687, 78.26248650755218, 0.024183257469094804, 82.12776232823045, 0.023343256141795478, 82.12968207209448, 0.023344835883226608, 82.12902949289074, 0.02334555795394937, 80.815819363689, 0.024796785258592614, -2595.3204400976274, 2.982192523125491, 4.000301036739258, 4.000339273547424, 4.000339274220324, 4.000339274217335, 4.0003393022306994, 4.000077823145877]
 [20.011193708763845, 0.2500333264469247, 78.26103307176228, 0.024183641992904688, 82.12776131905511, 0.023343255725269553, 82.12968238163107, 0.023344835540514905, 82.12965540283759, 0.023344866261443286, 82.09133520717766, 0.023387213864105205, 4.0002860752470815, 4.0003392732843395, 4.000339274220319, 4.0003392742203365, 4.000339274220337, 4.000339274217335, 4.0003393022306994, 4.000077823145877]
 [20.0076159566416, 0.25012286834120884, 78.25840758909462, 0.02418433672562668, 82.12775917901536, 0.02334325532290504, 82.12903767266343, 0.02334554800614642, 80.81581941001784, 0.024796785214489424, -2595.320440108609, 2.982192523115809, 4.000301036739257, 4.000339273547424, 4.000339274220323, 4.0003392742203365, 4.000339274220337, 4.000339274217335, 4.0003393022306994, 4.000077823145877]
 ⋮
 [5.053281084686124, 4.000260505097566, 4.000339272840953, 4.000339274220312, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.000339274220337, 4.000339274217335, 4.0003393022306994, 4.000077823145877]
 [5.070055050752981, 4.0002566621889155, 4.000339272773542, 4.000339274220312, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.000339274220337, 4.000339274217335, 4.0003393022306994, 4.000077823145877]
 [5.0447298649948635, 4.000262418251974, 4.000339272874507, 4.000339274220312, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.000339274220337, 4.000339274217335, 4.0003393022306994, 4.000077823145877]
 [5.022551490857471, 4.000267238128127, 4.000339272959048, 4.0003392742203125, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.000339274220337, 4.000339274217335, 4.0003393022306994, 4.000077823145877]
 [5.05798405647362, 4.0002594397405655, 4.000339272822264, 4.0003392742203125, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.000339274220337, 4.000339274217335, 4.0003393022306994, 4.000077823145877]
 [5.060617784836198, 4.000258839039103, 4.000339272811728, 4.0003392742203125, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.000339274220337, 4.000339274217335, 4.0003393022306994, 4.000077823145877]
 [5.042620472987896, 4.000262885456829, 4.000339272882704, 4.0003392742203125, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.000339274220337, 4.000339274217335, 4.0003393022306994, 4.000077823145877]
 [5.034017055851266, 4.000264771803537, 4.000339272915792, 4.0003392742203125, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.000339274220337, 4.000339274217335, 4.0003393022306994, 4.000077823145877]
 [5.054822132291112, 4.000260157037838, 4.000339272834847, 4.0003392742203125, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.0003392742203365, 4.000339274220337, 4.000339274217335, 4.0003393022306994, 4.000077823145877]
_ys
100-element Vector{Vector{Float64}}:
 [19.987269174689366]
 [20.003012515936202]
 [19.987828421880508]
 [19.986659749224486]
 [20.001821062932194]
 [19.98921710129548]
 [20.00411487425469]
 [20.00207937166385]
 [19.99850832336812]
 [19.999713285209054]
 ⋮
 [5.034437623873382]
 [5.00696127942573]
 [4.985408793409926]
 [5.023876107109116]
 [5.024288234007355]
 [5.005348328839933]
 [4.997513181834071]
 [5.019801749717599]
 [5.011007211431238]
## _pa = plot(map(x -> x[1], _πⁿᵃⁱᵛᵉ), label="action", xlabel="t", ylabel="force")
## _py = plot(map(x -> x[1], _yⁿᵃⁱᵛᵉ), label="obsevation", color="red", xlabel="t", ylabel="temperature")
## _py = plot!([0,100], [_x₊[1], _x₊[1]], label="goal", color="green", xlabel="t", ylabel="temperature")
## plot(_pa, _py, layout=@layout([ a; b ]))
_pa = plot(
    0.5*tanh.(_as), label="actions", xlabel="t", ylabel="velocity", 
    )
_py = plot(
    map(x -> x[1], _ys), label="observations",
    color="red", xlabel="t", ylabel="temperature", 
    )
_py = plot!(
    [0, _Nᵃⁱ], [_x₊[1], _x₊[1]], 
    label="target/goal", color="green", xlabel="t", ylabel="temperature")
plot(_pa, _py, layout=@layout([ a; b ]))
_s₀, 𝚃(_s₀), _x₊
([2.0], [20.0], [4.0])

Because \(s_0=[2.0]\), the temperature starts at \(\cal T([2.0]) = 20.0\). However, the target temperature is \(x_+=[4.0]\). The agent then issues actions in the form of velocity adjustments until the target temperature is reached.
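
As a sanity check, the target temperature can be inverted through the (hidden) temperature profile: \(\cal T(\tilde{s}) = 4\) implies \(\tilde{s}^2 + 1 = 100/4 = 25\), i.e. an optimal distance of \(\tilde{s} = \sqrt{24} \approx 4.9\) from the heat source.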