Sales Forecasting with Time-Varying Autoregressive Models (Part 2)

Learning the parameters of an autoregressive process

Retail Industry
Bayesian Inference
Autoregressive (AR) Models
Active Inference
RxInfer
Julia
Author

Kobus Esterhuysen

Published

June 14, 2024

Modified

October 12, 2024

Forecasting sales volume and sales revenue is an essential part of the operations of a business. In Part 1, we analyzed a paper to understand some principles supporting the end goal:

Podusenko, A., Kouw, W. M., & de Vries, B. Message Passing-Based Inference for Time-Varying Autoregressive Models

We want to make use of a bayesian reactive message passing approach, and have an eventual implementation in the RxInfer framework. In Part 2 we implemented a rough foundation by simulating 1,000 sales data points as a standardized normal distribution, making use of a 5th order autoregression process. We then used RxInfer to learn the order as well as the parameters \(\boldsymbol\theta\) of the process. The experiment picked the order to be 4. The inference process also learned the value of the transition precision parameter \(\gamma\) as well as the \(s_t\) state component.

versioninfo() ## Julia version
Julia Version 1.10.4
Commit 48d4fd48430 (2024-06-04 10:41 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × Intel(R) Core(TM) i7-8700B CPU @ 3.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
Threads: 1 default, 0 interactive, 1 GC (on 12 virtual cores)
Environment:
  JULIA_NUM_THREADS = 
import Pkg
Pkg.add(Pkg.PackageSpec(;name="RxInfer", version="3.6.0"))
Pkg.add(Pkg.PackageSpec(;name="Distributions", version="0.25.108"))
Pkg.add(Pkg.PackageSpec(;name="Random"))
Pkg.add(Pkg.PackageSpec(;name="Plots", version="1.40.1"))
Pkg.add(Pkg.PackageSpec(;name="LaTeXStrings", version="1.3.1"))
Pkg.add(Pkg.PackageSpec(;name="BenchmarkTools", version="1.5.0"))
Pkg.add(Pkg.PackageSpec(;name="Parameters", version="0.12.3"))

using RxInfer, Distributions, LinearAlgebra, Random, Plots, LaTeXStrings, BenchmarkTools, Parameters
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.10/Project.toml`
  No Changes to `~/.julia/environments/v1.10/Manifest.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.10/Project.toml`
  No Changes to `~/.julia/environments/v1.10/Manifest.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.10/Project.toml`
  No Changes to `~/.julia/environments/v1.10/Manifest.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.10/Project.toml`
  No Changes to `~/.julia/environments/v1.10/Manifest.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.10/Project.toml`
  No Changes to `~/.julia/environments/v1.10/Manifest.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.10/Project.toml`
  No Changes to `~/.julia/environments/v1.10/Manifest.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.10/Project.toml`
  No Changes to `~/.julia/environments/v1.10/Manifest.toml`
Pkg.status()
Status `~/.julia/environments/v1.10/Project.toml`
  [6e4b80f9] BenchmarkTools v1.5.0
⌃ [31c24e10] Distributions v0.25.108
  [b964fa9f] LaTeXStrings v1.3.1
  [d96e819e] Parameters v0.12.3
⌃ [91a5bcdd] Plots v1.40.1
  [86711068] RxInfer v3.6.0
  [9a3f8284] Random
Info Packages marked with ⌃ have new versions available and may be upgradable.
## macro h(x) ## for help
#     quote
#         display("text/markdown", @doc $x)
#     end    
# end
# @h Normal() ## help on Normal()

Our model for the Variational Bayesian Inference of a latent autoregressive model can be represented mathematically as follows:

\[\begin{aligned} p(\gamma) &= \mathcal{Gam}(\gamma|a, b)\\ p(\boldsymbol{\theta}) &= \mathcal{N}(\boldsymbol{\theta}|\mathbf{m}_{\theta}, \mathbf{V}_{\theta})\\ p(\mathbf{s}_t \mid \mathbf{s}_{t-1}, \boldsymbol{\theta}) &= \mathcal{N}\left(\mathbf{s}_t \mid \mathbf{B}\left(\boldsymbol{\theta}\right) \mathbf{s}_{t-1},\, \mathbf{V}(\gamma)\right)\\ p(x_t \mid \mathbf{s}_t) &= \mathcal{N}(x_t \mid \mathbf{s}_t, \tau^{-1}) \end{aligned}\]

where

To generate some synthetic sales data, we use a \(M^{AR}=5\) order autoregressive (AR) process with the following coeffcients (corresponding to stable poles):

_θ⁵ = [0.10699399235785655, -0.5237303489793305, 0.3068897071844715, -0.17232255282458891, 0.13323964347539288];
function generate_sales_data(
rng, ##. random number generator
N, ##. number of data points 
θ̃, ##. AR parameters
γ̃, ##. transition precision
τ̃) ##. observation precision
    Mᴬᴿ = length(θ̃) ##. order of process (use Mᴹᴬ for MA process)
= Vector{Vector{Float64}}(undef, N + 3Mᴬᴿ)
    y = Vector{Float64}(undef, N + 3Mᴬᴿ)
    γ̃_std = sqrt(inv(γ̃)) ##. transition standard dev
    τ̃_std = sqrt(inv(τ̃)) ##. observation standard dev
    s̃[1] = randn(rng, Mᴬᴿ) ##. Mᴬᴿ random numbers

    ## +++ Matrix implementation
    ## Build BIθₜI, the transition matrix
    Iₘ₋₁ = Diagonal(ones(Mᴬᴿ - 1))
    V0 = Vector(zeros(Mᴬᴿ - 1)) ##vector of zeros
    Bᵇᵒᵗᵗᵒᵐ = hcat(Iₘ₋₁, V0)
    θₜ = θ̃
    BIθₜI = vcat(transpose(θₜ), Bᵇᵒᵗᵗᵒᵐ)
    ## Build VIγI, the transition covariance matrix
    vᵤ = Vector(vcat(1.0, zeros(Mᴬᴿ - 1)))
    VIγI = (1/γ̃)*vᵤ*transpose(vᵤ)
    for i in 2:Mᴬᴿ
        # VIγI[i, i] = tiny ##tiny values on remaining off-diagonals to avoid PosDefException
        VIγI[i, i] = 1e-12*tiny ##tiny values on remaining off-diagonals to avoid PosDefException
    end
    ## --- 
    for t in 2:(N + 3Mᴬᴿ) #.
        ## s̃[t] = vcat(
        ##     rand( ##. noised current value is pushed in at the top of state vector
        ##         rng, 
        ##         Normal(dot(θ̃, s̃[t-1]), γ̃_std)
        ##     ), 
        ##     s̃[t-1][1:end-1] ##. rest shifted down, last element pushed out & lost
        ## ) #.
        s̃[t] = rand(rng, MvNormal(BIθₜI*s̃[t-1], VIγI)) ##. Eq (2b)

        ## y[t] = rand(rng, Normal(s̃[t][1], τ̃_std)) #.
        y[t] = rand(rng, Normal(transpose(vᵤ)*s̃[t], τ̃_std)) ##. Eq (2c)
    end
    
    return s̃[1+3Mᴬᴿ:end], y[1+3Mᴬᴿ:end] #.
  end
generate_sales_data (generic function with 1 method)
## Seed for reproducibility
_seed = 123
_rng  = MersenneTwister(_seed)

## Number of observations in synthetic dataset
_N = 1000

## AR process parameters
_γ̃ = 1.0 ##. real γ (transition precision)
_τ̃ = 0.5 ##. real τ (observation precision)
_θ̃ = _θ⁵ ##. real θ (AR parameters)

_s̃s, _ys = generate_sales_data( #.
    _rng, _N, _θ̃, _γ̃, _τ̃); #.

Let’s plot our synthetic dataset:

plot(
    first.(_s̃s), 
    title="Simulated standardized sales data",
    xlabel=L"$\mathrm{time}, t$",
    color="red",
    label=L"$\mathrm{Hidden\ state\ component}, \tilde{s}_t$",
) #.
scatter!(_ys, label=L"$\mathrm{Observation}, y_t$", color="orange")

Now we specify a probabilistic model, the inference constraints and then run an inference procedure with RxInfer. We need two different models for:

@model function lar_model(
x, ##. data/observations 
𝚃ᴬᴿ, ##. Uni/Multi variate 
Mᴬᴿ, ##. AR order
vᵤ, ##. unit vector 
τ) ##. observation precision     
    ## Priors
    γ  ~ Gamma= 1.0, β = 1.0) ##. for transition precision    
    if 𝚃ᴬᴿ === Multivariate
        θ  ~ MvNormal= zeros(Mᴬᴿ), Λ = diageye(Mᴬᴿ)) ##.kw μ,Λ only work inside macro
        s₀ ~ MvNormal= zeros(Mᴬᴿ), Λ = diageye(Mᴬᴿ)) ##.kw μ,Λ only work inside macro
    else ## Univariate
        θ  ~ Normal= 0.0, γ = 1.0)
        s₀ ~ Normal= 0.0, γ = 1.0)
    end
    sₜ₋₁ = s₀
    for t in eachindex(x)
        s[t] ~ AR(sₜ₋₁, θ, γ) #.Eq (2b)
        if 𝚃ᴬᴿ === Multivariate
            x[t] ~ Normal= dot(vᵤ, s[t]), γ = τ) #.Eq (2c)
        else
            x[t] ~ Normal= vᵤ*s[t], γ = τ) #.Eq (2c)
        end
        sₜ₋₁ = s[t]
    end
end
@constraints function ar_constraints() 
    q(s₀, s, θ, γ) = q(s₀, s)q(θ)q(γ)
end
ar_constraints (generic function with 1 method)
@meta function ar_meta(𝚃ᴬᴿ, Mᴬᴿ, stype)
    AR() -> ARMeta(𝚃ᴬᴿ, Mᴬᴿ, stype) #.
end
ar_meta (generic function with 1 method)
@initialization function ar_init(𝚃ᴬᴿ, Mᴬᴿ)
    q(γ) = GammaShapeRate(1.0, 1.0)
    if 𝚃ᴬᴿ === Multivariate    
        q(θ) = MvNormalMeanPrecision(zeros(Mᴬᴿ), diageye(Mᴬᴿ)) #.
    else ## Univariate
        q(θ) = NormalMeanPrecision(0.0, 1.0) #.
    end
end
ar_init (generic function with 1 method)
function run_inference(Nmodels)
    results = Vector{Any}(undef, Nmodels)
    for m = 1:Nmodels
        println("m = $m")
        if m > 1 𝚃ᴬᴿ = Multivariate elseif m === 1 𝚃ᴬᴿ = Univariate
        else error("m=$m which is not an allowed value.") end    
        results[m] = infer(
            model = lar_model(
                𝚃ᴬᴿ=𝚃ᴬᴿ, ##. Uni/Multi variate
                Mᴬᴿ=m, ##. AR order
                vᵤ=ReactiveMP.ar_unit(𝚃ᴬᴿ, m), ##. unit vector
                τ=_τ̃), ##. observation precision
            data = (x=_ys,), #.
            constraints   = ar_constraints(),
            meta          = ar_meta(𝚃ᴬᴿ, m, ARsafe()), #.
            options       = (limit_stack_depth = 100,),
            initialization = ar_init(𝚃ᴬᴿ, m),
            returnvars    = (s=KeepLast(), γ=KeepEach(), θ=KeepEach()),
            free_energy   = true,
            iterations    = 100,
            showprogress  = false
        )
    end
    return(results)
end       
run_inference (generic function with 1 method)

Now we run the inference for \(M^{AR}=1,2, ..., 5\):

_results = run_inference(5)
m = 1
m = 2
m = 3
m = 4
m = 5
5-element Vector{Any}:
 Inference results:
  Posteriors       | available for (γ, s, θ)
  Free Energy:     | Real[2132.4, 2126.68, 2121.58, 2116.54, 2111.68, 2107.12, 2102.98, 2099.35, 2096.27, 2093.75  …  2086.72, 2086.72, 2086.72, 2086.72, 2086.72, 2086.72, 2086.72, 2086.72, 2086.72, 2086.72]

 Inference results:
  Posteriors       | available for (γ, s, θ)
  Free Energy:     | Real[2156.82, 2148.12, 2140.94, 2132.99, 2124.42, 2115.29, 2106.56, 2098.44, 2091.98, 2086.83  …  2072.96, 2072.96, 2072.96, 2072.96, 2072.96, 2072.96, 2072.96, 2072.96, 2072.96, 2072.96]

 Inference results:
  Posteriors       | available for (γ, s, θ)
  Free Energy:     | Real[2175.76, 2165.66, 2159.78, 2152.86, 2145.01, 2136.08, 2126.37, 2116.5, 2107.17, 2099.68  …  2073.79, 2073.79, 2073.79, 2073.79, 2073.8, 2073.8, 2073.8, 2073.8, 2073.8, 2073.8]

 Inference results:
  Posteriors       | available for (γ, s, θ)
  Free Energy:     | Real[2191.74, 2179.03, 2174.48, 2169.21, 2162.94, 2155.59, 2146.99, 2137.06, 2126.13, 2115.39  …  2071.76, 2071.83, 2071.78, 2071.81, 2071.83, 2071.83, 2072.07, 2072.03, 2072.14, 2072.02]

 Inference results:
  Posteriors       | available for (γ, s, θ)
  Free Energy:     | Real[2206.37, 2189.7, 2186.03, 2181.77, 2176.9, 2171.0, 2164.37, 2156.11, 2146.78, 2136.25  …  2074.45, 2074.17, 2074.21, 2074.14, 2074.2, 2074.19, 2074.22, 2074.29, 2074.15, 2074.09]
function plot_free_energy(results)
    p = plot(title="Bethe Free Energy", legend=:topright, xlabel="iterations")
    for m = 1:length(results)
        p = plot!(p, results[m].free_energy, label="AR$m")
    end
    return p
end
plot_free_energy (generic function with 1 method)
plot_free_energy(_results)
function plot_converged_free_energy(results)
    values = [results[m].free_energy[end] for m in 1:length(results)]
    bar(
        1:length(results), 
        values, 
        legend=:none, 
        ylim=(2070, 2090), title="Converged Free Energy vs AR Order"
    )
    xlabel!("AR order")
    ylabel!("Minimum Free Energy")
end
plot_converged_free_energy (generic function with 1 method)
plot_converged_free_energy(_results)
values = [_results[m].free_energy[end] for m in 1:length(_results)]
5-element Vector{Float64}:
 2086.7159059436744
 2072.962026084404
 2073.7964038953724
 2072.021199658476
 2074.0921058809427
function plot_transition_precision(results)
    p = plot(title="Inferred transition precision", legend=:topright, xlabel="iterations")
    for m = 1:length(results)
        p = plot!(p, mean.(results[m].posteriors[:γ]), label="AR$m")
    end
    plot!(
        p, [_γ̃], seriestype=:hline, color="red", linestyle=:dash, 
        label=L"Real transition precision $\tilde{γ}$"
    )
    return p
end
plot_transition_precision (generic function with 1 method)
plot_transition_precision(_results)
best_order = 4
@unpack s, θ, γ = _results[best_order].posteriors
Dict{Symbol, Vector} with 3 entries:
  :γ => GammaShapeRate{Float64}[GammaShapeRate{Float64}(a=501.0, b=110.664), Ga…
  :s => MvNormalWeightedMeanPrecision{Float64, Vector{Float64}, Matrix{Float64}…
  :θ => MvNormalWeightedMeanPrecision{Float64, Vector{Float64}, Matrix{Float64}…
_p1 = plot(
    first.(_s̃s), ## first.() gives s̃ₜ, not the delayed s̃ₜ₋₁, s̃ₜ₋₂, ..., s̃ₜ₋ₘ₊₁
    title="AR, " * L"$M^{AR}=$" * "$best_order",
    xlabel=L"time $t$",
    color="red",
    label=L"Hidden state component $\tilde{s}_t$")
_p1 = scatter!(_p1, _ys, label=L"Observation $x_t=y_t$", color="orange")
_p1 = plot!(
    _p1, first.(mean.(s)), 
    ribbon=first.(std.(s)), 
    fillalpha=0.2,
    label=L"Inferred state component $s_t$",
    color="green",
    legend=:bottomright)

_p2 = plot(
    mean.(γ), ribbon=std.(γ), 
    fillalpha=0.2,
    label=L"Inferred transition precision $\gamma$", 
    color="green",
    legend=:topright,
    xlabel="iterations",
)
_p2 = plot!(
    [_γ̃], 
    seriestype=:hline, 
    color="red", #.
    label=L"Real transition precision $\tilde{γ}$"
)

_p3 = plot(_results[best_order].free_energy, label="Bethe Free Energy", xlabel="iterations")
plot(_p1, _p2, _p3, layout=@layout([ a; b c ]))

Let’s look at a smaller range of data for the sake of interest:

_subrange = 700 : 750
plot(
    _subrange, 
    first.(_s̃s)[_subrange], 
    title=L"Subrange of: AR, $M^{AR}=$"*"$best_order",
    xlabel=L"time $t$",
    color="red",
    label=L"Hidden state component $\tilde{s}_t$") #.
scatter!(_subrange, _ys[_subrange], label=L"Observation $x_t=y_t$", color="orange")
plot!(
    _subrange, 
    first.(mean.(s))[_subrange],
    ribbon=sqrt.(first.(var.(s)))[_subrange],
    fillalpha=0.2,
    label=L"Inferred state component $s_t$",
    color="green",
    legend=:bottomright)