Sales Forecasting with Time-Varying Autoregressive Models (Part 2)

Back to Portfolio of Projects | LearnableLoopAI.com | Blog |

Forecasting sales volume and sales revenue is an essential part of the operations of a business. In Part 1, we analyzed a paper to understand some principles supporting the end goal:

Podusenko, A., Kouw, W. M., & de Vries, B. Message Passing-Based Inference for Time-Varying Autoregressive Models

We want to make use of a bayesian reactive message passing approach, and have an eventual implementation in the RxInfer framework. In Part 2 we implemented a rough foundation by simulating 1,000 sales data points as a standardized normal distribution, making use of a 5th order autoregression process. We then used RxInfer to learn the order as well as the parameters $θ$ of the process. The experiment picked the order to be 4. The inference process also learned the value of the transition precision parameter $γ$ as well as the $s_{t}$ state component.

versioninfo() ## Julia version

Julia Version 1.10.4
Commit 48d4fd48430 (2024-06-04 10:41 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × Intel(R) Core(TM) i7-8700B CPU @ 3.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
Threads: 1 default, 0 interactive, 1 GC (on 12 virtual cores)
Environment:
  JULIA_NUM_THREADS =

import Pkg
Pkg.add(Pkg.PackageSpec(;name="RxInfer", version="3.6.0"))
Pkg.add(Pkg.PackageSpec(;name="Distributions", version="0.25.108"))
Pkg.add(Pkg.PackageSpec(;name="Random"))
Pkg.add(Pkg.PackageSpec(;name="Plots", version="1.40.1"))
Pkg.add(Pkg.PackageSpec(;name="LaTeXStrings", version="1.3.1"))
Pkg.add(Pkg.PackageSpec(;name="BenchmarkTools", version="1.5.0"))
Pkg.add(Pkg.PackageSpec(;name="Parameters", version="0.12.3"))

using RxInfer, Distributions, LinearAlgebra, Random, Plots, LaTeXStrings, BenchmarkTools, Parameters

   Resolving package versions...
  No Changes to `~/.julia/environments/v1.10/Project.toml`
  No Changes to `~/.julia/environments/v1.10/Manifest.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.10/Project.toml`
  No Changes to `~/.julia/environments/v1.10/Manifest.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.10/Project.toml`
  No Changes to `~/.julia/environments/v1.10/Manifest.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.10/Project.toml`
  No Changes to `~/.julia/environments/v1.10/Manifest.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.10/Project.toml`
  No Changes to `~/.julia/environments/v1.10/Manifest.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.10/Project.toml`
  No Changes to `~/.julia/environments/v1.10/Manifest.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.10/Project.toml`
  No Changes to `~/.julia/environments/v1.10/Manifest.toml`

Pkg.status()

Status `~/.julia/environments/v1.10/Project.toml`
  [6e4b80f9] BenchmarkTools v1.5.0
⌃ [31c24e10] Distributions v0.25.108
  [b964fa9f] LaTeXStrings v1.3.1
  [d96e819e] Parameters v0.12.3
⌃ [91a5bcdd] Plots v1.40.1
  [86711068] RxInfer v3.6.0
  [9a3f8284] Random
Info Packages marked with ⌃ have new versions available and may be upgradable.

## macro h(x) ## for help
#     quote
#         display("text/markdown", @doc $x)
#     end    
# end
# @h Normal() ## help on Normal()

Our model for the Variational Bayesian Inference of a latent autoregressive model can be represented mathematically as follows:

$\begin{aligned} p (γ) & = Gam (γ | a, b) \\ p (θ) & = N (θ | m_{θ}, V_{θ}) \\ p (s_{t} ∣ s_{t - 1}, θ) & = N (s_{t} ∣ B (θ) s_{t - 1}, V (γ)) \\ p (x_{t} ∣ s_{t}) & = N (x_{t} ∣ s_{t}, τ^{- 1}) \end{aligned}$

where

$M^{A R}$ is the order of the autoregression process
$θ = (θ_{1}, θ_{2}, . . ., θ_{M^{A R}})^{⊤}$ is a vector of transition coefficients
$s_{t - 1} = (s_{t - 1}, s_{t - 2}, . . ., s_{t - M^{A R}})^{⊤}$ is the previous state
$s_{t} = (s_{t}, s_{t - 1}, . . ., s_{t - M^{A R} + 1})^{⊤}$ is the current state
$γ$ is the transition precision
$τ$ is the observation precision
$v_{u} = (1, 0, . . ., 0)^{⊤}$ is an $M^{A R}$ -dimensional unit vector selecting the first component of the vector that follows
$V (γ) = (1 / γ) v_{u} v_{u}^{⊤}$ is the transition covariance
$B (θ) = [\begin{matrix} θ^{⊤} \\ I_{M^{A R} - 1} & 0 \end{matrix}]$ is the transition matrix

To generate some synthetic sales data, we use a $M^{A R} = 5$ order autoregressive (AR) process with the following coeffcients (corresponding to stable poles):

_θ⁵ = [0.10699399235785655, -0.5237303489793305, 0.3068897071844715, -0.17232255282458891, 0.13323964347539288];

function generate_sales_data(
rng, ##. random number generator
N, ##. number of data points 
θ̃, ##. AR parameters
γ̃, ##. transition precision
τ̃) ##. observation precision
    Mᴬᴿ = length(θ̃) ##. order of process (use Mᴹᴬ for MA process)
    s̃ = Vector{Vector{Float64}}(undef, N + 3Mᴬᴿ)
    y = Vector{Float64}(undef, N + 3Mᴬᴿ)
    γ̃_std = sqrt(inv(γ̃)) ##. transition standard dev
    τ̃_std = sqrt(inv(τ̃)) ##. observation standard dev
    s̃[1] = randn(rng, Mᴬᴿ) ##. Mᴬᴿ random numbers

    ## +++ Matrix implementation
    ## Build BIθₜI, the transition matrix
    Iₘ₋₁ = Diagonal(ones(Mᴬᴿ - 1))
    V0 = Vector(zeros(Mᴬᴿ - 1)) ##vector of zeros
    Bᵇᵒᵗᵗᵒᵐ = hcat(Iₘ₋₁, V0)
    θₜ = θ̃
    BIθₜI = vcat(transpose(θₜ), Bᵇᵒᵗᵗᵒᵐ)
    ## Build VIγI, the transition covariance matrix
    vᵤ = Vector(vcat(1.0, zeros(Mᴬᴿ - 1)))
    VIγI = (1/γ̃)*vᵤ*transpose(vᵤ)
    for i in 2:Mᴬᴿ
        # VIγI[i, i] = tiny ##tiny values on remaining off-diagonals to avoid PosDefException
        VIγI[i, i] = 1e-12*tiny ##tiny values on remaining off-diagonals to avoid PosDefException
    end
    ## --- 
    for t in 2:(N + 3Mᴬᴿ) #.
        ## s̃[t] = vcat(
        ##     rand( ##. noised current value is pushed in at the top of state vector
        ##         rng, 
        ##         Normal(dot(θ̃, s̃[t-1]), γ̃_std)
        ##     ), 
        ##     s̃[t-1][1:end-1] ##. rest shifted down, last element pushed out & lost
        ## ) #.
        s̃[t] = rand(rng, MvNormal(BIθₜI*s̃[t-1], VIγI)) ##. Eq (2b)

        ## y[t] = rand(rng, Normal(s̃[t][1], τ̃_std)) #.
        y[t] = rand(rng, Normal(transpose(vᵤ)*s̃[t], τ̃_std)) ##. Eq (2c)
    end
    
    return s̃[1+3Mᴬᴿ:end], y[1+3Mᴬᴿ:end] #.
  end

generate_sales_data (generic function with 1 method)

## Seed for reproducibility
_seed = 123
_rng  = MersenneTwister(_seed)

## Number of observations in synthetic dataset
_N = 1000

## AR process parameters
_γ̃ = 1.0 ##. real γ (transition precision)
_τ̃ = 0.5 ##. real τ (observation precision)
_θ̃ = _θ⁵ ##. real θ (AR parameters)

_s̃s, _ys = generate_sales_data( #.
    _rng, _N, _θ̃, _γ̃, _τ̃); #.

Let’s plot our synthetic dataset:

plot(
    first.(_s̃s), 
    title="Simulated standardized sales data",
    xlabel=L"$\mathrm{time}, t$",
    color="red",
    label=L"$\mathrm{Hidden\ state\ component}, \tilde{s}_t$",
) #.
scatter!(_ys, label=L"$\mathrm{Observation}, y_t$", color="orange")

Now we specify a probabilistic model, the inference constraints and then run an inference procedure with RxInfer. We need two different models for:

Multivariate AR, with order $M^{A R} > 1$
Univariate AR (reduces to a simple State-Space-Model), with order $M^{A R}$ = 1

@model function lar_model(
x, ##. data/observations 
𝚃ᴬᴿ, ##. Uni/Multi variate 
Mᴬᴿ, ##. AR order
vᵤ, ##. unit vector 
τ) ##. observation precision     
    ## Priors
    γ  ~ Gamma(α = 1.0, β = 1.0) ##. for transition precision    
    if 𝚃ᴬᴿ === Multivariate
        θ  ~ MvNormal(μ = zeros(Mᴬᴿ), Λ = diageye(Mᴬᴿ)) ##.kw μ,Λ only work inside macro
        s₀ ~ MvNormal(μ = zeros(Mᴬᴿ), Λ = diageye(Mᴬᴿ)) ##.kw μ,Λ only work inside macro
    else ## Univariate
        θ  ~ Normal(μ = 0.0, γ = 1.0)
        s₀ ~ Normal(μ = 0.0, γ = 1.0)
    end
    sₜ₋₁ = s₀
    for t in eachindex(x)
        s[t] ~ AR(sₜ₋₁, θ, γ) #.Eq (2b)
        if 𝚃ᴬᴿ === Multivariate
            x[t] ~ Normal(μ = dot(vᵤ, s[t]), γ = τ) #.Eq (2c)
        else
            x[t] ~ Normal(μ = vᵤ*s[t], γ = τ) #.Eq (2c)
        end
        sₜ₋₁ = s[t]
    end
end

@constraints function ar_constraints() 
    q(s₀, s, θ, γ) = q(s₀, s)q(θ)q(γ)
end

ar_constraints (generic function with 1 method)

@meta function ar_meta(𝚃ᴬᴿ, Mᴬᴿ, stype)
    AR() -> ARMeta(𝚃ᴬᴿ, Mᴬᴿ, stype) #.
end

ar_meta (generic function with 1 method)

@initialization function ar_init(𝚃ᴬᴿ, Mᴬᴿ)
    q(γ) = GammaShapeRate(1.0, 1.0)
    if 𝚃ᴬᴿ === Multivariate    
        q(θ) = MvNormalMeanPrecision(zeros(Mᴬᴿ), diageye(Mᴬᴿ)) #.
    else ## Univariate
        q(θ) = NormalMeanPrecision(0.0, 1.0) #.
    end
end

ar_init (generic function with 1 method)

function run_inference(Nmodels)
    results = Vector{Any}(undef, Nmodels)
    for m = 1:Nmodels
        println("m = $m")
        if m > 1 𝚃ᴬᴿ = Multivariate elseif m === 1 𝚃ᴬᴿ = Univariate
        else error("m=$m which is not an allowed value.") end    
        results[m] = infer(
            model = lar_model(
                𝚃ᴬᴿ=𝚃ᴬᴿ, ##. Uni/Multi variate
                Mᴬᴿ=m, ##. AR order
                vᵤ=ReactiveMP.ar_unit(𝚃ᴬᴿ, m), ##. unit vector
                τ=_τ̃), ##. observation precision
            data = (x=_ys,), #.
            constraints   = ar_constraints(),
            meta          = ar_meta(𝚃ᴬᴿ, m, ARsafe()), #.
            options       = (limit_stack_depth = 100,),
            initialization = ar_init(𝚃ᴬᴿ, m),
            returnvars    = (s=KeepLast(), γ=KeepEach(), θ=KeepEach()),
            free_energy   = true,
            iterations    = 100,
            showprogress  = false
        )
    end
    return(results)
end

run_inference (generic function with 1 method)

Now we run the inference for $M^{A R} = 1, 2, . . ., 5$ :

_results = run_inference(5)

m = 1
m = 2
m = 3
m = 4
m = 5

5-element Vector{Any}:
 Inference results:
  Posteriors       | available for (γ, s, θ)
  Free Energy:     | Real[2132.4, 2126.68, 2121.58, 2116.54, 2111.68, 2107.12, 2102.98, 2099.35, 2096.27, 2093.75  …  2086.72, 2086.72, 2086.72, 2086.72, 2086.72, 2086.72, 2086.72, 2086.72, 2086.72, 2086.72]

 Inference results:
  Posteriors       | available for (γ, s, θ)
  Free Energy:     | Real[2156.82, 2148.12, 2140.94, 2132.99, 2124.42, 2115.29, 2106.56, 2098.44, 2091.98, 2086.83  …  2072.96, 2072.96, 2072.96, 2072.96, 2072.96, 2072.96, 2072.96, 2072.96, 2072.96, 2072.96]

 Inference results:
  Posteriors       | available for (γ, s, θ)
  Free Energy:     | Real[2175.76, 2165.66, 2159.78, 2152.86, 2145.01, 2136.08, 2126.37, 2116.5, 2107.17, 2099.68  …  2073.79, 2073.79, 2073.79, 2073.79, 2073.8, 2073.8, 2073.8, 2073.8, 2073.8, 2073.8]

 Inference results:
  Posteriors       | available for (γ, s, θ)
  Free Energy:     | Real[2191.74, 2179.03, 2174.48, 2169.21, 2162.94, 2155.59, 2146.99, 2137.06, 2126.13, 2115.39  …  2071.76, 2071.83, 2071.78, 2071.81, 2071.83, 2071.83, 2072.07, 2072.03, 2072.14, 2072.02]

 Inference results:
  Posteriors       | available for (γ, s, θ)
  Free Energy:     | Real[2206.37, 2189.7, 2186.03, 2181.77, 2176.9, 2171.0, 2164.37, 2156.11, 2146.78, 2136.25  …  2074.45, 2074.17, 2074.21, 2074.14, 2074.2, 2074.19, 2074.22, 2074.29, 2074.15, 2074.09]

function plot_free_energy(results)
    p = plot(title="Bethe Free Energy", legend=:topright, xlabel="iterations")
    for m = 1:length(results)
        p = plot!(p, results[m].free_energy, label="AR$m")
    end
    return p
end

plot_free_energy (generic function with 1 method)

plot_free_energy(_results)

function plot_converged_free_energy(results)
    values = [results[m].free_energy[end] for m in 1:length(results)]
    bar(
        1:length(results), 
        values, 
        legend=:none, 
        ylim=(2070, 2090), title="Converged Free Energy vs AR Order"
    )
    xlabel!("AR order")
    ylabel!("Minimum Free Energy")
end

plot_converged_free_energy (generic function with 1 method)

plot_converged_free_energy(_results)

values = [_results[m].free_energy[end] for m in 1:length(_results)]

5-element Vector{Float64}:
 2086.7159059436744
 2072.962026084404
 2073.7964038953724
 2072.021199658476
 2074.0921058809427

function plot_transition_precision(results)
    p = plot(title="Inferred transition precision", legend=:topright, xlabel="iterations")
    for m = 1:length(results)
        p = plot!(p, mean.(results[m].posteriors[:γ]), label="AR$m")
    end
    plot!(
        p, [_γ̃], seriestype=:hline, color="red", linestyle=:dash, 
        label=L"Real transition precision $\tilde{γ}$"
    )
    return p
end

plot_transition_precision (generic function with 1 method)

plot_transition_precision(_results)

best_order = 4
@unpack s, θ, γ = _results[best_order].posteriors

Dict{Symbol, Vector} with 3 entries:
  :γ => GammaShapeRate{Float64}[GammaShapeRate{Float64}(a=501.0, b=110.664), Ga…
  :s => MvNormalWeightedMeanPrecision{Float64, Vector{Float64}, Matrix{Float64}…
  :θ => MvNormalWeightedMeanPrecision{Float64, Vector{Float64}, Matrix{Float64}…

_p1 = plot(
    first.(_s̃s), ## first.() gives s̃ₜ, not the delayed s̃ₜ₋₁, s̃ₜ₋₂, ..., s̃ₜ₋ₘ₊₁
    title="AR, " * L"$M^{AR}=$" * "$best_order",
    xlabel=L"time $t$",
    color="red",
    label=L"Hidden state component $\tilde{s}_t$")
_p1 = scatter!(_p1, _ys, label=L"Observation $x_t=y_t$", color="orange")
_p1 = plot!(
    _p1, first.(mean.(s)), 
    ribbon=first.(std.(s)), 
    fillalpha=0.2,
    label=L"Inferred state component $s_t$",
    color="green",
    legend=:bottomright)

_p2 = plot(
    mean.(γ), ribbon=std.(γ), 
    fillalpha=0.2,
    label=L"Inferred transition precision $\gamma$", 
    color="green",
    legend=:topright,
    xlabel="iterations",
)
_p2 = plot!(
    [_γ̃], 
    seriestype=:hline, 
    color="red", #.
    label=L"Real transition precision $\tilde{γ}$"
)

_p3 = plot(_results[best_order].free_energy, label="Bethe Free Energy", xlabel="iterations")
plot(_p1, _p2, _p3, layout=@layout([ a; b c ]))

Let’s look at a smaller range of data for the sake of interest:

_subrange = 700 : 750
plot(
    _subrange, 
    first.(_s̃s)[_subrange], 
    title=L"Subrange of: AR, $M^{AR}=$"*"$best_order",
    xlabel=L"time $t$",
    color="red",
    label=L"Hidden state component $\tilde{s}_t$") #.
scatter!(_subrange, _ys[_subrange], label=L"Observation $x_t=y_t$", color="orange")
plot!(
    _subrange, 
    first.(mean.(s))[_subrange],
    ribbon=sqrt.(first.(var.(s)))[_subrange],
    fillalpha=0.2,
    label=L"Inferred state component $s_t$",
    color="green",
    legend=:bottomright)

Finally, we look at how the learned AR coefficients converge:

let
    pθ = plot()
    θms = mean.(θ) ##. θ means
    θvs = var.(θ) ##. θ variances
    l = length(θms)
    edim(e) = (a) -> map(r -> r[e], a)
    for t in 1:length(first(θms)) #.
        tmp = latexstring("\$ \\theta_{$(t),t} \$")
        pθ = plot!(
            pθ, 
            θms |> edim(t),
            ribbon=θvs |> edim(t) .|> sqrt, 
            fillalpha=0.2,
            label="Inferred $tmp")
    end
    for t in 1:length(_θ̃)
        tmp = latexstring("\$ \\theta_{$(t),t} \$")
        pθ = plot!(
            pθ, 
            [ _θ̃[t] ],
            seriestype=:hline, 
            label="Real $tmp")
    end
    plot(
        pθ, 
        title=L"AR coefficients, $M^{AR}=$"*"$best_order",
        xlabel=L"time $t$",
        legend=:outertopright, 
        size=(800, 450)) #.
end