Multi-Agent Collective Intelligence with Active Inference (Part 1)
Considering a two-level structure to derive collective intelligence
Enterprise Industry
Bayesian Inference
Active Inference
RxInfer
Python
Author
Kobus Esterhuysen
Published
October 14, 2024
Modified
October 17, 2024
1 BUSINESS UNDERSTANDING
In this project the client offers an Enterprise Resource Planning (ERP) system that consists of a number of modules and runs as a SaaS (Software-as-a-Service). The client needs to assess the feasibility of adding a ‘Collective Intelligence’ module to the suite of modules. A subscriber to the ERP offering could identify a number of entities, for example business units, to operate as agents that learn from each other. One scenario concerns learning from the tensions inherent in the budgeting process. A business unit often asks for the maximum allocation possible even if this amount is not strictly necessary. If budgeting agents could learn from each other so that a globally more optimal partitioning of funds emerges, the collective intelligence of the budgeting agents could exceed the sum of the intelligences of the individual agents. Fewer monetary resources would be wasted, and each agent would only ask for the money it needs in the context of the needs of the company as a whole.
To get initial traction for this problem, it was decided to focus on the paper:
Kaufmann, R.; Gupta, P.; Taylor, J. An Active Inference Model of Collective Intelligence. Entropy 2021, 23, 830. https://doi.org/10.3390/e23070830
In the current part of the project, the contents of the paper as well as the Python code associated with it were adapted to make them clearer. Some of the changes were:
Changing the mathematical symbols
Adding some visualizations
Rearranging the contents to fit into the (modified) CRISP-DM project standard
If in any doubt, please consult the original paper and code.
Some of the graphics from the paper were included here.
In future parts of this project, an attempt will be made to migrate the Python code to Julia code, making use of the RxInfer package for inference.
## import pdb
## pdb.set_trace()
from collections import defaultdict
import numpy as np
import pandas as pd
from datetime import datetime
from matplotlib import pyplot as plt
pd.options.display.float_format = '{:20,.4f}'.format
import seaborn as sns
sns.set_style("whitegrid")
! python --version
Python 3.10.12
2 DATA UNDERSTANDING
There is no pre-existing data to be analyzed.
3 DATA PREPARATION
There is no pre-existing data to be prepared.
4 MODELING
4.1 Narrative
Please review the narrative in section 1.
4.2 Core Elements
This section attempts to answer three important questions:
What metrics are we going to track?
What decisions do we intend to make?
What are the sources of uncertainty?
For this problem, each agent will track:
its own position \(s^{\mathrm{Own}}\)
its partner’s position \(s^{\mathrm{Oth}}\)
Decisions will be in the form of agent-prescribed move actions.
The sources of uncertainty relating to the environment will be:
the noise associated with the strength of the chemical deposit in each cell
the noise associated with the location of the food
the observation (measurement) noise.
4.3 System-Under-Steer / Environment / Generative Process
The system-under-steer/environment/generative process is the body of an agent, which finds itself inside a one-dimensional circular environment of 60 positions that wraps around (period \(N = 60\)).
4.3.1 State variables
The state at time \(t\) of the system-under-steer (sustr), also referred to as the environment (envir), or the generative process (genpr) will be given by:
\[
\tilde{\mathbf{s}}_t = (\tilde{s}_t)
\] where
\(\tilde{s}_t \in \{0, \dots, N-1\}\) is the position of the agent in the circular environment with period \(N\). This variable is hidden and only indirectly available by means of an observation.
4.3.2 Decision variables
Decisions are in the form of actions: \[
\mathbf{a}_t = (a^{\mathrm{Own}}, a^{\mathrm{Oth}})
\]
where
\(a^{\mathrm{Own}} \in \{ -1, 0, 1 \}\) is the agent’s own action at time \(t\)
\(a^{\mathrm{Oth}} \in \{ -1, 0, 1 \}\) is the other’s (partner’s) action at time \(t\)
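Because each agent entertains both its own move and its partner’s move, the joint action space is the 3 × 3 Cartesian product of the per-agent actions. A minimal sketch (it reproduces the explicitly enumerated ACTIONS_SQUARED constant defined in the implementation below):

```python
from itertools import product

ACTIONS = (0, -1, 1)                                 ## stay, left, right
ACTIONS_SQUARED = tuple(product(ACTIONS, repeat=2))  ## all 9 (own, other) pairs
print(len(ACTIONS_SQUARED))                          ## 9
```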
4.3.3 Exogenous information variables
The FOOD_POSITION is exogenous.
4.3.4 Transition and Observation functions
The transition function captures the dynamics of the environment/system-under-steer/generative process: only the agent’s own action is physically realized, shifting its position one cell to the left or right with wrap-around, as sketched below.
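As a minimal sketch of this transition (assuming the `ENV_SIZE` constant from the implementation below; the helper name `transition` is hypothetical), the position update is just a modulo-wrap of the agent’s own action, mirroring `stil = (self.stil + a[0]) % ENV_SIZE` in `Agent.step`:

```python
ENV_SIZE = 60  ## period N of the circular environment

def transition(s_til, a_own):
    ## next position wraps around the 1-D ring
    return (s_til + a_own) % ENV_SIZE

print(transition(59, 1))   ## 0  (wraps forward)
print(transition(0, -1))   ## 59 (wraps backward)
```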
The observation function can be represented by: \[
\mathbf{y}_t = (y^{\mathrm{Own}}, \Delta, a^{\mathrm{pp}})
\]
where
\(y^{\mathrm{Own}} \in \{ 0, 1 \}\) is the one-bit sensory input from the environment at time \(t\)
\(\Delta \in \{0, \dots, N-1\}\) is the perceived difference between the agent’s own position and its partner’s at time \(t\)
\(a^{\mathrm{pp}}\) is the partner’s last action at time \(t\)
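A minimal sketch of how such a one-bit observation could be sampled, assuming a fixed `food_position` and the decay rate used in the code below (`SENSE_DECAY_RATE`). The helper names here are hypothetical; the actual `Agent` class precomputes a similar exponential decay curve, centred on the middle of the environment, as the `sensory_dynamics` array:

```python
import numpy as np

ENV_SIZE = 60
OMEGA = np.log(4) / ENV_SIZE           ## sense-decay rate (SENSE_DECAY_RATE below)

def circular_distance(i, j, n=ENV_SIZE):
    ## shortest way around the ring between cells i and j
    d = abs(i - j) % n
    return min(d, n - d)

def sense_probability(position, food_position, max_sense_prob=0.99):
    ## probability of observing y = 1 decays with distance from the food
    return max_sense_prob * np.exp(-OMEGA * circular_distance(position, food_position))

def observe(position, food_position, rng=np.random.default_rng()):
    ## one-bit sensory sample y in {0, 1}
    return int(rng.random() < sense_probability(position, food_position))
```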
4.3.5 Objective function
The objective function is such that the free energy is minimized. This aspect will eventually be handled by the RxInfer Julia package.
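In the implementation below this is expressed, roughly (following the comment above the `fe_gradient` method of the `Agent` class), as a sum of two Kullback-Leibler divergences, one per belief component, where \([\pm\Delta]\) denotes a circular shift by the perceived offset \(\Delta\):
\[
F \approx \mathrm{KL}\!\left(q^{\mathrm{Own}} \,\big\|\, p^{\mathrm{Own}}\, q^{\mathrm{Oth}}_{[-\Delta]}\right)
+ \mathrm{KL}\!\left(q^{\mathrm{Oth}} \,\big\|\, p^{\mathrm{Oth}}\, q^{\mathrm{Own}}_{[+\Delta]}\right)
\]
For action selection the desired densities \(q^{\mathrm{Own}}_+\) and \(q^{\mathrm{Oth}}_+\) take the place of the current beliefs, which is what `free_energy_own` and `free_energy_oth` evaluate for each candidate joint action.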
4.5 Agent / Generative Model
4.5.1 State variables
According to the agent, the state of the system-under-steer/environment/generative process is \(s_t\) rather than \(\tilde{s}_t\), and is given by
\[
\mathbf{s}_t = (s^{\mathrm{Own}}, s^{\mathrm{Own}}_+, s^{\mathrm{Oth}}, s^{\mathrm{Oth}}_+)
\] This is known as the internal or “belief” state (a small encoding sketch follows this list), where
\(s^{\mathrm{Own}}\) is the actual belief about its own position
\(s^{\mathrm{Own}}_+\) is the desired belief about its own position
\(s^{\mathrm{Oth}}\) is the actual belief about its partner’s position
\(s^{\mathrm{Oth}}_+\) is the desired belief about its partner’s position
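Each of these belief components is stored as an unnormalized internal state vector over the \(N\) positions and mapped to a probability distribution with a numerically stabilized softmax, mirroring the `model_encoding`/`variational_density` helpers defined below. A minimal, self-contained sketch:

```python
import numpy as np

ENV_SIZE = 60

def softmax(s):
    ## shift by s.max() for numerical stability, as in the helper below
    e = np.exp(s - s.max())
    return e / e.sum()

## an all-zero internal state encodes a uniform belief over the 60 positions
s_own = np.zeros(ENV_SIZE)
q_own = softmax(s_own)
print(q_own[0])   ## 1/60 ≈ 0.0167 for every position
```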
4.5.2 Decision variables
According to the agent the action on the environment at time \(t\) will be represented by \(u_t\), also known as the control state of the agent. In this part of the project no active or control states will be inferred.
4.5.3 Implementation
## Declarations

## Environmental Constants
ENV_SIZE = 60                    ## note: pick a number divisible by 6
SHORTEST_PATH = int(ENV_SIZE/3)  ## distance of shared target from agent's initial position
TARGET_DELTA = int(ENV_SIZE/3)   ## distance of unshared target from shared target

## Agent Constants
ACTIONS = (0, -1, 1)             ## possible space of actions (stay, left, right)
ACTIONS_SQUARED = ((0,0),(0,-1),(0,1),    ## possible combined action space for 2 agents
                   (-1,0),(-1,-1),(-1,1),
                   (1,0),(1,-1),(1,1))
MAX_SENSE_PROBABILITY = [0.99, 0.05]      ## (strong, weak) agents
SENSE_DECAY_RATE = np.log(4)/ENV_SIZE     ## omega (ensures p=0.5 at ENV/2)

## Simulation parameters
EPOCHS = 200        ## Number of epochs (simulation steps)
N_STEPS = 50        ## Number of gradient descent steps made to update beliefs in each epoch
LEARNING_RATE = 0.7 ## Stochastic gradient descent learning rate

## Experimental parameters
## ALTERITY parameter for level of ToM (0, 1)
## ALIGNMENT parameter for goal alignment (0, 1)
CONDITIONS = {
    1: {'model': 1, 'tom': [0, 0],   'gal': 0},
    2: {'model': 2, 'tom': [0, 0.5], 'gal': 0},
    3: {'model': 3, 'tom': [0, 0],   'gal': 1},
    4: {'model': 4, 'tom': [0, 0.5], 'gal': 1}} #.
## Helper Functions

## Probability of occupying specific position as encoded in the internal state
def model_encoding(s): #.
    return softmax(s) #.

## Derivative of the model encoding for free energy gradient calculation
def model_encoding_derivative(s): #.
    return D_softmax(model_encoding(s)) #.

## P(stil | s)
## Agent's belief about the external states (i.e. its current position in the
## world) or intention (i.e. desired position in the world) as encoded in the
## internal state.
def variational_density(s): #.
    return model_encoding(s) #.

def logdiff(p1, p2):
    return (np.log(p1) - np.log(p2))

def KLv(p1, p2):
    return np.multiply(p1, logdiff(p1, p2))

## Kullback-Leibler divergence between densities p1 and p2. #.
def KL(p1, p2):
    return np.sum(KLv(p1, p2))

## Softmax function. The shift by s.max() is for numerical stability #.
def softmax(s): #.
    sum = np.sum(np.exp(s - s.max())) #.
    return np.exp(s - s.max())/sum #.

## Gradient of softmax function
def D_softmax(q):
    return np.diag(q) - np.outer(q, q)

def rerange(q, a):
    return a*q + (1-a)/ENV_SIZE

def dynamic_rerange(q):
    S_MIN = -10 #.
    s = np.log(q) #.
    s -= np.max(s) #.
    # s_hat = np.maximum(s, S_MIN) #.
    shat = np.maximum(s, S_MIN) #.
    # return softmax(s_hat) #.
    return softmax(shat) #.

## probability of partner action
def p_action(q):
    p = []
    qhat = 0.9*q/np.max(q) #.
    p.append(qhat) ## a_p = 0 #.
    A = (1 - qhat)/(np.roll(q, 1) + np.roll(q, -1)) #.
    p.append(np.roll(q, -1)*A) ## a_p = -1
    p.append(np.roll(q, +1)*A) ## a_p = +1
    return np.array(p)
## Agent Class: Core Mechanics of AIF with ToM and Alignment
class Agent():
    def __init__(self, stil, sPlus, max_sense_prob, tom, gal): #.
        ## initializations
        self.stil = stil #.
        self.alterity = tom ## alterity/'otherness'
        env = np.array(range(ENV_SIZE))
        self.sensory_dynamics = max_sense_prob*np.exp(-SENSE_DECAY_RATE* #.
            np.minimum(np.abs(env - int(ENV_SIZE/2)),
                       np.abs(env - int(ENV_SIZE/2) - ENV_SIZE)))
        self.s = (np.zeros(ENV_SIZE), np.zeros(ENV_SIZE)) #.
        self.sPlus = (sPlus[0] + (1 - gal)*sPlus[1], sPlus[0] + (1 - gal)*sPlus[2]) #.
        self.q = (variational_density(self.s[0]), variational_density(self.s[1])) #.
        self.qPlus = (variational_density(self.sPlus[0]), variational_density(self.sPlus[1])) #.
        self.p_ap = p_action(self.qPlus[1]) #.
        self.a = (0, 0)
        self.a_pp = 0
        self.delta = 0
        self.y = None #.
        ## logs
        self.stil_trace = [stil] #.
        self.stilOth_trace = [] #.
        self.sOwn_trace = [self.sensory_dynamics] #.
        self.sOth_trace = [self.sensory_dynamics] #.
        self.y_trace = [] #.
        self.a_trace = []

    def free_energy_own(self, a, pOwn, pOth):
        pOth_reranged = rerange(pOth, self.alterity) #.
        pOth_shifted = np.roll(pOth_reranged, self.delta + a[0] - a[1]) #.
        return KL(self.qPlus[0], dynamic_rerange(pOwn*pOth_shifted)) #.

    def free_energy_oth(self, a, pOwn, pOth): #.
        pOwn_reranged = rerange(pOwn, self.alterity**2) #.
        pOwn_shifted = np.roll(pOwn_reranged, -(self.delta + a[0] - a[1])) #.
        return KL(self.qPlus[1], dynamic_rerange(pOth*pOwn_shifted)) #.

    ## Partial derivatives of the free energy with respect to belief.
    ## FE = KL(qOwn || pOwn*qOth[-delta]) + KL(qOth || pOth*qOwn[+delta]) #.
    def fe_gradient(self, sPrime, pOwn, pOth, deltaPrime): #.
        qOwn = variational_density(sPrime[0]) #.
        qOth = variational_density(sPrime[1]) #.
        pOwn_reranged = rerange(pOwn, self.alterity**2) #.
        pOth_reranged = rerange(pOth, self.alterity) #.
        pOwn_shifted = np.roll(pOwn_reranged, -deltaPrime) #.
        pOth_shifted = np.roll(pOth_reranged, deltaPrime) #.
        DqOwn = D_softmax(qOwn) #.
        DqOth = D_softmax(qOth) #.
        v = (1 + logdiff(qOwn, dynamic_rerange(pOwn*pOth_shifted)), #.
             1 + logdiff(qOth, dynamic_rerange(pOth*pOwn_shifted))) #.
        return np.array([np.dot(DqOwn, v[0]), np.dot(DqOth, v[1])]) #.

    def generative_density_own(self, a=(0,0)):
        qOwn = self.q[0] #.
        sd = self.sensory_dynamics if self.y == 1 else 1 - self.sensory_dynamics #.
        return np.roll(sd*qOwn, a[0]) #.

    def generative_density_oth(self, a=(0,0)): #.
        qOwn = self.q[0]
        qOth = self.q[1] #.
        pDelta = np.roll(qOwn, -self.delta) #.
        p_app = np.roll(self.p_ap[ACTIONS.index(self.a_pp)], -self.a_pp)
        p_j_prior = qOth #.
        p_j_posterior = pDelta*p_app*p_j_prior #.
        return np.roll(p_j_posterior, a[1])

    def step(self):
        ## Roll the dice on measuring sensory state y \in {0, 1} #.
        y = int(np.random.random() < self.sensory_dynamics[self.stil]) #.
        self.y = y #.
        ## Pick action state, a \in {-1, 0, 1}^2 (pair of own and partner actions)
        ## Calculate the free energy given my
        ## target (intent) distribution, current state distribution, & sensory input
        ## Do this for all (three) actions and select the action with minimum free energy.
        fes = []
        epsilon = 0.999 + np.random.random(3)*0.002 ## perturb values to randomize the action chosen if equal FE
        for a in ACTIONS_SQUARED:
            pOwn = self.generative_density_own(a) #.
            pOth = self.generative_density_oth(a) #.
            fes.append([self.free_energy_own(a, pOwn, pOth)*epsilon[a[0]+1], #.
                        # self.free_energy_oth(a, p_own, p_oth)]) #.
                        self.free_energy_oth(a, pOwn, pOth)]) #.
        fes_t = np.transpose(fes)
        actions_index = [np.argmin(fes_t[0]), np.argmin(fes_t[1])]
        a = (ACTIONS_SQUARED[actions_index[0]][0], ACTIONS_SQUARED[actions_index[1]][1])
        self.a = a
        deltaPrime = self.delta + a[0] - a[1] #.

        ## Update actual (own) position by taking action, partner action never gets realized
        stil = (self.stil + a[0]) % ENV_SIZE #.
        self.stil = stil #.
        pOwn = self.generative_density_own() #.
        pOth = self.generative_density_oth() #.
        sPrime = np.array([np.roll(self.s[0], a[0]), np.roll(self.s[1], a[1])]) #.

        ## Minimise free energy
        for step in range(N_STEPS):
            sPrime -= LEARNING_RATE*self.fe_gradient(sPrime, np.roll(pOwn, a[0]), np.roll(pOth, a[1]), deltaPrime) #.

        ## Save position, sensory output, and internal state for plotting
        self.s = sPrime #.
        self.q = (variational_density(sPrime[0]), variational_density(sPrime[1])) #.
        self.y_trace.append(y) #.
        self.a_trace.append(a[0])
        self.stil_trace.append(stil) #.
        self.stilOth_trace.append(stil - self.delta) #.
        self.sOwn_trace.append(model_encoding(sPrime[0])) #.
        self.sOth_trace.append(model_encoding(sPrime[1])) #.

    ## Plot agent's internal state and position + agent's beliefs about partner's position #.
    def plot_traces(self, i): #.
        ## Plot own belief trace
        fig1 = plt.figure(figsize=(15, 4))
        ax = fig1.gca()
        im = ax.imshow(np.transpose(self.sOwn_trace), #.
                       interpolation="nearest", aspect="auto",
                       vmin=0, vmax=1, cmap="viridis")
        c = np.asarray(['white' if y==1 else 'grey' for y in self.y_trace]) #.
        stil = np.asarray(self.stil_trace) #.
        epochs = np.arange(EPOCHS + 1)
        self.a_trace.append(0)
        a = np.asarray(self.a_trace)
        idx = a < 0
        ax.scatter(epochs[idx], stil[idx], c=c[idx], marker='v') #.
        idx = a > 0
        ax.scatter(epochs[idx], stil[idx], c=c[idx], marker='^') #.
        idx = a == 0
        ax.scatter(epochs[idx], stil[idx], c=c[idx], marker='o') #.
        ax.invert_yaxis()
        ax.set_xlim([0, EPOCHS + 1])
        fig1.colorbar(im)
        ax.set_title(f"Agent {i}: own belief trace") #.
        ax.set_xlabel(f"steps") #.
        ax.set_ylabel(f"position") #.

        ## Then plot oth belief trace #.
        fig2 = plt.figure(figsize=(15, 4))
        ax = fig2.gca()
        im = ax.imshow(np.transpose(self.sOth_trace), #.
                       interpolation="nearest", aspect="auto",
                       vmin=0, vmax=1, cmap="viridis")
        ax.invert_yaxis()
        ax.set_xlim([0, EPOCHS + 1])
        fig2.colorbar(im)
        ax.set_title(f"Agent {i}: oth belief trace") #.
        ax.set_xlabel(f"steps") #.
        ax.set_ylabel(f"position") #.
        return fig1, fig2

    ## Calculate absolute distance from target position #.
    def log_convergence(self, targets):
        stil = np.array(self.stil_trace) #.
        c0 = np.minimum(np.abs(stil - targets[0]), #.
                        np.minimum(np.abs(stil - targets[0] - ENV_SIZE),
                                   np.abs(stil - targets[0] + ENV_SIZE)))
        if len(targets) == 1:
            return 'shared target', c0
        c1 = np.minimum(np.abs(stil - targets[1]),
                        np.minimum(np.abs(stil - targets[1] - ENV_SIZE),
                                   np.abs(stil - targets[1] + ENV_SIZE)))
        ## return the convergent distance from closest target
        if c0[-1] <= c1[-1]:
            return 'shared target', c0
        else:
            return 'unshared target', c1
## Functions for capturing run data
def targetData(dft1, dft2):
    ## collate target data and calculate % time agents pursue primary/secondary target
    dft1.columns = ['Target']
    dft1['Agent'] = 'strong'
    dft2.columns = ['Target']
    dft2['Agent'] = 'weak'
    return pd.concat([dft1, dft2], ignore_index=True)

def convergenceData(dfc1, dfc2):
    ## collate and plot convergence to target
    dfc1 = dfc1.melt()
    dfc1.columns = ["Time", "Distance from Target"]
    dfc1['Agent'] = 'strong'
    dfc2 = dfc2.melt()
    dfc2.columns = ["Time", "Distance from Target"]
    dfc2['Agent'] = 'weak'
    return pd.concat([dfc1, dfc2], ignore_index=True)

def beliefData(dfb1, dfb2):
    ## collate and plot final belief distribution (primary target fixed at the middle of the env.)
    dfb1 = dfb1.melt()
    dfb1.columns = ["Relative Location", "Belief"]
    dfb1['Agent'] = 'strong'
    dfb2 = dfb2.melt()
    dfb2.columns = ["Relative Location", "Belief"]
    dfb2['Agent'] = 'weak'
    return pd.concat([dfb1, dfb2], ignore_index=True)

def systemFreeEnergyData(q_empirical):
    ## Collate and plot FE based on distribution across all runs
    # global_p = initialize_s_star([0], 30) #.
    global_p = initialize_sPlus([0], 30) #.
    global_p = global_p/np.sum(global_p)
    fe = np.zeros(EPOCHS)
    for t in range(EPOCHS):
        q = (q_empirical[t] + 0.01)/(np.sum(q_empirical[t]) + 0.01*ENV_SIZE)
        fe[t] = KL(q, global_p)
    return pd.DataFrame(fe, columns=['FE'])
We do not yet have the ability to create policies based on inference, nor do we yet make provision for simulating collective behavior. However, in this initial part of the project we will do a limited simulation consisting of a single run (making use of MODEL = 4) that simulates the behavior of a single pair of agents.
## OVERRIDES #.

## Declarations

## Environmental Constants
ENV_SIZE = 60                    ## note: pick a number divisible by 6
SHORTEST_PATH = int(ENV_SIZE/3)  ## distance of shared target from agent's initial position
TARGET_DELTA = int(ENV_SIZE/3)   ## distance of unshared target from shared target

## Agent Constants
ACTIONS = (0, -1, 1)             ## possible space of actions (stay, left, right)
ACTIONS_SQUARED = ((0,0),(0,-1),(0,1),    ## possible combined action space for 2 agents
                   (-1,0),(-1,-1),(-1,1),
                   (1,0),(1,-1),(1,1))
MAX_SENSE_PROBABILITY = [0.99, 0.05]      ## (strong, weak) agents
SENSE_DECAY_RATE = np.log(4)/ENV_SIZE     ## omega (ensures p=0.5 at ENV/2)

## Simulation parameters
EPOCHS = 200        ## Number of epochs (simulation steps).
N_STEPS = 50        ## Number of gradient descent steps made to update beliefs in each epoch.
LEARNING_RATE = 0.7 ## Stochastic gradient descent learning rate

## Experimental parameters
## ALTERITY parameter for level of ToM (0, 1)
## ALIGNMENT parameter for goal alignment (0, 1)
CONDITIONS = {
    1: {'model': 1, 'tom': [0, 0],   'gal': 0},
    2: {'model': 2, 'tom': [0, 0.5], 'gal': 0},
    3: {'model': 3, 'tom': [0, 0],   'gal': 1},
    4: {'model': 4, 'tom': [0, 0.5], 'gal': 1}} #.
## single run
## MODEL = 1 ## specify model you want to run #.
## MODEL = 2 ## specify model you want to run #.
## MODEL = 3 ## specify model you want to run #.
MODEL = 4     ## specify model you want to run
FOOD_POSITION = 15
print(FOOD_POSITION)
agents, t1, c1, s1, stil1, t2, c2, s2, stil2 = singleRun( #.
    FOOD_POSITION,            ## shared_target #.
    MAX_SENSE_PROBABILITY,    ## max_sense_prob
    CONDITIONS[MODEL]['tom'], ## theory-of-mind/alterity
    CONDITIONS[MODEL]['gal'], ## goal alignment #.
    plot=True)
15
## collect some traces for the strong agent in a dataframe
df_strong = pd.DataFrame({
    'a': agents[0].a_trace,
    'stilOwn': agents[0].stil_trace,
    'stilOth': agents[0].stilOth_trace + [0],
    'y': agents[0].y_trace,
})
df_strong
        a   stilOwn   stilOth     y
0      -1        35        54     1
1      -1        34        55     1
2      -1        33        56     1
3       1        32        59     0
4      -1        33        58     1
...   ...       ...       ...   ...
196     0        14        18     1
197     0        14        19     1
198     0        14        18     1
199     1        14        18     0
200     0        15         0     1

201 rows × 4 columns
## collect some traces for the weak agent in a dataframe
df_weak = pd.DataFrame({
    'a': agents[1].a_trace,
    'stilOwn': agents[1].stil_trace,
    'stilOth': agents[1].stilOth_trace + [0],
    'y': agents[1].y_trace,
})
df_weak