Energy Storage using the Powell Unified Framework (Part 3)
Using Sequential Decision Analytics to find ongoing optimal decisions
Energy Storage
Powell Unified Framework (PUF)
Reinforcement Learning (RL)
Author
Kobus Esterhuysen
Published
September 16, 2022
In Part 2 we used 1-hot encoding for the decision variable. We also optimized a Bellman policy - one in the Value Function Approximation (VFA) class. This policy was based on Bellman’s exact dynamic programming approach, making use of Backward Dynamic Programming (BDP).
In Part 3 (and Part 4) we change the 1-hot encoding of the decision variable to something more scalable. We will have a single decision variable (\(x_{t,GB}\)), where \(GB\) stands for ‘Grid-to-Battery’. When the flow is from grid to battery, \(x_{t,GB}\) is positive (the amount bought); when the flow is reversed, i.e. battery to grid, \(x_{t,GB}\) is negative (the amount sold); a value of zero means neither happened. In the future, the number of decision variables will be increased as needed. Eventually we will have the decision vector:
In Part 3 we again explore the buy-low-sell-high policy.
0 STRUCTURE & FRAMEWORK
The overall structure of this project and report follows the traditional CRISP-DM format. However, instead of the CRISP-DM’s “4 Modeling” section, we inserted the “6 step modeling process” of Dr. Warren Powell in section 4 of this document.
The example explored (but also modified) in this report comes from Dr. Warren Powell (formerly at Princeton). It was chosen as a template to create a POC for a client. It was important to understand this example thoroughly before embarking on creating the POC. Dr. Powell’s unified framework shows great promise for unifying the formalisms of at least a dozen different fields. Using his framework enables easier access to thinking patterns in these other fields that might be beneficial and informative to the sequential decision problem at hand. Traditionally, this kind of problem would be approached from the reinforcement learning perspective. However, using Dr. Powell’s wider and more comprehensive perspective almost certainly provides additional value.
The original code for this example can be found here.
In order to make a strong mapping between the code in this notebook and the mathematics in the unified framework, many variable names in the original code have been adapted. Here is a summary of the mapping between the mathematics and the variable names in the code:
Superscripts
variable names have a double underscore to indicate a superscript
\(X^{\pi}\): has code X__pi, is read X pi
Subscripts
variable names have a single underscore to indicate a subscript
\(S_t\): has code S_t, is read ‘S at t’
\(M^{Spend}_t\) has code M__Spend_t which is read: “MSpend at t”
Arguments
collection variable names may have argument information added
\(X^{\pi}(S_t)\): has code X__piIS_tI, is read ‘X pi in S at t’
the surrounding I’s are used to imitate the parentheses around the argument
Next time/iteration
variable names that indicate one step in the future are quite common
\(R_{t+1}\): has code R_tt1, is read ‘R at t+1’
\(R^{n+1}\): has code R__nt1, is read ‘R at n+1’
Rewards
State-independent terminal reward and cumulative reward
\(F\): has code F for terminal reward
\(\sum_{n}F\): has code cumF for cumulative reward
State-dependent terminal reward and cumulative reward
\(C\): has code C for terminal reward
\(\sum_{t}C\): has code cumC for cumulative reward
Vectors where components use different names
\(S_t(R_t, p_t)\): has code S_t.R_t and S_t.p_t, is read ‘S at t in R at t, and, S at t in p at t’
the code implementation is by means of a named tuple
self.State = namedtuple('State', SNames) for the ‘class’ of the vector
self.S_t for the ‘instance’ of the vector
Vectors where components reuse names
\(x_t(x_{t,GB}, x_{t,BL})\): has code x_t.x_t_GB and x_t.x_t_BL, is read ‘x at t in x at t for GB, and, x at t in x at t for BL’
the code implementation is by means of a named tuple
self.Decision = namedtuple('Decision', xNames) for the ‘class’ of the vector
self.x_t for the ‘instance’ of the vector
Use of mixed-case variable names
to reduce confusion, the use of mixed-case variable names is sometimes preferred (even though this is not best practice in the Python community), reserving underscores and double underscores for math-related variables
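As an illustration, the naming conventions above map to code like this (all values and the stand-in policy are hypothetical, used only to show the style):

```python
from collections import namedtuple

# vector state S_t = (R_t, p_t): the namedtuple is the 'class', S_t the 'instance'
State = namedtuple('State', ['R_t', 'p_t'])
S_t = State(R_t=1.0, p_t=42.50)

# superscript: X__pi stands for X^pi; the argument wrapped in I...I imitates
# the parentheses, so X__piIS_tI is read 'X pi in S at t'
def X__piIS_tI(S_t, theta__buy=20.0, theta__sell=60.0):
    # a stand-in buy-low-sell-high rule, not the notebook's actual policy
    if S_t.p_t <= theta__buy:
        return 'buy'
    elif S_t.p_t >= theta__sell:
        return 'sell'
    return 'hold'

# next time step: R_tt1 stands for R_{t+1}
R_tt1 = S_t.R_t + 0.5

print(X__piIS_tI(S_t))  # 42.50 falls between the two thresholds
print(R_tt1)
```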
1 BUSINESS UNDERSTANDING

New Jersey is looking to develop 3,500 megawatts (MW) of offshore wind generating power. A challenge is that wind (and especially offshore wind) can be highly variable, due in part to the property that wind power (over intermediate ranges) increases with the cube of wind speed. This variability is depicted in figure 9.1, developed from a study of offshore wind. Energy from wind power has become popular in regions where wind is high, such as the midwestern United States, coastal regions off of Europe, the northeast of Brazil, and northern regions of China (to name just a few). Sometimes communities (and companies) have invested in renewables (wind or solar) to help reduce their carbon footprint and minimize their dependence on the grid. It is quite rare, however, that these projects allow a community to eliminate the grid from their portfolio. Common practice is to let the renewable source (wind or solar) sell directly to the grid, while a company may purchase from the grid. This can be useful as a hedge since the company will make a lot of money during price spikes (prices may jump from $20 per megawatt-hour (MWh) to $300 per MWh or more) that offsets the cost of purchasing power during those periods.
A challenge with renewables is handling the variability. While one solution is to simply pump any energy from a renewable source into the grid and use the tremendous power of the grid to handle this variability, there has been considerable interest in using storage (in particular, battery storage) to smooth out the peaks and valleys. In addition to handling the variability in the renewable source, there has also been interest in using batteries to take advantage of price spikes, buying power when it is cheap (prices can even go negative) and selling it back when they are high. Exploiting the variability in power prices on the grid is known as battery arbitrage.
This problem will provide insights into virtually any inventory/storage problem, including:

- Holding cash in mutual funds: a bank has to determine how much of its investment capital to hold in cash to meet requests for redemptions, versus investing the money in loans, stocks, and bonds.
- Retailers (both online and brick-and-mortar stores) have to manage inventories of hundreds of thousands of products.
- Auto dealers have to decide how many cars to hold to meet customer demand.
- Consulting firms have to decide how many employees to keep on staff to meet the variable demand of different consulting projects.
Energy storage is a particularly rich form of inventory problem.
2 DATA UNDERSTANDING
Next, we will start to look at the data that is provided for this problem.
# import pdb
from collections import namedtuple, defaultdict
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from copy import copy
import time
from scipy.ndimage.interpolation import shift
import pickle
from bisect import bisect
import math
from pprint import pprint
import matplotlib as mpl

pd.options.display.float_format = '{:,.4f}'.format
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', None)

! python --version # 3.8.10; found it changed on 3/10/23
Python 3.10.11
DeprecationWarning: Please use `shift` from the `scipy.ndimage` namespace, the `scipy.ndimage.interpolation` namespace is deprecated.
from scipy.ndimage.interpolation import shift
def process_raw_price_data(file, params):
    DISC_TYPE = "FROM_CUM"
    # DISC_TYPE = "OTHER"
    print("Processing raw price data. Constructing price change list and cdf using {}".format(DISC_TYPE))
    tS = time.time()
    # load energy price data from the Excel spreadsheet
    raw_data = pd.read_excel(file, sheet_name="Raw Data")
    # look at data spanning a week
    data_selection = raw_data.iloc[0:params['T'], 0:5]
    # rename columns to remove spaces (otherwise we can't access them)
    cols = data_selection.columns
    cols = cols.map(lambda x: x.replace(' ', '_') if isinstance(x, str) else x)
    data_selection.columns = cols
    # sort prices in ascending order
    sort_by_price = data_selection.sort_values('PJM_RT_LMP')
    # print(sort_by_price.head())
    hist_price = np.array(data_selection['PJM_RT_LMP'].tolist())
    # print(hist_price[0])
    max_price = pd.DataFrame.max(sort_by_price['PJM_RT_LMP'])
    min_price = pd.DataFrame.min(sort_by_price['PJM_RT_LMP'])
    print("Min price {:.2f} and Max price {:.2f}".format(min_price, max_price))
    # calculate change in price and sort values of change in price in ascending order
    data_selection['Price_Shift'] = data_selection.PJM_RT_LMP.shift(1)
    data_selection['Price_Change'] = data_selection['PJM_RT_LMP'] - data_selection['Price_Shift']
    sort_price_change = data_selection.sort_values('Price_Change')
    # discretize change in price and obtain f(p) for each price change
    max_price_change = pd.DataFrame.max(sort_price_change['Price_Change'])
    min_price_change = pd.DataFrame.min(sort_price_change['Price_Change'])
    print("Min price change {:.2f} and Max price change {:.2f}".format(min_price_change, max_price_change))
    # there are 191 values for price change
    price_changes_sorted = sort_price_change['Price_Change'].tolist()
    # remove the last NaN value
    price_changes_sorted.pop()
    if DISC_TYPE == "FROM_CUM":
        # discretize price change by interpolating from cumulative distribution
        xp = price_changes_sorted
        fp = np.arange(len(price_changes_sorted) - 1) / (len(price_changes_sorted) - 1)
        cum_fn = np.append(fp, 1)
        # obtain 30 discrete prices
        discrete_price_change_cdf = np.linspace(0, 1, params['nPriceChangeInc'])
        discrete_price_change_list = []
        for i in discrete_price_change_cdf:
            interpolated_point = np.interp(i, cum_fn, xp)
            discrete_price_change_list.append(interpolated_point)
    else:
        price_change_range = max_price_change - min_price_change
        price_change_increment = price_change_range / params['nPriceChangeInc']
        discrete_price_change = np.arange(min_price_change, max_price_change, price_change_increment)
        discrete_price_change_list = list(np.append(discrete_price_change, max_price_change))
        f_p = np.arange(len(price_changes_sorted) - 1) / (len(price_changes_sorted) - 1)
        cum_fn = np.append(f_p, 1)
        discrete_price_change_cdf = []
        for c in discrete_price_change_list:
            interpolated_point = np.interp(c, price_changes_sorted, cum_fn)
            discrete_price_change_cdf.append(interpolated_point)
    price_changes_sorted = np.array(price_changes_sorted)
    discrete_price_change_list = np.array(discrete_price_change_list)
    discrete_price_change_cdf = np.array(discrete_price_change_cdf)
    discrete_price_change_pdf = discrete_price_change_cdf - shift(discrete_price_change_cdf, 1, cval=0)
    mean_price_change = np.dot(discrete_price_change_list, discrete_price_change_pdf)
    # print("discrete_price_change_list ", discrete_price_change_list)
    # print("discrete_price_change_cdf", discrete_price_change_cdf)
    # print("discrete_price_change_pdf", discrete_price_change_pdf)
    print("Finishing processing raw price data in {:.2f} secs. Expected price change is {:.2f}. Hist_price len is {}".format(time.time() - tS, mean_price_change, len(hist_price)))
    # input("enter any key to continue...")
    exog_params = {
        "hist_price": hist_price,
        "price_changes_sorted": price_changes_sorted,
        "discrete_price_change_list": discrete_price_change_list,
        "discrete_price_change_cdf": discrete_price_change_cdf,
    }
    return exog_params
seed = 189654913
file = 'Parameters.xlsx'
# NOTE:
# R__max: unit is MWh which is an energy unit
# R_0: unit is MWh which is an energy unit
parDf = pd.read_excel(f'{base_dir}/{file}', sheet_name='ParamsModel', index_col=0); print(f'{parDf}')
# parDict = parDf.set_index('Index').T.to_dict('list') #.
parDict = parDf.T.to_dict('list') #.
params = {key: v for key, value in parDict.items() for v in value}
params['seed'] = seed
params['T'] = min(params['T'], 192); print(f'{params=}')
0
Index
Algorithm BackwardDP
T 195
eta 0.9000
R__max 1
R_0 1
params={'Algorithm': 'BackwardDP', 'T': 192, 'eta': 0.9, 'R__max': 1, 'R_0': 1, 'seed': 189654913}
parDf = pd.read_excel(f'{base_dir}/{file}', sheet_name='GridSearch', index_col=0); print(parDf)
# parDict = parDf.set_index('Index').T.to_dict('list') #.
parDict = parDf.T.to_dict('list')
paramsPolicy = {key: v for key, value in parDict.items() for v in value}; print(f'{paramsPolicy=}')
params.update(paramsPolicy); pprint(f'{params=}')
parDf = pd.read_excel(f'{base_dir}/{file}', sheet_name='BackwardDP', index_col=0); print(parDf)
# parDict = parDf.set_index('Index').T.to_dict('list') #.
parDict = parDf.T.to_dict('list')
paramsPolicy = {key: v for key, value in parDict.items() for v in value}; print(f'{paramsPolicy=}')
params.update(paramsPolicy); pprint(f'{params=}')
# exogParams is a dictionary with four lists:
#   hist_price, price_changes_sorted, discrete_price_change_list, discrete_price_change_cdf
# exog_params = process_raw_price_data(f'{base_dir}/{file}', params)
exogParams = process_raw_price_data(f'{base_dir}/{file}', params) #.
# print(f"{exogParams['hist_price']=}")
# print(f"{exogParams['price_changes_sorted']=}")
# print(f"{exogParams['discrete_price_change_list']=}")
# print(f"{exogParams['discrete_price_change_cdf']=}")
Processing raw price data. Constructing price change list and cdf using FROM_CUM
Min price 0.00 and Max price 107.34
Min price change -54.40 and Max price change 65.81
Finishing processing raw price data in 2.70 secs. Expected price change is 1.28. Hist_price len is 192
print("Ingesting raw price data for understanding.")
# load energy price data from the Excel spreadsheet
raw_data = pd.read_excel(f'{base_dir}/{file}', sheet_name="Raw Data")
# look at data spanning a week
df_raw = raw_data.iloc[0:params['T'], 0:5]
# rename columns to remove spaces (otherwise we can't access them)
cols = df_raw.columns
cols = cols.map(lambda x: x.replace(' ', '_') if isinstance(x, str) else x)
df_raw.columns = cols
df_raw.head()
The most important aspect of data preparation will be to create an empirical distribution rather than trying to fit the data to a parametric distribution. This involves the computation of a cumulative distribution from the data. We first sort the prices from smallest to largest. Let this ordered sequence be represented by \(\tilde{p}_t\) where \(\tilde{p}_{t-1} \le \tilde{p}_t\). Also, let \(T\) be the number of time periods. The percentage of time periods with a price less than or equal to \(\tilde{p}_t\) is then \(t/T\). The empirical cumulative distribution is given by \[
F_P(\tilde{p}_t) = \frac{t}{T}
\]
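Following the formula above, a minimal sketch of the empirical CDF computation (with made-up prices, not the PJM data) might look like:

```python
import numpy as np

def empirical_cdf(prices):
    # sort prices ascending; F_P(p_tilde_t) = t/T for the t-th ordered price
    p_sorted = np.sort(np.asarray(prices, dtype=float))
    T = len(p_sorted)
    F = np.arange(1, T + 1) / T  # t/T for t = 1..T
    return p_sorted, F

prices = [30.0, 20.0, 50.0, 40.0, 10.0]
p_sorted, F = empirical_cdf(prices)
print(p_sorted)  # [10. 20. 30. 40. 50.]
print(F)         # [0.2 0.4 0.6 0.8 1. ]
```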
There has been interest in using batteries to exploit electricity price spikes. Power is bought when the price is low and power is sold when the price is high. This is known as battery arbitrage.
This project highlights some interesting aspects of this kind of problem. Among these are:
Electricity prices on the grid can be very volatile (they vary between about $20/MWh and about $1,000/MWh, or even $10,000/MWh for brief periods)
Prices change every 5 minutes
We assume that we can observe the price and then decide if we want to buy or sell
We can only buy or sell for an entire 5-minute interval at a maximum rate of 0.01 MW
Wind energy can be forecasted but not very well. Rolling forecasts can be used.
Energy demand is variable but quite predictable. It depends on:
temperature (primarily)
humidity (less)
Energy may be bought from or sold to the grid at current grid prices.
There is a 5% to 10% loss when converting power from AC (grid) to DC (to store in battery).
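The conversion loss matters for arbitrage: with an efficiency factor eta (0.9 in this notebook’s parameters), the sell price must exceed the buy price by enough to cover the loss before a round trip pays off. A small illustrative calculation (the prices are made up, and charging the loss once on the way into the battery is an illustrative convention, not necessarily the model’s exact accounting):

```python
def round_trip_profit(p_buy, p_sell, mwh=1.0, eta=0.9):
    # to store `mwh` MWh we must buy mwh/eta MWh at p_buy,
    # then later sell the stored mwh MWh at p_sell
    cost = p_buy * mwh / eta
    revenue = p_sell * mwh
    return revenue - cost

print(round_trip_profit(20.0, 300.0))  # price spike: clearly profitable
print(round_trip_profit(20.0, 21.0))   # small spread: the loss eats the gain
```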
4.2 Core Elements
This section attempts to answer three important questions:

- What metrics are we going to track?
- What decisions do we intend to make?
- What are the sources of uncertainty?
For this problem, the only metric we are interested in is the amount of profit we make after each decision window. A single type of decision needs to be made at the start of each window - how much energy to buy or sell. The only source of uncertainty is the price of energy.
4.3 Mathematical Model | SUM Design
A Python class is used to implement the model for the SUM (System Under Management):
NOTE: For now, we will only use a single decision variable, ‘x_t_GB’. In the future the complete list will be used. In this notebook we will start to move away from the 1-hot representation.
4.3.3 Exogenous information variables
The exogenous information variables represent what we did not know when we made a decision. These are the variables that we cannot control directly. The information in these variables becomes available only after we make the decision \(x_t\).
When we assume that the price in each time period is revealed, without any model to predict the price based on past prices, we have:
\[
W_{t+1} = p_{t+1}
\]
Alternatively, when we assume that we observe the change in price \(\hat{p}_t=p_t-p_{t-1}\), we have:
\[
W_{t+1} = \hat{p}_{t+1}
\]
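One way to simulate such exogenous price changes is inverse-transform sampling against a discretized empirical CDF, mirroring the `np.interp` approach used in `process_raw_price_data` (the distribution values below are hypothetical):

```python
import numpy as np

# hypothetical discretized price-change distribution
discrete_price_change_list = np.array([-10.0, -2.0, 0.0, 3.0, 12.0])
discrete_price_change_cdf  = np.array([0.1, 0.35, 0.6, 0.85, 1.0])

def sample_price_change(prng):
    # draw u ~ U(0,1) and invert the empirical CDF by linear interpolation
    u = prng.uniform()
    return np.interp(u, discrete_price_change_cdf, discrete_price_change_list)

prng = np.random.RandomState(189654913)  # the notebook's seed
p_hat_tt1 = sample_price_change(prng)    # one realization of p^hat at t+1
print(p_hat_tt1)
```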
The exogenous information has been preprocessed and is captured in:
exogParams['hist_price']
exogParams['price_changes_sorted']
exogParams['discrete_price_change_list']
exogParams['discrete_price_change_cdf']
The latest exogenous information can be accessed by calling the following method from class EnergyStorageModel():
The transition function describes how the state variables evolve over time. Because we currently have two state variables in the state, \(S_t=(R_t,p_t)\), we have the equations:

\[
R_{t+1} = R_t + x_{t,GB}
\]

\[
p_{t+1} = W_{t+1}
\]

The new resource level reflects the energy bought or sold, and the new price is simply the revealed exogenous price.
The contribution (reward) function is implemented by the following method in class EnergyStorageModel():
def C_fn(self, x_t):
C_t = self.S_t.p_t*( -self.initArgs['eta']*x_t.x_t_GB )#. C(S_t, x_t, W_t+1); x_t_GB>0 for buy, <0 for sell
return C_t
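For example (with hypothetical numbers), with eta = 0.9 and a current price of $50/MWh, selling 1 MWh (x_t_GB = -1) contributes 50 × 0.9 = 45, while buying 1 MWh (x_t_GB = +1) contributes -45. A standalone sketch of the same formula:

```python
from collections import namedtuple

Decision = namedtuple('Decision', ['x_t_GB'])
eta, p_t = 0.9, 50.0  # hypothetical values for illustration

def C_fn(x_t):
    # C(S_t, x_t); x_t_GB > 0 for buy, < 0 for sell
    return p_t * (-eta * x_t.x_t_GB)

print(C_fn(Decision(x_t_GB=-1.0)))  # sell: positive contribution
print(C_fn(Decision(x_t_GB=1.0)))   # buy: negative contribution
```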
4.3.6 Implementation of SUM Model
Here is the complete implementation of the EnergyStorageModel class:
class EnergyStorageModel():
    def __init__(self, SVarNames, xVarNames, S_0, params, exogParams, possibleDecisions, W_fn=None, S__M_fn=None, C_fn=None):
        self.initArgs = params
        self.prng = np.random.RandomState(params['seed'])
        self.exogParams = exogParams
        self.S_0 = S_0
        self.SVarNames = SVarNames
        self.xVarNames = xVarNames
        self.possibleDecisions = possibleDecisions
        self.State = namedtuple('State', SVarNames) # 'class'
        self.S_t = self.build_state(self.S_0) # 'instance'
        self.Decision = namedtuple('Decision', xVarNames) # 'class'
        self.cumC = 0.0 #. cumulative reward; use F or cumF for final (i.e. no cumulative) reward
        # self.states = [self.S_t] # keep a list of states visited

    def reset(self):
        self.cumC = 0.0
        self.S_t = self.build_state(self.S_0)
        # self.states = [self.S_t]

    def build_state(self, info):
        return self.State(*[info[sn] for sn in self.SVarNames])

    def build_decision(self, info):
        return self.Decision(*[info[xn] for xn in self.xVarNames])

    def W_fn(self, t):
        W_tt1 = self.exogParams['hist_price'][t]
        return W_tt1

    def S__M_fn(self, t, x_t): #.
        p_tt1 = self.W_fn(t)
        R_tt1 = self.S_t.R_t + x_t.x_t_GB
        if len(self.SVarNames) == 2:
            S_tt1 = self.build_state({'R_t': R_tt1, 'p_t': p_tt1})
        elif len(self.SVarNames) == 3:
            S_tt1 = self.build_state({'R_t': R_tt1, 'p_t': p_tt1, 'p_t_1': self.S_t.p_t})
        return S_tt1

    def C_fn(self, x_t):
        C_t = self.S_t.p_t*( -self.initArgs['eta']*x_t.x_t_GB ) #. C(S_t, x_t, W_t+1); x_t_GB>0 for buy, <0 for sell
        return C_t

    def step(self, t, x_t):
        self.cumC += self.C_fn(x_t)
        self.S_t = self.S__M_fn(t, x_t)
        return (self.S_t, self.cumC, x_t) #. for plotting
4.4 Uncertainty Model
As mentioned above in section 3, we make use of an empirical distribution rather than trying to fit the data to a parametric distribution. This involves the computation of a cumulative distribution from the data. We first sort the prices from smallest to largest. Let this ordered sequence be represented by \(\tilde{p}_t\) where \(\tilde{p}_{t-1} \le \tilde{p}_t\). Also, let \(T\) be the number of time periods. The percentage of time periods with a price less than or equal to \(\tilde{p}_t\) is then \(t/T\). The empirical cumulative distribution is given by \[
F_P(\tilde{p}_t) = \frac{t}{T}
\]
4.5 Policy Design
There are two main meta-classes of policy design, each with two subclasses:

- Policy Search
  - Policy Function Approximations (PFAs)
  - Cost Function Approximations (CFAs)
- Lookahead
  - Value Function Approximations (VFAs)
  - Direct Lookaheads (DLAs)
In this project we will use:

- A simple buy-low-sell-high parameterized policy (from the PFA class)
The buy low, sell high policy is implemented by the following method in class EnergyStoragePolicy():
def X__BuyLowSellHigh(self, t, S_t, theta, T): # SDAM-9.4.1
info = {
'x_t_EL': 0, #Wind to Load ('EC': Wind to Consumer)
'x_t_EB': 0, #Wind to Battery ('WS': Wind to Storage)
'x_t_GL': 0, #Grid to Load ('GC': Grid to Consumer)
'x_t_GB': 0, #Grid to Battery ('GS': Grid to Storage); can be negative to reverse flow (sell back to Grid)
'x_t_BL': 0, #Battery to Load ('SC': Storage to Consumer)
}
if t >= T:
print(f"ERROR: t={t} should not reach or exceed the max steps ({T})")
return self.model.build_decision(info)
if t == T - 1: #Last time period - sell
info['x_t_GB'] = -S_t.R_t #sell the whole inventory
# print(f'last t for this sample-path: selling all ...')
else:
# (SDAM-eq9.11): X^lohi(S_t|theta)
theta__buy = theta[0] # theta__buy
theta__sell = theta[1] # theta__sell
if S_t.p_t <= theta__buy: # BUY if p_t <= theta__buy
info['x_t_GB'] = (self.model.initArgs['R__max'] - S_t.R_t)/self.model.initArgs['eta']
elif S_t.p_t >= theta__sell: # SELL if p_t >= theta__sell
info['x_t_GB'] = -S_t.R_t #sell the whole inventory
else: # HOLD if theta__buy < p_t < theta__sell
info['x_t_GB'] = 0
return self.model.build_decision(info)
4.5.1 Implementation of Policy Design
The EnergyStoragePolicy() class implements the policy design.
import random

class EnergyStoragePolicy():
    def __init__(self, model, piNames):
        self.model = model
        self.piNames = piNames
        self.Policy = namedtuple('Policy', piNames)

    def X__BuyLowSellHigh(self, t, S_t, theta, T): # SDAM-9.4.1
        info = {
            'x_t_EL': 0, # Wind to Load ('EC': Wind to Consumer)
            'x_t_EB': 0, # Wind to Battery ('WS': Wind to Storage)
            'x_t_GL': 0, # Grid to Load ('GC': Grid to Consumer)
            'x_t_GB': 0, # Grid to Battery ('GS': Grid to Storage); can be negative to reverse flow (sell back to Grid)
            'x_t_BL': 0, # Battery to Load ('SC': Storage to Consumer)
        }
        if t >= T:
            print(f"ERROR: t={t} should not reach or exceed the max steps ({T})")
            return self.model.build_decision(info)
        if t == T - 1: # last time period - sell
            info['x_t_GB'] = -S_t.R_t # sell the whole inventory
        else:
            # (SDAM-eq9.11): X^lohi(S_t|theta)
            theta__buy = theta[0]
            theta__sell = theta[1]
            if S_t.p_t <= theta__buy: # BUY if p_t <= theta__buy
                info['x_t_GB'] = (self.model.initArgs['R__max'] - S_t.R_t)/self.model.initArgs['eta']
            elif S_t.p_t >= theta__sell: # SELL if p_t >= theta__sell
                info['x_t_GB'] = -S_t.R_t # sell the whole inventory
            else: # HOLD if theta__buy < p_t < theta__sell
                info['x_t_GB'] = 0
        return self.model.build_decision(info)

    def run_policy(self, piInfo, piName, params):
        model_copy = copy(self.model)
        T = params['T']
        buyList = []
        sellList = []
        for t in range(T): # for each transition/step
            x_t = getattr(self, piName)(t, model_copy.S_t, piInfo, T)
            if x_t.x_t_GB > 0:
                buyList.append((t, model_copy.S_t.R_t, model_copy.S_t.p_t, x_t.x_t_GB, params['R__max']))
            elif x_t.x_t_GB < 0:
                sellList.append((t, model_copy.S_t.R_t, model_copy.S_t.p_t, x_t.x_t_GB, params['R__max']))
            _, _, _ = model_copy.step(t, x_t) # step the model forward one iteration
        cumC = model_copy.cumC #.
        return cumC

    def run_policy_sample_paths(self, theta, piName, params):
        FhatIomega__lI = []
        for l in range(1, params['L'] + 1): # for each sample-path
            model_copy = copy(self.model)
            record_l = [piName, theta, l]
            T = params['T']
            buyList = []
            sellList = []
            for t in range(T): # for each transition/step
                x_t = getattr(self, piName)(t, model_copy.S_t, theta, params['T'])
                if x_t.x_t_GB > 0:
                    buyList.append((t, model_copy.S_t.R_t, model_copy.S_t.p_t, x_t.x_t_GB, params['R__max']))
                elif x_t.x_t_GB < 0:
                    sellList.append((t, model_copy.S_t.R_t, model_copy.S_t.p_t, x_t.x_t_GB, params['R__max']))
                _, _, _ = model_copy.step(t, x_t)
            FhatIomega__lI.append(model_copy.cumC) #. just above (SDAM-eq2.9); Fhat for this sample-path is in model_copy.cumC
        # end L
        return FhatIomega__lI

    def perform_grid_search(self, params, thetas):
        tS = time.time()
        cumCI_theta_I = {}
        bestTheta = None
        i = 0; print(f'... printing every 100th theta ...')
        for theta in thetas:
            if i%100 == 0: print(f'=== {theta=} ===')
            cumC = self.run_policy(theta, "X__BuyLowSellHigh", params)
            cumCI_theta_I[theta] = cumC
            best_theta = max(cumCI_theta_I, key=cumCI_theta_I.get)
            print(f"Finishing theta {theta} with cumC {cumC:.2f}. Best theta so far {best_theta}. Best cumC {cumCI_theta_I[best_theta]:,}")
            i += 1
        print(f"Finishing GridSearch in {time.time() - tS:.2f} secs")
        print(f"Best theta: {best_theta}. Best cumC: {cumCI_theta_I[best_theta]:,}")
        return cumCI_theta_I

    def perform_grid_search_sample_paths(self, params, thetas):
        tS = time.time()
        Fhat_mean = None
        Fhat_var = None
        Fhat__meanI_th_I = {}
        Fhat__stdvI_th_I = {}
        i = 0; print(f'... printing every 100th theta ...')
        for theta in thetas:
            if i%100 == 0: print(f'=== {theta=} ===')
            FhatIomega__lI = self.run_policy_sample_paths(theta, "X__BuyLowSellHigh", params)
            Fhat_mean = np.array(FhatIomega__lI).mean() #. (SDAM-eq2.9); call Fbar in future
            Fhat_var = np.sum(np.square(np.array(FhatIomega__lI) - Fhat_mean))/(params['L'] - 1)
            Fhat__meanI_th_I[theta] = Fhat_mean
            Fhat__stdvI_th_I[theta] = np.sqrt(Fhat_var/params['L'])
            best_theta = max(Fhat__meanI_th_I, key=Fhat__meanI_th_I.get)
            i += 1
        print(f"Finishing GridSearch in {time.time() - tS:.2f} secs")
        print(f"Best theta: {best_theta}. Best cumC {Fhat__meanI_th_I[best_theta]:,}")
        return Fhat__meanI_th_I, Fhat__stdvI_th_I

    def grid_search_theta_values(self, thetas1, thetas2): #. using vectors reduces loops in perform_grid_search_sample_paths()
        thetas = [(th1, th2) for th1 in thetas1 for th2 in thetas2]
        return thetas

    def plot_Fhat_map(self, FhatI_theta_I, thetasX, thetasY, labelX, labelY, title):
        Fhat_values = [FhatI_theta_I[(thetaX, thetaY)] for thetaY in thetasY for thetaX in thetasX]
        Fhats = np.array(Fhat_values)
        increment_count = len(thetasX)
        Fhats = np.reshape(Fhats, (-1, increment_count))
        fig, ax = plt.subplots()
        im = ax.imshow(Fhats, cmap='hot', origin='lower', aspect='auto')
        # create colorbar
        cbar = ax.figure.colorbar(im, ax=ax)
        # show every 5th tick ...
        ax.set_xticks(np.arange(0, len(thetasX), 5))
        ax.set_yticks(np.arange(0, len(thetasY), 5))
        # ... and label them with the respective list entries
        ax.set_xticklabels(thetasX[::5])
        ax.set_yticklabels(thetasY[::5])
        ax.set_title(title)
        ax.set_xlabel(labelX)
        ax.set_ylabel(labelY)
        plt.show()
        return True

    def plot_Fhat_chart(self, FhatI_theta_I, thetasX, labelX, labelY, title, color_style):
        mpl.rcParams['lines.linewidth'] = 1.2
        xylabelsize = 18
        plt.figure(figsize=(25, 8))
        plt.title(title, fontsize=20)
        Fhats = FhatI_theta_I.values()
        plt.plot(thetasX, Fhats, color_style)
        plt.xlabel(labelX, rotation=0, ha='right', va='center', fontweight='bold', size=xylabelsize)
        plt.ylabel(labelY, rotation=0, ha='right', va='center', fontweight='bold', size=xylabelsize)
        plt.show()
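As a quick standalone sanity check (with small hypothetical grids), the theta grid used by the search is just the Cartesian product of the candidate buy and sell thresholds, the same construction as in grid_search_theta_values():

```python
import numpy as np

thetas1 = np.arange(10, 40, 10)   # candidate theta__buy values: 10, 20, 30
thetas2 = np.arange(50, 80, 10)   # candidate theta__sell values: 50, 60, 70

# Cartesian product of buy and sell thresholds
thetas = [(th1, th2) for th1 in thetas1 for th2 in thetas2]
print(len(thetas))   # 9 candidate (theta__buy, theta__sell) pairs
print(thetas[0])
```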
4.6 Evaluating Policies
4.6.1 Training/Tuning
# UPDATE PARAMETERS
params.update({'Algorithm': 'GridSearch'}); pprint(f'{params=}')
params.update({'R__max': 1})
params.update({'R_0': 0})
params.update({'L': 500})
params.update({'T': 100}) # must be < 192 due to exog data
params