I just submitted this pull request to statsmodels
(a Python toolkit for statistical and econometric modeling). It adds support for exponential smoothing of time series.
I have only been programming for a few months. I am not very good with Numpy, and just learning classes and decorators, so any help would be appreciated.
The main part of the code is below. Can anyone help with shortening this so it is more efficient? Especially around the for
loop. The complete file is here.
import numpy as np
import statsmodels.tools.eval_measures as em
#debug use
numpy.set_printoptions(suppress=True)
def exp_smoothing(y, alpha, gamma, delta=0, cycle=None, damp=1, initial=None,
trend='additive', forecast=None, season='additive', output='data'):
'''
Exponential Smoothing
This function handles 15 different Standard Exponential Smoothing models
Parameters
----------
y: array
Time series data
alpha: float
Smoothing factor for data between 0 and 1.
gamma: non zero integer or float
Smoothing factor for trend generally between 0 and 1
delta: non zero integer or float
Smoothing factor for season generally between 0 and 1
cycle: int
Length of cycles in a season. (ie: 12 for months, 4 for quarters)
damp: non zero integer or float {default = 1}
Autoregressive or damping parameter specifies a rate of decay
in the trend. Generally 0>d>1
initial: str, {'3avg'}(Optional)
Indicate initial point for bt and y
default: bt = y[0]-y[1], st = y[0]
3avg: Yields the average of the first 3 differences for bt.
trend: str, {'additive','multiplicative', 'brown'}
Indicate model type of trend default is 'additive'
additive: uses additive models such as Holt's linear & Damped-Trend
Linear Exponential Smoothing. Generalized as:
s_t = a * y_t + (1-a) * (s_t-1 + b_t-1)
b_t = g *(s_t - s_t-1) + (1 - g) * b_t-1
multiplicative: uses multiplicative models such as Exponential trend &
Taylor's modification of Pegels' model. Generalized as:
s_t = a * y_t + (1-a) * (s_t-1 * b_t-1)
b_t = g *(s_t / s_t-1) + (1 - g) * b_t-1
brown: used to deal with the special cases in Brown linear smoothing
forecast: int (Optional)
Number of periods ahead.
season: str, {'additive','multiplicative'}
Indicate type of season default is 'additive'
output: str, {'data', 'describe','forecast'}(Not implemented)
Returns
-------
pdata: array
Data that is smoothened using model chosen
Notes
-----
This function is able to perform the following algorithms:
Simple Exponential Smoothing(SES)
Simple Seasonal models (both multiplicative and additive)
Brown's Linear Exponential Smoothing
Holt's Double Exponential Smoothing
Exponential trend method
Damped-Trend Linear Exponential Smoothing
Multiplicative damped trend (Taylor 2003)
Holt-Winters Exponential Smoothing:
multiplicative trend, additive trend, and damped models for both
multiplicative season, additive season, and damped models for both
References
----------
*Wikipedia
*Forecasting: principles and practice by Hyndman & Athanasopoulos
*NIST.gov http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc433.htm
*Oklahoma State SAS chapter 30 section 11
*IBM SPSS Custom Exponential Smoothing Models
'''
#Initialize data
y = np.asarray(y)
ylen = len(y)
if ylen <= 3:
raise ValueError("Cannot implement model, must have at least 4 data points")
if alpha == 0:
raise ValueError("Cannot fit model, alpha must not be 0")
#forcast length
if forecast >= 1:
ylen += 1
#Setup array lengths
sdata = np.zeros(ylen - 2)
bdata = np.zeros(ylen - 2)
# Setup seasonal values
if type(cycle) is int:
if ylen < 2 * cycle:
raise ValueError("Cannot implement model, must be 2 at least cycles long")
#Setting b1 value
bdata[0] = np.mean((y[cycle:2 * cycle] - y[:cycle]) / float(cycle))
#Setup initial seasonal indicies
#coerce cycle start lengths
if len(y) % cycle != 0:
cdata = y[:len(y) % cycle*-1].reshape(-1, cycle)
else:
cdata = y.reshape(-1, cycle)
cdata = (cdata / cdata.mean(axis=1).reshape(-1, 1)).mean(axis=0)
cdata = np.concatenate([cdata, np.zeros(ylen - 3)])
else:
#Initialize to bypass delta function with 0
cycle = 0
cdata = np.zeros(ylen)
#Setting b1 value
if gamma == 0:
bdata[0] = 0
elif initial == '3avg':
bdata[0] = np.mean(abs(np.diff(y[:4])))
else:
bdata[0] = abs(y[0] - y[1])
#Setting s1 value
sdata[0] = y[0]
#Equations & Update ylen to align array
ylen -= 3
for i in range(ylen):
s = sdata[i]
b = bdata[i]
#handles multiplicative seasons
if season == 'multiplicative':
if trend == 'multiplicative':
sdata[i + 1] = alpha * (y[i + 2] / cdata[i]) + (1 - alpha) * s * (b**damp)
bdata[i + 1] = gamma * (sdata[i + 1] / s) + (1 - gamma) * (b ** damp)
cdata[i + cycle] = delta * (y[i + 2] / sdata[i + 1]) + (1 - delta) * cdata[i]
#handles additive models
else:
sdata[i + 1] = alpha * (y[i + 2] / cdata[i]) + (1 - alpha) * (s + damp * b)
bdata[i + 1] = gamma * (sdata[i + 1] - s) + (1 - gamma) * damp * b
cdata[i + cycle] = delta * (y[i + 2] / sdata[i + 1]) + (1 - delta) * cdata[i]
else:
if trend == 'multiplicative':
sdata[i + 1] = alpha * (y[i + 2] - cdata[i]) + (1 - alpha) * s * (b**damp)
bdata[i + 1] = gamma * (sdata[i + 1] / s) + (1 - gamma) * (b ** damp)
cdata[i + cycle] = delta * (y[i + 2] - sdata[i + 1]) + (1 - delta) * cdata[i]
#handles additive models
else:
sdata[i + 1] = alpha * (y[i + 2] - cdata[i]) + (1 - alpha) * (s + damp * b)
bdata[i + 1] = gamma * (sdata[i + 1] - s) + (1 - gamma) * damp * b
cdata[i + cycle] = delta * (y[i + 2] - sdata[i + 1]) + (1 - delta) * cdata[i]
#debug
## print 'period=', i ,'cdata=', cdata[i+cycle],'sdata=', sdata[i]
#Temporary fix: back fill data with Nan to align with y
filx = [np.nan,np.nan]
bdata = np.concatenate([filx, bdata])
sdata = np.concatenate([filx, sdata])
if season == 'multiplicative':
if trend == 'multiplicative':
pdata = (sdata * bdata) * cdata[:len(cdata) - cycle+3]
else:
pdata = (sdata + bdata) * cdata[:len(cdata) - cycle+3]
else:
if trend == 'multiplicative':
pdata = sdata * bdata + cdata[:len(cdata) - cycle+3]
#Handles special case for Brown linear
elif trend == 'brown':
at = 2 * sdata - bdata
bt = alpha / (1 - alpha) * (sdata - bdata)
sdata = at
bdata = bt
pdata = sdata + bdata + cdata[:len(cdata) - cycle+3]
else:
pdata = sdata + bdata + cdata[:len(cdata) - cycle+3]
#Calculations for summary items (NOT USED YET)
x1 = y[2:]
x2 = pdata[2:len(y)]
rmse = em.rmse(x1,x2)
#forcast
if forecast >= 2:
#Configure damp
if damp == 1:
m = np.arange(2, forecast+1)
else:
m = np.cumsum(damp ** np.arange(1, forecast+1))
m = m[1:]
#Config season
if cycle == 0:
if season == 'multiplicative':
c = 1
else:
c = 0
elif forecast > cycle:
raise NotImplementedError("Forecast beyond cycle length is unavailable")
else:
c = cdata[ylen+1:]
c = c[:forecast-1]
#Config trend
if season == 'multiplicative':
if trend == 'multiplicative':
fdata = sdata[ylen] * (bdata[ylen] ** m) * c
else:
fdata = sdata[ylen] + m * bdata[ylen] * c
else:
if trend == 'multiplicative':
fdata = sdata[ylen] * (bdata[ylen] ** m) + c
else:
fdata = sdata[ylen] + m * bdata[ylen] + c
#debug
## print 'fdata=', fdata, 'sdata=', sdata[ylen], 'm=', m, 'bdata', bdata[ylen], 'c=', c
pdata = np.append(pdata, fdata)
#debug
print 'fdata=', fdata
print 'sdata=', sdata
print 'bdata=', bdata
print 'cdata=', cdata
print 'pdata=', pdata
return pdata
else:
print 'sdata=', sdata
print 'bdata=', bdata
print 'cdata=', cdata
print 'pdata=', pdata
return pdata
Here is the test data and to check it is here everything in blue
#-------Test-------
#(y, alpha, gamma, delta = 0, cycle = None, damp = 1, initial = None,
# trend = 'additive', forecast=None, season = 'additive', output='data')
y=[152,176,160,192,220,272,256,280,300,280,312,328]
exp_smoothing(y,0.2, 0.3,0,None,1, None, 'additive', 3)
1 Answer 1
Some quick comments just based on reading the documentation and the first few lines:
Typos, inconsistencies, and omissions:
- The first line should be a summary of the behaviour of the function, not a title, so something like "Exponentially smooth a time series".
- There's no need for capital letters in "Standard Exponential Smoothing" or in "Damped-Trend Linear Exponential Smoothing".
- There's an extra space in "alpha: float".
- In "Smoothing factor for data between 0 and 1" there needs to be punctuation between "data" and "between". (A similar comment applies to some other parameters.)
- Does "between 0 and 1" include the endpoints or not?
- Sometimes you have a full stop at the end of sentences and sometimes you don't.
- "non-zero" needs a hyphen.
- "Length of cycles in a season" should be "Number of cycles in a season".
- Write "i.e.," instead of "ie:", but in this case I think "for example," would be better.
- What does "generally between 0 and 1" mean?
- "0>d>1" is an impossible condition.
- What is "bt"?
- "Indicate model type of trend default is 'additive'" make no sense in English.
- "used to deal with the special cases" — deal with them how?
- "Number of periods ahead" — ahead of what?
- "Not implemented" — suggests that this code is not ready for merge.
You write, "This function is able to perform the following algorithms" but you don't say how. It would be worth explaining what parameters to pass in order to get each of these behaviours.
I don't like the way you've used the word "season". In English a "season" is one fourth of a year, or, metaphorically, one part of a cycle. But in "Length of cycles in a season" you appear to be using the word to mean the whole cycle. This can only be confusing to the reader.
You've left in a lot of debugging print statements. But these should surely be removed before issuing the pull request. It looks as though it would be worth your while learning to use Python's debugging facilities, then you wouldn't need to use debugging print statements.
alpha
needs to be "between 0 and 1", but all you actually check isalpha == 0
.Some of the other parameters have restricted values. For example,
trend
must be'additive'
,'multiplicative'
, or'brown'
, but you don't actually check its value.Generally it's best to check all your parameters first, rather than doing half the computation and then checking some of the parameters.
You asked whether "something like this [would] be better for a class?" A class represents a group of things with common behaviour. But you don't seem to have any thing here, so there's no need for a class.
exp_smoothing
is fine. \$\endgroup\$