Help improve my my python code for Exponential Smoothing Sumitted to Statsmodel

Question 1

I just submitted this pull request to statsmodels (a Python toolkit for statistical and econometric modeling). It adds support for exponential smoothing of time series.

I have only been programming for a few months. I am not very good with Numpy, and just learning classes and decorators, so any help would be appreciated.

The main part of the code is below. Can anyone help with shortening this so it is more efficient? Especially around the for loop. The complete file is here.

import numpy as np
import statsmodels.tools.eval_measures as em
#debug use
numpy.set_printoptions(suppress=True)
def exp_smoothing(y, alpha, gamma, delta=0, cycle=None, damp=1, initial=None,
 trend='additive', forecast=None, season='additive', output='data'):
 '''
 Exponential Smoothing
 This function handles 15 different Standard Exponential Smoothing models
 Parameters
 ----------
 y: array
 Time series data
 alpha: float
 Smoothing factor for data between 0 and 1.
 gamma: non zero integer or float
 Smoothing factor for trend generally between 0 and 1
 delta: non zero integer or float
 Smoothing factor for season generally between 0 and 1
 cycle: int
 Length of cycles in a season. (ie: 12 for months, 4 for quarters)
 damp: non zero integer or float {default = 1}
 Autoregressive or damping parameter specifies a rate of decay
 in the trend. Generally 0>d>1
 initial: str, {'3avg'}(Optional)
 Indicate initial point for bt and y
 default: bt = y[0]-y[1], st = y[0]
 3avg: Yields the average of the first 3 differences for bt.
 trend: str, {'additive','multiplicative', 'brown'}
 Indicate model type of trend default is 'additive'
 additive: uses additive models such as Holt's linear & Damped-Trend
 Linear Exponential Smoothing. Generalized as:
 s_t = a * y_t + (1-a) * (s_t-1 + b_t-1)
 b_t = g *(s_t - s_t-1) + (1 - g) * b_t-1
 multiplicative: uses multiplicative models such as Exponential trend &
 Taylor's modification of Pegels' model. Generalized as:
 s_t = a * y_t + (1-a) * (s_t-1 * b_t-1)
 b_t = g *(s_t / s_t-1) + (1 - g) * b_t-1
 brown: used to deal with the special cases in Brown linear smoothing
 forecast: int (Optional)
 Number of periods ahead.
 season: str, {'additive','multiplicative'}
 Indicate type of season default is 'additive'
 output: str, {'data', 'describe','forecast'}(Not implemented)
 Returns
 -------
 pdata: array
 Data that is smoothened using model chosen
 Notes
 -----
 This function is able to perform the following algorithms:
 Simple Exponential Smoothing(SES)
 Simple Seasonal models (both multiplicative and additive)
 Brown's Linear Exponential Smoothing
 Holt's Double Exponential Smoothing
 Exponential trend method
 Damped-Trend Linear Exponential Smoothing
 Multiplicative damped trend (Taylor 2003)
 Holt-Winters Exponential Smoothing:
 multiplicative trend, additive trend, and damped models for both
 multiplicative season, additive season, and damped models for both
 References
 ----------
 *Wikipedia
 *Forecasting: principles and practice by Hyndman & Athanasopoulos
 *NIST.gov http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc433.htm
 *Oklahoma State SAS chapter 30 section 11
 *IBM SPSS Custom Exponential Smoothing Models
 '''
 #Initialize data
 y = np.asarray(y)
 ylen = len(y)
 if ylen <= 3:
 raise ValueError("Cannot implement model, must have at least 4 data points")
 if alpha == 0:
 raise ValueError("Cannot fit model, alpha must not be 0")
 #forcast length
 if forecast >= 1:
 ylen += 1
 #Setup array lengths
 sdata = np.zeros(ylen - 2)
 bdata = np.zeros(ylen - 2)
 # Setup seasonal values
 if type(cycle) is int:
 if ylen < 2 * cycle:
 raise ValueError("Cannot implement model, must be 2 at least cycles long")
 #Setting b1 value
 bdata[0] = np.mean((y[cycle:2 * cycle] - y[:cycle]) / float(cycle))
 #Setup initial seasonal indicies
 #coerce cycle start lengths
 if len(y) % cycle != 0:
 cdata = y[:len(y) % cycle*-1].reshape(-1, cycle)
 else:
 cdata = y.reshape(-1, cycle)
 cdata = (cdata / cdata.mean(axis=1).reshape(-1, 1)).mean(axis=0)
 cdata = np.concatenate([cdata, np.zeros(ylen - 3)])
 else:
 #Initialize to bypass delta function with 0
 cycle = 0
 cdata = np.zeros(ylen)
 #Setting b1 value
 if gamma == 0:
 bdata[0] = 0
 elif initial == '3avg':
 bdata[0] = np.mean(abs(np.diff(y[:4])))
 else:
 bdata[0] = abs(y[0] - y[1])
 #Setting s1 value
 sdata[0] = y[0]
 #Equations & Update ylen to align array
 ylen -= 3
 for i in range(ylen):
 s = sdata[i]
 b = bdata[i]
 #handles multiplicative seasons
 if season == 'multiplicative':
 if trend == 'multiplicative':
 sdata[i + 1] = alpha * (y[i + 2] / cdata[i]) + (1 - alpha) * s * (b**damp)
 bdata[i + 1] = gamma * (sdata[i + 1] / s) + (1 - gamma) * (b ** damp)
 cdata[i + cycle] = delta * (y[i + 2] / sdata[i + 1]) + (1 - delta) * cdata[i]
 #handles additive models
 else:
 sdata[i + 1] = alpha * (y[i + 2] / cdata[i]) + (1 - alpha) * (s + damp * b)
 bdata[i + 1] = gamma * (sdata[i + 1] - s) + (1 - gamma) * damp * b
 cdata[i + cycle] = delta * (y[i + 2] / sdata[i + 1]) + (1 - delta) * cdata[i]
 else:
 if trend == 'multiplicative':
 sdata[i + 1] = alpha * (y[i + 2] - cdata[i]) + (1 - alpha) * s * (b**damp)
 bdata[i + 1] = gamma * (sdata[i + 1] / s) + (1 - gamma) * (b ** damp)
 cdata[i + cycle] = delta * (y[i + 2] - sdata[i + 1]) + (1 - delta) * cdata[i]
 #handles additive models
 else:
 sdata[i + 1] = alpha * (y[i + 2] - cdata[i]) + (1 - alpha) * (s + damp * b)
 bdata[i + 1] = gamma * (sdata[i + 1] - s) + (1 - gamma) * damp * b
 cdata[i + cycle] = delta * (y[i + 2] - sdata[i + 1]) + (1 - delta) * cdata[i]
 #debug
## print 'period=', i ,'cdata=', cdata[i+cycle],'sdata=', sdata[i]
 #Temporary fix: back fill data with Nan to align with y
 filx = [np.nan,np.nan]
 bdata = np.concatenate([filx, bdata])
 sdata = np.concatenate([filx, sdata])
 if season == 'multiplicative':
 if trend == 'multiplicative':
 pdata = (sdata * bdata) * cdata[:len(cdata) - cycle+3]
 else:
 pdata = (sdata + bdata) * cdata[:len(cdata) - cycle+3]
 else:
 if trend == 'multiplicative':
 pdata = sdata * bdata + cdata[:len(cdata) - cycle+3]
 #Handles special case for Brown linear
 elif trend == 'brown':
 at = 2 * sdata - bdata
 bt = alpha / (1 - alpha) * (sdata - bdata)
 sdata = at
 bdata = bt
 pdata = sdata + bdata + cdata[:len(cdata) - cycle+3]
 else:
 pdata = sdata + bdata + cdata[:len(cdata) - cycle+3]
 #Calculations for summary items (NOT USED YET)
 x1 = y[2:]
 x2 = pdata[2:len(y)]
 rmse = em.rmse(x1,x2)
 #forcast
 if forecast >= 2:
 #Configure damp
 if damp == 1:
 m = np.arange(2, forecast+1)
 else:
 m = np.cumsum(damp ** np.arange(1, forecast+1))
 m = m[1:]
 #Config season
 if cycle == 0:
 if season == 'multiplicative':
 c = 1
 else:
 c = 0
 elif forecast > cycle:
 raise NotImplementedError("Forecast beyond cycle length is unavailable")
 else:
 c = cdata[ylen+1:]
 c = c[:forecast-1]
 #Config trend
 if season == 'multiplicative':
 if trend == 'multiplicative':
 fdata = sdata[ylen] * (bdata[ylen] ** m) * c
 else:
 fdata = sdata[ylen] + m * bdata[ylen] * c
 else:
 if trend == 'multiplicative':
 fdata = sdata[ylen] * (bdata[ylen] ** m) + c
 else:
 fdata = sdata[ylen] + m * bdata[ylen] + c
 #debug
## print 'fdata=', fdata, 'sdata=', sdata[ylen], 'm=', m, 'bdata', bdata[ylen], 'c=', c
 pdata = np.append(pdata, fdata)
 #debug
 print 'fdata=', fdata
 print 'sdata=', sdata
 print 'bdata=', bdata
 print 'cdata=', cdata
 print 'pdata=', pdata
 return pdata
 else:
 print 'sdata=', sdata
 print 'bdata=', bdata
 print 'cdata=', cdata
 print 'pdata=', pdata
 return pdata

Here is the test data and to check it is here everything in blue

#-------Test-------
#(y, alpha, gamma, delta = 0, cycle = None, damp = 1, initial = None,
# trend = 'additive', forecast=None, season = 'additive', output='data')
y=[152,176,160,192,220,272,256,280,300,280,312,328]
exp_smoothing(y,0.2, 0.3,0,None,1, None, 'additive', 3)

Question 2

You have cut out all the documentation from the code you posted. How do you expect us to review it without knowing what it is supposed to do or how to use it? Please restore the documentation!

Question 3

@Gareth Rees You can download the code at github where the pull is.

Question 4

We'd like you to post all the code that you want reviewing here. See the help: "If you want a code review, you must post the relevant snippets of code in your question." And obviously the documentation is a relevant part of what you want us to review.

Question 5

@GarethRees Alright I will post the documentation for the main function that does all the work. But I am not posting the wrappers because they just make it easier to implement specific conditions. It would be too long.

Question 6

Thank you — documentation for exp_smoothing is fine.

Question 7

Some quick comments just based on reading the documentation and the first few lines:

Typos, inconsistencies, and omissions:
- The first line should be a summary of the behaviour of the function, not a title, so something like "Exponentially smooth a time series".
- There's no need for capital letters in "Standard Exponential Smoothing" or in "Damped-Trend Linear Exponential Smoothing".
- There's an extra space in "alpha: float".
- In "Smoothing factor for data between 0 and 1" there needs to be punctuation between "data" and "between". (A similar comment applies to some other parameters.)
- Does "between 0 and 1" include the endpoints or not?
- Sometimes you have a full stop at the end of sentences and sometimes you don't.
- "non-zero" needs a hyphen.
- "Length of cycles in a season" should be "Number of cycles in a season".
- Write "i.e.," instead of "ie:", but in this case I think "for example," would be better.
- What does "generally between 0 and 1" mean?
- "0>d>1" is an impossible condition.
- What is "bt"?
- "Indicate model type of trend default is 'additive'" make no sense in English.
- "used to deal with the special cases" — deal with them how?
- "Number of periods ahead" — ahead of what?
- "Not implemented" — suggests that this code is not ready for merge.
You write, "This function is able to perform the following algorithms" but you don't say how. It would be worth explaining what parameters to pass in order to get each of these behaviours.
I don't like the way you've used the word "season". In English a "season" is one fourth of a year, or, metaphorically, one part of a cycle. But in "Length of cycles in a season" you appear to be using the word to mean the whole cycle. This can only be confusing to the reader.
You've left in a lot of debugging print statements. But these should surely be removed before issuing the pull request. It looks as though it would be worth your while learning to use Python's debugging facilities, then you wouldn't need to use debugging print statements.
alpha needs to be "between 0 and 1", but all you actually check is alpha == 0.
Some of the other parameters have restricted values. For example, trend must be 'additive', 'multiplicative', or 'brown', but you don't actually check its value.
Generally it's best to check all your parameters first, rather than doing half the computation and then checking some of the parameters.
You asked whether "something like this [would] be better for a class?" A class represents a group of things with common behaviour. But you don't seem to have any thing here, so there's no need for a class.

Gareth Rees Gareth Rees 50.1k3 gold badges130 silver badges210 bronze badges · Answer 1 · 2014-01-04 13:08:55Z

Some quick comments just based on reading the documentation and the first few lines:

Typos, inconsistencies, and omissions:
- The first line should be a summary of the behaviour of the function, not a title, so something like "Exponentially smooth a time series".
- There's no need for capital letters in "Standard Exponential Smoothing" or in "Damped-Trend Linear Exponential Smoothing".
- There's an extra space in "alpha: float".
- In "Smoothing factor for data between 0 and 1" there needs to be punctuation between "data" and "between". (A similar comment applies to some other parameters.)
- Does "between 0 and 1" include the endpoints or not?
- Sometimes you have a full stop at the end of sentences and sometimes you don't.
- "non-zero" needs a hyphen.
- "Length of cycles in a season" should be "Number of cycles in a season".
- Write "i.e.," instead of "ie:", but in this case I think "for example," would be better.
- What does "generally between 0 and 1" mean?
- "0>d>1" is an impossible condition.
- What is "bt"?
- "Indicate model type of trend default is 'additive'" make no sense in English.
- "used to deal with the special cases" — deal with them how?
- "Number of periods ahead" — ahead of what?
- "Not implemented" — suggests that this code is not ready for merge.
You write, "This function is able to perform the following algorithms" but you don't say how. It would be worth explaining what parameters to pass in order to get each of these behaviours.
I don't like the way you've used the word "season". In English a "season" is one fourth of a year, or, metaphorically, one part of a cycle. But in "Length of cycles in a season" you appear to be using the word to mean the whole cycle. This can only be confusing to the reader.
You've left in a lot of debugging print statements. But these should surely be removed before issuing the pull request. It looks as though it would be worth your while learning to use Python's debugging facilities, then you wouldn't need to use debugging print statements.
alpha needs to be "between 0 and 1", but all you actually check is alpha == 0.
Some of the other parameters have restricted values. For example, trend must be 'additive', 'multiplicative', or 'brown', but you don't actually check its value.
Generally it's best to check all your parameters first, rather than doing half the computation and then checking some of the parameters.
You asked whether "something like this [would] be better for a class?" A class represents a group of things with common behaviour. But you don't seem to have any thing here, so there's no need for a class.

Stack Exchange Network

Help improve my my python code for Exponential Smoothing Sumitted to Statsmodel

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Help improve my my python code for Exponential Smoothing Sumitted to Statsmodel

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions