I wrote a small Python script to run a backtest. My Rebalance
function is very slow and I would like to improve it. I still need to profile it, but I would welcome any opinions on how to refactor the code cleanly.
def Rebalance(self, currentDate, data, VolatilityForSizing,Wthtslippage=True):
    # Handle open positions
    yesterday = max(self.PtfHandler.Positions.index[self.PtfHandler.Positions.index<currentDate])
    CurrentPositions = self.PtfHandler.Positions.loc[yesterday]
    self.PtfHandler.Positions.loc[currentDate]=CurrentPositions.values
    OpenPositions = CurrentPositions[CurrentPositions !=0].keys()
    NotOpenPositions = set(CurrentPositions.keys())-set(OpenPositions)
    # Save Metrics in Ptf
    PositionPnL = dict.fromkeys(NotOpenPositions,0)
    PtfMetrics = dict()
    PtfMetrics['Date'] = currentDate
    PtfMetrics['TradedPositions'] = len(OpenPositions)
    SumReturn = 0
    # FirstSignal at current date from FirstSignalHandler
    FirstSignalSetMax = set(CurrentPositions[self.FirstSignalHandler.FirstSignalSetMax.loc[currentDate]].keys())
    FirstSignalSetMin = set(CurrentPositions[self.FirstSignalHandler.FirstSignalSetMin.loc[currentDate]].keys())
    goLongCandidate = self.SecondSignal.initiateLong.ix[currentDate]
    goShortCandidate = self.SecondSignal.initiateShort.ix[currentDate]
    goLongCandidate = goLongCandidate[goLongCandidate>0].keys() # turn goLongCandidate which is a serie into a list
    goShortCandidate = goShortCandidate[goShortCandidate>0].keys()
    PotentialLongToClose = set(CurrentPositions[CurrentPositions.isin(OpenPositions) & CurrentPositions>0].keys())
    LongToClose = list(PotentialLongToClose.intersection(FirstSignalSetMax))
    self.PtfHandler.Positions.loc[currentDate,LongToClose] = 0
    for stock in LongToClose:
        self.PtfHandler.saveStrategyEndDate(stock,currentDate)
        pnl = self.PtfHandler.getNonTradedPositionPnL(currentDate,yesterday,stock,CurrentPositions[stock],data,self.FirstSignalHandler.VolOnEntry)
        pnl = pnl + self.PtfHandler.getSlippageImpact(stock,CurrentPositions[stock],self.FirstSignalHandler.VolOnEntry,saveReturnWthSlippage=Wthtslippage)
        PositionPnL.update({stock:pnl})
        SumReturn = SumReturn + pnl
    PotentialShortToClose = set(CurrentPositions[CurrentPositions.isin(OpenPositions) & CurrentPositions<0].keys())
    ShortToClose = list(PotentialShortToClose.intersection(FirstSignalSetMin))
    self.PtfHandler.Positions.loc[currentDate,ShortToClose] = 0
    for stock in ShortToClose:
        self.PtfHandler.saveStrategyEndDate(stock,currentDate)
        pnl = self.PtfHandler.getNonTradedPositionPnL(currentDate,yesterday,stock,CurrentPositions[stock],data,self.FirstSignalHandler.VolOnEntry)
        pnl = pnl + self.PtfHandler.getSlippageImpact(stock,CurrentPositions[stock],self.FirstSignalHandler.VolOnEntry,saveReturnWthSlippage=Wthtslippage)
        PositionPnL.update({stock:pnl})
        SumReturn = SumReturn + pnl
    NewPotentialLong = set(goLongCandidate).intersection(NotOpenPositions)
    NewPotentialLong = list(NewPotentialLong.intersection(FirstSignalSetMax))
    for stock in NewPotentialLong:
        self.PtfHandler.saveStrategyStartDate(stock,currentDate)
        qty = self.AssetManager.getNewTradeWeight(currentDate, stock,VolatilityForSizing)
        self.PtfHandler.Positions.loc[currentDate,stock] =qty
        pnl = self.PtfHandler.getSlippageImpact(stock,qty,self.FirstSignalHandler.VolOnEntry,saveReturnWthSlippage=Wthtslippage)
        PositionPnL[stock]=pnl
        SumReturn = SumReturn + pnl
    NewPotentialShort = set(goShortCandidate).intersection(NotOpenPositions)
    NewPotentialShort = list(NewPotentialShort.intersection(FirstSignalSetMin))
    for stock in NewPotentialShort:
        self.PtfHandler.saveStrategyStartDate(stock,currentDate)
        qty = self.AssetManager.getNewTradeWeight(currentDate, stock,VolatilityForSizing)
        self.PtfHandler.Positions.loc[currentDate,stock] =-qty
        pnl = self.PtfHandler.getSlippageImpact(stock,qty,self.FirstSignalHandler.VolOnEntry,saveReturnWthSlippage=Wthtslippage)
        SumReturn = SumReturn + pnl
        PositionPnL[stock]= pnl
    # Save portfolio positions and returns given rebalancing
    NotTradedPositions = list(set(OpenPositions)-set(ShortToClose)-set(LongToClose))
    for stock in NotTradedPositions:
        pnl = self.PtfHandler.getNonTradedPositionPnL(currentDate,yesterday,stock,CurrentPositions[stock],data,self.FirstSignalHandler.VolOnEntry)
        SumReturn = SumReturn + pnl
        PositionPnL.update({stock:pnl})
    self.PtfHandler.AdjustedReturns = self.PtfHandler.AdjustedReturns.append(PositionPnL,ignore_index=True)
    PtfMetrics['PtfAdjReturn'] = SumReturn
    YesterdayCumReturn = self.PtfHandler.PtfAdjustedReturns.ix[-1]['CumReturn'] if not self.PtfHandler.PtfAdjustedReturns.empty else 0
    PtfMetrics['CumReturn'] = YesterdayCumReturn + SumReturn
    self.PtfHandler.PtfAdjustedReturns = self.PtfHandler.PtfAdjustedReturns.append(PtfMetrics,ignore_index=True)
Edit:
I just profiled the function. The first calculation, which gets yesterday's date, is very costly, as I suspected. DataFrame.append is also very costly, as is the PnL calculation. I probably need to improve the function.
1 Answer
As jonrsharpe said, read the Python style guide, PEP 8. One of the first things it suggests is that you use 4 spaces per indentation level; at the very least, you should use the same amount of indentation across your whole script.
It also suggests that you limit lines to 79 characters and that you put a space on each side of binary operators, so I'd change this:
yesterday = max(self.PtfHandler.Positions.index[self.PtfHandler.Positions.index<currentDate])
into this:
yesterday = max(self.PtfHandler.Positions.index[
self.PtfHandler.Positions.index < currentDate])
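Since profiling flagged this same line as costly, here is a minimal sketch of one possible alternative. It assumes the Positions index is a DatetimeIndex sorted in ascending order, which the question does not confirm; previous_date is a hypothetical helper name, not something from the original code. A binary search avoids building a boolean mask over the whole index on every call:

```python
import pandas as pd


def previous_date(index: pd.DatetimeIndex, current_date) -> pd.Timestamp:
    """Return the latest entry of index that is strictly before current_date.

    Assumption: index is sorted ascending. searchsorted then finds the
    insertion point by binary search instead of scanning every element.
    """
    # side="left" gives the first position whose value is >= current_date,
    # so the element just before it is the last one < current_date.
    pos = index.searchsorted(current_date, side="left")
    if pos == 0:
        raise ValueError("no date in the index before current_date")
    return index[pos - 1]


# Toy index standing in for self.PtfHandler.Positions.index.
dates = pd.DatetimeIndex(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-06"])
current = pd.Timestamp("2020-01-06")

yesterday = previous_date(dates, current)
```

On this toy index the helper returns the same date as max(dates[dates < current]); whether it is actually faster on the real data is something to confirm with the profiler.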
In particular, you shouldn't apply this asymmetrically. You really shouldn't ever write CurrentPositions !=0; ideally write CurrentPositions != 0, but even CurrentPositions!=0 is better than the asymmetric form.
You should also add a space after each comma in comma-separated lists. Treat it like written English, where a list of apples, bananas, pears, etc. has a space after each comma.
PositionPnL = dict.fromkeys(NotOpenPositions, 0)
It probably seems redundant there, but makes a big difference here:
pnl = self.PtfHandler.getNonTradedPositionPnL(currentDate,yesterday,stock,CurrentPositions[stock],data,self.FirstSignalHandler.VolOnEntry)
All that runs together and is harder to read. I'd prefer
pnl = self.PtfHandler.getNonTradedPositionPnL(currentDate, yesterday, stock,
CurrentPositions[stock], data, self.FirstSignalHandler.VolOnEntry)
Why use dict() instead of {}? It's arguably less clear to call the function, and it offers no real advantage.
Why are goLongCandidate and goShortCandidate built with two copies of the same code? The lookup and the filtering are identical apart from the attribute they read (initiateLong versus initiateShort), so it would be neater to write that logic once and reuse it for both.
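One way to avoid repeating the lookup-and-filter pair is a small helper, sketched below. It assumes initiateLong and initiateShort are DataFrames indexed by date with one column per stock, which the question suggests but does not state; positive_candidates and the toy frames are hypothetical. The sketch uses .loc because .ix has been removed from recent pandas versions.

```python
import pandas as pd


def positive_candidates(signal: pd.DataFrame, current_date) -> list:
    """Return the column labels whose value on current_date is > 0."""
    row = signal.loc[current_date]      # one Series: stock -> signal value
    return row[row > 0].index.tolist()  # keep only the positive entries


# Toy data standing in for SecondSignal.initiateLong / initiateShort.
dates = pd.to_datetime(["2020-01-02", "2020-01-03"])
initiate_long = pd.DataFrame({"AAA": [1, 0], "BBB": [0, 1]}, index=dates)
initiate_short = pd.DataFrame({"AAA": [0, 1], "BBB": [1, 0]}, index=dates)

go_long = positive_candidates(initiate_long, dates[0])
go_short = positive_candidates(initiate_short, dates[0])
```

With the helper in place, each candidate list is a single call, and any later change to the filtering rule only has to be made once.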
Also, inline comments should be short. If they're running long on an already long line, put them on the previous or next line instead. Really though, the comment feels unnecessary to me, since the line is quite clear in what it does.
goLongCandidate = goLongCandidate[goLongCandidate>0].keys() # turn goLongCandidate which is a serie into a list
The following line can be shortened and made clearer with the += operator, which adds the value on the right to the current value of the variable on the left.
SumReturn = SumReturn + pnl
becomes
SumReturn += pnl
I'll note that for keyword arguments you shouldn't put whitespace on either side of the = operator, so you were right to do that here:
self.PtfHandler.AdjustedReturns = self.PtfHandler.AdjustedReturns.append(PositionPnL,ignore_index=True)
...but still add the space after the comma.
append was 20x slower than building regular vanilla Python lists of each column and then constructing the dataframe by column.
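A minimal sketch of that column-list approach follows. The dates and returns are made up for illustration, and the column names are borrowed from PtfMetrics in the question; how it would slot into the asker's PtfHandler is an assumption. It is also worth knowing that DataFrame.append was removed in pandas 2.0, so accumulating in plain lists helps on that front as well.

```python
import pandas as pd

# Made-up daily returns standing in for the SumReturn computed per rebalance.
daily_returns = [
    ("2020-01-02", 0.01),
    ("2020-01-03", -0.02),
    ("2020-01-06", 0.03),
]

# One plain Python list per column; appending to a list is cheap,
# whereas appending to a DataFrame copies the whole frame each time.
dates, ptf_returns, cum_returns = [], [], []
running_total = 0.0
for date, ret in daily_returns:
    running_total += ret
    dates.append(pd.Timestamp(date))
    ptf_returns.append(ret)
    cum_returns.append(running_total)

# Build the DataFrame once, after the loop has finished.
ptf_adjusted_returns = pd.DataFrame(
    {"Date": dates, "PtfAdjReturn": ptf_returns, "CumReturn": cum_returns}
)
```

Keeping the running cumulative return in a local variable also removes the need to read the last row of the frame back on every iteration.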