Summary: performance issue with for-loop, vectorization possible?
I am processing a time series of securities price data, stored in a pandas dataframe df_buy_sell
. First, an initial function defines buy (+1) and sell (-1) signals df_buy_sell["Action"]
.
Input Data link to download
Date Open Adj_Close Action Signal Slippage Spread Sell_for Buy_for
8/21/2014 14.96 13.964128 1 1 0.002992 0.0374 14.919608 15.000392
8/29/2014 14.52 13.495533 -1 0 0.002904 0.0363 14.480796 14.559204
11/11/2014 14.8 14.307712 1 1 0.00296 0.037 14.76004 14.83996
8/17/2015 16.98 16.233931 -1 0 0.003396 0.04245 16.934154 17.025846
8/17/2016 12.82 11.867902 1 1 0.002564 0.03205 12.785386 12.854614
8/18/2016 11.8 11.290437 -1 0 0.00236 0.0295 11.76814 11.83186
2/22/2017 11.38 11.3 1 1 0.002276 0.02845 11.349274 11.410726
4/20/2017 10.8 10.8 -1 0 0.00216 0.027 10.77084 10.82916
4/27/2017 11.2 11.38 1 1 0.00224 0.028 11.16976 11.23024
5/12/2017 10.98 10.9 -1 0 0.002196 0.02745 10.950354 11.009646
5/23/2017 11.58 11.48 1 1 0.002316 0.02895 11.548734 11.611266
Based on the buy and sell signals in df_buy_sell["Action"]
, the trade()
function below buys and sells the maximum number of stocks possible at that time/in that row, given a specific amount of cash. As a result, the remaining amount of cash changes, so does the price of the stock in scope for the next buy/sell signal/row.
Therefore, I put a for index, row in df.iterrows()
loop to the core of this function. However, this loop is very slow and I have not been able to vectorize it so far.
Is it possible to vectorize this for
-loop?
Function
FX_rate = 9.22703405
initial_capital = 100000*FX_rate
def trade(df_buy_sell, initial_capital, FX_rate):
remaining_cash = initial_capital
# Create columns for row-wise operations
df_buy_sell["Positions"] = 0
df_buy_sell["Comission"] = None
df_buy_sell["SExchange"] = None
df_buy_sell["holdings"] = 0
df_buy_sell["Cash"] = None
## begin row-wise operations
# GO LONG
for index, row in df_buy_sell.iterrows():
if row["Action"] > 0:
start_positions = int(remaining_cash / row["Buy_for"]) # THIS GOES TO NEXT ROW
df_buy_sell.loc[index, "Positions"] = start_positions
df_buy_sell.loc[index, "Holdings"] = start_positions * row["Adj_Close"]
spent = start_positions * row["Buy_for"]
# Comission
comission = (4.90*FX_rate) + (0.0025 * spent)
if comission < (9.9*FX_rate):
comission = (9.9*FX_rate)
elif comission > (59.9*FX_rate):
comission = (59.9*FX_rate)
df_buy_sell.loc[index, "Comission"] = comission
# Exchange fees
exchange_fee = spent * 0.000025
if exchange_fee < (2.5*FX_rate):
exchange_fee = (2.5*FX_rate)
df_buy_sell.loc[index, "SExchange"] = exchange_fee
# Cash left after GO LONG
remaining_cash = remaining_cash - spent - exchange_fee - comission
df_buy_sell.loc[index, "Cash"] = remaining_cash
# GO SHORT
if row["Action"] < 0:
row["Positions"] = 0
row["Holdings"] = 0
earned = start_positions * row["Sell_for"]
# Comission
comission = (4.90*FX_rate) + (0.0025 * earned)
if comission < (9.9*FX_rate):
comission = (9.9*FX_rate)
elif comission > (59.9*FX_rate):
comission = (59.9*FX_rate)
df_buy_sell.loc[index, "Comission"] = comission
# Exchange fees
exchange_fee = earned * 0.000025
if exchange_fee < (2.5*FX_rate):
exchange_fee = (2.5*FX_rate)
df_buy_sell.loc[index, "SExchange"] = exchange_fee
# Cash left after GO LONG
remaining_cash = remaining_cash + earned - exchange_fee - comission # THIS GOES TO NEXT ROW
df_buy_sell.loc[index, "Cash"] = remaining_cash
## end row-wise operations
return df_buy_sell
Output Data link to download
Date Open Adj_Close Action Signal Slippage Spread Sell_for Buy_for Positions Comission SExchange Holdings Cash
8/21/2014 14.96 13.964128 1 1 0.002992 0.0374 14.919608 15.000392 61511 552.6993396 23.06758513 858947.4774 -561.4742367
8/29/2014 14.52 13.495533 -1 0 0.002904 0.0363 14.480796 14.559204 0 552.6993396 23.06758513 0 889591.0016
11/11/2014 14.8 14.307712 1 1 0.00296 0.037 14.76004 14.83996 59945 552.6993396 23.06758513 857675.7958 -566.1675302
8/17/2015 16.98 16.233931 -1 0 0.003396 0.04245 16.934154 17.025846 0 552.6993396 25.37794654 0 1013973.617
8/17/2016 12.82 11.867902 1 1 0.002564 0.03205 12.785386 12.854614 78880 552.6993396 25.34929881 936140.1098 -576.3842447
8/18/2016 11.8 11.290437 -1 0 0.00236 0.0295 11.76814 11.83186 0 552.6993396 23.20677208 0 927118.5928
2/22/2017 11.38 11.3 1 1 0.002276 0.02845 11.349274 11.410726 81249 552.6993396 23.17775192 918113.7 -567.3610219
4/20/2017 10.8 10.8 -1 0 0.00216 0.027 10.77084 10.82916 0 552.6993396 23.06758513 0 873976.8512
4/27/2017 11.2 11.38 1 1 0.00224 0.028 11.16976 11.23024 77823 552.6993396 23.06758513 885625.74 -569.8832313
5/12/2017 10.98 10.9 -1 0 0.002196 0.02745 10.950354 11.009646 0 552.6993396 23.06758513 0 851043.7492
5/23/2017 11.58 11.48 1 1 0.002316 0.02895 11.548734 11.611266 73294 552.6993396 23.06758513 841415.12 -568.1479428
1 Answer 1
Use Panda's masks,
df_buy_sell[condition]
lets you select all rows in the dataframe that matches your condition. You could then apply your entire function block to all the rows at once. For example,
df_buy = df_by_sell[df_by_sell.Action >0]
start_positions = remaining_cash / df_buy["Buy_for"] # THIS GOES TO NEXT ROW
df_buy_sell.loc[df_buy.index, "Positions"] = start_positions
df_buy_sell.loc[df_buy.index, "Holdings"] = start_positions * df_buy["Adj_Close"]
spent = start_positions * df_by["Buy_for"]
...
exchange_fee[exchange_fee < (2.5*FX_rate)] = (2.5*FX_rate)
...
Once you are selecting from the dataframe with masks, you can apply your mathematical functions to your masked dataframes just like any numpy array.
That being said, it looks like 90% of your functions for Action>0 and Action<0 are the same, so I would recommend writing these code blocks into functions for code maintainability, readability, and re-use.
If you had functions,
def calculate_commission(transaction_amount):
...
def calculate_exchange_fees(transaction_amount):
...
def calculate_remaining_cash():
...
You could update your dateframe with something like:
df_buy_sell.loc[df_buy.index, "Commission"] = calculate_commission(spent)
df_buy_sell.loc[df_buy.index, "Commission"] = calculate_commission(earned)
which would be much cleaner and easier to read. You also have a lot of re-used constants within your calculations(such as 4.90, 9.9,59.9), which I'd suggest setting variables for, like you did for FX_rate. It's likely that someone else reading your code will have no idea what these constants mean; you might not even remember what they mean if you were to re-visit this code in a week, or a month, or a year. If you stored them as variables instead, if at any point the value changes, maybe 4.90 rises to 5.30, you only have to update the 1 variable instead of every reference you currently have for it inside your code block.
Use masks to quickly modify dataframes and write functions/variables for anything you use more than once! :)
EDIT I can't find where in your function you are setting the value to the next row as mentioned in your title (additional reason to write code into functions for readability), but you can do this in pandas by using the shift function.
For example, df_by_sell[target_column] = df_by_sell[column_to_shift].shift(1)
Make your calculations within the row, then shift it to the position you want.
-
\$\begingroup\$ many thanks. I will refactor accordingly. "start_positions" and "remaining_cash" are variables that are defined/updated in both the "go long" and "go short" part, i.e. (Action >0) and (Action < 0) \$\endgroup\$sudonym– sudonym2017年09月05日 06:12:24 +00:00Commented Sep 5, 2017 at 6:12