\$\begingroup\$

I'm trying to apply a long set of conditions and operations to a pandas dataframe (see the dataframe below with VTI, upper, lower, etc.). I tried to use apply, but had a lot of trouble getting it to work. My current solution, which works, relies on a for loop iterating through the dataframe, but my sense is that this is an inefficient way to run my simulation. I'd appreciate help with the design of my code.

VTI    upper  lower  sell   buy    AU     BU     BL     date        Tok  order
44.58  NaN    NaN    False  False  False  False  False  2001-06-15  5    0
44.29  NaN    NaN    False  False  False  False  False  2001-06-18  5    1
44.42  NaN    NaN    False  False  False  False  False  2001-06-19  5    2
44.88  NaN    NaN    False  False  False  False  False  2001-06-20  5    3
45.24  NaN    NaN    False  False  False  False  False  2001-06-21  5    4

If I wanted to run a set of conditions and for loops like the ones below, and call the get_row_data function only when a row meets those conditions, how would I do so?

My intuition says to use .apply(), but I'm not clear on how to do it in this scenario. With all the ifs and for loops combined, it's a lot of rows. The code below actually outputs an entirely new dataframe. I'm wondering if there are more efficient or better ways to think about the design of this simulation/stock backtesting process.

get_row_data simply pulls key data from the dataframe, computes certain information based on globals (such as how much capital I already have and how many stocks I already hold), and returns a list. I append all these lists into a dataframe that I call the portfolio.

Here is a snippet of the code I've already written using a for loop.

    ## This is only for the sell portion of the algorithm
    if val['sell'] and tokens == maxtokens:
        print('nothing to sell')
    if val['sell'] and tokens < maxtokens:
        print('sellprice', price)
        # Choose the most expensive position and sell it
        portfolio = sorted(portfolio, key=lambda x: x[0], reverse=True)
        soldpositions = [position for position in portfolio if position[7] == 'sold']
        # Sort the filtered list, not the whole portfolio
        soldpositions = sorted(soldpositions, key=lambda x: x[1], reverse=True)
        for position in portfolio:
            # Position must exist
            if position[9]:
                print('position b4 sold', position)
                # Position's price must be LOWER than the current price
                # AND the difference between the position's price and the current price must be at least sellbuybuffer
                if abs(position[0] - price) <= sellbuybuffer:
                    print('does not meet sellbuybuffer')
                if position[0] < price and abs(position[0] - price) >= sellbuybuffer:
                    status = 'sold'
                    # This is if we have no sold positions yet
                    if len(soldpositions) == 0:
                        ## This is the function that is applied only if the row meets a large number of conditions
                        get_row_data(price, date, position, totalstocks, capital, status, tokens)
Jamal
asked Mar 5, 2014 at 15:07
\$\endgroup\$

1 Answer

\$\begingroup\$

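Pandas lets you filter dataframes efficiently with boolean indexing: a comparison on a column produces a boolean Series, and indexing the dataframe with that Series keeps only the rows where it is True.

Instead of using a for loop and conditional branching, combine the conditions with & (wrapping each comparison in parentheses):

df = portfolio[(portfolio['sell'] == True) & (portfolio['Tok'] < maxtokens)]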
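
As a rough illustration of the filter above, here is a minimal sketch on made-up data. The column names VTI, sell and Tok come from the question's dataframe; the values, the maxtokens setting, and the variable names can_sell, nothing_to_sell and sell_rows are assumptions chosen only for the example.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "VTI": [44.58, 44.29, 44.42, 44.88, 45.24],
    "sell": [False, True, False, True, True],
    "Tok": [5, 3, 5, 5, 2],
})
maxtokens = 5  # assumed value for illustration

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses matter because & binds more tightly than < and ==.
can_sell = df["sell"] & (df["Tok"] < maxtokens)
nothing_to_sell = df["sell"] & (df["Tok"] == maxtokens)

# Indexing with a boolean Series keeps only the rows where it is True.
sell_rows = df[can_sell]
print(sell_rows)
```

Each mask replaces one of the if branches in the loop, so the two cases ("nothing to sell" and "can sell") can be handled as two separate row selections.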

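To sort a dataframe, you can also simply use sort_values (the older DataFrame.sort method has been removed from current pandas versions). Replace 'upperlower' with whichever column you actually want to sort by:

portfolio = portfolio.sort_values('VTI', ascending=False)
sold_positions = portfolio[portfolio['BL'] == True].sort_values('upperlower', ascending=True)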
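
Filtering and sorting can be chained to pick the position to sell without an inner loop. The sketch below is only one possible reading of the question's rules (position is open, cheaper than the current price, and at least sellbuybuffer away from it); the positions table, its price and open columns, and the numeric values are hypothetical.

```python
import pandas as pd

# Hypothetical positions table; these column names are illustrative only.
positions = pd.DataFrame({
    "price": [44.10, 45.00, 43.50, 44.80, 43.90],
    "open": [True, True, False, True, True],
})
price = 44.90         # assumed current price
sellbuybuffer = 0.50  # assumed minimum gap between position and current price

# Keep open positions that are cheaper than the current price
# by at least sellbuybuffer.
eligible = positions[
    positions["open"]
    & (positions["price"] < price)
    & ((price - positions["price"]) >= sellbuybuffer)
]

# Most expensive eligible position first; head(1) is the one to sell.
eligible = eligible.sort_values("price", ascending=False)
to_sell = eligible.head(1)
print(to_sell)
```

If the per-row bookkeeping in get_row_data depends on running totals such as capital, that part may still need a loop, but it would then run only over the few rows that survive the filter.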
answered Mar 13, 2014 at 3:24
\$\endgroup\$
