2
\$\begingroup\$

I need some help to refactor the script. I have a script which matches the buy and sell trades based on FIFO order.

The initial dataframe looks like this:

AsxCode Order.Type Trade.Date Price Quantity match_status match_vol 
DMP Buy 17/08/2015 42.1179 105 
DMP Sell 26/10/2015 47.05 105 
RFG Buy 17/03/2015 7.49 640 
RFG Buy 4/06/2015 5.98 870 
RFG Buy 29/09/2015 4.2 700 
RFG Sell 1/07/2015 5.4286 1510

Here is the expected output from the script:

AsxCode Order.Type Trade.Date Price Quantity match_status match_vol 
DMP Buy 17/08/2015 42.1179 105 FULL 105 
DMP Sell 26/10/2015 47.05 105 FULL 105 
RFG Buy 17/03/2015 7.49 640 FULL 640 
RFG Buy 4/06/2015 5.98 870 FULL 870 
RFG Buy 29/09/2015 4.2 700 0 4.21 
RFG Sell 1/07/2015 5.4286 1510 FULL 1510

The script has to match the sell trade (indicated by Order.Type = 'Sell') with the oldest buy. The oldest buy is indicated by Trade.Date. If the sell trade finds a buy trade it has to update the match_status column and the match_vol column. FULL is used to indicate fully matched and PARTIAL is used to indicate partially matched. The match_vol indicates how many sell units matched with the buy.

Here's the pseudo code for the script:

function(security_code)
{
 # for a given security code split into two dataframes
 # buy_trades and sell_trades
 buys = df[df$AsxCode==security_code & df$Order.Type=='Buy',] 
 buy_trades <- buys[order(as.Date(buys$Trade.Date, format="%Y-%m-%d")),]
 sells = df[df$AsxCode==security_code & df$Order.Type=='Sell',]
 sell_trades <- sells[order(sells$Trade.Date),] 
 buy_trades <<- buy_trades 
 sell_trades <<- sell_trades 
 apply(sell_trades, 1, match_buy_trades) 
 match_buy_trades() 
 { 
 for each buy_trades 
 {
 next if matched before(i.e buy_trades$match_status == full)
 if partial match 
 { 
 update buy_trades$match_vol
 update buy_trades$match_status to partial
 update sell_trades$match_vol
 update sell_trades$match_status
 }
 if full match 
 { 
 update buy_trades$match_vol
 update buy_trades$match_status
 update sell_trades$match_vol
 update sell_trades$match_status 
 break
 } 
 if all sell units matched exit 
 }
}

Here is the actual code:

sells = df[df$AsxCode==sec & df$Order.Type=='Sell',]
sell_trades <<- sells[order(sells$Trade.Date),] 
buys = df[df$AsxCode==sec & df$Order.Type=='Buy',] 
buy_trades <<- buys[order(as.Date(buys$Trade.Date)),] 
apply(sell_trades, 1, find_matching_buy) 
result <<- rbind(result, rbind(buy,sell))
find_matching_buy <- function(x){ 
 sell_units <- strtoi(x["Quantity"])
 sell_match_units <- strtoi(x["match_vol"]) 
 sell_unmatch_units <- sell_units - sell_match_units 
 buy_trades <<- buy_trades[order(buy_trades$Trade.Date),] 
 for(i in 1:nrow(buy_trades)){ 
 buy_units <- strtoi(buy_trades[i,"Quantity"]) 
 buy_match_units <- strtoi(buy_trades[i,"match_vol"]) 
 buy_unmatch_units <- buy_units - buy_match_units 
 buy_status <- buy_trades[i,"match_status"] 
 if (buy_status == "FULL" ) next
 if (sell_unmatch_units == buy_unmatch_units){ 
 buy_trades[i,"match_status"] <<- "FULL"
 buy_trades[i,"match_vol"] <<- buy_match_units + sell_unmatch_units 
 sell_trades[sell_index,"match_vol"] <<- sell_match_units + sell_unmatch_units 
 sell_trades[sell_index,"match_status"] <<- "FULL"
 break()
 }else if (sell_unmatch_units > buy_unmatch_units){ 
 buy_trades[i,"match_status"] <<- "FULL"
 buy_trades[i,"match_vol"] <<- buy_match_units + buy_unmatch_units 
 sell_trades[sell_index,"match_status"] <<- "Partial" 
 sell_trades[sell_index,"match_vol"] <<- sell_match_units + buy_unmatch_units 
 }else if (sell_unmatch_units < buy_unmatch_units){ 
 buy_trades[i,"match_status"] <<- "Partial"
 buy_trades[i,"match_vol"] <<- buy_match_units + sell_unmatch_units 
 sell_trades[sell_index,"match_status"] <<- "FULL"
 break
 } 
 sell_match_units <- strtoi(sell_trades[sell_index,"match_vol"]) # 
 sell_unmatch_units <- sell_units - sell_match_units 
 } 
 # Returning the result
 buy <<- buy_trades
 sell <<- sell_trades
}

I don't find the code very modular and structured. The other thing which is nagging me is inside the find_matching_buy() how the results are passed through buy and sell dataframe. Also, you can notice that buy_trades and sell_trades are modified globally.

Jamal
35.2k13 gold badges134 silver badges238 bronze badges
asked May 24, 2016 at 2:53
\$\endgroup\$
2
  • \$\begingroup\$ Please update your question to include the dput of your sample data so we can copy it into R; I tried using read.table on the sample data you provided but can't get your code to run without errors. \$\endgroup\$ Commented May 24, 2016 at 15:03
  • \$\begingroup\$ I think you made a mistake in the expected output for your example data -- shouldn't the matched volume be 0 for the last RFG buy instead of 4.21? \$\endgroup\$ Commented May 24, 2016 at 16:00

1 Answer 1

1
\$\begingroup\$

I have a few comments on the code:

  1. Globally changing values is a pretty big no-no in R code, in large part because it makes functions have unexpected side effects.
  2. This code makes use of a lot of for loops instead of using R's vectorized operators.
  3. The code separately loops through the buys for each sell, which could be wasteful (why do you need to check a buy again after it has already been matched to a sell?)

Based on this, I have a few proposals:

  1. I would do away with the concept of find_matching_buy and I would simultaneously match all the buys and sells for a security.
  2. When simultaneously matching the buys and sells, I would simplify the logic as follows. Let b be the total volume of buys and s be the total volume of sells. If b == s, then mark everything as fully matching. If b > s, then match all sells as fully matching and the first s volume of the buys as fully matching. If s > b, then match all the buys as fully matching and the first b volume of sells as fully matching.

Let's start with a fully reproducible example dataset (I obtained this using dput):

(dat <- structure(list(AsxCode = c("DMP", "DMP", "RFG", "RFG", "RFG", "RFG"), Order.Type = c("Buy", "Sell", "Buy", "Buy", "Buy", "Sell"), Trade.Date = structure(c(16664, 16734, 16511, 16590, 16707, 16617), class = "Date"), Price = c(42.1179, 47.05, 7.49, 5.98, 4.2, 5.4286), Quantity = c(105L, 105L, 640L, 870L, 700L, 1510L)), .Names = c("AsxCode", "Order.Type", "Trade.Date", "Price", "Quantity"), row.names = c(NA, -6L), class = "data.frame"))
# AsxCode Order.Type Trade.Date Price Quantity
# 1 DMP Buy 2015年08月17日 42.1179 105
# 2 DMP Sell 2015年10月26日 47.0500 105
# 3 RFG Buy 2015年03月17日 7.4900 640
# 4 RFG Buy 2015年06月04日 5.9800 870
# 5 RFG Buy 2015年09月29日 4.2000 700
# 6 RFG Sell 2015年07月01日 5.4286 1510

The central piece of the code involves determining the matching for sells and buys for a given security, doing away with the concept of find_matching_buy using the approach I described above. I'll call this function match.security:

match.security <- function(x) {
 x <- x[order(x$Trade.Date),]
 x$match_status <- "FULL"
 x$match_vol <- x$Quantity
 b <- sum(x$Quantity[x$Order.Type == "Buy"])
 s <- sum(x$Quantity[x$Order.Type == "Sell"])
 if (b > s) {
 # Only some of the buys are matched; update match_status and match_vol for buys
 b.quant <- x$Quantity[x$Order.Type == "Buy"]
 b.vol <- diff(c(0, pmin(cumsum(b.quant), s)))
 x$match_vol[x$Order.Type == "Buy"] <- b.vol 
 x$match_status[x$Order.Type == "Buy"] <-
 ifelse(b.vol == b.quant, "FULL", ifelse(b.vol == 0, "0", "Partial"))
 } else if (s > b) {
 # Only some of the sells are matched; update match_status and match_vol for sells
 s.quant <- x$Quantity[x$Order.Type == "Sell"]
 s.vol <- diff(c(0, pmin(cumsum(s.quant), b)))
 x$match_vol[x$Order.Type == "Sell"] <- s.vol 
 x$match_status[x$Order.Type == "Sell"] <-
 ifelse(s.vol == s.quant, "FULL", ifelse(s.vol == 0, "0", "Partial"))
 }
 return(x)
}

This function first labels everything as fully matched, and then computes the total volume of buys b and total volume sells s. If b > s it goes back and marks the unmatched and partially matched buys (similarly processing sells with s > b). The tricky bit is to determine the matched volume for each buy, which we do with b.vol <- diff(c(0, pmin(cumsum(b.quant), s))), where b.quant is the Quantity values for the buys and s is the total quantity of sells. For security RFG in the example data above, we have:

b.quant <- c(640, 870, 700)
s <- 1510

First, we compute the cumulative sum of b.quant, which for each buy represents the total quantity of buys if we included all earlier buys and that buy:

cumsum(b.quant)
# [1] 640 1510 2210

However, we can't match more volume of buys than the volume of sells, so we will cap these values at s using the pmin function:

pmin(cumsum(b.quant), s)
# [1] 640 1510 1510

We can see that the differences between these values are the amount of the buy that is actually matched: 640 of the first buy is matched (640-0), 870 of the second buy is matched (1510-640), and none of the third buy is matched (1510-1510). We get this by taking the pairwise differences between elements:

diff(c(0, pmin(cumsum(b.quant), s)))
# [1] 640 870 0

This may seem like a very roundabout/complex way to compute the matched volume for the buys, but in fact it has the nice property that cumsum, pmin, and diff are all vectorized functions in R, so this will actually be much more efficient than a for loop when you have a large number of buys and sells for a given security.

We can run our match.security function on each of our securities to make sure it is running correctly:

match.security(dat[dat$AsxCode == "DMP",])
# AsxCode Order.Type Trade.Date Price Quantity match_status match_vol
# 1 DMP Buy 2015年08月17日 42.1179 105 FULL 105
# 2 DMP Sell 2015年10月26日 47.0500 105 FULL 105
match.security(dat[dat$AsxCode == "RFG",])
# AsxCode Order.Type Trade.Date Price Quantity match_status match_vol
# 3 RFG Buy 2015年03月17日 7.4900 640 FULL 640
# 4 RFG Buy 2015年06月04日 5.9800 870 FULL 870
# 6 RFG Sell 2015年07月01日 5.4286 1510 FULL 1510
# 5 RFG Buy 2015年09月29日 4.2000 700 0 0

The last step is to run match.security for each security and to combine the results into a final data frame. To do so, I would use the following:

(matched.dat <- do.call(rbind, lapply(split(dat, dat$AsxCode), match.security)))
# AsxCode Order.Type Trade.Date Price Quantity match_status match_vol
# DMP.1 DMP Buy 2015年08月17日 42.1179 105 FULL 105
# DMP.2 DMP Sell 2015年10月26日 47.0500 105 FULL 105
# RFG.3 RFG Buy 2015年03月17日 7.4900 640 FULL 640
# RFG.4 RFG Buy 2015年06月04日 5.9800 870 FULL 870
# RFG.6 RFG Sell 2015年07月01日 5.4286 1510 FULL 1510
# RFG.5 RFG Buy 2015年09月29日 4.2000 700 0 0

Breaking down this line of code:

  • split(dat, dat$AsxCode) splits dat up into a list of data frames, each associated with one security.
  • lapply(..., match.security) calls match.security on each list element and stores the results of those calls into a list.
  • do.call(rbind, ...) combines all of our returned lists into a single data frame using rbind.
answered May 24, 2016 at 15:53
\$\endgroup\$
2
  • \$\begingroup\$ Hi Josliber, You are very true, the tricky bit is to determine the matched volume for each buy. I have another problem. I was trying to use match.security to also calculate the profit associated with each sell trade. But could not figure out how to do it ? Please help ? Thanks. \$\endgroup\$ Commented May 27, 2016 at 4:26
  • \$\begingroup\$ @AndroidBeginner This sounds like a pretty big extension to the code here that probably merits its own question. I would suggest formulating this as a Stack Overflow question, showing what you have done and where you have gotten stuck. \$\endgroup\$ Commented May 27, 2016 at 14:59

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.