I need some help to refactor the script. I have a script which matches the buy and sell trades based on FIFO order.
The initial dataframe looks like this:
AsxCode Order.Type Trade.Date Price Quantity match_status match_vol DMP Buy 17/08/2015 42.1179 105 DMP Sell 26/10/2015 47.05 105 RFG Buy 17/03/2015 7.49 640 RFG Buy 4/06/2015 5.98 870 RFG Buy 29/09/2015 4.2 700 RFG Sell 1/07/2015 5.4286 1510
Here is the expected output from the script:
AsxCode Order.Type Trade.Date Price Quantity match_status match_vol DMP Buy 17/08/2015 42.1179 105 FULL 105 DMP Sell 26/10/2015 47.05 105 FULL 105 RFG Buy 17/03/2015 7.49 640 FULL 640 RFG Buy 4/06/2015 5.98 870 FULL 870 RFG Buy 29/09/2015 4.2 700 0 4.21 RFG Sell 1/07/2015 5.4286 1510 FULL 1510
The script has to match the sell trade (indicated by Order.Type = 'Sell'
) with
the oldest buy. The oldest buy is indicated by Trade.Date
. If the sell trade finds a buy trade it has to update the match_status
column and the match_vol column. FULL is used to indicate fully matched and PARTIAL is used to indicate
partially matched. The match_vol
indicates how many sell units matched with the buy.
Here's the pseudo code for the script:
function(security_code) { # for a given security code split into two dataframes # buy_trades and sell_trades buys = df[df$AsxCode==security_code & df$Order.Type=='Buy',] buy_trades <- buys[order(as.Date(buys$Trade.Date, format="%Y-%m-%d")),] sells = df[df$AsxCode==security_code & df$Order.Type=='Sell',] sell_trades <- sells[order(sells$Trade.Date),] buy_trades <<- buy_trades sell_trades <<- sell_trades apply(sell_trades, 1, match_buy_trades) match_buy_trades() { for each buy_trades { next if matched before(i.e buy_trades$match_status == full) if partial match { update buy_trades$match_vol update buy_trades$match_status to partial update sell_trades$match_vol update sell_trades$match_status } if full match { update buy_trades$match_vol update buy_trades$match_status update sell_trades$match_vol update sell_trades$match_status break } if all sell units matched exit } }
Here is the actual code:
sells = df[df$AsxCode==sec & df$Order.Type=='Sell',]
sell_trades <<- sells[order(sells$Trade.Date),]
buys = df[df$AsxCode==sec & df$Order.Type=='Buy',]
buy_trades <<- buys[order(as.Date(buys$Trade.Date)),]
apply(sell_trades, 1, find_matching_buy)
result <<- rbind(result, rbind(buy,sell))
find_matching_buy <- function(x){
sell_units <- strtoi(x["Quantity"])
sell_match_units <- strtoi(x["match_vol"])
sell_unmatch_units <- sell_units - sell_match_units
buy_trades <<- buy_trades[order(buy_trades$Trade.Date),]
for(i in 1:nrow(buy_trades)){
buy_units <- strtoi(buy_trades[i,"Quantity"])
buy_match_units <- strtoi(buy_trades[i,"match_vol"])
buy_unmatch_units <- buy_units - buy_match_units
buy_status <- buy_trades[i,"match_status"]
if (buy_status == "FULL" ) next
if (sell_unmatch_units == buy_unmatch_units){
buy_trades[i,"match_status"] <<- "FULL"
buy_trades[i,"match_vol"] <<- buy_match_units + sell_unmatch_units
sell_trades[sell_index,"match_vol"] <<- sell_match_units + sell_unmatch_units
sell_trades[sell_index,"match_status"] <<- "FULL"
break()
}else if (sell_unmatch_units > buy_unmatch_units){
buy_trades[i,"match_status"] <<- "FULL"
buy_trades[i,"match_vol"] <<- buy_match_units + buy_unmatch_units
sell_trades[sell_index,"match_status"] <<- "Partial"
sell_trades[sell_index,"match_vol"] <<- sell_match_units + buy_unmatch_units
}else if (sell_unmatch_units < buy_unmatch_units){
buy_trades[i,"match_status"] <<- "Partial"
buy_trades[i,"match_vol"] <<- buy_match_units + sell_unmatch_units
sell_trades[sell_index,"match_status"] <<- "FULL"
break
}
sell_match_units <- strtoi(sell_trades[sell_index,"match_vol"]) #
sell_unmatch_units <- sell_units - sell_match_units
}
# Returning the result
buy <<- buy_trades
sell <<- sell_trades
}
I don't find the code very modular and structured. The other thing which is nagging me is inside the find_matching_buy()
how the results are passed through buy and sell dataframe. Also, you can notice that buy_trades
and sell_trades
are modified globally.
1 Answer 1
I have a few comments on the code:
- Globally changing values is a pretty big no-no in R code, in large part because it makes functions have unexpected side effects.
- This code makes use of a lot of for loops instead of using R's vectorized operators.
- The code separately loops through the buys for each sell, which could be wasteful (why do you need to check a buy again after it has already been matched to a sell?)
Based on this, I have a few proposals:
- I would do away with the concept of
find_matching_buy
and I would simultaneously match all the buys and sells for a security. - When simultaneously matching the buys and sells, I would simplify the logic as follows. Let
b
be the total volume of buys ands
be the total volume of sells. Ifb == s
, then mark everything as fully matching. Ifb > s
, then match all sells as fully matching and the firsts
volume of the buys as fully matching. Ifs > b
, then match all the buys as fully matching and the firstb
volume of sells as fully matching.
Let's start with a fully reproducible example dataset (I obtained this using dput
):
(dat <- structure(list(AsxCode = c("DMP", "DMP", "RFG", "RFG", "RFG", "RFG"), Order.Type = c("Buy", "Sell", "Buy", "Buy", "Buy", "Sell"), Trade.Date = structure(c(16664, 16734, 16511, 16590, 16707, 16617), class = "Date"), Price = c(42.1179, 47.05, 7.49, 5.98, 4.2, 5.4286), Quantity = c(105L, 105L, 640L, 870L, 700L, 1510L)), .Names = c("AsxCode", "Order.Type", "Trade.Date", "Price", "Quantity"), row.names = c(NA, -6L), class = "data.frame"))
# AsxCode Order.Type Trade.Date Price Quantity
# 1 DMP Buy 2015年08月17日 42.1179 105
# 2 DMP Sell 2015年10月26日 47.0500 105
# 3 RFG Buy 2015年03月17日 7.4900 640
# 4 RFG Buy 2015年06月04日 5.9800 870
# 5 RFG Buy 2015年09月29日 4.2000 700
# 6 RFG Sell 2015年07月01日 5.4286 1510
The central piece of the code involves determining the matching for sells and buys for a given security, doing away with the concept of find_matching_buy
using the approach I described above. I'll call this function match.security
:
match.security <- function(x) {
x <- x[order(x$Trade.Date),]
x$match_status <- "FULL"
x$match_vol <- x$Quantity
b <- sum(x$Quantity[x$Order.Type == "Buy"])
s <- sum(x$Quantity[x$Order.Type == "Sell"])
if (b > s) {
# Only some of the buys are matched; update match_status and match_vol for buys
b.quant <- x$Quantity[x$Order.Type == "Buy"]
b.vol <- diff(c(0, pmin(cumsum(b.quant), s)))
x$match_vol[x$Order.Type == "Buy"] <- b.vol
x$match_status[x$Order.Type == "Buy"] <-
ifelse(b.vol == b.quant, "FULL", ifelse(b.vol == 0, "0", "Partial"))
} else if (s > b) {
# Only some of the sells are matched; update match_status and match_vol for sells
s.quant <- x$Quantity[x$Order.Type == "Sell"]
s.vol <- diff(c(0, pmin(cumsum(s.quant), b)))
x$match_vol[x$Order.Type == "Sell"] <- s.vol
x$match_status[x$Order.Type == "Sell"] <-
ifelse(s.vol == s.quant, "FULL", ifelse(s.vol == 0, "0", "Partial"))
}
return(x)
}
This function first labels everything as fully matched, and then computes the total volume of buys b
and total volume sells s
. If b > s
it goes back and marks the unmatched and partially matched buys (similarly processing sells with s > b
). The tricky bit is to determine the matched volume for each buy, which we do with b.vol <- diff(c(0, pmin(cumsum(b.quant), s)))
, where b.quant
is the Quantity
values for the buys and s
is the total quantity of sells. For security RFG in the example data above, we have:
b.quant <- c(640, 870, 700)
s <- 1510
First, we compute the cumulative sum of b.quant
, which for each buy represents the total quantity of buys if we included all earlier buys and that buy:
cumsum(b.quant)
# [1] 640 1510 2210
However, we can't match more volume of buys than the volume of sells, so we will cap these values at s
using the pmin
function:
pmin(cumsum(b.quant), s)
# [1] 640 1510 1510
We can see that the differences between these values are the amount of the buy that is actually matched: 640 of the first buy is matched (640-0), 870 of the second buy is matched (1510-640), and none of the third buy is matched (1510-1510). We get this by taking the pairwise differences between elements:
diff(c(0, pmin(cumsum(b.quant), s)))
# [1] 640 870 0
This may seem like a very roundabout/complex way to compute the matched volume for the buys, but in fact it has the nice property that cumsum
, pmin
, and diff
are all vectorized functions in R, so this will actually be much more efficient than a for
loop when you have a large number of buys and sells for a given security.
We can run our match.security
function on each of our securities to make sure it is running correctly:
match.security(dat[dat$AsxCode == "DMP",])
# AsxCode Order.Type Trade.Date Price Quantity match_status match_vol
# 1 DMP Buy 2015年08月17日 42.1179 105 FULL 105
# 2 DMP Sell 2015年10月26日 47.0500 105 FULL 105
match.security(dat[dat$AsxCode == "RFG",])
# AsxCode Order.Type Trade.Date Price Quantity match_status match_vol
# 3 RFG Buy 2015年03月17日 7.4900 640 FULL 640
# 4 RFG Buy 2015年06月04日 5.9800 870 FULL 870
# 6 RFG Sell 2015年07月01日 5.4286 1510 FULL 1510
# 5 RFG Buy 2015年09月29日 4.2000 700 0 0
The last step is to run match.security
for each security and to combine the results into a final data frame. To do so, I would use the following:
(matched.dat <- do.call(rbind, lapply(split(dat, dat$AsxCode), match.security)))
# AsxCode Order.Type Trade.Date Price Quantity match_status match_vol
# DMP.1 DMP Buy 2015年08月17日 42.1179 105 FULL 105
# DMP.2 DMP Sell 2015年10月26日 47.0500 105 FULL 105
# RFG.3 RFG Buy 2015年03月17日 7.4900 640 FULL 640
# RFG.4 RFG Buy 2015年06月04日 5.9800 870 FULL 870
# RFG.6 RFG Sell 2015年07月01日 5.4286 1510 FULL 1510
# RFG.5 RFG Buy 2015年09月29日 4.2000 700 0 0
Breaking down this line of code:
split(dat, dat$AsxCode)
splitsdat
up into a list of data frames, each associated with one security.lapply(..., match.security)
callsmatch.security
on each list element and stores the results of those calls into a list.do.call(rbind, ...)
combines all of our returned lists into a single data frame usingrbind
.
-
\$\begingroup\$ Hi Josliber, You are very true, the tricky bit is to determine the matched volume for each buy. I have another problem. I was trying to use match.security to also calculate the profit associated with each sell trade. But could not figure out how to do it ? Please help ? Thanks. \$\endgroup\$Android Beginner– Android Beginner2016年05月27日 04:26:02 +00:00Commented May 27, 2016 at 4:26
-
\$\begingroup\$ @AndroidBeginner This sounds like a pretty big extension to the code here that probably merits its own question. I would suggest formulating this as a Stack Overflow question, showing what you have done and where you have gotten stuck. \$\endgroup\$josliber– josliber2016年05月27日 14:59:32 +00:00Commented May 27, 2016 at 14:59
dput
of your sample data so we can copy it into R; I tried usingread.table
on the sample data you provided but can't get your code to run without errors. \$\endgroup\$