$\begingroup$

We are given historical trade data from a cryptocurrency exchange (Kraken, in our case) which includes the following information for each trade:

  1. Time of trade
  2. Trade Price
  3. Trade ID (Integers that increment upwards by 1 each trade)
  4. Order Type (whether the trade resulted from a market order or a limit order)
  5. Taker Side of Trade (Buy or Sell)

We restrict ourselves to trade data because many exchanges have fairly recently made large amounts of historical trade data freely available, whereas historical quote data, or matched trade-and-quote data, is not.

After analyzing the trade-to-trade log returns of the trade price, I noticed significant negative autocorrelation at lag 1. I assume this is due to bid-ask bounce, since this market matches orders in much the same way as other markets where the effect is observed.
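To make the bounce explanation concrete, here is a minimal sketch on simulated data (not Kraken data) under Roll-model assumptions: the efficient log price is a random walk, and each trade prints half a spread above or below it with equal probability. The function names and the parameter values (`half_spread`, `sigma`, the seed) are hypothetical choices for illustration only.

```python
import math
import random

def autocorr(x, lag=1):
    """Sample autocorrelation of the sequence x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i - lag] - mean) for i in range(lag, n))
    return cov / var

def simulate_trade_log_prices(n, half_spread, sigma, seed=0):
    """Random-walk efficient log price plus an i.i.d. buy/sell half-spread."""
    rng = random.Random(seed)
    mid = math.log(100.0)
    prices = []
    for _ in range(n):
        mid += rng.gauss(0.0, sigma)       # efficient price innovation
        side = rng.choice((1, -1))         # +1 = buyer-initiated, -1 = seller-initiated
        prices.append(mid + side * half_spread)
    return prices

def roll_spread(returns):
    """Roll estimator: 2 * sqrt(-cov(r_t, r_{t-1})); NaN if the covariance is >= 0."""
    n = len(returns)
    mean = sum(returns) / n
    cov1 = sum((returns[i] - mean) * (returns[i - 1] - mean)
               for i in range(1, n)) / (n - 1)
    return 2.0 * math.sqrt(-cov1) if cov1 < 0 else float("nan")

prices = simulate_trade_log_prices(20000, half_spread=5e-4, sigma=2e-4)
returns = [b - a for a, b in zip(prices, prices[1:])]
rho1 = autocorr(returns, 1)        # clearly negative under these assumptions
spread_hat = roll_spread(returns)  # should land near 2 * half_spread
```

In this setup the lag-1 autocovariance of the returns equals minus the squared half-spread, which is why the estimator recovers a spread close to the simulated one. Real trade data need not satisfy these assumptions (for example, trade signs are often serially correlated).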

We do not wish to analyze or trade at high or ultra-high frequencies (sub-1-minute) using this data alone, both for various structural reasons and because the pronounced bid-ask bounce distorts our analysis. We therefore down-sample the returns to 1-minute bars.

We then evaluate the autocorrelation of the 1-minute log returns. The lag-1 autocorrelation is still negative: it is small in magnitude, but larger in magnitude than at higher lags, which suggests that the bid-ask bounce has lessened but still persists.

Note that we did not forward-fill the down-sampled bars, so there are periods with no bars.
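For reference, the down-sampling described above can be sketched roughly as follows. This is an illustrative assumption about the procedure, not our exact pipeline: trades are taken to be `(unix_seconds, price)` pairs, each bar keeps the last trade price in its minute, and minutes without trades simply produce no bar. The `adjacent_only` flag is a hypothetical option that controls whether a return is allowed to span a gap of empty minutes.

```python
import math

def minute_closes(trades):
    """Map each minute index that contains a trade to its last trade price.

    Minutes with no trades get no entry, i.e. nothing is forward-filled.
    """
    closes = {}
    # Stable sort by timestamp keeps the input order for equal timestamps.
    for ts, price in sorted(trades, key=lambda t: t[0]):
        closes[int(ts // 60)] = price  # later trades overwrite earlier ones
    return closes

def bar_log_returns(closes, adjacent_only=True):
    """Log returns between consecutive existing bars.

    With adjacent_only=True, returns that span one or more empty minutes
    are dropped instead of being treated as 1-minute returns.
    """
    minutes = sorted(closes)
    out = []
    for prev, cur in zip(minutes, minutes[1:]):
        if adjacent_only and cur - prev != 1:
            continue
        out.append(math.log(closes[cur] / closes[prev]))
    return out

# Toy input: minute 2 has no trades, so no bar exists for it.
trades = [(0, 100.0), (30, 101.0), (61, 102.0), (200, 103.0), (230, 104.0)]
closes = minute_closes(trades)
strict = bar_log_returns(closes)                      # only minute 0 -> 1
loose = bar_log_returns(closes, adjacent_only=False)  # also minute 1 -> 3
```

Whether gap-spanning returns should be dropped, kept, or rescaled is exactly the kind of choice that could affect the autocorrelation and volatility estimates.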

My questions are then the following:

  1. Is there a way to use this trade data fruitfully for back-testing and analysis in the presence of bid-ask bounce? Down-sampling the trade data "as-is" could introduce spurious mean-reverting behavior in the price process. I saw that this post addresses ways of retrieving an efficient price from the trade price series, but I would like to know whether there are more modern, or less involved, approaches, since we intend to trade at non-HFT frequencies.

  2. Is there a way to "de-bounce" this trade data so that it is more suitable for our purposes? For example, we could log-difference the trade prices, fit a linear AR(1) model (or include other lags that indicate persistence of the bid-ask bounce), and use the back-transformed residuals as our "de-bounced" trade price series. Or does this do more harm than good when the bid-ask bounce effect is not very pronounced?

  3. Bonus, since it may be too broad: is any of this analysis or back-testing affected by not forward-filling empty bars (1-minute periods in which no trades occurred)? I am trying to avoid forward-filling because, when there are many empty bars, it biases our autocorrelation estimates (more positive) and our volatility estimates (lower), among other potential biases.
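To pin down what is meant by the AR(1) "de-bouncing" in question 2, here is a minimal sketch on simulated data. It is only one possible reading of the idea, not an established method: the helper names, the simulation parameters, and the choice to drop the intercept (which removes any drift from the rebuilt series) are all assumptions made for illustration.

```python
import math
import random

def autocorr(x, lag=1):
    """Sample autocorrelation of the sequence x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i - lag] - mean) for i in range(lag, n))
    return cov / var

def fit_ar1(r):
    """OLS fit of r[t] = c + phi * r[t-1] + e[t]; returns (c, phi)."""
    x, y = r[:-1], r[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    phi = sxy / sxx
    return my - phi * mx, phi

def debounce(log_prices):
    """Fit AR(1) to log returns and cumulate the residuals into a new log price."""
    r = [b - a for a, b in zip(log_prices, log_prices[1:])]
    c, phi = fit_ar1(r)
    resid = [r[t] - c - phi * r[t - 1] for t in range(1, len(r))]
    rebuilt = [log_prices[1]]          # start from the second observed price
    for e in resid:
        rebuilt.append(rebuilt[-1] + e)
    return phi, r, resid, rebuilt

# Simulated trades: random-walk efficient price plus a +/- half-spread bounce.
rng = random.Random(1)
mid, log_prices = math.log(100.0), []
for _ in range(20000):
    mid += rng.gauss(0.0, 2e-4)
    log_prices.append(mid + rng.choice((1, -1)) * 5e-4)

phi, raw, resid, rebuilt = debounce(log_prices)
rho_raw = autocorr(raw)       # lag-1 autocorrelation before filtering
rho_resid = autocorr(resid)   # lag-1 autocorrelation after filtering
```

One caveat with this reading: under a Roll-type bounce the returns follow an MA(1) rather than an AR(1) process, so an AR(1) filter can only approximate it. It tends to shrink the lag-1 autocorrelation without eliminating it, and it also alters the variance of the rebuilt series.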

asked Aug 18 at 3:05
$\endgroup$
