1
$\begingroup$

Over the years I've worked on a few applications that needed to model time intervals that are disjoint. For example, there's some kind of equipment available and time slots are booked out. For a data structure to represent this, you generally need to be able to insert, remove, and perform range queries (i.e., looking for the next availability).

In the past I've represented these with some kind of ordered map or tree. In Rust this might look something like BTreeMap<u32, 32> with start being the key and the end being the value. This works really well in practice, but it's not as fast I'd like for some real-time applications for really read-heavy range queries with thousands of intervals.

A lot of the specialized interval libraries I've found don't seem to perform better than a plain ordered map like I mentioned (many use an ordered map internally anyway). Many of these also weren't designed with cache-efficiency or SIMD in mind and spend a lot of time on tree (pointer) traversal.

I spent some time prototyping with some other data structures that might fit (adaptive radix trees to take advantage of the small integer key, dense bitsets for small ranges, applying a spatial index in 1-dimension, fixed grid hierarchies) but haven't been able to find anything much faster than an ordered map.

I was wondering if anyone has come across any interesting data structures that handle this well. Approaches that use a combination of multiple data structures to accelerate reads at the cost of slightly slower inserts/removals could be interesting.

asked Jul 19, 2024 at 2:05
$\endgroup$
6
  • 1
    $\begingroup$ I think it depends to be honest on how fast we're talking? I've written a simple tree data-structure for interval range-queries, and it could handle hundreds of thousands per second. I'm just curious what was the slow point for you, did you do some tests and find it was too slow? Or was there something particular about it, that insertions + look-ups were fast, but deletions were slow? $\endgroup$ Commented Jul 19, 2024 at 2:39
  • $\begingroup$ @Minko_Minkov Based on performance testing with real data, the slow point is mostly range queries though insert/remove are still significant. It's targeting a real-time use case, where the scale is tens-hundreds of thousands of range queries but ideally they can all complete within tens of milliseconds. $\endgroup$ Commented Jul 19, 2024 at 3:26
  • $\begingroup$ news.ycombinator.com/item?id=41000844 $\endgroup$ Commented Jul 24, 2024 at 6:07
  • $\begingroup$ @D.W. I didn't get much of a response so I thought I'd cross-post it - I hope that's alright $\endgroup$ Commented Jul 24, 2024 at 13:01
  • 1
    $\begingroup$ @grovesNL There's no rule against it. I hope you get some good answers. I wanted to cross-link so that anyone who has a similar question and finds this page via search can check out the information there. $\endgroup$ Commented Jul 24, 2024 at 17:37

0

Know someone who can answer? Share a link to this question via email, Twitter, or Facebook.

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.