I was thinking of a data structure that I cannot describe better than with the word "book", or more exactly "binder". I was wondering if this had already been implemented in libraries like Boost or others.
Principle
It's a data structure that allows constant-time random access and amortized constant time insertion/deletion (as long as the sequence of those are performed on indexes not too far away from each other).
It consists of two stacks "left" and "right" representing the state of a binder that would be open on a particular page. Inserting and deleting elements near this page is pretty fast since it consists in pushing/popping the pages to/from the left stack, and flipping some pages if necessary. Of course in the worst case one would have to flip from the first page to the last page (fill one stack with the other), for example if the user inserts something at the beginning and then something at the end.
Here is a very rudimentary implementation in C++:
#include <iostream>
#include <vector>
using namespace std;
template<class T>
class Book {
vector<T> left, right;
public:
inline size_t size()
{
return left.size() + right.size();
}
void push(T t)
{
insert(size(),t);
}
T& get(size_t i)
{
if (i < left.size())
return left[i];
else return right[right.size()-i+left.size()-1];
}
void insert(size_t i, T t)
{
reposition(i-1);
left.push_back(t);
}
void remove(size_t i)
{
reposition(i);
left.pop_back();
}
protected:
void reposition(size_t i)
// puts the element at position i on the top of the left stack
{
while (i < left.size()-1)
{
right.push_back(left.back());
left.pop_back();
}
while (right.size() >= size()-i)
{
left.push_back(right.back());
right.pop_back();
}
}
};
int main()
{
Book<int> b;
for (int i = 0; i < 15; i++)
b.push(i);
b.insert(5,42);
b.remove(10);
int i = 0;
while (i < b.size())
{
if (b.get(i)%2 != 0)
b.remove(i);
else i++;
}
for (int i = 0; i < b.size(); i++)
{
cout << b.get(i) << " ";
}
cout << endl;
return 0;
}
Note: A possible nice optimizations would be to have the two stacks in reverse order, to allow a memcpy of the whole portion to be moved from on stack to the other instead of moving elements one by one (memcpy is usually much faster).
Use
So a good use case of this DS would be an algorithm that runs through a sequence of elements and can, for each index, add new elements or remove old elements near this index.
A particular case of this is filtering an array of elements. All you need to do is to iterate over all the indexes and simply call binder.remove(i) if the element at index i is to be removed.
An efficient (and optimal) way to do it, given an array of elements, would usually be to allocate another array of the same size, and copy only the elements that we want to save. It is easy to see that this does the same amount of operations than our Binder implementation (if it starts on the first page), the main difference being that Binder allows a simpler algorithm hiding the dirty details of implementation. (And we don't end up with a different object.)
Things get even messier to implement when we need to handle deletion of close elements or a mix of deletion and addition, while with the Binder DS it is all encapsulated and the algorithm stays very simple.
So the Binder DS can be seen as an array (or "vector" in the sense of C++) which allows efficiently working on its content by including a working buffer, so one never has to allocate extra space by hand (so it helps reduce dynamic memory allocations). It makes the whole mutation processes transparent.
Conclusion
I haven't time-tested it against more conventional algorithms, but I expect it would be a little slower in element access than a plain vector because of the higher complexity of having two buffers instead of one.
Do you think this is a good idea of a data structure, and it would prove efficient? Has it already been done, and in which library? Do you have a better alternative to propose?
-
2a doubly linked list with a finger table approximates itratchet freak– ratchet freak01/17/2014 13:05:34Commented Jan 17, 2014 at 13:05
-
3Related: Gap bufferJohn Bartholomew– John Bartholomew01/17/2014 13:07:21Commented Jan 17, 2014 at 13:07
-
Being somewhat trained in functional programming I'd take the copying algorithm over the mutating one anytime. Because it can be implemented in terms of local function and generic mapcat/monadic bind functor. Makes it easily parallelisable among other things.Jan Hudec– Jan Hudec01/17/2014 14:27:23Commented Jan 17, 2014 at 14:27
-
I'd consider migrating this to cs.se. The data structure in question is rather specialized and I am not sure it is even possible.Jan Hudec– Jan Hudec01/17/2014 14:40:29Commented Jan 17, 2014 at 14:40
-
1Do you have a specific algorithm in mind? Because I can't think of anything appropriate that wouldn't be adequately served by the traditional structures.Jan Hudec– Jan Hudec01/17/2014 14:49:23Commented Jan 17, 2014 at 14:49
1 Answer 1
It's basically a list zipper, a well-known functional data structure. In the form described on Wikipedia, it is a persistent data structure, which means that after you do an update (delete or insert), the old version of the data structure is still accessible. By using arrays instead of singly linked lists to implement the stacks, you give up persistency and instead gain the ability for random access, which is actually a cool idea.
Since insert
and delete
are only efficient if they occur near the current position, I would suggest the following modification, inspired by the typical usage pattern of a zipper in functional programming: Instead of giving the position as an argument to insert/delete
, I'd introduce the concept of a focus. The focus of your data structure is the index that is currently in the middle of your two stacks. It is the place where you can efficiently insert and delete. So I'd provide interfaces to read the focus and methods next()
and previous()
to move it. You can only insert and delete at the current focus, so you can't make the mistake of introducing a costly operation into your algorithm by accident.
-
Nice API design ideas. However, I think it'd be useful to keep index-based insert/delete in order to make some algorithms simpler where we might not write at a precise focus but on cells close before or after it. However, the functions could be named more explicitly, like "focus_insert(i,v)".Lionel Parreaux– Lionel Parreaux01/30/2014 00:40:38Commented Jan 30, 2014 at 0:40
-
A doubly-linked-list seems superior because you can have more than one focusStack Exchange Broke The Law– Stack Exchange Broke The Law02/10/2023 15:43:50Commented Feb 10, 2023 at 15:43