I need to implement a Ring Buffer/FIFO for data coming from a TCP socket.
It must support the following operations:
- Append a the recv()'ed chunk of bytes.
- Allow me to peek at the beginning of the buffer, since i get differently-sized packets, and I must decode a small fixed-size header to know how many bytes to process.
- Remove a chunk of bytes from the beginning of the buffer for processing.
I guess my needs are pretty standard for a TCP-based streaming protocol, but surprisingly I have never found a "best practice" method for doing this.
There are similar questions on SO already, most suggest to use collections.deque
, which is fine but has some shortcomings that must be worked around for my needs:
- It doesn't easily allow peeking.
- It doesn't allow removal of chunks of bytes.
Mixing various suggestions I came up with the following implementation, which seems working but I wonder: can I do any better, performance-wise? Removing a single byte at a time in get()
doesn't look optimal at all.
import collections
import itertools
class RingBuffer (object):
"""Ring buffer"""
def __init__ (self, size = 4096):
self._buf = collections.deque (maxlen = size)
def put (self, data):
"""Adds data to the end of the buffer"""
self._buf.extend (data)
def get (self, size):
"""Retrieves data from the beginning of the buffer"""
data = str ()
for i in xrange (size):
data += self._buf.popleft ()
return data
def peek (self, size):
"""\"Peeks\" at the beginning of the buffer (i.e.: retrieves data without removing them from the buffer)"""
return str (bytearray (itertools.islice (self._buf, size)))
def len (self):
"""Returns the length of the buffer"""
return len (self._buf)
If you are wondering why I am returning a string from peek()
, that is because I need to process its return value with struct.unpack_from()
.
1 Answer 1
I suspect you’re stuck using one-pop-at-a-time if you want to get multiple elements at once.
I found this quote from one of the collections
developers on Stack Overflow about using pop() for many elements in a deque:
There is no multi-pop method for deques.
The OP on that question had a very similar construction to you (they were saving the data to a list rather than a str), and he said:
Yes, it is correct. Yes, it is reasonably efficient though it can be further sped-up with boundmethods and itertools. No, it isn't stupid :-)
I don’t know enough about boundmethods or itertools to know what he was suggesting.
That doesn’t seem particularly helpful, so here are a few other comments on your general coding style:
- You shouldn’t have a space immediately before the opening parens that starts a list of arguments to a function. (PEP 8: Pet Peeves)
- It’s also recommended that you don’t have a space around the equals sign for keyword arguments or default values. (PEP 8: Other Recommendations)
- In the interest of using descriptive variable names, I see no reason not to extend the length of the variable name
_buf
to_buffer
. Characters are cheap. - Lines should be limited to 79 characters (PEP 8: Maximum Line Length), which includes docstrings: specifically, the docstring for
peek()
. - If you have a single double-quote within a triple-quoted string, you don’t need to escape it. You only need to escape quotes within a triple-quoted string if you have three of them at once.
- The docstring for
RingBuffer()
is almost useless. It tells me nothing that I couldn’t have learnt from the name of the class.
And a few comments on your get()
method in particular:
In the for loop, your index variable is named
i
, but you never use the value of this variable. It’s common practice to give unused index variables the name_
, to make it explicit that the value of this variable is unimportant. That is,for _ in xrange(size): data += self._buf.popleft()
It used to be the case that it was better to construct a list of strings and then call
join()
on the list than be continually appending to strings, but I think that’s fixed in newer versions of Python.Regardless, I happen to have a personal preference for that construction. It makes the method slightly shorter:
def alt_get(self, size): """Retrieves data from the beginning of the buffer.""" data = ''.join(self._buf.popleft() for _ in xrange(size)) return data
But I think that’s entirely personal preference: I did an short performance test, and I couldn’t see any difference between this and your
get()
method.Suppose I have an empty ring buffer, and I try to
get()
an element from it:r = RingBuffer() r.get()
The user sees the IndexError that arises from trying to pop an empty deque:
Traceback (most recent call last): File "buffer.py", line 29, in <module> r.get(1) File "buffer.py", line 17, in get data += self._buf.popleft () IndexError: pop from an empty deque
Unless you knew that RingBuffer() was using a deque under the hood, this error message would be confusing. I’d recommend having a custom error message to explain that somebody’s trying to get more items than you have to give.
It’s also worth noting that if you use a
try ... except
block here, the buffer will be emptied before you notice, and the contents could be lost. You need a check before you access anything from the buffer. Something like this:def get(self, size): """Retrieves data from the beginning of the buffer""" if size > len(self._buf): raise IndexError("Too many items: trying to access %d items " "from a buffer of length %d" % (size, len(self))) data = [self._buf.popleft() for _ in xrange(size)] return data
Your
peak()
method doesn't suffer from this problem.
-
\$\begingroup\$ A "bound method" is a "method object" — it's what you get when you evaluate
obj.method
. If you evaluatem = obj.method
once and then callm(args)
many times, this is faster than callingobj.method(args)
because the method lookup is only done once. \$\endgroup\$Gareth Rees– Gareth Rees2015年03月27日 00:10:32 +00:00Commented Mar 27, 2015 at 0:10 -
\$\begingroup\$ Thanks for your detailed answer, @alexwlchan. I was already aware of most of your observations, I just omitted them for brevity (apart from the whitespace thing, which stems from my C background and my eyes are just trained to see code that way). Anyway it still puzzles me how surprisingly difficult it is to find a working Ring Buffer implementation with those requirements, even in C/C++! Maybe I am tackling the TCP buffering problem the wrong way? What's the best practice in this case? \$\endgroup\$SukkoPera– SukkoPera2015年03月27日 08:56:28 +00:00Commented Mar 27, 2015 at 8:56
Explore related questions
See similar questions with these tags.
StringIO
instance as you buffer would enable easier peeking and getting chunks of bytes. \$\endgroup\$