Extracting lines from a bytearray

Question 1

This class is part of a utility that reads lines from a set of nonblocking file descriptors, blocking only when there are no complete lines to emit.

import os

class NonblockingLineBuffer:
    def __init__(self, fd, encoding):
        self.fd = fd
        self.enc = encoding
        self.buf = bytearray()
        self.is_open = True  # set to False by absorb() at EOF

    def absorb(self):
        while True:
            try:
                block = os.read(self.fd, 8192)
            except BlockingIOError:
                return
            if block:
                self.buf.extend(block)
            else:
                self.is_open = False
                # We don't close the file here because caller
                # needs to remove the fd from the poll set first.
                return

    def emit(self):
        def emit1(chunk):
            self.emitted_this_cycle = True
            return (self.fd, chunk.decode(self.enc).rstrip())

        buf = self.buf
        self.emitted_this_cycle = False
        while buf:
            r = buf.find(b'\r')
            n = buf.find(b'\n')
            if r == -1 and n == -1:
                if self.is_open:
                    # Incomplete line: keep it until more data arrives.
                    break
                yield emit1(buf)
                buf.clear()
            elif r == -1 or (n != -1 and n < r):
                # The next separator is \n.
                yield emit1(buf[:n])
                buf = buf[(n+1):]
            elif n == -1 or n > r:
                # The next separator is \r, possibly followed by \n.
                yield emit1(buf[:r])
                if n == r+1:
                    buf = buf[(r+2):]
                else:
                    buf = buf[(r+1):]
        self.buf = buf
        if not self.is_open:
            self.emitted_this_cycle = True
            yield (self.fd, None)

This question is specifically about emit, which is complicated, confusing, and might not be as efficient as it could be. Please suggest ways to make it less complicated and/or confusing, and more efficient.

(I know it could be much simpler if it didn't need to reimplement universal newline handling, but that is unfortunately a requirement from the larger context.)

(If there's something in the standard library that does some or all of the larger task, that would also be a welcome answer.)

Comment

Would you post the entire utility that works with multiple file descriptors? I suspect that there's a simpler way to achieve the same effect.

Comment

@200_success I made a bunch of changes since I posted the question, and I also got some good advice from Janne Karila about the narrowly-construed problem, so I've done as you suggest in a new question: codereview.stackexchange.com/questions/84299/…

Answer

You have overlooked a corner case: while you normally treat \r\n as a single separator, this is not the case when the two bytes are split between blocks.
splitlines could handle the universal newlines for you.
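
To make the corner case concrete, here is a small sketch (not part of the original answer; the sample data and variable names are made up). It shows that splitlines treats \r\n as one separator inside a single buffer, but produces a spurious empty line when the pair is cut in half and each block is split on its own.

```python
# Hypothetical input: the stream "one\r\ntwo\r\n", once as a single buffer
# and once as two reads with the \r\n pair split between them.
whole = bytearray(b"one\r\ntwo\r\n")
chunks = [bytearray(b"one\r"), bytearray(b"\ntwo\r\n")]

# Within one buffer, \r, \n and \r\n each count as a single separator.
lines_whole = whole.splitlines()

# Splitting block by block sees a lone \r, then a lone \n at the start
# of the next block, which shows up as an extra empty line.
lines_chunked = [line for chunk in chunks for line in chunk.splitlines()]

print(lines_whole)
print(lines_chunked)
```

The second list has an empty entry between the two lines, which is why some state has to be carried from one block to the next.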

Here's what I came up with; still not very pretty, I'm afraid. Initialize self.carry_cr = False in the constructor.

def emit(self):
    buf = self.buf
    if buf:
        # skip \n if previous buffer ended with \r
        if self.carry_cr and buf.startswith(b'\n'):
            del buf[0]
        self.carry_cr = False

    lines = buf.splitlines()
    if buf:
        if self.is_open and not buf.endswith((b'\r', b'\n')):
            buf = lines.pop()
        else:
            if buf.endswith(b'\r'):
                self.carry_cr = True
            del buf[:]

    self.buf = buf
    self.emitted_this_cycle = False
    if lines:
        self.emitted_this_cycle = True
        for line in lines:
            yield (self.fd, line.decode(self.enc).rstrip())
    if not self.is_open:
        self.emitted_this_cycle = True
        yield (self.fd, None)
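
The question also asks whether the standard library covers any of this. One candidate is io.IncrementalNewlineDecoder, the helper that text-mode files use for universal newlines. The following is only a rough sketch under that assumption: the LineSplitter class and its feed method are hypothetical names, not part of the original utility, and it covers only the line-splitting step, not the per-line rstrip() or the (fd, None) end-of-file marker.

```python
import codecs
import io


class LineSplitter:
    """Hypothetical helper: turns byte blocks into complete text lines,
    treating CR, LF and CRLF alike."""

    def __init__(self, encoding):
        byte_decoder = codecs.getincrementaldecoder(encoding)()
        # translate=True rewrites \r and \r\n to \n. A trailing \r is held
        # back until the next call, so a \r\n pair split across two blocks
        # still counts as a single newline.
        self.decoder = io.IncrementalNewlineDecoder(byte_decoder, translate=True)
        self.pending = ""  # text of the last, still incomplete line

    def feed(self, block, final=False):
        """Return the complete lines available after absorbing `block`.

        Pass final=True at EOF to flush an unterminated last line.
        """
        self.pending += self.decoder.decode(block, final=final)
        # Everything before the last \n is complete; the rest stays pending.
        *lines, self.pending = self.pending.split("\n")
        if final and self.pending:
            lines.append(self.pending)
            self.pending = ""
        return lines


splitter = LineSplitter("utf-8")
first = splitter.feed(b"one\r")        # \r is held back, nothing complete yet
second = splitter.feed(b"\ntwo\nthr")  # \r\n is recognised across the blocks
third = splitter.feed(b"ee", final=True)
```

One trade-off compared with the emit above: a line that ends with a bare \r at the very end of a block is not reported until the next block (or EOF) arrives, because the decoder cannot yet tell whether a \n will follow.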

Comment

This is enough of an improvement over what I had that I'm going to accept this and ask a new question showing more of the context (as suggested by 200_success). Thank you.

Answered by Janne Karila · Accepted Answer · 2015-03-17 12:01:00Z

