This class is part of a utility that reads lines from a set of nonblocking file descriptors, blocking only when there are no complete lines to emit.
class NonblockingLineBuffer:
    def __init__(self, fd, encoding):
        self.fd = fd
        self.enc = encoding
        self.buf = bytearray()

    def absorb(self):
        while True:
            try:
                block = os.read(self.fd, 8192)
            except BlockingIOError:
                return

            if block:
                self.buf.extend(block)
            else:
                self.is_open = False
                # We don't close the file here because caller
                # needs to remove the fd from the poll set first.
                return

    def emit(self):
        def emit1(chunk):
            self.emitted_this_cycle = True
            return (self.fd, chunk.decode(self.enc).rstrip())

        buf = self.buf
        self.emitted_this_cycle = False
        while buf:
            r = buf.find(b'\r')
            n = buf.find(b'\n')
            if r == -1 and n == -1:
                if not self.is_open:
                    yield emit1(buf)
                    buf.clear()

            elif r == -1 or r > n:
                yield emit1(buf[:n])
                buf = buf[(n+1):]

            elif n == -1 or n > r:
                yield emit1(buf[:r])
                if n == r+1:
                    buf = buf[(r+2):]
                else:
                    buf = buf[(r+1):]

        self.buf = buf
        if not self.is_open:
            self.emitted_this_cycle = True
            yield (self.fd, None)
This question is specifically about emit, which is complicated, confusing, and might not be as efficient as it could be. Please suggest ways to make it less complicated and/or confusing, and more efficient.
(I know it could be much simpler if it didn't need to reimplement universal newline handling, but that is unfortunately a requirement from the larger context.)
(If there's something in the standard library that does some or all of the larger task, that would also be a welcome answer.)
- Would you post the entire utility that works with multiple file descriptors? I suspect that there's a simpler way to achieve the same effect. – 200_success, Mar 17, 2015 at 1:36
- @200_success I made a bunch of changes since I posted the question, and I also got some good advice from Janne Karila about the narrowly-construed problem, so I've done as you suggest in a new question: codereview.stackexchange.com/questions/84299/… – zwol, Mar 17, 2015 at 18:30
1 Answer
- You have overlooked a corner case: while you normally treat \r\n as a single separator, this is not the case when the two bytes are split between blocks. splitlines could handle the universal newlines for you.
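To make the corner case concrete, here is a minimal sketch. The helper name naive_split is made up for illustration and is not part of the original class; it splits every block independently, which is roughly what happens when nothing remembers that the previous block ended with \r.

```python
def naive_split(blocks):
    """Split each block on its own, ignoring how the previous block ended.

    Hypothetical helper for illustration only.
    """
    lines = []
    for block in blocks:
        # bytearray.splitlines() treats \n, \r and \r\n as line boundaries.
        lines.extend(bytes(line) for line in bytearray(block).splitlines())
    return lines


# All three newline conventions in a single block are handled correctly.
whole = naive_split([b"one\r\ntwo\rthree\n"])

# The same kind of data, but with \r and \n arriving in different blocks.
split = naive_split([b"one\r", b"\ntwo\n"])

print(whole)  # [b'one', b'two', b'three']
print(split)  # [b'one', b'', b'two']  <- spurious empty line
```

The empty entry in the second result is the stray \n being read as a line of its own, so some state has to survive from one block to the next.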
Here's what I came up with; still not very pretty, I'm afraid. Initialize self.carry_cr = False in the constructor.
def emit(self):
    buf = self.buf
    if buf:
        # skip \n if previous buffer ended with \r
        if self.carry_cr and buf.startswith(b'\n'):
            del buf[0]
        self.carry_cr = False

    lines = buf.splitlines()
    if buf:
        if self.is_open and not buf.endswith((b'\r', b'\n')):
            buf = lines.pop()
        else:
            if buf.endswith(b'\r'):
                self.carry_cr = True
            del buf[:]

    self.buf = buf

    self.emitted_this_cycle = False
    if lines:
        self.emitted_this_cycle = True
        for line in lines:
            yield (self.fd, line.decode(self.enc).rstrip())
    if not self.is_open:
        self.emitted_this_cycle = True
        yield (self.fd, None)
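One way to try out the carry_cr idea without real file descriptors is the sketch below. LineSplitter and feed are hypothetical stand-ins (feed replaces absorb, and an empty block is assumed to mean end of file); decoding, emitted_this_cycle and the final (fd, None) marker are left out to keep it short.

```python
class LineSplitter:
    """Hypothetical, fd-free stand-in for NonblockingLineBuffer."""

    def __init__(self):
        self.buf = bytearray()
        self.carry_cr = False  # True if the last consumed byte was a bare \r
        self.is_open = True

    def feed(self, block):
        # Stands in for absorb(): an empty block is treated as end of file.
        if block:
            self.buf.extend(block)
        else:
            self.is_open = False

    def emit(self):
        buf = self.buf
        if buf:
            # Drop the \n that completes a \r\n split across two blocks.
            if self.carry_cr and buf.startswith(b"\n"):
                del buf[0]
            self.carry_cr = False

        lines = buf.splitlines()
        if buf:
            if self.is_open and not buf.endswith((b"\r", b"\n")):
                # The last line is incomplete: keep it for the next call.
                buf = lines.pop()
            else:
                # Remember a trailing \r so a following \n can be skipped.
                self.carry_cr = buf.endswith(b"\r")
                buf = bytearray()
        self.buf = buf
        return [bytes(line) for line in lines]


splitter = LineSplitter()
out = []
for block in (b"one\r", b"\ntwo\nthr", b"ee", b""):
    splitter.feed(block)
    out.extend(splitter.emit())

print(out)  # [b'one', b'two', b'three']
```

The \r\n pair split between the first two blocks yields a single line break, and the partial line "thr" is held back until the rest of it arrives and the input is closed.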
- This is enough of an improvement over what I had that I'm going to accept this and ask a new question showing more of the context (as suggested by 200_success). Thank you. – zwol, Mar 17, 2015 at 18:15