[Python-Dev] RE: cloning iterators again

Raymond Hettinger python at rcn.com
Mon Oct 27 06:24:57 EST 2003


> I also note that the current tee() doesn't let you use __copy__ easily
> (it would be quite messy I think). The linked-list version supports
> __copy__ trivially. This may be important if we execute (as I think
> we should) on the idea of making selected iterators __copy__-able
> (especially all the standard container iterators and xrange).

The current tee() was written to support only a two way split, but it
can easily be cast as a multi-way splitter with no problem. 
The only real difference in the ideas presented so far are whether the
underlying queue should be implemented as a singly linked list or as a
double stack.
As a proof-of-concept, here is GvR's code re-cast with the queue changed
to a double stack implementation. The interface is completely
unchanged. The memory consumed is double that the current tee() but
much less than the linked list version. The speed is half that of the
current tee() and roughly comparable to or slightly better than the
linked list version.
Raymond Hettinger
------------------------------------------------------------------------
--
""" Guido's demo program re-cast with a different underlying data
structure
Replaces the linked list based queue with a two stack based queue.
Advantages:
 The double stack method consumes only two pointers per data element
 while the linked list method consumes space for a link object
 (8 to 10 words).
 The double stack method uses contiguous memory while the link
 objects are more fragmented.
 The stack method uses append() and pop() which are optimized to
 minimize memory management calls. For the link method, every 
 link costs a malloc and free.
Todo:
 Handle Wrappers that are GC'd before termination.
 Add support for holding an exception.
"""
class TeeMaster(object):
 """Holder for information common to wrapped iterators
 """
 def __init__(self, it):
 self.inbasket = []
 self.inrefcnt = []
 self.outbasket = []
 self.outrefcnt = []
 self.numseen = 0
 self.it = it
 self.numsplits = 0
class Wrapper(object):
 """Copyable wrapper around an iterator.
 Any number of Wrappers around the same iterator share the TeeMaster
 object. The Wrapper that is most behind will drop the refcnt to
 zero, which causes the reference to be popped off of the queue.
 The newest Wrapper gets a brand new TeeMaster object. Later
 wrappers share an existing TeeMaster object. Since they may
 have arrived late in the game, they need to know how many objects
 have already been seen by the wrapper. When they call next(),
 they ask for the next numseen.
 If a Wrapper is garbage-collected before it finishes, the refcount
 floor needs to be raised. That has not yet been implemented.
 """
 __slots__ = ["master", "numseen"]
 def __init__(self, it, master=None):
 """Constructor. The master argument is used by __copy__
below."""
 if master is None:
 master = TeeMaster(it)
 self.master = master
 self.numseen = master.numseen
 self.master.numsplits += 1
 def __copy__(self):
 """Copy the iterator.
 This returns a new iterator that will return the same series
 of results as the original.
 """
 return Wrapper(None, self.master)
 def __iter__(self):
 """All iterators should support __iter__() returning self."""
 return self
 def next(self):
 """Get the next value of the iterator, or raise
StopIteration."""
 master = self.master
 inbasket, inrefcnt = master.inbasket, master.inrefcnt
 if master.numseen == self.numseen:
 # This is the lead dog so get a value through the iterator
 value = master.it.next()
 master.numseen += 1 
 # Save it for the other dogs
 inbasket.append(value)
 inrefcnt.append(master.numsplits-1)
 self.numseen += 1
 return value
 # Not a lead dog -- the view never changes :-(
 location = len(inbasket) - (master.numseen - self.numseen)
 if location >= 0:
 # Our food is in the inbasket
 value = inbasket[location]
 inrefcnt[location] -= 1
 rc = inrefcnt[location]
 else:
 # Our food is in the outbasket
 location = -(location + 1)
 value = master.outbasket[location]
 master.outrefcnt[location] -= 1
 rc = master.outrefcnt[location]
 
 # Purge doggie bowl when no food is left
 if rc == 0:
 if len(master.outbasket) == 0:
 master.outbasket, master.inbasket = master.inbasket,
master.outbasket
 master.outrefcnt, master.inrefcnt = master.inrefcnt,
master.outrefcnt
 master.outbasket.reverse()
 master.outrefcnt.reverse()
 master.outbasket.pop()
 master.outrefcnt.pop()
 self.numseen += 1
 return value
def tee(it):
 """Replacement for Raymond's tee(); see examples in itertools
docs."""
 if not hasattr(it, "__copy__"):
 it = Wrapper(it)
 return (it, it.__copy__())
def test():
 """A simple demonstration of the Wrapper class."""
 import random
 def gen():
 for i in range(10):
 yield i
 it = gen()
 a, b = tee(it)
 b, c = tee(b)
 c, d = tee(c)
 iterators = [a, b, c, d]
 while iterators != [None, None, None, None]:
 i = random.randrange(4)
 it = iterators[i]
 if it is None:
 next = "----"
 else:
 try:
 next = it.next()
 except StopIteration:
 next = "****"
 iterators[i] = None
 print "%4d%s%4s%s" % (i, " ."*i, next, " ."*(3-i))
if __name__ == "__main__":
 test()


More information about the Python-Dev mailing list

AltStyle によって変換されたページ (->オリジナル) /