[Python-checkins] peps: Latest update.

georg.brandl python-checkins at python.org
Wed Mar 23 21:22:28 CET 2011


http://hg.python.org/peps/rev/ff76cb6c841d
changeset: 19:ff76cb6c841d
user: Barry Warsaw <barry at python.org>
date: Mon Jul 17 18:49:21 2000 +0000
summary:
 Latest update.
After consultation with Guido, zip() is chosen as the name of this
built-in.
Added a __len__() method to the reference implementation.
Added a `Rejected Elaborations' section to talk about suggestions from
the list that I've rejected (and the reasoning behind the rejection).
Also: rewrite of paragraph 1 under "Standard For-Loops" for clarity;
Spelling and grammar fixes; use a References section.
files:
 pep-0201.txt | 195 ++++++++++++++++++++++++++++++++-------
 1 files changed, 160 insertions(+), 35 deletions(-)
diff --git a/pep-0201.txt b/pep-0201.txt
--- a/pep-0201.txt
+++ b/pep-0201.txt
@@ -25,9 +25,10 @@
 Motivation for this feature has its roots in a concept described
 as `parallel for loops'. A standard for-loop in Python iterates
 over every element in the sequence until the sequence is
- exhausted. The for-loop can also be explicitly exited with a
- `break' statement, and for-loops can have else: clauses, but these
- is has no bearing on this PEP.
+ exhausted. A `break' statement inside the loop suite causes an
+ explicit loop exit. For-loops also have else: clauses which get
+ executed when the loop exits normally (i.e. not by execution of a
+ break).
 
 For-loops can iterate over built-in types such as lists and
 tuples, but they can also iterate over instance types that conform
@@ -35,13 +36,13 @@
 instance should implement the __getitem__() method, expecting a
 monotonically increasing index starting at 0, and this method
 should raise an IndexError when the sequence is exhausted. This
- protocol is current undocumented -- a defect in Python's
+ protocol is currently undocumented -- a defect in Python's
 documentation hopefully soon corrected.
 
- For loops are described in the language reference manual here
- http://www.python.org/doc/devel/ref/for.html
+ For-loops are described in the Python language reference
+ manual[1].
 
- An example for-loop
+ An example for-loop:
 
 >>> for i in (1, 2, 3): print i
 ... 
@@ -88,7 +89,7 @@
 
 - The use of the magic `None' first argument is non-obvious.
 
- - Its has arbitrary, often unintended, and inflexible semantics
+ - It has arbitrary, often unintended, and inflexible semantics
 when the lists are not of the same length: the shorter sequences
 are padded with `None'.
 
@@ -110,11 +111,11 @@
 
 The proposed solution is to introduce a new built-in sequence
 generator function, available in the __builtin__ module. This
- function is to be called `marry' and has the following signature:
+ function is to be called `zip' and has the following signature:
 
- marry(seqa, [seqb, [...]], [pad=<value>])
+ zip(seqa, [seqb, [...]], [pad=<value>])
 
- marry() takes one or more sequences and weaves their elements
+ zip() takes one or more sequences and weaves their elements
 together, just as map(None, ...) does with sequences of equal
 length. The optional keyword argument `pad', if supplied, is a
 value used to pad all shorter sequences to the length of the
@@ -122,15 +123,15 @@
 the shortest sequence is exhausted.
 
 It is not possible to pad short lists with different pad values,
- nor will marry() ever raise an exception with lists of different
- lengths. To accomplish both of these, the sequences must be
- checked and processed before the call to marry().
+ nor will zip() ever raise an exception with lists of different
+ lengths. To accomplish either behavior, the sequences must be
+ checked and processed before the call to zip().
 
 
 
 Lazy Execution
 
- For performance purposes, marry() does not construct the list of
+ For performance purposes, zip() does not construct the list of
 tuples immediately. Instead it instantiates an object that
 implements a __getitem__() method and conforms to the informal
 for-loop protocol. This method constructs the individual tuples
@@ -148,25 +149,25 @@
 >>> c = (9, 10, 11)
 >>> d = (12, 13)
 
- >>> marry(a, b)
+ >>> zip(a, b)
 [(1, 5), (2, 6), (3, 7), (4, 8)]
 
- >>> marry(a, d)
+ >>> zip(a, d)
 [(1, 12), (2, 13)]
 
- >>> marry(a, d, pad=0)
+ >>> zip(a, d, pad=0)
 [(1, 12), (2, 13), (3, 0), (4, 0)]
 
- >>> marry(a, d, pid=0)
+ >>> zip(a, d, pid=0)
 Traceback (most recent call last):
 File "<stdin>", line 1, in ?
- File "/usr/tmp/python-iKAOxR", line 11, in marry
+ File "/usr/tmp/python-iKAOxR", line 11, in zip
 TypeError: unexpected keyword arguments
 
- >>> marry(a, b, c, d)
+ >>> zip(a, b, c, d)
 [(1, 5, 9, 12), (2, 6, 10, 13)]
 
- >>> marry(a, b, c, d, pad=None)
+ >>> zip(a, b, c, d, pad=None)
 [(1, 5, 9, 12), (2, 6, 10, 13), (3, 7, 11, None), (4, 8, None, None)]
 >>> map(None, a, b, c, d)
 [(1, 5, 9, 12), (2, 6, 10, 13), (3, 7, 11, None), (4, 8, None, None)]
@@ -175,17 +176,19 @@
 
 Reference Implementation
 
- Here is a reference implementation, in Python of the marry()
+ Here is a reference implementation, in Python, of the zip()
 built-in function and helper class. These would ultimately be
 replaced by equivalent C code.
 
-    class _Marriage:
+    class _Zipper:
         def __init__(self, args, kws):
+            # Defaults
             self.__padgiven = 0
             if kws.has_key('pad'):
                 self.__padgiven = 1
                 self.__pad = kws['pad']
                 del kws['pad']
+            # Assert no unknown arguments are left
             if kws:
                 raise TypeError('unexpected keyword arguments')
             self.__sequences = args
@@ -206,6 +209,23 @@
                     ret.append(self.__pad)
             return tuple(ret)
 
+        def __len__(self):
+            # If we're padding, then len is the length of the longest sequence,
+            # otherwise it's the length of the shortest sequence.
+            if not self.__padgiven:
+                shortest = -1
+                for s in self.__sequences:
+                    slen = len(s)
+                    if shortest < 0 or slen < shortest:
+                        shortest = slen
+                return shortest
+            longest = 0
+            for s in self.__sequences:
+                slen = len(s)
+                if slen > longest:
+                    longest = slen
+            return longest
+
         def __str__(self):
             ret = []
             i = 0
@@ -219,25 +239,130 @@
 __repr__ = __str__
 
 
- def marry(*args, **kws):
- return _Marriage(args, kws)
+ def zip(*args, **kws):
+ return _Zipper(args, kws)
+
+
+
+Rejected Elaborations
+
+ Some people have suggested that the user be able to specify the
+ type of the inner and outer containers for the zipped sequence.
+ This would be specified by additional keyword arguments to zip(),
+ named `inner' and `outer'.
+
+ This elaboration is rejected for several reasons. First, there
+ really is no outer container, even though there appears to be an
+ outer list container in the example above. This is simply an
+ artifact of the repr() of the zipped object. User code can do its
+ own looping over the zipped object via __getitem__(), and build
+ any type of outer container for the fully evaluated, concrete
+ sequence. For example, to build a zipped object with lists as an
+ outer container, use
+
+ >>> list(zip(sequence_a, sequence_b, sequence_c))
+
+ for a tuple outer container, use
+ 
+ >>> tuple(zip(sequence_a, sequence_b, sequence_c))
+
+ This type of construction will usually not be necessary though,
+ since it is expected that zipped objects will most often appear in
+ for-loops.
+
+ Second, allowing the user to specify the inner container
+ introduces needless complexity and arbitrary decisions. You might
+ imagine that instead of the default tuple inner container, the
+ user could prefer a list, or a dictionary, or instances of some
+ sequence-like class.
+
+ One problem is the API. Should the argument to `inner' be a type
+ or a template object? For flexibility, the argument should
+ probably be a type object (e.g. TupleType, ListType, DictType), or
+ a class. For classes, the implementation could just pass the zip
+ element to the constructor. But what about built-in types that
+ don't have constructors? They would have to be special-cased in
+ the implementation (e.g. what is the constructor for TupleType?
+ The tuple() built-in).
+
+ Another problem that arises is for zips greater than length two.
+ Say you had three sequences and you wanted the inner type to be a
+ dictionary. What would the semantics of the following be?
+
+ >>> zip(sequence_a, sequence_b, sequence_c, inner=DictType)
+
+ Would the key be (element_a, element_b) and the value be
+ element_c, or would the key be element_a and the value be
+ (element_b, element_c)? Or should an exception be thrown?
+
+ This suggests that the specification of the inner container type
+ is needless complexity. It isn't likely that the inner container
+ will need to be specified very often, and it is easy to roll your
+ own should you need it. Tuples are chosen for the inner container
+ type due to their (slight) memory footprint and performance
+ advantages.
 
 
 
 Open Issues
 
- What should "marry(a)" do?
+ - What should "zip(a)" do? Given
 
- Given a = (1, 2, 3), should marry(a) return [(1,), (2,), (3,)] or
- should it return [1, 2, 3]? The first is more consistent with the
- description given above, while the latter is what map(None, a)
- does, and may be more consistent with user expectation.
+ a = (1, 2, 3); zip(a)
 
- The latter interpretation requires special casing, which is not
- present in the reference implementation. It returns
+ three outcomes are possible.
 
- >>> marry(a)
- [(1,), (2,), (3,), (4,)]
+ 1) Returns [(1,), (2,), (3,)]
+
+ Pros: no special casing in the implementation or in user
+ code, and is more consistent with the description of its
+ semantics. Cons: this isn't what map(None, a) would return,
+ and may be counter to user expectations.
+
+ 2) Returns [1, 2, 3]
+
+ Pros: consistency with map(None, a), and simpler code for
+ for-loops, e.g.
+
+ for i in zip(a):
+
+ instead of
+
+ for (i,) in zip(a):
+
+ Cons: too much complexity and special casing for what should
+ be a relatively rare usage pattern.
+
+ 3) Raises TypeError
+
+ Pros: None
+
+ Cons: needless restriction
+
+ Current scoring seems to generally favor outcome 1.
+
+ - The name of the built-in `zip' may cause some initial confusion
+ with the zip compression algorithm. Other suggestions include
+ (but are not limited to!): marry, weave, parallel, lace, braid,
+ interlace, permute, furl, tuples, lists, stitch, collate, knit,
+ plait, and with. All have disadvantages, and there is no clear
+ unanimous choice; therefore the decision was made to go with
+ `zip' because the same functionality is available in other
+ languages (e.g. Haskell) under the name `zip'[2].
+
+
+
+References
+
+ [1] http://www.python.org/doc/devel/ref/for.html 
+ [2] http://www.haskell.org/onlinereport/standard-prelude.html#$vzip
+
+ TBD: URL to python-dev archives
+
+
+Copyright
+
+ This document has been placed in the public domain.
 
 
 
-- 
Repository URL: http://hg.python.org/peps
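
For readers who want to try the semantics described in this changeset, the
following is a minimal sketch in present-day Python, not the PEP's reference
implementation. The names zip_sketch, ZipperSketch and _NO_PAD are
hypothetical and chosen only for illustration. The sketch assumes the
behavior stated in the diff: truncation at the shortest sequence by default,
padding to the longest sequence when `pad' is given, a TypeError for unknown
keyword arguments, and a __len__() that follows the same rule. Returning 0
from __len__() when no sequences are given is an assumption of this sketch.

```python
_NO_PAD = object()  # sentinel so that pad=None can be told apart from "no pad"


class ZipperSketch:
    """Lazy zipped view using the __getitem__/IndexError for-loop protocol."""

    def __init__(self, sequences, pad=_NO_PAD):
        self._sequences = sequences
        self._pad = pad

    def __getitem__(self, i):
        if not self._sequences:
            raise IndexError(i)
        row = []
        exhausted = 0
        for s in self._sequences:
            try:
                row.append(s[i])
            except IndexError:
                if self._pad is _NO_PAD:
                    raise  # no padding: stop at the shortest sequence
                exhausted += 1
                row.append(self._pad)
        if exhausted == len(self._sequences):
            raise IndexError(i)  # padding: stop once every sequence has run out
        return tuple(row)

    def __len__(self):
        if not self._sequences:
            return 0  # assumption of this sketch
        lengths = [len(s) for s in self._sequences]
        # Longest sequence when padding, shortest otherwise.
        return min(lengths) if self._pad is _NO_PAD else max(lengths)


def zip_sketch(*sequences, **kws):
    pad = kws.pop("pad", _NO_PAD)
    if kws:
        raise TypeError("unexpected keyword arguments")
    return ZipperSketch(sequences, pad)
```

With a = (1, 2, 3, 4) and d = (12, 13) as in the PEP's examples,
list(zip_sketch(a, d)) gives [(1, 12), (2, 13)], and
list(zip_sketch(a, d, pad=0)) gives [(1, 12), (2, 13), (3, 0), (4, 0)].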

