
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: split(None, maxsplit) does not strip whitespace correctly
Type: behavior
Stage:
Components: Documentation
Versions: Python 3.0, Python 2.4, Python 2.6, Python 2.5
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: brett.cannon, effbot, fdrake, georg.brandl, jafo, nirs
Priority: low
Keywords:

Created on 2007-09-07 01:18 by nirs, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (11)
msg55720 - Author: Nir Soffer (nirs) Date: 2007-09-07 01:18
The string object .split docs say (http://docs.python.org/lib/string-methods.html):
 "If sep is not specified or is None, a different splitting algorithm 
is applied. First, whitespace characters (spaces, tabs, newlines, 
returns, and formfeeds) are stripped from both ends."
If the maxsplit argument is set and is smaller than the number of 
possible parts, whitespace is not removed from the end.
Examples:
>>> 'k: v\n'.split(None, 1)
['k:', 'v\n']
Expected: ['k:', 'v']
>>> u'k: v\n'.split(None, 1)
[u'k:', u'v\n']
Expected: [u'k:', u'v']
With larger values of maxsplit, it works as documented:
>>> 'k: v\n'.split(None, 2)
['k:', 'v']
>>> u'k: v\n'.split(None, 2)
[u'k:', u'v']
This looks like an implementation bug, because it does not make sense 
that the stripping depends on the maxsplit argument, and such behavior 
would be hard to explain.
Maybe the stripping should be removed in Python 3? It does not make sense 
to strip a string behind your back when you want to split it, and the 
caller can easily strip the string if needed.
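The last point, that the caller can strip explicitly, can be illustrated with a minimal Python 3 sketch. The helper name split_stripped is hypothetical and is not part of str or of any library; it only wraps two existing str methods.

```python
def split_stripped(text, maxsplit=-1):
    """Hypothetical helper: strip both ends, then split on whitespace."""
    # str.strip() with no argument removes leading and trailing whitespace,
    # so the final element can no longer carry a trailing newline.
    return text.strip().split(None, maxsplit)


# Built-in behavior: the remainder keeps its trailing whitespace.
print('k: v\n'.split(None, 1))
# Explicit strip first: the remainder is clean.
print(split_stripped('k: v\n', 1))
```

Stripping before splitting makes the result independent of whether maxsplit is reached.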
msg55806 - Author: Fredrik Lundh (effbot) (Python committer) Date: 2007-09-10 22:13
Looks like a *documentation* bug to me; at the implementation level,
None just means "no empty parts, treat runs of whitespace as separators".
msg55807 - Author: Nir Soffer (nirs) Date: 2007-09-10 22:32
I did not look into the source, but obviously there is stripping of 
leading and trailing whitespace. 
When you specify a separator you get:
>>> ' '.split(' ')
['', '', '']
>>> ' a b '.split(' ')
['', 'a', 'b', '']
So one would expect to get this without stripping:
>>> ' a b '.split()
['', 'a', 'b', '']
But you get this:
>>> ' a b '.split()
['a', 'b']
So the documentation is correct.
msg55809 - Author: Fredrik Lundh (effbot) (Python committer) Date: 2007-09-10 22:41
But wasn't your complaint that the implementation didn't match the
documentation?
As I said, the *implementation* treats "runs of whitespace" as
separators, except for whitespace at the beginning or end (or in other
words, it never returns empty strings). That matches the documentation,
except for the "first" in "first, whitespace characters are stripped
from both ends". As far as I can tell, the documentation has never
matched the implementation here.
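The "runs of whitespace are separators, and empty strings are never returned" reading can be modeled roughly as follows when no maxsplit is given. This is an illustrative Python 3 sketch, not the actual implementation; the function name split_via_regex is made up for this example.

```python
import re


def split_via_regex(text):
    """Hypothetical model of text.split() with no maxsplit."""
    # Split on runs of whitespace, then drop the empty strings that
    # re.split produces for leading and trailing runs.
    return [part for part in re.split(r'\s+', text) if part]


for sample in (' a b ', 'k: v\n', '   ', ''):
    # The model agrees with the built-in on these samples.
    assert split_via_regex(sample) == sample.split()
```

Under this reading there is no separate stripping step; the missing empty strings at either end follow from never returning empty parts.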
msg55819 - Author: Nir Soffer (nirs) Date: 2007-09-11 11:12
There is a problem only when maxsplit is smaller than the available 
splits. In other cases, the docs and the behavior match.
msg55962 - Author: Sean Reifschneider (jafo) (Python committer) Date: 2007-09-17 11:05
I believe this is just a place where the documentation could be cleared
up. It seems to me the confusion comes from the documentation saying
(paraphrased): "white space is removed from both ends".
Perhaps it should say something like "runs of one or more whitespace
characters are collapsed (up to the maximum split), and then split on",
or simply "split on runs of one or more whitespace characters. In other
words, three spaces together would be treated as a single split point
instead of three zero-length fields separated by spaces."
So, in the first example provided by "nirs" in this issue, "both ends"
refers to both the left and right side of "k:". Since maxsplit is 1,
the second part (v) is left untouched. This is the intended operation.
This is a documentation bug, not a library bug.
Fred: Thoughts on wording?
msg56021 - Author: Brett Cannon (brett.cannon) (Python committer) Date: 2007-09-19 02:22
The algorithm is actually kind of odd::
 >>> " a b".split(None, 0)
 ['a b']
 >>> "a b ".split(None, 0)
 ['a b ']
 >>> "a b ".split(None, 1)
 ['a', 'b ']
So trailing whitespace on the original string is stripped only if the
number of splits is great enough to lead to a possible split past the
last element. But leading whitespace is always removed.
Basically the algorithm stops looking for whitespace once it has
encountered maxsplit instances of contiguous whitespace plus leading
whitespace.
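The behavior described above can be written out as a small pure-Python model. This is a sketch for illustration only, assuming Python 3 and whitespace as defined by str.isspace(); it is not the C implementation, and the name whitespace_split is hypothetical.

```python
def whitespace_split(s, maxsplit=-1):
    """Hypothetical pure-Python model of s.split(None, maxsplit)."""
    parts = []
    i, n = 0, len(s)
    while True:
        # Skip the run of whitespace before the next word. Leading
        # whitespace is therefore always dropped, whatever maxsplit is.
        while i < n and s[i].isspace():
            i += 1
        if i == n:
            break
        if maxsplit >= 0 and len(parts) == maxsplit:
            # Split budget used up: the rest is returned untouched,
            # including any trailing whitespace.
            parts.append(s[i:])
            break
        # Collect one word, up to the next whitespace character.
        j = i
        while j < n and not s[j].isspace():
            j += 1
        parts.append(s[i:j])
        i = j
    return parts


for text, limit in [(' a b', 0), ('a b ', 0), ('a b ', 1), ('a b ', 2)]:
    # The model agrees with the built-in on the cases discussed here.
    assert whitespace_split(text, limit) == text.split(None, limit)
```

In this model the trailing whitespace survives only when the scan stops early because maxsplit was reached, which matches the examples in this message.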
msg56024 - Author: Sean Reifschneider (jafo) (Python committer) Date: 2007-09-19 02:42
In looking at the current documentation:
http://docs.python.org/dev/library/string.html#string.split
I don't see the wording the original poster mentions. The current
documentation of the separator is clear and reasonable. I'm going to
call this closed; unless someone can suggest specific wording changes to
the document, let's call this done.
msg56026 - Author: Nir Soffer (nirs) Date: 2007-09-19 04:12
I quoted str.split docs:
- http://docs.python.org/lib/string-methods.html
- http://docs.python.org/dev/library/stdtypes.html
- http://docs.python.org/dev/3.0/library/stdtypes.html
string.split doc does it explain this:
>>> ' a b '.split(None, 1)
['a', 'b ']
>>> ' a b '.split(None, 2)
['a', 'b']
.split method docs is more clear and describe this in a very simple way. 
This is a better description of the current behavior:
 "If sep is not specified or is None, a different splitting algorithm 
is applied. First, whitespace characters (spaces, tabs, newlines, 
returns, and formfeeds) are stripped from the start of the string. Then, 
words are separated by arbitrary length strings of whitespace 
characters. Consecutive whitespace delimiters are treated as a single 
delimiter ("' 1 \t 2 \n 3 '.split()" returns "['1', '2', '3']").
 If maxsplit is nonzero, at most maxsplit number of splits occur, and 
the remainder of the string is returned as the final element of the 
list, unless it is empty. Splitting an empty string or a string 
consisting of just whitespace returns an empty list."
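The claims in the proposed wording can be checked against the built-in method. The following is a small Python 3 sketch; the names checks and mismatches are only for this example.

```python
# Each pair is (actual result, value predicted by the proposed wording).
checks = [
    # Consecutive whitespace delimiters count as a single delimiter.
    (' 1 \t 2 \n 3 '.split(), ['1', '2', '3']),
    # Leading whitespace is stripped; the remainder is returned as-is.
    (' a b '.split(None, 1), ['a', 'b ']),
    (' a b '.split(None, 2), ['a', 'b']),
    # An empty remainder is not added to the list.
    ('a '.split(None, 1), ['a']),
    # Empty or whitespace-only strings give an empty list.
    (''.split(), []),
    (' \t\n'.split(), []),
]

mismatches = [(got, want) for got, want in checks if got != want]
print(mismatches)
```

An empty mismatches list means the wording predicts each of these results.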
msg56257 - Author: Brett Cannon (brett.cannon) (Python committer) Date: 2007-10-07 20:05
Re-opening as jafo was referring to the string module's function
implementation which is deprecated. The real issue is that the
built-in types docs are bad.
msg56272 - Author: Georg Brandl (georg.brandl) (Python committer) Date: 2007-10-08 07:50
This should now be fixed in r58368.
History
Date User Action Args
2022-04-11 14:56:26 admin set github: 45464
2007-10-08 07:50:36 georg.brandl set status: open -> closed
nosy: + georg.brandl
resolution: fixed
messages: + msg56272
2007-10-07 20:05:53 brett.cannon link issue1240 superseder
2007-10-07 20:05:30 brett.cannon set status: closed -> open
assignee: fdrake ->
messages: + msg56257
resolution: not a bug -> (no value)
versions: + Python 2.6
2007-09-19 04:12:51 nirs set messages: + msg56026
2007-09-19 02:42:36 jafo set status: open -> closed
resolution: not a bug
messages: + msg56024
2007-09-19 02:22:39 brett.cannon set nosy: + brett.cannon
messages: + msg56021
2007-09-17 11:05:20 jafo set priority: low
assignee: fdrake
messages: + msg55962
components: + Documentation, - Library (Lib)
nosy: + fdrake, jafo
2007-09-11 11:12:44 nirs set messages: + msg55819
2007-09-10 22:41:05 effbot set messages: + msg55809
2007-09-10 22:32:47 nirs set messages: + msg55807
2007-09-10 22:13:35 effbot set nosy: + effbot
messages: + msg55806
2007-09-07 17:31:54 gvanrossum set messages: - msg55726
2007-09-07 17:31:49 gvanrossum set messages: - msg55721
2007-09-07 02:04:13 nirs set type: behavior
messages: + msg55726
2007-09-07 01:19:23 nirs set messages: + msg55721
title: split(None, maxplit) does not strip whitespace correctly -> split(None, maxsplit) does not strip whitespace correctly
2007-09-07 01:18:40 nirs create
