
This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Author Mark.Bell
Recipients Catherine.Devlin, Mark.Bell, Philippe Cloutier, ZackerySpytz, barry, cheryl.sabella, corona10, gvanrossum, karlcow, mrabarnett, serhiy.storchaka, syeberman, veky
Date 2021-05-18.13:13:50
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1621343631.06.0.324337514234.issue28937@roundup.psfhosted.org>
In-reply-to
Content
So I have taken a look at the original patch that was provided and I have been able to update it so that it is compatible with the current release. I have also flipped the logic in the wrapping functions so that they take a `keepempty` flag (which is the opposite of the `prune` flag).
I had to make a few extra changes, since there are now some extra shortcuts in functions like PyUnicode_Split which spot that if len(self) < len(sep) then they can just return [self]. However, that now needs an extra test, since with keepempty=False the shortcut can only be used if len(self) > 0. You can find the code here: https://github.com/markcbell/cpython/tree/split-keepempty
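To make the extra test concrete, here is a rough pure-Python sketch of the shortcut. It is only a model and not the C code in the patch: `split_shortcut` is a hypothetical helper name, and it assumes the shortcut fires when the string is shorter than the separator (so the separator cannot occur in it).

```python
def split_shortcut(text, sep, keepempty=True):
    """Model of the early return taken when `text` is shorter than `sep`.

    Returns the shortcut result, or None when the shortcut does not apply
    and a real scan for the separator would be needed.
    """
    if len(text) >= len(sep):
        return None  # separator might occur; no shortcut
    if text or keepempty:
        return [text]  # old behaviour: the whole string is the only item
    return []  # empty string with keepempty=False: nothing to keep


print(split_shortcut("", " "))                   # ['']
print(split_shortcut("", " ", keepempty=False))  # []
print(split_shortcut("a", "ab", keepempty=False))  # ['a']
```

The first two calls correspond to `''.split(' ')` and the proposed `''.split(' ', keepempty=False)`.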
However, in exploring this, I'm not sure that this patch interacts correctly with maxsplit. For example,
 '  x y z'.split(maxsplit=1, keepempty=True)
results in
 ['', '', 'x', 'y z']
since the first two empty-string items are "free" and don't count towards the maxsplit. I think the length of the result returned must be <= maxsplit + 1, is this right?
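As a sanity check on that bound, the existing `str.split` (without any `keepempty` argument) can be probed directly. This is just a small sketch over a handful of sample strings, not a proof; the sample inputs and the helper name `respects_bound` are my own choices.

```python
# Probe the bound len(result) <= maxsplit + 1 on today's str.split.
samples = ["", "  ", "  x y z", "  a b c  "]
separators = [None, " "]  # None means "split on runs of whitespace"


def respects_bound(text, sep, maxsplit):
    """True if splitting yields at most maxsplit + 1 items."""
    return len(text.split(sep, maxsplit)) <= maxsplit + 1


violations = [
    (text, sep, maxsplit)
    for text in samples
    for sep in separators
    for maxsplit in range(8)
    if not respects_bound(text, sep, maxsplit)
]
print(violations)  # [] : no sample breaks the bound
```

So the current behaviour does respect the bound on these inputs, which is why a four-item result for maxsplit=1 looks wrong.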
I'm about to rework the logic to avoid this, but before I go too far, could someone please double check my test cases to make sure that I have the correct idea about how this is supposed to work? Only the 8 lines marked "New case" show new behaviour; all the others come from how str.split works currently. Of course the same patterns should apply to bytestrings and bytearrays.
 ''.split() == []
 ''.split(' ') == ['']
 ''.split(' ', keepempty=False) == []  # New case
 '  '.split(' ') == ['', '', '']
 '  '.split(' ', maxsplit=1) == ['', ' ']
 '  '.split(' ', maxsplit=1, keepempty=False) == [' ']  # New case
 '  a b c  '.split() == ['a', 'b', 'c']
 '  a b c  '.split(maxsplit=0) == ['a b c  ']
 '  a b c  '.split(maxsplit=1) == ['a', 'b c  ']
 '  a b c  '.split(' ') == ['', '', 'a', 'b', 'c', '', '']
 '  a b c  '.split(' ', maxsplit=0) == ['  a b c  ']
 '  a b c  '.split(' ', maxsplit=1) == ['', ' a b c  ']
 '  a b c  '.split(' ', maxsplit=2) == ['', '', 'a b c  ']
 '  a b c  '.split(' ', maxsplit=3) == ['', '', 'a', 'b c  ']
 '  a b c  '.split(' ', maxsplit=4) == ['', '', 'a', 'b', 'c  ']
 '  a b c  '.split(' ', maxsplit=5) == ['', '', 'a', 'b', 'c', ' ']
 '  a b c  '.split(' ', maxsplit=6) == ['', '', 'a', 'b', 'c', '', '']
 '  a b c  '.split(' ', keepempty=False) == ['a', 'b', 'c']  # New case
 '  a b c  '.split(' ', maxsplit=0, keepempty=False) == ['  a b c  ']  # New case
 '  a b c  '.split(' ', maxsplit=1, keepempty=False) == ['a', 'b c  ']  # New case
 '  a b c  '.split(' ', maxsplit=2, keepempty=False) == ['a', 'b', 'c  ']  # New case
 '  a b c  '.split(' ', maxsplit=3, keepempty=False) == ['a', 'b', 'c', ' ']  # New case
 '  a b c  '.split(' ', maxsplit=4, keepempty=False) == ['a', 'b', 'c']  # New case
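The lines not marked "New case" can be checked against an unpatched interpreter. The sketch below assumes the inputs are two spaces and 'a b c' wrapped in two leading and two trailing spaces, which is what the listed results imply; the variable names are mine. The "New case" lines are left out because they need the `keepempty` argument from the patch.

```python
two = "  "        # two spaces
s = "  a b c  "   # two leading and two trailing spaces

# (actual result from today's str.split, expected result from the list)
existing_cases = [
    ("".split(), []),
    ("".split(" "), [""]),
    (two.split(" "), ["", "", ""]),
    (two.split(" ", maxsplit=1), ["", " "]),
    (s.split(), ["a", "b", "c"]),
    (s.split(maxsplit=0), ["a b c  "]),
    (s.split(maxsplit=1), ["a", "b c  "]),
    (s.split(" "), ["", "", "a", "b", "c", "", ""]),
    (s.split(" ", maxsplit=0), ["  a b c  "]),
    (s.split(" ", maxsplit=1), ["", " a b c  "]),
    (s.split(" ", maxsplit=2), ["", "", "a b c  "]),
    (s.split(" ", maxsplit=3), ["", "", "a", "b c  "]),
    (s.split(" ", maxsplit=4), ["", "", "a", "b", "c  "]),
    (s.split(" ", maxsplit=5), ["", "", "a", "b", "c", " "]),
    (s.split(" ", maxsplit=6), ["", "", "a", "b", "c", "", ""]),
]

mismatches = [(got, want) for got, want in existing_cases if got != want]
print(mismatches)  # [] : every existing-behaviour case holds
```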
History
Date User Action Args
2021-05-18 13:13:51  Mark.Bell  set  recipients: + Mark.Bell, gvanrossum, barry, syeberman, mrabarnett, karlcow, serhiy.storchaka, Catherine.Devlin, veky, cheryl.sabella, corona10, ZackerySpytz, Philippe Cloutier
2021-05-18 13:13:51  Mark.Bell  set  messageid: <1621343631.06.0.324337514234.issue28937@roundup.psfhosted.org>
2021-05-18 13:13:51  Mark.Bell  link  issue28937 messages
2021-05-18 13:13:50  Mark.Bell  create
