Message46359
| Author |
filip |
| Recipients |
| Date |
2006年01月16日.21:56:08 |
| SpamBayes Score |
| Marked as misclassified |
| Message-id |
| In-reply-to |
| Content |
Logged In: YES
user_id=308203
I agree completely that splitting on non-zero matches should
be supported - and that the default behavior should change
at some point - but I don't think this patch quite covers
it. Taking an example from the dev-python thread back in
August of 2004
(http://mail.python.org/pipermail/python-dev/2004-August/047272.html):
>>> re.split('x*', 'abxxxcdefxxx', emptyok=True)
['', 'a', 'b', '', 'c', 'd', 'e', 'f', '', '']
To me, this means there's an empty string, beginning and
ending in pos 0, followed by a zero-width divider also
beginning and ending in the same position, followed by an
'a', etc. That seems awkward to me. I think a more intuitive
result would be (I'm omitting the emptyok argument in the
following examples):
>>> re.split('x*', 'abxxxcdefxxx')
['a', 'b', 'c', 'd', 'e', 'f', '']
That is, empty matches cause a split when they are not
adjacent to a non-empty match and not at the beginning or
the end of the string. Grouping parentheses would, of
course, reveal the empty-string boundaries:
>>> re.split('(x*)', 'abxxxcdefxxx')
['', 'a', '', 'b', 'xxx', '', 'c', '', 'd', '', 'e', '',
'f', 'xxx', '']
Using the same approach, these results would also seem
perfectly reasonable to me:
>>> re.split('(?m)$', 'foo\nbar\nbaz')
['foo', '\nbar', '\nbaz']
>>> re.split('(?m)^', 'foo\nbar\nbaz')
['foo\n', 'bar\n', 'baz']
Splitting a one-character string should be possible only if
the pattern matches that character:
>>> re.split('\w*', 'a')
['', '']
>>> re.split('\d*', 'a')
['a']
|
|
History
|
|---|
| Date |
User |
Action |
Args |
| 2007年08月23日 15:38:36 | admin | link | issue988761 messages |
| 2007年08月23日 15:38:36 | admin | create |
|