Python 3.6.5
Is there any better solution than this one? Particularly the last line. I don't like it.
import re
s = "my_separator first thing my_separator second thing"
data = re.split("(my_separator )", s)[1:]
data = [even+odd for i, odd in enumerate(data) for j, even in enumerate(data) if i%2==1 and j%2==0 and i==j+1]
You can exploit zip and iterators to allow you to pair things together:
data = [a + b for a, b in zip(*[iter(data)]*2)]
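To see why this idiom pairs neighbours: `[iter(data)]*2` is a list holding the same iterator object twice, so `zip` pulls two consecutive items for every tuple. A minimal sketch, assuming `data` is the list produced by the question's `re.split` call (the names `it`, `pairs` and `joined` are only illustrative):

```python
import re

s = "my_separator first thing my_separator second thing"
# The capturing group keeps the separators; [1:] drops the leading empty string.
data = re.split("(my_separator )", s)[1:]

it = iter(data)
# zip reads from the same iterator twice per tuple, giving non-overlapping pairs.
pairs = list(zip(it, it))

# The one-liner from the answer does the same thing and joins each pair.
joined = [a + b for a, b in zip(*[iter(data)] * 2)]

print(pairs)
print(joined)
```

Note that `zip` stops at the shortest input, so a trailing item without a partner would be dropped silently.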
You could use just re, and change the separator with a lookahead assertion:
data = re.split(" (?=my_separator)", s)
You can use str.split, and just add the separator back:
sep = 'my_separator '
data = s.split(sep)[1:]
data = [sep + i for i in data]
# or, as a single expression:
data = [sep + i for i in s.split(sep)[1:]]
Didn't know about itertools recipes, thanks a lot! I consider the lookahead most elegant as I won't have to process it then. (VaNa, May 15, 2018 at 14:31)
The lookahead is not as general as I thought because split has to consume something and that something has to precede the separator :-( pairwise it is! (VaNa, May 15, 2018 at 14:46)
pairwise cuts it as (1,2),(2,3),(3,4),..., my code does (1,2),(3,4),... (VaNa, May 15, 2018 at 15:15)
@VaNa My bad, yes, I've fixed that. (May 15, 2018 at 15:19)
@VaNa: The lookahead works fine, just not in Python. :-/ In Ruby: "my_separator first thing my_separator second thing".split(/(?=my_separator)/). Done! (Eric Duminil, May 15, 2018 at 16:07)
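For readers on newer interpreters: the limitation discussed in these comments applies to the Python 3.6 that the question targets. Since Python 3.7, `re.split` also splits on empty matches, so a bare lookahead behaves much like the Ruby example. A minimal sketch, assuming Python 3.7 or later; the leading empty string is filtered out by hand:

```python
import re

s = "my_separator first thing my_separator second thing"

# Assumes Python 3.7+: earlier versions do not split on zero-width matches.
parts = re.split("(?=my_separator)", s)

# The lookahead also matches at position 0, which yields an empty first item.
data = [part for part in parts if part]
print(data)
```

Unlike the `" (?=my_separator)"` pattern, nothing is consumed here, so the trailing space of each chunk is preserved.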
As already commented, use the str.split() version itself:
SEPARATOR = "my_separator "
s = "my_separator first thing my_separator second thing"
data = [SEPARATOR + part for part in s.split(SEPARATOR) if part]
You meant it like this! You won! :-D (VaNa, May 15, 2018 at 15:19)
Note that it doesn't work if 'my_separator' isn't present in the string. (Eric Duminil, May 15, 2018 at 16:04)
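The caveat in the last comment can be reproduced in a few lines: when the separator is absent, `str.split` returns the whole string as a single part, which then gets a separator it never had. The sketch below shows the problem and one possible guard. The helper name `split_keep_separator` and the choice to return an empty list are assumptions made for illustration, not part of the answer:

```python
SEPARATOR = "my_separator "

def split_keep_separator(s, sep=SEPARATOR):
    """Hypothetical helper: sep-prefixed chunks, or [] when sep is absent."""
    if sep not in s:
        return []  # assumption: without a separator there is nothing to return
    # str.split drops the separator, so it is put back in front of each chunk.
    return [sep + part for part in s.split(sep) if part]

text = "no separator here"
# The unguarded comprehension from the answer invents a separator:
unguarded = [SEPARATOR + part for part in text.split(SEPARATOR) if part]
print(unguarded)

print(split_keep_separator(text))
print(split_keep_separator("my_separator first thing my_separator second thing"))
```

Text that appears before the first separator would still be prefixed incorrectly; this sketch does not try to handle that case.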
hjpotters92’s answer is great for fixed separator strings. If the separators vary and one wants to join them with each subsequent match one can use the following two approaches, neither of which requires closures:
1 Generator function
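import re
def split_with_separator1(s, sep):
    tokens = iter(re.split(sep, s))  # sep needs a capturing group to keep the separators
    next(tokens)  # skip the text before the first separator
    try:
        while True:
            yield next(tokens) + next(tokens)
    except StopIteration:
        return  # PEP 479: a leaked StopIteration is a RuntimeError from Python 3.7 on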
The expression inside the loop works because the Python language guarantees left-to-right evaluation of the operands (unlike many other languages, e.g. C), so the separator is always fetched before the text that follows it.
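The evaluation-order guarantee can be checked directly. A small sketch, in which the helper `tagged` and the list `calls` are hypothetical names used only to record the order in which the two operands of `+` are evaluated:

```python
calls = []

def tagged(label, value):
    """Record that this operand was evaluated, then return the value unchanged."""
    calls.append(label)
    return value

# Python evaluates the left operand of + before the right one.
result = tagged("left", "my_separator ") + tagged("right", "first thing")

print(calls)
print(result)
```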
2 Interleaved slices and binary map
import operator
import re

def split_with_separator2(s, sep):
    tokens = re.split(sep, s)
    # pair each separator (odd index) with the text that follows it (even index)
    return map(operator.add, tokens[1::2], tokens[2::2])
Of course one can slice with itertools.islice instead if one doesn't want to create two ephemeral token list copies.
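A possible `islice` variant might look like the following. It is only a sketch: the name `split_with_separator3` is made up here, and `sep` is assumed to be a regular expression with exactly one capturing group so that `re.split` keeps the separators.

```python
import itertools
import operator
import re

def split_with_separator3(s, sep):
    """Hypothetical islice variant; sep must contain one capturing group."""
    tokens = re.split(sep, s)
    # Each islice walks the same list lazily instead of copying a slice of it.
    separators = itertools.islice(tokens, 1, None, 2)  # like tokens[1::2]
    texts = itertools.islice(tokens, 2, None, 2)       # like tokens[2::2]
    return map(operator.add, separators, texts)

s = "my_separator first thing my_separator second thing"
result = list(split_with_separator3(s, "(my_separator )"))
print(result)
```

The token list itself is still built in full by `re.split`; only the two sliced copies are avoided.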
Your last line, "repaired":
import re
s = "my_separator first thing my_separator second thing"
data = re.split("(my_separator )", s)[1:]
data = [data[i]+data[i+1] for i in range(0, len(data), 2)]
Although this one does not seem to be that elegant, it is correct and better than mine. Thanks! (VaNa, May 15, 2018 at 15:18)
str.split can accept a string separator.
If the re.split expression is in a group (()), it keeps the separators in the results (but puts them separately from the results).
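The capturing-group behaviour of `re.split` mentioned above is easy to see side by side. A minimal sketch using the string from the question (the variable names are illustrative):

```python
import re

s = "my_separator first thing my_separator second thing"

# Without a group the separators are discarded.
without_group = re.split("my_separator ", s)

# With a capturing group they are kept as separate list items.
with_group = re.split("(my_separator )", s)

print(without_group)
print(with_group)
```

In both cases the first item is an empty string, because the input starts with the separator; that is why the question's code slices with `[1:]`.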