The following is a function that I have written to split a string into groups, based on whether a character is repeated consecutively. For example, AAAABBBBAAB is grouped as [A+, B+, A+, B]. Is it possible to make the code below more Pythonic? If yes, how?
def create_groups(alphabets):
    """ function group the alphabets to list of A(+)s and B(+)s """
    index = 1
    current = alphabets[0]
    count = 0
    groups = []
    accumulate = False
    while index < len(alphabets):
        if current == alphabets[index]:
            count += 1
            accumulate = True
        else:
            accumulate = False
        if accumulate == False or index == len(alphabets)-1:
            group_indicator = current + '+' if count > 0 else current
            groups.append(group_indicator)
            current = alphabets[index]
            count = 0
        index += 1
    return groups
First of all, your method is not really correct: for AAAABBBBAAB it returns [A+, B+, A+] instead of the required [A+, B+, A+, B]. That's because the last group, a single B here, is never added to the list of groups.
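To see which inputs trigger this, here is a small reproduction. It uses the question's function as posted; the indentation is reconstructed, so treat the exact layout as an assumption. The inputs other than AAAABBBBAAB are my own picks for illustration.

```python
def create_groups(alphabets):
    """The question's function, reproduced unchanged to show the bug."""
    index = 1
    current = alphabets[0]
    count = 0
    groups = []
    accumulate = False
    while index < len(alphabets):
        if current == alphabets[index]:
            count += 1
            accumulate = True
        else:
            accumulate = False
        if accumulate == False or index == len(alphabets) - 1:
            group_indicator = current + '+' if count > 0 else current
            groups.append(group_indicator)
            current = alphabets[index]
            count = 0
        index += 1
    return groups


print(create_groups("AAAABBBBAAB"))  # ['A+', 'B+', 'A+'] (trailing 'B' is lost)
print(create_groups("AABB"))         # ['A+', 'B+'] (fine: the last run is repeated)
print(create_groups("A"))            # [] (the loop body never runs)
```

So the output is only wrong when the string ends in a run of length one, which is easy to miss in casual testing.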
This is not very Pythonic:
if accumulate == False:
Write it this way instead:
if not accumulate:
Also, instead of iterating over the "alphabet" using indexes, it would be more Pythonic to iterate over each letter directly, in the style for letter in alphabet.
"alphabets" is not a good name. letters would be better.
The algorithm can be simplified, and you could eliminate several intermediary variables:
def create_groups(letters):
    """ function group the alphabets to list of A(+)s and B(+)s """
    prev = letters[0]
    count = 0
    groups = []
    for current in letters[1:] + '\0':
        if current == prev:
            count += 1
        else:
            group_indicator = prev + '+' if count > 0 else prev
            groups.append(group_indicator)
            count = 0
        prev = current
    return groups
In the for loop, I appended '\0' to the end, as a dirty trick to make the loop do one more iteration to append the last letter group to groups. For this to work, it must be a character that's different from the last letter in letters.
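If relying on a sentinel character feels too fragile, one alternative is to flush the final group after the loop instead. This is only a sketch of that idea, not the code from this answer: the name create_groups_no_sentinel and the choice to return an empty list for an empty string are my assumptions.

```python
def create_groups_no_sentinel(letters):
    """Group consecutive repeats, e.g. 'AAB' -> ['A+', 'B']."""
    if not letters:
        return []  # assumption: an empty input yields no groups
    groups = []
    prev = letters[0]
    repeated = False  # True once prev has been seen at least twice in a row
    for current in letters[1:]:
        if current == prev:
            repeated = True
        else:
            groups.append(prev + '+' if repeated else prev)
            prev = current
            repeated = False
    # Flush the last group, which the loop never gets to append.
    groups.append(prev + '+' if repeated else prev)
    return groups


print(create_groups_no_sentinel("AAAABBBBAAB"))  # ['A+', 'B+', 'A+', 'B']
```

The price is one duplicated append line, but the function no longer depends on the input never ending in a particular character.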
The above is sort of a "naive" solution, in the sense that there is probably a Python library that can do this more easily, kinda like what @jonrsharpe suggested, but he didn't complete the solution by converting [['A', 'A', 'A', 'A'], ['B', 'B', 'B', 'B'], ['A', 'A'], ['B']] to the format that you need. Based on his solution, you could do something like this:
from itertools import groupby

def create_groups(letters):
    return [x + '+' if list(g)[1:] else x for x, g in groupby(letters, str)]
What I don't like about this is the way we put the letters in a list just to know if there are 2 or more of them (the list(g)[1:] step). There might be a better way.
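One possible way to avoid building the list, offered as a sketch rather than a definitive improvement: pull at most two items from each group iterator with next(). The names create_groups_peek and _MISSING are made up for illustration.

```python
from itertools import groupby

_MISSING = object()  # hypothetical marker meaning "the group had no second item"


def create_groups_peek(letters):
    """Group consecutive repeats, e.g. 'AAB' -> ['A+', 'B']."""
    groups = []
    for letter, run in groupby(letters):
        next(run)  # skip the first item; groupby never yields an empty group
        repeated = next(run, _MISSING) is not _MISSING
        groups.append(letter + '+' if repeated else letter)
    return groups


print(create_groups_peek("AAAABBBBAAB"))  # ['A+', 'B+', 'A+', 'B']
```

Leaving the rest of a group unconsumed is fine here, because groupby skips ahead to the next group when the outer loop advances. Whether this reads better than the one-liner is a matter of taste.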
"he didn't complete the solution" - this isn't a code-writing service, I thought I'd let the OP have some of the fun! – jonrsharpe, Aug 25, 2014 at 10:41
You can simplify your logic significantly using itertools.groupby:
>>> from itertools import groupby
>>> [list(g) for _, g in groupby("AAAABBBBAAB")]
[['A', 'A', 'A', 'A'], ['B', 'B', 'B', 'B'], ['A', 'A'], ['B']]
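Each item that groupby yields is a (key, group iterator) pair, so the length of each run is available as well. A small sketch, with run_lengths as a made-up helper name:

```python
from itertools import groupby


def run_lengths(letters):
    """Return (letter, run length) pairs for consecutive repeats."""
    # sum(1 for _ in group) counts the items without building a list.
    return [(key, sum(1 for _ in group)) for key, group in groupby(letters)]


print(run_lengths("AAAABBBBAAB"))  # [('A', 4), ('B', 4), ('A', 2), ('B', 1)]
```

A run length greater than 1 corresponds to the '+' suffix in the format the question asks for.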
ord is unnecessary here: "If not specified, key defaults to an identity function and returns the element unchanged." – Gareth Rees, Aug 26, 2014 at 11:35
@GarethRees ah, I'd read that as meaning id. Edited, thanks. – jonrsharpe, Aug 26, 2014 at 15:05