Grouping string to list of strings

Question 1

The following is a function that I have written to split a string into groups, based on whether there are consecutive repeated occurrences. For example, AAAABBBBAAB is grouped as [A+, B+, A+, B]. Is it possible to make the code below more Pythonic? If yes, how?

def create_groups(alphabets):
    """ function group the alphabets to list of A(+)s and B(+)s """
    index = 1
    current = alphabets[0]
    count = 0
    groups = []
    accumulate = False
    while index < len(alphabets):
        if current == alphabets[index]:
            count += 1
            accumulate = True
        else:
            accumulate = False
        if accumulate == False or index == len(alphabets)-1:
            group_indicator = current + '+' if count > 0 else current
            groups.append(group_indicator)
            current = alphabets[index]
            count = 0
        index += 1
    return groups

Question 2

First of all, your method is not really correct: for AAAABBBBAAB it returns [A+, B+, A+] instead of the required [A+, B+, A+, B]. That's because the last group is never added to the list of groups.

This is not very Pythonic:

if accumulate == False:

Write it this way instead:

if not accumulate:

Also, instead of iterating over the "alphabet" using indexes, it would be more Pythonic to iterate over the letters directly, in the style for letter in alphabet.

"alphabets" is not a good name. It seems letters would be better.

The algorithm can be simplified, and you could eliminate several intermediary variables:

def create_groups(letters):
    """ function group the alphabets to list of A(+)s and B(+)s """
    prev = letters[0]
    count = 0
    groups = []
    for current in letters[1:] + '\0':
        if current == prev:
            count += 1
        else:
            group_indicator = prev + '+' if count > 0 else prev
            groups.append(group_indicator)
            count = 0
        prev = current
    return groups

In the for loop, I appended '\0' to the end, as a dirty trick to make the loop do one more iteration, so that the last letter group gets appended to groups. For this to work, it must be a character that's different from the last letter in letters.
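If the sentinel feels too fragile (it fails when the input itself ends in the sentinel character, and letters[0] raises IndexError on an empty string), one alternative is to flush the last group after the loop. This is only a minimal sketch, not part of the code above, and the name create_groups_no_sentinel is made up for illustration:

```python
def create_groups_no_sentinel(letters):
    """Group runs of equal characters: 'AAB' -> ['A+', 'B']."""
    groups = []
    prev = None        # character of the run currently being scanned
    repeated = False   # True once the current run has 2+ characters
    for current in letters:
        if current == prev:
            repeated = True
        else:
            if prev is not None:
                # The previous run just ended, so record it.
                groups.append(prev + '+' if repeated else prev)
            prev = current
            repeated = False
    if prev is not None:
        # Flush the final run, which no later character terminates.
        groups.append(prev + '+' if repeated else prev)
    return groups
```

With this version, create_groups_no_sentinel("AAAABBBBAAB") returns ['A+', 'B+', 'A+', 'B'] and an empty string returns an empty list.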

The above is sort of a "naive" solution, in the sense that there is probably a Python library that can do this more easily. Kinda like what @jonrsharpe suggested, but he didn't complete the solution of converting [['A', 'A', 'A', 'A'], ['B', 'B', 'B', 'B'], ['A', 'A'], ['B']] into the format that you need. Based on his solution, you could do something like this:

from itertools import groupby

def create_groups(letters):
    return [x + '+' if list(g)[1:] else x for x, g in groupby(letters, str)]

What I don't like about this is the way we put the letters in a list just to know if there are 2 or more of them (the list(g)[1:] step). There might be a better way.
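One possible way to avoid building the list, offered as a suggestion rather than as part of the original answer: take the first item from each group iterator, then ask whether a second one exists. The built-in next(iterator, default) does the probing; the variable name has_second is only illustrative, and the None default assumes the input is a string (so no element is ever None).

```python
from itertools import groupby


def create_groups(letters):
    """Group runs of equal characters: 'AAB' -> ['A+', 'B']."""
    groups = []
    for letter, run in groupby(letters):
        next(run)  # skip the first item; every group has at least one
        # A second item exists only if the run has 2+ characters.
        # Assumes elements are never None, which holds for strings.
        has_second = next(run, None) is not None
        groups.append(letter + '+' if has_second else letter)
    return groups
```

This consumes at most two items per group directly; groupby skips whatever remains of a run when it moves on to the next key.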

Question 3

"he didn't complete the solution" - this isn't a code-writing service, I thought I'd let the OP have some of the fun!

Question 4

You can simplify your logic significantly using itertools.groupby:

>>> from itertools import groupby
>>> [list(g) for _, g in groupby("AAAABBBBAAB")]
[['A', 'A', 'A', 'A'], ['B', 'B', 'B', 'B'], ['A', 'A'], ['B']]

Question 5

ord is unnecessary here: "If not specified, key defaults to an identity function and returns the element unchanged."
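A small check of the quoted sentence, as a sketch (the variable names are only for illustration): for a plain string, omitting the key, passing an explicit identity function, and passing ord all produce the same run boundaries. Only the type of the keys differs in the ord case.

```python
from itertools import groupby

text = "AAAABBBBAAB"

# No key: elements are compared directly.
default_runs = [(k, len(list(g))) for k, g in groupby(text)]

# Explicit identity key: same keys, same runs.
identity_runs = [(k, len(list(g))) for k, g in groupby(text, key=lambda c: c)]

# ord as key: same run boundaries, but the keys become integers.
ord_runs = [(k, len(list(g))) for k, g in groupby(text, key=ord)]

print(default_runs)  # [('A', 4), ('B', 4), ('A', 2), ('B', 1)]
```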

Question 6

@GarethRees ah, I'd read that as meaning id. Edited, thanks.

janos · Accepted Answer · 2014-08-24 14:15:14Z
