\$\begingroup\$

I was thinking about splitting a string based on certain categories, for example alphabetic versus numeric characters, lower versus upper case, or even the three rows of the keyboard. I wrote a generic function for this, but it seems unreadable, so I am asking for better modularization and improved readability.

import doctest


def split_on_changes(string, cats):
    """
    Given an input list and some categories,
    splits the list whenever the category changes.
    Also splits if the next char is in no categories.
    >>> split_on_changes("010010110", ["0", "1"])
    ['0', '1', '00', '1', '0', '11', '0']
    >>> split_on_changes("this is an alaskian example", ["qwertyuiop", "asdfghjkl", "zxcvbnm "])
    ['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 'a', 'n ', 'alask', 'i', 'a', 'n ', 'e', 'x', 'a', 'm', 'p', 'l', 'e']
    >>> split_on_changes("MiXeD CASing ahahah", ["qwertyuiopasdfghjklzxcvbnm", "qwertyuiopasdfghjklzxcvbnm".upper()])
    ['M', 'i', 'X', 'e', 'D', ' ', 'CAS', 'ing', ' ', 'ahahah']
    """
    def same_cat(a, b):
        for cat in cats:
            if a in cat and b in cat:
                return True

    return ''.join([x + '·' if not same_cat(x, string[i+1]) else x
                    for i, x in enumerate(string[:-1])] + list(string[-1])).split('·')


if __name__ == "__main__":
    doctest.testmod()

asked Apr 19, 2015 at 17:30
\$\endgroup\$

1 Answer

\$\begingroup\$

Relying on special values (namely '·') not to be present in the input is a bad idea. Your problem is basically a variant of itertools.groupby(), so you should use that.

from itertools import groupby


def split_on_changes(string, cats):
    """
    docstring and doctest here
    """
    def category(char):
        """
        Return the index of the category to which char belongs
        (or None if it does not belong to any category).
        """
        return next((i for (i, cat) in enumerate(cats) if char in cat), None)

    return list(''.join(group) for _, group in groupby(string, category))
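
To make the point about the special value concrete, here is a minimal sketch that runs both approaches on an input containing '·'. The names split_with_sentinel and split_with_groupby are hypothetical labels chosen for this comparison, and the sentinel version is a condensed restatement of the question's logic rather than a verbatim copy.

```python
from itertools import groupby


def split_with_sentinel(string, cats):
    """The question's approach: mark boundaries with '·', then split on it."""
    def same_cat(a, b):
        return any(a in cat and b in cat for cat in cats)

    marked = [x if same_cat(x, string[i + 1]) else x + '·'
              for i, x in enumerate(string[:-1])]
    return ''.join(marked + list(string[-1])).split('·')


def split_with_groupby(string, cats):
    """The groupby approach: group consecutive characters by category index."""
    def category(char):
        return next((i for i, cat in enumerate(cats) if char in cat), None)

    return [''.join(group) for _, group in groupby(string, category)]


# '·' is a legitimate member of the first category here.
cats = ["ab·", "01"]
sentinel_result = split_with_sentinel("a·b01", cats)
groupby_result = split_with_groupby("a·b01", cats)

print(sentinel_result)  # ['a', 'b', '01']  ('·' is swallowed by the split)
print(groupby_result)   # ['a·b', '01']
```

The sentinel version silently drops the '·' and splits a run that should have stayed together, while the groupby() version never inspects the characters beyond looking up their category.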

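The category() helper function is based on a common technique for finding the first element of a list that satisfies a predicate: calling next() on a generator expression, with a default value to return when nothing matches. It could also be written as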

def category(char):
    for i, cat in enumerate(cats):
        if char in cat:
            return i
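
One behavioural difference may be worth checking against the question's docstring. Since category() returns None for every character that is in no category, groupby() keeps consecutive uncategorized characters together, whereas the original splits between them. The sketch below shows the difference and one possible way to restore the original behaviour. The name split_isolating_unknowns and its key function are hypothetical, introduced only for this illustration.

```python
from itertools import groupby


def split_on_changes(string, cats):
    """The groupby version: uncategorized characters all share the key None."""
    def category(char):
        return next((i for i, cat in enumerate(cats) if char in cat), None)

    return [''.join(group) for _, group in groupby(string, category)]


def split_isolating_unknowns(string, cats):
    """Hypothetical variant: every uncategorized character stands alone."""
    def category(char):
        return next((i for i, cat in enumerate(cats) if char in cat), None)

    def key(item):
        position, char = item
        cat = category(char)
        # Uncategorized characters get a key that is unique per position,
        # so groupby never merges two of them into one group.
        return (cat, position if cat is None else None)

    return [''.join(char for _, char in group)
            for _, group in groupby(enumerate(string), key)]


merged = split_on_changes("ab  cd", ["abcd"])
isolated = split_isolating_unknowns("ab  cd", ["abcd"])
print(merged)    # ['ab', '  ', 'cd']
print(isolated)  # ['ab', ' ', ' ', 'cd']
```

Which behaviour is preferable depends on the use case. The inputs in the question's doctests contain no adjacent uncategorized characters, so both variants give the same results for them.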

Depending on what you want to do with the results, you might want to consider not calling list() and ''.join(), and working with the groups that groupby() produces directly.
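
As a rough sketch of that last suggestion, the function could return the groupby() iterator itself and leave it to the caller to decide what to do with each group. The name iter_split_on_changes is hypothetical. Note that each group is a one-shot iterator that should be consumed before advancing to the next one.

```python
from itertools import groupby


def iter_split_on_changes(string, cats):
    """Lazily yield (category_index, group_iterator) pairs."""
    def category(char):
        return next((i for i, cat in enumerate(cats) if char in cat), None)

    return groupby(string, category)


# Example: count run lengths without building any intermediate strings.
run_lengths = [(cat_index, sum(1 for _ in group))
               for cat_index, group in iter_split_on_changes("aa11b", ["ab", "01"])]
print(run_lengths)  # [(0, 2), (1, 2), (0, 1)]
```

This also exposes the category index of each run, which the list-of-strings version discards.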

answered Apr 19, 2015 at 18:27
\$\endgroup\$