I want to split a string wherever the category of its characters changes, for example alphabetic versus numeric, lower versus upper case, or even the three rows of the keyboard. I wrote a generic function for this, but it seems unreadable, so I am asking for better modularization and improved readability.
import doctest


def split_on_changes(string, cats):
    """
    Given an input list and some categories,
    splits the list whenever the category changes.
    Also splits if the next char is in no categories.
    >>> split_on_changes("010010110", ["0", "1"])
    ['0', '1', '00', '1', '0', '11', '0']
    >>> split_on_changes("this is an alaskian example", ["qwertyuiop", "asdfghjkl", "zxcvbnm "])
    ['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 'a', 'n ', 'alask', 'i', 'a', 'n ', 'e', 'x', 'a', 'm', 'p', 'l', 'e']
    >>> split_on_changes("MiXeD CASing ahahah", ["qwertyuiopasdfghjklzxcvbnm", "qwertyuiopasdfghjklzxcvbnm".upper()])
    ['M', 'i', 'X', 'e', 'D', ' ', 'CAS', 'ing', ' ', 'ahahah']
    """
    def same_cat(a,b):
        for cat in cats:
            if a in cat and b in cat: return True
    return ''.join([x + '·' if not same_cat(x,string[i+1]) else x
        for i,x in enumerate(string[:-1])] + list(string[-1])).split('·')


if __name__ == "__main__":
    doctest.testmod()
1 Answer
Relying on special values (namely '·') not to be present in the input is a bad idea. Your problem is basically a variant of itertools.groupby(), so you should use that.
from itertools import groupby


def split_on_changes(string, cats):
    """
    docstring and doctest here
    """
    def category(char):
        """
        Return the index of the category to which char belongs
        (or None if it does not belong to any category).
        """
        return next((i for (i, cat) in enumerate(cats) if char in cat), None)
    return list(''.join(group) for _, group in groupby(string, category))
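To make the sentinel problem concrete, here is a minimal sketch that runs both approaches on an input containing '·'. The function names split_with_sentinel and split_with_groupby are made up for this comparison, and the first one is a lightly restructured copy of the question's code rather than the exact original.

```python
from itertools import groupby


def split_with_sentinel(string, cats):
    """The question's approach: mark boundaries with '·', then split on it."""
    def same_cat(a, b):
        return any(a in cat and b in cat for cat in cats)

    # Append the sentinel after every character whose successor
    # is not in the same category.
    marked = [x if same_cat(x, string[i + 1]) else x + '·'
              for i, x in enumerate(string[:-1])]
    return ''.join(marked + [string[-1]]).split('·')


def split_with_groupby(string, cats):
    """The groupby approach: group consecutive characters by category index."""
    def category(char):
        return next((i for i, cat in enumerate(cats) if char in cat), None)

    return [''.join(group) for _, group in groupby(string, category)]


cats = ["ab·", "cd"]  # hypothetical categories; '·' is a legitimate character here
print(split_with_sentinel("a·bcd", cats))  # ['a', 'b', 'cd'], the '·' is lost
print(split_with_groupby("a·bcd", cats))   # ['a·b', 'cd']
```

One behavioural difference to be aware of: the sentinel version splits between two adjacent characters that belong to no category, whereas groupby() keeps them together because both map to the key None. For example, with cats = ["ab"] the input "a  b" (two spaces) gives ['a', ' ', ' ', 'b'] with the sentinel version and ['a', '  ', 'b'] with groupby().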
The category() helper function is based on this technique to find the first element of a list that satisfies a predicate. It could also be written as

def category(char):
    for i, cat in enumerate(cats):
        if char in cat:
            return i
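The same next()-with-default idiom works for any "first match or fallback" lookup. A small sketch, where first_match is a hypothetical helper name and not part of the standard library:

```python
def first_match(items, predicate, default=None):
    """Return the first item satisfying predicate, or default if none does."""
    # The generator is evaluated lazily, so iteration stops at the first hit.
    return next((item for item in items if predicate(item)), default)


def is_even(n):
    return n % 2 == 0


print(first_match([3, 8, 5, 12], is_even))  # 8
print(first_match([3, 5], is_even))         # None
```

Without the second argument to next(), an exhausted generator would raise StopIteration instead of returning a fallback value.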
Depending on what you want to do with the results, you might want to consider not calling list() and str.join().
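As a rough sketch of that lazy variant, the function could return the groupby() iterator itself and leave it to the caller to decide what to build. The name iter_groups is hypothetical, chosen only for this illustration.

```python
from itertools import groupby


def iter_groups(string, cats):
    """Lazily yield (category_index, group_iterator) pairs."""
    def category(char):
        return next((i for i, cat in enumerate(cats) if char in cat), None)

    return groupby(string, category)


# Example: compute run lengths per category without building any substrings.
runs = [(key, sum(1 for _ in group))
        for key, group in iter_groups("010010110", ["0", "1"])]
print(runs)  # [(0, 1), (1, 1), (0, 2), (1, 1), (0, 1), (1, 2), (0, 1)]
```

Note that each group iterator shares the underlying input with groupby(), so a group has to be consumed before advancing to the next one.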