2
\$\begingroup\$

I'm trying to split a string into a list whose items alternate between words and the parse characters that separate them, like ['Hello', ' ', 'World'].

Is there a built-in function, an existing module, or a simpler way to achieve something like the code below? I'd like to be able to vary the characters used for parsing.

sample.txt

def parse_chars(string, chars):
    # Characters in `chars` make up words; every other character is
    # emitted as its own single-character item.
    parse_set = set(chars)
    string_list = []
    start = 0
    for index, char in enumerate(string):
        if char not in parse_set:
            if index - start > 0:
                word = string[start:index]
                string_list.append(word)
            string_list.append(char)
            start = index + 1
    # Append the trailing word, if any.
    document_len = len(string)
    if start != document_len:
        word = string[start:document_len]
        string_list.append(word)
    return string_list

filename = 'sample.txt'
with open(filename) as document_open:
    document_string = document_open.read()
# Digits, then A-Z, then a-z (range() excludes its end value, so 91 includes 'Z').
alphanumeric = (list(map(chr, range(48, 58))) +
                list(map(chr, range(65, 91))) +
                list(map(chr, range(97, 123))))
print(parse_chars(document_string, alphanumeric))

[' ', 'A', ' ', 'space', ' ', 'then', ' ', '3', ' ', 'blank', ' ', 'lines', '\n', '\n', '\n', '3', ' ', 'blank', ' ', 'spaces', ' ', ' ', ' ', 'end']

asked May 20, 2015 at 9:04
\$\endgroup\$

1 Answer

1
\$\begingroup\$

The documentation for re.split says:

If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.

For example:

>>> import re
>>> re.split('( )', 'hello world')
['hello', ' ', 'world']
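To make the effect of the capturing group concrete, here is a small sketch that runs the same split with and without the parentheses. The variable names are only illustrative.

```python
import re

text = 'hello world'

# Without a capturing group, the separator is discarded.
without_group = re.split(' ', text)

# With a capturing group, the separator is kept as its own list item.
with_group = re.split('( )', text)

print(without_group)  # ['hello', 'world']
print(with_group)     # ['hello', ' ', 'world']
```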

If the string starts or ends with a separator, the result starts or ends with an empty string:

>>> re.split('( )', ' a b c ')
['', ' ', 'a', ' ', 'b', ' ', 'c', ' ', '']

You probably don't want these empty strings, so you should filter them out:

>>> [w for w in re.split('( )', ' a b c ') if w]
[' ', 'a', ' ', 'b', ' ', 'c', ' ']
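Empty strings also show up between two consecutive separators, not only at the ends, and the same filter takes care of them. A short sketch (variable names are illustrative):

```python
import re

# Two consecutive spaces: re.split reports the empty text between them.
pieces = re.split('( )', 'a  b')
print(pieces)    # ['a', ' ', '', ' ', 'b']

# Dropping the empty strings keeps each separator as its own item.
filtered = [w for w in pieces if w]
print(filtered)  # ['a', ' ', ' ', 'b']
```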

So your parse_chars function would become:

[w for w in re.split('([^0-9A-Za-z])', string) if w]

For example:

>>> [w for w in re.split('([^0-9A-Za-z])', '10 green bottles!') if w]
['10', ' ', 'green', ' ', 'bottles', '!']
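Since the question asks for variable parse characters, one possible way to keep the original two-argument signature is to build the character class from the argument. This is only a sketch: the parameter names `text` and `word_chars` are my own, and it assumes `word_chars` is a string or a list of single characters. `re.escape` is used so that characters such as `]`, `^`, `-` and `\` stay literal inside the class.

```python
import re


def parse_chars(text, word_chars):
    """Split text into runs of word_chars and single separator characters."""
    if not word_chars:
        # No word characters at all: every character is its own item.
        return list(text)
    # Negated character class: anything NOT in word_chars is a separator.
    pattern = '([^' + re.escape(''.join(word_chars)) + '])'
    # Drop the empty strings produced around adjacent or leading/trailing
    # separators.
    return [piece for piece in re.split(pattern, text) if piece]


print(parse_chars('10 green bottles!', 'abcdefghijklmnopqrstuvwxyz0123456789'))
# ['10', ' ', 'green', ' ', 'bottles', '!']
```

Passing the `alphanumeric` list from the question as `word_chars` should work the same way, because the characters are joined into one string before the pattern is built.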
answered May 20, 2015 at 12:28
\$\endgroup\$
