I have a string containing several items listed in the following notation:
myString = '[A][B][C]'
And I would like to parse that into a Python list of strings:
['A', 'B', 'C']
I know this can be solved with:
myString = myString.lstrip('[')
myString = myString.rstrip(']')
myList = myString.split('][')
I'd just like to know if there is an even more pythonic way of doing it. Compare https://stackoverflow.com/a/1653143/10983441 where the most elegant way in the end was to use pyparsing with nestedExpr.
2 Answers
If you have a regular pattern that describes what you want to do with a string, using a regular expression (regex) is usually a good idea. In addition to using re.split, as shown in another answer by @python_user, you can also use re.findall, which has the advantage that you don't have to manually deal with the opening and closing delimiters:
import re
re.findall(r'\[(.)\]', '[A][B][C]')
# ['A', 'B', 'C']
This finds all single characters (.) that are surrounded by square brackets (\[...\]) and selects only the character itself ((.)).
If you want to allow more than one character between the brackets, you need to use the non-greedy version of *, which is *?:
re.findall(r'\[(.*)\]', '[][a][a2][+%]')
# ['][a][a2][+%']
re.findall(r'\[(.*?)\]', '[][a][a2][+%]')
# ['', 'a', 'a2', '+%']
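As a minimal sketch, the non-greedy pattern above could be wrapped in a small helper. The function name parse_bracketed is hypothetical and not part of the original answer; the second pattern, which uses a negated character class instead of *?, is an alternative added here for comparison.

```python
import re

def parse_bracketed(text):
    """Return the contents of every [...] group in text, in order."""
    # .*? is non-greedy, so each match stops at the first closing bracket.
    return re.findall(r"\[(.*?)\]", text)

def parse_bracketed_negated(text):
    """Same idea, but matching 'anything except ]' instead of using .*?"""
    # [^\]]* can never run past a closing bracket, so no non-greedy
    # quantifier is needed.
    return re.findall(r"\[([^\]]*)\]", text)

print(parse_bracketed("[A][B][C]"))              # ['A', 'B', 'C']
print(parse_bracketed("[][a][a2][+%]"))          # ['', 'a', 'a2', '+%']
print(parse_bracketed_negated("[][a][a2][+%]"))  # ['', 'a', 'a2', '+%']
```

Both variants give the same result for the inputs shown here; which one reads better is largely a matter of taste.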
Regarding your code itself, Python has an official style guide, PEP 8, which recommends using lower_case instead of camelCase for variables and functions.
You could also chain your calls together without sacrificing too much readability (even gaining some, arguably):
my_list = my_string.lstrip('[').rstrip(']').split('][')
- seeing this answer makes me want to up my regex game :D (python_user, Jan 14, 2021 at 14:12)
- A plain strip can be used instead of rstrip and lstrip: my_list = my_string.strip("][").split("]["), but string slicing could be used instead: my_list = my_string[1:-1].split("][") (my_stack_exchange_account, Jan 14, 2021 at 21:33)
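The three string-method variants discussed above (chained lstrip/rstrip, a single strip, and slicing) agree on the example input, but they are not equivalent in general. The following sketch compares them; the function names are hypothetical, and the second input with an empty leading item is an assumption added only to show where they diverge.

```python
def split_chained(s):
    # Remove leading '[' characters and trailing ']' characters, then split.
    return s.lstrip("[").rstrip("]").split("][")

def split_stripped(s):
    # strip("][") removes ANY run of ']' or '[' from both ends.
    return s.strip("][").split("][")

def split_sliced(s):
    # Drop exactly one character from each end, then split.
    return s[1:-1].split("][")

for func in (split_chained, split_stripped, split_sliced):
    print(func.__name__, func("[A][B][C]"), func("[][A]"))
# split_chained ['A', 'B', 'C'] ['', 'A']
# split_stripped ['A', 'B', 'C'] ['A']
# split_sliced ['A', 'B', 'C'] ['', 'A']
```

Because strip treats its argument as a set of characters, it swallows the empty first item in '[][A]', whereas slicing removes exactly one character per side.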
A regex-based solution:
>>> my_string_one
'[A][B][C]'
>>> re.split(r"\[([A-Z])\]", my_string_one)[1:-1:2]
['A', 'B', 'C']
>>> my_string_two
'[A][B][C][D][E]'
>>> re.split(r"\[([A-Z])\]", my_string_two)[1:-1:2]
['A', 'B', 'C', 'D', 'E']
You can use re.split with the expression \[([A-Z])\], which has a capture group for the uppercase letters. This assumes that your strings always follow this pattern; otherwise you may not get what you expect.
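To illustrate the caveat about the pattern assumption, here is a sketch that checks the whole string with re.fullmatch before splitting. The helper name parse_upper_items and the choice to raise ValueError are assumptions made for this example, not part of the original answer.

```python
import re

def parse_upper_items(text):
    """Parse '[A][B]...' into ['A', 'B', ...], rejecting anything else."""
    # Require the entire string to be one or more [X] groups,
    # where X is a single uppercase letter.
    if re.fullmatch(r"(\[[A-Z]\])+", text) is None:
        raise ValueError(f"unexpected format: {text!r}")
    # The capture group makes re.split keep the letters; the odd
    # positions between the empty separators hold the items.
    return re.split(r"\[([A-Z])\]", text)[1:-1:2]

print(parse_upper_items("[A][B][C]"))
# ['A', 'B', 'C']

# Without the check, a non-conforming item is silently dropped:
print(re.split(r"\[([A-Z])\]", "[A][bc][D]")[1:-1:2])
# ['A', 'D']
```

With the check in place, parse_upper_items("[A][bc][D]") raises ValueError instead of returning a list with a missing item.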