
I have a string containing several items listed in the following notation:

myString = '[A][B][C]'

And I would like to parse that to a python list of several strings:

['A', 'B', 'C']

I know this can be solved with:

myString = myString.lstrip('[')
myString = myString.rstrip(']')
myList = myString.split('][')

I'd just like to know if there is an even more Pythonic way of doing it. Compare https://stackoverflow.com/a/1653143/10983441, where the most elegant solution in the end was to use pyparsing with nestedExpr.

asked Jan 14, 2021 at 10:15

2 Answers


If you have a regular pattern that describes what you want to do with a string, using a regular expression (regex) is usually a good idea. In addition to using re.split, as shown in another answer by @python_user, you can also use re.findall, which has the advantage that you don't have to manually deal with the opening and closing delimiters:

import re
re.findall(r'\[(.)\]', '[A][B][C]')
# ['A', 'B', 'C']

This finds all single characters (.) that are surrounded by square brackets (\[...\]) and captures only the character itself ((.)).

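If you want to allow more than one character between the brackets, you need to use the non-greedy version of *, which is *?. The greedy .* matches as much as possible, so it runs on to the last closing bracket in the string: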

re.findall(r'\[(.*)\]', '[][a][a2][+%]')
# ['][a][a2][+%']
re.findall(r'\[(.*?)\]', '[][a][a2][+%]')
# ['', 'a', 'a2', '+%']
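A common alternative to the non-greedy *? (a different regex idiom, not one used in this answer) is a negated character class, [^\]]*, which matches any run of characters other than a closing bracket, so each match naturally stops at the first ]. A minimal sketch, assuming the items themselves never contain ]:

```python
import re

# [^\]]* matches zero or more characters that are not ']',
# so each match ends at the first closing bracket without needing *?
ITEM_PATTERN = re.compile(r'\[([^\]]*)\]')

print(ITEM_PATTERN.findall('[][a][a2][+%]'))
# ['', 'a', 'a2', '+%']
print(ITEM_PATTERN.findall('[A][B][C]'))
# ['A', 'B', 'C']
```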

Regarding your code itself, Python has an official style guide, PEP 8, which recommends using lower_case_with_underscores instead of camelCase for variables and functions (so my_string instead of myString).

You could also chain your calls together without sacrificing too much readability (even gaining some, arguably):

my_list = my_string.lstrip('[').rstrip(']').split('][')
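To compare the chained string methods with the re.findall approach side by side, here is a small sketch; the function names split_chained and split_findall are hypothetical, chosen only for illustration:

```python
import re


def split_chained(my_string):
    # Strip the outer brackets, then split on the '][' between items.
    return my_string.lstrip('[').rstrip(']').split('][')


def split_findall(my_string):
    # Collect whatever sits between each '[' and the next ']'.
    return re.findall(r'\[(.*?)\]', my_string)


for sample in ['[A][B][C]', '[A][B][C][D][E]']:
    print(split_chained(sample), split_findall(sample))
```

Both give the same lists for well-formed input. One difference worth knowing: for an empty string the chained version returns [''], while re.findall returns [].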
answered Jan 14, 2021 at 14:09
  • seeing this answer makes me want to up my regex game :D Commented Jan 14, 2021 at 14:12
  • normal strip can be used instead of rstrip and lstrip: my_list = my_string.strip("][").split("]["), but string slicing could be used instead: my_list = my_string[1:-1].split("][") Commented Jan 14, 2021 at 21:33
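The slicing suggestion from the comment above can be wrapped in a small function that first checks the outer brackets. This is a sketch under the assumption that malformed input should raise an error; the name parse_items and the ValueError behaviour are hypothetical additions, not part of the comment:

```python
def parse_items(my_string):
    """Split '[A][B][C]' into ['A', 'B', 'C'] using slicing.

    Raises ValueError if the string is not wrapped in '[' ... ']'.
    """
    if not (my_string.startswith('[') and my_string.endswith(']')):
        raise ValueError(f'expected a bracketed list, got {my_string!r}')
    # Drop exactly one leading '[' and one trailing ']', unlike strip("]["),
    # which removes every bracket character at either end.
    return my_string[1:-1].split('][')


print(parse_items('[A][B][C]'))
# ['A', 'B', 'C']
```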

A regex-based solution

>>> import re
>>> my_string_one
'[A][B][C]'
>>> re.split(r"\[([A-Z])\]", my_string_one)[1:-1:2]
['A', 'B', 'C']
>>> my_string_two
'[A][B][C][D][E]'
>>> re.split(r"\[([A-Z])\]", my_string_two)[1:-1:2]
['A', 'B', 'C', 'D', 'E']

You can use re.split with the expression \[([A-Z])\], which has a capture group for the uppercase letters. This assumes that your strings always follow this pattern (exactly one uppercase letter inside each pair of brackets); otherwise you may not get what you expect.
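To see why the [1:-1:2] slice works, it helps to look at the intermediate list. Because the pattern has a capture group, re.split keeps each captured letter in its result, interleaved with the (empty) text between matches. The last example shows the caveat above in action:

```python
import re

pattern = r"\[([A-Z])\]"

# Captured letters alternate with the empty strings between matches.
parts = re.split(pattern, '[A][B][C]')
print(parts)
# ['', 'A', '', 'B', '', 'C', '']

# Start at the first capture, stop before the trailing '', take every second item.
print(parts[1:-1:2])
# ['A', 'B', 'C']

# An item that does not match the pattern is silently dropped.
print(re.split(pattern, '[A][bc]')[1:-1:2])
# ['A']
```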

answered Jan 14, 2021 at 13:10