0
whitespace_pattern = u"\s" # bug: tried to use unicode \u0020, broke regex
time_sig_pattern = \
 """^%(ws)s*time signature:%(ws)s*(?P<top>\d+)%(ws)s*\/%(ws)s*(?P<bottom>\d+)%(ws)s*$""" %{"ws": whitespace_pattern}
time_sig = compile(time_sig_pattern, U|M)

For some reason, adding the Verbose flag, X, to compile breaks the pattern.

Also, I wanted to use unicode for whitespace_pattern recognition (supposedly, we'll get patterns that use non-unicode spaces and we need to explicitly check for that one unicode character as a valid space), but the pattern keeps breaking.

asked Apr 4, 2011 at 15:50

3 Answers 3

1

VERBOSE gives you the ability to write comments in your regex to document it.

In order to do so, it ignores spaces, since you need to use line breaks to write comments.

Replace all spaces in your regex by \s to specify they are spaces you want to match in your pattern, and not just some spaces to format your comments.

What's more, you may want to use the r prefix for the string you use as a pattern. It tells Python not to interpret special notations such as \n and let you use backslashes without escaping them.

prafulfillment
9112 gold badges11 silver badges26 bronze badges
answered Apr 4, 2011 at 16:00
Sign up to request clarification or add additional context in comments.

Comments

0

Always define regexes with the r prefix to indicate they are raw strings.

 r"""^%(ws)s*time signature:%(ws)s*(?P<top>\d+)%(ws)s*\/%(ws)s*(?P<bottom>\d+)%(ws)s*$""" %{"ws": whitespace_pattern}
answered Apr 4, 2011 at 15:55

Comments

0

When creating a regex to match unicode characters you do not want to use a Python unicode string. In your example regular expression needs to see the literal characters \u0020, so you should use whitespace_pattern = r"\u0020" instead of u"\u0020".

As other answers have mentioned, you should also use the r prefix for time_sig_pattern, after those two changes your code should work fine.

For VERBOSE to work correctly you need to escape all whitespace in the pattern, so towards the beginning of the pattern replace the space in time signature with "\ " (quotes for clarity), \s, or [ ] as documented here.

answered Apr 4, 2011 at 16:45

Comments

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.