Improvement of, and looping in, a regular expression pattern

My regex pattern repeats two fragments: \d{2}\. and <p>(.*)</p>. I want to get rid of this repetition and wondered whether there is a way to loop in Python's regular expression implementation.

Note: I am not asking for help parsing an XML file; there are many great tutorials, how-tos and libraries for that. I am looking for a way to express repetition in regex patterns.

My code:

```python
import re

# Raw string so that \w, \d and \. reach the regex engine unchanged
pattern = r'''
<menu>
<day>\w{2} (\d{2}\.\d{2})\.</day>
<description>
<p>(.*)</p>
<p>(.*)</p>
<p>(.*)</p>
</description>
'''
my_example_string = '''
<menu>
<day>Mi 03.04.</day>
<description>
<p>Knoblauchcremesuppe</p>
<p>Rindsbraten "Esterhazy" (Gem&uuml;serahmsauce)</p>
<p>mit H&ouml;rnchen und Salat</p>
</description>
</menu>
'''
# One tuple per match: the date followed by the three items
print(re.findall(pattern, my_example_string, re.MULTILINE))
```
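For reference, regex quantifiers can express the repetition itself, but Python's re module has a catch: a capturing group inside a quantifier remembers only its last iteration. Wrapping a quantified non-capturing group in an outer capturing group does capture the whole repeated text, which works for the date (the capture then includes the final dot), whereas the three menu items cannot be captured separately this way. A minimal sketch on excerpts of the sample above:

```python
import re

# Date: the outer group captures every repetition of the non-capturing (?:\d{2}\.)
day = re.search(r"<day>\w{2} ((?:\d{2}\.){2})</day>", "<day>Mi 03.04.</day>")
print(day.group(1))  # 03.04.

description = """<description>
<p>Knoblauchcremesuppe</p>
<p>Rindsbraten "Esterhazy" (Gem&uuml;serahmsauce)</p>
<p>mit H&ouml;rnchen und Salat</p>
</description>"""

# Items: {3} repeats the group, but group 1 only keeps the last iteration
items = re.search(r"<description>\n(?:<p>(.*)</p>\n){3}</description>", description)
print(items.groups())  # ('mit H&ouml;rnchen und Salat',)
```

So a pure-regex loop is fine when the repeated text can be captured as one string, but separate captures per item still need separate groups.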

edited Apr 1, 2013 at 15:39

asked Apr 1, 2013 at 13:38

Philipp

Parsing XML with regex is usually wrong; what are you really trying to accomplish?

– konijn

Commented Apr 1, 2013 at 14:30
The XML is malformed, which prevents using lxml and XPath. I can easily retrieve the desired data, but I want to find a way to avoid these repetitions in regex patterns in general.

– Philipp

Commented Apr 1, 2013 at 14:37

1 Answer

First, for anyone else reading this: DO NOT take this as an excuse to parse your XML with regular expressions. It is generally a really, really bad idea! In this case the XML is malformed, so it's the best we can do.

The looping constructs of regular expressions are * and counted repetition such as {2}, which you are already using. But this is Python, so you can build your regular expression with Python code:

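```python
# Fixed head of the pattern (raw string keeps the regex escapes intact)
expression = r"""
<menu>
<day>\w{2} (\d{2}\.\d{2})\.</day>
<description>
"""
# Append one capturing <p> line per menu item; the trailing newline
# keeps the items on separate lines, as in the input
for _ in range(3):
    expression += "<p>(.*)</p>\n"
expression += """</description>
</menu>
"""
```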
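The generated pattern still fixes the number of items at three. If the number of <p> lines varies, one alternative (a different technique from the one in this answer, sketched under the assumption that each <description> contains only <p> lines) is two passes: one regex per <menu> block, then re.findall to collect however many items that block holds. The second menu in the sample is hypothetical, added only to show a different item count.

```python
import re

# The question's sample plus a hypothetical second menu with only two items
sample = """
<menu>
<day>Mi 03.04.</day>
<description>
<p>Knoblauchcremesuppe</p>
<p>Rindsbraten "Esterhazy" (Gem&uuml;serahmsauce)</p>
<p>mit H&ouml;rnchen und Salat</p>
</description>
</menu>
<menu>
<day>Do 04.04.</day>
<description>
<p>Tomatensuppe</p>
<p>Spaghetti</p>
</description>
</menu>
"""

# First pass: the date and the raw body of each <description>;
# re.DOTALL lets the lazy (.*?) run across the <p> lines
menu_re = re.compile(
    r"<menu>\s*<day>\w{2} (\d{2}\.\d{2})\.</day>\s*"
    r"<description>(.*?)</description>",
    re.DOTALL,
)
# Second pass: findall repeats this small pattern over every item
item_re = re.compile(r"<p>(.*?)</p>")

menus = [(day, item_re.findall(body)) for day, body in menu_re.findall(sample)]
for day, items in menus:
    print(day, items)
```

Here findall does the looping, so neither the pattern nor the code needs to know the item count.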

answered Apr 1, 2013 at 17:45

Winston Ewert

What about expression += "<p>(.*)</p>\n" * 3?

– Gareth Rees

Commented Apr 1, 2013 at 18:40
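String multiplication removes the loop entirely. A sketch that wraps this suggestion in a hypothetical helper, menu_pattern(n) (the name and parameter are illustrative, not from the thread), and runs it on the question's sample:

```python
import re

# Fixed head and tail of the question's pattern
HEAD = r"""
<menu>
<day>\w{2} (\d{2}\.\d{2})\.</day>
<description>
"""
TAIL = "</description>\n</menu>\n"


def menu_pattern(n):
    """Hypothetical helper: the question's pattern with n capturing <p> lines."""
    # Repeat the <p> fragment n times with string multiplication
    return HEAD + "<p>(.*)</p>\n" * n + TAIL


my_example_string = """
<menu>
<day>Mi 03.04.</day>
<description>
<p>Knoblauchcremesuppe</p>
<p>Rindsbraten "Esterhazy" (Gem&uuml;serahmsauce)</p>
<p>mit H&ouml;rnchen und Salat</p>
</description>
</menu>
"""

print(re.findall(menu_pattern(3), my_example_string))
```

With menu_pattern(2) the same string yields no match at all, so the item count still has to be known in advance.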
