Is there any way to collect all the <option> values from the following HTML <select> element into a Python list, like ['a','b','c','d']?
<select name="sel">
<option value="a">a</option>
<option value="b">b</option>
<option value="c">c</option>
<option value="d">d</option>
</select>
Many thanks in advance.
2 Answers
import re
text = '''<select name="sel">
<option value="a">a</option>
<option value="b">b</option>
<option value="c">c</option>
<option value="d">d</option>
</select>'''
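# Capture the value attribute as the named group "val", then require the
# option text to repeat it exactly via the backreference (?P=val).
pattern = re.compile(r'<option value="(?P<val>.*?)">(?P=val)</option>')
# With a single group, findall returns a flat list of the captured values.
handy_list = pattern.findall(text)
print(handy_list)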
This prints:
['a', 'b', 'c', 'd']
Disclaimer: Parsing HTML with regular expressions does not work in the general case.
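To make the disclaimer concrete, here is a small sketch that runs the same pattern against a few hypothetical variations of the question's markup. A browser treats each of them as an ordinary option, but only the first matches, because the pattern assumes double quotes, a lower-case tag, `value` as the only attribute, and option text identical to the value.

```python
import re

# The pattern from the answer above.
pattern = re.compile(r'<option value="(?P<val>.*?)">(?P=val)</option>')

# Hypothetical variants of the question's markup (not from the original post).
variants = [
    '<option value="a">a</option>',            # same shape as the question
    '<option value="a" selected>a</option>',   # extra attribute
    "<option value='a'>a</option>",            # single-quoted attribute
    '<option value="a">Letter A</option>',     # text differs from value
    '<OPTION value="a">a</OPTION>',            # upper-case tag name
]

for html in variants:
    # Only the first variant yields ['a']; the rest yield [].
    print(repr(html), '->', pattern.findall(html))
```

If your real pages can contain any of these forms, a proper HTML parser is the safer choice.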
answered Dec 6, 2010 at 19:14
nmichaels
You might want to look at BeautifulSoup if you also need to parse other HTML data:
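# Beautiful Soup 4 (pip install beautifulsoup4); the old
# "from BeautifulSoup import BeautifulSoup" form is the Python 2-only BS3 package.
from bs4 import BeautifulSoup
text = '''<select name="sel">
<option value="a">a</option>
<option value="b">b</option>
<option value="c">c</option>
<option value="d">d</option>
</select>'''
soup = BeautifulSoup(text, 'html.parser')
# .string is each option's text; use option['value'] to read the value attribute instead.
print([option.string for option in soup.find_all('option')])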
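If installing a third-party package is not an option, the standard library's `html.parser.HTMLParser` can do the same job. The sketch below is one possible approach, not part of either original answer; the class name `OptionCollector` is hypothetical. It records the `value` attribute of every `<option>` start tag. HTMLParser lower-cases tag and attribute names and strips the quotes around attribute values, so the variants that defeat the regex are handled too. One simplification to be aware of: an option with no `value` attribute is recorded as `None`, whereas a browser would fall back to the option's text.

```python
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Hypothetical helper: collect the value attribute of every <option> tag."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        # tag is already lower-cased; attrs is a list of (name, value) pairs.
        if tag == 'option':
            # Assumption: a missing value attribute is recorded as None.
            self.values.append(dict(attrs).get('value'))


text = '''<select name="sel">
<option value="a">a</option>
<option value="b">b</option>
<option value="c">c</option>
<option value="d">d</option>
</select>'''

collector = OptionCollector()
collector.feed(text)
collector.close()
print(collector.values)  # ['a', 'b', 'c', 'd']
```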