\$\begingroup\$

I am working on an HTML document to which I need to add certain classes to some elements. In the following code, I am adding class img-responsive.

from bs4 import BeautifulSoup

def add_img_class1(img_tag):
    try:
        img_tag['class'] = img_tag['class'] + ' img-responsive'
    except KeyError:
        img_tag['class'] = 'img-responsive'
    return img_tag

def add_img_class2(img_tag):
    if img_tag.has_attr('class'):
        img_tag['class'] = img_tag['class'] + ' img-responsive'
    else:
        img_tag['class'] = 'img-responsive'
    return img_tag

soup = BeautifulSoup(myhtml)
for img_tag in soup.find_all('img'):
    img_tag = add_img_class1(img_tag)  # or img_tag = add_img_class2(img_tag)
html = soup.prettify(soup.original_encoding)
with open("edited.html", "wb") as file:
    file.write(html)

  1. Both functions do the same thing, but one uses exceptions and the other uses has_attr from BS4. Which is better, and why?
  2. Am I writing the result back to HTML the right way? Or should I convert the entire soup to UTF-8 (with string.encode('UTF-8')) and write that?
Jamal
asked Sep 19, 2013 at 7:01
\$\endgroup\$

1 Answer

\$\begingroup\$

The second option is better, because the case being handled (a missing class attribute) is explicit. That said, in many cases in Python you should follow EAFP ("easier to ask forgiveness than permission") and go for the try statement. Either way, we can do better.

get(key, default)

In BeautifulSoup, attributes behave like dictionaries. This means you can write img_tag.get('class', '') to get the class if it exists, or the empty string if it doesn't.

def add_img_class(img_tag):
    img_tag['class'] = img_tag.get('class', '') + ' img-responsive'

You don't need to return img_tag: the function modifies the tag object it was given, so the caller sees the change without a return value. Now that your function is a one-liner, you might as well use the one-liner directly.

Multi-valued attributes

Note that the above code doesn't work! class is a multi-valued attribute in HTML4 and HTML5, so at least BeautifulSoup 4 returns a list instead of a string. The correct code becomes:

img_tag['class'] = img_tag.get('class', []) + ['img-responsive']

Which is nicer, as you don't have to worry about the extra space between the two values.
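To put the list-based version in context, here is a minimal sketch, assuming BeautifulSoup 4 on Python 3 with the built-in html.parser. The helper name add_class and the sample markup are made up for illustration, and the helper also skips a class that is already present, which goes slightly beyond the one-liner.

```python
from bs4 import BeautifulSoup


def add_class(tag, class_name):
    """Append class_name to the tag's class list, modifying the tag in place."""
    # When the document is parsed as HTML, 'class' comes back as a list.
    classes = tag.get('class', [])
    if class_name not in classes:
        tag['class'] = classes + [class_name]


# Hypothetical sample markup: one image without a class, one with.
sample = '<img src="a.png"><img class="thumb" src="b.png">'
soup = BeautifulSoup(sample, 'html.parser')

for img in soup.find_all('img'):
    add_class(img, 'img-responsive')  # no return value needed

for img in soup.find_all('img'):
    print(img['src'], img['class'])
```

Because the helper only assigns to tag['class'], there is no separator to manage; BeautifulSoup joins the list with spaces when the tag is serialized.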

Encoding

You don't need to convert to UTF-8 before writing the file back. What's wrong with your current approach?
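One thing to watch on Python 3 with BeautifulSoup 4 (both assumptions here, since the question may target an older setup): prettify() with no argument returns a str, while prettify(encoding) returns bytes, so the file mode has to match what is written. A minimal sketch, with made-up sample markup and temporary output paths:

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Hypothetical sample markup containing a non-ASCII character.
soup = BeautifulSoup('<p>café</p>', 'html.parser')

text = soup.prettify()         # str: let open() handle the encoding
data = soup.prettify('utf-8')  # bytes: already encoded

out_dir = tempfile.mkdtemp()
text_path = os.path.join(out_dir, 'edited_text.html')
bytes_path = os.path.join(out_dir, 'edited_bytes.html')

# Text mode: pass the target encoding to open().
with open(text_path, 'w', encoding='utf-8') as f:
    f.write(text)

# Binary mode: write the bytes exactly as prettify() encoded them.
with open(bytes_path, 'wb') as f:
    f.write(data)
```

Either route should work; the text-mode one keeps the encoding decision in a single place.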

answered Feb 11, 2014 at 20:09
\$\endgroup\$
  • Using img['class'] = img.get('class', []) + ['img-responsive'] results in TypeError: coercing to Unicode: need string or buffer, list found, but img['class'] = img.get('class', []) + ' img-responsive' does the trick. Commented Apr 29, 2015 at 9:11
  • FredCampos, did you use BeautifulSoup4? Did you parse your document as HTML? The BeautifulSoup 4 docs mention that img['class'] should always return a list: crummy.com/software/BeautifulSoup/bs4/doc/… Commented Apr 29, 2015 at 13:16
