Adding a new class to HTML tag and writing it back with Beautiful Soup

Question 1

I am working on an HTML document in which I need to add certain classes to some elements. In the following code, I add the class img-responsive to every img tag.

def add_img_class1(img_tag):
    try:
        img_tag['class'] = img_tag['class'] + ' img-responsive'
    except KeyError:
        img_tag['class'] = 'img-responsive'
    return img_tag

def add_img_class2(img_tag):
    if img_tag.has_attr('class'):
        img_tag['class'] = img_tag['class'] + ' img-responsive'
    else:
        img_tag['class'] = 'img-responsive'
    return img_tag

soup = BeautifulSoup(myhtml)
for img_tag in soup.find_all('img'):
    img_tag = add_img_class1(img_tag)  # or img_tag = add_img_class2(img_tag)
html = soup.prettify(soup.original_encoding)
with open("edited.html", "wb") as file:
    file.write(html)

Both functions do the same thing, but one uses exceptions and the other uses has_attr from BS4. Which is better, and why?
Am I writing the result back to HTML the right way? Or should I convert the entire soup to UTF-8 (with string.encode('UTF-8')) and write that?

Question 2

The second option is better because the possible error is explicit. That said, in many cases in Python you should follow EAFP (easier to ask forgiveness than permission) and go for the try statement. Either way, we can do better.
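For reference, here is a minimal sketch of the two styles being compared, using a plain dict as a stand-in for a tag's attributes so that it runs without Beautiful Soup. The function names add_class_eafp and add_class_lbyl are made up for this illustration, and the string concatenation mirrors the question's code rather than what BS4 actually stores for class.

```python
# A plain dict stands in for a tag's attributes (an assumption of this sketch).

def add_class_eafp(attrs):
    """EAFP: attempt the lookup and handle the failure afterwards."""
    try:
        attrs['class'] = attrs['class'] + ' img-responsive'
    except KeyError:
        attrs['class'] = 'img-responsive'


def add_class_lbyl(attrs):
    """LBYL: check that the key exists before using it."""
    if 'class' in attrs:
        attrs['class'] = attrs['class'] + ' img-responsive'
    else:
        attrs['class'] = 'img-responsive'


with_class = {'src': 'a.png', 'class': 'thumb'}
without_class = {'src': 'b.png'}
add_class_eafp(with_class)
add_class_lbyl(without_class)
print(with_class['class'])     # thumb img-responsive
print(without_class['class'])  # img-responsive
```

Both variants give the same result; the difference is only in whether the missing-key case is tested up front or caught as an exception.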

get(value, default)

In BeautifulSoup, attributes behave like dictionaries. This means you can write img_tag.get('class', '') to get the class if it exists, or the empty string if it doesn't.

def add_img_class(img_tag):
    img_tag['class'] = img_tag.get('class', '') + ' img-responsive'

You don't need to return img_tag: the function modifies the tag object in place, so the caller already sees the change. Now that your function is a one-liner, you might as well use the one-liner directly.

Multi-valued attributes

Note that the above code doesn't work! class is a multi-valued attribute in HTML4 and HTML5, so at least BeautifulSoup 4 returns a list instead of a string. The correct code becomes:

img_tag['class'] = img_tag.get('class', []) + ['img-responsive']

Which is nicer, as you don't have to worry about the extra space between the two values.
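To see the list behaviour in action, here is a small sketch. It assumes BeautifulSoup 4 with the built-in html.parser; the sample markup is invented for the example.

```python
from bs4 import BeautifulSoup

# Sample markup made up for this sketch: one img with a class, one without.
html = '<p><img src="a.png" class="thumb"><img src="b.png"></p>'
soup = BeautifulSoup(html, 'html.parser')

for img_tag in soup.find_all('img'):
    # With an HTML parser, an existing class comes back as a list of strings;
    # a missing class falls back to the empty list given as the default.
    img_tag['class'] = img_tag.get('class', []) + ['img-responsive']

classes = [img['class'] for img in soup.find_all('img')]
print(classes)  # [['thumb', 'img-responsive'], ['img-responsive']]
```

When the soup is serialised, the list is joined with spaces, so the first tag ends up with class="thumb img-responsive".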

Encoding

You don't need to convert to UTF-8 before writing the file back. What's wrong with what you are already doing?
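One detail worth checking when writing back: in BS4, prettify() with no argument returns text, while prettify(encoding) returns bytes, and original_encoding is None when the document was passed in as an already-decoded string. The sketch below assumes Python 3 and falls back to UTF-8 in that case; the fallback and the temporary output directory are choices made for this example only.

```python
import os
import tempfile

from bs4 import BeautifulSoup

soup = BeautifulSoup('<p><img src="a.png"></p>', 'html.parser')

# original_encoding is None here because the input was already a str,
# so fall back to UTF-8 (an assumption made for this sketch).
encoding = soup.original_encoding or 'utf-8'

# prettify() without an encoding returns text; open() does the encoding.
out_path = os.path.join(tempfile.mkdtemp(), 'edited.html')
with open(out_path, 'w', encoding=encoding) as out_file:
    out_file.write(soup.prettify())

# Read the file back to confirm that it round-trips.
with open(out_path, encoding=encoding) as in_file:
    round_trip = in_file.read()
print(round_trip)
```

If you prefer to keep the file open in "wb" mode as in the question, pass an explicit encoding to prettify() so that it returns bytes.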

Question 3

Using img['class'] = img.get('class', []) + ['img-responsive'] results in TypeError: coercing to Unicode: need string or buffer, list found, but img['class'] = img.get('class', []) + ' img-responsive' does the trick.

Question 4

FredCampos, did you use BeautifulSoup4? Did you parse your document as HTML? The BeautifulSoup 4 docs mention that img['class'] should always return a list: crummy.com/software/BeautifulSoup/bs4/doc/…
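If the same code has to cope with both situations described in these comments (class coming back as a string in one setup and as a list in another), one option is to normalise the value first. This is only a sketch: append_class is a hypothetical helper, not part of Beautiful Soup, and it assumes Python 3 string types.

```python
def append_class(existing, new_class):
    """Return a list of class names with new_class appended.

    existing may be None (no class attribute), a space-separated string,
    or a list of strings. Hypothetical helper, not part of Beautiful Soup.
    """
    if existing is None:
        classes = []
    elif isinstance(existing, str):
        classes = existing.split()
    else:
        classes = list(existing)
    # Avoid adding the same class twice.
    if new_class not in classes:
        classes.append(new_class)
    return classes


print(append_class(None, 'img-responsive'))          # ['img-responsive']
print(append_class('thumb left', 'img-responsive'))  # ['thumb', 'left', 'img-responsive']
print(append_class(['thumb'], 'img-responsive'))     # ['thumb', 'img-responsive']
```

It would be used as img['class'] = append_class(img.get('class'), 'img-responsive'), which works whether get returns None, a string, or a list.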

Quentin Pradet · Answer 1 · 2014-02-11 20:09:18Z

