Error when encoding UTF-8

Question 1

I am trying to fetch text from a website, but this code raises an error. Please let me know where the error is.

import requests
from bs4 import BeautifulSoup
def getportions(soup):
    for p in soup.find_all("p", {"class": ""}):
        yield p.text
def readpage(address):
    page = requests.get(address)
    soup = BeautifulSoup(page.text, "html.parser")
    output_text = ''
    for s in getportions(soup):
        output_text += s.encode("utf8")
        output_text += "\n"
    print (output_text)
    print ("End of article")
    fp = open("content.txt", "w")
    fp.write(output_text)
if __name__ == "__main__":
    readpage("http://yahoo.com")

The error is shown below:

output_text += s.encode("utf8")
TypeError: Can't convert 'bytes' object to str implicitly

Question 2

.encode returns a bytes object. What are you trying to do?
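
To make the point above concrete, here is a minimal sketch (the variable names and sample string are illustrative, not taken from the question). It shows that in Python 3 `str.encode` returns a `bytes` object, and that appending `bytes` to a `str` raises `TypeError`:

```python
text = "caf\u00e9"             # a str (Unicode text)
encoded = text.encode("utf8")  # a bytes object, not a str

print(type(text).__name__)
print(type(encoded).__name__)

output_text = ""
error_message = None
try:
    output_text += encoded     # str + bytes is not allowed in Python 3
except TypeError as exc:
    error_message = str(exc)   # exact wording varies between Python versions
    print("TypeError:", error_message)
```

The failed `+=` leaves `output_text` unchanged, which is why the traceback in the question points at that line.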

Question 3

@MorganThrapp I am trying to write the contents to a file.

Question 4

Do you maybe mean decode? Why do you think you need to do anything with utf-8?

Question 5

@MorganThrapp If I convert the object to a string, it contains unnecessary characters.

Serge Ballesta · Accepted Answer · 2016-11-04 14:59:05Z

If you use Python 3, all strings are natively Unicode, and you can specify the encoding when opening a file. Your code could become:

def readpage(address):
    ...
    output_text = ''
    for s in getportions(soup):
        output_text += s
        output_text += "\n"
    print (output_text)
    print ("End of article")
    fp = open("content.txt", "w", encoding='utf8')
    fp.write(output_text)
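
For anyone who wants to try this without network access, here is a self-contained sketch of the same idea. The helper name `write_paragraphs` and the sample strings are assumptions made for illustration; they stand in for the scraped paragraphs. The `with` block is a swapped-in convenience so the file is closed automatically.

```python
import os
import tempfile


def write_paragraphs(paragraphs, path):
    """Join str paragraphs and write them as UTF-8 text (hypothetical helper)."""
    output_text = "".join(p + "\n" for p in paragraphs)
    # open() does the encoding; no manual .encode() call is needed.
    with open(path, "w", encoding="utf8") as fp:
        fp.write(output_text)
    return output_text


# Sample data standing in for getportions(soup); includes a non-ASCII character.
sample = ["First paragraph", "Sum: \u03a3"]
path = os.path.join(tempfile.mkdtemp(), "content.txt")
written = write_paragraphs(sample, path)

# Read the file back with the same encoding to check the round trip.
with open(path, encoding="utf8") as fp:
    read_back = fp.read()

print(read_back == written)
```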

If you simply want to sanitize the text by replacing all non-ASCII characters with a ?, open the file this way:

 fp = open("content.txt", "w", encoding='ascii', errors='replace')
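
A small sketch of what `errors='replace'` does on write. The sample string and the temporary path are assumptions for illustration only:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "content.txt")

# Characters that ASCII cannot represent are written as "?" instead of raising.
with open(path, "w", encoding="ascii", errors="replace") as fp:
    fp.write("Sum: \u03a3 caf\u00e9\n")

with open(path, encoding="ascii") as fp:
    sanitized = fp.read()

print(sanitized)  # Sum: ? caf?
```

Note that the replacement is lossy: the original characters cannot be recovered from the file afterwards.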

It shows an error again:
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u03a3' in position 350: character maps to <undefined>
@NARAYANCHANGDER: Cannot reproduce. Show the code that produces the error and the stack trace. UTF-8 is meant to be able to encode any Unicode character...
@NARAYANCHANGDER: ... and I can confirm that I could successfully process \u03a3 (Σ)
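
One possible explanation, offered only as an assumption because the failing code was never posted: a 'charmap' error like this is what a single-byte codec such as cp1252 raises. That could happen if the file was opened without `encoding='utf8'`, or if `print` wrote to a console that uses such a codec. A small sketch of the difference, with cp1252 chosen purely as an example codec:

```python
sigma = "\u03a3"  # GREEK CAPITAL LETTER SIGMA

# UTF-8 can encode it: two bytes.
utf8_bytes = sigma.encode("utf8")
print(utf8_bytes)  # b'\xce\xa3'

# cp1252 (an example single-byte codec) has no slot for it.
try:
    sigma.encode("cp1252")
    charmap_error = None
except UnicodeEncodeError as exc:
    charmap_error = str(exc)

print(charmap_error)
```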
