I'm tired of searching and trying code that keeps giving the same errors, and I really hope someone can help me figure this out. My problem is simple: I'm trying to save the HTML of a page to a text file using Python. Here's the code I'm using:

from urllib.request import urlopen as uReq
url1 = 'http://www.marmiton.org/recettes/menu-de-la-semaine.aspx'
page = uReq(url1).read().decode()
f = open("test.html", "w")
f.write(page)
f.close()

But it gives me the following error:

UnicodeEncodeError: 'charmap' codec can't encode character '\u2665' in position 416224: character maps to <undefined>

asked Nov 27, 2017 at 4:11
  • Why are you using .decode()? Why not just take the output of the reader and stream it into the file? Commented Nov 27, 2017 at 4:13
  • I tried not using it, but that gives me another error: TypeError: write() argument must be str, not bytes. Commented Nov 27, 2017 at 4:14
  • What is uReq? Commented Nov 27, 2017 at 4:24
  • Oh, sorry, I forgot the import statement; I'll add it right away. uReq comes from: from urllib.request import urlopen as uReq. Commented Nov 27, 2017 at 4:26
  • You are reading a web page without considering its charset at all. Is there no handling for it? Commented Nov 27, 2017 at 4:28
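
On the charset point raised in the last comment, here is a minimal sketch, assuming the server declares its charset in the Content-Type header and falling back to UTF-8 when it does not. The helper names decode_body and fetch_text are made up for illustration; get_content_charset() is the standard method available on the response headers in Python 3.

```python
from urllib.request import urlopen


def decode_body(raw, charset=None, fallback="utf-8"):
    """Decode raw response bytes with the declared charset, or a fallback."""
    # errors="replace" keeps going if a few bytes do not match the charset.
    return raw.decode(charset or fallback, errors="replace")


def fetch_text(url):
    """Download a page and decode it using the charset the server reports."""
    with urlopen(url) as response:
        raw = response.read()
        # Returns None when the Content-Type header names no charset.
        charset = response.headers.get_content_charset()
    return decode_body(raw, charset)
```

This only covers the decoding side (bytes to str). Writing the resulting string to a file is a separate encoding step with its own encoding setting.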

3 Answers

Here is the updated solution:

Python 2.x:

import urllib
url1 = 'http://www.marmiton.org/recettes/menu-de-la-semaine.aspx'
# In Python 2, read() returns a byte string, so write it in binary mode.
page = urllib.urlopen(url1).read()
with open("./test1.html", "wb") as f:
    f.write(page)

Python 3.x:

import urllib.request
import shutil
url1 = 'http://www.marmiton.org/recettes/menu-de-la-semaine.aspx'
# Copy the raw response bytes straight to disk; nothing is decoded or re-encoded.
with urllib.request.urlopen(url1) as page, open("./test2.html", "wb") as f:
    shutil.copyfileobj(page, f)

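urllib is enough for this task. In Python 3, opening the file in binary mode ("wb") and copying the response with shutil.copyfileobj writes the bytes exactly as received, so no encoding step is involved.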

answered Nov 27, 2017 at 4:26

2 Comments

Which version of Python are you using?
Sorry, I forgot to mention that my version is 3.6.

You mention that not using the .decode() method gives you a TypeError. Have you tried taking the HTML content and passing it to the write() method as a string? You may be able to enclose the HTML content in triple quotes so that it is passed as a multiline string.

answered Nov 27, 2017 at 4:46

1 Comment

That didn't work either; I mentioned it in a previous comment above.
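
For context on the TypeError discussed here: in Python 3, read() on a urlopen response returns bytes, and a file opened in text mode ("w") only accepts str. A minimal sketch of the bytes route, where save_bytes is a hypothetical helper name used only for illustration:

```python
def save_bytes(raw, path):
    """Write raw bytes to a file without any decoding or encoding."""
    # Binary mode ("wb") accepts bytes and stores them unchanged.
    with open(path, "wb") as f:
        f.write(raw)
```

Because the bytes are never converted to str, the file ends up in whatever encoding the server sent.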

You should try requests and bs4 (BeautifulSoup):

from bs4 import BeautifulSoup
import requests
r = requests.get("https://stackoverflow.com/questions/47503845/save-html-content-into-a-txt-file-using-python")
data = r.text
# Name the parser explicitly so bs4 does not have to guess one.
soup = BeautifulSoup(data, "html.parser")
print(soup)
with open('/tmp/test.html', 'a') as f:
    f.write(str(soup))
answered Nov 27, 2017 at 4:34

2 Comments

That looked promising, but it still shows the same error in my IDE: UnicodeEncodeError: 'charmap' codec can't encode character '\u2665' in position 360940: character maps to <undefined>
Weird, I just tried with your link and it worked well for me.
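
A likely explanation for the difference between the two machines, assuming the failing script runs on Windows (the 'charmap' codec in the traceback points to a legacy code page such as cp1252): open() in text mode encodes with the locale's default encoding unless told otherwise, and that code page has no mapping for '\u2665'. A minimal sketch that passes the encoding explicitly; save_text is a hypothetical helper name, not something from the posts above.

```python
def save_text(text, path):
    """Write a str to a file as UTF-8, regardless of the system locale."""
    # Without encoding=..., Python uses the locale's preferred encoding,
    # which on many Windows setups cannot represent characters like '\u2665'.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
```

With the code from the question this would be called as save_text(page, "test.html") after decoding the response.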
