I know this question has been asked countless times before, but I can't get any of the solutions working. I've tried the codecs module and the io module, but nothing seems to work.
I'm scraping some stuff off the web, then logging the details of each item to a text file, yet the script breaks as soon as it encounters a non-ASCII character, for example:
AHIMSA Centro de Sanación Pránica, Pranic Healing
Further, I don't know where or when non-ASCII characters might pop up, so I need an overarching solution rather than a fix for one particular string.
I'm not sure if I'll have Python 3.6.5 in the production environment, so the solution has to work with 2.7.
What can I do here? How can I deal with this?
# -*- coding: utf-8 -*-
...
with open('test.txt', 'w') as f:
    f.write(str(len(discoverable_cards)) + '\n\n')
    for cnt in range(0, len(discoverable_cards)):
        t = get_time()
        f.write('[ {} ] {}\n'.format(t, discoverable_cards[cnt]))
        f.write('[ {} ] {}\n'.format(t, cnt + 1))
        f.write('[ {} ] {}\n'.format(t, product_type[cnt].text))
        f.write('[ {} ] {}\n'.format(t, titles[cnt].text))
...
Any help would be appreciated!
1 Answer
Given that you are on Python 2.7, you will probably want to explicitly encode all of your strings with a Unicode-compatible encoding like "utf8" before passing them to write. You can do this with a simple helper function:
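# future py3 compatibility: define unicode, if needed.
# This check lives at module level: assigning to unicode inside the
# function would make it a local name and hide the Python 2 builtin.
try:
    unicode
except NameError:
    unicode = str

def safe_encode(str_or_unicode):
    # Encode unicode objects to UTF-8 bytes; return anything else unchanged.
    if isinstance(str_or_unicode, unicode):
        return str_or_unicode.encode("utf8")
    return str_or_unicode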
You would then use it like this:
f.write('[ {} ] {}\n'.format(safe_encode(t), safe_encode(discoverable_cards[cnt])))
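As an alternative to encoding each value by hand, here is a minimal sketch that lets the file object do the encoding. It assumes the scraped values are already unicode objects (or plain numbers); `write_log`, `path`, and `entries` are hypothetical names used only for illustration, not part of the original script. `io.open` accepts an `encoding` argument on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import io


def write_log(path, entries):
    """Write (timestamp, value) pairs to `path` as UTF-8 text.

    Hypothetical helper: assumes each value is unicode text or a number.
    """
    # io.open with an explicit encoding expects unicode on write and
    # encodes it to UTF-8 itself, so no manual .encode() calls are needed.
    with io.open(path, 'w', encoding='utf-8') as f:
        for t, value in entries:
            # The u'' prefix keeps the format string unicode on Python 2.7.
            f.write(u'[ {} ] {}\n'.format(t, value))
```

On Python 2.7 this would still fail if a value is a byte string containing non-ASCII bytes, because formatting it into a unicode string triggers an implicit ASCII decode; such values would need to be decoded first.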
6 Comments
return str_or_unicode.encode('utf8')? Why isn't utf8 utf-8? Maybe that's what I was doing wrong. You see both versions being used everywhere and I had just assumed that utf8 was a typo.
[...] ASCII charset, and then calls encode with your passed encoding. That's the reason why safe_encode checks to make sure it's a unicode object before encoding it. Python 3 solves this by only defining encode on unicode objects and decode on byte strings, so you get an AttributeError if you try to encode a byte string rather than a weird unicode bug.
[...] wb rather than w mode, you can write to the file as a bytes string: f.write(bytes('[ {} ] {}\n'.format(t, discoverable_cards[cnt]))). That way, your encoding won't get angry.
[...] wb before, then switched to w since you can't append as you normally would to wb files :/ How do you append to files created with wb?
[...] wb :/
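On the two points raised in the comments, here is a small sketch, not a definitive recipe. It shows that 'utf8' and 'utf-8' name the same codec, and that a file written in binary mode can be appended to by opening it with 'ab'. `append_line` is a hypothetical helper name, and the sketch assumes `text` is a unicode object. It encodes explicitly instead of calling bytes() on the formatted string, since bytes() without an encoding does not handle non-ASCII text on either Python version.

```python
import codecs
import io

# 'utf8' and 'utf-8' are aliases that resolve to the same codec.
assert codecs.lookup('utf8').name == codecs.lookup('utf-8').name


def append_line(path, text):
    """Append one line of unicode `text` to `path` as UTF-8 bytes.

    Hypothetical helper for illustration only.
    """
    # 'ab' appends in binary mode; 'wb' would truncate the file on each open.
    with io.open(path, 'ab') as f:
        f.write(text.encode('utf-8') + b'\n')
```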