Fix for UnicodeDecodeError #12
Hi alexjj,
Thanks for sending this PR!
Encodings are a nightmare, and I think you're right that this is the right way to do it. UTF-8 should be the default encoding for Python 3 on OS X and Linux when the locale is set to UTF-8 (it usually is). But Windows, some Linux configs, and maybe Python 2 will probably all hit errors like the one you did. Yay!
The only qualm I have about merging the change is that we should probably explain in an aside what's happening here, at least pointing out that we've added this encoding argument and roughly what Unicode is and why we have to care.
Are you able to add something like that to the PR?
Cheers,
Angus
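For readers following the thread, here is a minimal sketch of the kind of change being discussed: passing an explicit encoding argument to open before handing the file to the csv module. The helper name read_rows and the sample rows are hypothetical stand-ins, not the chapter's actual code or data.

```python
import csv
import os
import tempfile


def read_rows(path, encoding="utf-8"):
    """Read every CSV row, decoding the file with an explicit encoding."""
    # Without `encoding`, Python 3 falls back to a platform-dependent default,
    # which is where the UnicodeDecodeError on some systems comes from.
    # newline="" is what the csv module documentation recommends for csv.reader.
    with open(path, encoding=encoding, newline="") as f:
        return list(csv.reader(f))


# Hypothetical sample data standing in for the chapter's CSV file.
sample = "1,Zürich Airport,Zurich\n2,Málaga Airport,Malaga\n"
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(sample.encode("utf-8"))
    tmp_path = tmp.name

rows = read_rows(tmp_path)
os.remove(tmp_path)
print(rows[0][1])
```

The point of the aside would be that the bytes on disk carry no label saying how they were encoded, so the reader has to supply one.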
Is there any way we could dodge the topic of encodings here? I think encodings are too important to squeeze in here. This chapter is about CSV techniques, so any explanation is either going to be too short to do encodings justice, or so long that it confuses learners. And I'm generally not a friend of "here is this boilerplate, copy it and everything will be happy sparkles."
One possible way is rehosting the CSV files with all non-ASCII characters stripped, so that it should work in almost any encoding by accident. (Fun fact: OpenFlights.org claims the file is "ISO 8859-1 (Latin-1) encoded," so any reasonable learner might be doubly confused by encoding="utf8".)
Python 2 users will just get another error with this fix by the way, since its open function does not accept the encoding parameter. The legacy way to do this would be through the codecs module.
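As a rough illustration of the Python 2 point, the sketch below uses io.open rather than the codecs module mentioned above. That is a deliberate swap: io.open accepts an encoding argument on Python 2.6+ and is the same function as the built-in open on Python 3, while codecs.open takes the same argument if you prefer the legacy route. The helper name read_text and the sample text are hypothetical.

```python
import io
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Read a whole text file with an explicit encoding on Python 2.6+ or 3."""
    # The built-in open() in Python 2 has no `encoding` parameter; io.open does.
    with io.open(path, encoding=encoding) as f:
        return f.read()


# Hypothetical sample: u"Z\u00fcrich" is "Zürich", written with an escape so
# the snippet does not depend on the source file's own encoding.
original = u"Z\u00fcrich\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(original.encode("utf-8"))
    tmp_path = tmp.name

text = read_text(tmp_path)
os.remove(tmp_path)
```

Note that this only covers reading text. The Python 2 csv module has its own limitations with Unicode input, which this sketch does not address.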
Good points. For me on Windows with Python 3 I needed to specify the encoding. However I just stuck UTF8 in as that was the top Google result to the error and it worked.
Probably can just add a footer saying if you get the error use encoding, and hopefully specifying ISO 8859-1 works too!
ISO 8859-1 works only in the sense that it doesn't throw an exception. The file is actually encoded in UTF-8, so you will get weird, scrambled, erroneous output. I have filed jpatokal/openflights#405 to fix the OpenFlights docs, but my other points remain.
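A small sketch of what "scrambled" means here, using a made-up sample string rather than a row from the actual file. ISO 8859-1 maps every possible byte to some character, so decoding never fails, but each multi-byte UTF-8 character comes out as two wrong ones.

```python
name = "Zürich"                      # hypothetical sample value
utf8_bytes = name.encode("utf-8")    # "ü" becomes the two bytes 0xC3 0xBC

# Decoding as ISO 8859-1 raises no exception, but the result is wrong.
garbled = utf8_bytes.decode("iso-8859-1")

# Decoding with the encoding the bytes were written in recovers the text.
correct = utf8_bytes.decode("utf-8")

print(garbled)   # ZÃ¼rich
print(correct)   # Zürich
```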
I was getting UnicodeDecodeError: 'charmap' codec can't decode byte 0x81, and this resolved it.
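For anyone who wants to reproduce that message, here is a hedged sketch. It assumes the platform default was cp1252 (common on Windows, though the comment above does not say which system was used) and uses a made-up character whose UTF-8 form contains the byte 0x81, which cp1252 leaves undefined.

```python
# "Á" (U+00C1) encodes to the bytes 0xC3 0x81 in UTF-8.
data = "Á".encode("utf-8")

try:
    # Assumption: cp1252 stands in for the platform default encoding.
    data.decode("cp1252")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
```

Decoding the same bytes with "utf-8" succeeds, which is why adding the encoding argument makes the error go away.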