Timeline for file.read() UnicodeDecodeError - Devuan Daedalus (Debian 12 w/o systemd)
Current License: CC BY-SA 4.0
Post Revisions
8 events
| when | what | by | license | comment | |
|---|---|---|---|---|---|
| Nov 19, 2023 at 14:20 | comment | added | Mark Ransom | No, the script would not work on .doc and .docx files directly. Those are compressed files and the plain text characters don't exist in them directly. P.S. the fact that those were the files you were trying to process should have been part of the question from the start. | |
| Nov 18, 2023 at 21:25 | answer | added | vrgovinda | timeline score: 0 | |
| Nov 18, 2023 at 21:22 | comment | added | vrgovinda | BTW, if I delete the `encoding="utf-8"` argument and use `with open(path, "rb") as file:`, will this script work on .doc and .docx? | |
| Nov 18, 2023 at 21:19 | comment | added | vrgovinda | Oh yes. I tried this script on .doc and .docx files, which aren't UTF-8 encoded, I guess. When I tried the script on plain text files, it worked flawlessly. | |
| Nov 18, 2023 at 20:30 | answer | added | frederic laurencin | timeline score: 0 | |
| Nov 18, 2023 at 20:29 | comment | added | mohamed martini | It’s hard to guess what’s happening without looking at the file. But if you want to open the file as binary, you need to delete the `encoding="utf-8"` argument: `with open(path, "rb") as file:` | |
| S Nov 18, 2023 at 20:06 | review | First questions | |||
| Nov 18, 2023 at 21:27 | |||||
| S Nov 18, 2023 at 20:06 | history | asked | vrgovinda | CC BY-SA 4.0 | created from wizard |
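
The comments above touch on two points that a short sketch may make clearer: binary mode takes no `encoding` argument (Python raises `ValueError` if both are given), and reading a .docx as raw bytes does not yield its text, because a .docx is a ZIP archive. The helper names below (`read_text_or_none`, `read_bytes`, `looks_like_docx`) are hypothetical and are not taken from the original script; the check only recognizes .docx, not the older .doc format, which is a different binary container.

```python
import io
import zipfile


def read_text_or_none(path, encoding="utf-8"):
    """Return the decoded text, or None if the bytes are not valid in `encoding`."""
    try:
        with open(path, encoding=encoding) as file:
            return file.read()
    except UnicodeDecodeError:
        return None


def read_bytes(path):
    """Read raw bytes. Binary mode must not be combined with an encoding argument."""
    with open(path, "rb") as file:
        return file.read()


def looks_like_docx(data):
    """Heuristic: a .docx is a ZIP archive that contains word/document.xml."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return False
    with zipfile.ZipFile(buffer) as archive:
        return "word/document.xml" in archive.namelist()
```

Under these assumptions, a script could call `read_text_or_none(path)` first and, when it returns `None`, use `looks_like_docx(read_bytes(path))` to decide whether the file needs a document parser rather than a plain `file.read()`.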