I am trying to convert an email from a chunk of an mbox file into a readable format, but I get mojibake (da?s instead of das) when opening the result with neovim (with set encoding=utf-8 and set fileencodings=utf-8 in the .vimrc).
Here is the email file emailfile.txt:
From 9999999999999999@xxx Tue Mar 09 17:00:00 +0500 2019
X-GM-THRID: 99999999999999999
X-mail-Labels: Archived,Sent,Opened
MIME-Version: 1.0
Date: 9 Mar 2019 17:00:00 +0500
Message-ID: <[email protected]>
Subject: THETITLE
From: My Name <[email protected]>
To: [email protected]
Content-Type: multipart/alternative; boundary="0000000000009999999999999"
--0000000000009999999999999
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ da=
s ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ
--0000000000009999999999999
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
<div>ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ=
ZZ das ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ=
>
--0000000000009999999999999--
I unpack it with the command
$ munpack -t emailfile.txt
part1 (text/plain)
part2 (text/html)
$ cat part1
ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ das ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ
However, here is what I get when I open it with vim (notice the superfluous ?):
ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ da?s ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ
Might it come from the a= at the end of the line in the plain-text part of the email?
How can I open the file with vim so that it displays das without the ??
EDIT
As requested in the comments:
$ locale
LANG=""
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL="en_US.UTF-8"
Here is the output of vim -Nu NONE part1
ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ daÿs ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ
^M
Now the ? character is changed into a ÿ.
1 Answer
I finally found the answer in another SO question (Encoding issue : decode Quoted-Printable string in Python).
Here is the Python script:
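import quopri

# Read raw bytes so quopri sees the file exactly as it is stored on disk
with open("emailfile.txt", "rb") as f:
    files = quopri.decodestring(f.read().rstrip())
# latin-1 maps every byte, so this never raises; use 'utf-8' for UTF-8 text
print(files.decode('latin-1'))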
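The script above decodes the whole file, headers and HTML part included. As an alternative sketch (not taken from the linked answer), the standard library's email package can pick out just the text/plain part and undo the quoted-printable encoding, including the soft line break (the = at the end of a line) that splits "das". The helper name extract_plain_text and the shortened sample message are made up for illustration, and the sample assumes CRLF line endings, which the ^M in the vim output suggests.

```python
import email
from email import policy


def extract_plain_text(raw: bytes) -> str:
    """Return the decoded text/plain part of a raw MIME message."""
    # Drop a leading mbox "From " separator line, which is not a real header.
    if raw.startswith(b"From "):
        _, _, raw = raw.partition(b"\n")
    msg = email.message_from_bytes(raw, policy=policy.default)
    part = msg.get_body(preferencelist=("plain",))
    if part is None:
        raise ValueError("no text/plain part found")
    # get_content() undoes the transfer encoding and applies the declared charset.
    return part.get_content()


# Shortened stand-in for emailfile.txt (hypothetical), with CRLF line endings.
sample = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/alternative; boundary="b1"\r\n'
    b"\r\n"
    b"--b1\r\n"
    b'Content-Type: text/plain; charset="UTF-8"\r\n'
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"ZZZ da=\r\n"
    b"s ZZZ\r\n"
    b"--b1--\r\n"
)

text = extract_plain_text(sample)
print(text)
```

For the real file, the same helper could be called on the bytes read with open("emailfile.txt", "rb"), and the result written to a new file to open in vim.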
$ vim -Nu NONE part1 opens the file correctly, here. Could you add the output of $ locale to the body of your question? Also, utf-8 is a bad value for :help 'fileencodings'.
$ vim -Nu NONE part1 did not solve the problem. I edited the question accordingly.