I am trying to convert a huge CSV file from UTF-16 to UTF-8 using Python. Here is the code:
with open(r'D:\_apps\aaa\output\srcfile', 'rb') as source_file:
    with open(r'D:\_apps\aaa\output\destfile', 'w+b') as dest_file:
        contents = source_file.read()  # reads the whole file into memory
        dest_file.write(contents.decode('utf-16').encode('utf-8'))
But this code uses a lot of memory and fails with MemoryError. Please suggest an alternative method.
Split the file? Perhaps it would help to specify the encoding when opening the files? Then, if possible, you could stream directly from one file to the other.
– Ulrich Eckhardt, commented Mar 17, 2022 at 6:57
1 Answer
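An option is to convert the file in fixed-size chunks. Iterating over a UTF-16 file line by line in binary mode does not work reliably: lines are split at the single byte 0x0A, which can cut a two-byte code unit in half, and only the first piece carries the BOM. An incremental decoder keeps that state between chunks:
import codecs

decoder = codecs.getincrementaldecoder('utf-16')()  # remembers the BOM and partial code units
with open(r'D:\_apps\aaa\output\srcfile', 'rb') as source_file, \
        open(r'D:\_apps\aaa\output\destfile', 'wb') as dest_file:
    for chunk in iter(lambda: source_file.read(1024 * 1024), b''):
        dest_file.write(decoder.decode(chunk).encode('utf-8'))
    dest_file.write(decoder.decode(b'', final=True).encode('utf-8'))  # flush the decoder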
Or you could open the files with your desired encoding:
with open(r'D:\_apps\aaa\output\srcfile', 'r', encoding='utf-16') as source_file, \
        open(r'D:\_apps\aaa\output\destfile', 'w+', encoding='utf-8') as dest_file:
    for line in source_file:
        dest_file.write(line)
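If the file may contain very long lines, or you want line endings copied through unchanged, the same text-mode idea can read a bounded number of characters at a time instead of whole lines. The following is only a sketch: the function name `convert_utf16_to_utf8`, the `chunk_chars` parameter, and the sample data are made up for illustration and are not part of the original answer.

```python
import os
import tempfile


def convert_utf16_to_utf8(src_path, dest_path, chunk_chars=1024 * 1024):
    """Stream src_path (UTF-16) into dest_path (UTF-8) one chunk at a time."""
    # newline='' turns off newline translation, so '\r\n' stays '\r\n'.
    with open(src_path, 'r', encoding='utf-16', newline='') as source_file, \
            open(dest_path, 'w', encoding='utf-8', newline='') as dest_file:
        while True:
            # In text mode, read(n) returns at most n characters (not bytes).
            chunk = source_file.read(chunk_chars)
            if not chunk:
                break
            dest_file.write(chunk)


# Small self-check on a temporary file (hypothetical sample data).
sample = 'id,name\r\n1,Zoë\r\n2,日本語\r\n'
with tempfile.TemporaryDirectory() as tmp_dir:
    src = os.path.join(tmp_dir, 'src.csv')
    dest = os.path.join(tmp_dir, 'dest.csv')
    with open(src, 'wb') as f:
        f.write(sample.encode('utf-16'))  # written with a BOM
    convert_utf16_to_utf8(src, dest, chunk_chars=4)
    with open(dest, 'rb') as f:
        result = f.read()
```

Memory use is bounded by the chunk size rather than by the file size or the longest line. The default of about one million characters is an arbitrary choice and can be tuned.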
answered Mar 17, 2022 at 6:57
hiro protagonist
3 Comments
hiro protagonist
glad to hear! happy pythoning!
lenz
In the second solution, you have to use modes without "b", i.e. 'r' and 'w+' as the second args to open().
Tessy
@lenz Yes, you are right. I did the same.