Linked Questions
40 questions linked to/from Unicode (UTF-8) reading and writing to files in Python
0
votes
0
answers
61
views
How to write strings in Unicode to a text file in Python? [duplicate]
I was trying to write a translate function that could automatically convert a text file in English into Chinese. There is a saved dictionary, and the key values are in Chinese characters. However, ...
0
votes
1
answer
61
views
Getting non-English text from HTML doc [duplicate]
I'm trying to get the title of an HTML document in Python, but I'm getting weird symbols. I guess that's because of the encoding, but the HTML doc is UTF-8 encoded.
Is there any way I can get normal letters?
Here ...
1
vote
0
answers
43
views
FPDF encoding error when reading a UTF8 txt file in Python [duplicate]
I am creating PDF files with FPDF. The content is written in Traditional Chinese, so I have added a snippet:
pdf.add_font('TC', '', '/Users/yeung/Library/Fonts/TaipeiSansTCBeta-Regular.ttf', uni=True)...
50
votes
8
answers
271k
views
How to open html file that contains Unicode characters?
I have an HTML file called test.html that contains one word, בדיקה.
I open test.html and print its content using this block of code:
file = open("test.html", "r")
print file.read()
but it prints ??????, ...
10
votes
5
answers
16k
views
How to write Chinese characters to a file with Python
I'm walking through a directory and want to write all file names into a file. Here's the piece of code:
with open("c:/Users/me/filename.txt", "a") as d:
    for dir, subdirs, files in os.walk("c:/...
4
votes
4
answers
19k
views
Python 2.7 UnicodeDecodeError: 'ascii' codec can't decode byte
I've been parsing some docx files (UTF-8 encoded XML) with special characters (Czech alphabet). When I try to output to stdout, everything goes smoothly, but I'm unable to output data to the file,
...
4
votes
3
answers
13k
views
codecs.ascii_decode(input, self.errors)[0] UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 318: ordinal not in range(128)
I am trying to open a .txt file that contains a large amount of text and call readlines on it. Below is my code; I don't know how to solve this problem. Any help would be much appreciated.
file = input("Please ...
10
votes
2
answers
10k
views
Handling French letters in Python
I am reading data from a file which contains words with French and English letters. I am attempting to construct a list of all of the possible English and French letters (stored as strings). I do ...
2
votes
2
answers
3k
views
How to handle unknown encoding
I'm having some issues with a Python script that needs to open files with different encodings.
I usually use this:
with open(path_to_file, 'r') as f:
    first_line = f.readline()
And that works ...
5
votes
2
answers
16k
views
NetworkX: How to create graph edges from a CSV file?
I am trying to create a graph using NetworkX, and so far I have created nodes from the following text files:
File 1 (user_id.txt) sample data:
user_000001
user_000002
user_000003
user_000004
...
2
votes
1
answer
7k
views
YAML safe load special character ° from file
I am trying to read a yml config file using PyYAML. See the example of such a file below. The desired field may contain a special character, such as °. The resulting string in the example below is not the ...
6
votes
1
answer
5k
views
codecs.open(utf-8) fails to read plain ASCII file
I have a plain ASCII file. When I try to open it with codecs.open(..., "utf-8"), I am unable to read single characters. ASCII is a subset of UTF-8, so why can't codecs open such a file in UTF-8 mode?
...
3
votes
1
answer
9k
views
Python: Special characters encoding
This is the code I am using to replace special characters in text files and concatenate them into a single file.
# -*- coding: utf-8 -*-
import os
import codecs
dirpath = "C:\\...
0
votes
1
answer
2k
views
UTF-8 in Python issues
Meh, I'm not a fan of UTF-8 in Python; I can't seem to figure out how to solve this. As you can see, I'm already trying to B64 encode the value, but it looks like Python is trying to convert it from utf-...
0
votes
2
answers
4k
views
Print Unicode string containing both accented characters and emoticons
I'm reading a file with Python that contains exactly the following line
à è ì ò ù ç @ \U0001F914
where \U0001F914 is the Unicode escape for an emoticon.
If I interpret the string as
string=string....