I'm trying to make a script that converts Japanese katakana to romaji ("シ" to "shi"). Here's what I'm trying:
x = u''
x = raw_input('Enter katakana: ')
x = x.replace(u'\u30B7', u'shi')
When I run it and type シ, I get:
Enter katakana: シ
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 0: ordinal not in range(128)
As long as the Unicode character in my script is written as u'\u30B7' and not シ, it should be able to handle it, right?
Answer:
In Python 2, raw_input returns a byte string (str), not a Unicode string, and those bytes are in whatever encoding your terminal uses. Try decoding the input explicitly to Unicode first:
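import sys
# sys.stdin.encoding can be None when input is piped; fall back to UTF-8 in that case
x = raw_input('Enter katakana: ').decode(sys.stdin.encoding or 'utf-8')
x = x.replace(u'\u30B7', u'shi')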
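The error comes from replace: x is a byte string while its arguments are Unicode, so Python 2 implicitly decodes x with the default ASCII codec before replacing. That decode fails on the first non-ASCII byte, 0xe3, which (on a UTF-8 terminal) is the first byte of シ. Writing the literal as u'\u30B7' instead of シ doesn't help, because the problem is the input, not the literal.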
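Here is a minimal sketch of the "decode first, then convert" approach, assuming the input arrives as UTF-8 bytes (as raw_input returns it from a UTF-8 terminal). The function name katakana_to_romaji and the three-entry table are hypothetical illustrations, not a complete katakana chart. It runs on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Hypothetical, deliberately tiny table; a real converter needs the full chart.
KATAKANA_TO_ROMAJI = {
    u'\u30B7': u'shi',  # シ
    u'\u30AB': u'ka',   # カ
    u'\u30BF': u'ta',   # タ
}

def katakana_to_romaji(raw, encoding='utf-8'):
    """Decode a byte string explicitly, then map each known katakana to romaji."""
    text = raw.decode(encoding)  # bytes -> Unicode, so no implicit ASCII decode happens
    # Characters not in the table are passed through unchanged.
    return u''.join(KATAKANA_TO_ROMAJI.get(ch, ch) for ch in text)

# シ as the UTF-8 bytes E3 82 B7, i.e. what a UTF-8 terminal sends
print(katakana_to_romaji(b'\xe3\x82\xb7'))
```

Looking up one character at a time avoids chaining one replace call per kana. Digraphs such as シャ (sha) would need longest-match handling on top of this.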