
I'm trying to make a script that converts Japanese katakana to romaji ("シ" to "shi"). Here's what I have so far:

x = u''
x = raw_input('Enter katakana: ')
x = x.replace(u'\u30B7', u'shi')

Enter katakana: シ
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 0: ordinal not in range(128)

As long as I have the unicode in my script written as u'\u30B7' and not シ, it should be able to handle it, right?

dda
asked Nov 25, 2012 at 23:12

1 Answer


raw_input returns a byte string (str), not a unicode object, and its bytes are in whatever encoding your terminal uses (often UTF-8). Try decoding the input explicitly to Unicode first with:

import sys
# sys.stdin.encoding can be None when input is piped; UTF-8 is an assumed fallback
x = raw_input('Enter katakana: ').decode(sys.stdin.encoding or 'utf-8')
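To see why the decode has to use the terminal's own encoding, here is a small sketch (not part of the original answer) that encodes シ with three codecs a Japanese-capable terminal might plausibly use. The codec list is an assumption chosen for illustration, and the helper `hex_bytes` is hypothetical. It uses only u'' literals and binascii, so it should run on Python 2.7 and 3.x alike.

```python
# -*- coding: utf-8 -*-
# Sketch: the same katakana encodes to different bytes under different codecs,
# so raw_input's bytes can only be decoded correctly with the terminal's codec.
# The three codec names below are assumptions chosen for illustration.
import binascii

SHI = u'\u30B7'  # KATAKANA LETTER SI (シ)
CODECS = ['utf-8', 'shift_jis', 'euc_jp']


def hex_bytes(text, codec):
    """Hypothetical helper: encode text with codec, return the bytes as hex."""
    return binascii.hexlify(text.encode(codec)).decode('ascii')


forms = dict((codec, hex_bytes(SHI, codec)) for codec in CODECS)
for codec in CODECS:
    print('%-9s %s' % (codec, forms[codec]))

# Decoding each byte string with its own codec recovers the same character.
for codec in CODECS:
    assert SHI.encode(codec).decode(codec) == SHI
```

This prints e382b7 for UTF-8, 8356 for Shift_JIS and a5b7 for EUC-JP: three different byte sequences for one character, which is why a fixed codec guess can fail while sys.stdin.encoding (when available) matches what the terminal actually sent.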

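The error you get comes from replace: x is a byte string while both arguments are unicode, so Python 2 implicitly decodes x with the default ascii codec. That fails on the very first byte, 0xe3, which is consistent with a UTF-8 terminal (シ in UTF-8 is the three bytes e3 82 b7). Writing the pattern as u'\u30B7' instead of シ doesn't help, because the problem is the type of x, not the literal.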
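The failure can be reproduced without a terminal. This sketch assumes a UTF-8 terminal, so typing シ yields the bytes e3 82 b7; it first decodes them with ascii (the conversion Python 2 attempts implicitly inside replace) and then with UTF-8 before doing the replacement. Bytes literals keep it runnable on Python 2.7 and 3.x.

```python
# -*- coding: utf-8 -*-
# Sketch: reproduce the implicit-ascii failure, then decode explicitly.
raw = b'\xe3\x82\xb7'  # assumed input: シ as typed into a UTF-8 terminal

# Step 1: decoding with ascii, as Python 2 does implicitly inside replace, fails.
ascii_error = None
try:
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    ascii_error = exc  # keep a reference; Python 3 deletes exc after the block
    print(exc)

# Step 2: decode with the terminal's real encoding, then replace works.
text = raw.decode('utf-8')
romaji = text.replace(u'\u30B7', u'shi')
print(romaji)
```

The printed error is the same one shown in the question (byte 0xe3 in position 0), and once the input is unicode the replace yields shi.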

answered Nov 25, 2012 at 23:16