added 1283 characters in body
metatoaster

Use the latin1 codec.

>>> '\xe1BA\x06\xbe\x084'.encode('latin1')
b'\xe1BA\x06\xbe\x084'

This works because the first 256 Unicode code points (U+0000 to U+00FF) coincide with ISO-8859-1, and the latin1 codec maps each of them to the single byte with the same value, so encoding with it gets you back exactly those bytes.
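
As a quick sanity check of that claim, here is a minimal sketch (the variable names are mine, not from the question) that round-trips every code point from U+0000 to U+00FF through latin1:

```python
# Build a string containing each of the first 256 code points exactly once.
all_chars = ''.join(chr(i) for i in range(256))

# latin1 maps code point N to the single byte N, so the result is bytes 0..255.
encoded = all_chars.encode('latin1')
assert encoded == bytes(range(256))

# Decoding with the same codec restores the original string unchanged.
assert encoded.decode('latin1') == all_chars
```

If either assertion failed, latin1 would not be a safe way to recover the original bytes.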

While the other answer is useful (looping through all available codecs to see every possible output is great), keep in mind that other codecs may work for some specific strings yet fail on others, or produce bytes that differ from the original "byte" sequence.

>>> '\xfe'.encode('iso8859_9')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/encodings/iso8859_9.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character '\xfe' in position 0: character maps to <undefined>
>>> '\xfe'.encode('latin1')
b'\xfe'
>>> 
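
If you want to test a candidate codec before relying on it, something along these lines may help. It is only a sketch: encodes_to_same_bytes is a hypothetical helper name of my own, and it assumes "correct" means one output byte per character, with the byte value equal to the character's code point.

```python
def encodes_to_same_bytes(text, codec):
    """Return True if `codec` turns each character of `text` into the
    single byte whose value equals that character's code point."""
    # Characters above U+00FF can never fit into a single byte.
    if any(ord(ch) > 0xFF for ch in text):
        return False
    try:
        encoded = text.encode(codec)
    except UnicodeEncodeError:
        # The codec has no mapping for at least one character.
        return False
    return encoded == bytes(ord(ch) for ch in text)


# Compare a few codecs on a character that iso8859_9 cannot encode.
for codec in ('latin1', 'iso8859_9', 'utf-8'):
    print(codec, encodes_to_same_bytes('\xfe', codec))
```

Note that utf-8 never raises here, but it fails the check anyway because it encodes '\xfe' as two bytes.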

Of course, the raw_unicode_escape codec can be useful if your intent is to get the same byte-for-byte output for everything up to \xff while also allowing anything above \xff to be represented in the \uXXXX form (or \UXXXXXXXX beyond \uffff):

>>> 'あ'.encode('latin1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode character '\u3042' in position 0: ordinal not in range(256)
>>> 'あ'.encode('raw_unicode_escape')
b'\\u3042'
>>> 
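
To see how the two approaches behave on mixed input, here is a small sketch (the sample string and variable names are my own). It also shows that, for this input, latin1 with the backslashreplace error handler gives the same bytes as raw_unicode_escape:

```python
# One character inside the latin1 range (\xe9) and one outside it.
text = 'caf\xe9 あ'

# raw_unicode_escape keeps \xe9 as a single byte and escapes the rest.
raw = text.encode('raw_unicode_escape')
assert raw == b'caf\xe9 \\u3042'

# Decoding with the same codec restores this particular string.
assert raw.decode('raw_unicode_escape') == text

# latin1 plus an error handler is another way to avoid the exception.
fallback = text.encode('latin1', errors='backslashreplace')
assert fallback == raw
```

One caveat: raw_unicode_escape does not escape backslashes already in the string, so a literal backslash followed by u3042 encodes to the same bytes as 'あ' and the round trip is not guaranteed for arbitrary input.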

Naturally, pick the strategy that makes the most sense for your intent.

added 293 characters in body
metatoaster

Use the latin1 codec.

>>> '\xe1BA\x06\xbe\x084'.encode('latin1')
b'\xe1BA\x06\xbe\x084'

This works because the first 256 Unicode code points (U+0000 to U+00FF) coincide with ISO-8859-1, and the latin1 codec maps each of them to the single byte with the same value, so encoding with it gets you back exactly those bytes.

metatoaster

Use the latin1 codec.

>>> '\xe1BA\x06\xbe\x084'.encode('latin1')
b'\xe1BA\x06\xbe\x084'