19

I have searched many times online and I have not been able to find a way to convert my binary string variable, X

X = "1000100100010110001101000001101010110011001010100"

into a UTF-8 string value.

I have found that some people are using methods such as

b'message'.decode('utf-8')

however, this method has not worked for me: 'b' is reported as nonexistent, and I am not sure how to replace 'message' with a variable. I also have not been able to understand how this method works. Is there a better alternative?

So how could I convert a binary string into a text string?

EDIT: I also do not mind ASCII decoding

CLARIFICATION: Here is specifically what I would like to happen.

def binaryToText(z):
 # Some code to convert binary to text
 return (something here);
X="0110100001101001"
print binaryToText(X)

This would then yield the string...

hi
asked Nov 11, 2016 at 22:41
7
  • Since ASCII is effectively a subset of UTF-8 you'll find that your string X is already a UTF8 string. What is your expected output? Commented Nov 11, 2016 at 22:43
  • +mhawke I am looking for a returned value of a UTF-8 string. The binary is initially a string, and I want to be able to convert that binary, into a UTF-8 string. Please ask me if you need more clarification! Commented Nov 11, 2016 at 22:46
  • Are you using Python 2 or 3? Why did you tag BOTH? In Python 3, strings are utf by default. Commented Nov 11, 2016 at 22:48
  • +juanpa.arrivillaga I have the flexibility to use either, depending on which option is best for me. I can accept solutions for both versions. Commented Nov 11, 2016 at 22:50
  • Well, if you use Python 3, all strings are unicode, so that seems to be the most straightforward solution... Commented Nov 11, 2016 at 22:57

6 Answers 6

17

It looks like you are trying to decode ASCII characters from a binary string representation (bit string) of each character.

You can take each block of eight characters (a byte), convert that to an integer, and then convert that to a character with chr():

>>> X = "0110100001101001"
>>> print(chr(int(X[:8], 2)))
h
>>> print(chr(int(X[8:], 2)))
i

Assuming that the values encoded in the string are ASCII this will give you the characters. You can generalise it like this:

def decode_binary_string(s):
    # Convert each 8-character chunk to an integer, then to a character
    return ''.join(chr(int(s[i*8:i*8+8], 2)) for i in range(len(s)//8))

>>> print(decode_binary_string(X))
hi

If you want to keep it in the original encoding you don't need to decode any further. Usually you would convert the incoming string into a Python unicode string and that can be done like this (Python 2):

def decode_binary_string(s, encoding='UTF-8'):
    # Python 2 only: str is a byte string there, so it has a .decode() method
    byte_string = ''.join(chr(int(s[i*8:i*8+8], 2)) for i in range(len(s)//8))
    return byte_string.decode(encoding)
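In Python 3, str has no .decode() method, so the function above fails with "'str' object has no attribute 'decode'". A minimal Python 3 sketch of the same idea builds a bytes object first and decodes that. Raising on a length that is not a multiple of 8 is my own choice here; the version above silently ignores trailing bits.

```python
def decode_binary_string(s, encoding="utf-8"):
    """Decode a string of '0'/'1' characters, read as 8-bit bytes, into text."""
    if len(s) % 8 != 0:
        # Assumption: incomplete trailing bytes are treated as an error.
        raise ValueError("bit string length must be a multiple of 8")
    # Build a bytes object from each 8-character chunk, then decode it.
    raw = bytes(int(s[i:i + 8], 2) for i in range(0, len(s), 8))
    return raw.decode(encoding)


print(decode_binary_string("0110100001101001"))  # hi
```

Because the bytes are decoded as a whole, multi-byte UTF-8 sequences also work, which the per-character chr() approach does not handle.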
answered Nov 12, 2016 at 2:33

3 Comments

Could you also add the reverse code? For converting string to binary. That would be great :)
@Dan: ''.join([bin(ord(c))[2:].rjust(8,'0') for c in 'hi'])
I'm way, way late to this solution but I'm curious. When I run the last of the code snippets above I get 'str' object has no attribute 'decode'. I bring this up because this solution appears perfect for what I need but the encoding (or rather decoding) part doesn't seem to work.
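For the reverse direction asked about in the comments above, here is a hedged Python 3 sketch that also handles non-ASCII text by encoding to bytes first. The function name text_to_bits is made up for illustration.

```python
def text_to_bits(text, encoding="utf-8"):
    """Return the '0'/'1' string for the encoded bytes of text, 8 bits per byte."""
    # format(byte, "08b") zero-pads each byte to exactly eight binary digits.
    return "".join(format(byte, "08b") for byte in text.encode(encoding))


print(text_to_bits("hi"))  # 0110100001101001
```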
6

To convert bits given as a "01"-string (binary digits) into the corresponding text in Python 3:

>>> bits = "0110100001101001"
>>> n = int(bits, 2)
>>> n.to_bytes((n.bit_length() + 7) // 8, 'big').decode()
'hi'
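One caveat with sizing the result from n.bit_length(): leading zero bytes in the input are dropped, because they do not change the integer's value. If that matters, a possible variant (the name bits_to_text is hypothetical) sizes the result from the length of the bit string instead:

```python
def bits_to_text(bits, encoding="utf-8"):
    """Convert a '0'/'1' string to text, keeping leading zero bytes."""
    n = int(bits, 2)
    # Size the result from the input length, rounded up to whole bytes,
    # rather than from n.bit_length().
    n_bytes = (len(bits) + 7) // 8
    return n.to_bytes(n_bytes, "big").decode(encoding)


print(bits_to_text("0110100001101001"))        # hi
print(repr(bits_to_text("0000000001101000")))  # '\x00h'
```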

For a solution that works on both Python 2 and 3, see "Convert binary to ASCII and vice versa".

answered Nov 12, 2016 at 18:20

Comments

1

In Python 2, an ascii-encoded (byte) string is also a utf8-encoded (byte) string. In Python 3, a (unicode) string must be encoded to utf8-encoded bytes. The decoding example was going the wrong way.

>>> X = "1000100100010110001101000001101010110011001010100"
>>> X.encode()
b'1000100100010110001101000001101010110011001010100'

Strings containing only the digits '0' and '1' are a special case and the same rules apply.

answered Nov 11, 2016 at 22:57

1 Comment

So how could I decode X? X.decode() does not seem to work.
0

Provide the optional base argument to int to convert:

>>> x = "1000100100010110001101000001101010110011001010100"
>>> int(x, 2)
301456912901716
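That integer is not text yet. The string x has 49 digits, which is not a multiple of 8 but is 7 × 7, so one reading (an assumption on my part, not something the question states) is that it holds seven 7-bit ASCII codes. A sketch with a configurable group width, using the made-up name bits_to_chars:

```python
def bits_to_chars(bits, width):
    """Split bits into width-sized groups and map each group to a character."""
    if len(bits) % width != 0:
        raise ValueError("bit string length must be a multiple of width")
    return "".join(
        chr(int(bits[i:i + width], 2)) for i in range(0, len(bits), width)
    )


x = "1000100100010110001101000001101010110011001010100"
print(len(x))               # 49
print(bits_to_chars(x, 7))  # DEFAULT
```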
answered Nov 11, 2016 at 22:46

Comments

0
# Simple, not elegant; used for a CTF challenge and did the trick
# Binary input, separated into bytes by spaces
binary = "01000011 01010100 01000110 01111011 01000010 01101001 01110100 01011111 01000110 01101100 01101001 01110000 01110000 01101001 01101110 01111101"
# Split the input at spaces into a list of 8-bit groups
binlist = binary.split(" ")
# List to hold the decoded characters
chrlist = []
# Convert each group to an integer (base 2), then to a character
for i in binlist:
    chrlist.append(chr(int(i, 2)))
# Print the list as a joined string
print("".join(chrlist))
answered Sep 15, 2023 at 0:42

1 Comment

Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.
-1

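Working code for Python 3:

Binstr = '00011001 00001000'
s = []
# Iterate over the space-separated groups, not over individual characters
for i in Binstr.split(' '):
    # Parse each group as a base-2 integer before calling chr()
    s.append(chr(int(i, 2)))
print(''.join(s))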
answered Mar 19, 2022 at 16:34

1 Comment

Code syntax is invalid
