
I have built two parsers that work for everything except some binary blocks. One converts a proprietary format into standard JSON, and the other converts the JSON back into the proprietary format.

When I wrote the parser that produces JSON, I was just happy to get everything to parse into valid JSON, but I worried that I might not be able to bring the binary sections back. That concern seems to have come true.

I believe base64 is necessary, or at least one possible solution, because the binary is otherwise too full of characters that JSON does not like. I think trying to escape them would be more challenging than this base64 approach.
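To illustrate why some encoding step is needed at all, here is a minimal sketch. The sample bytes and the key name `blob` are made up for the illustration; the point is only that Python's `json` module refuses to serialize a `bytes` value directly.

```python
import json

# Hypothetical sample: a few raw bytes, as they might appear in a binary block.
raw = b"0\x82\x02\xd8"

try:
    # json.dumps has no default serialization for bytes objects.
    json.dumps({"blob": raw})
    serialized_ok = True
except TypeError:
    serialized_ok = False

print(serialized_ok)
```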

So here is a binary block from the original file:

cleanbinary = "0\x82\x02\xd80\x82\x01\xc0\xa0"

It is brought into base64 like this:

import base64 
out = base64.encodebytes(cleanbinary.encode('utf-8'))
print(out)
>> b'MMKCAsOYMMKCAcOAwqA=\n'

That can be turned back into binary:

z = base64.decodebytes(out).decode('utf-8')
print(z == cleanbinary)
>> True
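As a side note, the round trip above goes through UTF-8 because `cleanbinary` is a `str`. If the block can be read from the original file as `bytes` instead (an assumption, since that depends on how the file is opened), a sketch along these lines would encode the raw bytes directly. `base64.b64encode` also avoids the trailing newline that `encodebytes` adds.

```python
import base64

# Assumption: the binary block is available as bytes, not str.
raw = b"0\x82\x02\xd80\x82\x01\xc0\xa0"

# b64encode returns bytes without a trailing newline.
encoded = base64.b64encode(raw)

# base64 output is pure ASCII, so decoding to str is lossless.
text = encoded.decode("ascii")

# b64decode accepts an ASCII str as well as bytes.
restored = base64.b64decode(text)
print(restored == raw)
```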

However, it needs a step in the middle, which I just can't work out, to get it into JSON in the middle of a loop. I have tried the following:

wrapped = '"' + str(out) + '"'

So now it has the double quotes that JSON needs, and it is a str rather than bytes:

print(wrapped)
>> "b'MMKCAsOYMMKCAcOAwqA=\n'"

Now let's say you had plucked this string value out of a JSON file with Python's json parser. How do you turn it back into a bytes value:

b'MMKCAsOYMMKCAcOAwqA=\n'

...so that it can be decoded back into binary?

Martin Stone
asked Apr 30, 2019 at 12:00
  • The simplest way to do this (probably not the best way) is to use literal_eval. Commented Apr 30, 2019 at 12:05
  • bytes("some_text")? Commented Apr 30, 2019 at 12:05

1 Answer


I'd suggest converting your bytes to a string and back by explicitly decoding and encoding it:

out = b'MMKCAsOYMMKCAcOAwqA='
wrapped = f'"{out.decode()}"'
print(wrapped) # ---> "MMKCAsOYMMKCAcOAwqA="
unwrapped = wrapped.strip('"').encode()
print(unwrapped) # ---> b'MMKCAsOYMMKCAcOAwqA='
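Building on the same idea, here is a hedged sketch of the full round trip through the `json` module, so the quotes do not have to be added or stripped by hand. The sample bytes and the key name `blob` are assumptions for illustration, not part of your format.

```python
import base64
import json

# Assumption: the binary block is held as bytes.
raw = b"0\x82\x02\xd80\x82\x01\xc0\xa0"

# Store the base64 text (a plain str) as an ordinary JSON string value.
record = {"blob": base64.b64encode(raw).decode("ascii")}
document = json.dumps(record)

# Later: parse the JSON and decode the string back into the original bytes.
loaded = json.loads(document)
restored = base64.b64decode(loaded["blob"])
print(restored == raw)
```

Letting `json.dumps` handle the quoting means the value comes back from `json.loads` as a normal str that `base64.b64decode` accepts directly.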
answered Apr 30, 2019 at 12:19