I have string data that looks like the bytes repr of JSON in Python:
>>> data = """b'{"a": 1, "b": 2}\n'"""
So on the inside of that, we have valid JSON that looks like it's been byte-encoded. I want to decode the bytes and load the JSON inside, but since it's a string I cannot.
>>> data.decode() # nope
AttributeError: 'str' object has no attribute 'decode'
Encoding the string doesn't seem to help either:
>>> data.encode() # wrong
b'b\'{"a": 1, "b": 2}\n\''
There are oodles of string-to-bytes questions on stackoverflow, but for the life of me I cannot find anything about this particular issue. Does anyone know how this can be accomplished?
Things that I do not want to do and/or will not work:
- eval the data into a bytes object
- strip the b and \n (inside of my JSON there's all sorts of other escaped data).
This is the only working solution I have found, and there is a lot not to like about it:
from ast import literal_eval
data = """b'{"a": 1, "b": 2}\n'"""
print(literal_eval(data[:-2] + data[-1:]).decode('utf-8'))
1 Answer
I know you said you didn't want to strip the b inside the string because of other escaped data, but can't we assume that whatever generated this only output ASCII (hence the b), so that we can re-encode it? I was thinking you could use a simple regexp (https://regex101.com/r/M0ratk/1) to extract the contents, which you then encode as bytes.
import json
import re
match = re.match(r"\Ab'(.*)'\Z", data, re.DOTALL)
data = json.loads(bytes(match[1], 'ascii'))
Will this work? I am not sure how it compares to the literal_eval solution.
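To get a feel for when the regex approach holds up, here is a minimal sketch that runs it on two inputs. The helper name regex_loads and both sample strings are assumptions made for illustration. The first string contains a real newline, as in the question; the second contains a backslash followed by n, which is what the repr of a bytes object would normally contain.

```python
import json
import re


def regex_loads(text):
    # Hypothetical helper wrapping the regex approach from this answer.
    match = re.match(r"\Ab'(.*)'\Z", text, re.DOTALL)
    if match is None:
        raise ValueError("input does not look like a b'...' repr")
    return json.loads(bytes(match[1], "ascii"))


# Real newline inside the text: the JSON parser treats it as trailing whitespace.
with_newline = """b'{"a": 1, "b": 2}\n'"""
print(regex_loads(with_newline))

# Backslash + "n" as two separate characters: the regex never undoes the
# escape, so the JSON parser sees stray characters after the object.
with_escape = r"""b'{"a": 1, "b": 2}\n'"""
try:
    regex_loads(with_escape)
except json.JSONDecodeError as exc:
    print("regex approach failed:", exc)
```

So under these assumptions the regex works only while the text contains no bytes-level escape sequences; anything like \n or \x.. written out as characters is passed to the JSON parser unchanged.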
ast.literal_eval. There's probably a good dupe target somewhere around here.
literal_eval attempt is almost certainly due to a bug you introduced while attempting to write a string literal for data - you've got an actual newline in the middle of your bytes literal, which is invalid syntax for a bytes literal. You probably meant for that to be an actual backslash and n - that, or the newline was supposed to be outside the bytes literal.
data = r"""b'{"a": 1, "b": 2}\n'""" is likely more representative of the actual kinds of values you're working with. If it's not, then that's going to be an issue.
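A minimal sketch of what the comments suggest, assuming the real data contains a backslash followed by n rather than an actual newline (hence the raw string). The variable names raw and parsed are just for illustration.

```python
import json
from ast import literal_eval

# Raw string, so the text holds a backslash and an "n", as a bytes repr would.
data = r"""b'{"a": 1, "b": 2}\n'"""

# literal_eval only accepts Python literals, so unlike eval it cannot run code.
raw = literal_eval(data)

# json.loads accepts bytes directly (Python 3.6+); the trailing newline is
# ignored as whitespace.
parsed = json.loads(raw)
print(parsed)
```

With input in this form, no slicing is needed before calling literal_eval.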