Decoding JSON that contains Base64

Question

I'm sending a request for a set of images to one of my APIs. The API returns these images in JSON format: each object contains data about the resource together with a single property that holds the image in Base64.

An example of the JSON being returned:

{
 "id": 548613,
 "filename": "00548613.png",
 "pictureTaken": "2020-03-30T11:38:21.003",
 "isVisible": true,
 "lotcode": 23,
 "company": "05",
 "concern": "46",
 "base64": "..."
}

[Image: the correct content of the Base64]
[Image: the incorrectly parsed Base64]

This is done with the Python 3 requests library. When I receive a successful response from the API, I attempt to decode the body to JSON using:

url = self.__url__(f"/rest/all/V1/products/{sku}/images")
headers = self.__headers__()
r = requests.get(url=url, headers=headers)
if r.status_code == 200:
    return r.json()
elif r.status_code == 404:
    return None
else:
    raise IOError(
        f"Error retrieving product '{sku}', got {r.status_code}: '{r.text}'")

Calling .json() results in the Base64 content being messed up: some parts are missing, and some are replaced with other characters. After seeing this post, I tried manually decoding the content using r.content.decode() with the utf-8 and ascii options to see if the encoding was the problem. Sadly, this didn't work. I know the response from the server is correct: it works in Postman, and print(r.content) shows a JSON document containing the valid Base64.
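To narrow down where the corruption appears, a small stdlib-only sketch can list every character in the received value that falls outside the standard Base64 alphabet, and find the first position where it diverges from a known-good copy (for example the value shown in Postman). The function names are hypothetical helpers, not part of requests.

```python
import re
from typing import List, Tuple

# Standard Base64 alphabet plus "=" padding (RFC 4648, section 4).
_NON_BASE64 = re.compile(r"[^A-Za-z0-9+/=]")


def invalid_base64_chars(text: str) -> List[Tuple[int, str]]:
    """Return (index, character) for every character outside the alphabet."""
    return [(m.start(), m.group()) for m in _NON_BASE64.finditer(text)]


def first_difference(good: str, bad: str) -> int:
    """Return the first index where the strings differ, or -1 if identical."""
    for index, (a, b) in enumerate(zip(good, bad)):
        if a != b:
            return index
    if len(good) != len(bad):
        # One string is a prefix of the other: part of it is missing.
        return min(len(good), len(bad))
    return -1
```

For instance, invalid_base64_chars(r.json()["base64"]) would show the substituted characters, and first_difference(postman_value, r.json()["base64"]), with postman_value a hypothetical string pasted from Postman, shows where the two copies start to differ.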

How would I go about deserializing the response from the API to get the valid Base64?

Comment

@Trenton I assume you mean the Base64. Sadly, I cannot share it because I do not own the serialized resources.

Comment

@Harjan Take a random image of a duck and convert it to Base64. Put that Base64 in a request like the one you provided and see if the problem arises. If it does, post that request so we can try it.
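Following that suggestion, a minimal local sketch (it makes no network request, so it only exercises the JSON and Base64 steps) wraps arbitrary bytes in a body shaped like the question's example, parses it back, and checks that the bytes survive. The helper names and the duck.png file name are hypothetical.

```python
import base64
import json
import os


def make_payload(image_bytes: bytes) -> bytes:
    """Wrap image bytes in a JSON body shaped like the API's response."""
    body = {
        "id": 548613,
        "filename": "00548613.png",
        "isVisible": True,
        "base64": base64.b64encode(image_bytes).decode("ascii"),
    }
    return json.dumps(body).encode("utf-8")


def round_trip(image_bytes: bytes) -> bool:
    """Return True if the bytes survive Base64 -> JSON -> Base64 unchanged."""
    parsed = json.loads(make_payload(image_bytes))
    return base64.b64decode(parsed["base64"], validate=True) == image_bytes


if __name__ == "__main__":
    # Random bytes stand in for the duck image; the bytes of a real file,
    # e.g. open("duck.png", "rb").read(), would work the same way.
    sample = os.urandom(1024 * 1024)
    print(round_trip(sample))
```

If this succeeds locally but the API's value still fails to decode, that points at the response itself rather than at the json or base64 modules.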

Comment

@Trenton I have added some Base64; when parsed correctly, it should be a 1024x1024 picture of a pink and white box.
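Since the expected result is a 1024x1024 image, one quick stdlib-only sanity check, assuming the decoded data is a PNG as the filename suggests, is to read the dimensions from the PNG's IHDR header. The helper name png_size is hypothetical.

```python
import struct
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> Tuple[int, int]:
    """Return (width, height) from the IHDR chunk of PNG bytes.

    Raises ValueError when the bytes do not look like a PNG, which is a
    cheap way to spot a corrupted Base64 decode before saving the file.
    """
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type
    # b"IHDR", then big-endian 32-bit width and height.
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE):
        raise ValueError("not PNG data")
    if data[12:16] != b"IHDR":
        raise ValueError("first chunk is not IHDR")
    width, height = struct.unpack(">II", data[16:24])
    return width, height
```

Applied to base64.b64decode(r.json()["base64"]), an intact decode of the box described above should report (1024, 1024); a ValueError or different dimensions would indicate that the data was damaged.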


qedk · Accepted Answer · 2020-06-15 17:19:19Z


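import base64
import re
...
# Capture the text between the quotes after "base64"; [^"]* stops at the
# closing quote, and \s* tolerates optional whitespace around the colon.
b64text = re.search(rb'"base64"\s*:\s*"(?P<base>[^"]*)"', r.content).group("base")
# The decoded value is binary image data (PNG), so keep it as bytes;
# decoding it as UTF-8 text would fail.
image_bytes = base64.b64decode(b64text)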

Since you're saying "calling print(r.content) results in the valid Base64", it's just a matter of decoding the base64.
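One caveat with extracting the value from the raw text: a JSON string may legally contain escape sequences (for example `\/` for `/`, which some serializers emit), and a regex copies them verbatim, whereas json.loads resolves them. The sketch below uses synthetic data, not the real API response, to show the difference; the variable and function names are illustrative.

```python
import base64
import binascii
import json
import re


def strict_decode(text):
    """Decode Base64 strictly; return None if any character is invalid."""
    try:
        return base64.b64decode(text, validate=True)
    except binascii.Error:
        return None


# Synthetic bytes chosen so that their Base64 form contains a "/".
image = b"\x89PNG\r\n\x1a\n" + bytes([0xFB, 0xFF, 0xBF])
encoded = base64.b64encode(image).decode("ascii")

# Simulate a serializer that escapes "/" as "\/", which is valid JSON.
body = ('{"id": 1, "base64": "' + encoded.replace("/", "\\/") + '"}').encode("ascii")

# Regex extraction copies the escape sequence into the value...
regex_value = re.search(rb'"base64"\s*:\s*"(?P<base>[^"]*)"', body).group("base")

# ...while json.loads resolves it back to a plain "/".
json_value = json.loads(body)["base64"]

print(strict_decode(regex_value) is None)   # True: the backslash is rejected
print(strict_decode(json_value) == image)   # True: the parsed value is clean
```

When the body is valid JSON, base64.b64decode(json.loads(r.content)["base64"]) sidesteps this, since json.loads accepts the raw bytes directly.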

answered Jun 15, 2020 at 17:19 by qedk · edited Jun 17, 2020 at 7:58

Comments

Harjan · 2020-06-15 17:58:38Z

Good suggestion. I think this might have worked if only Base64 were being returned, but calling this on my content results in the entire JSON response being decoded from Base64.

qedk · 2020-06-15 20:00:53Z

@Harjan Then it's just a matter of extracting the Base64 data from the text directly; see my answer for an example implementation.

Harjan · 2020-06-16 07:43:33Z

I tried your edited solution, but r.content and r.text contain the same corrupted Base64. Extracting it works, but decoding is not possible because it still contains the illegal characters.

qedk · 2020-06-16 10:21:22Z

@Harjan Check your Content-Type and charset: when a text/* response has no charset, requests falls back to ISO-8859-1, which is probably not what your API is using. Set the appropriate value using r.encoding and retry. Have you tried reproducing this behaviour with urllib?
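A quick stdlib-only check of the charset idea, using a made-up body: every Base64 character is ASCII, so any ASCII-compatible charset (UTF-8, or ISO-8859-1, which requests falls back to for text/* responses without a charset) decodes the Base64 field of the same bytes identically. Non-ASCII-compatible charsets such as UTF-16 would behave differently and are not covered here.

```python
import base64
import json

# Synthetic response body: JSON with a Base64 field, as in the question.
payload = {"filename": "00548613.png",
           "base64": base64.b64encode(bytes(range(256))).decode("ascii")}
body = json.dumps(payload).encode("utf-8")

# Decode the same raw bytes as different r.encoding values would.
charsets = ("utf-8", "iso-8859-1", "cp1252", "ascii")
values = {cs: json.loads(body.decode(cs))["base64"] for cs in charsets}

# Base64 text is pure ASCII, so every ASCII-compatible charset agrees.
print(len(set(values.values())) == 1)
```

In requests itself, r.headers.get("Content-Type") shows what the server declared, and r.encoding is the charset used to build r.text from r.content; it can be overridden by assignment (e.g. r.encoding = "utf-8"). If the Base64 is still wrong after that, this suggests the charset is not what is altering it.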

PruthviRaj Reddy · 2020-06-16 22:02:15Z

Follow stackoverflow.com/questions/37225035/…
