101

Currently I have this dictionary, printed using pprint:

{'AlarmExTempHum': '\x00\x00\x00\x00\x00\x00\x00\x00', 
'AlarmIn': 0, 
'AlarmOut': '\x00\x00', 
'AlarmRain': 0, 
'AlarmSoilLeaf': '\x00\x00\x00\x00', 
'BarTrend': 60, 
'BatteryStatus': 0, 
'BatteryVolts': 4.751953125, 
'CRC': 55003,
'EOL': '\n\r',
'ETDay': 0,
'ETMonth': 0,
'ETYear': 0,
'ExtraHum1': None,
'ExtraHum2': None,
'ExtraHum3': None,
'ExtraHum4': None,
'ExtraHum5': None,
'ExtraHum6': None,
'ExtraHum7': None,
'ExtraTemp1': None,
'ExtraTemp2': None,
'ExtraTemp3': None,
'ExtraTemp4': None,
'ExtraTemp5': None,
'ExtraTemp6': None,
'ExtraTemp7': None,
'ForecastIcon': 2,
'ForecastRuleNo': 122,
'HumIn': 31,
'HumOut': 94,
'LOO': 'LOO',
'LeafTemps': '\xff\xff\xff\xff',
'LeafWetness': '\xff\xff\xff\x00',
'NextRec': 37,
'PacketType': 0,
'Pressure': 995.9363359295631,
'RainDay': 0.0,
'RainMonth': 0.0,
'RainRate': 0.0,
'RainStorm': 0.0,
'RainYear': 2.8,
'SoilMoist': '\xff\xff\xff\xff',
'SoilTemps': '\xff\xff\xff\xff',
'SolarRad': None,
'StormStartDate': '2127-15-31',
'SunRise': 849,
'SunSet': 1611,
'TempIn': 21.38888888888889,
'TempOut': 0.8888888888888897,
'UV': None,
'WindDir': 219,
'WindSpeed': 3.6,
'WindSpeed10Min': 3.6}

When I do this:

import json
d = (my dictionary above)
jsonarray = json.dumps(d)

I get this error: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte

Tim Pietzcker
338k59 gold badges521 silver badges572 bronze badges
asked Feb 2, 2013 at 10:44
1
  • Your problem lies here : \xff Commented Feb 2, 2013 at 10:50

4 Answers 4

169

If you are fine with non-printable symbols in your json, then add ensure_ascii=False to dumps call.

>>> json.dumps(your_data, ensure_ascii=False)

If ensure_ascii is false, then the return value will be a unicode instance subject to normal Python str to unicode coercion rules instead of being escaped to an ASCII str.

answered Feb 2, 2013 at 10:50
Sign up to request clarification or add additional context in comments.

1 Comment

add indent=n to the options to pretty print, where n is the number of spaces to indent
18

ensure_ascii=False really only defers the issue to the decoding stage:

>>> dict2 = {'LeafTemps': '\xff\xff\xff\xff',}
>>> json1 = json.dumps(dict2, ensure_ascii=False)
>>> print(json1)
{"LeafTemps": "����"}
>>> json.loads(json1)
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "/usr/lib/python2.7/json/__init__.py", line 328, in loads
 return _default_decoder.decode(s)
 File "/usr/lib/python2.7/json/decoder.py", line 365, in decode
 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
 File "/usr/lib/python2.7/json/decoder.py", line 381, in raw_decode
 obj, end = self.scan_once(s, idx)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte

Ultimately you can't store raw bytes in a JSON document, so you'll want to use some means of unambiguously encoding a sequence of arbitrary bytes as an ASCII string - such as base64.

>>> import json
>>> from base64 import b64encode, b64decode
>>> my_dict = {'LeafTemps': '\xff\xff\xff\xff',} 
>>> my_dict['LeafTemps'] = b64encode(my_dict['LeafTemps'])
>>> json.dumps(my_dict)
'{"LeafTemps": "/////w=="}'
>>> json.loads(json.dumps(my_dict))
{u'LeafTemps': u'/////w=='}
>>> new_dict = json.loads(json.dumps(my_dict))
>>> new_dict['LeafTemps'] = b64decode(new_dict['LeafTemps'])
>>> print new_dict
{u'LeafTemps': '\xff\xff\xff\xff'}
answered Feb 2, 2013 at 11:36

3 Comments

You could pass arbitrary binary data (inefficiently) in json using 'latin1' encoding
You could, I suppose, but json is designed/intended to use utf-8.
@J.F.Sebastian: Indeed, very inefficient as compared to b64encode. For example, for the 256 character string s = ''.join(chr(i) for i in xrange(256)), len(json.dumps(b64encode(s))) == 346 vs len(json.dumps(s.decode('latin1'))) == 1045.
10

If you use Python 2, don't forget to add the UTF-8 file encoding comment on the first line of your script.

# -*- coding: UTF-8 -*-

This will fix some Unicode problems and make your life easier.

KPrince36
2,9462 gold badges23 silver badges30 bronze badges
answered Sep 30, 2015 at 21:27

Comments

1

One possible solution that I use is to use python3. It seems to solve many utf issues.

Sorry for the late answer, but it may help people in the future.

For example,

#!/usr/bin/env python3
import json
# your code follows
answered Jul 3, 2015 at 2:04

1 Comment

Sure, you're damn right, Python 3 solved many encoding issues. But that's not the answer to that question. It is explicitly tagged with python-2.7. So what you are saying is something like this: There is no built-in vacuum cleaner in your old car. So please buy a new car instead of adding a vacuum cleaner in your old car.

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.