
I'm not a fan of UTF-8 in Python and can't figure out how to solve this. As you can see, I'm already trying to base64-encode the value, but it looks like Python is trying to convert it from UTF-8 to ASCII first.

In general, I'm trying to POST form data that contains UTF-8 characters with urllib2. I guess it's the same as "How to send utf-8 content in a urllib2 request?", though there is no valid answer on that one. I'm trying to send only a byte string by base64-encoding it.

Traceback (most recent call last):
  File "load.py", line 165, in <module>
    main()
  File "load.py", line 17, in main
    beers()
  File "load.py", line 157, in beers
    resp = send_post("http://localhost:9000/beers", beer)
  File "load.py", line 64, in send_post
    connection.request ('POST', req.get_selector(), *encode_multipart_data (data, files))
  File "load.py", line 49, in encode_multipart_data
    lines.extend (encode_field (name))
  File "load.py", line 34, in encode_field
    '', base64.b64encode(u"%s" % data[field_name]))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/base64.py", line 53, in b64encode
    encoded = binascii.b2a_base64(s)[:-1]
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 7: ordinal not in range(128)

Code:

def random_string (length):
    return ''.join (random.choice (string.ascii_letters) for ii in range (length + 1))

def encode_multipart_data (data, files):
    boundary = random_string (30)

    def get_content_type (filename):
        return mimetypes.guess_type (filename)[0] or 'application/octet-stream'

    def encode_field (field_name):
        return ('--' + boundary,
                'Content-Disposition: form-data; name="%s"' % field_name,
                'Content-Transfer-Encoding: base64',
                '', base64.b64encode(u"%s" % data[field_name]))

    def encode_file (field_name):
        filename = files [field_name]
        file_size = os.stat(filename).st_size
        file_data = open(filename, 'rb').read()
        file_b64 = base64.b64encode(file_data)
        return ('--' + boundary,
                'Content-Disposition: form-data; name="%s"; filename="%s"' % (field_name, filename),
                'Content-Type: %s' % get_content_type(filename),
                'Content-Transfer-Encoding: base64',
                '', file_b64)

    lines = []
    for name in data:
        lines.extend (encode_field (name))
    for name in files:
        lines.extend (encode_file (name))
    lines.extend (('--%s--' % boundary, ''))
    body = '\r\n'.join (lines)

    headers = {'content-type': 'multipart/form-data; boundary=' + boundary,
               'content-length': str(len(body))}

    return body, headers

def send_post (url, data, files={}):
    req = urllib2.Request (url)
    connection = httplib.HTTPConnection (req.get_host())
    connection.request ('POST', req.get_selector(), *encode_multipart_data (data, files))
    return connection.getresponse()

The beer object's JSON (this is the data being passed into encode_multipart_data):

{
    "name" : "Yuengling Oktoberfest",
    "brewer" : "Yuengling Brewery",
    "description" : "America’s Oldest Brewery is proud to offer Yuengling Oktoberfest Beer. Copper in color, this medium bodied beer is the perfect blend of roasted malts with just the right amount of hops to capture a true representation of the style. Enjoy a Yuengling Oktoberfest Beer in celebration of the season, while supplies last!",
    "abv" : 5.2,
    "ibu" : 26,
    "type" : "Lager",
    "subtype" : "",
    "color" : "",
    "seasonal" : true,
    "servingTemp" : "Cold",
    "rating" : 3,
    "inProduction": true
}
asked Sep 17, 2013 at 4:30
Comments

  • How do you expect to base64 encode Unicode? Do you want to encode the raw UTF8 bytes as base64? Commented Sep 17, 2013 at 4:36
  • What is the value of beer? Commented Sep 17, 2013 at 4:39
  • @Robφ - Added beer to the question Commented Sep 17, 2013 at 4:48
  • The error is referring to '\u2019', which is a quote character '’', that I don't see in your data. Commented Sep 17, 2013 at 5:40
  • @Blckknght - Sorry, I grabbed the wrong beer. I've changed it to the one with the '’' in it. But still, there has to be an issue with the handling of UTF in my code. Commented Sep 17, 2013 at 5:42

1 Answer

You can't base64-encode a Unicode string, only a byte string. In Python 2.7, passing a Unicode string to a function that requires a byte string triggers an implicit conversion using the ascii codec, which fails on any non-ASCII character and produces the error you see:

>>> base64.b64encode(u'America\u2019s')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\base64.py", line 53, in b64encode
    encoded = binascii.b2a_base64(s)[:-1]
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 7: ordinal not in range(128)

So encode it explicitly to a byte string first, using an encoding that can represent the character (UTF-8 here):

>>> base64.b64encode(u'America\u2019s'.encode('utf8'))
'QW1lcmljYeKAmXM='
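
Applied to the question's encode_field, the same idea might look like the sketch below. The helper name b64_text is hypothetical, and encode_field is rewritten here as a standalone function taking boundary and value as arguments (an assumption made so the snippet runs on its own; in the question it is a closure over boundary and data). It assumes UTF-8 is what the server expects, and it returns the base64 output as ASCII text so it can be joined with the other header lines.

```python
import base64


def b64_text(value, encoding="utf-8"):
    """Return the base64 form of a field value as ASCII text.

    Hypothetical helper, not part of the question's code.
    """
    if not isinstance(value, bytes):
        # Render numbers, booleans and text as text, then encode to
        # bytes explicitly so no implicit ascii conversion can occur.
        value = (u"%s" % (value,)).encode(encoding)
    return base64.b64encode(value).decode("ascii")


def encode_field(boundary, field_name, value):
    """Build the lines of one base64-encoded multipart form field."""
    return ('--' + boundary,
            'Content-Disposition: form-data; name="%s"' % field_name,
            'Content-Transfer-Encoding: base64',
            '', b64_text(value))
```

Non-string values such as abv (5.2) or seasonal (true) go through the same path, because they are formatted as text before being encoded.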
answered Sep 17, 2013 at 6:18