Parameter encoded with encodeURIComponent, how to get it in a Python Bottle server?

Question 1

Let's say I send a request with JavaScript:

fetch("/query?q=" + encodeURIComponent("c'est un château? Yes & no!"));

to a Python Bottle server:

from bottle import Bottle, request

app = Bottle("")

@app.route("/query")
def query():
    q = request.query.get("q")  # how to decode it?
    print(q)

app.run()

How do I decode request.query.get("q") so that the encodeURIComponent encoding is fully reversed? In this example, the ? and & are decoded correctly, but the â is not: print(q) gives c'est un chÃ¢teau? Yes & no!

Comment

You shouldn't need to decode it. The bottle module should decode it automatically.

Comment

What do you see if you do print(q)?

Comment

The general rule is that you can expect that middleware will automatically perform any necessary encoding/decoding. That's one of its jobs.

Comment

Are you on Windows? Does print("château") even work?

Comment

github.com/bottlepy/bottle/issues/1408 suggests request.query.q, or I guess you could .encode("latin1").decode("utf8"), but I don't use Bottle so don't know what the common usage would be; I just looked at the code.

Answer

Bottle implements its own query parsing, _parse_qsl

def _parse_qsl(qs):
    r = []
    for pair in qs.split('&'):
        if not pair: continue
        nv = pair.split('=', 1)
        if len(nv) != 2: nv.append('')
        key = urlunquote(nv[0].replace('+', ' '))
        value = urlunquote(nv[1].replace('+', ' '))
        r.append((key, value))
    return r

urlunquote is either urllib.unquote in Python 2.x, or urllib.parse.unquote with encoding preset to 'latin1' in Python 3.x:

 from urllib.parse import urlencode, quote as urlquote, unquote as urlunquote
 urlunquote = functools.partial(urlunquote, encoding='latin1')

It's this assumption that leads to the result you're seeing, whereas the default 'utf8' would work:

>>> from urllib.parse import unquote
>>> quoted = "c'est%20un%20ch%C3%A2teau%3F%20Yes%20%26%20no!"
>>> unquote(quoted, encoding="latin1")
"c'est un chÃ¢teau? Yes & no!"
>>> unquote(quoted)
"c'est un château? Yes & no!"

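To see where the two stray characters come from, it can help to look at the raw bytes. The following is a small stdlib-only sketch (not Bottle code): the UTF-8 bytes of â are percent-encoded as %C3%A2, which is also what encodeURIComponent sends, and decoding those two bytes as Latin-1 yields two characters instead of one.

```python
from urllib.parse import quote, unquote_to_bytes

# Python's quote() percent-encodes the UTF-8 bytes by default,
# so "â" becomes the two-byte sequence %C3%A2.
encoded = quote("â")

# Undo the percent-encoding without choosing a text encoding yet.
raw = unquote_to_bytes(encoded)

as_utf8 = raw.decode("utf-8")      # one character
as_latin1 = raw.decode("latin-1")  # two characters (the mojibake)

print(encoded, raw, as_utf8, as_latin1)
```
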
The unquoted values are then fed into a FormsDict, where either attribute access or calling the getunicode method would give you the UTF-8 version, whereas the get method and index access give Latin-1:

>>> from bottle import FormsDict
>>> fd = FormsDict()
>>> fd["q"] = "c'est un chÃ¢teau? Yes & no!"
>>> fd["q"]
"c'est un chÃ¢teau? Yes & no!"
>>> fd.get("q")
"c'est un chÃ¢teau? Yes & no!"
>>> fd.q
"c'est un château? Yes & no!"
>>> fd.getunicode("q")
"c'est un château? Yes & no!"

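The recoding that attribute access and getunicode perform can be approximated with plain string methods. The helper below is a hypothetical stand-in, not Bottle's own code: it assumes the recoding amounts to re-encoding the string as Latin-1 to recover the original bytes, decoding those as UTF-8, and falling back to a default when either step fails.

```python
def recode_latin1_to_utf8(value, default=""):
    """Hypothetical helper: undo a Latin-1 decode of UTF-8 bytes."""
    try:
        # Latin-1 maps code points 0-255 to single bytes one-to-one,
        # so encoding recovers the raw bytes that were sent.
        return value.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # Not representable in Latin-1, or not valid UTF-8: fall back.
        return default


mojibake = "c'est un chÃ¢teau? Yes & no!"
fixed = recode_latin1_to_utf8(mojibake)
print(fixed)
```

Returning a default instead of raising mirrors the documented behaviour of the virtual attributes, which return an empty string when decoding fails.
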
Alternatively, you can decode a version where everything is UTF-8:

>>> utfd = fd.decode("utf8")
>>> utfd["q"]
"c'est un château? Yes & no!"
>>> utfd.get("q")
"c'est un château? Yes & no!"
>>> utfd.q
"c'est un château? Yes & no!"
>>> utfd.getunicode("q")
"c'est un château? Yes & no!"

This is covered in the docs:

Additionally to the normal dict-like item access methods (which return unmodified data as native strings), [FormsDict] also supports attribute-like access to its values. Attributes are automatically de- or recoded to match input_encoding (default: ‘utf8’).

and:

To simplify dealing with lots of unreliable user input, FormsDict exposes all its values as attributes, but with a twist: These virtual attributes always return properly encoded unicode strings, even if the value is missing or character decoding fails. They never return None or throw an exception, but return an empty string instead:

name = request.query.name  # may be an empty string
...
>>> request.query['city']
'GÃ¶ttingen'  # An utf8 string provisionally decoded as ISO-8859-1 by the server
>>> request.query.city
'Göttingen'  # The same string correctly re-encoded as utf8 by bottle

Comment

Thanks a lot! Do you think there is an option when creating the Bottle server app = Bottle(), or when importing Bottle, to set request.query.get("q") to always use utf8 instead of latin1?

Comment

I don't know; again, I don't use Bottle. "A word on unicode and character encodings" suggests not. Certainly the 'latin1' passed in the partial isn't configurable. Just use request.query.q as documented (or getunicode if you need the default, or decode with an explicit encoding).
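
If dict-style access with UTF-8 values is wanted anyway, one possible workaround is to parse the raw query string yourself with the standard library. This is a sketch under assumptions, not a Bottle option: parse_query_utf8 is a hypothetical helper name, and it expects the raw, still percent-encoded query string, which Bottle exposes as request.query_string.

```python
from urllib.parse import parse_qs


def parse_query_utf8(query_string):
    """Hypothetical helper: parse a raw query string, decoding values as UTF-8."""
    # parse_qs returns a list of values per key; keep only the first of each.
    parsed = parse_qs(query_string, keep_blank_values=True, encoding="utf-8")
    return {key: values[0] for key, values in parsed.items()}


params = parse_query_utf8("q=c'est%20un%20ch%C3%A2teau%3F%20Yes%20%26%20no!")
print(params.get("q"))
```

Inside a route this would be called as parse_query_utf8(request.query_string). Repeated keys are collapsed to their first value here, which is a simplification.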

Comment

Thanks! By reading your answer, I see it's really not trivial, and rather tricky. (Once again, I don't see why the downvotes for such a non-trivial question/answer)

jonrsharpe 123k31 gold badges278 silver badges489 bronze badges · Accepted Answer · 2024-09-20 22:23:40Z

