Let's say I send a request with JavaScript:
fetch("/query?q=" + encodeURIComponent("c'est un château? Yes & no!"));
to a Python Bottle server:
from bottle import Bottle, request

app = Bottle()

@app.route("/query")
def query():
    q = request.query.get("q")  # how to decode it?
    print(q)

app.run()
How do I decode request.query.get("q") so that it reverses the encodeURIComponent encoding? In this example, the ? and & are correctly decoded, but the â is not: print(q) gives c'est un chÃ¢teau? Yes & no!
1 Answer
Bottle implements its own query parsing in _parse_qsl:
def _parse_qsl(qs):
    r = []
    for pair in qs.split('&'):
        if not pair: continue
        nv = pair.split('=', 1)
        if len(nv) != 2: nv.append('')
        key = urlunquote(nv[0].replace('+', ' '))
        value = urlunquote(nv[1].replace('+', ' '))
        r.append((key, value))
    return r
urlunquote is either urllib.unquote in Python 2.x, or urllib.parse.unquote with encoding preset to 'latin1' in Python 3.x:
from urllib.parse import urlencode, quote as urlquote, unquote as urlunquote
urlunquote = functools.partial(urlunquote, encoding='latin1')
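To see what that preset does to the query string from the question, here is a small standard-library sketch. It is not Bottle's code: it assumes urllib.parse.parse_qsl as a stand-in for _parse_qsl, since it performs the same split-and-unquote but takes the encoding as an argument.

```python
from urllib.parse import parse_qsl

# Query string as encodeURIComponent would produce it for the question's text.
qs = "q=c'est%20un%20ch%C3%A2teau%3F%20Yes%20%26%20no!"

# Interpret the percent-decoded bytes as Latin-1 (the encoding Bottle presets).
as_latin1 = dict(parse_qsl(qs, encoding="latin-1"))["q"]

# Interpret the same bytes as UTF-8 (what the browser actually sent).
as_utf8 = dict(parse_qsl(qs, encoding="utf-8"))["q"]

print(as_latin1)  # the two UTF-8 bytes of "â" show up as two separate characters
print(as_utf8)    # "â" comes back as a single character
```

The ? and & survive either way because they are plain ASCII; only the multi-byte â depends on the encoding chosen.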
It's this assumption that leads to the result you're seeing, whereas the default 'utf8' would work:
>>> from urllib.parse import unquote
>>> quoted = "c'est%20un%20ch%C3%A2teau%3F%20Yes%20%26%20no!"
>>> unquote(quoted, encoding="latin1")
"c'est un chÃ¢teau? Yes & no!"
>>> unquote(quoted)
"c'est un château? Yes & no!"
The unquoted values are then fed into a FormsDict, where either attribute access or calling the getunicode method would give you the UTF-8 version, whereas the get method and index access give Latin-1:
>>> from bottle import FormsDict
>>> fd = FormsDict()
>>> fd["q"] = "c'est un chÃ¢teau? Yes & no!"
>>> fd["q"]
"c'est un chÃ¢teau? Yes & no!"
>>> fd.get("q")
"c'est un chÃ¢teau? Yes & no!"
>>> fd.q
"c'est un château? Yes & no!"
>>> fd.getunicode("q")
"c'est un château? Yes & no!"
Alternatively, you can decode a version where everything is UTF-8:
>>> utfd = fd.decode("utf8")
>>> utfd["q"]
"c'est un château? Yes & no!"
>>> utfd.get("q")
"c'est un château? Yes & no!"
>>> utfd.q
"c'est un château? Yes & no!"
>>> utfd.getunicode("q")
"c'est un château? Yes & no!"
This is covered in the docs:
Additionally to the normal dict-like item access methods (which return unmodified data as native strings), [FormsDict] also supports attribute-like access to its values. Attributes are automatically de- or recoded to match input_encoding (default: 'utf8').
and:
To simplify dealing with lots of unreliable user input, FormsDict exposes all its values as attributes, but with a twist: These virtual attributes always return properly encoded unicode strings, even if the value is missing or character decoding fails. They never return None or throw an exception, but return an empty string instead:
name = request.query.name # may be an empty string
...
>>> request.query['city']
'GÃ¶ttingen' # An utf8 string provisionally decoded as ISO-8859-1 by the server
>>> request.query.city
'Göttingen' # The same string correctly re-encoded as utf8 by bottle
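The recoding described in that quote can be sketched in plain Python. This is only an illustration of the idea, not Bottle's implementation; the helper name recode_latin1_as_utf8 and its default argument are hypothetical.

```python
def recode_latin1_as_utf8(value, default=""):
    """Undo a provisional Latin-1 decode of bytes that were really UTF-8.

    Hypothetical helper: returns `default` instead of raising when the
    value cannot be recoded.
    """
    try:
        # Latin-1 maps code points 0-255 to identical byte values, so
        # encoding recovers the original bytes, which are then read as UTF-8.
        return value.encode("latin-1").decode("utf-8")
    except UnicodeError:
        return default


raw = "c'est un chÃ¢teau? Yes & no!"  # what index access or get() returns
fixed = recode_latin1_as_utf8(raw)
print(fixed)
```

Returning a default on failure mirrors the "never throw an exception" behaviour the docs describe for attribute access; whether an empty string is the right fallback depends on the application.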
3 Comments
Is there a way, when doing app = Bottle(), or when importing Bottle, to set request.query.get("q") to always use utf8 instead of latin1?
The 'latin1' passed in the partial isn't configurable. Just use request.query.q as documented (or getunicode if you need the default, or decode with an explicit encoding).
print(q)? Does print("château") even work?
Use request.query.q, or I guess you could .encode("latin1").decode("utf8"), but I don't use Bottle so don't know what the common usage would be; I just looked at the code.