
Let's say I send a request with JavaScript:

fetch("/query?q=" + encodeURIComponent("c'est un château? Yes & no!"));

to a Python Bottle server:

from bottle import Bottle, request

app = Bottle()

@app.route("/query")
def query():
    q = request.query.get("q")  # how to decode it?
    print(q)

app.run()

How do I decode request.query.get("q") so that it reverses the encodeURIComponent encoding? In this example, the ? and & are correctly decoded, but the â is not: print(q) gives c'est un chÃ¢teau? Yes & no!

jonrsharpe
asked Sep 20, 2024 at 21:32
  • You shouldn't need to decode it. The bottle module should decode it automatically. Commented Sep 20, 2024 at 21:34
  • What do you see if you do print(q)? Commented Sep 20, 2024 at 21:35
  • The general rule is that you can expect that middleware will automatically perform any necessary encoding/decoding. That's one of its jobs. Commented Sep 20, 2024 at 21:41
  • Are you on Windows? Does print("château") even work? Commented Sep 20, 2024 at 21:54
  • github.com/bottlepy/bottle/issues/1408 suggests request.query.q, or I guess you could .encode("latin1").decode("utf8"), but I don't use Bottle so don't know what the common usage would be; I just looked at the code. Commented Sep 20, 2024 at 22:02

1 Answer


Bottle implements its own query-string parsing in _parse_qsl:

def _parse_qsl(qs):
    r = []
    for pair in qs.split('&'):
        if not pair: continue
        nv = pair.split('=', 1)
        if len(nv) != 2: nv.append('')
        key = urlunquote(nv[0].replace('+', ' '))
        value = urlunquote(nv[1].replace('+', ' '))
        r.append((key, value))
    return r
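To try this parsing logic outside Bottle, here is a minimal stdlib-only sketch of the same loop. The name parse_pairs and its encoding parameter are my own additions for illustration; Bottle's function takes no such parameter.

```python
from urllib.parse import unquote


def parse_pairs(qs: str, encoding: str = "latin1") -> list:
    """Split a query string into (key, value) pairs, mirroring the loop above.

    `encoding` is a hypothetical knob added for this sketch only.
    """
    pairs = []
    for pair in qs.split("&"):
        if not pair:
            continue  # skip empty segments such as the one in "a&&b"
        name, _, value = pair.partition("=")  # a missing "=" yields value == ""
        pairs.append((
            unquote(name.replace("+", " "), encoding=encoding),
            unquote(value.replace("+", " "), encoding=encoding),
        ))
    return pairs


qs = "q=c'est%20un%20ch%C3%A2teau%3F%20Yes%20%26%20no!&lang=fr"
print(parse_pairs(qs))                   # percent-escapes decoded as Latin-1
print(parse_pairs(qs, encoding="utf8"))  # percent-escapes decoded as UTF-8
```

Only the value containing %C3%A2 differs between the two calls; the ASCII escapes (%20, %3F, %26) decode the same way under either encoding, which is why ? and & looked fine in the question.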

urlunquote is either urllib.unquote in Python 2.x, or urllib.parse.unquote with encoding preset to 'latin1' in Python 3.x:

 from urllib.parse import urlencode, quote as urlquote, unquote as urlunquote
 urlunquote = functools.partial(urlunquote, encoding='latin1')

It's this assumption that leads to the result you're seeing, whereas the default 'utf8' would work:

>>> from urllib.parse import unquote
>>> quoted = "c'est%20un%20ch%C3%A2teau%3F%20Yes%20%26%20no!"
>>> unquote(quoted, encoding="latin1")
"c'est un chÃ¢teau? Yes & no!"
>>> unquote(quoted)
"c'est un château? Yes & no!"
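The Latin-1 result is not lossy, which is what makes the later fix-ups possible. Latin-1 maps every byte 0-255 to the code point with the same number, so encoding the string back to Latin-1 returns the original bytes, and those can then be decoded as UTF-8. A small sketch using only the standard library:

```python
from urllib.parse import unquote

quoted = "c'est%20un%20ch%C3%A2teau%3F%20Yes%20%26%20no!"

# Decode the percent-escapes the way Bottle's preset urlunquote does.
as_latin1 = unquote(quoted, encoding="latin1")

# Encoding back to Latin-1 recovers the raw bytes that were on the wire...
raw_bytes = as_latin1.encode("latin1")

# ...and those bytes are the UTF-8 that encodeURIComponent produced.
recovered = raw_bytes.decode("utf8")

print(as_latin1)  # c'est un chÃ¢teau? Yes & no!
print(recovered)  # c'est un château? Yes & no!
```

This is the same .encode("latin1").decode("utf8") round trip suggested in the comments on the question.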

The unquoted values are then fed into a FormsDict, where either attribute access or calling the getunicode method would give you the UTF-8 version, whereas the get method and index access give Latin-1:

>>> from bottle import FormsDict
>>> fd = FormsDict()
>>> fd["q"] = "c'est un chÃ¢teau? Yes & no!"
>>> fd["q"]
"c'est un chÃ¢teau? Yes & no!"
>>> fd.get("q")
"c'est un chÃ¢teau? Yes & no!"
>>> fd.q
"c'est un château? Yes & no!"
>>> fd.getunicode("q")
"c'est un château? Yes & no!"

Alternatively, you can decode a version where everything is UTF-8:

>>> utfd = fd.decode("utf8")
>>> utfd["q"]
"c'est un château? Yes & no!"
>>> utfd.get("q")
"c'est un château? Yes & no!"
>>> utfd.q
"c'est un château? Yes & no!"
>>> utfd.getunicode("q")
"c'est un château? Yes & no!"
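If all you have is a string that already went through get or index access, the same recoding can be done by hand. As far as I can tell, this is roughly what FormsDict does internally, but treat that as an assumption; the helper name recode_latin1 is hypothetical and not part of Bottle.

```python
def recode_latin1(value: str, encoding: str = "utf8") -> str:
    """Re-decode a string that was decoded as Latin-1 but really holds
    bytes in `encoding`. Returns the input unchanged if that fails."""
    try:
        return value.encode("latin1").decode(encoding)
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the string has characters outside Latin-1, or the bytes
        # are not valid in the target encoding: assume it was already fine.
        return value


print(recode_latin1("c'est un ch\u00c3\u00a2teau? Yes & no!"))  # mojibake in
print(recode_latin1("c'est un ch\u00e2teau? Yes & no!"))        # already correct
```

Both calls print the correctly accented string. The fallback is a design choice for this sketch: a string that is genuinely Latin-1 text and also happens to be valid UTF-8 when re-encoded would still be changed, so prefer request.query.q or getunicode when you have the FormsDict itself.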

This is covered in the docs:

Additionally to the normal dict-like item access methods (which return unmodified data as native strings), [FormsDict] also supports attribute-like access to its values. Attributes are automatically de- or recoded to match input_encoding (default: ‘utf8’).

and:

To simplify dealing with lots of unreliable user input, FormsDict exposes all its values as attributes, but with a twist: These virtual attributes always return properly encoded unicode strings, even if the value is missing or character decoding fails. They never return None or throw an exception, but return an empty string instead:

name = request.query.name # may be an empty string

...

>>> request.query['city']
'GÃ¶ttingen' # An utf8 string provisionally decoded as ISO-8859-1 by the server
>>> request.query.city
'Göttingen' # The same string correctly re-encoded as utf8 by bottle
answered Sep 20, 2024 at 22:23

3 Comments

Thanks a lot! Do you think there is an option when creating the Bottle server app = Bottle(), or when importing Bottle, to set request.query.get("q") to always use utf8 instead of latin1?
I don't know; again, I don't use Bottle. "A word on unicode and character encodings" suggests not. Certainly the 'latin1' passed in the partial isn't configurable. Just use request.query.q as documented (or getunicode if you need the default, or decode with an explicit encoding).
Thanks! By reading your answer, I see it's really not trivial, and rather tricky. (Once again, I don't see why the downvotes for such a non-trivial question/answer)
