I have recently seen a few URIs containing the query parameter "utf8=✓". My first impression (after thinking "mmm, looks cool") was that this could be used to detect a broken character encoding.
So, is this a better way to resolve potential problems with character encoding, or is it just a developer having fun with a hack?
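The detection idea in the question can be sketched as a server-side check. This assumes a Python backend that receives the raw, still percent-encoded query string. The helper name `sentinel_survived` is hypothetical and does not come from any framework; the parameter name `utf8` is taken from the URIs described above. If the value does not decode back to exactly ✓, the submission was sent in some other encoding or was mangled on the way.

```python
from urllib.parse import parse_qs

SENTINEL = "\u2713"  # ✓ (CHECK MARK), which has no Latin-1 representation


def sentinel_survived(raw_query: str) -> bool:
    """Hypothetical check: True only if the utf8 parameter decodes,
    as UTF-8, to exactly one check mark."""
    # parse_qs percent-decodes using UTF-8 by default; invalid bytes become U+FFFD
    values = parse_qs(raw_query).get("utf8", [])
    return values == [SENTINEL]


# A correct UTF-8 submission
print(sentinel_survived("utf8=%E2%9C%93&q=caf%C3%A9"))  # True
# The UTF-8 bytes of ✓ misread as Windows-1252 and re-encoded as UTF-8 ("âœ“")
print(sentinel_survived("utf8=%C3%A2%C5%93%E2%80%9C"))  # False
```

A failed check only says that the value did not arrive as a UTF-8 ✓. It does not say which encoding was actually used.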
I disagree. There are schemes out there that look like URNs and take query parameters, such as Bitcoin. URIs are not confined to browsers; see en.wikipedia.org/wiki/URI_scheme. This question may also address the general case where a character encoding must be specified when a browser hands a URI off to a protocol handler. – Gary, Oct 19, 2012
Give examples of these URLs or it didn't happen. – hakre, Oct 22, 2012
Off topic, but OK. Here's my personal donation Bitcoin URI: bitcoin:1KzTSfqjF2iKCduwz59nv2uqh1W2JsTxZH?amount=0.5&label=Agile%20Stack. Notice that the scheme is essentially a URN with query parameters, but it hands off to a protocol handler. This kind of URI could probably benefit from the "utf8=✓" workaround as well. – Gary, Oct 22, 2012
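To make the "URN with query parameters" point concrete, here is a short sketch using Python's standard `urllib.parse`. For an opaque URI like this one (no `//` authority), `urlsplit` still separates the scheme, the path and the query. Python is only an illustrative choice, and the helper name `split_payment_uri` is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def split_payment_uri(uri: str):
    """Hypothetical helper: split an opaque URI such as a bitcoin: URI
    into its scheme, path and decoded query parameters."""
    parts = urlsplit(uri)
    # parse_qs percent-decodes values, so "Agile%20Stack" becomes "Agile Stack"
    return parts.scheme, parts.path, parse_qs(parts.query)


scheme, address, params = split_payment_uri(
    "bitcoin:1KzTSfqjF2iKCduwz59nv2uqh1W2JsTxZH?amount=0.5&label=Agile%20Stack"
)
print(scheme)   # bitcoin
print(address)  # 1KzTSfqjF2iKCduwz59nv2uqh1W2JsTxZH
print(params)   # {'amount': ['0.5'], 'label': ['Agile Stack']}
```

The percent-decoding step is where the character encoding matters. A non-ASCII label would arrive as percent-encoded bytes, and `parse_qs` assumes those bytes are UTF-8 unless it is told otherwise.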
@GaryRowe So, did you ever get any donations off that link? – Kyralessa, Sep 18, 2018
@Gary I can't imagine being a millionaire because of an offhand comment on Stack Exchange 12 years ago. You're insanely lucky. – stickynotememo, Jan 16, 2025
Answer
By default, older versions of Internet Explorer (IE 8 and earlier) submit form data in Latin-1 encoding whenever every character can be expressed in it. Including a character that cannot be expressed in Latin-1, such as ✓, forces IE to use UTF-8 for its form submissions, which simplifies various backend processes, for example database persistence.
If the parameter were instead utf8=true, it would not trigger UTF-8 encoding in these browsers, because every character in "true" can be expressed in Latin-1.
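The behaviour described in this answer can be modelled in a few lines. This is a simplified sketch in Python, not IE's actual algorithm. The hypothetical `choose_charset` picks Latin-1 whenever every character of the form fits and falls back to UTF-8 otherwise. The hypothetical `encode_form` then percent-encodes the fields in the chosen charset.

```python
from __future__ import annotations

from urllib.parse import urlencode


def choose_charset(fields: dict[str, str]) -> str:
    """Simplified model of legacy IE: prefer Latin-1, fall back to UTF-8."""
    text = "".join(key + value for key, value in fields.items())
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return "utf-8"
    return "latin-1"


def encode_form(fields: dict[str, str]) -> tuple[str, str]:
    """Return the chosen charset and the application/x-www-form-urlencoded body."""
    charset = choose_charset(fields)
    return charset, urlencode(fields, encoding=charset)


print(encode_form({"name": "café"}))
# ('latin-1', 'name=caf%E9')
print(encode_form({"utf8": "true", "name": "café"}))
# ('latin-1', 'utf8=true&name=caf%E9')
print(encode_form({"utf8": "\u2713", "name": "café"}))
# ('utf-8', 'utf8=%E2%9C%93&name=caf%C3%A9')
```

In the first two cases, a backend that assumes UTF-8 would decode "caf%E9" to "caf" followed by a replacement character. Only the ✓ field moves the third case onto the UTF-8 path.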
@LarsViklund I should have been clearer with my comment. I meant that the validation associated with character encoding is simplified, not bypassed. – Gary, Oct 13, 2012
@Lars Correct, it doesn't absolve you from having to check your input. But it does mean that encoding tweaks only become part of your security handling and don't taint the concept of your "standard processing" path. – Gareth, Oct 14, 2012
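A minimal sketch of Gareth's point follows. It assumes a Python backend and assumes the form was submitted as UTF-8 (for example, because of the ✓ field). Under those assumptions the backend can use a single strict UTF-8 decoding path, and malformed bytes still surface as a rejection rather than being silently accepted. The helper name `parse_utf8_form` is hypothetical.

```python
from __future__ import annotations

from urllib.parse import parse_qs


def parse_utf8_form(body: str) -> dict[str, list[str]] | None:
    """Hypothetical single decoding path: parse a form body as UTF-8 and
    return None (so the caller can reject it) if any part is not valid UTF-8."""
    try:
        # errors="strict" makes invalid percent-encoded byte sequences raise
        return parse_qs(body, encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        return None


print(parse_utf8_form("utf8=%E2%9C%93&name=caf%C3%A9"))
# {'utf8': ['✓'], 'name': ['café']}
print(parse_utf8_form("name=caf%E9"))  # None: a lone Latin-1 byte is not valid UTF-8
```

Strict decoding only establishes that the bytes are well-formed UTF-8. The decoded values still need the application's usual input validation.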
Also see stackoverflow.com/questions/3222013/…. Apparently Ruby on Rails used to use a snowman character and switched to a checkmark, which was less ambiguous but less funny. – Jack V., Oct 17, 2012
@JohnLBevan It's ignored by the receiving end; it has done its job of forcing the browser to send things in UTF-8 instead of Latin-1. I've also seen it as ie=💩 (that's the "pile of poo" code point; it looks like it's not rendering in comments). – cabbey, Oct 18, 2012
@Gareth: Can you back up the statement that IE <= 8 forms do not support the document and/or form encoding? – hakre, Oct 22, 2012