
I have recently seen a few URIs containing the query parameter "utf8=✓". My first impression (after thinking "mmm, looks cool") was that this could be used to detect a broken character encoding.

So, is this a better way to resolve potential problems with character encoding, or is it just a developer having fun with a hack?

asked Oct 13, 2012 at 11:57
  • I disagree. There are schemes out there that look like URNs and that take query parameters, such as Bitcoin. URIs are not confined to browsers. See en.wikipedia.org/wiki/URI_scheme. This question may also address the general case where character encoding is required when a browser accesses a protocol handler. Commented Oct 19, 2012 at 8:29
  • Give examples of these URLs, or it didn't happen. Commented Oct 22, 2012 at 12:59
  • Off topic, but OK. Here's my personal donation Bitcoin URI: bitcoin:1KzTSfqjF2iKCduwz59nv2uqh1W2JsTxZH?amount=0.5&label=Agile%20Stack. Notice that the scheme is essentially a URN with query parameters, but it hands off to a protocol handler. This kind of URI could probably benefit from the "utf8=✓" workaround as well. Commented Oct 22, 2012 at 17:47
  • @GaryRowe So did you ever get any donations off that link? Commented Sep 18, 2018 at 10:30
  • @Gary I can't imagine possibly being a millionaire because of an off-hand comment on Stack Exchange 12 years ago. You're insanely lucky. Commented Jan 16 at 8:10

1 Answer


By default, older versions of IE (<=8) will submit form data in Latin-1 encoding if possible. By including a character that can't be expressed in Latin-1, IE is forced to use UTF-8 encoding for its form submissions, which simplifies various backend processes, for example database persistence.
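A minimal Python sketch of the effect described above, assuming a simplified model of the legacy rule (send Latin-1 if every field fits, otherwise fall back to UTF-8). The function names are hypothetical and this is not IE's actual implementation; it only illustrates why adding ✓ changes the bytes that reach the server.

```python
from urllib.parse import urlencode


def legacy_form_encoding(fields: dict) -> str:
    """Hypothetical model of the IE <= 8 rule: Latin-1 if every name and
    value can be represented in it, otherwise UTF-8."""
    try:
        for name, value in fields.items():
            name.encode("latin-1")
            value.encode("latin-1")
    except UnicodeEncodeError:
        return "utf-8"
    return "latin-1"


def encode_form(fields: dict) -> tuple:
    """Percent-encode the form using the encoding the model picks."""
    enc = legacy_form_encoding(fields)
    return enc, urlencode(fields, encoding=enc)


print(encode_form({"q": "café"}))               # ('latin-1', 'q=caf%E9')
print(encode_form({"utf8": "✓", "q": "café"}))  # ('utf-8', 'utf8=%E2%9C%93&q=caf%C3%A9')
```

With the marker present, the é in the same form is sent as %C3%A9 rather than %E9, so the backend only ever has to handle UTF-8.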

If the parameter were instead utf8=true, this wouldn't trigger UTF-8 encoding in these browsers, because "true" can be expressed in Latin-1.
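The same reasoning shows why utf8=true would not help, and how a server could, as the question suggests, use the marker to spot a broken encoding. A minimal sketch, assuming the server looks at a utf8 field and decodes the query with an encoding of its choosing; `fits_latin1` and `marker_ok` are hypothetical helpers, not part of any framework.

```python
from urllib.parse import parse_qs

CHECKMARK = "\u2713"  # ✓, which has no Latin-1 representation


def fits_latin1(text: str) -> bool:
    """True if text could be sent as Latin-1, i.e. it gives the browser
    no reason to switch to UTF-8."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


def marker_ok(query: str, server_encoding: str = "utf-8") -> bool:
    """Decode the query string with the encoding the server assumes and
    check that the hypothetical utf8=✓ marker arrived intact."""
    fields = parse_qs(query, encoding=server_encoding, errors="replace")
    return fields.get("utf8") == [CHECKMARK]


print(fits_latin1("true"), fits_latin1(CHECKMARK))  # True False
print(marker_ok("utf8=%E2%9C%93&q=caf%C3%A9"))      # True
print(marker_ok("utf8=%E2%9C%93", "latin-1"))       # False: decoded as mojibake
print(marker_ok("utf8=true"))                       # False: says nothing about encoding
```

Because "true" survives any Latin-compatible decoding unchanged, it can neither force UTF-8 on the client nor reveal a mismatch on the server; the checkmark does both.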

ChrisF
answered Oct 13, 2012 at 12:45
  • @LarsViklund I should have been clearer with my comment. I meant that the validation associated with character encoding is simplified, not bypassed. Commented Oct 13, 2012 at 13:48
  • @Lars Correct, it doesn't absolve you from having to check your input. But it does mean that encoding tweaks only become part of your security handling and don't taint the concept of your "standard processing" path. Commented Oct 14, 2012 at 10:08
  • Also see stackoverflow.com/questions/3222013/…. Apparently Ruby on Rails used to use a snowman character, and was changed to a checkmark, which was less ambiguous but less funny. Commented Oct 17, 2012 at 10:06
  • @JohnLBevan It's ignored by the receiving end; it has done its job of forcing the browser to send things in UTF-8 instead of Latin-1. I've also seen it as ie=💩 (that's the 'pile of poo' code point; looks like it's not rendering in comments). Commented Oct 18, 2012 at 19:54
  • @Gareth: Can you back up the statement that IE <= 8 forms do not support the document and/or form encoding? Commented Oct 22, 2012 at 13:00
