I've been implementing the format assertions recently, and I realized that the `idn-*` formats are problematic. The specification used for `idn-hostname` is known as IDNA2008. I've discovered that there are no JavaScript implementations of it. Normally, I wouldn't shy away from implementing it myself, but I've learned enough to know that any implementation would be prohibitively large (many times the size of my whole JSON Schema implementation). In most programming languages, including several hundred KB of Unicode tables wouldn't be an issue, but it is in JavaScript.
There's another specification called UTS #46, and that's what all the browsers implement. It's a slightly lighter-weight version of IDNA2008. It does have JavaScript implementations, but even the best ones are still prohibitively large, incomplete, or both.
So, there isn't a reasonable path to supporting `idn-hostname` in JavaScript. That means there's no reasonable path for JavaScript implementations to support the format-assertion vocabulary, which states that "implementations MUST provide full validation support for all of the formats defined by this specification." As a result, implementations are required to reject every schema that uses the format-assertion vocabulary, even if the schema never uses the format they don't support.
The next release fixes that problem by changing the requirement so that a schema only has to be rejected if the implementation encounters a format it doesn't fully support. Even so, there's still no reasonable path for JavaScript implementations to support the three IDNA2008-based formats. That raises the question: should we relax the requirements for the `idn-*` formats so that every implementation can support them?
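To make the difference between the current rule and the next release's rule concrete, here is a minimal TypeScript sketch. Everything in it is hypothetical: `SPEC_FORMATS` is an assumed list of the formats defined by the 2020-12 validation specification, `SUPPORTED` stands for an implementation that can't do IDNA2008, and the function names are made up for illustration. The schema walk is deliberately naive.

```typescript
// Assumed list of the formats defined by the validation specification (2020-12).
const SPEC_FORMATS: readonly string[] = [
  "date-time", "date", "time", "duration",
  "email", "idn-email", "hostname", "idn-hostname",
  "ipv4", "ipv6", "uri", "uri-reference", "iri", "iri-reference",
  "uuid", "uri-template", "json-pointer", "relative-json-pointer", "regex",
];

// Hypothetical implementation that can't validate the IDNA2008-based formats.
const IDNA2008_FORMATS = new Set(["hostname", "idn-hostname", "idn-email"]);
const SUPPORTED: ReadonlySet<string> = new Set(
  SPEC_FORMATS.filter((f) => !IDNA2008_FORMATS.has(f)),
);

// Collect every string value of a "format" keyword in a schema.
// Naive: objects inside "const", "enum", or "examples" that happen to have a
// string "format" key are picked up too.
function collectFormats(node: unknown, found: Set<string> = new Set()): Set<string> {
  if (Array.isArray(node)) {
    for (const item of node) collectFormats(item, found);
  } else if (node !== null && typeof node === "object") {
    const obj = node as Record<string, unknown>;
    for (const key of Object.keys(obj)) {
      const value = obj[key];
      if (key === "format" && typeof value === "string") {
        found.add(value);
      } else {
        // Also descends into a property *named* "format" under "properties".
        collectFormats(value, found);
      }
    }
  }
  return found;
}

// Current wording: every specified format must be fully supported, so the
// answer is the same no matter what the schema contains.
function acceptsUnderCurrentRule(_schema: unknown, supported: ReadonlySet<string>): boolean {
  return SPEC_FORMATS.every((f) => supported.has(f));
}

// Next release: reject only if the schema uses a format that isn't fully supported.
function acceptsUnderRevisedRule(schema: unknown, supported: ReadonlySet<string>): boolean {
  return Array.from(collectFormats(schema)).every((f) => supported.has(f));
}

const dateSchema = { type: "object", properties: { when: { type: "string", format: "date" } } };
const hostSchema = { type: "object", properties: { host: { type: "string", format: "idn-hostname" } } };

console.log(acceptsUnderCurrentRule(dateSchema, SUPPORTED)); // false
console.log(acceptsUnderRevisedRule(dateSchema, SUPPORTED)); // true
console.log(acceptsUnderRevisedRule(hostSchema, SUPPORTED)); // false
```

The last line is the remaining problem: even under the revised rule, any schema that actually uses `idn-hostname` still can't be accepted by such an implementation.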
I said there were three formats that were affected. We've already discussed `idn-hostname`. `idn-email` is affected because it specifies that the domain portion of the address must be valid according to IDNA2008. Surprisingly, `iri` isn't affected, because it doesn't limit its hostnames to IDNA2008-compatible values. The third affected format is `hostname`. Draft-07 added that hostnames converted to ASCII form (called A-labels in IDNA2008) are valid `hostname`s, in addition to traditional hostnames. So, to fully validate `hostname`, implementations need to support IDNA2008.
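To illustrate why the draft-07 change matters, here is a minimal TypeScript sketch of a purely syntactic hostname check, loosely following RFC 1034 section 3.1 with RFC 1123's relaxation that a label may start with a digit. A check like this is cheap. It also accepts labels beginning with `xn--`, but it can't tell whether they are genuine A-labels: that would require decoding the Punycode and checking the result against IDNA2008's rules, which is the part that needs the large Unicode tables. The function names are hypothetical.

```typescript
// One LDH label: 1-63 letters, digits, or hyphens, not starting or ending with a hyphen.
const LDH_LABEL = /^[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?$/;

// Syntactic check only; a trailing root dot ("example.com.") is rejected here.
function isLdhHostname(host: string): boolean {
  if (host.length === 0 || host.length > 253) return false;
  return host.split(".").every((label) => LDH_LABEL.test(label));
}

// Labels with the "xn--" prefix claim to be A-labels. Confirming that they
// really are requires Punycode decoding plus IDNA2008's rules, which this
// sketch intentionally does not attempt.
function aLabelCandidates(host: string): string[] {
  return host.split(".").filter((label) => label.toLowerCase().startsWith("xn--"));
}

console.log(isLdhHostname("example.com"));                         // true
console.log(isLdhHostname("-example.com"));                        // false
console.log(isLdhHostname("xn--bcher-kva.example"));               // true
console.log(aLabelCandidates("xn--bcher-kva.example").join(", ")); // xn--bcher-kva
```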
Here are some proposals to consider. Not all of them are necessarily mutually compatible or mutually exclusive.
1. Move all IDNA2008-dependent formats to the format registry. Although we encourage implementation of registry formats, it might be better to deprioritize formats that aren't universally supportable.
2. Reduce the validation requirements for `idn-hostname` and `idn-email`. We could require the ABNF used for IRI hostnames as the minimum requirement.
3. Revert `hostname` to its draft-06 definition, which didn't require checking that it's a valid IDNA2008 A-label.
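A minimum requirement based on the IRI hostname ABNF would be easy to meet in any language. As a rough TypeScript sketch, assuming the relevant ABNF is RFC 3987's `ireg-name` rule (reproduced in the comments), a minimum check for `idn-hostname` could be a single Unicode-aware regular expression, and `idn-email` could apply the same check to the text after the last `@`. The function names are hypothetical, the non-empty requirement is an added assumption (the ABNF itself allows an empty `ireg-name`), and the local-part check is only a placeholder.

```typescript
// RFC 3987:
//   ireg-name   = *( iunreserved / pct-encoded / sub-delims )
//   iunreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" / ucschar
//   sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
//   ucschar     = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
//               / %x10000-1FFFD / ... / %xD0000-DFFFD / %xE1000-EFFFD

// Format a code point as a \u{...} regex escape.
function hex(cp: number): string {
  return `\\u{${cp.toString(16)}}`;
}

// Build the ucschar ranges as a regex character-class fragment.
const ucsRanges: Array<[number, number]> = [
  [0xa0, 0xd7ff],
  [0xf900, 0xfdcf],
  [0xfdf0, 0xffef],
];
for (let plane = 1; plane <= 0xd; plane++) {
  ucsRanges.push([plane * 0x10000, plane * 0x10000 + 0xfffd]);
}
ucsRanges.push([0xe1000, 0xefffd]);
const UCSCHAR = ucsRanges.map(([lo, hi]) => `${hex(lo)}-${hex(hi)}`).join("");

// Leading "-" in the class is a literal hyphen; the "u" flag enables \u{...} escapes.
const IREG_NAME = new RegExp(
  `^(?:[-A-Za-z0-9._~!$&'()*+,;=${UCSCHAR}]|%[0-9A-Fa-f]{2})+$`,
  "u",
);

// Hypothetical minimum check for idn-hostname (non-empty is an extra assumption).
function isIregName(host: string): boolean {
  return IREG_NAME.test(host);
}

// Hypothetical minimum check for idn-email: placeholder local-part check,
// domain checked with the same ireg-name rule.
function isRelaxedIdnEmail(address: string): boolean {
  const at = address.lastIndexOf("@");
  if (at <= 0) return false; // no "@" or empty local part
  return isIregName(address.slice(at + 1));
}

console.log(isIregName("b\u00fccher.example"));              // true
console.log(isIregName("b\u00fccher example"));              // false (space)
console.log(isRelaxedIdnEmail("user@b\u00fccher.example"));  // true
```

This is a floor rather than full validation: it accepts every well-formed internationalized hostname, but also strings IDNA2008 would reject, such as `~~~.example`.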
I think we need to do (3) and either (1) or (2).