The idn-* formats are problematic #1636

Open
@jdesrosiers

Description

I've been implementing the format assertions recently and realized that the idn-* formats are problematic. The specification behind idn-hostname is known as IDNA2008, and I've discovered that there are no JavaScript implementations of it. Normally, I wouldn't shy away from implementing it myself, but I've learned enough to know that any implementation would be prohibitively large (many times the size of my whole JSON Schema implementation). In most programming languages, including several hundred KB of Unicode tables wouldn't be an issue, but it is in JavaScript.

There's another specification called UTS #46, which is what all the browsers implement. It's a slightly lighter-weight version of IDNA2008. It does have JavaScript implementations, but even the best ones are still prohibitively large, incomplete, or both.

So, there isn't a reasonable path to supporting idn-hostname in JavaScript. That means there's no reasonable path for JavaScript implementations to support the format-assertion vocabulary, which states that "implementations MUST provide full validation support for all of the formats defined by this specification." As a result, an implementation that lacks full support for even one format is required to reject every schema that uses the format-assertion vocabulary, even if the schema never uses the unsupported format.

The next release fixes that problem by changing the requirement to only reject the schema if it encounters a format it doesn't fully support, but there's still no reasonable path for JavaScript implementations to support the three IDNA2008-based formats. That raises the question: should we change the requirements for the idn-* formats to make them less strict so every implementation can support them?
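To make the difference between the two requirements concrete, here is a minimal sketch. Every name in it (`Policy`, `supported`, `definedBySpec`, `mustRejectSchema`) is hypothetical and not taken from any real implementation, the format list is only an illustrative subset, and it assumes the schema has already opted in to the format-assertion vocabulary.

```typescript
// Hypothetical sketch; none of these names come from a real implementation.
type Policy = "current" | "next-release";

// Formats this imaginary JavaScript implementation fully validates.
const supported = new Set<string>(["date-time", "email", "uri", "iri"]);

// An illustrative subset of the formats the specification defines.
const definedBySpec: string[] = [
  "date-time", "email", "uri", "iri",
  "hostname", "idn-hostname", "idn-email",
];

// Decide whether a schema using the format-assertion vocabulary must be rejected.
// `formatsUsed` lists the format values that actually appear in the schema.
function mustRejectSchema(formatsUsed: string[], policy: Policy): boolean {
  if (policy === "current") {
    // Full support for every defined format is required,
    // regardless of which formats the schema uses.
    return definedBySpec.some((f) => !supported.has(f));
  }
  // Next release: reject only when the schema uses an unsupported format.
  return formatsUsed.some((f) => !supported.has(f));
}
```

Under the current wording, a schema that only uses date-time is still rejected by this imaginary implementation. Under the next release it is accepted, but any schema that uses idn-hostname is still rejected.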

I mentioned three affected formats. We've covered idn-hostname already. idn-email is affected because it requires the domain portion of the address to be valid according to IDNA2008. Surprisingly, iri isn't affected: it doesn't limit its hostnames to IDNA2008-compatible values. The third affected format is hostname. Draft-07 added that hostnames converted to ASCII form (called A-labels in IDNA2008) are valid hostnames alongside traditional hostnames. So, to fully validate hostname, implementations need to support IDNA2008.

Here are some proposals to consider. Not all of them are necessarily mutually compatible or mutually exclusive.

  1. Move all IDNA2008-dependent formats to the format registry. Although we encourage implementation of registry formats, it might be better to deprioritize formats that aren't universally supportable.
  2. Reduce the validation requirements for idn-hostname and idn-email. We could require the ABNF used for IRI hostnames as the minimum requirement.
  3. Revert hostname to its draft-06 definition, which didn't require checking that it's a valid IDNA2008 A-label.

I think we need to do (3) and either (1) or (2).
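For a sense of how small option (3) would be, here is a rough sketch of a traditional hostname check with no IDNA2008 tables. It assumes the usual letters-digits-hyphen label rule (with a leading digit allowed, as relaxed by RFC 1123), a 63-character label limit, and the commonly used 253-character overall limit. The function name `isTraditionalHostname` is hypothetical, and this is not the exact wording of any draft.

```typescript
// Hypothetical sketch of a draft-06-style hostname check (no IDNA2008 support).
// One label: 1 to 63 characters, letters/digits/hyphens,
// and no hyphen at the start or end.
const label = /^[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?$/;

function isTraditionalHostname(value: string): boolean {
  // Assumption: 253 characters as the overall length limit.
  if (value.length === 0 || value.length > 253) {
    return false;
  }
  // Every dot-separated label must match the label rule.
  return value.split(".").every((part) => label.test(part));
}

// Usage: a string starting with "xn--" is accepted on syntax alone.
// Whether it decodes to a valid IDNA2008 label is deliberately not checked.
const samples = ["example.com", "-bad.example", "xn--nxasmq6b.example"];
for (const s of samples) {
  console.log(s, isTraditionalHostname(s));
}
```

The point of the sketch is that the whole check fits in a few lines, whereas verifying that an `xn--` label is a valid A-label is what pulls in the Unicode tables.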
