Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Describe syntax with regex. #1229

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ioggstream wants to merge 11 commits into json-schema-org:main from ioggstream:ioggstream-regex

Conversation

@ioggstream
Copy link
Contributor

@ioggstream ioggstream commented May 16, 2022

I suggest

  • describing syntax using regexp
  • remove non-actionable informative content since it distracts the reader from the normative part (it can be added to a FAQ section)

Note

The spec already references ECMA regexp

Copy link
Member

If we change this I think I'd prefer ABNF instead, as it's more common in specifications for describing syntax.

karenetheridge reacted with thumbs up emoji

Copy link
Member

What purpose does this achieve? Do you think this paragraph is actually confusing or could be improved? Could you elaborate on your experience a bit?

Like @gregsdennis points out, if the rationale is to make the description machine-readable, ABNF would be preferred. While we technically could use the ECMA RegExp syntax (they describe the same thing), the ABNF is more accessible to the standards community at large; and this usage of the ECMA syntax is too ambiguous (it lacks the slashes that quote the RegExp, and it lacks start and end anchors that are implicit in ABNF, but not in RegExp... we would want to write an actual line of ECMAScript).

And the implementation guidance should be left in-line; explanations like this are critical for helping readers make connections between related concepts before they move on to the next section. In this case, readers will make the connection that the syntax is the same as in XML; even if they haven't read the XML spec, they're likely to have seen this pattern before.

karenetheridge reacted with thumbs up emoji

Copy link
Member

karenetheridge commented May 16, 2022
edited
Loading

This regex is not correct.

  • it must be anchored with ^ and $
  • "any number" can be zero, so + should be *
  • literal hyphen characters must be at the start or the end of the bracketed character class, so they are not interpreted as part of a range.

Therefore: ^[A-Za-z_][A-Za-z0-9._-]*$

ioggstream reacted with thumbs up emoji

Copy link
Contributor Author

I don't have preferences between abnf and regexp.
Here and elsewhere in this document, both are ok as long as it's not in prose :)

we would want to write an actual line of ECMAScript

+1 Since this spec doesn't use ABNF, I used ecma to avoid adding normative references.

About the reference to xml, I think it does not help that much since the regexp is easy.

In general I find this document very long: this results in readers just skipping this and just look for secondary sources :)

Copy link
Contributor

@ioggstream the point of the reference to XML is that the WC3 Best Practices for Fragment Identifiers and Media Type Definitions (which is also cited elsewhere in JSON Schema Core) references the XML NCName production as part of their best practices for plain name frag ids (see the large quote below for details).

There is more going on in this section than just how to represent the syntax (I agree with @gregsdennis and @awwright that ABNF is preferable). The point of this part of the spec is to tie JSON Schema's plain name fragments to the larger standards ecosystem of such things. That could be done more clearly and concisely, but it should not be removed.

Plain names are a common type of fragid structure. A plain name fragid is a fragid that is used to identify a named structure within a document, such as one identified by an @id attribute in HTML, a @xml:id attribute in XML or the name of a function within a Python program. These fragids are opaque to processors and as such they do not normally include punctuation characters, though this depends on the language: in XML, for example, they usually match the NCName production from XML Namespaces [XML-NAMES11] which means they can contain hyphens (-) and periods (.).

Plain name fragids are usually created by human authors but may also be generated by applications. They provide a good method of identifying content that is equivalent across content-negotiated variants of a document, for example paragraphs of text in French and Chinese that contain the same semantic content. Plain name fragids that do not identify a portion of a document are frequently used in Semantic Web applications as a way of providing an identifier for something described by the document.

Best Practice 3: Reserve Plain Name Fragids

If the media type includes structures that can be given local names or identifiers, plain name fragids should be reserved for addressing those structures.

awwright and jdesrosiers reacted with thumbs up emoji

Copy link
Contributor Author

... e WC3 Best Practices for Fragment Identifiers and Media Type Definitions (which is also cited elsewhere in JSON Schema Core) references the XML NCName

Ok, it wasn't clear to me when reading the section, and I think it should be clarified e.g. in a referenced Appendix (e.g. JSON Schema and XML Schema ...) so that the reader can better identify normative parts.

Copy link
Contributor

handrews commented Jun 1, 2022
edited
Loading

@ioggstream I don't think there's any need for an appendix here. All this section needs to do is:

  1. xref the Best Practices document to explain why this syntax (XML's NCName) is used (not currently done in this section)
  2. xref the XML Namesaces spec for the normative NCName production (currently done, just not clear why or whether it's normative)
  3. Reproduce the NCName ABNF in a simplified form (without the layers of other ABNF that are irrelevant for JSON Schema) for informative convenience (currently done in prose, and presented as normative)
ioggstream, jdesrosiers, and awwright reacted with thumbs up emoji

Copy link
Member

awwright commented Jun 2, 2022

  1. (not currently done in this section)

We do discuss it elsewhere though, in https://json-schema.org/draft/2020-12/json-schema-core.html#rfc.section.5.

just not clear why or whether it's normative

XML technically supports Unicode characters and for ease of getting the first draft out we opted not to support that. We can change that now.

Even if XML only specified the same ASCII set that we do, it might still be easier to explain "use this pattern ... which is selected to match other popular syntaxes including XML" rather than "refer to the XML definition."

Copy link
Contributor

handrews commented Jun 2, 2022

@awwright

We do discuss it elsewhere though

yes, that's why I said "in this section" 🙂

Even if XML only specified the same ASCII set that we do, it might still be easier to explain "use this pattern ... which is selected to match other popular syntaxes including XML" rather than "refer to the XML definition."

Again, the important reference is the W3C Best Practices document, specifically the section on plain name fragments. It just so happens that that spec references the XML Namespaces specification, so there is a connection there. But the motivation here is not "align with XML." It is "align with W3C Best Practices regarding fragments. (I think it would be fine to expand that to Unicode because it's 2022 FFS).

The reference to XML Namespace's "NCName" production only makes sense in the context of "We are following the WC3 Best Practices for plain name fragments, which suggest using the NCName production from the XML namespace spec." (tidied up for clarity regarding what is normative, of course).

The reason to reference the XML Fragment spec is to make it clear what we are trying to do, so that if the ABNF in our spec does not match, someone will be able to tell that that was an unintentional error and not an intentional deviation.

Copy link
Contributor

handrews commented Jun 2, 2022

And given the number of times various people have messed with the plain name syntax description, keeping that external reference (both of them, really) as a correctness check feels valuable to me.

awwright reacted with thumbs up emoji

@jdesrosiers jdesrosiers changed the base branch from draft-next to main July 8, 2022 15:29
Copy link
Member

The draft-next branch has been merged and is now closed. The merge target for this PR has been changed to main. Here are the recommended steps to get your branch reabsed properly.

  1. Make sure your remote for the json-schema-org/json-schema-spec repo is up-to-date. (Example: git fetch upstream).
  2. Rebase your commits onto main. (Example: git rebase --onto upstream/main abcd123~1 (replace abcd123 with the commit hash of the first commit in your PR)).
  3. Force push the rebased branch to your fork. (Example: git push --force origin my-branch).

Copy link
Member

I think for this particular change, I would prefer leaving the text but using a pattern (which uses regex) in the meta-schema. If the syntax really must be specified in some common format, I'd still prefer ABNF over regex, but I think it's overkill.

Copy link
Contributor

@ioggstream this PR doesn't seem to have a consensus around what is needed, and the change in branching has made it difficult to review. If you would still like something done here, could you please file an issue summarizing the options that have come up in this discussion so that we can debate it properly?

Since no one has spoken up in favor of a regex in the text, I'm going to go ahead and close this. If, after discussion in an issue, we agree that we should use a regex for this, it can be re-submitted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

No reviews

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

AltStyle によって変換されたページ (->オリジナル) /