-
-
Couldn't load subscription status.
- Fork 358
Describe syntax with regex. #1229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
* Change URIs to IRIs where appropriate
If we change this I think I'd prefer ABNF instead, as it's more common in specifications for describing syntax.
What purpose does this achieve? Do you think this paragraph is actually confusing or could be improved? Could you elaborate on your experience a bit?
Like @gregsdennis points out, if the rationale is to make the description machine-readable, ABNF would be preferred. While we technically could use the ECMA RegExp syntax (they describe the same thing), the ABNF is more accessible to the standards community at large; and this usage of the ECMA syntax is too ambiguous (it lacks the slashes that quote the RegExp, and it lacks start and end anchors that are implicit in ABNF, but not in RegExp... we would want to write an actual line of ECMAScript).
And the implementation guidance should be left in-line; explanations like this are critical for helping readers make connections between related concepts before they move on to the next section. In this case, readers will make the connection that the syntax is the same as in XML; even if they haven't read the XML spec, they're likely to have seen this pattern before.
This regex is not correct.
- it must be anchored with
^and$ - "any number" can be zero, so
+should be* - literal hyphen characters must be at the start or the end of the bracketed character class, so they are not interpreted as part of a range.
Therefore: ^[A-Za-z_][A-Za-z0-9._-]*$
I don't have preferences between abnf and regexp.
Here and elsewhere in this document, both are ok as long as it's not in prose :)
we would want to write an actual line of ECMAScript
+1 Since this spec doesn't use ABNF, I used ecma to avoid adding normative references.
About the reference to xml, I think it does not help that much since the regexp is easy.
In general I find this document very long: this results in readers just skipping this and just look for secondary sources :)
@ioggstream the point of the reference to XML is that the WC3 Best Practices for Fragment Identifiers and Media Type Definitions (which is also cited elsewhere in JSON Schema Core) references the XML NCName production as part of their best practices for plain name frag ids (see the large quote below for details).
There is more going on in this section than just how to represent the syntax (I agree with @gregsdennis and @awwright that ABNF is preferable). The point of this part of the spec is to tie JSON Schema's plain name fragments to the larger standards ecosystem of such things. That could be done more clearly and concisely, but it should not be removed.
Plain names are a common type of fragid structure. A plain name fragid is a fragid that is used to identify a named structure within a document, such as one identified by an @id attribute in HTML, a @xml:id attribute in XML or the name of a function within a Python program. These fragids are opaque to processors and as such they do not normally include punctuation characters, though this depends on the language: in XML, for example, they usually match the NCName production from XML Namespaces [XML-NAMES11] which means they can contain hyphens (-) and periods (.).
Plain name fragids are usually created by human authors but may also be generated by applications. They provide a good method of identifying content that is equivalent across content-negotiated variants of a document, for example paragraphs of text in French and Chinese that contain the same semantic content. Plain name fragids that do not identify a portion of a document are frequently used in Semantic Web applications as a way of providing an identifier for something described by the document.
Best Practice 3: Reserve Plain Name Fragids
If the media type includes structures that can be given local names or identifiers, plain name fragids should be reserved for addressing those structures.
... e WC3 Best Practices for Fragment Identifiers and Media Type Definitions (which is also cited elsewhere in JSON Schema Core) references the XML NCName
Ok, it wasn't clear to me when reading the section, and I think it should be clarified e.g. in a referenced Appendix (e.g. JSON Schema and XML Schema ...) so that the reader can better identify normative parts.
@ioggstream I don't think there's any need for an appendix here. All this section needs to do is:
xrefthe Best Practices document to explain why this syntax (XML's NCName) is used (not currently done in this section)xrefthe XML Namesaces spec for the normative NCName production (currently done, just not clear why or whether it's normative)- Reproduce the NCName ABNF in a simplified form (without the layers of other ABNF that are irrelevant for JSON Schema) for informative convenience (currently done in prose, and presented as normative)
- (not currently done in this section)
We do discuss it elsewhere though, in https://json-schema.org/draft/2020-12/json-schema-core.html#rfc.section.5.
just not clear why or whether it's normative
XML technically supports Unicode characters and for ease of getting the first draft out we opted not to support that. We can change that now.
Even if XML only specified the same ASCII set that we do, it might still be easier to explain "use this pattern ... which is selected to match other popular syntaxes including XML" rather than "refer to the XML definition."
We do discuss it elsewhere though
yes, that's why I said "in this section" 🙂
Even if XML only specified the same ASCII set that we do, it might still be easier to explain "use this pattern ... which is selected to match other popular syntaxes including XML" rather than "refer to the XML definition."
Again, the important reference is the W3C Best Practices document, specifically the section on plain name fragments. It just so happens that that spec references the XML Namespaces specification, so there is a connection there. But the motivation here is not "align with XML." It is "align with W3C Best Practices regarding fragments. (I think it would be fine to expand that to Unicode because it's 2022 FFS).
The reference to XML Namespace's "NCName" production only makes sense in the context of "We are following the WC3 Best Practices for plain name fragments, which suggest using the NCName production from the XML namespace spec." (tidied up for clarity regarding what is normative, of course).
The reason to reference the XML Fragment spec is to make it clear what we are trying to do, so that if the ABNF in our spec does not match, someone will be able to tell that that was an unintentional error and not an intentional deviation.
And given the number of times various people have messed with the plain name syntax description, keeping that external reference (both of them, really) as a correctness check feels valuable to me.
The draft-next branch has been merged and is now closed. The merge target for this PR has been changed to main. Here are the recommended steps to get your branch reabsed properly.
- Make sure your remote for the
json-schema-org/json-schema-specrepo is up-to-date. (Example:git fetch upstream). - Rebase your commits onto
main. (Example:git rebase --onto upstream/main abcd123~1(replaceabcd123with the commit hash of the first commit in your PR)). - Force push the rebased branch to your fork. (Example:
git push --force origin my-branch).
I think for this particular change, I would prefer leaving the text but using a pattern (which uses regex) in the meta-schema. If the syntax really must be specified in some common format, I'd still prefer ABNF over regex, but I think it's overkill.
@ioggstream this PR doesn't seem to have a consensus around what is needed, and the change in branching has made it difficult to review. If you would still like something done here, could you please file an issue summarizing the options that have come up in this discussion so that we can debate it properly?
Since no one has spoken up in favor of a regex in the text, I'm going to go ahead and close this. If, after discussion in an issue, we agree that we should use a regex for this, it can be re-submitted.
I suggest
Note
The spec already references ECMA regexp