Search code, repositories, users, issues, pull requests...

Copy link

Contributor

@ioggstream ioggstream commented May 16, 2022

I suggest

describing syntax using regexp
remove non-actionable informative content since it distracts the reader from the normative part (it can be added to a FAQ section)

Note

The spec already references ECMA regexp

jdesrosiers and others added 10 commits

June 4, 2021 10:58


 Allow contains to apply to objects as well as arrays

f5f6dec


 Move contains to "other" applicator section

408ec65


 Clarify that "length" applies to objects as well as arrays

40669d8


 Add change log for bhutton-next

b5c4108


 Add to change log "contains" applies to objects

1cec0c2


 Document new branching process ( json-schema-org#1115 )

1ac6282

@Relequestual


 Add Code of Conduct badge

5f783b8


 Add "contains" to keywords that effect "unevaluatedProperties"

13a58c7

@jakubklimek


 Support for IRI references ( json-schema-org#1137 )

685d84e

* Change URIs to IRIs where appropriate


 Describe syntax with regex.

2337ccf

@gregsdennis

Copy link

Member

gregsdennis commented May 16, 2022

If we change this I think I'd prefer ABNF instead, as it's more common in specifications for describing syntax.

@awwright

Copy link

Member

awwright commented May 16, 2022

What purpose does this achieve? Do you think this paragraph is actually confusing or could be improved? Could you elaborate on your experience a bit?

Like @gregsdennis points out, if the rationale is to make the description machine-readable, ABNF would be preferred. While we technically could use the ECMA RegExp syntax (they describe the same thing), the ABNF is more accessible to the standards community at large; and this usage of the ECMA syntax is too ambiguous (it lacks the slashes that quote the RegExp, and it lacks start and end anchors that are implicit in ABNF, but not in RegExp... we would want to write an actual line of ECMAScript).

And the implementation guidance should be left in-line; explanations like this are critical for helping readers make connections between related concepts before they move on to the next section. In this case, readers will make the connection that the syntax is the same as in XML; even if they haven't read the XML spec, they're likely to have seen this pattern before.

@karenetheridge

Copy link

Member

karenetheridge commented May 16, 2022 •

edited

Loading

This regex is not correct.

it must be anchored with ^ and $
"any number" can be zero, so + should be *
literal hyphen characters must be at the start or the end of the bracketed character class, so they are not interpreted as part of a range.

Therefore: ^[A-Za-z_][A-Za-z0-9._-]*$


 fix suggested by Karen

19e0938

Copy link

Contributor Author

ioggstream commented May 17, 2022

I don't have preferences between abnf and regexp.
Here and elsewhere in this document, both are ok as long as it's not in prose :)

we would want to write an actual line of ECMAScript

+1 Since this spec doesn't use ABNF, I used ecma to avoid adding normative references.

About the reference to xml, I think it does not help that much since the regexp is easy.

In general I find this document very long: this results in readers just skipping this and just look for secondary sources :)

Copy link

Contributor

handrews commented May 30, 2022

@ioggstream the point of the reference to XML is that the WC3 Best Practices for Fragment Identifiers and Media Type Definitions (which is also cited elsewhere in JSON Schema Core) references the XML NCName production as part of their best practices for plain name frag ids (see the large quote below for details).

There is more going on in this section than just how to represent the syntax (I agree with @gregsdennis and @awwright that ABNF is preferable). The point of this part of the spec is to tie JSON Schema's plain name fragments to the larger standards ecosystem of such things. That could be done more clearly and concisely, but it should not be removed.

Plain names are a common type of fragid structure. A plain name fragid is a fragid that is used to identify a named structure within a document, such as one identified by an @id attribute in HTML, a @xml:id attribute in XML or the name of a function within a Python program. These fragids are opaque to processors and as such they do not normally include punctuation characters, though this depends on the language: in XML, for example, they usually match the NCName production from XML Namespaces [XML-NAMES11] which means they can contain hyphens (-) and periods (.).

Plain name fragids are usually created by human authors but may also be generated by applications. They provide a good method of identifying content that is equivalent across content-negotiated variants of a document, for example paragraphs of text in French and Chinese that contain the same semantic content. Plain name fragids that do not identify a portion of a document are frequently used in Semantic Web applications as a way of providing an identifier for something described by the document.

Best Practice 3: Reserve Plain Name Fragids

If the media type includes structures that can be given local names or identifiers, plain name fragids should be reserved for addressing those structures.

Copy link

Contributor Author

ioggstream commented Jun 1, 2022

... e WC3 Best Practices for Fragment Identifiers and Media Type Definitions (which is also cited elsewhere in JSON Schema Core) references the XML NCName

Ok, it wasn't clear to me when reading the section, and I think it should be clarified e.g. in a referenced Appendix (e.g. JSON Schema and XML Schema ...) so that the reader can better identify normative parts.

Copy link

Contributor

handrews commented Jun 1, 2022 •

edited

Loading

@ioggstream I don't think there's any need for an appendix here. All this section needs to do is:

xref the Best Practices document to explain why this syntax (XML's NCName) is used (not currently done in this section)
xref the XML Namesaces spec for the normative NCName production (currently done, just not clear why or whether it's normative)
Reproduce the NCName ABNF in a simplified form (without the layers of other ABNF that are irrelevant for JSON Schema) for informative convenience (currently done in prose, and presented as normative)

@awwright

Copy link

Member

awwright commented Jun 2, 2022

(not currently done in this section)

We do discuss it elsewhere though, in https://json-schema.org/draft/2020-12/json-schema-core.html#rfc.section.5.

just not clear why or whether it's normative

XML technically supports Unicode characters and for ease of getting the first draft out we opted not to support that. We can change that now.

Even if XML only specified the same ASCII set that we do, it might still be easier to explain "use this pattern ... which is selected to match other popular syntaxes including XML" rather than "refer to the XML definition."

Copy link

Contributor

handrews commented Jun 2, 2022

@awwright

We do discuss it elsewhere though

yes, that's why I said "in this section" 🙂

Even if XML only specified the same ASCII set that we do, it might still be easier to explain "use this pattern ... which is selected to match other popular syntaxes including XML" rather than "refer to the XML definition."

Again, the important reference is the W3C Best Practices document, specifically the section on plain name fragments. It just so happens that that spec references the XML Namespaces specification, so there is a connection there. But the motivation here is not "align with XML." It is "align with W3C Best Practices regarding fragments. (I think it would be fine to expand that to Unicode because it's 2022 FFS).

The reference to XML Namespace's "NCName" production only makes sense in the context of "We are following the WC3 Best Practices for plain name fragments, which suggest using the NCName production from the XML namespace spec." (tidied up for clarity regarding what is normative, of course).

The reason to reference the XML Fragment spec is to make it clear what we are trying to do, so that if the ABNF in our spec does not match, someone will be able to tell that that was an unintentional error and not an intentional deviation.

Copy link

Contributor

handrews commented Jun 2, 2022

And given the number of times various people have messed with the plain name syntax description, keeping that external reference (both of them, really) as a correctness check feels valuable to me.

@jdesrosiers jdesrosiers changed the base branch from draft-next to main

July 8, 2022 15:29

Copy link

Member

jdesrosiers commented Jul 8, 2022

The draft-next branch has been merged and is now closed. The merge target for this PR has been changed to main. Here are the recommended steps to get your branch reabsed properly.

Make sure your remote for the json-schema-org/json-schema-spec repo is up-to-date. (Example: git fetch upstream).
Rebase your commits onto main. (Example: git rebase --onto upstream/main abcd123~1 (replace abcd123 with the commit hash of the first commit in your PR)).
Force push the rebased branch to your fork. (Example: git push --force origin my-branch).

@gregsdennis

Copy link

Member

gregsdennis commented Aug 1, 2022

I think for this particular change, I would prefer leaving the text but using a pattern (which uses regex) in the meta-schema. If the syntax really must be specified in some common format, I'd still prefer ABNF over regex, but I think it's overkill.