$schema can change across embedded resources #914
jsonschema-core.xml
the wording of "subschemas" is a bit confusing to me. if I can attempt to clarify with an example
$schema: draftN
$id: root
items:
  $schema: draftM
  $id: items
items is a root schema object because it's got an $id. but is items no longer a "subschema" of the root? I feel like saying it's not a subschema isn't consistent with how the term subschema is used in the rest of the spec.
Maybe "non-root schemas"? Or just "other schemas"?
(IMO) "subschema" only has context when in reference to another schema. the schema at id "items" is a subschema of the schema at id "root". The schema at id "root" is not a subschema of anything (it has no parent schema).
Historically, we've used "subschema" to indicate containment but not reference. Referenced schemas have not been historically labelled "subschemas."
Maybe it comes down to how we define a "schema document." Is that the specific file, and that's it? Or is it the file, and all of its external references? (This pertains to the change below as well.)
Agreed. Some implementations provide an interface to extract these - either as multiple documents or "bundled" together in one (potentially renaming conflicting $refs if needed). In one of my web apps I do this in a GET /json_schema/:schema_name endpoint.
@karenetheridge yeah, the bundling use case was ultimately what we decided to use when figuring out what to do with $id (splitting the $anchor case out and cutting a bunch of nonsensical but syntactically legal values). And that led to the idea of $id as identifying resources as opposed to just random otherwise unremarkable schema objects.
it might be useful to have a name for that "document plus all external references, transitively" concept
I would call this a "transcluded de-referenced bundle".
- Transclusion is what is done to the schemas
- De-referenced is the result of the process
- Bundle is the end product descriptor
@Relequestual I'll think on this. Does it need to go in now or can we file an issue for this terminology? If we adopt it (and I'm cautiously supportive), it should probably go in everywhere and I'd rather not add all of that in this PR.
OK I've gone back over all of this. I am about to push a commit to address @notEthan's original question about the usage of "subschema" (I agree it is unclear).
For the transcluded de-referenced bundle thing, I have filed issue #935. Note that the discussion involved a lot more than the bundle use case, so it really needs to be discussed separately from this PR. We can add more terminology later, assuming my most recent commit addresses the actual confusion in the PR text.
this seems to contradict the resolution of the schemas which describe a schema, as defined by the metaschema.
oof, this is hard to pick all the right words to describe properly. I'll try not to get it too wrong.
referring to this schema with two specifications (draftN, draftM) describing bits of it:
$id: bar
$schema: draftN
$defs:
  foo:
    $id: foo
    $schema: draftM
when I refer to the object at #/$defs/foo - before I know what it is or that it's even a schema - I start at the root (#), described by the draftN metaschema. in there I see that #/properties/$defs/additionalProperties is a reference resolving to draftN itself, so the schema #/$defs/foo is an instance of the draftN metaschema. but I have a $schema saying it is draftM. the metaschema has been made incorrect.
maybe the metaschema needs to change in some manner to allow either a schema which instantiates the metaschema itself, or a thing with a $schema which does not instantiate the metaschema.
$id: draftN
properties:
  $defs:
    additionalProperties:
      oneOf:
        - $recursiveRef: '#'
        - required: ['$schema']
that's not perfect; it doesn't actually say what kind of thing the object with a $schema is. but I think it's at least an improvement in that it's not giving an incorrect reference to a schema (the draftN metaschema) which does not apply to the instance (the draftM schema).
and you can't do a $ref to the $schema since that's in the instance (the schema instantiating the metaschema), and schemas (or metaschemas) can't reference instance data.
except, of course, the keyword $schema is only defined for a schema, so if the subschema takes the second oneOf option in my weird modification above (where it has a $schema but isn't an instance of the metaschema), it's not recognized as a schema at all and $schema has no meaning. I think I must retract that idea ...
in order for the implementation to recognize that subschema #/$defs/foo is an instance of the draftM metaschema, it first has to be recognized as an instance of the draftN metaschema, and then change to no longer be that.
@notEthan What you learn about #/$defs/foo from the draftN schema is that it is a schema object.
ALL meta-schemas, even if they don't explicitly list it, ALWAYS include the core vocabulary. This is in part because the core vocabulary is the bootstrapping vocabulary.
Processing a schema should always start with a check for $id, and if present, next a check for $schema. That is under the rules of the core vocabulary so it is correct regardless of the meta-schema.
$schema basically executes a "switch rules" and evaluation continues.
If that doesn't help, think of it this way: We defined $id-containing subschemas to be a schema resource, just as if it were a separate document. When it is a separate document, you already had to check for a $schema after resolving the reference.
If we're supporting changing meta-schemas while following a $ref in the middle of processing, we can do it when the schema is inline. There's no real difference.
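To make the "check for $id, then check for $schema, then switch rules" description concrete, here is a minimal Python sketch of how a walker could track which dialect applies at each location. It is only an illustration: the function names and the keyword lists are my own assumptions, the keyword lists are deliberately incomplete, and JSON Pointer escaping is ignored.

```python
from typing import Any, Dict, Iterator, Optional, Tuple

# Assumption: a deliberately incomplete list of keywords that hold subschemas.
SCHEMA_MAPS = ("$defs", "definitions", "properties", "patternProperties")
SCHEMA_LISTS = ("allOf", "anyOf", "oneOf")
SCHEMA_VALUES = ("items", "not", "if", "then", "else", "additionalProperties")


def children(schema: Dict[str, Any]) -> Iterator[Tuple[str, Any]]:
    """Yield (relative pointer, subschema) for the direct subschemas."""
    for keyword in SCHEMA_MAPS:
        for name, sub in schema.get(keyword, {}).items():
            yield f"/{keyword}/{name}", sub
    for keyword in SCHEMA_LISTS:
        for index, sub in enumerate(schema.get(keyword, [])):
            yield f"/{keyword}/{index}", sub
    for keyword in SCHEMA_VALUES:
        if keyword in schema:
            yield f"/{keyword}", schema[keyword]


def dialects(schema: Any, inherited: Optional[str] = None,
             pointer: str = "#") -> Dict[str, Optional[str]]:
    """Map each schema location to the $schema value whose rules apply there."""
    if not isinstance(schema, dict):
        # Boolean schemas (or anything else) simply keep the inherited dialect.
        return {pointer: inherited}
    # The document root is always a resource root; a subschema is one only
    # if it carries an $id.
    is_resource_root = pointer == "#" or "$id" in schema
    # $schema is honoured only at a resource root ("switch rules");
    # everywhere else the enclosing resource's dialect continues to apply.
    current = schema.get("$schema", inherited) if is_resource_root else inherited
    result = {pointer: current}
    for relative, sub in children(schema):
        result.update(dialects(sub, current, pointer + relative))
    return result


example = {
    "$id": "bar",
    "$schema": "draftN",
    "$defs": {"foo": {"$id": "foo", "$schema": "draftM"}},
}
print(dialects(example))  # {'#': 'draftN', '#/$defs/foo': 'draftM'}
```

Under these assumptions a $schema that appears in a subschema without an $id is simply ignored, and an embedded resource without its own $schema inherits the dialect of the enclosing resource.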
Processing a schema should always start with a check for $id, and if present, next a check for $schema
Two counter-points to this:
- $id is not required, even at the schema root - but $schema can still occur there (yes? my understanding was that $schema can only occur in subschemas where there is an $id, but either or both of $schema and $id can appear at the root)
- $id used to be known as id in earlier drafts, so one might have to peek at $schema first to know whether id or $id should be looked at
If you're at the document root you don't need those rules to figure out if it's a resource root because you already know it's the document root- that's why $id is not required. The check for $id is how you tell if a subschema is a resource root.
So technically not all schemas, true 😛
Regarding id, ugh. I hate draft-04. I would not be averse to saying that you can't use draft-04 or earlier in an embedded resource. That's not a horrible restriction. OpenAPI doesn't use id or $id so they won't be embeddable exactly as-is either.
Otherwise I'd say implementations would have to opt in to supporting id as a resource identifier because it's not acceptable to reserve id in other drafts' rules.
@karenetheridge thinking about id and draft-04, if we wanted to we could require a check for $id and/or $schema, and if $schema is present and one of the standard draft-04 or earlier meta-schemas (regular or hyper-schema), check for an id. This is convoluted and annoying but could be made to work without otherwise reserving id. It would not work with custom draft-04 or earlier meta-schemas, but custom meta-schemas were never all that useful in draft-04 and at some point we have to give up on cobbling together support for things that old. id without the dollar was an endless source of confusion given the prominence of properties named "id".
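A rough Python sketch of that check, to show it stays small. The function name is hypothetical, and the set of legacy meta-schema URIs is an assumption listed for illustration; it should be verified against the published drafts before anyone relies on it.

```python
from typing import Any, Dict, Optional

# Assumption: illustrative set of standard draft-04-and-earlier meta-schema
# URIs (regular and hyper-schema). Not claimed to be complete.
LEGACY_METASCHEMAS = {
    "http://json-schema.org/draft-04/schema#",
    "http://json-schema.org/draft-04/hyper-schema#",
    "http://json-schema.org/draft-03/schema#",
    "http://json-schema.org/draft-03/hyper-schema#",
}


def resource_id(subschema: Dict[str, Any]) -> Optional[str]:
    """Return the identifier that makes this subschema a resource root, if any."""
    # Normal case: $id marks the resource root.
    if isinstance(subschema.get("$id"), str):
        return subschema["$id"]
    # Legacy case: only look at the un-dollared id when $schema names one of
    # the standard old meta-schemas, so id is not reserved anywhere else.
    if subschema.get("$schema") in LEGACY_METASCHEMAS and isinstance(subschema.get("id"), str):
        return subschema["id"]
    return None
```

As described above, a custom draft-04 meta-schema falls through to `None` here, and an object that merely has a property called "id" is never mistaken for a resource root.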
Hang on... is this explicitly allowing $schema to be used internally (not at the root), or is this only across an external reference boundary?
@notEthan's comment suggests the former, and that worries me. I think a single resource should follow a single schema draft, allowing $schema only at the root.
@gregsdennis it can be used in the resource root, which can be "internal" in the sense of not being a document root. But an embedded resource is still a different resource, it's just stuffed into the document. Here is the use case:
I have some large number of schemas. They look like this (assume that they $ref each other somewhere, as well):
{
  "$id": "https://example.com/schema/aaa",
  "$schema": "https://json-schema.org/draft/2020-06",
  ...
}
{
  "$id": "https://example.com/schema/bbb",
  "$schema": "http://json-schema.org/draft-06",
  ...
}
{
  "$id": "https://example.com/schema/ccc",
  "$schema": "http://json-schema.org/draft-07",
  ...
}
etc.
I want to bundle them in a single document for ease of distribution, which, as @karenetheridge notes, is something that there are tools for now. The result would be:
{
  "$id": "https://example.com/schema/bundled",
  "$schema": "https://json-schema.org/draft/2020-06",
  "$defs": {
    "aaa": {
      "$id": "https://example.com/schema/aaa",
      "$schema": "https://json-schema.org/draft/2020-06",
      ...
    },
    "bbb": {
      "$id": "https://example.com/schema/bbb",
      "$schema": "http://json-schema.org/draft-06",
      ...
    },
    "ccc": {
      "$id": "https://example.com/schema/ccc",
      "$schema": "http://json-schema.org/draft-07",
      ...
    }
  }
}
This should work. The presence of an $id in a non-document-root schema means that that schema is a resource root, and therefore $schema is usable. @karenetheridge identified a problem with supporting draft-04 in such a context, but I think we can sort that out separately.
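For illustration, the bundling step itself could look roughly like this in Python. This is a sketch under assumptions, not how any existing tool works: `bundle` is a hypothetical helper, the $defs key is taken from the last path segment of each $id, and every input is required to carry an explicit $id and $schema so that embedding cannot silently change which dialect applies to it.

```python
from typing import Any, Dict, Iterable


def bundle(bundle_id: str, dialect: str,
           schemas: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Embed complete schema resources under $defs of a new wrapper schema."""
    defs: Dict[str, Any] = {}
    for schema in schemas:
        # Assumption: refuse inputs that would depend on inheriting the
        # wrapper's $id or $schema once embedded.
        if "$id" not in schema or "$schema" not in schema:
            raise ValueError("each bundled schema needs an explicit $id and $schema")
        # Hypothetical naming rule: last path segment of the $id.
        name = schema["$id"].rstrip("/").rsplit("/", 1)[-1]
        if name in defs:
            raise ValueError(f"duplicate $defs name: {name}")
        defs[name] = schema
    return {"$id": bundle_id, "$schema": dialect, "$defs": defs}


bundled = bundle(
    "https://example.com/schema/bundled",
    "https://json-schema.org/draft/2020-06",
    [
        {"$id": "https://example.com/schema/aaa",
         "$schema": "https://json-schema.org/draft/2020-06"},
        {"$id": "https://example.com/schema/bbb",
         "$schema": "http://json-schema.org/draft-06"},
    ],
)
```

Because each embedded resource keeps its own $id, any $ref between the original schemas still resolves to the same URI after bundling; nothing inside the resources is rewritten.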
Note, however, that this is NOT VALID:
{
  "$id": "https://example.com/schema/whatever",
  "$schema": "https://json-schema.org/draft/2020-06",
  "properties": {
    "foo": {
      "$schema": "http://json-schema.org/draft-07",
      ...
    }
  }
}
In this example, there is no $id indicating that "#/properties/foo" is a resource root. I would consider it a bad practice to embed a resource there in the first place, but you could if you really wanted to.
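A lint-style check for this case is easy to sketch in Python. The helper below is hypothetical and intentionally naive: it walks every object in the document rather than only real subschema locations, so instance data inside keywords such as const, enum or examples could produce false positives.

```python
from typing import Any, List


def misplaced_schema_keywords(node: Any, pointer: str = "#",
                              is_root: bool = True) -> List[str]:
    """List locations that declare $schema without being a resource root."""
    found: List[str] = []
    if isinstance(node, dict):
        # A string-valued $schema below the document root only makes sense
        # when the same object also has an $id.
        if (not is_root and isinstance(node.get("$schema"), str)
                and "$id" not in node):
            found.append(pointer)
        for key, value in node.items():
            found += misplaced_schema_keywords(value, f"{pointer}/{key}", False)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found += misplaced_schema_keywords(value, f"{pointer}/{index}", False)
    return found


not_valid = {
    "$id": "https://example.com/schema/whatever",
    "$schema": "https://json-schema.org/draft/2020-06",
    "properties": {"foo": {"$schema": "http://json-schema.org/draft-07"}},
}
print(misplaced_schema_keywords(not_valid))  # ['#/properties/foo']
```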
I was fairly sure we had an extensive conversation around this stuff but admittedly it would have been quite a while ago. But it's all about the bundling use case. If we need to have the whole discussion on this again then we should do it in slack. PRs are not the place to debate fundamental direction- I wrote a PR because it had been settled.
I think allowing $schema to change across referenced or embedded schemas is absolutely the right way to go, but it occurred to me that meta-schemas can't fully express such a thing. Meta-schema references are recursive. In other words, they reference themselves. This means that the meta-schema will validate sub-schemas the same as the root schema.
{
  "$id": "https://example.com/schema1",
  "$schema": "http://json-schema.org/draft-06/schema#",
  "type": "object",
  "properties": {
    "foo": {
      "$id": "https://example.com/schema2",
      "$schema": "http://json-schema.org/draft-07/schema#",
      "if": "asdf"
    }
  }
}
When you validate this schema against the draft-06 meta-schema, semantically, it's not a schema, it's just arbitrary JSON. Neither the inner $id nor the inner $schema has any meaning, so the meta-schema doesn't know to validate the if as draft-07.
@jdesrosiers People have in the past asked for a way to restrict the draft of the $ref target as well. There are a couple of options:
- meta-schemas don't capture everything, and once you notice an $id you should treat that the same as if you hit a $ref and tried to validate the referenced schema, and validate it separately against its own meta-schema. This would formalize the notion of crossing a resource boundary.
- Include a special case for root subschemas that, if $id is present, only further validates $schema, and doesn't check anything else. This also formalizes the resource crossing and indicates that you need to start processing separately. But it would avoid a naive application of the meta-schema to the entire document from causing a failure.
- Do some weird anyOf to however many past meta-schemas we care to support, pinning $schema on each branch with const (but this doesn't help with future unknown custom meta-schemas, so it's not really a very good option)
- Hope that An alternate approach to meta-schemas #911 or Declare keyword use with $schema #918 produces a better option before we publish 😝
Note that part of the point of #849 is to give a clear description of how to process schemas and meta-schemas. Which is why all of that sort of stuff has been pulled out of where it was scattered all over and consolidated into what's now section 9.
So if we want to formalize stuff around crossing a resource boundary, that's where that goes. And to some degree I'm doing that anyway. If I can ever get back to that issue, which I've been trying to do for 2 weeks now.
For reference, #850 is the issue for this change, and #808 is the $ref-oriented one that @gregsdennis mentioned.
The only reason meta-validation works in my implementation is because I modify the schemas when they are loaded to separate embedded schemas. The schemas then get validated separately and there is no problem. But, if I validate that schema I gave before as the instance and meta-schema as the schema, it doesn't work as expected. That's why I don't think just giving guidance on how to process schemas sufficiently solves the problem.
#918 introduces a way for a schema to declare that a value is a schema without saying what kind of schema. It adds a new type value "schema". In Hyperjump Validation, I'm currently using a new keyword, validation, as a boolean flag to indicate that a value is a schema. I'm not fond of either of those options, but they do solve the problem.
I've agreed with one suggested change.
I need a little more time (2 days) to review all of the comments. Broadly I think this is good.
f58a4ee to 1243a8f
The push just now is a rebase to fix conflicts- nothing else has changed, waiting on Relequestual's feedback.
@jdesrosiers I somehow didn't notice this before:
The only reason meta-validation works in my implementation is because I modify the schemas when they are loaded to separate embedded schemas. The schemas then get validated separately and there is no problem. But, if I validate that schema I gave before as the instance and meta-schema as the schema, it doesn't work as expected. That's why I don't think just giving guidance on how to process schemas sufficiently solves the problem.
I'd actually say that's exactly how it should be processed. The embedding is essentially a... I dunno, ?transport layer? convenience. The real unit of schema-ness is the schema resource, which we didn't quite settle on for 2019-09 b/c the schema resource idea appeared near the end of that process as a solution to other things.
So that makes loading schemas a little more challenging, but once that is done, working with schema resources works just fine. In this way, the schema document with embedded schema resources is not, itself, really a schema. It's a package containing schemas, and the ideal option might be:
- separate documents into schema resources, propagating $schema values when they are not explicitly set
- validate schema resources separately based on their own meta-schema
There's something in there about resolving relative $id against base URIs when splitting out embedded schemas, which might make it slightly more complex, but I think a "how to load a schema document" guidance could formalize what you're doing here in a way that doesn't contradict anything else, or require a specific implementation. You could copy the split schemas, or maintain some sort of external map into the original structures, etc.
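As a brainstorming aid, here is a Python sketch of the "external map into the original structures" variant of those two steps. It is not a proposal for required behaviour: `index_resources` is a hypothetical name, the walk is naive (it visits every object, so instance data containing a string "$id" would be misread as a resource), and actual validation against each meta-schema is left out.

```python
from typing import Any, Dict, Optional, Tuple
from urllib.parse import urljoin

Index = Dict[str, Tuple[Dict[str, Any], Optional[str]]]


def index_resources(node: Any, base_uri: str = "", dialect: Optional[str] = None,
                    is_root: bool = True, index: Optional[Index] = None) -> Index:
    """Map each resource's absolute URI to (schema object, effective $schema)."""
    index = {} if index is None else index
    if isinstance(node, dict):
        if is_root or isinstance(node.get("$id"), str):
            # Resolve a possibly relative $id against the enclosing base URI.
            base_uri = urljoin(base_uri, node.get("$id", ""))
            # Propagate the enclosing $schema when none is set explicitly.
            dialect = node.get("$schema", dialect)
            # Store a reference into the original structure, not a copy.
            index[base_uri] = (node, dialect)
        for value in node.values():
            index_resources(value, base_uri, dialect, False, index)
    elif isinstance(node, list):
        for value in node:
            index_resources(value, base_uri, dialect, False, index)
    return index


document = {
    "$id": "https://example.com/schema/bundled",
    "$schema": "https://json-schema.org/draft/2020-06",
    "$defs": {
        "aaa": {"$id": "aaa"},
        "bbb": {"$id": "https://example.com/schema/bbb",
                "$schema": "http://json-schema.org/draft-07"},
    },
}
resources = index_resources(document)
```

Each entry in `resources` can then be handed to whatever meta-schema validation the implementation already has, one resource at a time, using the dialect recorded next to it.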
Just brainstorming on this.
I can understand @jdesrosiers concerns about meta schema validation.
My feeling on this is we should add requirements to the following:
- When processing a schema document, embedded schema resources (which I think you've covered how they are identified) which provide a different JSON Schema feature set identifier using $schema MUST cause an error to be thrown if the JSON Schema feature set is unknown or not supported.
- When processing a schema document with any embedded schema resources, for the purposes of schema validation against meta-schemas (confirming the JSON Schema document is likely to be processable), embedded schema resources SHOULD be validated within their own JSON Schema feature set (using the appropriate meta-schema). For enclosing schema resources (which is likely the document root schema), an embedded resource SHOULD be considered as a valid schema document, with the value of true, for the purposes of validating the enclosing schema resource as a valid JSON Schema.
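My reading of the second point, sketched in Python: before validating an enclosing resource against its meta-schema, swap each embedded resource for the boolean schema true. `mask_embedded` is a hypothetical helper and uses a naive walk over every object, so treat it as an illustration of the idea rather than a reference implementation.

```python
from typing import Any


def mask_embedded(node: Any, is_root: bool = True) -> Any:
    """Return a copy in which embedded resources are replaced by True."""
    if isinstance(node, dict):
        # Any non-root object with a string $id is an embedded resource;
        # for the enclosing resource it just counts as a valid schema.
        if not is_root and isinstance(node.get("$id"), str):
            return True
        return {key: mask_embedded(value, False) for key, value in node.items()}
    if isinstance(node, list):
        return [mask_embedded(value, False) for value in node]
    return node


schema1 = {
    "$id": "https://example.com/schema1",
    "$schema": "http://json-schema.org/draft-06/schema#",
    "type": "object",
    "properties": {
        "foo": {
            "$id": "https://example.com/schema2",
            "$schema": "http://json-schema.org/draft-07/schema#",
            "if": "asdf",
        }
    },
}
masked = mask_embedded(schema1)
```

The masked copy is what would be checked against the draft-06 meta-schema, while the schema2 resource is checked separately against draft-07, where the invalid if would be caught.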
I think @jdesrosiers approach in #918 is interesting, but we need a LOT more time to flesh that out, and we have time pressures to deliver THIS draft sooner.
@Relequestual at this point all of the small change requests have been addressed. Regarding the main conversation about how to handle a switched meta-schema:
When processing a schema document, embedded schema resources (which I think you've covered how they are identified) which provide a different JSON Schema feature set identifier using $schema MUST cause an error to be thrown if the JSON Schema feature set is unknown or not supported.
JSON Schema has never, in any draft, required an error on an unrecognized meta-schema. As of 2019-09, you can cause an error through $vocabulary in the meta-schema, but there is already language from 2019-09 that preserved the pre-existing behavior when a schema is not recognized. This PR does not change that, so it should be out of scope for the PR.
The key principle here is that $ref-ing a schema resource and embedding it produces functionally identical behavior. (The error reporting / annotation output will look slightly different to show that a $ref was crossed, so it's technically not completely identical, but that's the only difference).
We cannot have a scenario where bundling an external reference as an embedded schema resource changes the behavior from best effort ("I have no idea what this is but I'll pretend it's the standard core+validation and give it a shot") to an error.
When processing a schema document with any embedded schema resources, for the purposes of schema validation against meta-schemas (confirming the JSON Schema document is likely to be processable), embedded schema resources SHOULD be validated within their own JSON Schema feature set (using the appropriate meta-schema). For enclosing schema resources (which is likely the document root schema), an embedded resource SHOULD be considered as a valid schema document, with the value of true, for the purposes of validating the enclosing schema resource as a valid JSON Schema.
I'm not entirely sure that I follow, and I think we should be having this debate in an issue so I'm going to file that. This PR is effectively blocked.
$schema is now definitively resource-scoped rather than document-scoped, as crossing a resource boundary is the same as following a $ref to an external resource.
b6f4b37 to 56288b4
...but there is already language from 2019-09... - @handrews
Yeah, I mean that should have been obvious. Of course.
I'll take a look at the associated issue in relation to the other parts of your response to avoid further chatter here.
My feeling is @jdesrosiers broadly approved of the suggested change.
It looks like all other commenters felt their concerns have been answered (or at least they haven't followed up).
This is a small change, identifying behaviour which was previously just 🤷♂️ (not defined), so I'm merging it.
Closes #808, closes #850