-
Notifications
You must be signed in to change notification settings - Fork 136
request JSON-LD from Link rel=alternate #129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
JulienPalard
commented
Oct 9, 2020
This PR does not fixes the following test:
from pyld import jsonld
jsonld.expand(
{
"@context": "http://schema.org/",
"@type": "Person",
"name": "Jane Doe",
"jobTitle": "Professor",
"telephone": "(425) 123-4567",
"url": "http://www.janedoe.com",
}
)
because https://schema.org gives Content-Type: text/html, which is in headers["Accept"]:
(Pdb) p headers["Accept"] 'application/ld+json;profile=http://www.w3.org/ns/json-ld#context, application/ld+json, application/json;q=0.5, text/html;q=0.8, application/xhtml+xml;q=0.8'
it's here due to pyld/jsonld.py adding it:
6573 # FIXME: only if html5lib loaded? 6574 headers['Accept'] = headers['Accept'] + ', text/html;q=0.8, application/xhtml+xml;q=0.8'
According to json-ld:
A processor seeing a non-JSON result will note the presence of the link header and load that document instead.
So even if we accept HTML, if it's HTML and there's a json-ld alternative, let's use it.
To do this, I'll move doc['document'] = response.json() right before returning, and just drop the if content_type not in headers['Accept']: you added, it works for me.
In other words (and with a correct indentation) I mean:
--- a/lib/pyld/documentloader/requests.py +++ b/lib/pyld/documentloader/requests.py @@ -69,7 +69,6 @@ def requests_document_loader(secure=False, **kwargs): 'contentType': content_type, 'contextUrl': None, 'documentUrl': response.url, - 'document': response.json() if content_type in headers['Accept'] else None } link_header = response.headers.get('link') if link_header: @@ -77,15 +76,15 @@ def requests_document_loader(secure=False, **kwargs): LINK_HEADER_REL) # only 1 related link header permitted if linked_context and content_type != 'application/ld+json': - if isinstance(linked_context, list): - raise JsonLdError( - 'URL could not be dereferenced, ' - 'it has more than one ' - 'associated HTTP Link Header.', - 'jsonld.LoadDocumentError', - {'url': url}, - code='multiple context link headers') - doc['contextUrl'] = linked_context['target'] + if isinstance(linked_context, list): + raise JsonLdError( + 'URL could not be dereferenced, ' + 'it has more than one ' + 'associated HTTP Link Header.', + 'jsonld.LoadDocumentError', + {'url': url}, + code='multiple context link headers') + doc['contextUrl'] = linked_context['target'] linked_alternate = parse_link_header(link_header).get('alternate') # if not JSON-LD, alternate may point there if (linked_alternate and @@ -93,9 +92,8 @@ def requests_document_loader(secure=False, **kwargs): not re.match(r'^application\/(\w*\+)?json$', content_type)): doc['contentType'] = 'application/ld+json' doc['documentUrl'] = prepend_base(url, linked_alternate['target']) - if content_type not in headers['Accept']: - # Original was not JSON/JSON-LD; fetch linked JSON-LD - return loader(doc['documentUrl'], options=options) + return loader(doc['documentUrl'], options=options) + doc['document'] = response.json() return doc except JsonLdError as e: raise e
Hmm, that's a bit odd. Yes, Pyld adds text/html (in the line you pointed out, i.e. headers['Accept'] = headers['Accept'] + ', text/html;q=0.8, application/xhtml+xml;q=0.8') but that's a default header value and it shouldn't affect the server response as long as application/json+ld is included in the Accept header value with a higher precedence. That check
if content_type not in headers['Accept']: # Original was not JSON/JSON-LD; fetch linked JSON-LD return loader(doc['documentUrl'], options=options)
is important, it checks whether the server responded with application/json+ld, and if not, attempts to fetch the linked resource. There's no value doing this if the response is already JSON-LD.
JulienPalard
commented
Oct 11, 2020
it shouldn't affect the server response as long as
application/json+ldis included in theAcceptheader value with a higher precedence.
Totally agree. But looks like https://schema.org/ does not have a application/json-ld variant at all, so it always reply with text/html, independently of the Accept header. But this text/html response links to the ld+json, see:
$ curl -I -H "Accept: application/ld+json" https://schema.org
HTTP/2 200
[...]
link: </docs/jsonldcontext.jsonld>; rel="alternate"; type="application/ld+json"
[...]
content-type: text/html
[...]
Which looks legit to me, even though I still didn't read and understood rfc8288 entierly.
That check
if content_type not in headers['Accept']: # Original was not JSON/JSON-LD; fetch linked JSON-LD return loader(doc['documentUrl'], options=options)is important, it checks whether the server responded with
application/json+ld, and if not, attempts to fetch the linked resource. There's no value doing this if the response is already JSON-LD.
IIRC It looks already covered by the:
not re.match(r'^application/(\w*+)?json$', content_type)):
check a few line before.
Ah I see. Sorry it's been a while since I've looked at this. I think your version of the PR makes sense, and as long as both our tests pass, it gets my vote.
JulienPalard
commented
Oct 11, 2020
and as long as both our tests pass, it gets my vote.
I'd like to add both tests to the test suite, but I don't understand how to do so. If someone can explain before merging I'd gladly do so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the braces are redundant.
Uh oh!
There was an error while loading. Please reload this page.
Ref #128
With the following test cases defined:
context.jsonldmanifest.jsonldsample.jsonldsample2.jsonldoutput.jsonldThe tests both fail before the changes. The tests both passs after the changes. Since all the test cases of this repository are remote, I am not sure whether or where to contribute these test cases.
However, there is one regression, which I do not currently know how to resolve. This test does not have an error before the changes, and does have an error after the changes. The report is:
All other tests are unaffected. However, as mentioned in #128, running the tests as described with no changes elicits 5 failures, 2 errors, and 34 skipped tests.