Re: RDF Datasets with provenance data from Chris Mungall on 23 September 2016 (semantic-web@w3.org from September 2016)

From: Chris Mungall <cjmungall@lbl.gov>
Date: Fri, 23 Sep 2016 16:47:24 -0700
To: "Michel Dumontier" <michel.dumontier@gmail.com>
Cc: "Sebastian Hellmann" <hellmann@informatik.uni-leipzig.de>, "Mark Wallace" <mwallace@modusoperandi.com>, "David Booth" <david@dbooth.org>, "Kay Müller" <kay.mueller@informatik.uni-leipzig.de>, "semantic-web@w3.org" <semantic-web@w3.org>, "Johannes Frey" <frey@informatik.uni-leipzig.de>
Message-ID: <8B97EE6C-00A2-44CD-A257-9938EBA7D259@lbl.gov>

There is also the Wikidata approach:
https://meta.wikimedia.org/wiki/Wikidata/Development/RDF#Statements_with_qualifiers
This paper compares different approaches:
http://ceur-ws.org/Vol-1457/SSWS2015_paper3.pdf
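As a rough illustration of the statement-node idea behind the Wikidata approach, the Python sketch below expands one claim into triples. Qualifiers and a provenance reference hang off a dedicated statement node instead of off the plain triple. The prefixes follow the conventions on the Wikidata RDF page linked above as we understand them. The helper name `statement_triples` and all IDs and values are hypothetical, and details such as ranks and the "truthy" wdt: triples are left out.

```python
from typing import Dict, List, Optional, Tuple

# Terms are prefixed names (wd:, p:, ps:, pq:, pr:, wds:, wdref:, prov:),
# not full IRIs. All IDs used below are hypothetical placeholders.
Triple = Tuple[str, str, str]


def statement_triples(
    item: str,
    prop: str,
    value: str,
    stmt_id: str,
    qualifiers: Optional[Dict[str, str]] = None,
    references: Optional[Dict[str, Dict[str, str]]] = None,
) -> List[Triple]:
    """Expand one claim into Wikidata-style statement-node triples."""
    stmt = f"wds:{stmt_id}"
    triples: List[Triple] = [
        (f"wd:{item}", f"p:{prop}", stmt),  # item points at a statement node
        (stmt, f"ps:{prop}", value),        # the node carries the main value
    ]
    # Qualifiers are ordinary triples whose subject is the statement node.
    for qprop, qvalue in (qualifiers or {}).items():
        triples.append((stmt, f"pq:{qprop}", qvalue))
    # Provenance: the node links to reference nodes via prov:wasDerivedFrom.
    for ref_id, ref_props in (references or {}).items():
        ref = f"wdref:{ref_id}"
        triples.append((stmt, "prov:wasDerivedFrom", ref))
        for rprop, rvalue in ref_props.items():
            triples.append((ref, f"pr:{rprop}", rvalue))
    return triples


if __name__ == "__main__":
    # Placeholder IDs and values, not real Wikidata content.
    for s, p, o in statement_triples(
        "Q1", "P2", '"42"', "Q1-0001",
        qualifiers={"P3": '"2016"'},
        references={"ref1": {"P4": "wd:Q5"}},
    ):
        print(s, p, o, ".")
```

The point of the design is that every per-statement annotation becomes an ordinary triple about the statement node, so no named graphs or rdf:Statement resources are needed.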
On 23 Sep 2016, at 15:14, Michel Dumontier wrote:
> Hi Sebastian,
> Bio2RDF provides its data in N-Quads, in which the graph name is
> annotated with dataset metadata.
> See http://download.bio2rdf.org/release/3/drugbank/ for an example;
> the .nq file there is the provenance data.
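A minimal sketch of the pattern Michel describes, under our own assumptions: every triple of a release goes into one named graph, and statements about that graph IRI carry the dataset metadata. The graph IRI, the example.org resources and the choice of Dublin Core terms (dcterms:source, dcterms:modified) are illustrative only and not necessarily what Bio2RDF uses. Putting the metadata in the default graph is one option among several.

```python
from typing import List, Optional, Tuple

DCTERMS = "http://purl.org/dc/terms/"
XSD = "http://www.w3.org/2001/XMLSchema#"


def iri(value: str) -> str:
    """Serialize an IRI as an N-Quads term."""
    return f"<{value}>"


def literal(value: str, datatype: Optional[str] = None) -> str:
    """Serialize a literal, escaping backslash, quote, newline and CR."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"' + (f"^^<{datatype}>" if datatype else "")


def dataset_nquads(graph: str,
                   data: List[Tuple[str, str, str]],
                   metadata: List[Tuple[str, str]]) -> List[str]:
    """Put all data triples in one named graph and describe that graph.

    Terms in `data` and `metadata` must already be serialized.
    """
    g = iri(graph)
    lines = [f"{s} {p} {o} {g} ." for s, p, o in data]
    # Metadata about the graph goes in the default graph (no fourth term).
    lines += [f"{g} {p} {o} ." for p, o in metadata]
    return lines


if __name__ == "__main__":
    graph = "http://example.org/graph/release-3"  # hypothetical graph IRI
    data = [(iri("http://example.org/drug/1"),
             iri("http://example.org/vocab/name"),
             literal("Example drug"))]
    meta = [(iri(DCTERMS + "source"), iri("http://example.org/source-dump")),
            (iri(DCTERMS + "modified"), literal("2016-09-23", XSD + "date"))]
    print("\n".join(dataset_nquads(graph, data, meta)))
```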
>
> m.
> Michel Dumontier
> Associate Professor of Medicine (Biomedical Informatics), Stanford University
> Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group
> http://dumontierlab.com
>
> On Fri, Sep 23, 2016 at 2:58 PM, <hellmann@informatik.uni-leipzig.de> wrote:
>> Hi David and Mark,
>> Sorry, but neither of your answers was helpful.
>> We are looking for triple datasets that have metadata, i.e. serialized,
>> downloadable files in any format (N3, N-Quads, TriX, etc.) that come with
>> sensible metadata (provenance, last update/update frequency), or, as an
>> alternative, triples converted from a legacy source where we could easily
>> extend the extractor software to spew out useful metadata per triple.
>>
>> An example would be the datasets in the meta section here:
>> http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads/
>>
>> Thanks,
>> Sebastian
>>
>> On 23 September 2016 17:16:43 CEST, Mark Wallace
>> <mwallace@modusoperandi.com> wrote:
>>>
>>> I like David's guidance.
>>>
>>> We have projects which require provenance on individual facts/triples
>>> (as opposed to groups of them). As David mentions, one alternative is to
>>> use a named graph for each triple (it acts like a statement ID in this
>>> case). Another alternative is to use RDF Reification[1] to create a
>>> statement ID (resource) to which provenance can be "attached." The
>>> reification approach requires many more triples, but in our case it has
>>> the advantage of leaving named graphs free for other uses. In such cases,
>>> the provenance triples can be 10x larger than the data set. For
>>> performance reasons, we sometimes put the provenance triples in a
>>> separate repository/store, and query/join them (using federated queries)
>>> only when the provenance is needed.
>>>
>>> [1] https://www.w3.org/TR/rdf11-mt/#whatnot
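To make the trade-off Mark describes concrete, here is a hedged Python sketch that encodes one triple with one provenance statement both ways. The function names, the example IRIs and the use of prov:wasDerivedFrom are assumptions for illustration. Standard reification only describes a triple without asserting it, so the sketch also asserts the triple itself, which is part of why the reification route costs more triples.

```python
from typing import List, Optional, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, Optional[str]]  # None means the default graph


def reify(stmt: str, s: str, p: str, o: str,
          provenance: List[Tuple[str, str]]) -> List[Triple]:
    """Standard RDF reification, with provenance on the statement resource."""
    triples: List[Triple] = [
        (s, p, o),  # reification does not assert the triple, so keep it too
        (stmt, f"<{RDF}type>", f"<{RDF}Statement>"),
        (stmt, f"<{RDF}subject>", s),
        (stmt, f"<{RDF}predicate>", p),
        (stmt, f"<{RDF}object>", o),
    ]
    triples += [(stmt, pp, po) for pp, po in provenance]
    return triples


def graph_per_triple(graph: str, s: str, p: str, o: str,
                     provenance: List[Tuple[str, str]]) -> List[Quad]:
    """One named graph per triple; the graph name acts as the statement ID."""
    quads: List[Quad] = [(s, p, o, graph)]
    quads += [(graph, pp, po, None) for pp, po in provenance]
    return quads


if __name__ == "__main__":
    prov = [("<http://www.w3.org/ns/prov#wasDerivedFrom>",
             "<http://example.org/source>")]
    args = ("<http://example.org/s>", "<http://example.org/p>",
            "<http://example.org/o>", prov)
    print(len(reify("<http://example.org/stmt/1>", *args)))             # 6
    print(len(graph_per_triple("<http://example.org/graph/1>", *args)))  # 2
```

With one provenance statement per fact, the reified form needs 6 triples against 2 quads, a fixed overhead of 4 extra triples per annotated fact, so the difference grows linearly with the number of facts.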
>>>
>>> --
>>> Mark Wallace
>>> PRINCIPAL ENGINEER, SEMANTIC APPLICATIONS
>>> MODUS OPERANDI, INC.
>>>
>>> -----Original Message-----
>>> From: David Booth [mailto:david@dbooth.org]
>>> Sent: Friday, September 23, 2016 10:45 AM
>>> To: Kay Müller <kay.mueller@informatik.uni-leipzig.de>;
>>> semantic-web@w3.org
>>> Cc: Johannes Frey <frey@informatik.uni-leipzig.de>; Sebastian Hellmann
>>> <hellmann@informatik.uni-leipzig.de>
>>> Subject: Re: RDF Datasets with provenance data
>>>
>>> On 09/23/2016 10:07 AM, Kay Müller wrote:
>>>>
>>>> Dear Sir/Madam,
>>>>
>>>> My name is Kay Mueller and I am a researcher at the University of
>>>> Leipzig. We are currently planning to evaluate whether it is feasible
>>>> to store provenance and metadata for each triple in a graph. Hence we
>>>> are wondering whether you are aware of any dataset that either stores
>>>> data at the triple level or could be converted into this format (e.g.
>>>> YAGO, Wikidata).
>>>
>>>
>>> The usual technique for associating provenance or other metadata with
>>> certain triples is to put those triples into a named graph, and make the
>>> provenance/metadata assertions about that named graph. A named graph can
>>> hold any number of triples, so it could hold a single triple if you want
>>> to be that fine-grained. But triples are not usually created
>>> individually; they are usually created in bunches, so for efficiency one
>>> would usually create a named graph containing multiple triples that all
>>> have the same provenance.
>>>
>>> All major "triplestores" (quad stores, really) and SPARQL servers
>>> support named graphs.
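David's suggestion of one named graph per batch of triples that share the same provenance can be sketched as below. The record layout, the helper name `group_by_provenance` and the example.org graph-naming scheme are hypothetical. A real extractor would probably mint stable graph IRIs (for instance from a source ID and extraction date) rather than a counter.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Pairs = Tuple[Tuple[str, str], ...]  # hashable provenance: ((pred, obj), ...)
Record = Tuple[str, str, str, Pairs]


def group_by_provenance(
    records: Iterable[Record],
) -> Tuple[List[Tuple[str, str, str, str]], List[Tuple[str, str, str]]]:
    """Put triples with identical provenance into one shared named graph."""
    batches: Dict[Pairs, List[Tuple[str, str, str]]] = defaultdict(list)
    for s, p, o, prov in records:
        batches[prov].append((s, p, o))

    quads: List[Tuple[str, str, str, str]] = []
    metadata: List[Tuple[str, str, str]] = []
    # Dicts keep insertion order, so graph numbering follows first appearance.
    for i, (prov, triples) in enumerate(batches.items()):
        graph = f"<http://example.org/graph/{i}>"  # hypothetical naming scheme
        quads += [(s, p, o, graph) for s, p, o in triples]
        metadata += [(graph, pp, po) for pp, po in prov]  # stated once per graph
    return quads, metadata


if __name__ == "__main__":
    src_a = (("<http://purl.org/dc/terms/source>", "<http://example.org/a>"),)
    src_b = (("<http://purl.org/dc/terms/source>", "<http://example.org/b>"),)
    quads, meta = group_by_provenance([
        ("<http://example.org/x>", "<http://example.org/p>", '"1"', src_a),
        ("<http://example.org/y>", "<http://example.org/p>", '"2"', src_a),
        ("<http://example.org/z>", "<http://example.org/p>", '"3"', src_b),
    ])
    print(len(quads), "quads in", len({q[3] for q in quads}), "graphs")
```

The provenance is stated once per graph rather than once per triple, which is the efficiency David points to.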
>>>
>>> David Booth
>>>
>>>>
>>>> We would be very grateful if you could give us any pointers to
>>>> datasets, related work, etc.
>>>>
>>>> Thank you very much in advance.
>>>> --
>>>> Kind regards
>>>>
>>>> Kay Müller
>>>>
>>>> AKSW/KILT <http://aksw.org/Groups/KILT.html>
>>>> Office: InfAI e.V., Hainstr. 11, Room 101a, 04109 Leipzig, Germany
>>>> Homepage: http://aksw.org/KayMueller.html
>>>> Twitter: <https://twitter.com/mullekay>
>>>> LinkedIn: <https://de.linkedin.com/in/mullerkay>
>>>> Xing: <https://www.xing.com/profile/Kay_Mueller12>
>>>> GitHub: <https://github.com/mullekay>
>>>> Google Scholar: <https://scholar.google.de/citations?user=8tFijv0AAAAJ>
>>>
>>
>> --
>> This message was sent from my Android mobile phone with K-9 Mail.
>

Received on Friday, 23 September 2016 23:48:00 UTC