NIF ITS roundtripping (Re: How to put an annotation in HTML?)

Hi Sebastian, all,
coming back to an old thread.
On 26.04.13 20:57, Felix Sasaki wrote:
> On 26.04.13 17:15, Sebastian Hellmann wrote:
>> Hi Denny,
>> they are just several months away from becoming a Recommendation, so it
>> will happen soon. Implementation will start within a few weeks.
>> For exact details you would have to ask the mailing list or just wait 
>> for a while ;)
>>
>> There should be an XSLT stylesheet somewhere that retrieves NIF RDF
>> from ITS within HTML.
>
> Thanks for the ping, Sebastian - you encouraged me to finally put that 
> online. See
> http://www.w3.org/People/fsasaki/its20-general-processor/tools/its-ta-2-nif.xsl
The stylesheet above is now updated to handle white space better. There
is now also a stylesheet to go back from NIF to an HTML document and
generate its-ta-ident-ref etc.
How to use this:
1) Sample input doc
http://www.w3.org/People/fsasaki/its20-general-processor/sample/nif-conversion/inputfile-without-ta-annotations.html
2) Output of generating NIF from 1), and of generating entity 
annotations in the NIF wrapper (here done manually)
http://www.w3.org/People/fsasaki/its20-general-processor/sample/nif-conversion/its-ta-2-nif-output.rdf
3) XSLT Stylesheet to go back from 2) to 1) and to add the entity 
annotations to the HTML
http://www.w3.org/People/fsasaki/its20-general-processor/tools/nif-2-its-ta.xsl
4) Output of 3)
http://www.w3.org/People/fsasaki/its20-general-processor/sample/nif-conversion/nit-2-its-ta-output.html
with some javascript to show the annotations.
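
For readers who want the idea behind step 3 without reading the XSLT: the following Python sketch is not the stylesheet, only a minimal illustration of going from NIF-style character offsets back to its-ta-ident-ref markup. The function name `annotate`, the tuple layout and the sample offsets are assumptions made for this sketch; offsets are zero-based positions between characters, as in `#char=23,34`.

```python
from html import escape


def annotate(text, annotations):
    """Wrap character ranges of `text` in <span its-ta-ident-ref="..."> elements.

    `annotations` is a list of (begin, end, uri) tuples with non-overlapping
    offsets; `end` is exclusive, matching #char=begin,end.
    """
    parts = []
    cursor = 0
    for begin, end, uri in sorted(annotations):
        if begin < cursor or end > len(text) or begin >= end:
            raise ValueError(f"invalid or overlapping range: {begin},{end}")
        # Copy the unannotated text before this range, then the wrapped range.
        parts.append(escape(text[cursor:begin]))
        parts.append(
            '<span its-ta-ident-ref="%s">%s</span>'
            % (escape(uri, quote=True), escape(text[begin:end]))
        )
        cursor = end
    parts.append(escape(text[cursor:]))
    return "".join(parts)


# Sample sentence and offsets taken from the example earlier in this thread.
text = ("It is well known, that Springfield has mild summers and short, "
        "but hard winters.")
annotated_html = "<p>" + annotate(
    text, [(23, 34, "http://sws.geonames.org/4951788")]
) + "</p>"
print(annotated_html)
```

The sketch works on plain text only; the real stylesheet also has to map offsets back onto existing markup, which is left out here.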
Comments welcome. @Sebastian: the NIF RDF/XML is not yet up to date
with respect to the comments you gave during the MLW-LT f2f call on
8 May; I'll do that later.
Felix
> with some mini documentation in the stylesheet and a sample 
> transformation of an HTML document
> http://www.w3.org/People/fsasaki/its20-general-processor/sample/nif-conversion/inputfile.html
> here:
> http://tinyurl.com/clwd64n
> I think it provides the right triples http://tinyurl.com/btkvkvy
>
> Let me know if you need more. I saw that in this thread there was also 
> discussion about "term annotation" - this table
> http://www.w3.org/TR/its20/#textAnalysis-info-pieces
> and the note below the table might be helpful for you as well.
>
>
> Felix
>
>>
>> All the best,
>> Sebastian
>>
>>
>> On 26.04.2013 16:05, Denny Vrandečić wrote:
>>> Sebastian,
>>>
>>> thanks! its-ta-ident-ref is perfect! That's exactly what I have been 
>>> looking for.
>>>
>>> The only drawbacks are that it is not a Recommendation yet (what's
>>> the timeline here?), but that's not so terrible, and that this is
>>> possibly the worst attribute name I have seen so far in HTML.
>>>
>>> Still, that's what I am going to use! Thanks,
>>> Cheers,
>>> Denny
>>>
>>>
>>>
>>>
>>>
>>> On 26 April 2013, Sebastian Hellmann
>>> <hellmann@informatik.uni-leipzig.de> wrote:
>>>
>>> Hi John and Denny,
>>> the problem is well known and RDFa has its limits. Please see
>>> the new ITS 2.0 spec [1], which provides a solution for this.
>>> ITS 2.0 will likely be widely adopted by the CMS and translation
>>> industries, and it has an RDF conversion using NIF [2].
>>>
>>> @Denny: For your request RDFa should be fine, if you just want
>>> to include:
>>> <http://sws.geonames.org/4951788> a owl:Thing .
>>>
>>> Note that the resulting RDF does not contain any provenance
>>> information, so I am unsure whether calling it an "annotation"
>>> is appropriate. It is rather an inclusion of extra triples in HTML.
>>> You are losing any reference to "Springfield", as RDFa parsers
>>> don't support this.
>>> Turtle in HTML would also be an easy option:
>>> http://www.w3.org/TR/turtle/#xhtml
>>>
>>> ITS 2.0 example:
>>> <p>It is well known, that <span
>>> its-ta-ident-ref="http://sws.geonames.org/4951788">Springfield</span>
>>> has mild summers and short, but hard winters.</p>
>>> NIF:
>>> ...
>>> <http://example.com/doc.html#xpath(/p[1]/span[1]/text()[1])>
>>> itsrdf:xpath2nif <http://example.com/doc.html#char=23,34> .
>>> <http://example.com/doc.html#char=23,34>
>>> rdf:type nif:RFC5147String ;
>>> itsrdf:taIdentRef <http://sws.geonames.org/4951788> ;
>>> ...
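
The `#char=23,34` fragment in the example is simply the character range that the annotated span occupies in the text of the paragraph. As a rough illustration of where those numbers come from, here is a minimal Python sketch that walks ITS-annotated HTML and reports such ranges. The class name `TaIdentRefCollector` is made up for this sketch, it assumes well-formed input and does no white space normalisation, and the printed lines only imitate the shape of the triples above rather than being complete NIF.

```python
from html.parser import HTMLParser


class TaIdentRefCollector(HTMLParser):
    """Collect (begin, end, uri) for every element carrying its-ta-ident-ref."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []       # text chunks seen so far
        self.length = 0        # number of text characters seen so far
        self.open = []         # stack of (tag, begin offset, uri or None)
        self.annotations = []  # finished (begin, end, uri) ranges

    def handle_starttag(self, tag, attrs):
        uri = dict(attrs).get("its-ta-ident-ref")
        self.open.append((tag, self.length, uri))

    def handle_endtag(self, tag):
        # Pop back to the matching start tag; elements that were never
        # closed (for example <br>) are discarded on the way.
        while self.open:
            name, begin, uri = self.open.pop()
            if name == tag:
                if uri is not None:
                    self.annotations.append((begin, self.length, uri))
                break

    def handle_data(self, data):
        self.chunks.append(data)
        self.length += len(data)


source = ('<p>It is well known, that <span '
          'its-ta-ident-ref="http://sws.geonames.org/4951788">Springfield'
          '</span> has mild summers and short, but hard winters.</p>')

collector = TaIdentRefCollector()
collector.feed(source)
collector.close()

doc = "http://example.com/doc.html"
for begin, end, uri in collector.annotations:
    print(f"<{doc}#char={begin},{end}> itsrdf:taIdentRef <{uri}> .")
```

For the sample sentence this yields the range 23,34, the same offsets as in the NIF snippet.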
>>>
>>> Well, NIF is more for natural language processing tools and
>>> middleware, so it's overkill for just including the occasional
>>> triple now and then ...
>>>
>>> All the best,
>>> Sebastian
>>>
>>>
>>>
>>> [1] http://www.w3.org/TR/its20/
>>> [2] http://www.w3.org/TR/its20/#conversion-to-nif
>>>
>>> On 24.04.2013 22:08, John Flynn wrote:
>>>>
>>>> I have long thought that a clean and simple method for
>>>> identifying terms in HTML that are instances of a specific
>>>> ontology would be a very valuable adjunct to the growth of the
>>>> Semantic Web. A number of years ago I proposed an approach to a
>>>> solution I called Instance Markup Language (1) which gained no
>>>> traction. The consensus at the time was that RDFa would provide
>>>> the solution for this need and also that it wasn't really
>>>> important because the great bulk of instance data would come
>>>> from large data bases and not from HTML. I don't think RDFa has
>>>> in fact provided a "clean and simple" way to identify specific
>>>> terms in HTML text and link those terms to classes or
>>>> properties in a specific ontology. I never thought my proposed
>>>> approach was exactly right, but I did have hope it would
>>>> inspire someone to come forward with a similar, but cleaner, way
>>>> to do this. Even though the subject still occasionally comes up,
>>>> after all these years it's pretty clear I was wrong about this
>>>> being an important component of Semantic Web technology.
>>>>
>>>> (1) http://mysite.verizon.net/jflynn12/IML.htm
>>>>
>>>> *From:*Denny Vrandečić [mailto:denny.vrandecic@wikimedia.de]
>>>> *Sent:* Wednesday, April 24, 2013 1:59 PM
>>>> *To:* semantic-web at W3C
>>>> *Subject:* How to put an annotation in HTML?
>>>>
>>>> Sorry, probably a stupid question:
>>>>
>>>> Let us say, I have some HTML like this...
>>>>
>>>> <p>It is well known, that Springfield has mild summers and
>>>> short, but hard winters.</p>
>>>>
>>>> And now, for example in order to simplify extraction, I want to
>>>> annotate Springfield with a URI, maybe like this, to make sure
>>>> that the computer understands I mean the Springfield
>>>> in Massachusetts:
>>>>
>>>> <p>It is well known, that <span
>>>> about="http://sws.geonames.org/4951788/">Springfield</span> has
>>>> mild summers and short, but hard winters.</p>
>>>>
>>>> How do I actually do that?
>>>>
>>>> Mind you, I don't want to add whole triples, but just annotate
>>>> the HTML and say "this element refers to the following URI".
>>>>
>>>> Cheers,
>>>>
>>>> Denny
>>>>
>>>> -- 
>>>> Project director Wikidata
>>>> Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
>>>> Tel. +49-30-219 158 26-0 |
>>>> http://wikimedia.de
>>>>
>>>> Wikimedia Deutschland - Gesellschaft zur Förderung Freien
>>>> Wissens e.V. (Society for the Promotion of Free Knowledge).
>>>> Registered in the register of associations of the Amtsgericht
>>>> Berlin-Charlottenburg under number 23855 B. Recognized as
>>>> charitable by the Finanzamt für Körperschaften I Berlin,
>>>> tax number 27/681/51985.
>>>>
>>>
>>>
>>> -- 
>>> Dipl. Inf. Sebastian Hellmann
>>> Department of Computer Science, University of Leipzig
>>> Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
>>> http://dbpedia.org/Wiktionary , http://dbpedia.org
>>> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
>>> Research Group: http://aksw.org
>>>
>>
>>
>> -- 
>> Dipl. Inf. Sebastian Hellmann
>> Department of Computer Science, University of Leipzig
>> Events: NLP & DBpedia 2013 (http://nlp-dbpedia2013.blogs.aksw.org, 
>> Deadline: *July 8th*)
>> Projects: http://nlp2rdf.org , http://linguistics.okfn.org , 
>> http://dbpedia.org/Wiktionary , http://dbpedia.org
>> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
>> Research Group: http://aksw.org
>

Received on Thursday, 16 May 2013 14:54:15 UTC
