- From: Brian McBride <bwm@hplb.hpl.hp.com>
- Date: 25 Jul 2003 09:35:58 +0100
- To: Martin Duerst <duerst@w3.org>
- Cc: "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>, jjc@hplb.hpl.hp.com, Pat Hayes <phayes@ai.uwf.edu>, www-rdf-comments@w3.org, i18n <w3c-i18n-ig@w3.org>
- Message-Id: <1059122157.2201.6.camel@dhcp-91-136.hpl.hp.com>
Thank you Martin, particularly for the specific answer to the question I
asked and the references. I tried searching for the answer in the specs
myself, but wasn't sure I'd uncovered enough evidence to convince Peter.

I also note your broader concerns. However, I think we were trying to
nail down precisely the formal semantics of the present design, rather
than debate the merits of that design.

Thanks again.

Brian

On Thu, 2003-07-24 at 21:06, Martin Duerst wrote:
> Hello Brian, others,
>
> At 16:54 03/07/24 +0100, Brian McBride wrote:
> >On Thu, 2003-07-24 at 16:31, Peter F. Patel-Schneider wrote:
> >
> > > So the question boils down to whether octets and Unicode characters are
> > > disjoint.
> >
> >I believe they are. From
> >
> > http://www.unicode.org/book/uc20ch1.html
> >
> >[[
> >The character identified by a Unicode code value is an abstract entity,
> >such as "LATIN CAPITAL LETTER A" or "BENGALI DIGIT 5".
> >]]
> >
> >i.e. characters are distinct from their encodings.
> >
> >Martin, Jeremy: confirm?
>
>
> I have looked at
> http://www.w3.org/2001/sw/RDFCore/20030123-issues/#pfps-04
> http://lists.w3.org/Archives/Public/www-rdf-comments/2003JanMar/0091.html
>
> and wasn't sure why the question below is relevant for addressing issue pfps-04.
>
> Based on a conversation with Brian that I had a week or two ago,
> I suspect that it may be related to some technical issue of how
> to distinguish between the values of plain literals, string, and
> XML literals. Looking at
> http://lists.w3.org/Archives/Public/www-rdf-comments/2003JulSep/0064.html
> seems to confirm this suspicion:
>
> >>>>>>>>
> Peter:
> > > > Therefore for the RDF entailment rules to be complete, no XML Literal can
> > > > have a character string as its denotation.
> Brian:
> > > Right. The denotation of an XML Literal is an octet sequence, as
> > > defined by the xml canonicalization spec, see the note in:
> > >
> > > http://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-concepts-20030117/#section-XMLLiteral
> Peter:
> > Unfortunately this does not answer the question. Octet sequence is
> > undefined in http://www.w3.org/TR/2002/REC-xml-exc-c14n-20020718/. At
> > least some places in this document appear to indicate that an octet
> > sequence is just a sequence of (Unicode?) characters.
> >>>>>>>>
>
> (the short and simple summary of the above discussion is:
> "In order to be able to say that there is a difference between
> plain text and XML, can we claim that plain text is sequences
> of characters and XML is sequences of octets?")
>
>
> My answer to the question that Brian asked is: Yes, octets and
> Unicode characters are different. The Unicode standard certainly
> explains that, as does the Character Model:
> http://www.w3.org/TR/2002/WD-charmod-20020430/#sec-Storage
>
> But this is the wrong question to ask. It is totally inappropriate
> to use different layers of an encoding model to make semantic
> distinctions that are not related to this encoding model.
> Although such a statement is not explicitly made in the Character
> Model (because, frankly speaking, we didn't imagine that anybody
> would come up with such an idea), it should be quite clear from
> Section 3.5 Reference Processing Model
> (http://www.w3.org/TR/2002/WD-charmod-20020430/#sec-RefProcModel)
> that this is very inappropriate.
>
> It seems that the encoding to UTF-8, inherited by Exclusive XML
> Canonicalization from Canonical XML, and very suitable as a
> preparation for digital signing and encryption or for parser
> testing, is confusing. I will request a clarification to that
> specification and will cc the RDF Core WG on that request.
>
> I am sure that a different and more appropriate way to make the
> distinction can be found.
>
>
> Regards, Martin.
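
[Editorial illustration, not part of the original message.] The distinction
the thread turns on, that abstract characters are not the same thing as the
octets that encode them, can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: the sample string and the
variable names are hypothetical, and Python's str/bytes types merely stand
in for "sequence of Unicode characters" and "octet sequence"; nothing here
is taken from the RDF or canonicalization specifications.

```python
import unicodedata

# A sequence of two abstract characters (hypothetical sample, chosen to
# echo the Unicode quotation above; the standard's name for the digit is
# "BENGALI DIGIT FIVE").
text = "A\u09eb"

# Two different encodings of the same character sequence.
octets_utf8 = text.encode("utf-8")
octets_utf16 = text.encode("utf-16-be")

# Character names identify the abstract entities, independent of encoding.
names = [unicodedata.name(ch) for ch in text]

print(names)
print(len(text), len(octets_utf8))     # number of characters vs. number of octets
print(octets_utf8 != octets_utf16)     # same characters, different octet sequences
print(isinstance(text, bytes))         # a character string is not an octet sequence
print(octets_utf8.decode("utf-8") == text)  # decoding recovers the characters
```

The same two characters map to different octet sequences depending on the
encoding chosen, which is why the octet layer is a property of the encoding
model rather than of the text itself.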
Received on Friday, 25 July 2003 04:37:11 UTC