Re: What should we call RDF's ability to allow multiple models to peacefully coexist, interconnected? from Martynas Jusevičius on 2014-03-09 (semantic-web@w3.org from March 2014)

From: Martynas Jusevičius <martynas@graphity.org>
Date: Sun, 9 Mar 2014 22:12:33 +0100
To: "Timothy W. Cook" <tim@mlhim.org>
Cc: semantic-web <semantic-web@w3.org>, Michael Brunnbauer <brunni@netestate.de>
Message-ID: <CAE35Vmzok-q=3o0ZitF7bENPWPYC3jaxMuOm_5RNMrNxaBA3HA@mail.gmail.com>
Hey all,
Regarding RDF validation - I guess you all know about SPIN constraints,
right? They're SPARQL-based.
http://spinrdf.org/spin.html#spin-constraints
Martynas
graphityhq.com
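
To make the idea of a query-based constraint a bit more concrete: a constraint can be thought of as a query that returns the nodes violating it. The following is only a rough plain-Python sketch of that idea, not SPIN or SPARQL. The tuple-based triple layout, the `violations` helper, the `ex2:bp_410` instance and the rule "every v2:SystolicBP needs exactly one v2:units" are all assumptions made up for illustration, borrowing the v2 vocabulary from the blood pressure example quoted later in this thread.

```python
# Triples are plain (subject, predicate, object) tuples; all names are illustrative.
GRAPH = {
    ("ex2:bp_409", "rdf:type", "v2:SystolicBP"),
    ("ex2:bp_409", "v2:pressure", 120),
    ("ex2:bp_409", "v2:units", "v2:mmHg"),
    ("ex2:bp_410", "rdf:type", "v2:SystolicBP"),  # hypothetical instance missing its units
    ("ex2:bp_410", "v2:pressure", 135),
}


def violations(graph, cls, required_property):
    """Return instances of cls that do not have exactly one value for required_property."""
    instances = {s for (s, p, o) in graph if p == "rdf:type" and o == cls}
    bad = set()
    for node in instances:
        values = {o for (s, p, o) in graph if s == node and p == required_property}
        if len(values) != 1:
            bad.add(node)
    return bad


if __name__ == "__main__":
    # Lists the nodes that break the (assumed) "exactly one v2:units" rule.
    print(sorted(violations(GRAPH, "v2:SystolicBP", "v2:units")))
```

In a real setup the rule would presumably live next to the class definition as a SPARQL query rather than in application code, which is what keeps it shareable.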
On Mar 9, 2014 10:03 PM, "Timothy W. Cook" <tim@mlhim.org> wrote:
> On Sun, Mar 9, 2014 at 11:48 AM, Michael Brunnbauer <brunni@netestate.de> wrote:
>
>>
>> Hello Timothy,
>>
>> MLHIM seems to be annotated data models - with optional RDF annotations.
>>
> Somewhat, but the models are restrictions of a common reference
> model. Each model represents a concept that is as broad or narrow as the
> modeller chooses. The annotations must be optional. It is up to the
> domain experts/knowledge modellers to determine the resultant quality.
>
>
>
>
>> The claims regarding interoperability and semantics are a bit
>> exaggerated, IMO.
>>
>>
>
> I suppose your opinion will change when you decide to put some study into
> the matter.
>
>
>
>> If we had something like annotated portable RDB schemas, would they carry
>> less
>> meaning and would applications built with them be less interoperable than
>> with
>> MLHIM?
>>
>>
> If you were able to share those concept models between applications and
> they were restrictions of a common reference model; then yes they would be
> the same.
>
>
>
>> In order to make applications completely interoperable and remove all
>> implicit semantics from their code, you have to abolish them - replacing
>> them
>> with some standard component. This is probably as futile as the
>> ontology/data
>> model to rule them all.
>>
>
> Further study will show that there are paths to operate along in the
> interim. But yes, the eventual goal would be for a common healthcare
> reference model.
>
>
>>
>> I agree that the proposition of XML Schema is alluring: The information
>> about
>> the data model used and how to validate the data is always present and the
>> tools for validation are already there.
>>
>> You did not use RDF because it has no standard way to do this - which is
>> unfortunate.
>>
>
> It is unfortunate. After working with the openEHR Foundation on
> multi-level modelling for a decade using a domain specific language it was
> an easy realization that a relatively small group of people could not
> create high quality tools needed for a DSL; in any reasonable amount of
> time.
> I began looking for alternatives. OWL and RDF would be my first choices
> for implementation. They just weren't and still aren't mature enough to do
> everything needed. Remember as I stated before; the MLHIM reference model
> is a conceptual information model. I chose XML because I did not see
> anything else with that capability and widespread adoption. I knew very little
> about XML Schema prior to this. So I did not choose it because it was my
> hammer already. I spent a lot of time on a long learning curve and had to
> wait for tools to catch up to XML Schema 1.1.
>
>>
>> You could have created a way and tools to do this in RDF. Did you fear the
>> necessary effort or the risk to adoption?
>>
>
> (see above)
> Given time, talent and money, openEHR could do it with the Archetype
> Definition Language. But it would never be as ubiquitous as XML.
>
>
>> It seems that XML Schema allows vocabulary reuse down to the
>> property/attribute
>> level - but the temptation to create own terms instead of reusing others
>> seems
>> to be greater than with RDF. Having some of the semantics in the XML
>> Schema
>> layer and more of it in the RDF layer on top of it definitely is a
>> drawback.
>>
>>
> There may be other/additional approaches that may help improve MLHIM. I
> am certainly open to and welcome dialog about it. The specifications (such
> that they are at this point) are openly available under a Creative Commons
> license. Feel free to join the discussion on social media (Google Plus
> preferred).
>
>
>
>> How many implementors will just ignore the optional RDF layer?
>>
>
> You must realize that software developers do not have control of the
> models in this approach. Domain experts that understand a little bit of
> how to use the CCD-Gen are the ones responsible for building the models.
> In the process of teaching them this activity, they are also taught the
> importance of the quality of their models, since it ultimately determines the
> quality of their data.
>
> The MLHIM eco-system allows for closed loop concept models (CCDs) to be
> developed as well as openly licensed CCDs. There may eventually be 10,000
> blood pressure CCDs in the open. But like most things, we predict that
> most people will reuse a model that is good and openly available, instead
> of building their own.
>
> I can't decide for the experts nor do I want to control what is or is not
> a good model for any particular implementation. All I can do is offer them
> a real solution that is bottom up and under their control instead of slow
> moving international standards bodies that can't keep up with the changing
> science.
>
> Thanks for your feedback. Explaining MLHIM in words is always a learning
> experience for me.
>
> Regards,
> Tim
>
>
>
>
>
>>
>> Regards,
>>
>> Michael Brunnbauer
>>
>> On Sat, Mar 08, 2014 at 06:36:54PM -0300, Timothy W. Cook wrote:
>> > A very interesting and I think, foundational discussion. David, thanks
>> for
>> > bringing it up.
>> > Below is a discussion of why I believe that RDF should be considered a
>> > layer over data models or maybe as 'semantic glue'.
>> >
>> > David, we are working on the same type of problem but from slightly
>> > different perspectives. The presentation that you linked to
>> re:KnowMED, is
>> > very important and I recall seeing it before. I'll take this
>> opportunity
>> > to comment on it since it is in the context of this discussion. It
>> > indicates that you propose RDF as a language to be used in the exchange
>> of
>> > healthcare data. Then on slide #5 you say it isn't enough to 'get us
>> > there'. So I am not sure how much of this is marketing swagger and how
>> > much is hard fact.
>> >
>> > On slide #8 item #2 we are 100% in agreement. But then on slide #9 you
>> > are mixing apples and oranges. XML and RDF have two different purposes
>> > that work well together.
>> >
>> > On further slides, your Blue, Green and Red customers exactly indicate
>> > what I mean by RDF being an essential layer on top of multiple models.
>> >
>> > What happens further in the presentation is where we disagree. You
>> assert
>> > that RDF should be the language used to actually 'exchange' data. This is
>> > where RDF and the tools around it (AFAIK) are not mature enough to
>> perform.
>> > Several times you have mentioned 'semantics and not syntax'. This is a
>> > huge mistake. You must have both in order to ensure data quality and
>> > meaning. Secondly we know from history that top-down consensus in
>> > healthcare concept modelling is an impossibility.[1]
>> >
>> > In your post describing the BP screenshot you said:
>> > "Thus, although ex1:bp_023 and ex2:bp_409 capture the same blood
>> pressure
>> > information, they represent that information differently. Nonetheless,
>> > both representations can peacefully coexist in the same merged RDF data
>> > without conflict, which might happen, for example, if one is derived
>> from
>> > the other through inference."
>> > I take this to mean that you are representing the exact same BP
>> measurement
>> > data in two different ways? Your use case, 'by inference' is a little
>> > fuzzy for me. If it is derivation by inference, it will just be an in
>> > memory representation and not persisted; correct? Regardless, the
>> > existence of the same data instance, in the same application is in
>> complete
>> > contradiction to good data quality management. As you go on to explain,
>> > now you must add application intelligence to analyze whether or not two
>> > data instances are the same or not to avoid counting them as two
>> separate
>> > instances. This approach is very dangerous, in addition to adding
>> > complexity and cost to the applications. However, having the ability
>> to
>> > determine if two different data instances exactly match the same
>> concept is
>> > essential. Minor differences such as the position of the patient
>> (sitting
>> > or prone) or the type of instrument used to perform the measurement or
>> the
>> > location on the body (left upper arm or right thigh, etc.) that the
>> > measurement was taken are all important. They may or may not rule in or
>> > out specific measurements, based on the intended use of the query
>> results.
>> > This is where RDF is essential, do these two instances point to exactly
>> > the same code in a controlled vocabulary, etc.? These questions are
>> > essential to having the ability to perform machine based reasoning over
>> the
>> > data repository; whether at the point of care or for research purposes.
>> >
>> > Referring back for a moment, to 'the same data instance' situation. It
>> is
>> > essential to have additional information (meta-data) to determine if two
>> > instances are exactly the same. This can legitimately occur during
>> > aggregation for research or systemic quality analysis. Unique patient
>> > identifiers along with datetime stamps are ideal. However, the patient
>> > identifier issue is an ongoing problem that is actually implementation
>> > context and application specific. It is outside of the context of data
>> > quality and management.
>> >
>> > Slide #22 clearly indicates that there is an expectation that RDF is
>> used
>> > as a common format. However, as I said earlier, the current
>> implementation
>> > of RDF is not robust enough to perform this function, UNLESS, there is a
>> > global expert consensus on all healthcare concepts so that models may be
>> > created and distributed from a central authority. This is simply
>> > unrealistic as history has shown and is formalized in the Cavalini-Cook
>> > theory [1].
>> >
>> > The reason that I state that RDF is not capable, at this point of
>> maturity,
>> > is that it doesn't support the ability to represent syntactic
>> structures in
>> > a multi-level model environment. IOW: There is no ability (AFAIK) to
>> > express a common reference model and then derive concept models that
>> issue
>> > further constraints. A multi-level model approach is essential in
>> order to
>> > abstract the syntax and semantics of each concept out of the application
>> > source code and repository schemas so that they can be shared between
>> > disparate applications. This is what provides for full syntactic and
>> > semantic interoperability.
>> >
>> > A multi-level model approach may or may not be useful in many domains.
>> > Specifically, human engineered domains that we fully understand can be
>> > modeled as one level representations. However, biological domains that
>> > involve evolutionary complexity are quite different. Primarily because
>> we
>> > do not fully understand them so our science and understanding is
>> constantly
>> > changing. Additionally, it appears that the data has a much longer
>> > lifetime of significance than other domains. Therefore the data should
>> be
>> > initially captured and represented in a manner that makes it as future
>> > proof and reusable as possible. In healthcare, the most semantically
>> rich
>> > point of any information is at the point of care. Every point of
>> > transition/translation after that will most assuredly lose context. As
>> a
>> > brief example; reference ranges for conditions change over time. It is
>> > essential that data captured today be expressed in the context of
>> today's
>> > knowledge, even 20 or more years from now. The concept model around
>> high
>> > blood pressure is different than it was 10 years ago.
>> >
>> > Where RDF shines is that in a syntactic model of a concept designed to
>> > capture reference ranges and other metadata, it can be used to provide
>> > external semantic context to that model. Whether that context exists
>> in a
>> > controlled vocabulary or even free text documents such as clinical
>> > guidelines.
>> >
>> > In the Multi-Level Healthcare Information Modelling (MLHIM) approach we
>> > developed a conceptual reference model to provide a basis for software
>> > implementations. While the MLHIM model doesn't preclude other
>> > serializations, we found that XML Schema 1.1 does provide the
>> prerequisites
>> > for implementing both a reference model and concept models. This
>> means
>> > that we can have full validation of instance data back to the W3C
>> > specifications. The concept models are marked up (via XML Schema 1.1
>> > annotations) with RDF, which provides the computable semantic links for each
>> > model as defined by the modeller. These models can now be created by
>> > domain experts (with additional knowledge modelling training) so that
>> > software developers do not have to interpret the meanings.
>> >
>> > The concept models are now fully detached from any specific
>> implementation
>> > and can be shared to use for validating instance data in the context in
>> > which it was recorded. I believe that this is the closest we have to
>> > semantic interoperability, to date. I am of course open for discussion
>> and
>> > debate on the issue. I used the acronym 'AFAIK' a few times above. I
>> used
>> > this because my last serious attempt to use RDF for this purpose was in
>> > 2010/2011. I know that there is a continuous maturing process going
>> on. I
>> > believe that there may come a day when RDF and OWL can be used
>> exclusively
>> > for syntactic and semantic representation and reasoning. But AFAIK, not
>> > today.
>> >
>> > We have a significant number of peer-reviewed publications about MLHIM
>> and
>> > academic as well as other implementations. I am happy to share those
>> with
>> > the group or you may peruse the links in my signature line as well as
>> > www.mlhim.org and the specs are openly downloadable from here[2] as a
>> > package and as source from here [3].
>> >
>> > We also have almost 2000 datatypes converted from other modeling
>> > approaches (such as the NIH CDE browser and HL7 FHIR) into reusable
>> > complexTypes to be used in concept models. You can review those as
>> well as
>> > download some example concept models from here[4]. Free registration is
>> > required to download the models.
>> >
>> > Kind Regards,
>> > Tim
>> >
>> >
>> > [1]
>> >
>> https://github.com/mlhim/specs/blob/2_4_3/graphics/cavalini_cook_theory.png
>> > [2]
>> >
>> https://launchpad.net/mlhim-specs/2.0/2.4.3/+download/mlhim-specs-2013-10-15-2.4.3-Release.zip
>> > [3] https://github.com/mlhim/
>> > [4] http://www.ccdgen.com
>> >
>> >
>> >
>> >
>> > On Fri, Mar 7, 2014 at 5:00 PM, David Booth <david@dbooth.org> wrote:
>> >
>> > > Hi Alan,
>> > >
>> > >
>> > > On 03/07/2014 12:44 PM, Alan Ruttenberg wrote:
>> > >
>> > >> Can you explain what you mean by "RDF's ability to allow multiple
>> data
>> > >> models to peacefully coexist, interconnected, in the same data" ?
>> > >>
>> > >
>> > > Yes. Here is an imprecise illustration, on slides 10-17:
>> > >
>> http://dbooth.org/2013/semtech/slides/03-DavidBooth-rdf-as-universal.pdf
>> > > (I took some artistic liberties blurring class/instance distinctions
>> in
>> > > that diagram.)
>> > >
>> > > And here is a more precise example that cleanly distinguishes classes
>> from
>> > > instances:
>> > > http://tinyurl.com/pzsgf7f
>> > > (I've also attached the same illustration, for offline readers.)
>> > >
>> > > In this latter example (of a hypothetical systolic blood pressure
>> > > measurement), the same information is represented according to two
>> > > different models/schemas/vocabularies/ontologies, v1 (green) and v2
>> > > (red). (I am using the terms model, schema, vocabulary and ontology
>> > > loosely and somewhat interchangeably here.)
>> > >
>> > > In the v1 model, the systolic blood pressure is indicated in RDF like
>> this:
>> > >
>> > > ex:patient319 foaf:name "John Doe" ;
>> > > v1:bps ex1:bp_023 .
>> > >
>> > > ex1:bp_023 a v1:SystolicBPSitting_mmHg ;
>> > > v1:value 120 .
>> > >
>> > > Whereas in the v2 model, the same information is represented
>> differently,
>> > > in RDF like this:
>> > >
>> > > ex:patient319 foaf:name "John Doe" ;
>> > > v2:bps ex2:bp_409 .
>> > >
>> > > ex2:bp_409 a v2:SystolicBP ;
>> > > v2:pressure 120 ;
>> > > v2:units v2:mmHg ;
>> > > v2:bodyPosition v2:sitting .
>> > >
>> > > Thus, although ex1:bp_023 and ex2:bp_409 capture the same blood
>> pressure
>> > > information, they represent that information differently.
>> Nonetheless,
>> > > both representations can peacefully coexist in the same merged RDF
>> data
>> > > without conflict, which might happen, for example, if one is derived
>> from
>> > > the other through inference.
>> > >
>> > > Furthermore, the relationship between these classes,
>> > > v1:SystolicBPSitting_mmHg and v2:SystolicBP, and hence the
>> relationship
>> > > between the corresponding v1 and v2 instance data, can also be
>> explicitly
>> > > captured in RDF, as the v1v2:SystolicBP_Transform (yellow)
>> relationship:
>> > >
>> > > v1:SystolicBPSitting_mmHg v1v2:SystolicBP_Transform v2:SystolicBP .
>> > >
>> > > Inference rules for v1v2:SystolicBP_Transform could therefore convert
>> a
>> > > v1:SystolicBPSitting_mmHg measurement to a v2:SystolicBP measurement
>> or
>> > > vice versa.
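
As a rough illustration of what such an inference rule might do in the lossless case described above, here is a minimal plain-Python sketch. It derives v2 triples from the v1 triples of the example and merges both into one set. It is not the v1v2:SystolicBP_Transform rule itself, whose definition is not given in this thread; the tuple-based triple layout, the `v1_to_v2` and `mint_id` helpers and the `ex2:derived_` naming scheme are assumptions for illustration only.

```python
# The v1 data from the example, as (subject, predicate, object) tuples.
V1_GRAPH = {
    ("ex:patient319", "foaf:name", "John Doe"),
    ("ex:patient319", "v1:bps", "ex1:bp_023"),
    ("ex1:bp_023", "rdf:type", "v1:SystolicBPSitting_mmHg"),
    ("ex1:bp_023", "v1:value", 120),
}


def mint_id(v1_node):
    """Hypothetical naming scheme for derived v2 nodes."""
    return v1_node.replace("ex1:", "ex2:derived_")


def v1_to_v2(graph):
    """Derive v2 triples for every v1:SystolicBPSitting_mmHg instance in graph."""
    derived = set()
    for (s, p, o) in graph:
        if p == "rdf:type" and o == "v1:SystolicBPSitting_mmHg":
            new = mint_id(s)
            derived.add((new, "rdf:type", "v2:SystolicBP"))
            # The v1 class name encodes units and body position; v2 states them explicitly.
            derived.add((new, "v2:units", "v2:mmHg"))
            derived.add((new, "v2:bodyPosition", "v2:sitting"))
            for (s2, p2, o2) in graph:
                if s2 == s and p2 == "v1:value":
                    derived.add((new, "v2:pressure", o2))
                if p2 == "v1:bps" and o2 == s:
                    derived.add((s2, "v2:bps", new))
    return derived


# Both representations sit side by side in the merged set of triples.
merged = V1_GRAPH | v1_to_v2(V1_GRAPH)

if __name__ == "__main__":
    for triple in sorted(merged, key=str):
        print(triple)
```

The reverse direction would need to check that units and body position match what the v1 class name implies, which is where lossy cases start to appear.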
>> > >
>> > > This example only illustrated the case where the transformation from
>> one
>> > > model to the other is lossless and thus reversible. Usually that
>> isn't the
>> > > case. Relating models and transforming between them is *not* easy,
>> but at
>> > > least RDF makes it possible to explicitly indicate these
>> relationships.
>> > >
>> > > Obviously some intelligence must be exercised to avoid, for example,
>> > > accidentally thinking that ex1:bp_023 and ex2:bp_409 represent two
>> distinct
>> > > blood pressure measurements, and thereby double counting them, but
>> that's
>> > > easy enough to do.
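
One way the double-counting check could look, sketched in plain Python under a strong assumption: the step that derives one representation from the other also records an explicit provenance link. The sketch uses prov:wasDerivedFrom from the W3C PROV vocabulary for that link; nothing in this thread says the data actually carries it, and the `distinct_measurements` helper and tuple layout are likewise illustrative.

```python
BP_CLASSES = {"v1:SystolicBPSitting_mmHg", "v2:SystolicBP"}

# Merged data holding both representations plus an assumed provenance link.
MERGED = {
    ("ex1:bp_023", "rdf:type", "v1:SystolicBPSitting_mmHg"),
    ("ex1:bp_023", "v1:value", 120),
    ("ex2:bp_409", "rdf:type", "v2:SystolicBP"),
    ("ex2:bp_409", "v2:pressure", 120),
    ("ex2:bp_409", "prov:wasDerivedFrom", "ex1:bp_023"),  # assumption: link is recorded
}


def distinct_measurements(graph):
    """Return BP nodes, skipping any node derived from another BP node in graph."""
    nodes = {s for (s, p, o) in graph if p == "rdf:type" and o in BP_CLASSES}
    derived = {
        s for (s, p, o) in graph
        if p == "prov:wasDerivedFrom" and s in nodes and o in nodes
    }
    return nodes - derived


if __name__ == "__main__":
    print(len(distinct_measurements(MERGED)))
```

Without such a link the check falls back to comparing patient, timestamp and value, which is considerably less reliable.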
>> > >
>> > > Also, there isn't always a desire to relate or transform between
>> models.
>> > > Sometimes some data is related and other data is not, and it is all
>> still
>> > > merged into the same RDF graph. In fact, the point may be to connect
>> that
>> > > part of the data that *is* related and let the rest coexist without
>> being
>> > > connected (or at least not *directly* connected).
>> > >
>> > > The point is that these data models can peacefully coexist in RDF data
>> > > without conflict: applications using the v1 model against the merged
>> data
>> > > might only see v1 instance data, whereas applications using the v2
>> model
>> > > might only see the v2 data. That's qualitatively different than in
>> the
>> > > world of XML, for example, where one schema generally wants to be "on
>> top",
>> > > and when you merge XML of different schemas, you need to create a new
>> "top"
>> > > schema. That is the difference that I have so often tried to explain
>> to
>> > > people outside the RDF community, and what I am trying to capture
>> > > succinctly in a term or phrase. It isn't an easy idea to convey to
>> those
>> > > who are accustomed to a schema-centric approach. I think a catchy but
>> > > descriptive term or phrase could help.
>> > >
>> > > Thanks,
>> > > David
>> > >
>> > >
>> > >> -Alan
>> > >>
>> > >>
>> > >> On Fri, Mar 7, 2014 at 11:20 AM, David Booth <david@dbooth.org
>> > >> <mailto:david@dbooth.org>> wrote:
>> > >>
>> > >> I -- and I'm sure many others -- have struggled for years trying
>> to
>> > >> succinctly describe RDF's ability to allow multiple data models
>> to
>> > >> peacefully coexist, interconnected, in the same data. For data
>> > >> integration, this is a key strength of RDF that distinguishes it
>> > >> from other information representation languages such as XML. I
>> > >> have tried various terms over the years -- most recently "schema
>> > >> promiscuous" -- but have not yet found one that I think really
>> nails
>> > >> it, so I would love to get other people's thoughts.
>> > >>
>> > >> This google doc lists several candidate terms, some pros and
>> cons,
>> > >> and allows you to indicate which ones you like best:
>> > >> http://goo.gl/zrXQgj
>> > >>
>> > >> Please have a look and indicate your favorite(s). You may also
>> add
>> > >> more ideas and comments to it. The document can be edited by
>> anyone
>> > >> with the URL.
>> > >>
>> > >> Thanks!
>> > >> David Booth
>> > >>
>> > >>
>> > >>
>> >
>> >
>> > --
>> > MLHIM VIP Signup: http://goo.gl/22B0U
>> > ============================================
>> > Timothy Cook, MSc +55 21 994711995
>> > MLHIM http://www.mlhim.org
>> > Like Us on FB: https://www.facebook.com/mlhim2
>> > Circle us on G+: http://goo.gl/44EV5
>> > Google Scholar: http://goo.gl/MMZ1o
>> > LinkedIn Profile:http://www.linkedin.com/in/timothywaynecook
>>
>> --
>> ++ Michael Brunnbauer
>> ++ netEstate GmbH
>> ++ Geisenhausener Straße 11a
>> ++ 81379 München
>> ++ Tel +49 89 32 19 77 80
>> ++ Fax +49 89 32 19 77 89
>> ++ E-Mail brunni@netestate.de
>> ++ http://www.netestate.de/
>> ++
>> ++ Sitz: München, HRB Nr.142452 (Handelsregister B München)
>> ++ USt-IdNr. DE221033342
>> ++ Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
>> ++ Prokurist: Dipl. Kfm. (Univ.) Markus Hendel
>>
>
>
>
> --
> MLHIM VIP Signup: http://goo.gl/22B0U
> ============================================
> Timothy Cook, MSc +55 21 994711995
> MLHIM http://www.mlhim.org
> Like Us on FB: https://www.facebook.com/mlhim2
> Circle us on G+: http://goo.gl/44EV5
> Google Scholar: http://goo.gl/MMZ1o
> LinkedIn Profile:http://www.linkedin.com/in/timothywaynecook
>
Received on Sunday, 9 March 2014 21:13:02 UTC