Showing posts with label ontology design patterns. Show all posts
Monday, March 5, 2018
Using Ontology Design Patterns as Templates for Alignment
The paper "When owl:sameAs isn't the Same" makes an interesting observation that I often have to address when working with my customers:
Contrary to popular belief in some circles, formal semantics are not a silver bullet. Just because a construct in a knowledge representation language is prescribed a behavior using formal semantics does not necessarily mean that people will follow those semantics when actually using that language “in the wild.” This can be laid down to a wide variety of reasons. In particular, the language may not provide the facilities needed by people as they actually try to encode knowledge, so they may use a construct that seems close enough to their desired one. A combination of not reading specifications - especially formal semantics, which even most software developers and engineers lack training in - and the labeling of constructs with “English-like” mnemonics naturally will lead to the use of a knowledge representation language by actual users that varies from what its designers intended.
A variant of this subject also came up in the opening keynote address (From Artwork to Cyber Attacks) at the U.S. Semantic Technologies Symposium 2018, held last week at Wright State University. Craig Knoblock (USC Information Sciences Institute) gave the keynote and said that mapping the information in the American Art Collaborative Project (slides 32-38) was difficult because different students applied different interpretations to the backing ontology. Although all of the students who mapped the data used the CIDOC CRM cultural heritage ontology, much clean-up and coordination was required at the end of the day. Slide 35 reviews the statistics: although only 76 files were mapped, 4636 commits were required to achieve consistency!
So, clearly there is a problem. To solve it, I have started applying Ontology Design Patterns (ODPs) to define a consistent interpretation and usage of one or more ontologies for an application. I have found that ODPs are not just for ontology development and reuse!
The list below shows the information that I include in an ODP to inform people/teams that are using ontologies to define consistent semantics for their data:
- Description of why the pattern is needed and what problems are being solved - this includes:
  - A short statement describing what the pattern is used for and/or what is being mapped
  - The considerations that influenced the pattern or required the pattern to be created - for example:
    - When defining whole-part relationships, how deep should the mereology hierarchy go?
    - What is needed to identify "the same" individuals?
    - How should the most appropriate type(s) for an individual be determined?
    - What metadata is needed?
  - Design sources for the ODP (this is not needed if the source is a single ontology)
- High level (block) diagram of the main concepts and relationships in the ODP
- Overview of each of the concepts/classes and relationships shown in the block diagram, as well as important sub-classes and sub-properties
  - Include in the overview, or in another section of the ODP, any differences in approaches - for example, why a detailed class hierarchy might be used in one place but not in another
- Class and property diagrams for the classes and properties in the overview (such as those generated by OntoGraph)
- Information on using the ODP
  - One or more examples illustrating the use of the ODP, along with instance/individual diagrams
  - A list of competency questions with SPARQL queries representing the questions
  - The results of applying the queries to the examples generated above (this step also provides a set of test cases to gauge conformance)
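The last two items (competency questions, their queries, and the expected results as conformance tests) can be tried out even without a triple store. Below is a minimal, hypothetical Python sketch: a tiny basic-graph-pattern matcher stands in for a real SPARQL engine, and the ex: whole-part names are invented for illustration. With a real RDF library, the same idea would be a SPARQL query plus a stored expected result.

```python
# Hypothetical sketch: competency questions (CQs) as executable conformance tests.
# A tiny basic-graph-pattern matcher stands in for a SPARQL engine; terms that
# start with "?" are variables. All ex: names are invented for illustration.

def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                if candidate.setdefault(term, value) != value:
                    break  # variable already bound to a different value
            elif term != value:
                break  # constant term does not match
        else:
            yield from match(rest, triples, candidate)


def answer(patterns, triples, variable):
    """Sorted, de-duplicated values of one variable (like SELECT DISTINCT)."""
    return sorted({b[variable] for b in match(patterns, triples)})


# Example individuals for a hypothetical whole-part ODP.
EXAMPLE = {
    ("ex:car1", "rdf:type", "ex:Car"),
    ("ex:car1", "ex:hasPart", "ex:engine1"),
    ("ex:car1", "ex:hasPart", "ex:wheel1"),
    ("ex:engine1", "rdf:type", "ex:Engine"),
    ("ex:wheel1", "rdf:type", "ex:Wheel"),
}

# CQ: "Which parts of a car are engines?"
# Roughly: SELECT DISTINCT ?part WHERE { ?car a ex:Car ; ex:hasPart ?part . ?part a ex:Engine }
CQ_ENGINE_PARTS = [
    ("?car", "rdf:type", "ex:Car"),
    ("?car", "ex:hasPart", "?part"),
    ("?part", "rdf:type", "ex:Engine"),
]

# The expected answer is kept with the CQ; re-running the CQ against newly
# mapped data checks that the data conforms to the pattern.
EXPECTED = ["ex:engine1"]
assert answer(CQ_ENGINE_PARTS, EXAMPLE, "?part") == EXPECTED
```

Keeping the expected answers next to each question is what turns the examples into regression tests when a new team maps its data to the ODP.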
P.S. If you aren't sure what competency questions are, please see the blog post, Breaking Down the "Documents and Policies" Project.
Andrea
Friday, December 15, 2017
Referencing Reused Classes and Properties When Working with Other Ontologies
While I am still toiling away on re-working OntoGraph to support diagramming RDF/RDFS (yes, it seems to be a major undertaking!), I thought that I would post a question that I received. Here it is ... "When reusing a bunch of different ontologies in a new ontology, how should reused classes and properties be referenced?" Should each of the reused ontologies be included "en masse", should individual entities be reused directly, should entities be redefined in the new ontology but using their original namespace, or should the entities be recreated? Unfortunately, this is a question that has no right answer, but I have some preferences.
First, let me explain the alternatives:
- Included "en masse" means using import statements for each re-used ontology, and then referencing the specific entities (classes and properties) that are actually needed. Everything is referenced in the namespace where it was defined, and nothing is redefined or recreated.
- Reusing a class or property directly means referencing that class or property without importing the entire ontology. Everything is referenced using the namespace where it was defined, and nothing is redefined or recreated. But you might end up with triples that look like this: myNamespace:someKindOfDate a owl:DatatypeProperty ; rdfs:subPropertyOf dcterms:date . And it is up to the infrastructure to resolve the "dcterms" (Dublin Core) namespace to get the details of the date property.
- Redefining entities means that you take the classes or properties that should be reused and include their definitions in your ontology. So, if you are using the Dublin Core "creator" concept, you would include a definition for dcterms:creator. You might even add more information, as new predicates/objects defined for the entity, or simply copy over the existing predicates. Why might you do this? One reason is to have all the necessary details in one place. But, just as having multiple copies of the same code is considered bad practice in programming, I believe that copying and pasting another ontology's definition (using the same IRI/URI) is also wrong. You could end up with duplicated or, worse, divergent or out-of-date declarations.
- Recreating entities is similar to redefining them, but different in some important ways. In this case, you create a semantically equivalent entity. Using the example above, a myNamespace:author entity might be created and the relevant details defined for it. In addition, you define an equivalentClass/Property declaration, linking it to its source (in this case, dcterms:creator). Taking this approach, if dcterms:creator means something different in a future version, the equivalentProperty statement can be removed. Or, if a new metadata standard is dictated by your company or customer, you simply add another newMetadataNamespace:author equivalentProperty declaration.
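To make the "recreate" option concrete, here is a hypothetical Python sketch. It treats owl:equivalentProperty as symmetric and transitive (as an OWL reasoner would) and expands a query over those links. The myNamespace:, newMetadataNamespace: and ex: terms are illustrative assumptions, and a real deployment would rely on a reasoner or a SPARQL property path rather than this hand-rolled closure.

```python
# Hypothetical sketch of the "recreate" option: a local property is linked to
# its source with owl:equivalentProperty, and lookups follow that link.
# myNamespace:, newMetadataNamespace: and ex: terms are invented examples.

EQUIV = "owl:equivalentProperty"


def equivalents(prop, triples):
    """prop plus every property reachable via owl:equivalentProperty, either direction."""
    seen, stack = {prop}, [prop]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if p != EQUIV:
                continue
            for a, b in ((s, o), (o, s)):  # equivalence is symmetric
                if a == current and b not in seen:
                    seen.add(b)
                    stack.append(b)
    return seen


def values(subject, prop, triples):
    """Objects of subject for prop or any property equivalent to it."""
    props = equivalents(prop, triples)
    return sorted(o for s, p, o in triples if s == subject and p in props)


ontology = {
    ("myNamespace:author", "rdf:type", "owl:ObjectProperty"),
    ("myNamespace:author", EQUIV, "dcterms:creator"),
}
data = {("ex:report1", "myNamespace:author", "ex:person1")}

# A consumer that only knows Dublin Core still finds the author.
assert values("ex:report1", "dcterms:creator", ontology | data) == ["ex:person1"]

# If dcterms:creator changes meaning, remove one mapping triple; data is untouched.
ontology.discard(("myNamespace:author", EQUIV, "dcterms:creator"))
assert values("ex:report1", "dcterms:creator", ontology | data) == []

# Adopting a new metadata standard is one more mapping triple.
ontology.add(("myNamespace:author", EQUIV, "newMetadataNamespace:author"))
assert values("ex:report1", "newMetadataNamespace:author", ontology | data) == ["ex:person1"]
```

The instance data never mentions dcterms or the new standard; only the mapping triples change.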
A namespace exists to establish the provenance of the entities defined within it, and to identify that the entities are related. Ontologies should have loose coupling and tight cohesion, just like code - and the namespace can (should?) indicate the purpose/domain-space of the ontology. You can certainly group everything under the umbrella of a namespace that represents "my overall application space" - but that seems a bit too broad. Also, you might have another application in the future where you re-use one or more of your own ontologies - and then, one might question the "my overall application space" namespace, or question which entities in that namespace are relevant to the new application.
Also, a namespace helps to disambiguate entities that might have the same name - but not necessarily the same semantics (or detail of semantics) - across different ontologies. For example, a Location entity in an Event ontology (or more correctly, ontology design pattern, ODP) should not go into detail about Locations (that is not the purpose of the ontology). Defining locations would be better served by other ontologies that specifically deal with network, spatial-temporal, latitude-longitude-altitude and/or other kinds of locations. So, an under-defined Location in an Event ODP can then link - as an equivalent class - to the more detailed location declarations in other "Location"-specific ODPs. In this way, you get loose coupling and tight cohesion. You can pull out one network location ODP and replace it with a better one - without affecting the Event ODP. In this case, you would only change the equivalentClass definition. :-)
As for re-creating entities in the ODP namespace, that is really done for convenience. I can actually argue both sides of this issue (keeping the entities with their namespaces/provenance versus recreating them). But, erring on the side of simplicity, I recommend recreating entities in the new ontology's namespace (the last bullet above). This is especially relevant if only a portion of several existing ontologies/namespaces will be re-used. Why import large ontologies when you only need a handful of classes and properties? This can confuse your users and developers as to what is really relevant. Plus, you will have new entities/properties/axioms being defined in your new ontology. If you do not recreate entities, you end up with lots of different namespaces, and this translates to lots of different namespaces in your individuals. Your users and developers can become overwhelmed keeping track of which concept comes from which namespace.
For example, you may take document details from the SPAR DoCo ontology (http://www.sparontologies.net/ontologies/doco/source.ttl) and augment them with data from the Dublin Core (http://dublincore.org/2012/06/14/dcterms.rdf) and PRISM (http://prismstandard.org/namespaces/basic/2.0/) vocabularies, and then add details from the PROV-O ontology (http://www.w3.org/ns/prov-o). All of these classes and properties use different namespaces, and it becomes hard to remember which is which. For example, "foo" is an instance of the doco:document class and uses the dcterms:publisher and prism:doi properties, but is linked to a revision using a prov:wasDerivedFrom property. This could lead to errors in creating and querying the instances. It seems easier to say "foo" is an instance of the myData:document class, and uses the predicates myData:author, myData:publisher, myData:doi and myData:derivedFrom (where "myData" is the namespace of the ODP for tracking document details).
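Because each recreated property is just a local name plus a link to its source, the myData declarations can be generated from a small table. The hypothetical Python sketch below emits Turtle for the four myData properties in the example; the myData namespace IRI and the labels are assumptions, the source terms are the ones named above, and only the properties (not the class) are shown.

```python
# Hypothetical sketch: generate Turtle for the "myData" ODP properties, each
# linked to its source term with owl:equivalentProperty.
# The myData namespace IRI and the labels are invented for illustration.

PREFIXES = {
    "myData": "http://example.org/myData#",  # hypothetical namespace
    "dcterms": "http://purl.org/dc/terms/",
    "prism": "http://prismstandard.org/namespaces/basic/2.0/",
    "prov": "http://www.w3.org/ns/prov#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

# Local property name -> (source term, human-readable label)
MAPPINGS = {
    "author": ("dcterms:creator", "author"),
    "publisher": ("dcterms:publisher", "publisher"),
    "doi": ("prism:doi", "DOI"),
    "derivedFrom": ("prov:wasDerivedFrom", "derived from"),
}


def to_turtle(prefixes, mappings):
    """Return Turtle text declaring each local property and its source link."""
    lines = [f"@prefix {p}: <{iri}> ." for p, iri in prefixes.items()]
    for local, (source, label) in mappings.items():
        lines.append("")
        lines.append(f"myData:{local}")
        lines.append(f'    rdfs:label "{label}" ;')
        lines.append(f"    owl:equivalentProperty {source} .")
    return "\n".join(lines)


print(to_turtle(PREFIXES, MAPPINGS))
```

Swapping in a different source vocabulary then means editing one row of the table, not the instance data.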
I know that some might disagree (or might agree!). If so, let me know.
Andrea
Labels:
namespaces,
ontology design patterns,
reuse
Saturday, April 5, 2014
Ontology Reuse and Ontology Summit 2014
I've been doing a lot of thinking about ontology and vocabulary reuse (given my role as co-champion of Track A in Ontology Summit 2014). We are finally in our "synthesis" phase of the Summit, and I just updated our track's synthesis draft yesterday.
So, while this is all fresh in my mind, I want to highlight a few key take-aways ... For an ontology to be reused, it must provide something "that is commonly needed"; and then, the ontology must be found by someone looking to reuse it, understood by that person, and trusted as regards its quality. (Sam Adams made all these points in 1993 in a panel discussion on software reuse.) To be understood and trusted, it must be documented far more completely than is (usually) currently done.
Here are some of the suggestions for documentation:
- Fully describe and define each of the concepts, relationships, axioms and rules that make up the ontology (or fragment)
- Explain why the ontology was developed
- Explain how the ontology is to be used (and perhaps how the uses may vary with different triple stores or tools)
- Explain how the ontology was/is being used (history) and how it was tested in those environment(s)
- Explain differences, if it is possible to use the ontology in different ways in different domains and/or for different purposes
- Provide valid encoding(s) of the ontology
  - The documentation should discuss how each encoding has evolved over time
  - "Valid" means that there are no consistency errors when a reasoner is run against the ontology
  - It is also valuable to create a few individuals, run a reasoner, and make sure that each individual's inferred types are correct (e.g., an individual that is supposed to be only of type "ABC" is not also of type "DEF" and "XYZ")
  - Multiple encodings may exist due to the use of different syntaxes (Turtle and OWL Functional Syntax, for example, to provide better readability and better version control, respectively) and to separate the content specifically, providing:
    - A "basic" version of the ontology with only the definitive concepts, axioms and properties
    - Other ontologies that add properties and axioms, perhaps to address particular domains
    - Rules that apply to the ontology, in general or for particular domains
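The "create a few individuals and check their types" step can be approximated for simple class hierarchies. This hypothetical Python sketch computes only the rdfs:subClassOf closure (a real check would run an OWL reasoner such as HermiT or Pellet over the ontology); ABC, DEF and XYZ are the placeholder names from the list, with an invented ex: prefix.

```python
# Hypothetical sketch of the individual type check: compute inferred types from
# the rdfs:subClassOf closure only (no full OWL reasoning).
# ex:ABC, ex:DEF and ex:XYZ are placeholder classes.

def superclasses(cls, subclass_of):
    """All classes reachable from cls via subClassOf, including cls itself."""
    result, stack = {cls}, [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def inferred_types(individual, asserted, subclass_of):
    """Union of the superclass closures of every asserted type."""
    types = set()
    for cls in asserted.get(individual, ()):
        types |= superclasses(cls, subclass_of)
    return types


SUBCLASS_OF = {
    "ex:ABC": ["owl:Thing"],
    "ex:DEF": ["owl:Thing"],
    "ex:XYZ": ["ex:DEF"],
}
ASSERTED = {"ex:individual1": ["ex:ABC"]}

types = inferred_types("ex:individual1", ASSERTED, SUBCLASS_OF)
assert types == {"ex:ABC", "owl:Thing"}
assert not types & {"ex:DEF", "ex:XYZ"}  # no unexpected types

# A modeling slip (ABC placed under DEF) is caught by the same check.
BUGGY = {**SUBCLASS_OF, "ex:ABC": ["ex:DEF"]}
leaked = inferred_types("ex:individual1", ASSERTED, BUGGY) & {"ex:DEF", "ex:XYZ"}
assert leaked == {"ex:DEF"}
```

Running such checks after every change to the ontology gives an early warning when an axiom pulls individuals into classes they should not belong to.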
VOCREF is a good start at specifying characteristics for an ontology. I will certainly continue to contribute to it. But, I also feel that too much content is contained in the vocref-top ontology (I did create an issue to address this). That makes it too top-heavy and not as reusable as I would like. Some of the content needs to be split into separate ontologies that can be reused independently of characterizing an ontology. Also, the VOCREF ontology needs to "dog-food" its own concepts, relationships, ... VOCREF itself needs to be more fully documented.
To try to help with ontology development and reuse, I decided to start a small catalog of content (I won't go so far as to call it a "repository"). The content in the catalog will vary from annotation properties that can provide basic documentation, to general concepts applicable to many domains (for example, a small event ontology), to content specific to a domain. The catalog may directly reference, document and (possibly) extend ontologies like VOCREF (with correct attribution), or may include content that is newly developed. For example, right now, I am working on some general patterns and a high level network management ontology. I will post my current work, and then drill-down to specific semantics.
All of the content will be posted on the Nine Points GitHub page. The content will be fully documented, and licensed under the MIT License (unless prohibited by the author and the licensing of the original content). In addition, for much of the content, I will also try to discuss the ontology here, on my blog.
Let me know if you have feedback on this approach and if there is some specific content that you would like to see!
Andrea
Labels:
ontology,
ontology design patterns,
Ontology Summit 2014,
reuse,
VOCREF