Saturday, July 26, 2014
Another OWL diagramming transform and some more thoughts on writing
First, I wrote a new XSL transform that outputs all NamedIndividuals specified in an ontology file. The purpose was to help with diagramming enumerations. (I made a simplifying assumption that you added individuals to a .owl file in order to create enumerated or exemplary individuals.) The transform is located on GitHub (check out http://purl.org/NinePts/graphing). And, details on how to use the transform (for example, with the graphical editor, yEd) are described in my post, Diagramming an RDF/XML ontology.
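To make that assumption concrete, here is a rough sketch (shown in Turtle for readability, although the transform itself reads RDF/XML) of the kind of enumerated individuals the transform picks up - all of the IRIs are invented for illustration:

@prefix :    <http://example.org/status#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# An enumeration modeled as a class plus explicitly named individuals
:OrderStatus a owl:Class .

:Pending   a owl:NamedIndividual , :OrderStatus .
:Shipped   a owl:NamedIndividual , :OrderStatus .
:Delivered a owl:NamedIndividual , :OrderStatus .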
If you don't want some individuals included, feel free to refine the transform, or just delete individuals after an initial layout with yEd.
Second, here are some more writing tips that build on the post, Words and writing .... Most of these I learned in high school (a very long time ago), as editor of the school paper. (And, yes, I still use them today.)
- My teacher taught us to vary the first letter of each paragraph, and to start paragraphs with interesting words (e.g., not "the", "this", "a", ...). Her point was that people get an impression of an article from glancing at the page, and the first words of the paragraphs make the most impression. If those words are boring, then the article seems boring. I don't know if this is true, but it seems like reasonable advice. Another good practice is to keep your paragraphs relatively short, so they don't seem overwhelming. (I try to keep mine under 5-6 sentences.) Also, each paragraph should have a clear focus and stick to it. A paragraph whose main subject wanders is difficult to read.
- Lastly, use a good opening sentence for each paragraph. It should establish the contents of the paragraph - setting it up for more details to come in the following sentences.
Andrea
Tuesday, May 20, 2014
Diagramming an RDF/XML OWL ontology
Recently, I was reading some emails on the Linked Data community distribution list about how they generate the LOD cloud diagram. OmniGraffle is used in the "official" workflow to create this diagram, but that tool costs money. One of the email replies discussed a different approach.
A gentleman from Freenet.de needed to draw a similar diagram for the data cloud for the Open Linguistics Working Group. His team could not use the same code and processing flow as the LOD cloud folks, since they didn't have many Mac users. So, they developed an alternative based on GraphML. To create the basic graph, they developed a Python script. And, ...
"Using yed's "organic" layout, a reasonable representation can be achieved which is then manually brought in shape with yed (positioning) and XML (font adjustment). In yed, we augment it with a legend and text and then export it into the graphic format of choice."

Given my propensity to "reuse" good ideas, I decided to investigate GraphML and yEd. And, since GraphML is XML, ontologies can be defined in RDF/XML, and XSLT can be used to transform XML definitions, I used XSLT to generate various GraphML renderings of an ontology file. Once the GraphML outputs were in place, I used yEd to do the layout, as the Freenet.de team did. (It is important to note that the basic yEd tool is free. And, layout is the most difficult piece of creating a graphic.)
So, what did I find? You can be the judge. The XSLTs are found on GitHub (check out http://purl.org/NinePts/graphing). There are four files in the graphing directory:
- AnnotationProperties.xsl - A transform of any annotation property definitions in an RDF/XML file, drawing them as rectangles connected to a central entity named "Annotation Properties".
- ClassHierarchies.xsl - A transform of any class definitions in an RDF/XML file, drawing them in a class-superclass hierarchy.
- ClassProperties.xsl - A transform of any datatype and object property definitions in an RDF/XML file, drawing them as rectangles with their types (functional, transitive, etc.) and domains and ranges.
- PropertyHierarchies.xsl - A transform of any datatype and object property definitions in an RDF/XML file, drawing their property-superproperty relationships.
For example, I ran the ClassProperties transform against the metadata-properties ontology file:

xsltproc -o result.graphml ../graphing/ClassProperties.xsl metadata-properties.owl

I then took the result.graphml and opened it in the yEd Graph Editor. (If you do the same, you will find that all the classes or properties lie on top of each other. I made no attempt at layout, since I planned to use yEd for that purpose.) For the class properties graph (from the above invocation), I used the Layout->Radial formatting, with the default settings. Here is the result:
I was impressed with how easy this was!
The really great thing is that if you don't like a layout, you can choose another format and even tweak the results. I did some tweaking for the "Property Hierarchies" diagram. In this case, I ran the PropertyHierarchies.xsl against the metadata-properties.owl file and used the Hierarchical Layout on the resulting GraphML file. Then, I selected all the data properties and moved them underneath the object properties. Here is the result:
Admittedly, the diagrams can get quite complex for a large ontology. But, you can easily change/combine/separate the XSLT transforms to include more or less content.
With about a day and a half's worth of work (and using standards and free tooling), I think that I saved myself many frustrating and boring hours of diagramming. Let me know if you find this useful, or if you have other suggestions for diagramming ontologies.
Andrea
Wednesday, May 7, 2014
Updated metadata ontology file (V0.6.0) and new metadata-properties ontology (V0.2.0) on GitHub
You can also see that there is a new addition to the metadata directory with the metadata-properties ontology. Metadata-properties takes some of the concepts from metadata-annotations, and redefines them as data and object properties. In addition, a few supporting classes are defined (specifically, Actor and Modification), where required to fully specify the semantics.
Actor is used as the subject of the object properties, contributedTo and created. Modification is designed to collect all the information related to a change or update to an individual. This is important when one wants to track the specifics of each change as a set of related data. This may not be important - for example, if one only wants to track the date of last modification or only track a description of each change. In these cases, the data property, dateLastModified, or the annotation property, changeNote, can be the predicate of a triple involving the updated individual directly.
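To make the trade-off concrete, here is a rough Turtle sketch. The prefixes and the modifiedBy/dateModified/description names are placeholders I invented for illustration; dateLastModified and Modification are the terms discussed above:

@prefix mp:  <http://example.org/metadata-properties#> .   # placeholder prefix, not the ontology's real IRI
@prefix ex:  <http://example.org/data#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Simple case: only the date of the last change matters
ex:Term42 mp:dateLastModified "2014-05-07T10:30:00Z"^^xsd:dateTime .

# Richer case: each change is captured as a Modification individual,
# so the who/what/when of every update can be queried as a set
ex:Term42 mp:modifiedBy ex:Change-2014-05-07 .               # invented linking property
ex:Change-2014-05-07 a mp:Modification ;
    mp:dateModified "2014-05-07T10:30:00Z"^^xsd:dateTime ;   # invented property names
    mp:description  "Corrected the definition and added an example" .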
It is important to understand that only a minimum amount of information is provided for Actor and Modification. They are defined, but are purposefully underspecified to allow application- or domain-specific details to be provided in another ontology. (In which case, the IRIs of the corresponding classes in the other ontology would be related to Actor and Modification using an owl:equivalentClass axiom. This was discussed in the post on modular ontologies, and tying together the pieces.)
Also in the metadata-properties ontology, an identifier property is defined. It is similar to the identifier property from Dublin Core, but is not equivalent since the metadata-properties' identifier is defined as a functional data property. (The Dublin Core property is "officially" defined as an annotation property.)
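In Turtle, the distinction looks roughly like this (the mp: prefix is a stand-in, and the range declaration is only illustrative):

@prefix mp:   <http://example.org/metadata-properties#> .   # placeholder prefix
@prefix dc:   <http://purl.org/dc/elements/1.1/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# Dublin Core's identifier, treated as an annotation property
dc:identifier a owl:AnnotationProperty .

# The metadata-properties identifier: a functional data property, so a
# reasoner can flag an individual that is given two different identifiers
mp:identifier a owl:DatatypeProperty , owl:FunctionalProperty ;
    rdfs:range xsd:string .    # assumed range, for illustration only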
To download the files, there is information in the blog post from Apr 17th.
Please let me know if you have any feedback or issues.
Andrea
Monday, April 28, 2014
General, Reusable Metadata Ontology - V0.2
I have taken all the feedback, and reworked and simplified the ontology (I hope). All the changes are documented in the ontology's changeNote.
Important sidebar: I strongly recommend using something like a changeNote to track the evolution of every ontology and model.
As noted in the Apr 16th post, most of the concepts in the ontology are taken from the Dublin Core Elements vocabulary and the SKOS data model. In this version, the well-established properties from Dublin Core and SKOS use the namespaces/IRIs from those sources (http://purl.org/dc/elements/1.1/ and http://www.w3.org/2004/02/skos/core#, respectively). Some examples are dc:contributor, dc:description and skos:prefLabel. Where the semantics are different, or where more obvious names are defined (for example, names that indicate the "direction" of the skos:broader and narrower relations), the purl.org/ninepts namespace is used.
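In Turtle, the reuse looks roughly like this - the ontology IRI and the literal values are invented, while the dc: and skos: terms are the real ones:

@prefix dc:   <http://purl.org/dc/elements/1.1/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .

<http://example.org/my-ontology> a owl:Ontology ;
    dc:contributor "Jane Doe" ;                # invented value
    dc:description "Example ontology header reusing Dublin Core and SKOS terms." ;
    skos:prefLabel "My Example Ontology" .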
This release is getting much closer to a "finished" ontology. All of the properties have descriptions and examples, and most have scope/usage notes. The ontology's scope note describes what is not mapped from Dublin Core and SKOS, and why.
In addition, I have added two unique properties for the ontology. One is competencyQuestions and the other is competencyQuery. The concept of competency questions was originally defined in a 1995 paper by Gruninger and Fox as "requirements that are in the form of questions that [the] ontology must be able to answer." The questions help to define the scope of the ontology, and are [should be] translated to queries to validate the ontology. These queries are captured in the metadata ontology as SPARQL queries (and the corresponding competency question is included as a comment in the query, so that it can be tracked). This is a start at test-driven development for ontologies. :-)
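As a rough sketch of how this is captured (the prefix is a placeholder and the query itself is invented), the competency question rides along as a comment inside the stored SPARQL text:

@prefix mo: <http://example.org/metadata#> .     # placeholder for the metadata ontology prefix

<http://example.org/my-ontology>
    mo:competencyQuestions "Which individuals record who contributed to them, and when?" ;
    mo:competencyQuery """
        # Competency question: Which individuals record who contributed to them, and when?
        PREFIX mo: <http://example.org/metadata#>
        SELECT ?individual ?contributor ?date
        WHERE { ?individual mo:contributor ?contributor ;
                            mo:date        ?date . }
    """ .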
Please take a look at the ontology (even if you did before since it has evolved), and feel free to comment or (even better) contribute.
Andrea
Thursday, April 17, 2014
Downloading the Metadata Ontology Files from GitHub
You are certainly free to fork the repository and get a local copy. Or, you can just download the file(s) by following these instructions:
- LEFT click on the file in the directory on GitHub
- The file is displayed with several tabs across the top. Select the Raw tab.
- The file is now displayed in your browser window as text. Save the file to your local disk using the "Save Page As ..." drop-down option, under File.
I try to note this in a short comment on the ontology (but given the confusion, I should probably expand the comment). I am also working on a metadata-properties ontology, which defines some of the annotation properties as data and object properties. This will allow (for example) validating dateTime values and referencing objects/individuals in relations (as opposed to using literal values). It is important to note, however, that data and object properties can only be used with individuals - using them on class or property declarations puts you in OWL Full, with no computational guarantees and no practical reasoning.
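Here is a small, hypothetical Turtle sketch of the difference - the property names (other than Actor) and the instance data are invented for illustration:

@prefix mp:  <http://example.org/metadata-properties#> .   # placeholder prefix
@prefix ex:  <http://example.org/data#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Annotation style: everything is an untyped literal string
ex:Widget mp:creationNote "created by J. Smith on 2014-04-17" .     # invented annotation

# Data/object property style (usable only on individuals): the date can be
# validated as xsd:dateTime, and the creator is a real, referenceable individual
ex:Widget mp:dateCreated "2014-04-17T09:00:00Z"^^xsd:dateTime ;
          mp:createdBy   ex:JSmith .
ex:JSmith a mp:Actor .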
Lastly, for anyone who objects to using annotation properties for mappings (for example, where I map SKOS' exactMatch in the metadata-annotations ontology), no worries ... more is coming. As a place to start, I defined exactMatch, moreGeneralThan, moreSpecificThan, ... annotation properties for documentation and human consumption. (I have to start somewhere. :-) And, I tried to be more precise in my naming than SKOS, which names the latter two relations "broader" and "narrower", with no indication of whether the subject or the object is the broader or narrower one. (I always get this mixed up if I am away from the spec for more than a week. :-)
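A small illustration of the intent (all of the class IRIs below are invented):

@prefix map: <http://example.org/mapping#> .   # placeholder for the annotation properties' namespace
@prefix oa:  <http://example.org/ontologyA#> .
@prefix ob:  <http://example.org/ontologyB#> .

# Documentation-only mappings; a tool could later query these and decide
# whether to generate owl:equivalentClass or subclass axioms from them
oa:Automobile map:exactMatch       ob:Car .
oa:Vehicle    map:moreGeneralThan  ob:Car .           # the subject is the broader concept
ob:SportsCar  map:moreSpecificThan oa:Automobile .    # the subject is the narrower concept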
I want to unequivocally state that annotation properties are totally inadequate for doing anything significant. But, they are a start, and something that another tool could query and use. Separately, I am working on a more formal approach to mapping, but documentation is where I am starting.
Obviously, there is a lot more work in the pipeline. I just wish I had more time (like everyone).
In the meantime, please let me know if you have more questions about the ontologies or any of my blog entries.
Andrea
Wednesday, April 16, 2014
General, Reusable, Metadata Ontology
I started with something relatively easy (I thought), which was a consolidation of basic Dublin Core and SKOS concepts into an OWL 2 ontology. The work is not yet finished (I have only been playing with the definition over the last few days). The "finished" pieces are the ontology metadata/documentation (including what I didn't map and why), and several of the properties (contributor, coverage, creator, date, language, mimeType, rights and their sub-properties). The rest is all still a work-in-progress.
It has been interesting creating and dog-fooding the ontology. I can definitely say that it was updated based on my experiences in using it!
You can check out the ontology definition on GitHub (http://purl.org/ninepts/metadata). My "master" definition is in the .ofn file (OWL functional syntax), and I used Protege to generate a Turtle encoding from it. My goals are to maintain the master definition in a version-control-friendly format (ofn), and also to provide a somewhat human-readable format (ttl). I also want to experiment with different natural language renderings that are more readable than Turtle (but I am getting ahead of myself).
I would appreciate feedback on this metadata work, and suggestions for other reusable ontologies (that would help to support industry and refine the development methodology). Some of the ontologies that I am contemplating are ontologies for collections, events (evaluating and bringing together concepts from several, existing event ontologies), actors, actions, policies, and a few others.
Please let me know what you think.
Andrea
Wednesday, February 5, 2014
More on modular ontologies and tying them together
"The style of modularity you mention, with what another summit poster (forgive me for forgetting who at the moment) referred to as 'placeholder' concepts within modules, can be very effective. The most effective technique I've found to date, for some cases. Two additional points are worth making about how to execute this for maximum effectiveness (they may match what you've done, in fact, but are sometimes missed & so worth calling out for others).

Point 1: lots of annotation on the placeholders. The location & connection of the well-defined concepts to link them to is often being saved for later, and possibly for someone else. In order to make sure the right external concept is connected, whatever is known or desired of the underspecified concept should be captured. In the location case, for example, it may be that the concept needs to support enough granularity to be used for the location at which a person can be contacted at the current time, or must be the kind of location that has a shipping address, or is only intended to be the place of business of the enterprise to which the Person is assigned & out of which they operate (e.g., embassy, business office, base, campus). That's often known or easily elicitable without leaving the focus of a specialized module, and can be captured in an annotation for use in finding existing, well-defined ontology content and mapping.

Point 2: the advantages of modules, as you described, are best maintained when the import and mapping are done *not* in the specialized module, but in a "lower" mapping module that inherits the specialized module and the mapping-target ontologies. Spindles of ontologies, which can be more or less intricate, allow for independent development and reuse of specialized modules, with lower mapping and integration modules, and with a spindle-bottom that imports all in the spindle and effectively acts as the integrated query, testing, and application module for all the modules contained in that spindle - providing a simplified and integrated interface to a more complex and highly modular system of ontologies. Meanwhile, specialized modules can be developed with SMEs who don't know, care, or have time to think about the stuff they aren't experts about, like distinguishing kinds of location or temporal relations or the weather. Using placeholders and doing your mapping elsewhere may sound like extra work, but considering what it can enable, it can be an incredibly effective approach."

Indeed, the second point is exactly my "integrating" ontology, which imports the target ontologies and does the mapping. As to the first point, that is very much worth highlighting. I err on the side of over-documenting, and use various kinds of notes and annotations. For a good example, take a look at the annotation properties in the FIBO Foundations ontology. It includes comment, description, directSource, keyword, definition, various kinds of notes, and much more. Another set of annotation properties that I use (which I have not seen documented before, but that I think is valuable for future mapping exercises) are WordNet synset references - as direct references or designating them as hyponyms or hypernyms. (For those not familiar with WordNet, check out this page and a previous blog post.)

Andrea
Sunday, February 2, 2014
Creating a modular ontology and then tying the pieces together
So, how do we tie the modules together in an application?
In a recent project, I used the OWL equivalentClass construct to do this. For example, in a Person ontology, I defined the Person concept with its relevant properties. When it came to the Person's Location - that was just an under-specified (i.e., empty) Location class. I then found a Location ontology, developed by another group, and opted to use that. Lastly, I defined an "integrating" ontology that imported the Person and Location ontologies, and specified an equivalence between the relevant concepts. So, PersonNamespace:Location was defined as an equivalentClass of LocationNamespace:Location. Obviously, the application covered up all of this for the users, and my triple store (with reasoner) handled the rest.
This approach left me with a lot of flexibility for reuse and ontology evolution, and didn't force imports except in my "integrating" ontology. And, a different application could bring in its own definition of Location and create its own "integrating" ontology.
But, what happens if you can't find a Location ontology that does everything that you need? You can still integrate/reuse other work - perhaps by defining its concepts, in your integrating ontology, as subclasses of the (under-specified) PersonNamespace:Location concept.
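Here is roughly what the "integrating" ontology boils down to, with placeholder namespaces standing in for PersonNamespace and LocationNamespace:

@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix per:  <http://example.org/person#> .      # stand-in for PersonNamespace
@prefix loc:  <http://example.org/location#> .    # stand-in for LocationNamespace

<http://example.org/integrating> a owl:Ontology ;
    owl:imports <http://example.org/person> , <http://example.org/location> .

# A reused Location ontology fits completely: declare the equivalence
per:Location owl:equivalentClass loc:Location .

# Or, if the reused ontology only covers part of what is needed, hang its
# class underneath the under-specified placeholder instead
loc:GeographicPlace rdfs:subClassOf per:Location .   # invented class name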
Wednesday, January 29, 2014
Reuse of ontology and model concepts
But ... I know that you are busy. So, here are some take-aways from my talk:
- What were the candidates for reuse? There were actually several ontologies and models that were looked at (and I will talk about them in later posts), but this talk was about two specific standards: ISO 15926 for the process industry, and FIBO for the financial industry.
- Why did we reuse, given that there was not perfect overlap between the chosen domain models/ontologies and network management? Because there was good thought and insight put into the standards, and there was also tooling developed that we wanted to reuse. Besides that, we have limited time and money - so jump-starting the development was "a good thing".
- Did we find valuable concepts to reuse? Definitely. Details are in the talk but two examples are:
- Defining individuals as possible versus actual. For anyone that worries about network and capacity planning, inventory management, or staging of new equipment, the distinction between what you have now, what you will have, and what you might have is really important.
- Ontology annotation properties. Documentation of definitions, sources of information, keywords, notes, etc. are extremely valuable to understand semantics. I have rarely seen good documentation in an ontology itself (it might be done in a specification that goes with the ontology). The properties defined and used in FIBO were impressive.
- Was reuse easy? Not really. It was difficult to pull apart sets of distinct concepts in ISO 15926, although we should have (and will do) more with templates in the future. Also, use of OWL was a mapping from the original definition, which made it far less "natural"/native. FIBO was much more modular and defined in OWL. But due to ontology imports, we pretty much ended up loading and working through the complete foundational ontology.
Given all this, what are some suggestions for getting more reuse?
- Create and publish more discrete, easily understood "modules" that:
- Define a maximum of 12-15 core entities with their relationships (12-15 items is about the limit of what people can visually retain)
- Document the assumptions made in the development (where perhaps short cuts were made, or could be made)
- Capture the axioms (rules) that apply separately from the core entities (this could allow adjustments to the axioms or assumptions for different domains or problem spaces, without invalidating the core concepts and their semantics)
- Encourage evolution and different renderings of the entities and relationships (for example, with and without short cuts)
- Focus on "necessary and sufficient" semantics when defining the core entities in a module and leave some things under-specified
- Don't completely define everything just because it touches your semantics (admittedly, you have to bring all the necessary semantics together to create a complete model or ontology, but more on that in the next post)
- A contrived example is that physical hardware is located somewhere in time and space, but it is unlikely that everyone's requirements for spatial and temporal information will be consistent. So, relate your Hardware entity to a Location and leave it at that. Let another module (or set of modules) handle the idiosyncrasies of Location.
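In OWL terms, the contrived example amounts to something like this (all of the IRIs are invented):

@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix net:  <http://example.org/network#> .

net:Hardware a owl:Class .

# A deliberately under-specified placeholder; the comment records what a
# future mapping target must provide
net:Location a owl:Class ;
    rdfs:comment "Placeholder. Must support a shipping address; temporal aspects are out of scope." .

net:locatedAt a owl:ObjectProperty ;
    rdfs:domain net:Hardware ;
    rdfs:range  net:Location .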
Tuesday, January 21, 2014
Semantic Technologies and Ontologies Overview Presentation
[Disclaimer] The presentation is pretty basic ...
But it seemed to work. It overviews key terms (like the "o-word", ontology :-) and standards (based on the ever-popular semantic "layer cake" image). In looking over the deck, I see that I should have talked about RIF (Rule Interchange Format). But, I was using SWRL at the time, and so gravitated to that. (My apologies for not being complete.)
Since the talk was meant to show that semantic technologies are not just an academic exercise, I spent most of the time highlighting how and where the technologies are used. IMHO, the major uses are:
- Semantic search and query expansion
- Mapping and merging of data
- Knowledge management
There are also quite a few examples of real companies using ontologies and semantic technologies. It is kind of amazing when you look at what is being done.
So, take a look and let me know what you think.
And, as a teaser, I want to highlight that I will be presenting at the next Ontology Summit 2014 session on Thursday, January 23rd, on "Reuse of Content from ISO 15926 and FIBO". If you want to listen in, the details for the conference call are here. Hopefully, you can join in.
Andrea
Friday, March 18, 2011
NIST and Access Control
I ran across an excellent paper from NIST (the US's National Institute of Standards and Technology), A Survey of Access Control Methods. The document is a component of the publication, "A Report on the Privilege (Access) Management Workshop". I highly recommend reading it, since the security landscape is evolving ... as the technology, online information, regulations/legislation, and "need to share" requirements of a modern, agile enterprise keep expanding.
Access control is discussed from the hard-core (and painfully detailed) ACL approach (access control lists) all the way through policy and risk-adaptive control (PBAC and RAdAC). Here is a useful image from the document, showing the evolution:
Reading the paper triggered some visceral reactions, on my part ... For example, I strongly feel that role-based access control is no longer adequate for the real-world. Yet, it is where most of us live today.
The problem is the need for agility. The world is no longer only about restricting access to specific, known-in-advance entities using a one-size-fits-all-conditions analysis ("need to protect" with predefined roles) - but also about granting the maximum access to information that is allowed ("need to share" considering the conditions under which sharing occurs).
Here are some examples ... Firefighters need the maximum data about the location and conditions of a fire that they can legally obtain (see my previous post, Using the Semantic Web to Fight Fires). Law enforcement personnel, at the federal, state or local levels, need all the data about suspicious activities that can be legally shared. An information worker needs to see and analyze all relevant data that is permitted (legally and within the corporate guidelines). (The word "legally" comes up a lot here ... more on that in another post.)
So, how do you accomplish this with simple roles? You can certainly build new roles that take various situational attributes into account. But how far can you go with this approach? At some point, the number of roles (variations on a theme) spirals out of control. You really need attribute-based access control (ABAC). As the NIST paper points out, with attributes, you don't need to know all the requesters in advance. You just need to know about the conditions of the access.
But, simply adding attribute data (data about the information being accessed, the entity accessing it, the environment where the access occurs or is needed, ...) can get quite complex. The real problem is figuring out how to harmonize and evaluate the attribute information if it is accessed from several data stores or infrastructures. Then, closely associated with that problem is the need to be consistent across an enterprise - to not allow access (under the same conditions) through one infrastructure that is disallowed by another.
Policy-based access control, the next concept in the evolution, starts to address some of these concerns. NIST describes PBAC as "a harmonization and standardization of the ABAC model at an enterprise level in support of specific governance objectives." It concerns the creation and administration of organization-wide rule sets (policies) for access control, using attribute criteria that are also semantically consistent across the enterprise.
Wow, reading that last sentence made my head hurt. :-) Let me decompose the concepts. For policy-based access control to really work, we need (IMHO, in order of implementation):
- A well defined (dare I say "standard") policy/rule structure
- A well understood vocabulary for the actors, resources and attributes
- Ability to use #1 and #2 to define access control rules (see the sketch after this list)
- Ability to analyze the rules for consistency and completeness
- An infrastructure to support the evaluation and enforcement of the rules (at least by transforming between local data stores and infrastructures, and the well understood and defined vocabulary and policies/rules)
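To ground #1 through #3, here is a purely illustrative Turtle sketch of an attribute-based rule over a shared vocabulary - the pol: and org: vocabularies are invented for this example and are not taken from any standard:

@prefix pol: <http://example.org/policy#> .   # invented policy/rule vocabulary (#1)
@prefix org: <http://example.org/org#> .      # invented enterprise vocabulary of actors, resources and attributes (#2)

# One access control rule expressed against the shared vocabularies (#3)
pol:Rule-17 a pol:AccessRule ;
    pol:effect       pol:Permit ;
    pol:action       pol:Read ;
    pol:resourceType org:IncidentReport ;
    # attribute criteria about the requester and the environment
    pol:subjectCriterion     [ pol:attribute org:clearanceLevel ; pol:minimumValue 3 ] ;
    pol:environmentCriterion [ pol:attribute org:connectionType ; pol:requiredValue org:SecureVPN ] .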
Some day, we will have best practices and standards for #1 and #2. Even better, we could have government-blessed renderings of the standard legislation (SOX, HIPAA, ...) using #1 and #2.
Can NIST also help with these activities? I hope that it can. In the meantime, there are some technologies like Semantic Web that can help.
As you can imagine, I have lots more things to discuss about the specifics of PBAC and RAdAC, in my next posts.
Andrea
Monday, June 8, 2009
PriceWaterhouseCoopers Spring Technology Forecast (Part 3)
The article includes a great quote on the information problem, why today's approaches (even metadata) are not enough, and the uses of Semantic Web technologies ... "Think of Linked Data as a type of database join that relies on contextual rules and pattern matching, not strict preset matches. As a user looks to mash up information from varied sources, Linked Data tools identify the semantics and ontologies to help the user fit the pieces together in the context of the exploration. ... Many organizations already recognize the importance of standards for metadata. What many don’t understand is that working to standardize metadata without an ontology is like teaching children to read without a dictionary. Using ontologies to organize the semantic rationalization of the data that flow between business partners is a process improvement over electronic data interchange (EDI) rationalization because it focuses on concepts and metadata, not individual data elements, such as columns in a relational database management system. The ontological approach also keeps the CIO’s office from being dragged into business-unit technical details and squabbling about terms. And linking your ontology to a business partner’s ontology exposes the context semantics that data definitions lack."

PwC suggests taking 2 (non-exclusive) approaches to "explore" the Semantic Web and Linked Data:
- Add the dimension of semantics and ontologies to existing, internal data warehouses and data stores
- Provide tools to help users get at both internal and external Linked Data
Wednesday, June 3, 2009
PriceWaterhouseCoopers Spring Technology Forecast (Part 2)
The second featured article is Making Semantic Web connections. It discusses the business value of using Linked Data, and includes interesting information from a CEO survey about information gaps (and how the Semantic Web can address these gaps). The article argues that to get adequate information, the business must better utilize its own internal data, as well as data from external sources (such as information from members of the business' ecosystem or the Web). This is depicted in the following two figures from the article ...
I also want to include some quotes from the article - especially since they support what I said in an earlier blog from my days at Microsoft, Question on what "policy-based business" means ... :-)
- Data aren’t created in a vacuum. Data are created or acquired as part of the business processes that define an enterprise. And business processes are driven by the enterprise business model and business strategy, goals, and objectives. These are expressed in natural language, which can be descriptive and persuasive but also can create ambiguities. The nomenclature comprising the natural language used to describe the business, to design and execute business processes, and to define data elements is often left out of enterprise discussions of performance management and performance improvement.
- ... ontologies can become a vehicle for the deeper collaboration that needs to occur between business units and IT departments. In fact, the success of Linked Data within a business context will depend on the involvement of the business units. The people in the business units are the best people to describe the domain ontology they’re responsible for.
- Traditional integration methods manage the data problem one piece at a time. It is expensive, prone to error, and doesn’t scale. Metadata management gets companies partway there by exploring the definitions, but it still doesn’t reach the level of shared semantics defined in the context of the extended virtual enterprise. Linked Data offers the most value. It creates a context that allows companies to compare their semantics, to decide where to agree on semantics, and to select where to retain distinctive semantics because it creates competitive advantage.
And, yes, I did say something similar to this in an earlier post on Semantic Web and Business. (Thumbs up :-)
Tuesday, June 2, 2009
PriceWaterhouseCoopers Spring Technology Forecast (Part 1)
The first featured article, Spinning a data Web, overviewed the technologies of the Semantic Web, and discussed how businesses can benefit from developing domain ontologies and then mediating/integrating/querying them across both internal and external data. The value of mediation is summarized in the following figure ...
I like this, since I said something similar in my post on the Semantic Web and Business.
Backing up this thesis, Tom Scott of BBC Earth provided a supporting quote in his interview, Traversing the Giant Global Graph. "... when you start getting either very large volumes or very heterogeneous data sets, then for all intents and purposes, it is impossible for any one person to try to structure that information. It just becomes too big a problem. For one, you don’t have the domain knowledge to do that job. It’s intellectually too difficult. But you can say to each domain expert, model your domain of knowledge— the ontology—and publish the model in the way that both users and machine can interface with it. Once you do that, then you need a way to manage the shared vocabulary by which you describe things, so that when I say “chair,” you know what I mean. When you do that, then you have a way in which enterprises can join this information, without any one person being responsible for the entire model. After this is in place, anyone else can come across that information and follow the graph to extract the data they’re interested in. And that seems to me to be a sane, sensible, central way of handling it."
Monday, May 11, 2009
Going to School - Knowledge Management Style
The three categories for capturing and sharing knowledge are:
- Technocratic - involved with tooling and the use of technology for knowledge management
- Economic - relating knowledge and income
- Behavioral - dealing with how to organize to facilitate knowledge capture and exchange
Within each of the categories, Earl posited that there are "schools" or focuses for knowledge management. Earl's seven schools are listed below (with some short descriptions):
- Systems - Part of the technocratic category, focusing on the use of technology and the storing of explicit knowledge in databases and various systems and repositories. The knowledge is typically organized by domain.
- Cartographic - Part of the technocratic category, focusing on who the "experts" are, in a company, and how to find and contact them. So, instead of explicit captured knowledge, the tacit knowledge held by individuals is paramount.
- Engineering - Part of the technocratic category, focusing on capturing and sharing knowledge for process improvement. In addition, the details and outputs of various processes and knowledge flows are captured. The knowledge in this school is organized by activities with the goal of business process improvement.
- Commercial - This is the only "economic" school and focuses on knowledge as a commercial asset. The emphasis is on income, which can be achieved in various ways ... such as limiting access to knowledge, based on payments or other exchanges, or rigorously managing a company's intellectual portfolio (individual know-how, patents, trademarks, etc.).
- Organizational - Part of the behavioral category, focusing on building and enabling knowledge-sharing networks and communities of practice, for some business purpose. Earl defines it as a behavioral school "because the essential feature of communities is that they exchange and share knowledge interactively, often in nonroutine, personal, and unstructured ways". For those not familiar with the term "community of practice", it is defined by Etienne Wenger as “groups of people who share a concern or a passion for something they do and learn how to do it better as they interact regularly.”
- Spatial - Part of the behavioral category, focusing on how space is used to facilitate socialization and the exchange of knowledge. This can be achieved by how office buildings are arranged, co-locating individuals working on the same project, etc.
- Strategic - Part of the behavioral category, focusing on knowledge (according to Earl) as "the essence of a firm's strategy ... The aim is to build, nurture, and fully exploit knowledge assets through systems, processes, and people and convert them into value as knowledge-based products and services." This may seem like the strategic school rolls all the others into it, and it does. But, what distinguishes it, again according to Earl, "is that knowledge or intellectual capital are viewed as the key resource."
And, how do you do this? Via capturing, publishing and mapping each business group's/community's vocabularies (ontologies) and processes, and understanding that community's organizational structure.
Wednesday, April 15, 2009
"Top Down" or "Bottom Up" Ontologies
What is a possible answer? Take the local, private and community ontologies of your business and map them "up" to an existing "standardized ontology" - such as exists in medicine or even construction - see, for example, ISO 15926. (I already discussed the possibilities of ontology alignment provided by the Semantic Web in earlier posts, and will provide more details over the next few weeks.)
Or, if a standard ontology does not exist, create one from the local ontologies by mapping the local ones to one or more "upper" ontologies. At this point, some people will say "ughhh", another term - "upper" ontology - what the heck is that? Upper ontologies capture very general and reusable terms and definitions. Two examples that are both interesting and useful are:
- SUMO (http://www.ontologyportal.org), the Suggested Upper Merged Ontology - SUMO incorporates much knowledge and broad content from a variety of sources. Its downside is that it is not directly importable into the Semantic Web infrastructure, as it is written in a different syntax (something called KIF). Its upsides are its vast, general coverage, its public domain IEEE licensing, and the many domain ontologies defined to extend it.
- Proton (http://proton.semanticweb.org/D1_8_1.pdf), PROTo ONtology - PROTON takes a totally different approach to its ontology definition. Instead of theoretical analysis and hand-creation of the ontology, PROTON was derived from a corpus of general news sources, and hence addresses modern day, political, financial and sports concepts. It is encoded in OWL (OWL-Lite to be precise) for Semantic Web use, and was defined as part of the European Union's SEKT (Semantically Enabled Knowledge Technologies) project, http://www.sekt-project.com. (I will definitely be blogging more about SEKT in future posts. There is much interesting work there!)
Thursday, April 9, 2009
Semantic Web and Business (Part 3)
- Concept = class = noun = vocabulary word
- Triple = subject-predicate-object (such as "John went to the library" - where "John" is the subject, "went-to" is the predicate, and "library" is the object)
- Role = relation = association = the predicate in the triple = verb
- Instance = a specific occurrence of a concept or relationship (can be manually defined or inferred)
- Axiom = a statement of fact/truth that is taken for granted (i.e., is not proved)
- Inference = deriving a logical conclusion from definitions and axioms
- T-Box = a set of concepts and relationships (i.e., the definitions)
- A-Box = a set of instances of the concepts and relationships
- Hierarchy = arrangement of concepts or instances by some kind of classification/relationship mechanism - typical classification hierarchies are by type ("is-a" relationships - for example, "a tiger is a mammal") or by composition ("has-a" relationships - for example, "a person's name has the structure: personal or first name, zero or more middle names, and surname or last name")
- Subsumption = is-a classification (determining the ordering of more general to more specific categories/concepts)
- Consistency analysis = check to see that all specific instances make sense given the definitions, rules and axioms of an ontology
- Satisfiability analysis = check to see that an instance of a concept can be created (i.e., that creating an instance will not produce an inconsistency/error)
- Key = one or more properties that uniquely identify an individual instance of a concept/class
- Monothetic classification = identifying a particular instance with a single key
- Polythetic classification = identifying a particular instance by several possible keys which may not all exist for that instance
- Surrogate key = an artificial key
- Natural key = a key that has semantic meaning
- CWA = Closed World Assumption (in databases) = anything not explicitly known to be true is assumed to be false (for example, if you know that John is the son of Mary but have a total of 3 children defined - John, Sue and Albert - and you ask who all the children of Mary are ... you get the answer "John" - 1 child)
- OWA = Open World Assumption (in semantic computing) = anything not explicitly stated is treated as unknown rather than false (using the same scenario above, asking the same question still returns "John", but you cannot conclude that Sue and Albert are not Mary's children, or that John is her only child, without additional axioms)
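Here is the children example written out as triples, with comments showing how the two assumptions answer the same questions (the data is invented):

@prefix ex: <http://example.org/family#> .

ex:Mary   a ex:Person .
ex:John   a ex:Person ; ex:childOf ex:Mary .
ex:Sue    a ex:Person .
ex:Albert a ex:Person .

# "Who are Mary's children?"
#   Closed world (database): John - and "Is Sue a child of Mary?" is answered "no".
#   Open world (RDF/OWL):    John - but "Is Sue a child of Mary?" is merely unknown,
#   and nothing lets you conclude that John is Mary's ONLY child unless an
#   explicit axiom (for example, a cardinality restriction) says so.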
Semantic Web and Business (Part 2)
A description-logic reasoner (DL reasoner) takes concepts, individual instances of those concepts, roles (relationships between concepts and individuals) and sometimes constraints and rules - and then "reasons" over them to find inconsistencies (errors), infer new information, and determine classifications and hierarchies. Some basic relationships that are always present come from first-order logic - like intersections, unions, negations, etc. These are explicitly formalized in languages like OWL.
The reasoner that I am now using is Pellet from Clark and Parsia (http://clarkparsia.com/pellet/). It is integrated with Protege (which I mentioned in an earlier post), but also operates standalone. The nice thing is that Pellet has both open-source and commercial licenses to accommodate any business model - and is doing some very cool research on data validation and probabilistic reasoning (which you can read about on their blog, http://clarkparsia.com/weblog/).
How cool is it when you can get a program to tell you when your vocabulary is inconsistent or incomplete? Or, when a program can infer new knowledge for you, when you align two different vocabularies and then reason over the whole? No more relying on humans and test cases to spot all the errors!
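As a tiny, invented example of the kind of problem a DL reasoner catches automatically:

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix ex:  <http://example.org/vocab#> .

ex:Customer a owl:Class .
ex:Supplier a owl:Class ;
    owl:disjointWith ex:Customer .    # no individual may be both

# Asserting both types makes the ontology inconsistent; a reasoner such as
# Pellet reports the clash, rather than a human finding it by inspection
ex:AcmeCorp a ex:Customer , ex:Supplier .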
Wednesday, April 8, 2009
Semantic Web and Business (Part 1)
Typically, you hear about the semantic web as a way for computers to understand and operate over the data on the web, and not just exchange it via (mostly XML-based) syntaxes. However, to "understand" something, you must speak a common language and then have insight into the vocabulary and concepts used in that language. Well, the semantic web languages exist - they are standards like RDF (Resource Description Framework), RDF-S (RDF Schema), and OWL (Web Ontology Language). These syntaxes carry the details of the concepts, terms and relationships of the vocabulary. (Note that I provided only basic links to the specifications here. There is much more detail available!)
One problem is defining the syntax - and we are getting there via the work of the W3C. The next problem is getting agreement about the vocabulary. That is much harder - since every group has their own ideas about what the vocabulary should be. So, here again, the Semantic Web steps in. Semantic Web proponents are not just researching how to define and analyze vocabularies (you could also use the word, "ontology", here) - but how to merge and align them!
So, where does this intersect with business? Businesses have lots of implicit vocabularies/ontologies (for example, belonging to procurement, accounts payable, specific domain technologies integral to the organization, IT and other groups). And, business processes and data flows cross groups and therefore, cross vocabularies - and this leads to errors! Typically, lots of them!
Does this mean that everyone should adopt a single vocabulary? Usually that is not even possible ... People who have learned a vocabulary and use it to mean very specific things cannot easily change to use a new, different word. Another problem is agreeing on what a term means - like "customer" (is that the entity that pays for something, an end-user, or some other variant on this theme?).
Changing words will cause a slowdown in the operations of the business, due to the need to argue over terminology and representation. Then, if a standard vocabulary is ever in place, there will be slowdowns and errors as people try to work the new vocabulary into their practices and processes. (BTW, I think that this is one reason that "standard" common models or a single enterprise information model are so difficult to achieve.)
How do we get around this? Enter the Semantic Web to help with the alignment of vocabularies/ontologies. But, first the vocabularies have to be captured. Certainly, no one expects people to write RDF, RDF-S or OWL. But, we all can write our natural languages - and that takes us back to "controlled languages" as I discussed in my previous post. I have a lot of ideas on how to achieve this ... but, this will come in later posts.
So, more on this in later weeks, but hopefully this post provides some reasons to be interested in the semantic web (more than just its benefits to search) ...