Observations on ontologies, semantic technologies and life

Wednesday, June 10, 2009

Types of Models - For Business Versus For IT

I was looking through my notes about articles that I had read - and found an interesting Burton Group report entitled Generalized and Detailed Data Models: Seeking the Best of Both Worlds. (I think that it was published earlier this year.) I must admit to having been both confused and intrigued by the title. :-)

In the paper, "generalized" models are those used to define database/storage structures and to find the general themes and fundamental aspects of the data (and its values). In short, they are the data models defined by IT to effectively and efficiently use the technologies that are in place (like SQL databases). Maybe "reduced" is a better word than "generalized" ...

On the other hand, "detailed" models are those that are useful to business people. They define and describe the information requirements of the business, and its vocabularies, rules and processes. They hold the details from the business perspective. Again, maybe another word like "conceptual" is better (since even the "generalized" models hold "details") ...

What is valuable is not the titles used for these models but their semantics. :-) The key message is that a business needs both types of models and they need to stay in sync. This is really important. The conceptual/detailed models hold the real business requirements and language. They haven't been reduced to basic data values whose semantics are lost in the technology used to define and declare them.
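To make the "stay in sync" point a bit more concrete, here is a minimal Python sketch of a check that every business term in the conceptual model can be traced to something in the IT model. Everything in it is made up for illustration (the terms, the table and column names, and the `unmapped_terms` helper); none of it comes from the Burton Group report.

```python
# Hypothetical conceptual ("detailed") model: business terms and their meaning.
conceptual_model = {
    "Preferred Customer": "A customer with more than 10 orders in the past year",
    "Order": "A request by a customer to purchase one or more products",
    "Late Shipment": "A shipment that leaves after the promised ship date",
}

# Hypothetical generalized ("reduced") model: where each term lands in storage,
# expressed as (table, column) pairs.
storage_mapping = {
    "Preferred Customer": ("PARTY", "PARTY_TYPE_CD"),
    "Order": ("TXN", "TXN_TYPE_CD"),
}


def unmapped_terms(conceptual, mapping):
    """Return business terms that have no counterpart in the storage model."""
    return sorted(term for term in conceptual if term not in mapping)


print(unmapped_terms(conceptual_model, storage_mapping))
```

In this toy case "Late Shipment" is reported, which is the kind of business concept that quietly disappears when only the IT model is kept.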

IMHO, a business loses information and knowledge when it only retains and works from the IT models. There is much to be gleaned from the business input and much value in keeping the business people engaged in the work. This is almost impossible once you reduce the business requirements to technology-speak.

As the report says, "do not allow generalized models to compromise your understanding of the business."

Posted by OntoInsights, LLC at 11:22 AM

Labels: business vocabularies, data models, knowledge

Monday, June 8, 2009

PriceWaterhouseCoopers Spring Technology Forecast (Part 3)

This is the last in a series of posts summarizing the PriceWaterhouseCoopers Spring Technology Forecast. I spent a lot of time on the report, since it highlights many important concepts about the Semantic Web and business.

The last featured article in the report is entitled 'A CIO's strategy for rethinking "messy BI"'. The recommendation is to use Linked Data to bring together internal and external information - to help with the "information problem". How does PwC define the "information problem"? As follows ... "there's no way traditional information systems can handle all the sources [of data], many of which are structured differently or not structured at all." The recommendation boils down to creating a shared or upper ontology for information mediation, and then using it for analysis, for helping to create a business ecosystem, and to harmonize business logic and operating models. The two figures below illustrate these concepts.

The article includes a great quote on the information problem, why today's approaches (even metadata) are not enough, and the uses of Semantic Web technologies ... "Think of Linked Data as a type of database join that relies on contextual rules and pattern matching, not strict preset matches. As a user looks to mash up information from varied sources, Linked Data tools identify the semantics and ontologies to help the user fit the pieces together in the context of the exploration. ... Many organizations already recognize the importance of standards for metadata. What many don’t understand is that working to standardize metadata without an ontology is like teaching children to read without a dictionary. Using ontologies to organize the semantic rationalization of the data that flow between business partners is a process improvement over electronic data interchange (EDI) rationalization because it focuses on concepts and metadata, not individual data elements, such as columns in a relational database management system. The ontological approach also keeps the CIO’s office from being dragged into business-unit technical details and squabbling about terms. And linking your ontology to a business partner’s ontology exposes the context semantics that data definitions lack."
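The "database join" analogy in the quote can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the two sources, their field names and the shared concept names are hypothetical, and the join is a plain exact match on a shared concept, which is much simpler than the contextual rules and pattern matching that the quote describes.

```python
# Two hypothetical sources describing the same things with different field names.
internal_rows = [
    {"cust_no": "C1", "cust_nm": "Acme"},
    {"cust_no": "C2", "cust_nm": "Globex"},
]
partner_rows = [
    {"accountId": "C1", "creditRating": "A"},
    {"accountId": "C3", "creditRating": "B"},
]

# Hypothetical shared ontology: each source maps its own fields to shared concepts.
internal_to_concept = {"cust_no": "CustomerIdentifier", "cust_nm": "CustomerName"}
partner_to_concept = {"accountId": "CustomerIdentifier", "creditRating": "CreditRating"}


def to_concepts(row, mapping):
    """Re-express a source row in terms of the shared concepts."""
    return {mapping[field]: value for field, value in row.items()}


def semantic_join(left, right, concept):
    """Join rows that agree on the value of a shared concept."""
    index = {row[concept]: row for row in right}
    return [{**row, **index[row[concept]]} for row in left if row[concept] in index]


left = [to_concepts(r, internal_to_concept) for r in internal_rows]
right = [to_concepts(r, partner_to_concept) for r in partner_rows]
joined = semantic_join(left, right, "CustomerIdentifier")
print(joined)
```

The point of the sketch is that neither source had to rename its columns. Each one only had to say which shared concept its fields correspond to.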

PwC suggests taking 2 (non-exclusive) approaches to "explore" the Semantic Web and Linked Data:

Add the dimension of semantics and ontologies to existing, internal data warehouses and data stores
Provide tools to help users get at both internal and external Linked Data

And, as with the previous posts, I want to finish with a quote from one of the interviews in the report. This quote comes from Frank Chum of Chevron, and discusses why they are now looking to the Semantic Web and ontologies to advance their business. "Four things are going on here. First, the Semantic Web lets you be more expressive in the business logic, to add more contextual meaning. Second, it lets you be more flexible, so that you don’t have to have everything fully specified before you start building. Then, third, it allows you to do inferencing, so that you can perform discovery on the basis of rules and axioms. Fourth, it improves the interoperability of systems, which allows you to share across the spectrum of the business ecosystem. With all of these, the Semantic Web becomes a very significant piece of technology so that we can probably solve some of the problems we couldn’t solve before. One could consider these enhanced capabilities [from Semantic Web technology] as a “souped up” BI [business intelligence]."
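The third point, inferencing on the basis of rules and axioms, is easy to show in miniature. Below is a minimal Python sketch of forward chaining over triples. The two rules are modeled on the usual subclass semantics (subclass-of is transitive, and an instance of a class is also an instance of its superclasses). The class names and the `well-42` instance are invented for the example and are not from the interview.

```python
SUBCLASS = "subClassOf"
TYPE = "type"

# Hypothetical facts and axioms, written as (subject, predicate, object) triples.
triples = {
    ("OffshoreWell", SUBCLASS, "Well"),
    ("Well", SUBCLASS, "Asset"),
    ("well-42", TYPE, "OffshoreWell"),
}


def infer(facts):
    """Apply the two rules repeatedly until no new triples appear."""
    facts = set(facts)
    while True:
        new = set()
        for (a, p1, b) in facts:
            for (c, p2, d) in facts:
                if b != c or p2 != SUBCLASS:
                    continue
                if p1 == SUBCLASS:
                    new.add((a, SUBCLASS, d))  # subclass-of is transitive
                elif p1 == TYPE:
                    new.add((a, TYPE, d))  # instances belong to superclasses too
        if new <= facts:
            return facts
        facts |= new


inferred = infer(triples)
print(("well-42", TYPE, "Asset") in inferred)
```

Nobody stated that `well-42` is an asset. That fact is discovered from the axioms, which is the "discovery" the quote refers to. A real reasoner (Pellet, for example) supports far richer axioms than these two rules.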

Posted by OntoInsights, LLC at 2:46 PM

Labels: business query, linked data, ontologies, semantic web, upper ontologies

Wednesday, June 3, 2009

PriceWaterhouseCoopers Spring Technology Forecast (Part 2)

This post continues the review and summarization of PwC's Spring Technology Forecast, focused on the Semantic Web.

The second featured article is Making Semantic Web connections. It discusses the business value of using Linked Data, and includes interesting information from a CEO survey about information gaps (and how the Semantic Web can address these gaps). The article argues that to get adequate information, the business must better utilize its own internal data, as well as data from external sources (such as information from members of the business' ecosystem or the Web). This is depicted in the following two figures from the article ...

I also want to include some quotes from the article - especially since they support what I said in an earlier blog from my days at Microsoft, Question on what "policy-based business" means ... :-)

Data aren’t created in a vacuum. Data are created or acquired as part of the business processes that define an enterprise. And business processes are driven by the enterprise business model and business strategy, goals, and objectives. These are expressed in natural language, which can be descriptive and persuasive but also can create ambiguities. The nomenclature comprising ... the natural language used to describe the business, to design and execute business processes, and to define data elements is often left out of enterprise discussions of performance management and performance improvement.
... ontologies can become a vehicle for the deeper collaboration that needs to occur between business units and IT departments. In fact, the success of Linked Data within a business context will depend on the involvement of the business units. The people in the business units are the best people to describe the domain ontology they’re responsible for.
Traditional integration methods manage the data problem one piece at a time. It is expensive, prone to error, and doesn’t scale. Metadata management gets companies partway there by exploring the definitions, but it still doesn’t reach the level of shared semantics defined in the context of the extended virtual enterprise. Linked Data offers the most value. It creates a context that allows companies to compare their semantics, to decide where to agree on semantics, and to select where to retain distinctive semantics because it creates competitive advantage.

As in my last post, I want to reinforce the message and include a quote from one of the interviews. This one comes from Uche Ogbuji of Zepheira ... "... it’s not a matter of top down. It’s modeling from the bottom up. The method is that you want to record as much agreement as you can. You also record the disagreements, but you let them go as long as they’re recorded. You don’t try to hammer them down. In traditional modeling, global consistency of the model is paramount. The semantic technology idea turns that completely on its head, and basically the idea is that global consistency would be great. Everyone would love that, but the reality is that there’s not even global consistency in what people are carrying around in their brains, so there’s no way that that’s going to reflect into the computer. You’re always going to have difficulties and mismatches, and, again, it will turn into a war, because people will realize the political weight of the decisions that are being made. There’s no scope for disagreement in the traditional top-down model. With the bottom-up modeling approach you still have the disagreements, but what you do is you record them."
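The "record the agreements and the disagreements" idea can be pictured with a small data structure. This is a minimal Python sketch under my own assumptions: the groups, terms and definitions are invented, and `reconcile` is a hypothetical helper, not anything described in the interview.

```python
from collections import defaultdict

# Hypothetical definitions of the same terms, contributed by different groups.
contributions = [
    ("Sales", "Customer", "A party that has placed an order"),
    ("Support", "Customer", "A party with an active service contract"),
    ("Sales", "Order", "A request to purchase products"),
    ("Finance", "Order", "A request to purchase products"),
]


def reconcile(items):
    """Group definitions by term, keeping every view instead of forcing one."""
    by_term = defaultdict(lambda: defaultdict(list))
    for group, term, definition in items:
        by_term[term][definition].append(group)
    agreed, disputed = {}, {}
    for term, views in by_term.items():
        # One definition means agreement; several are recorded, not resolved.
        target = agreed if len(views) == 1 else disputed
        target[term] = dict(views)
    return agreed, disputed


agreed, disputed = reconcile(contributions)
print(agreed)
print(disputed)
```

Here "Order" lands in the agreed set, while both views of "Customer" are kept side by side along with who holds them. Nothing is hammered down.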

And, yes, I did say something similar to this in an earlier post on Semantic Web and Business. (Thumbs up :-)

Posted by OntoInsights, LLC at 12:24 PM

Labels: business query, linked data, ontologies, semantic web

Tuesday, June 2, 2009

PriceWaterhouseCoopers Spring Technology Forecast (Part 1)

In an earlier post, I mentioned PriceWaterhouseCoopers' spring technology forecast and its discussion of the Semantic Web in business. In this and the following posts, I want to summarize and highlight several of the articles. Let's start with the first featured article ...

Spinning a data Web gives an overview of the technologies of the Semantic Web, and discusses how businesses can benefit from developing domain ontologies and then mediating/integrating/querying them across both internal and external data. The value of mediation is summarized in the following figure ...

I like this, since I said something similar in my post on the Semantic Web and Business.

Backing up this thesis, Tom Scott of BBC Earth provided a supporting quote in his interview, Traversing the Giant Global Graph. "... when you start getting either very large volumes or very heterogeneous data sets, then for all intents and purposes, it is impossible for any one person to try to structure that information. It just becomes too big a problem. For one, you don’t have the domain knowledge to do that job. It’s intellectually too difficult. But you can say to each domain expert, model your domain of knowledge— the ontology—and publish the model in the way that both users and machine can interface with it. Once you do that, then you need a way to manage the shared vocabulary by which you describe things, so that when I say “chair,” you know what I mean. When you do that, then you have a way in which enterprises can join this information, without any one person being responsible for the entire model. After this is in place, anyone else can come across that information and follow the graph to extract the data they’re interested in. And that seems to me to be a sane, sensible, central way of handling it."

Posted by OntoInsights, LLC at 10:45 PM

Labels: business query, linked data, ontologies, semantic web

Sunday, May 31, 2009

The Semantic Web in 3 Words

My husband asked me to explain the Semantic Web in three words (because I was going on about the web and my ideas) ... So, here they are:

Data
Linkages
Infrastructure

And now, I get to use more than 3 words :-).

Data is usually meta-data (data about data) - what a document is about, additional information like who the author is, etc. But, it can also be the raw information - like a business vocabulary.

Linkages are the relationships between the data - the information that ties the data together and lets you infer and extrapolate.

Infrastructure is the formalisms of the languages (RDF, RDF Schema, OWL, SPARQL, ...) and the services that are already provided (W3C's Linked Data, Protege, Pellet, ...). Data without backing services and formalisms means that you have to create everything yourself and there is no exponential building of knowledge that comes from sharing the data.
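For anyone who prefers code to words, here is a minimal Python sketch of the three ideas: the data are triples, the linkages are the predicates that connect them, and a tiny pattern matcher stands in (very loosely) for the infrastructure. The `doc:`, `person:` and `topic:` identifiers are invented, the prefixed names are just plain strings here, and `match` is a hypothetical helper - real infrastructure such as RDF stores and SPARQL does a great deal more.

```python
# Hypothetical triples: (subject, predicate, object).
triples = [
    ("doc:report-1", "dc:creator", "person:alice"),
    ("doc:report-1", "dc:subject", "topic:ontologies"),
    ("doc:report-2", "dc:creator", "person:alice"),
    ("person:alice", "foaf:name", "Alice"),
]


def match(store, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Data: what is report-1 about?
print(match(triples, s="doc:report-1", p="dc:subject"))
# Linkages: which documents share an author?
print(match(triples, p="dc:creator", o="person:alice"))
```

Without the shared formalisms, every one of these conventions (what a triple is, what a wildcard means, what `dc:creator` stands for) would have to be reinvented by each person publishing data.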

That's it. Let me know if you agree with my 3 words or have different ones.

Posted by OntoInsights, LLC at 9:34 AM

Labels: semantic web

Friday, May 29, 2009

Continuing on the topic of the Web of Data (aka Linked Data)

There is a lot being published about Linked Data. I just saw that the Spring 2009 PriceWaterhouseCoopers technology forecast is full of data Web and Semantic Web coolness. But, before I jump into the forecast, I would like to give some background on the Linked Data work that is happening in the industry today.

Linking Open Data (LOD) is a W3C project. According to their web site, "The goal of the W3C SWEO Linking Open Data community project is to extend the Web with a data commons by publishing various open data sets as RDF on the Web and by setting RDF links between data items from different data sources. RDF links enable you to navigate from a data item within one data source to related data items within other sources using a Semantic Web browser. RDF links can also be followed by the crawlers of Semantic Web search engines, which may provide sophisticated search and query capabilities over crawled data. As query results are structured data and not just links to HTML pages, they can be used within other applications. ... Collectively, the data sets consist of over 4.7 billion RDF triples, which are interlinked by around 142 million RDF links (May 2009)."
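The "navigate from a data item within one data source to related data items within other sources" part can be sketched as follows. This is a minimal Python illustration, assuming two tiny in-memory data sets with invented identifiers. `owl:sameAs` is a real OWL property that is commonly used for this kind of cross-data-set link, but `describe` is a hypothetical helper and not how any particular Semantic Web browser or crawler works.

```python
# Two hypothetical data sets, each a list of (subject, predicate, object) triples.
dataset_a = [
    ("a:item-1", "rdfs:label", "Example item"),
    ("a:item-1", "owl:sameAs", "b:thing-9"),
]
dataset_b = [
    ("b:thing-9", "b:category", "b:widgets"),
    ("b:thing-9", "b:madeBy", "b:acme"),
]


def describe(resource, datasets, link="owl:sameAs"):
    """Collect facts about a resource, following links into other data sets."""
    seen, todo, facts = set(), [resource], []
    while todo:
        current = todo.pop()
        if current in seen:
            continue  # guard against link cycles
        seen.add(current)
        for triples in datasets:
            for s, p, o in triples:
                if s != current:
                    continue
                if p == link:
                    todo.append(o)  # follow the link to the other data set
                else:
                    facts.append((p, o))
    return facts


print(describe("a:item-1", [dataset_a, dataset_b]))
```

Starting from `a:item-1`, the sketch picks up the label from the first data set and then the category and maker from the second, purely because of the one link between them.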

Here is the LOD figure showing what is linked today (actually March 2009):

Just to get a feel for what is included ... let me note that DBpedia (the bigger circle in the left center of the image) provides structured access to Wikipedia's human-oriented data (actually, it provides a SPARQL interface). According to DBpedia's web site, "The DBpedia knowledge base currently describes more than 2.6 million things, including at least 213,000 persons, 328,000 places, 57,000 music albums, 36,000 films, 20,000 companies. The knowledge base consists of 274 million pieces of information (RDF triples). It features labels and short abstracts for these things in 30 different languages; 609,000 links to images and 3,150,000 links to external web pages; 4,878,100 external links into other RDF datasets, 415,000 Wikipedia categories, and 75,000 YAGO categories. The DBpedia knowledge base has several advantages over existing knowledge bases: it covers many domains; it represents real community agreement; it automatically evolve as Wikipedia changes, and it is truly multilingual. The DBpedia knowledge base allows you to ask quite surprising queries against Wikipedia, for instance “Give me all cities in New Jersey with more than 10,000 inhabitants” or “Give me all Italian musicians from the 18th century”. Altogether, the use cases of the DBpedia knowledge base are widespread and range from enterprise knowledge management, over Web search to revolutionizing Wikipedia search."
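Why does structured data make a question like the New Jersey one answerable? A toy Python sketch shows the shape of it. The records below are made up (placeholder city names and populations, not DBpedia data), and the filter is ordinary Python rather than SPARQL, so treat it only as an illustration of the idea.

```python
# Made-up records standing in for structured facts extracted from articles.
cities = [
    {"name": "City A", "state": "New Jersey", "population": 25000},
    {"name": "City B", "state": "New Jersey", "population": 8000},
    {"name": "City C", "state": "Ohio", "population": 40000},
]


def cities_in(records, state, min_population):
    """Names of cities in a state with more than min_population inhabitants."""
    return [
        r["name"] for r in records
        if r["state"] == state and r["population"] > min_population
    ]


print(cities_in(cities, "New Jersey", 10000))
```

Once the state and the population are separate, typed values rather than words buried in a paragraph, the question becomes a simple filter. That is essentially what the SPARQL interface offers over the real data.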

Going back to Tim Berners-Lee's request for us to imagine what it would be like to have people load and connect knowledge, let's imagine what all this data can do for a business and its decision making processes ....

Posted by OntoInsights, LLC at 2:47 PM

Labels: linked data, semantic web, Web of Data

Tuesday, May 26, 2009

Web 3.0 and the Web of Data

Web 3.0 is coming up (a lot) in posts on Read-Write Web and in other places. One Read-Write Web posting (The Web of Data, written by Alexander Korth in April of this year) discussed the 3 aspects of the next web (Web 3.0) ... "In the coming years, we will see a revolution in the ability of machines to access, process, and apply information. This revolution will emerge from three distinct areas of activity connected to the Semantic Web: the Web of Data, the Web of Services, and the Web of Identity providers. These webs aim to make semantic knowledge of data accessible, semantic services available and connectable, and semantic knowledge of individuals processable ...".

Tim Berners-Lee focused on the Web of Data in his TED talk on the next Web (recorded in Feb 2009). The talk is only a little longer than 15 minutes in length, and I highly recommend it. The key points are that we are now moving from a document-centric approach to storing information, to making raw data available and processable. That raw data is "linked data" - data about things (identified by URIs), including other interesting information (as RDF triples) and highlighting the relationships between the things. It is important to note that this is not about making data available through specific APIs or anticipated/pre-programmed queries on a "pretty" web site - but about making the "unadulterated data" available for machine understanding and new uses. It is about sharing and adding to data, making connections and relationships in novel ways, and bridging disciplines.
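To show what "data about things (identified by URIs)" expressed as RDF triples looks like on the wire, here is a minimal Python sketch that writes triples in the N-Triples syntax. The example.org URIs are placeholders, while the two property URIs are the real FOAF ones. The `to_ntriples` helper is my own simplification: it treats an object as a URI only when it starts with http:// or https://, and it escapes only backslashes and double quotes, which is less than the full N-Triples grammar requires.

```python
def to_ntriples(triples):
    """Serialize (subject, predicate, object) triples as N-Triples lines."""
    lines = []
    for s, p, o in triples:
        if o.startswith(("http://", "https://")):
            obj = f"<{o}>"  # the object is another thing, named by its URI
        else:
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'  # the object is a plain string literal
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


# Placeholder people; the FOAF property URIs are real.
triples = [
    ("http://example.org/people/alice", "http://xmlns.com/foaf/0.1/name", "Alice"),
    ("http://example.org/people/alice", "http://xmlns.com/foaf/0.1/knows",
     "http://example.org/people/bob"),
]
print(to_ntriples(triples))
```

The second triple is the interesting one: its object is itself a URI, so anyone (or any program) can follow it to find more data about that thing. That is the "linked" in Linked Data.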

If you think about business and an enterprise, think about how powerful this would be - to capture knowledge, share it via social networking technologies, allow updates and additions to the knowledge within the enterprise (again using the social networking tools of today), and to bridge disciplines and knowledge using the Semantic Web mining and matching technologies. Overall, we improve the ability of the enterprise to capture and access its knowledge, and increase the captured knowledge. In the talk, Tim Berners-Lee asks people to imagine the "incredible resource" of "people doing their bit to produce a little bit, and it all connecting."

Just imagine ....

Posted by OntoInsights, LLC at 11:42 AM

Labels: knowledge, linked data, semantic web, Web 3.0, Web of Data