Dear Ed,
You wrote:
My point about data discovery is
that the data it operates on was originally
captured and organized for useful purposes that
just happen to be different from the purpose of
the discovery exercise. That data was not
captured just because we thought that at some time
in the future it might be valuable to have it.
(That is why I called the concern a 'paradigm for
the acquisition mindset.')
Yes, the captured data was, in most cases, purely
for operational and accounting purposes, and the
RDB layout was usually designed solely to handle
the throughput of the processing systems
available. That is what I mean by the "complexity
of typical systems."
But a typical use case is that the business wants
to understand customer purchases, purchase rates,
and purchases at other businesses (through sharing
of customer data [forget about privacy] that uses
driver's license numbers or SSNs as identifying
keys). Those items were not usually intended to be
mined later, but they are later found to be useful
in principle and are therefore mined.
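As a minimal sketch of that kind of cross-business linkage, assuming each business's records carry a common identifying key, the Python below groups purchases from several sources under that key and finds the customers who appear in all of them. The record layouts, the field name cust_key, and the helper names link_purchases and customers_in_all are hypothetical, invented for illustration only.

```python
from collections import defaultdict

# Hypothetical purchase records from two businesses. "cust_key" stands in
# for a shared identifier such as a driver's license number.
STORE_A = [
    {"cust_key": "D100", "item": "tent", "amount": 120.0},
    {"cust_key": "D200", "item": "stove", "amount": 45.0},
    {"cust_key": "D100", "item": "lantern", "amount": 30.0},
]
STORE_B = [
    {"cust_key": "D100", "item": "hiking boots", "amount": 95.0},
    {"cust_key": "D300", "item": "kayak", "amount": 600.0},
]


def link_purchases(*sources):
    """Group (source, item, amount) tuples under each shared customer key."""
    linked = defaultdict(list)
    for source_name, rows in sources:
        for row in rows:
            linked[row["cust_key"]].append((source_name, row["item"], row["amount"]))
    return dict(linked)


def customers_in_all(linked, source_names):
    """Keys seen in every named source, i.e. an inner join on the key."""
    wanted = set(source_names)
    return sorted(
        key for key, purchases in linked.items()
        if wanted <= {source for source, _, _ in purchases}
    )


if __name__ == "__main__":
    linked = link_purchases(("A", STORE_A), ("B", STORE_B))
    print(customers_in_all(linked, ["A", "B"]))  # customers who buy at both
    print(linked["D100"])  # that customer's purchases across both businesses
```

Neither source was designed with the other in mind; the shared key alone is what makes the combined picture possible.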
Thus most really big data is based on more than
one database source, though most of the data may
come from one source. But Hans' point was that
there are all kinds of unsuspected, even
unimagined, correlations among data entities: not
the entities in the data model, but those that
appear as values in the columns.
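One simple way to surface correlations among column values, rather than among the entities of the data model, is to count how often pairs of (column, value) items occur in the same row and compare that with what independence would predict, the "lift" measure used in association-rule mining. The sketch below assumes rows arrive as Python dicts; the function name, sample data and thresholds are made up, and it only illustrates the idea rather than the method any particular discovery tool uses.

```python
from collections import Counter
from itertools import combinations


def value_associations(rows, min_count=2, min_lift=1.0):
    """Find (column, value) pairs that co-occur more often than chance.

    Returns (item_a, item_b, count, lift) tuples, where each item is a
    (column, value) pair and lift = P(a and b) / (P(a) * P(b)).
    """
    n = len(rows)
    single = Counter()
    pairs = Counter()
    for row in rows:
        # Column names are unique within a row, so sorting never compares values.
        items = sorted(row.items(), key=lambda kv: kv[0])
        single.update(items)
        pairs.update(combinations(items, 2))

    results = []
    for (a, b), count in pairs.items():
        if count < min_count:
            continue
        lift = (count / n) / ((single[a] / n) * (single[b] / n))
        if lift > min_lift:
            results.append((a, b, count, lift))
    # Strongest associations first, ties broken by how often they occur.
    return sorted(results, key=lambda r: (-r[3], -r[2]))


if __name__ == "__main__":
    sample = [
        {"region": "north", "product": "snow tires", "payment": "card"},
        {"region": "north", "product": "snow tires", "payment": "cash"},
        {"region": "south", "product": "sunscreen", "payment": "card"},
        {"region": "south", "product": "sunscreen", "payment": "card"},
        {"region": "north", "product": "sunscreen", "payment": "card"},
    ]
    for a, b, count, lift in value_associations(sample):
        print(a, b, count, round(lift, 2))
```

Nothing in a schema of (region, product, payment) says that "north" and "snow tires" go together; the association only shows up in the values.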
I understand your concern, though; I just wanted
to set the archive record straight about the
mining that can be, and now very often is, applied
to the larger business picture, and with great
profit, by the way, according to reports from
users.
-Rich
Sincerely,
Rich Cooper
EnglishLogicKernel.com
Rich AT EnglishLogicKernel DOT com
9 4 9 \ 5 2 5 - 5 7 1 2
From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx
[mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On
Behalf Of Barkmeyer, Edward J
Sent: Wednesday, September 03, 2014 3:17 PM
To: [ontolog-forum]
Subject: Re: [ontolog-forum] FW: FW: Looking to
the Future of Data Science - NYTimes.com -
2014-08-27
Rich,
Lest there be any further confusion, I was talking
about XML as the data store form, not as the
transmission form. XQuery is not meant to be a
query language for messages in the transmission
form. And yes, I should have said "(XML, XQuery)
databases", or perhaps hyphenated the term, so
that it would be clear that there were only two
items in the list.
My point about data discovery is that the data it
operates on was originally captured and organized
for useful purposes that just happen to be
different from the purpose of the discovery
exercise. That data was not captured just because
we thought that at some time in the future it
might be valuable to have it. (That is why I
called the concern a 'paradigm for the acquisition
mindset.') The information that is there was not
"obscured due to the complexity of typical
systems"; it was obscured by not being a focus of
interest at the time the data was captured.
And OBTW, you won't discover anything if you don't
inject the integrating ontology/schema for the new
knowledge you want to extract, and in most such
papers that I have seen, you also have to inject
the schema mapping, one way or another. The good
ones allow interesting functions in the mapping.
If anything, the process for 'discovering' the
information is technically more complex than the
process of storing it was.
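To make the point about injected mappings concrete, here is a minimal Python sketch, assuming a hypothetical integrating schema of three target fields and two hypothetical source layouts. Each mapping is a table of functions from a source row to a target field; those functions (trimming keys, parsing dates, converting cents to dollars) are where the "interesting functions in the mapping" would live. All field names, source names and the helper apply_mapping are invented for illustration.

```python
from datetime import date, datetime

# Hypothetical integrating schema that discovery will run against.
TARGET_FIELDS = ("customer_id", "purchase_date", "amount_usd")

# Hypothetical per-source mappings: target field -> function of a source row.
MAPPINGS = {
    "pos_system": {
        "customer_id": lambda r: r["CUST_NO"].strip().upper(),
        "purchase_date": lambda r: date.fromisoformat(r["TXN_DATE"]),
        "amount_usd": lambda r: r["AMT_CENTS"] / 100,
    },
    "web_store": {
        "customer_id": lambda r: r["customer"]["id"].upper(),
        "purchase_date": lambda r: datetime.strptime(r["ordered"], "%m/%d/%Y").date(),
        "amount_usd": lambda r: float(r["total"]),
    },
}


def apply_mapping(source, row):
    """Translate one source row into the integrating schema."""
    mapping = MAPPINGS[source]
    missing = set(TARGET_FIELDS) - set(mapping)
    if missing:
        raise ValueError(f"mapping for {source!r} lacks {sorted(missing)}")
    return {field: mapping[field](row) for field in TARGET_FIELDS}


if __name__ == "__main__":
    print(apply_mapping("pos_system",
                        {"CUST_NO": " d100 ", "TXN_DATE": "2014-08-27", "AMT_CENTS": 12050}))
    print(apply_mapping("web_store",
                        {"customer": {"id": "d100"}, "ordered": "08/27/2014", "total": "95.00"}))
```

Both source rows end up with the same customer_id and purchase_date only because someone wrote the mapping functions; the sources alone do not say that they describe the same customer.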
-Ed
From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx
[mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On
Behalf Of Rich Cooper
Sent: Tuesday, September 02, 2014 10:50 PM
To: '[ontolog-forum] '
Subject: [ontolog-forum] FW: FW: Looking to the
Future of Data Science - NYTimes.com - 2014-08-27
Hans Polzer describes, more cogently than I did,
why the data model (schema, or whatever term you
prefer) does NOT represent all the information to
be discovered. His post is below.
-Rich
From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx
[mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On
Behalf Of Rich Cooper
Sent: Tuesday, September 02, 2014 1:03 PM
To: '[ontolog-forum] '
Subject: Re: [ontolog-forum] FW: Looking to the
Future of Data Science - NYTimes.com - 2014-08-27
EJB:> What is wanted is not a
paradigm shift in processing technology - the last
two paradigm shifts got us XML databases and
XQuery and RDF triple stores, both of which are
clumsy repositories that just make the Big Data
problem more expensive.
You list three items but say "both of which" are
clumsy. Actually, the first item, XML, has been a
very useful method for communicating within N-tier
systems. It has great value there, but it is
usually converted into the tables, columns and
domains of the RDBs where the information gets
stored. So XML is not a problem for most systems.
There are even free XML parsers, packaged as
components that programmers can call so they don't
have to do the parsing themselves. XML has been
very, very useful for interchanging data among
multiple systems.
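As an illustration of that pattern, where an XML message exchanged between tiers is parsed with an off-the-shelf parser and its contents are stored as rows in relational tables, here is a small Python sketch using the standard library's xml.etree.ElementTree and sqlite3 modules. The message layout, the table definitions and the helper names create_schema and store_order are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical order message passed between tiers of an N-tier system.
MESSAGE = """
<order id="A-17">
  <customer>D100</customer>
  <line sku="TENT-2" qty="1" price="120.00"/>
  <line sku="LANT-1" qty="2" price="15.00"/>
</order>
"""


def create_schema(conn):
    """Hypothetical relational tables that receive the message content."""
    conn.executescript("""
        CREATE TABLE orders (order_id TEXT PRIMARY KEY, customer TEXT NOT NULL);
        CREATE TABLE order_lines (
            order_id TEXT REFERENCES orders(order_id),
            sku TEXT, qty INTEGER, price REAL);
    """)


def store_order(conn, xml_text):
    """Parse one XML order message and insert it as relational rows."""
    root = ET.fromstring(xml_text.strip())
    order_id = root.get("id")
    conn.execute("INSERT INTO orders (order_id, customer) VALUES (?, ?)",
                 (order_id, root.findtext("customer")))
    conn.executemany(
        "INSERT INTO order_lines (order_id, sku, qty, price) VALUES (?, ?, ?, ?)",
        [(order_id, line.get("sku"), int(line.get("qty")), float(line.get("price")))
         for line in root.iter("line")],
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    store_order(conn, MESSAGE)
    # Once stored, the data is queried with ordinary SQL rather than XQuery.
    print(conn.execute("SELECT SUM(qty * price) FROM order_lines").fetchone()[0])
```

Here XML serves only as the transmission form; the data store is the relational tables.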
EJB:> What is wanted (as Michael
Brunnbauer hinted) is a paradigm shift in data
acquisition mindset. I will paraphrase some other
contribution to this exploder, which I have since
lost: "If you don't know what you have when you
get it, you will never know it later."
Wrong!!!! The whole point of discovery systems is
to recognize new information that was in the
database, but which is obscured from ordinary
observers due to the complexity of typical systems
today. You don't know what it is in advance; you
can only discover it through analysis.
The stuff that is already known to be in the
database can just be queried. But the full range
of relationships, which are NOT KNOWN explicitly
in the data model, can be brought out through
discovery processes.
See
http://www.EnglishLogicKernel.com/ElkForPatents.html
for an example of the kinds of things that can be
discovered from relational databases containing
both structured and unstructured columns, as in
the USPTO database of patents.
-Rich
From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx
[mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On
Behalf Of Kingsley Idehen
Sent: Tuesday, September 02, 2014 10:34 AM
To: ontolog-forum@xxxxxxxxxxxxxxxx
Subject: Re: [ontolog-forum] FW: Looking to the
Future of Data Science - NYTimes.com - 2014-08-27
On 9/2/14 11:25 AM, Barkmeyer, Edward J wrote:
I regret to say, I think this definition is about
buzzword maintenance. The idea is clearly: Big
Data is about inventing a new information
processing technology that will work better for
datasets that RDB technology just can't handle,
i.e., "a paradigm shift" in technology.
What is wanted is not a paradigm shift in
processing technology - the last two paradigm
shifts got us XML databases and XQuery and RDF
triple stores, both of which are clumsy
repositories that just make the Big Data problem
more expensive.
What is wanted (as Michael Brunnbauer hinted) is a
paradigm shift in data acquisition mindset. I
will paraphrase some other contribution to this
exploder, which I have since lost: "If you don't
know what you have when you get it, you will never
know it later."
There is a big difference between large volumes of
data that must be maintained in order to perform a
particular set of business or governmental
functions and responsibilities, and large volumes
of data that are available and might enable some
analytical process that is at best desirable.
Amazingly enough, we have muddled through the
support of the former for 50 years with
established technologies and state-of-the-art
computational resources, and newer technologies
have become established as their implementations
matured and the resources supporting them became
able to carry the increasing load. We have been
able to do this by working around the limitations
to deliver satisfactory, if less than ideal,
services somehow. As John Sowa and others have
said, this is a recurring problem; it is not a new
problem.
The problem we have is with our appetite. There
is so much information food out there that we
could surely find the taste treats for the most
discriminating palates if we could just search it
all fast enough. That is all very exciting, but
it is irrelevant to solving the problem of
delivering to everyone his daily information
bread. The problem is in focusing on what we need
to process, not what we would like to process.
The people who are concerned about data they need
to process in order to deliver adequate services
and products are experiencing the 2014 version of
the 1960 problem. The rest are just blowing Big
Data horns.
The would-be ISO definition fails to say:
Big Data: a data set(s) with characteristics that
for *a required function* at a given point in time
cannot be efficiently processed using
current/existing/established/traditional
technologies and techniques in order to *provide
adequate support for that function*.
It is not about an arbitrary "particular problem
domain" or being able to "extract [some perceived]
value". That is an academic view, and it is why we
have research institutions.
-Ed
Ed,
Great addition to this evolving conversation.
Naturally, I've incorporated your comments into
the "Big Data" description that I am maintaining:
[1]
http://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fplus.google.com%2Fs%2FBigData%23this&distinct=1
-- without the effect of owl:sameAs relation
reasoning and inference
[2]
http://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fplus.google.com%2Fs%2FBigData%23this&sas=yes&distinct=1
-- with the effect of owl:sameAs relation
semantics, reasoning and inference
[3]
https://plus.google.com/112399767740508618350/posts/79nHeum5DQR
-- how I am using G+ post-based nanotations to fit
the pieces of this puzzle together as I encounter
new and interesting insights
[4]
https://plus.google.com/112399767740508618350/posts/MRsyNtqgTXz
-- ditto with regard to comments by John Sowa.
Related:
[1]
http://kidehen.blogspot.com/2014/07/nanotation.html
-- about Nanotation
[2]
https://twitter.com/kidehen/status/506813897043881984
-- Tweet related to the paradigm shift re. data
acquisition (i.e., RDF sentence-based Nanotations
that fit into place where text exists).
--
Regards,
Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2:
http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile:
https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile:
http://www.linkedin.com/in/kidehen
Personal WebID:
http://kingsley.idehen.net/dataspace/person/kidehen#this
<http://kingsley.idehen.net/dataspace/person/kidehen>
_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J