Go to previous page Go to next page
This chapter describes the new and updated features for interMedia Text, release 8.1.7.
The following topics are covered:
The following sections outline the new and updated features for interMedia Text, Release 8.1.7. These features are:
With previous releases of interMedia Text, you can create a text index of type CONTEXT. This type of index is suited for indexing document collections that contain large coherent documents.
For this release, you can create a new type of index called CTXCAT in addition to the CONTEXT indextype. This is a combined index on a text column and one or more other columns. It is also known as a catalog index.
The CTXCAT indextype is suited for indexing short text fragments, such as names, addresses and item descriptions that are stored in separate columns. This type of index offers better query performance for structured queries.
You query a CTXCAT index with the CATSEARCH operator in the WHERE clause of a SELECT statement.
Consider a table called AUCTION with the following schema:
create table auction(item_id number, title varchar2(100), category_id number, price number, bid_close date);
Assume that queries on the table involve a mandatory text query clause and optional structured conditions on category_id. Results must be sorted based on either bid_close, category_id, or price.
You can create a catalog index to support the different types of structured queries a user might enter.
To create the indexes, first create the index set preference then add the required indexes to it.
The following example creates the index set preference and adds five different indexes to it:
begin
ctx_ddl.create_index_set('auction_iset');
ctx_ddl.add_index('auction_iset','bid_close'); /* index A */
ctx_ddl.add_index('auction_iset','category_id, bid_close'); /* index B */
ctx_ddl.add_index('auction_iset','bid_close, category_id'); /* index C */
ctx_ddl.add_index('auction_iset','price, bid_close'); /* index D */
ctx_ddl.add_index('auction_iset','bid_close, price'); /* index E */
end;
Create the combined catalog index with CREATE INDEX as follows:
create index auction_titlex on AUCTION(title) indextype is CTXCAT parameters
('index set auction_iset');
To query the title column for the word camera, you can issue regular and mixed queries as follows:
select from AUCTION where CATSEARCH(title, 'camera', NULL)>0;
The following query uses index A:
select from AUCTION where CATSEARCH(title, 'camera', 'bid_close=20-FEB-2000')>0;
The following query uses index B:
select from AUCTION where CATSEARCH(title, 'camera', 'category_id=99 order by bid_close desc')>0;
The following query uses index C:
select from AUCTION where CATSEARCH(title, 'camera', 'bid_close=20-FEB-2000 order by category_id')>0;
The following query uses index D:
select from AUCTION where CATSEARCH(title, 'camera', 'price=200 order by bid_ close')>0;
The following query uses index E:
select from AUCTION where CATSEARCH(title, 'camera', 'bid_close=20-FEB-2000 order by price')>0;
In previous releases, your text table required a primary key to be indexed. This is no longer required in this release. You can create text table without a primary key and and index your text column.
Use the new MULTI_COLUMN_DATASTORE type when your text is stored in more than one column in your text table. During indexing, the system concatenates the text columns and indexes the text as a single document.
With the URL_DATASTORE, you can store URLs in your text column for indexing. This release supports embedding username and password for FTP protocol URLs, as in:
ftp://username:password@ftp.hostname.com/dir/file.doc
Oracle8i interMedia Text Reference for more information about how to use the URL_DATASTORE.
FILE and URL datastores access files on the database machine. This might be undesirable for sensitive sites, since any user can browse the file system accessible to the oracle user using these datastores.
For better security, this release introduces the FILE_ACCESS_ROLE system parameter. You or your DBA can set this system parameter to the name of a database role. If set, any user attempting to create an index using FILE or URL datastores must have this role, or the index creation will fail. For instance, if the DBA does:
begin
ctx_adm.set_parameter('FILE_ACCESS_ROLE','WHITEHAT');
end;
then when user SCOTT tries to index:
create index foox on foo(text) indextype is ctxsys.context parameters('datastore
ctxsys.file_datastore')
Oracle checks if SCOTT has the role WHITEHAT. If he does, then the create index will proceed as normal. If not, then the create index will fail.
This check is made only at create index time. Setting this parameter or granting/revoking the named role has no effect on existing indexes using the file datastore
Oracle8i interMedia Text Reference for more information about how to use the FILE_DATASTORE and URL_DATASTORE.
Use the new PROCEDURE_FILTER filter preference to filter your documents with a PL/SQL or Java stored procedure. The stored procedure is called each time a document needs to be filtered during indexing.
This feature is different from the existing USER_FILTER which calls an external executable to perform document filtering.
Oracle8i interMedia Text supports multi-language stoplists. A multi-language stoplist is useful when you index a multi-language column, such as a text column with English, German, and Japanese documents.
With normal indexing, right-truncated wildcard queries such as TO% can possibly expand into a large wordlist and degrade query performance.
To improve query performance of right-truncated wildcard queries, you can create a prefix index. This type of index records token prefixes. The trade-off is a bigger index for improved wildcard searching.
Prefix indexing is different from sub-string indexing which also improves wildcard queries.
interMedia Text supports the optimization of single tokens in the index. The optimization of specific tokens in the index saves time over optimizing the entire index.
Document services has the following two new features:
The CTX_THES PL/SQL package supports adding, updating, and removing translations in your thesaurus. These new procedures are the following:
The fuzzy query operator now supports similarity scoring. You can order the results so that results with high similarity to the query word are ranked higher than results with low similarity.
You can also limit the number of expanded terms.
Oracle8i interMedia Text supports most document formats for indexing with Inso filter technology.
Oracle8i interMedia Text Reference for more information about the Inso filter and all supported document formats
In this release, support for the following formats has been added:
Microsoft Word 2000
Word 2000
Microsoft Excel 2000
Excel 2000
Microsoft PowerPoint 2000
PowerPoint 2000
Corel WordPerfect for Windows
Versions through 9.0
QuattroPro for Windows
Versions through 9.0
Corel Presentations
Versions 8.0 and 9.0
Microsoft Project
Project 98
Visio graphics format
Visio 4, 5 2000
Ichitaro
Version 5, 6, 7, 8, an 9
(Text and paragraph attributes only for versions 5 and 6.)
CDR (if tiff image is embedded)
Corel Draw version 2.0 - 9.0
MSG
Microsoft Outlook mail format
(Some limitations in field support.)
Inso filter technology is supported on the following platforms:
Oracle8i interMedia Text now provides a knowledge base for French in addition to English. The knowledge base is the language specific data used during theme analysis.
For other languages, you must provide your own thesauri. One or more thesauri in a language can be compiled to produce an interMedia Text knowledge base for that language using the ctxkbtc compiler.
In this release, Oracle8i interMedia Text extends theme functionality to other languages by allowing you to load your own knowledge base for any single-byte whitespace delimited language, including Spanish and French.
Theme functionality includes theme indexing, ABOUT queries, theme highlighting, and the generation of themes, gists, and theme summaries with CTX_DOC.
You extend theme functionality by adding a user-defined knowledge base. For example, you can create a Spanish knowledge base from a Spanish thesuarus.
To load your language-specific knowledge base, follow these steps:
ctxkbtc:
ctxkbtc -user ctxsys/ctxsys -name my_lang_thes
A knowledge base is compiled from the loaded thesaurus. To use this knowledge base in an index, specify the NLS_LANG language as the THEME_LANGUAGE attribute value for the BASIC_LEXER preference.
The following limitations hold for adding knowledge bases:
Oracle8i interMedia Text Reference for more information about theme functionality
Knowledge bases can be in any single-byte character set. Supplied knowledge bases are in WE8ISO8859P1. You can store an extended knowledge base in another character set such as US7ASCII.
The ctxkbtc executable now takes an additional argument as follows:
ctxkbtc -user ctxsys/ctxsys -stoplist <stoplistname>
Stopwords in the stoplist are added to the knowledge base as useless words that are prevented from becoming themes or contributing to themes. You can still add stopthemes after running this command using CTX_DLL.ADD_STOPTHEME.
Obtaining hierarchical query feedback information such as broader terms, narrower terms and related terms does not work in languages other than English and French.
In other languages, the knowledge bases are derived entirely from your thesauri. In such cases, Oracle recommends that you obtain hierarchical information from your thesauri.
Use the CATSEARCH operator to search catalog indexes. Use this operator in the WHERE clause of a SELECT statement.
CATSEARCH([schema.]column, text_query VARCHAR2, structured_query VARCHAR2,RETURN NUMBER;
Specify the text column to be searched on. This column must have a CTXCAT index associated with it.
Specify the query expression that defines your search in column. The CATSEARCH operator supports only the following query operations:
These operators have the following syntax:
Logical AND
a b c
Returns rows that contain a, b, and c.
Logical OR
a | b | c
Returns rows that contain a, b, or c.
Logical NOT
a - b
Returns rows that contain a and not b.
hyphen with no space
a-b
Hyphen ignored. Words such as web-site treated as a single query term.
" "
"a b c"
Returns rows that contain the phrase "a b c".
For example, entering "XYZ CD Player" means return all rows that contain this phrase exactly.
Specify the structured conditions and the ORDER BY clause. There must exist an index for any column you specify. For example, if you specify 'category_id=1 order by bid_close', you must have an index for 'category_id, bid_close' as specified with CTX_DDL.ADD_INDEX.
With structured_query, you can use standard SQL syntax with only the following operators:
When you use these operators, the following limitations apply:
For example, these expressions are supported:
catsearch(text, 'dog', 'foo > 15') catsearch(text, 'dog', 'bar = ''SMITH''') catsearch(text, 'dog', 'foo between 1 and 15') catsearch(text, 'dog', 'foo = 1 and abc = 123')
And these expression are not supported:
catsearch(text, 'dog', 'upper(bar) = ''A''') catsearch(text, 'dog', 'bar LIKE ''A%''') catsearch(text, 'dog', 'foo = abc') catsearch(text, 'dog', 'foo = 1 or abc = 3')
A typical query with CATSEARCH might include a structured clause as follows to find all rows that contain the word camera with id of 99 ordered by bid_close:
select from AUCTION where CATSEARCH(title, 'camera', 'category_id=99 order by bid_close desc')> 0;
The following query finds all rows with the exact phrase XYZ CD Player:
select from AUCTION where CATSEARCH(title, '"XYZ CD Player"', 'order by bid_ close desc')> 0;
The following query finds all rows with the terms XYZ and CD and Player:
select from AUCTION where CATSEARCH(title, 'XYZ CD Player', 'order by bid_close desc')> 0;
The following query finds all rows with the term CD-Player:
select from AUCTION where CATSEARCH(title, 'CD-Player', 'order by bid_close desc')> 0;
The following query finds all rows with the term CD and not Player:
select from AUCTION where CATSEARCH(title, 'CD - Player', 'order by bid_close desc')> 0;
The following query finds all rows with the terms CD or DVD or Speaker:
select from AUCTION where CATSEARCH(title, 'CD | DVD | Speaker', 'order by bid_ close desc')> 0;
This section describes the CREATE INDEX statement as it pertains to creating an interMedia Text domain index. In addition, it only describes those changes that are new for this release.
For a complete description of the CREATE INDEX statement, see Oracle8i interMedia Text Reference and Oracle8i SQL Reference.
Use CREATE INDEX to create an interMedia Text index. An interMedia Text index is an Oracle domain index of either type context or ctxcat.
You must create an appropriate interMedia Text index to issue CONTAINS or CATSEARCH queries.
You can create two types of interMedia Text indexes:
You do not need the CTXAPP role to create an interMedia Text index. If you have Oracle grants to create a B-tree index on the text column, you have sufficient permission to create a text index. The issuing owner, table owner, and index owner can all be different users, which is the standard behavior for regular B-tree indexes.
CREATE INDEX [schema.]index on [schema.]table(column) INDEXTYPE IS ctxsys.ctxcat [PARAMETERS(paramstring)] [PARALLEL n];
To create a CTXCAT index, you use CREATE INDEX and optionally specify preferences in the parameter string like you do when you create a context index.
However, the keywords you can use in a CTXCAT parameter string are limited to:
INDEX SET
Specify the index set to use to create the catalog index.
DATASTORE
Specify the datastore preference.
LEXER
Specify your lexer preference.
MEMORY
Specify memory size.
STOPLIST
Specify your stoplist.
STORAGE
Specify your storage preference.
WORDLIST
Specify your wordlist preference.
When you create an index of type ctxcat, you can use only the following index preferences:
Datastore
DIRECT_DATASTORE
Filter
None
Lexer
BASIC_LEXER
CHINESE_VGRAM_LEXER
JAPANESE_VGRAM_LEXER
KOREAN_LEXER
Wordlist
PREFIX_INDEX attribute of BASIC_WORDLIST for Japanese data.
Storage
BASIC_STORAGE
Stoplist
GENERIC_STOPLIST
Section Group
None
The index set can take up to 99 indexes, each made up of ordered lists of columns from the base table. However, there are two important restrictions on these columns:
The catalog index has its own set of system parameters for setting preference defaults. This allows an installation to have one set of default preferences for context indexes, and another set for ctxcat indexes. The new system parameters are:
While you can specify a wordlist preference for ctxcat indexes, most of the attributes do not apply, since the catsearch query language does not support wildcarding, fuzzy, and stemming. The only attribute of the wordlist preference that is useful is PREFIX_INDEX, for Japanese data.
Refer to the section Catalog (CTXCAT) Index.
CREATE INDEX [schema.]index on [schema.]table(column) INDEXTYPE IS ctxsys.context [PARAMETERS(paramstring)] [PARALLEL n];
Oracle8i interMedia Text Reference for more information about how to create a CONTEXT index.
This section describes the ALTER INDEX statement as it pertains to managing an interMedia Text domain index. In addition it only describes those changes that are new for this release.
For a complete description of the ALTER INDEX statement, see Oracle8i interMedia Text Reference and the Oracle8i SQL Reference.
Use ALTER INDEX REBUILD syntax to perform the following maintenance tasks for a text index:
The following syntax is used to rebuild the index, resume a failed operation, perform batch DML, add stopwords to the index, add sections and stop sections to index, or optimize the index:
ALTER INDEX [schema.]index REBUILD [ONLINE] [PARAMETERS (paramstring)];
Optionally specify the ONLINE parameter for nonblocking operation, which allows the index to be queried during an ALTER INDEX synchronize or optimize operation. You cannot specify ONLINE for replace, resume, or when adding stopwords or stop sections.
Optionally specify paramstring. If you do not specify paramstring, Oracle rebuilds the index with existing preference settings.
The syntax for paramstring is as follows:
paramstring = 'replace [datastore datastore_pref] [filter filter_pref] [lexer lexer_pref] [wordlist wordlist_pref] [storage storage_pref] [stoplist stoplist] [section group section_group] [memory memsize] | resume [memory memsize] | optimize [token index_token | fast | full [maxtime (time | unlimited)] | sync [memory memsize] | add stopword word [language language] | add zone section section_name tag tag | add field section section_name tag tag [(VISIBLE | INVISIBLE)] | add attr section section_name tag tag@attr | add stop section tag'
Optimizes the index. Specify token, fast, or full optimization. You typically optimize after you synchronize the index.
When you optimize in token mode, Oracle optimizes only the index token index_token in full mode. Use this method of optimization to quickly optimize frequently searched terms.
When you optimize in fast mode, Oracle works on the entire index, compacting fragmented rows. However, in fast mode, old data is not removed.
When you optimize in full mode, you can optimize the whole index or a portion. This method compacts rows and removes old data.
You use the maxtime parameter to specify in minutes the time Oracle is to spend on the optimization operation. Oracle starts the optimization where it left off and optimizes until complete or until the time limit has been reached, whichever comes first. Specifying a time limit is useful for automating index optimization, where you set Oracle to optimize the index for a specified time on a regular basis.
When you specify maxtime unlimited, the entire index is optimized. This is the default. When you specify 0 for maxtime, Oracle performs minimal optimization.
The following statement optimizes the token Oracle in token mode:
ALTER INDEX newsindex REBUILD PARAMETERS('optimize token Oracle');
The following statement optimizes newsindex in fast mode:
ALTER INDEX newsindex REBUILD PARAMETERS('optimize fast');
To specify an optimization operation to last for three hours (180 minutes), issue the following statement:
ALTER INDEX newsindex REBUILD PARAMETERS('optimize full maxtime 180');
To optimize the entire index without regard to time, issue the following statement:
ALTER INDEX newsindex REBUILD PARAMETERS('optimize full maxtime unlimited');
To optimize the entire index and to allow queries to be issued during the optimization, issue the following statement:
ALTER INDEX newsindex REBUILD ONLINE PARAMETERS('optimize full maxtime
unlimited');
Use this datastore when your text is stored in more than one column. During indexing, the system concatenates the text columns and indexes the text as a single document.
MULTI_COLUMN_DATASTORE has the following attributes:
columns
Specify a comma separated list of columns to be concatenated during indexing. You can also specify any expression allowable for the select statement column list for the base table. This includes expressions, PL/SQL functions, column aliases, and so on.
NUMBER and DATE column types are supported. They are converted to text before indexing using the default format mask. The TO_CHAR function can be used in the column list for formatting.
RAW and BLOB columns are directly concatenated as binary data.
LONG, LONG RAW, NCHAR, and NCLOB, nested table columns and collections are not supported.
You can specify no more than 500 columns.
To index, you must create a dummy column to specify in the CREATE INDEX statement. This column's contents are not made part of the virtual document, unless its name is specified in the columns attribute.
The index is synchronized only when the dummy column is updated. You can create triggers to propagate changes if needed.
Only CTXSYS is allowed to create preferences for the MULTI_COLUMN_DATASTORE type. Any other user who attempts to create a MULTI_COLUMN_DATASTORE preference receives an error.
Oracle makes this restriction because when the columns attribute contains a function call, the call is made by the CTXSYS schema. The potential exists for a malicious CTXAPP users to execute arbitrary functions for which they do not have execute permission.
If this is too restrictive, you can create a stored procedure under CTXSYS to create MULTI_COLUMN_DATASTORE preferences. The effective user is CTXSYS, who creates and owns the preferences. However, you can call this procedure from any schema as CTXSYS.
For even better security, you can drop your CTXSYS stored procedures or preferences immediately after creating the index.For optimal security, consider using the USER_DATASTORE type, which checks user permissions during indexing.
The following example creates a multi-column datastore preference called my_multi with three text columns:
begin
ctx_ddl.create_preference('my_multi', 'MULTI_COLUMN_DATASTORE');
ctx_ddl.set_attribute('my_multi', 'columns', 'column1, column2, column3');
end;
During indexing, the system creates a virtual document for each row. The virtual document is composed of the contents of the columns concatenated in the listing order with column name tags automatically added. For example:
create table mc(id primary key, name varchar2(10), address varchar2(80));
insert into mc values(1, 'John Smith', '123 Main Street');
exec ctx_ddl.set_attibute('mymds', 'columns', 'name, address');
This produces the following virtual text for indexing:
<NAME> John Smith </NAME> <ADDRESS> 123 Main Street </ADDRESS>
The system indexes the text between the tags, ignoring the tags themselves. To index these tags as sections, you can optionally create field sections with the BASIC_SECTION_GROUP.
No section group is created when you use the MULTI_COLUMN_DATASTORE. To create sections for these tags, you must create a section group.
When you use expressions or functions, the tag is composed of the first 30 characters of the expression unless a column alias is used.
For example, if your expression is as follows:
exec ctx_ddl.set_attibute('mymds', 'columns', '4 + 17');
then it produces the following virtual text:
<4 + 17> 21 </4 + 17>
If your expression is as follows:
exec ctx_ddl.set_attibute('mymds', 'columns', '4 + 17 col1');
then it produces the following virtual text:
<col1> 21 <col1>
The tags are in uppercase unless the column name or column alias is in lowercase and surrounded by double quotes. For example:
exec ctx_ddl.set_attibute('mymds', 'COLUMNS', 'foo');
produces the following virtual text:
<FOO> content of foo </FOO>
For lowercase tags, use the following:
exec ctx_ddl.set_attibute('mymds', 'COLUMNS', 'foo "foo"');
This expression produces:
<foo> content of foo </foo>
Use the PROCEDURE_FILTER filter preference type to filter your documents with a stored procedure. The stored procedure is called each time a document needs to be filtered.
This type has the following attributes:
procedure
Name of the filter stored procedure.
Any CTXSYS owned procedure. The procedure can be a PL/SQL or Java stored procedure.
input_type
Type of input argument for stored procedure.
VARCHAR2, BLOB, CLOB, FILE
output_type
Type of output argument for stored procedure.
VARCHAR2, CLOB, FILE
rowid_parameter
Include rowid parameter?
TRUE/FALSE
format_parameter
Include format parameter?
TRUE/FLASE
charset_parameter
Include charset parameter?
TRUE/FLASE
Specify the name of the stored procedure to use for filtering. The procedure can be a PL/SQL or Java stored procedure. The procedure can be a safe callout or call a safe callout.
The procedure must be owned by CTXSYS and have one of the following signatures:
PROCEDURE(IN BLOB, IN OUT NOCOPY CLOB) PROCEDURE(IN CLOB, IN OUT NOCOPY CLOB) PROCEDURE(IN VARCHAR, IN OUT NOCOPY CLOB) PROCEDURE(IN BLOB, IN OUT NOCOPY VARCHAR2) PROCEDURE(IN CLOB, IN OUT NOCOPY VARCHAR2) PROCEDURE(IN VARCHAR2, IN OUT NOCOPY VARCHAR2) PROCEDURE(IN BLOB, IN VARCHAR2) PROCEDURE(IN CLOB, IN VARCHAR2) PROCEDURE(IN VARCHAR2, IN VARCHAR2)
The first argument is the content of the unfiltered row as passed out by the datastore. The second argument is for the procedure to pass back the filtered document text.
The procedure attribute is mandatory and has no default.
Specify the type of the input argument of the filter procedure. You can specify one of the following:
BLOB
The input argument is of type BLOB. The unfiltered document is contained in the BLOB passed in.
CLOB
The input argument is of type CLOB. The unfiltered document is contained in the CLOB passed in.
No pre-filtering or character set conversion is done. If the datastore outputs binary data, that binary data is written directly to the CLOB, with NLS doing implicit mapping to character data as best it can.
VARCHAR2
The input argument is of type VARCHAR2. The unfiltered document is contained in the VARCHAR2 passed in.
The document can be a maximum of 32767 bytes of data. If the unfiltered document is greater than this length, an error is raised for the document and the filter procedure is not called.
FILE
The input argument is of type VARCHAR2. The unfiltered document content is contained in a temporary file in the file system whose filename is stored in the VARCHAR2 passed in.
For example, the value of the passed-in VARCHAR2 might be 'tmp/mydoc.tmp' which means that the document content is stored in the file '/tmp/mydoc.tmp'.
The file input type is useful only when your procedure is a safe callout, which can read the file.
The input_type attribute s not mandatory. If not specified, BLOB is the default.
Specify the type of output argument of the filter procedure. You can specify one of the following types:
CLOB
The output argument is IN OUT NOCOPY CLOB. Your procedure must write the filtered content to the CLOB passed in.
VARCHAR2
The output argument is IN OUT NOCOPY VARCHAR2. Your procedure must write the filtered content to the VARCHAR2 variable passed in.
FILE
The output argument must be IN VARCHAR2. On entering the filter procedure, the output argument is the name of a temporary file. The filter procedure must write the filtered contents to this named file.
Using a FILE output type is useful only when the procedure is a safe callout, which can write to the file.
The output_type attribute is not mandatory. If not specified, CLOB is the default.
When you specify TRUE, the rowid of the document to be filtered is passed as the first parameter, before the input and output parameters.
For example, with INPUT_TYPE BLOB, OUTPUT_TYPE CLOB, and ROWID_PARAMETER TRUE, the filter procedure must have the signature as follows:
procedure(in rowid, in blob, in out nocopy clob)
This attribute is useful for when your procedure requires data from other columns or tables. This attribute is not mandatory. The default is FALSE.
When you specify TRUE, the value of the format column of the document being filtered is passed to the filter procedure before input and output parameters, but after the rowid parameter, if enabled.
You specify the name of the format column at index time in the parameters string, using the keyword 'format column <columnname>'. The parameter type must be IN VARCHAR2.
The format column value can be read via the rowid parameter, but this attribute allows a single filter to work on multiple table structures, because the format attribute is abstracted and does not require the knowledge of the name of the table or format column.
FORMAT_PARAMETER is not mandatory. The default is FALSE.
When you specify TRUE, the value of the charset column of the document being filtered is passed to the filter procedure before input and output parameters, but after the rowid and format parameter, if enabled.
You specify the name of the charset column at index time in the parameters string, using the keyword 'charset column <columnname>'. The parameter type must be IN VARCHAR2.
CHARSET_PARAMETER attribute is not mandatory. The default is FALSE.
ROWID_PARAMETER, FORMAT_PARAMETER, and CHARSET_PARAMETER are all independent. The order is rowid, the format, then charset, but the filter procedure is passed only the minimum parameters required.
For example, assume that INPUT_TYPE is BLOB and OUTPUT_TYPE is CLOB. If your filter procedure requires all parameters, the procedure signature must be:
(id IN ROWID, format IN VARCHAR2, charset IN VARCHAR2, input IN BLOB, output IN OUT NOCOPY CLOB)
If your procedure requires only the ROWID, then the procedure signature must be:
(id IN ROWID,input IN BLOB, ouput IN OUT NOCOPY CLOB)
In order to create an index using a PROCEDURE_FILTER preference, the index owner must have execute permission on the procedure.Oracle checks this at index time, which is similar to the security measures for USER_DATASTORE.
The filter procedure can raise any errors needed through the normal PL/SQL raise_application_error facility. These errors are propagated to the CTX_USER_INDEX_ERRORS view or reported to the user, depending on the context in which the filter is invoked.
Consider a filter procedure CTXSYS.NORMALIZE that you define with the following signature:
PROCEDURE NORMALIZE(id IN ROWID, charset IN VARCHAR2, input IN CLOB, output IN OUT NOCOPY VARCHAR2);
To use this procedure as your filter, you set up your filter preference as follows:
begin
ctx_ddl.create_preference('myfilt', 'procedure_filter');
ctx_ddl.set_attribute('myfilt', 'procedure', 'normalize');
ctx_ddl.set_attribute('myfilt', 'input_type', 'clob');
ctx_ddl.set_attribute('myfilt', 'output_type', 'varchar2');
ctx_ddl.set_attribute('myfilt', 'rowid_parameter', 'TRUE');
ctx_ddl.set_attribute('myfilt', 'charset_parameter', 'TRUE');
end;
You can create a prefix index to improve right-truncated wildcard searches such as TO%. Without a prefix index, right truncated wildcard queries are expanded using equivalence. This type of expansion can possibly result in large wordlists, degrading query performance.
To enable prefix indexing, use the BASIC_WORDLIST preference type. The following new attributes have been added:
index_prefix
Specify YES to enable prefix indexing. Prefix indexing improves performance for right truncated wildcard searches such as TO%. Defaults to NO.
prefix_length_min
Specify the minimum length of indexed prefixes. Defaults to 1.
prefix_length_max
Specify the maximum length of indexed prefixes. Defaults to 64.
Specify yes to enable prefix indexing. Prefix indexing improves performance for right truncated wildcard searches such as TO%. Defaults to NO.
Enabling prefix indexing increases index size.
Prefix indexing chops up tokens into multiple prefixes to store in the $I table.For example, words TOKEN and TOY are normally indexed like this in the $I table:
TOKEN
0
DOCID 1 POS 1
TOY
0
DOCID 1 POS 3
With prefix indexing, Oracle indexes the prefix substrings of these tokens as follows with a new token type of 6:
TOKEN
0
DOCID 1 POS 1
TOY
0
DOCID 1 POS 3
T
6
DOCID 1 POS 1 POS 3
TO
6
DOCID 1 POS 1 POS 3
TOK
6
DOCID 1 POS 1
TOKE
6
DOCID 1 POS 1
TOKEN
6
DOCID 1 POS 1
TOY
6
DOCID 1 POS 3
Wildcard searches such as TO% are now faster because Oracle does no expansion of terms and merging of result sets. To obtain the result, Oracle need only examine the (TO,6) row.
Specify the minimum length of indexed prefixes. Defaults to 1.
For example, setting prefix_length_min to 3 and prefix_length_max to 5 indexes all prefixes between 3 and 5 characters long.
A wildcard search whose pattern is below the minimum length or above the maximum length is searched using the slower method of equivalence expansion and merging.
Specify the maximum length of indexed prefixes. Defaults to 64.
For example, setting prefix_length_min to 3 and prefix_length_max to 5 indexes all prefixes between 3 and 5 characters long.
A wildcard search whose pattern is below the minimum length or above the maximum length is searched using the slower method of equivalence expansion and merging.
The following example sets the wordlist preference for prefix indexing. The example specifies that Oracle create token prefixes between 3 and 4 characters long:
begin
ctx_ddl.create_preference('mywordlist', 'BASIC_WORDLIST');
ctx_ddl.set_attribute('mywordlist','INDEX_PREFIX','YES');
ctx_ddl.set_attribute('mywordlist','PREFIX_MIN_LENGTH',3);
ctx_ddl.set_attribute('mywordlist','PREFIX_MAX_LENGTH', 4);
end
Use the fuzzy operator to expand queries to include words that are spelled similarly to the specified term. This type of expansion is helpful for finding more accurate results when there are frequent misspellings in your document set.
The new fuzzy syntax enables you to rank the result set so that documents that contain words with high similarity to the query word are scored higher than documents with lower similarity. You can also limit the number of expanded terms.
Unlike stem expansion, the number of words generated by a fuzzy expansion depends on what is in the index. Results can vary significantly according to the contents of the index.
Oracle8i interMedia Text supports fuzzy definitions for English, German, Italian, Dutch, Spanish, and OCR.
If the fuzzy expansion returns a stopword, the stopword is not included in the query or highlighted by CTX_DOC.HIGHLIGHT or CTX_DOC.MARKUP.
If base-letter conversion is enabled for a text column and the query expression contains a fuzzy operator, Oracle operates on the base-letter form of the query.
fuzzy(term, score, numresults, weight)
term
Specify the word on which to perform the fuzzy expansion. Oracle expands term to include words only in the index.
score
Specify a similarity score. Terms in the expansion that score below this number are discarded. Use a number between 1 and 80. The default is 60.
numresults
Specify the maximum number of terms to use in the expansion of term. Use a number between 1 and 5000. The default is 100.
weight
Specify WEIGHT or W for the results to be weighted according to their similarity scores.
Specify NOWEIGHT or N for no weighting of results.
Consider the CONTAINS query:
...CONTAINS(TEXT, 'fuzzy(government, 70, 6, weight)', 1) > 0;
This query expands to the first six fuzzy variations of government in the index that have a similarity score over 70.
In addition, documents in the result set are weighted according to their similarity to government. Documents containing words most similar to government receive the highest score.
You can skip unnecessary parameters using the appropriate number of commas. For example:
'fuzzy(government,,,weight)'
The old fuzzy syntax from previous releases is still supported. This syntax is as follows:
?term
Expands term to include all terms with similar spellings as the specified term.
Use this procedure to add a B-tree index to a catalog index preference. You create a catalog index preference to create a CTXCAT index.
Invoking CTX_DDL.ADD_INDEX after creating a CTXCAT index does not update the domain index to include the new index. You must add B-tree indexes with this procedure and then create your CTXCAT index with CREATE INDEX.
CTX_DDL.ADD_INDEX(set_name in varchar2, column_list varchar2, storage_clause varchar2);
Specify the name of the index set.
Specify a comma separated list of columns upon which to create the B-tree index. Order your columns according to your queries.
For example, if your structured query clause is 'column1=99 order by column2', specify 'column1, column2' for optimal query performance.
Similarly, if your structured query clause is 'column2=200 order by column1', specify 'column2, column1' for optimal query performance.
Specify a storage clause.
Consider a table called AUCTION with the following schema:
create table auction(item_id number, title varchar2(100), category_id number, price number, bid_close date);
Assume that queries on the table involve a mandatory text query clause and optional structured conditions on category_id. Results must be sorted based on either bid_close, category_id, or price.
You can create a catalog index to support the different types of structured queries a user might enter.
To create the indexes, first create the index set preference then add the required indexes to it.
The following example creates the index set preference and adds five different indexes to it:
begin
ctx_ddl.create_index_set('auction_iset');
ctx_ddl.add_index('auction_iset','bid_close'); /* index A */
ctx_ddl.add_index('auction_iset','category_id, bid_close'); /* index B */
ctx_ddl.add_index('auction_iset','bid_close, category_id'); /* index C */
ctx_ddl.add_index('auction_iset','price, bid_close'); /* index D */
ctx_ddl.add_index('auction_iset','bid_close, price'); /* index E */
end;
Create the combined catalog index with CREATE INDEX as follows:
create index auction_titlex on AUCTION(title) indextype is CTXCAT parameters
('index set auction_iset');
To query the title column for the word camera, you can issue regular and mixed queries as follows:
select from AUCTION where CATSEARCH(title, 'camera', NULL)>0;
The following query uses index A:
select from AUCTION where CATSEARCH(title, 'camera', 'bid_close=20-FEB-2000')>0;
The following query uses index B:
select from AUCTION where CATSEARCH(title, 'camera', 'category_id=99 order by bid_close desc')>0;
The following query uses index C:
select from AUCTION where CATSEARCH(title, 'camera', 'bid_close=20-FEB-2000 order by category_id')>0;
The following query uses index D:
select from AUCTION where CATSEARCH(title, 'camera', 'price=200 order by bid_ close')>0;
The following query uses index E:
select from AUCTION where CATSEARCH(title, 'camera', 'bid_close=20-FEB-2000 order by price')>0;
Use this procedure to create an index set for CTXCAT index types. You name this index set in the parameter clause of CREATE INDEX when you create a CTXCAT index.
CTX_DDL.CREATE_INDEX_SET(set_name in varchar2);
Specify the name of the index set. You name this index set in the parameter clause of CREATE INDEX when you create a CTXCAT index.
Use this procedure to drop an index set.
CTX_DDL.DROP_INDEX_SET(set_name in varchar2);
Specify the name of the index set to drop.
Use this procedure to remove a column from the column list.
CTX_DDL.REMOVE_INDEX(set_name in varchar2, column_list in varchar2);
Specify the name of the index set to drop.
Specify the column list to remove from the index set preference.
Use this procedure to optimize the index. You optimize your index after you synchronize it. Optimizing the index removes old data and minimizes index fragmentation. Optimizing the index can improve query response time
You can optimize in fast, full, or token mode. In token mode, you specify a specific token to be optimized. You can use token mode to optimize index tokens that are frequently searched, without spending time on optimizing tokens that are rarely referenced. An optimized token can improve query response time for that token.
Optimizing an index can result in better response time only if you insert, delete, or update documents in your base table after your initial indexing operation.
Using this procedure to optimize the index is the same as optimizing with the ALTER INDEX statement.
CTX_DDL.OPTIMIZE_INDEX(idx_name in varchar2, optlevel in varchar2, maxtime in number default null token in varchar2 default null);
Specify the name of the index.
Specify optimization level as a string. You can specify one of the following methods for optimization:
FAST or
CTX_DDL.OPTLEVEL_FAST
This method compacts fragmented rows. However, old data is not removed.
FULL or
CTX_DDL.OPTLEVEL_FULL
In this mode you can optimize the entire index or a portion of the index. This method compacts rows and removes old data.
TOKEN
This method lets you specify a specific token to be optimized. Oracle does a FULL optimization on the token you specify with token.
Use this method to optimize those tokens that are searched frequently.
Specify maximum optimization time, in minutes, for FULL optimize.
When you specify the symbol CTX_DDL.MAXTIME_UNLIMITED (or pass in NULL), the entire index is optimized. This is the default.
Specify the token to be optimized.
The following two examples optimize the index for fast optimization.
begin
ctx_ddl.optimize_index('myidx','FAST');
end;
begin
ctx_ddl.optimize_index('myidx',CTX_DDL.OPTLEVEL_FAST);
end;
The following example optimizes the index token Oracle:
begin
ctx_ddl.optimize_index('myidx','token', TOKEN=>'Oracle');
end;
Use this procedure to create a new, empty stoplist. Stoplists can contain words or themes that are not to be indexed.
You can also create multi-language stoplists to hold language-specific stopwords. A multi-lingual stoplist is useful when you index a table that contains documents in different languages, such as English, German, and Japanese.
You can add either stopwords, stopclasses, or stopthemes to a stoplist using ADD_STOPWORD, ADD_STOPCLASS, or ADD_STOPTHEME.
You can specify a stoplist in the parameter string of CREATE INDEX or ALTER INDEX to override the default stoplist CTXSYS.DEFAULT_STOPLIST.
CTX_DDL.CREATE_STOPLIST(stoplist_name IN VARCHAR2, stoplist_type IN VARCHAR2 DEFAULT 'BASIC_STOPLIST');
Specify the name of the stoplist to be created.
Specify MULTI_STOPLIST to create a stoplist with language-specific stopwords.
When indexing a multi-lingual table with a multi-lingual stoplist, your table must have a language column.
Specify BASIC_STOPLIST to create a stoplist for a single language. This is the default.
The following code creates a stoplist called mystop:
begin
ctx_ddl.create_stoplist('mystop', 'BASIC_STOPLIST');
end;
Use this procedure to add a single stopword to a stoplist. To create a list of stopwords, you must call this procedure once for each word.
The maximum number of stopwords, stopthemes, and stopclasses you can add to a stoplist is 4095.
CTX_DDL.ADD_STOPWORD(stoplist_name in varchar2, stopword in varchar2, language in varchar2 default NULL);
Specify the name of the stoplist.
Specify the stopword to be added.
Language-specific stopwords must be unique across the other stopwords specific to the language. For example, it is valid to have a German die and an English die in the same stoplist.
The maximum number of stopwords, stopthemes, and stopclasses you can add to a stoplist is 4095.
Specify the language of stopword when stoplist_name is of type MULTI_STOPWORD. You must specify the NLS name or abbreviation of an Oracle-supported language.
Otherwise, specify NULL.
The following example adds the stopwords because, notwithstanding, nonetheless,andthereforeto the stoplist mystop:
begin
ctx_ddl.add_stopword('mystop', 'because');
ctx_ddl.add_stopword('mystop', 'notwithstanding');
ctx_ddl.add_stopword('mystop', 'nonetheless');
ctx_ddl.add_stopword('mystop', 'therefore');
end;
The following example adds the German word die to a multi-language stoplist:
begin
ctx_ddl.add_stopword('mulitstop', 'die', 'd');
end;
You can add stopwords after you create the index with ALTER INDEX.
Use this procedure to create a new translation for a phrase in a specified language.
CTX_THES.CREATE_TRANSLATION(tname in varchar2, phrase in varchar2, language in varchar2, translation in varchar2);
Specify the name of the thesaurus, using no more than 30 characters.
Specify the phrase in the thesaurus to which to add a translation. Phrase must already exist in the thesaurus, or an error is raised.
Specify the language of the translation, using no more than 10 characters.
Specify the translated term, using no more than 256 characters.
If a translation for this phrase already exists, this new translation is added without removing that original translation, so long as that original translation is not the same. Adding the same translation twice results in an error.
The following code adds the Spanish translation for dog to my_thes:
begin
ctx_thes.create_translation('my_thes', 'dog', 'SPANISH', 'PERRO');
end;
Use this procedure to remove one or more translations for a phrase.
CTX_THES.DROP_TRANSLATION (tname in varchar2, phrase in varchar2, language in varchar2 default null, translation in varchar2 default null);
Specify the name of the thesaurus, using no more than 30 characters.
Specify the phrase in the thesaurus to which to remove a translation. The phrase must already exist in the thesaurus or an error is raised.
Optionally, specify the language of the translation, using no more than 10 characters. If not specified, the translation must also not be specified and all translations in all languages for the phrase are removed. An error is raised if the phrase has no translations.
Optionally, specify the translated term to remove, using no more than 256 characters. If no such translation exists, an error is raised.
The following code removes the Spanish translation for dog:
begin
ctx_thes.drop_translation('my_thes', 'dog', 'SPANISH', 'PERRO');
end;
Use this procedure to test that a thesaurus relation exists without actually doing the expansion. The function returns TRUE if the phrase has any of the relations in the specified list.
CTX_THES.HAS_RELATION(phrase in varchar2, rel in varchar2, tname in varchar2 default 'DEFAULT') returns boolean;
Specify the phrase.
Specify a single thesaural relation or a comma-separated list of relations, except PT. Specify 'ANY' for any relation.
Specify the thesaurus name.
The following example returns TRUE if the phrase "cat" in the DEFAULT thesaurus has any broader terms or broader generic terms:
begin
ctx_thes.has_relation('cat','BT,BTG');
end;
Use this procedure to update an existing translation.
CTX_THES.UPDATE_TRANSLATION(tname in varchar2, phrase in varchar2, language in varchar2, translation in varchar2, new_translation in varchar2);
Specify the name of the thesaurus, using no more than 30 characters.
Specify the phrase in the thesaurus to which to update a translation. The phrase must already exist in the thesaurus or an error is raised.
Specify the language of the translation, using no more than 10 characters.
Specify the translated term to update no more than 256 characters. If no such translation exists, an error is raised.
You can specify NULL if there is only one translation for the phrase. An error is raised if there is more than one translation for the term in the specified language.
Optionally, specify the new form of the translated term.
The following code updates the Spanish translation for dog:
begin
ctx_thes.drop_translation('my_thes', 'dog', 'SPANISH', 'PERRO', 'CAN');
end;
Use this procedure to identify all text tokens in a document. The tokens returned are those tokens which are inserted into the index.
Stop words are not returned. Section tags are not returned because they are not text tokens.
CTX_DOC.TOKENS(index_name IN VARCHAR2, textkey IN VARCHAR2, restab IN OUT NOCOPY TOKEN_TAB);
CTX_DOC.TOKENS(index_name IN VARCHAR2, textkey IN VARCHAR2, restab IN VARCHAR2, query_id IN NUMBER DEFAULT 0);
Specify the name of the index for the text column.
Specify the unique identifier (usually the primary key) for the document.
Thetextkey parameter can be one of the following:
You toggle between primary key and rowid identification using CTX_DOC.SET_KEY_TYPE.
You can specify that this procedure store results to either a table or to an in-memory PL/SQL table.
The tokens returned are those tokens which are inserted into the index. Stop words are not returned. Section tags are not returned because they are not text tokens.
To store results to a table, specify the name of the table. Token tables can be named anything, but must include the following columns, with names and data types as specified.
QUERY_ID
NUMBER
The identifier for the results generated by a particular call to CTX_DOC.TOKEN (only populated when table is used to store results from multiple TOKEN calls)
TOKEN
VARCHAR2(64)
The token string in the text.
OFFSET
NUMBER
The position of the token in the document, relative to the start of document which has a position of 1.
LENGTH
NUMBER
The character length of the token.
To store results to an in-memory table, specify the name of the in-memory table of type TOKEN_TAB. The TOKEN_TAB datatype is defined as follows:
type token_rec is record (token varchar2(64); offset number; length number;); type token_tab is table of token_rec index by binary_integer;
CTX_DOC.TOKENS clears the TOKEN_TAB you specify before the operation.
Specify the identifier used to identify the row(s) inserted into restab.
The following example generates the tokens for document 1 and stores them in an in-memory table, declared as the_tokens. The example then loops through the table to display the document tokens.
declare
the_tokens ctx_doc.token_tab;
begin
ctx_doc.tokens('myindex','1',the_tokens);
for i in 1..the_tokens.count loop
dbms_output.put_line(the_tokens(i).token);
end loop;
end;
Use the CTX_DOC.THEMES procedure to generate a list of themes for a document. Each theme is stored as a row in either a result table or in-memory PL/SQL table you specify.
CTX_DOC.THEMES(index_name IN VARCHAR2, textkey IN VARCHAR2, restab IN OUT THEME_TAB, full_themes IN BOOLEAN DEFAULT FALSE, numthemes IN NUMBER DEFAULT 50);
CTX_DOC.THEMES(index_name IN VARCHAR2, textkey IN VARCHAR2, restab IN VARCHAR2, query_id IN NUMBER DEFAULT 0, full_themes IN BOOLEAN DEFAULT FALSE, numthemes IN NUMBER DEFAULT 50);
Specify the name of the index for the text column.
Specify the unique identifier (usually the primary key) for the document.
Thetextkey parameter can be one of the following:
You toggle between primary key and rowid identification using CTX_DOC.SET_KEY_TYPE.
You can specify that this procedure store results to either a table or to an in-memory PL/SQL table.
To store results in a table, specify the name of the table.
For more information about the structure of the theme result table, see the theme table description in the see Oracle8i interMedia Text Reference.
To store results in an in-memory table, specify the name of the in-memory table of type THEME_TAB. The THEME_TAB datatype is defined as follows:
type theme_rec is record ( theme varchar2(2000); weight number; ); type theme_tab is table of theme_rec index by binary_integer;
CTX_DOC.THEMES clears the THEME_TAB you specify before the operation.
Specify the identifier used to identify the row(s) inserted into restab.
Specify whether this procedure generates a single theme or a hierarchical list of parent themes (full themes) for each document theme.
Specify TRUE for this procedure to write full themes to the THEME column of the result table.
Specify FALSE for this procedure to write single theme information to the THEME column of the result table. This is the default.
Specify the number of themes to retrieve. For example, if you specify 10, the top 10 themes are returned for the document. The default is 50.
If you specify 0 or NULL, this procedure returns all themes in a document. If the document contains more than 50 themes, only the top 50 themes show conceptual hierarchy.
The following example generates the top 10 themes for document 1 and stores them in an in-memory table called the_themes. The example then loops through the table to display the document themes.
declare
the_themes ctx_doc.theme_tab;
begin
ctx_doc.themes('myindex','1',the_themes, numthemes=>10);
for i in 1..the_themes.count loop
dbms_output.put_line(the_themes(i).theme||':'||the_themes(i).weight);
end loop;
end;
The following example creates a theme table called CTX_THEMES:
create table CTX_THEMES (query_id number, theme varchar2(2000), weight number);
To obtain a list of the top 20 themes where each element in the list is a single theme, issue a statement like the following:
begin
ctx_doc.themes('newsindex','34','CTX_THEMES',1,full_themes => FALSE,
numthemes=> 20);
end;
To obtain a list of the top 20 themes where each element in the list is a hierarchical list of parent themes, issue a statement like the following:
begin
ctx_doc.themes('newsindex','34','CTX_THEMES',1,full_themes => TRUE,
numthemes=>20);
end;
Use the CTX_DOC.GIST procedure to generate a gist and theme summaries for a document. You can generate paragraph-level or sentence-level gists or theme summaries.
CTX_DOC.GIST(index_name IN VARCHAR2, textkey IN VARCHAR2, restab IN OUT CLOB, glevel IN VARCHAR2 DEFAULT 'P', pov IN VARCHAR2 DEFAULT NULL, numParagraphs IN NUMBER DEFAULT 16, maxPercent IN NUMBER DEFAULT 10, numthemes IN NUMBER DEFAULT 50);
CTX_DOC.GIST(index_name IN VARCHAR2, textkey IN VARCHAR2, restab IN VARCHAR2, query_id IN NUMBER DEFAULT 0, glevel IN VARCHAR2 DEFAULT 'P', pov IN VARCHAR2 DEFAULT NULL, numParagraphs IN NUMBER DEFAULT 16, maxPercent IN NUMBER DEFAULT 10, numthemes IN NUMBER DEFAULT 50);
Specify the name of the index associated with the text column containing the document identified by textkey.
Specify the unique identifier (usually the primary key) for the document.
Thetextkey parameter can be one of the following:
You toggle between primary key and rowid identification using CTX_DOC.SET_KEY_TYPE.
You can specify that this procedure store the gist and theme summaries to either a table or to an in-memory CLOB.
To store results to a table specify the name of the table.
For more information about the structure of the gist result table, see the gist table description in the Oracle8i interMedia Text Reference.
To store results in memory, specify the name of the CLOB locator. If restab is NULL, a temporary CLOB is allocated and returned. You must de-allocate the locator after using it.
If restab is not NULL, the CLOB is truncated before the operation.
Specify an identifier to use to identify the row(s) inserted into restab.
Specify the type of gist or theme summary to produce. The possible values are:
The default is P.
Specify whether a gist or a single theme summary is generated. The type of gist or theme summary generated (sentence-level or paragraph-level) depends on the value specified for glevel.
To generate a gist for the entire document, specify a value of `GENERIC' for pov. To generate a theme summary for a single theme in a document, specify the theme as the value for pov.
When using result table storage and you do not specify a value for pov, this procedure returns the generic gist plus up to fifty theme summaries for the document.
When using in-memory result storage to a CLOB, you must specify a pov. However, if you do not specify pov, this procedure generates only a generic gist for the document.
The pov parameter is case sensitive. To return a gist for a document, specify `GENERIC' in all uppercase. To return a theme summary, specify the theme exactly as it is generated for the document.
Only the themes generated by CTX_DOC.THEMES Updated Syntax for a document can be used as input for pov.
Specify the maximum number of document paragraphs (or sentences) selected for the document gist or theme summaries. The default is 16.
The numParagraphs parameter is used only when this parameter yields a smaller gist or theme summary size than the gist or theme summary size yielded by the maxPercent parameter.
This means that the system always returns the smallest size gist or theme summary.
Specify the maximum number of document paragraphs (or sentences) selected for the document gist or theme summaries as a percentage of the total paragraphs (or sentences) in the document. The default is 10.
The maxPercentparameter is used only when this parameter yields a smaller gist or theme summary size than the gist or theme summary size yielded by the numParagraphs parameter.
This means that the system always returns the smallest size gist or theme summary.
Specify the number of theme summaries to produce when you do not specify a value for pov. For example, if you specify 10, this procedure returns the top 10 theme summaries. The default is 50.
If you specify 0 or NULL, this procedure returns all themes in a document. If the document contains more than 50 themes, only the top 50 themes show conceptual hierarchy.
The following example generates a non-default size generic gist of at most 10 paragraphs. The result is stored in memory in a CLOB locator. The code then de-allocates the returned CLOB locator after using it.
declare
gklob clob;
amt number := 40;
line varchar2(80);
begin
ctx_doc.gist('newsindex','34','gklob',1,glevel => 'P',pov => 'GENERIC',
numParagraphs => 10);
-- gklob is NULL when passed-in, so ctx-doc.gist will allocate a temporary
-- CLOB for us and place the results there.
dbms_lob.read(gklob, amt, 1, line);
dbms_output.put_line('FIRST 40 CHARS ARE:'||line);
-- have to de-allocate the temp lob
dbms_lob.freetemporary(gklob);
end;
The following example creates a gist table called CTX_GIST:
create table CTX_GIST (query_id number, pov varchar2(80), gist CLOB);
The following example returns a default sized paragraph level gist for document 34 as well as the top 10 theme summaries in the document:
begin
ctx_doc.gist('newsindex','34','CTX_GIST', 1, glevel => 'P', numthemes=10);
end;
The following example generates a non-default size gist of at most 10 paragraphs:
begin
ctx_doc.gist('newsindex','34','CTX_GIST',1,glevel => 'P',pov => 'GENERIC',
numParagraphs => 10);
end;
The following example generates a gist whose number of paragraphs is at most 10 percent of the total paragraphs in document:
begin
ctx_doc.gist('newsindex','34','CTX_GIST',1, glevel =>'P',pov => 'GENERIC',
maxPercent => 10);
end;
The following example returns a paragraph level theme summary for insects for document 34. The default theme summary size is returned.
begin
ctx_doc.gist('newsindex','34','CTX_GIST',1,glevel =>'P', pov => 'insects');
end;