Home
IBM
Speeding up queries by a factor of 100 or more with DB2 Text Search

Speeding up queries by a factor of 100 or more with DB2 Text Search

During a recent Rails project at IBM we had to deal with a large table consisting of customers. The table is made up of legacy enterprise data, and contains close to a million records.

Among many other fields, the table Customers includes a column name defined as VARCHAR. name is used to store company names.

One of the requirements for the project was to implement a Google Suggest-like feature. That is, when the user enters a few characters for the customer’s name, they should be prompted, via AJAX, with a list of possible suggestions containing that string (or at least beginning with that string.)

The naive approach would be to use a simple LIKE predicate, to generate queries such as:

SELECT name
FROM Customers
WHERE name LIKE '%micro%'
ORDER BY name

One of the main problems with this approach is performance. Using LIKE this way doesn’t allow us to take advantage of indexes that are defined on the column name.

Unsurprisingly, on a modest server, such a query takes about 25 seconds. A UI that isn’t able to respond for 25 seconds will feel like an eternity for the end user, of course, so a simple query like this obviously cannot be seen as a viable solution.

Enter DB2 Text Search. You may be familiar with other full text search engines such as Ferret, Lucene, or Sphinx. DB2 V9.7.4 ships with an extremely powerful full text search engine that has plenty of useful features, including excellent integration with native XML columns.

Using the free edition of DB2 (DB2 Express-C) which includes Text Search, we were able to implement the autocomplete functionality we were after in a heartbeat, thanks to a query like the following:

SELECT name
FROM Customers
WHERE CONTAINS(name, 'micro') = 1
ORDER BY name

This query was executed in mere fractions of a second for most searches, and behaved exactly as needed. In fact, not only did we match strings that began with the searched token, but also ones that contained it elsewhere (e.g., micro would match both Microsoft and Sun Microsystems.) The results where ordered alphabetically, but could have easily been ordered by relevance via the SCORE function, also available via DB2 Text Search.

As usual, it’s a matter of using the right tool for the right job, and DB2 Text Search was created exactly for these kind of scenarios.

Let’s briefly look at how you can also go about setting up and playing with it on Linux.

Installing DB2 Text Search

To start with, you’ll need to have a 64bit Linux distro, and then follow these simple steps.

Download DB2 Express-C. Make sure you grab the db2exc_974_LNX_x86_64.tar.gz file and not the Light edition.

Next, install DB2 by following these steps (the Installing DB2 section in particular.) Make sure you select a custom installation, and that you select everything including DB2 Text Search (which is not checked by default).

After you’ve started DB2 with db2start via the instance user (e.g., db2inst1,) launch the text search server by running:


db2ts start for text

To enable text search for an existing database, run:


db2ts enable database for text connect to mydb

Next, you’ll need to create a text search index and populate it. For example:

db2ts "create index customer_name for text on customers(name) connect to mydb"
db2ts "update index customer_name for text connect to mydb"

That’s it. From now on you’ll be able to use functions such as CONTAINS to quickly search for your data.

Such a function can also be easily invoked in Rails:

Customer.where("CONTAINS(name, ?) = 1", name)

Or wrapping it a little in a model:

class Customer < ActiveRecord::Base
 # ... 
 def self.search(name, max_res = 10)
 where('CONTAINS(name, ?) = 1', name).
 order('name').
 limit(max_res)
 end
end

Which can then be invoked as follows:


Customer.search("micro")

(Technically we wouldn’t have needed limit, so long as we passed RESULTLIMIT to the CONTAINS function.)

You can read more about DB2 Text Search at the Information Center.

Get more stuff like this

Subscribe to my mailing list to receive similar updates about programming.

Thank you for subscribing. Please check your email to confirm your subscription.

Something went wrong.

I respect your privacy and take protecting it seriously. Feel free to review my privacy policy page for further details.

Share this:

About The Author

Antonio Cangiano
More from this Author

Antonio Cangiano is an Engineering Manager and AI Specialist at IBM. He authored Ruby on Rails for Microsoft Developers (Wrox, 2009) and Technical Blogging (The Pragmatic Bookshelf, 2012, 2019). He is also the Marketing Lead for Cognitive Class, an educational initiative which he helped grow from zero to over 3 Million students. You can follow him on Twitter.

3 Comments

  1. johno July 20, 2011

    Queries using like “name%” actually should use index on name. It’s called prefix search. And normally you would also have a limit clause there, since you really don’t need to retrieve all matches when someone queries for something general like “peter%”.

    What you probably wanted to write is a query with “%name%”. It has the same semantics as CONTAINS.

    Loading...
    • Antonio Cangiano July 20, 2011

      Good spotting, Johno. It was meant to be ‘%name%’ in the article from the very beginning (which was what took 25+ seconds). I fixed it now.

      Loading...
  2. duke July 20, 2011

    DB2 text search cannot distinguish between C++ and C#

    Is this likely to be fixed any time?

    thanks

    Loading...

Leave a ReplyCancel reply

[フレーム]

This site uses Akismet to reduce spam. Learn how your comment data is processed.

[フレーム]

AltStyle によって変換されたページ (->オリジナル) /