
My query is this:

UPDATE `phrases`
SET `phrases`.`count`=(SELECT COUNT(*) FROM `strings` WHERE `string` LIKE CONCAT('%', `phrases`.`phrase`, '%'))

My tables look like this:

CREATE TABLE `phrases` (
 `hash` varchar(32) NOT NULL,
 `count` int DEFAULT 0,
 `phrase` text NOT NULL,
 PRIMARY KEY (`hash`),
 KEY(`count`)
)

And

CREATE TABLE `strings` (
 `string` text NOT NULL
)

phrases has 18,000 rows and strings has 1,500 rows.

Jamal
asked Dec 20, 2011 at 21:57
  • It might be more efficient to store the counts per phrase in a separate table and update that table only when a new string is added. Since there are far fewer strings than phrases, I figure this won't happen that often. That way you would not redo the whole count; you would just add 1 to each phrase that the new string matches. Commented Dec 20, 2011 at 22:09
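
A minimal MySQL sketch of the comment's idea. It assumes the existing `phrases`.`count` column is used as the count store instead of a separate table, and that the counts have already been filled in once by a full recount. The trigger name `strings_after_insert` is hypothetical. With this in place, each inserted string costs one pass over the 18,000 phrases instead of a full 18,000 × 1,500 recount.

```sql
-- Hypothetical trigger: keeps phrases.`count` current as new strings arrive.
-- Assumes the counts were initialised once by a full recount.
CREATE TRIGGER strings_after_insert
AFTER INSERT ON strings
FOR EACH ROW
    -- Add 1 to every phrase that occurs as a substring of the new string
    UPDATE phrases
    SET `count` = `count` + 1
    WHERE LOCATE(`phrase`, NEW.`string`) > 0;
```

This sketch only handles inserts. Deleting or editing rows in `strings` would need matching `AFTER DELETE` / `AFTER UPDATE` triggers that decrement the counts using `OLD.string`.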

2 Answers


Since you're using LIKE with wildcards on both sides, you're going to do a table scan against both tables, running a total of 18,000 × 1,500 = 27,000,000 substring comparisons.

To optimize this, you need some fulltext index technology. I suggest Sphinx Search or Apache Solr. If you use one, you don't need to keep a count of how many matches there are, because the search index makes getting a count on demand much less expensive.

MySQL also implements a FULLTEXT index type, but in current versions (up to 5.5) only the MyISAM storage engine supports it. I don't recommend using MyISAM for important data.

FULLTEXT support for InnoDB is under development for MySQL 5.6.
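
For reference, here is a sketch of MySQL's built-in FULLTEXT search applied to this schema. It assumes MyISAM, or InnoDB on MySQL 5.6 or later. The index name `ft_string` and the search phrase are placeholders. This is not a drop-in replacement for `LIKE '%...%'`. FULLTEXT matches whole words and skips stopwords and words shorter than the minimum token length. `AGAINST()` also accepts only a constant string, so the application would issue one such query per phrase.

```sql
-- Placeholder index name; requires MyISAM, or InnoDB on MySQL 5.6+
ALTER TABLE strings ADD FULLTEXT INDEX ft_string (`string`);

-- Count the strings containing one phrase; the double quotes inside the
-- boolean-mode search string request an exact phrase match
SELECT COUNT(*)
FROM strings
WHERE MATCH(`string`) AGAINST ('"fulltext index"' IN BOOLEAN MODE);
```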

answered Dec 20, 2011 at 22:04

You should drop the index on `count` before collecting the counts.

This speeds up updating the `count` column, because MySQL does not have to maintain the index for every row it changes.

When done, put the index back.

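ALTER TABLE phrases DROP INDEX `count`;
UPDATE phrases SET `count` = 0;
-- MySQL updates each row at most once in a multi-table UPDATE, so count matches first
UPDATE phrases
INNER JOIN (SELECT p.`hash`, COUNT(*) AS cnt
            FROM phrases AS p
            INNER JOIN strings AS s ON LOCATE(p.`phrase`, s.`string`) > 0
            GROUP BY p.`hash`) AS m ON m.`hash` = phrases.`hash`
SET phrases.`count` = m.cnt;
ALTER TABLE phrases ADD INDEX `count` (`count`);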

This INNER JOIN is still essentially a Cartesian product. As Bill Karwin's answer points out, 27,000,000 row combinations are examined.
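
You can measure how long that join takes on real data before writing anything by running it as a read-only SELECT. This is a sketch and not part of the original answer. The `LEFT JOIN` keeps phrases with no matches so that they show a count of 0. The `ORDER BY` forces every group to be computed, so the timing reflects the full join even though only ten rows are returned.

```sql
-- Dry run: compute per-phrase match counts without modifying phrases
SELECT p.`hash`, p.`phrase`, COUNT(s.`string`) AS cnt
FROM phrases AS p
LEFT JOIN strings AS s ON LOCATE(p.`phrase`, s.`string`) > 0
GROUP BY p.`hash`
ORDER BY cnt DESC
LIMIT 10;  -- show only the ten phrases with the most matches
```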

If the processing time is something you can live with, all well and good.

If it is disastrously slow, you should try Bill Karwin's answer.

answered Dec 20, 2011 at 22:07