\$\begingroup\$

I want to find out the 'rank' of a word in any given language. A rank of 1 means the word is the most commonly used word in that language.

There are two MySQL tables.

document:

  • id: 1
  • language: "english"

word:

  • id: 2
  • body: "the"
  • document_id: 1

There are about 1 million rows in the word table. Here is a query that works, but it takes 7-10 seconds. Is there a better way to write this query?

select count(*) + 1
from (
    select word.body, count(*) as count
    from word
    join document on word.document_id = document.id
    where document.language = 'english'
    group by word.body
) t
where count > (
    select count
    from (
        select word.body, count(*) as count
        from word
        join document on word.document_id = document.id
        where document.language = 'english'
        group by word.body
    ) t
    where body = 'the'
)
asked Dec 15, 2017 at 17:13
\$\endgroup\$
  • \$\begingroup\$ Are you sure that is a problem you want to solve with SQL? I'd have used full-text indexing for that which usually implies using extensions such as Sphinx. \$\endgroup\$ Commented Dec 15, 2017 at 17:50
  • \$\begingroup\$ I'm open to alternative solutions. \$\endgroup\$ Commented Dec 15, 2017 at 17:51
  • \$\begingroup\$ What types of indexes exist on the word table? This mentioned FT's, which might be a good place to start... \$\endgroup\$ Commented Dec 16, 2017 at 0:05
  • \$\begingroup\$ I have an index on word.document_id and word.body. It helped with other queries. I put a fulltext index on document.body (not listed in the question), but I was unable to query document.body to get the desired result. (Other fulltext queries were slower, too.) \$\endgroup\$ Commented Dec 16, 2017 at 0:10
  • \$\begingroup\$ Hmm... how often is data updated in the word table? And how often is the query above run? Perhaps some caching/memoization technique might help... \$\endgroup\$ Commented Dec 16, 2017 at 0:31

1 Answer

\$\begingroup\$

Initially I was thinking of suggesting a HAVING clause, but I don't think that would help in this case, since the query compares each word's aggregated count against the aggregated count of one specific word.

One option is to use a Common Table Expression (CTE), but those are only available from MySQL 8.0 onward. Another option is to create a view:

create view countEnglishWord as
    select word.body, count(*) as count
    from word
    join document on word.document_id = document.id
    where document.language = 'english'
    group by word.body;

and use it like this:

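-- body is left out of the select list: with ONLY_FULL_GROUP_BY enabled,
-- MySQL rejects a non-aggregated column next to count(*) without a group by
select count(*) + 1
from countEnglishWord c
where count > (
    select count
    from countEnglishWord t
    where body = 'the'
)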

I haven't tested it on a million rows, only on 3 rows in this sqlfiddle.
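
For anyone on MySQL 8.0 or later, the CTE variant mentioned above might look roughly like the sketch below. It is untested against MySQL and uses a few assumptions: Python's built-in sqlite3 module stands in for the real database so the example runs anywhere, the names word_counts, cnt, word_rank and demo_connection are made up for illustration, and the :language / :word placeholders are sqlite3 syntax (a MySQL driver would use its own placeholder style). Whether it is actually faster on a million rows would need to be measured.

```python
import sqlite3

# Assumption: SQLite is only a stand-in for MySQL 8.0+. The SQL below sticks
# to syntax both engines accept; the named placeholders are sqlite3-specific.
RANK_SQL = """
WITH word_counts AS (
    SELECT w.body, COUNT(*) AS cnt
    FROM word w
    JOIN document d ON w.document_id = d.id
    WHERE d.language = :language
    GROUP BY w.body
)
SELECT COUNT(*) + 1
FROM word_counts
WHERE cnt > (SELECT cnt FROM word_counts WHERE body = :word)
"""


def word_rank(conn, language, word):
    """Return the 1-based frequency rank of `word` among words in `language`."""
    row = conn.execute(RANK_SQL, {"language": language, "word": word}).fetchone()
    return row[0]


def demo_connection():
    """Build a tiny in-memory database shaped like the tables in the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE document (id INTEGER PRIMARY KEY, language TEXT);
        CREATE TABLE word (id INTEGER PRIMARY KEY, body TEXT, document_id INTEGER);
        INSERT INTO document VALUES (1, 'english'), (2, 'german');
        """
    )
    # english: the x3, cat x2, sat x1; german: der x4
    rows = [("the", 1)] * 3 + [("cat", 1)] * 2 + [("sat", 1)] + [("der", 2)] * 4
    conn.executemany("INSERT INTO word (body, document_id) VALUES (?, ?)", rows)
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    for w in ("the", "cat", "sat"):
        print(w, word_rank(conn, "english", w))
```

The only structural difference from the view is that the grouped counts are named once inside the statement instead of being stored as a schema object. Like the original query, this returns 1 for a word that never occurs (the inner subquery yields NULL, so no row passes the comparison), so existence would have to be checked separately.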

answered Dec 15, 2017 at 19:17
\$\endgroup\$
  • \$\begingroup\$ This works, but doesn't affect performance. \$\endgroup\$ Commented Dec 16, 2017 at 0:03
