MySQL query to select number of fields occurring more times than x

Question

I want to find out the 'rank' of a word in any given language. A word with a rank of 1 means it's the most commonly used word in the language.

There are two MySQL tables; an example row from each is shown below.

document:

id: 1
language: "english"

word:

id: 2
body: "the"
document_id: 1

There are about 1 million rows in the word table. Here is a query that works, but it takes 7-10 seconds. Is there a better way to write this query?

select count(*) + 1
from (
    -- per-word counts for English documents
    select word.body, count(*) as count
    from word
    join document on word.document_id = document.id
    where document.language = 'english'
    group by word.body
) t
-- keep only words that occur more often than 'the'
where count > (
    select count
    from (
        -- the same per-word counts, computed a second time
        select word.body, count(*) as count
        from word
        join document on word.document_id = document.id
        where document.language = 'english'
        group by word.body
    ) t
    where body = 'the'
)
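One possible rewrite, offered as an untested sketch: the query above builds the full per-word grouping twice, once for the outer count and once only to look up the count of 'the'. The threshold can instead be computed directly by filtering on word.body (which an index on body can serve), with the comparison moved into a HAVING clause. The aliases `w`, `d`, `word_rank` and `more_frequent` are illustrative.

```sql
-- Rank of 'the' among English words:
-- 1 + the number of distinct words that occur more often than 'the'.
SELECT COUNT(*) + 1 AS word_rank
FROM (
    -- Single pass over English words, keeping only those more frequent than 'the'
    SELECT w.body
    FROM word w
    JOIN document d ON d.id = w.document_id
    WHERE d.language = 'english'
    GROUP BY w.body
    HAVING COUNT(*) > (
        -- Count of 'the' alone; no need to group every word for this
        SELECT COUNT(*)
        FROM word w2
        JOIN document d2 ON d2.id = w2.document_id
        WHERE d2.language = 'english'
          AND w2.body = 'the'
    )
) AS more_frequent;
```

One behavioral difference: if the word never occurs, the inner COUNT(*) is 0 rather than NULL, so this returns the number of distinct English words plus one, whereas the original returns 1.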

Comment

Are you sure that is a problem you want to solve with SQL? I'd have used full-text indexing for that, which usually implies using an extension such as Sphinx.

Comment

I'm open to alternative solutions.

Comment

What types of indexes exist on the word table? The previous comment mentioned full-text (FT) indexes, which might be a good place to start...

Comment

I have an index on word.document_id and word.body. It helped with other queries. I put a full-text index on document.body (not listed in the question), but I was unable to query document.body to get the desired result. (Other full-text queries were slower, too.)
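A possible indexing experiment, sketched under assumptions: InnoDB tables, and word.body being a VARCHAR column (a TEXT column would need a prefix length, and a prefix index cannot cover the query). A composite index with body first may let MySQL read the grouping column and the join column straight from the index. The index names are hypothetical, and the optimizer may still pick a different plan, so the EXPLAIN output is worth comparing before and after.

```sql
-- List the indexes that currently exist on word.
SHOW INDEX FROM word;

-- Hypothetical composite index: body first for the GROUP BY, document_id
-- second for the join, so both columns can be read from the index itself.
ALTER TABLE word ADD INDEX idx_word_body_document (body, document_id);

-- Hypothetical index on the filter column. In InnoDB, a secondary index
-- also stores the primary key (document.id), which the join needs.
ALTER TABLE document ADD INDEX idx_document_language (language);

-- Compare the plan before and after adding the indexes.
EXPLAIN
SELECT w.body, COUNT(*) AS cnt
FROM word w
JOIN document d ON d.id = w.document_id
WHERE d.language = 'english'
GROUP BY w.body;
```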

Comment

Hmm... how often is data updated in the word table? And how often is the query above run? Perhaps some caching/memoization technique might help...
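A minimal sketch of the caching idea, assuming the word table changes far less often than the rank is queried: keep a summary table of per-word counts, refresh it after imports or on a schedule, and answer rank queries from it. The table `word_frequency`, its column types, and the index name are hypothetical; the types and collation should match the existing word.body and document.language columns so the primary key treats the same values as duplicates that GROUP BY does.

```sql
-- Hypothetical summary table: one precomputed count per (language, word).
CREATE TABLE word_frequency (
    language VARCHAR(32)  NOT NULL,
    body     VARCHAR(255) NOT NULL,
    cnt      INT UNSIGNED NOT NULL,
    PRIMARY KEY (language, body),
    KEY idx_language_cnt (language, cnt)  -- supports the "cnt > x" range scan
) ENGINE = InnoDB;

-- Refresh the English counts, e.g. on a schedule or after a bulk import.
START TRANSACTION;
DELETE FROM word_frequency WHERE language = 'english';
INSERT INTO word_frequency (language, body, cnt)
SELECT d.language, w.body, COUNT(*)
FROM word w
JOIN document d ON d.id = w.document_id
WHERE d.language = 'english'
GROUP BY d.language, w.body;
COMMIT;

-- Rank lookup: 1 + the number of words with a strictly higher count.
SELECT COUNT(*) + 1 AS word_rank
FROM word_frequency
WHERE language = 'english'
  AND cnt > (SELECT cnt
             FROM word_frequency
             WHERE language = 'english'
               AND body = 'the');
```

With the (language, cnt) index, the lookup should become a range scan over the summary rows of more frequent words instead of a re-aggregation of about a million word rows; the expensive work moves into the refresh step.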

score 1 · Answer 1 · 2017-12-15 19:17:23Z

Initially I was thinking of suggesting the HAVING clause, but I don't think that would help in this case, since the query selects the aggregated count and uses that for a comparison.

One option is to use a Common Table Expression (CTE), but CTEs weren't introduced until MySQL 8.0. Another option is to create a view:

CREATE VIEW countEnglishWord as select word.body, count(*) as count
 from word
 join document on word.document_id = document.id
 where document.language = 'english'
 group by word.body;

and use it like this:

select count(*) + 1
from countEnglishWord c
where count > (select count
  from countEnglishWord t
  where body = 'the'
)

I haven't tested it on a million rows, only on 3 rows in this sqlfiddle.
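On MySQL 8.0 or later, the CTE route could be combined with the RANK() window function. RANK() gives tied words the same rank and skips the following ranks, so a word's rank is 1 plus the number of words with a strictly higher count, which matches the count(*) + 1 definition in the question. This is an untested sketch; the CTE names and the `word_rank` alias are illustrative. Because the grouped counts are defined once and referenced once, the aggregation runs a single time rather than twice.

```sql
-- Requires MySQL 8.0+ (CTEs and window functions).
WITH english_counts AS (
    SELECT w.body, COUNT(*) AS cnt
    FROM word w
    JOIN document d ON d.id = w.document_id
    WHERE d.language = 'english'
    GROUP BY w.body
),
ranked AS (
    -- Tied counts share a rank; the next distinct count skips ahead.
    SELECT body, RANK() OVER (ORDER BY cnt DESC) AS word_rank
    FROM english_counts
)
SELECT word_rank
FROM ranked
WHERE body = 'the';
```

Dropping the final WHERE clause returns the rank of every word in one query, which could also be used to fill a cache.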

Comment (score 1, twharmon, Dec 16, 2017 at 0:03):

This works, but doesn't affect performance.
