I want to find the 'rank' of a word in any given language. A rank of 1
means the word is the most commonly used word in that language.
There are two MySQL tables.
document:
- id: 1
- language: "english"
word:
- id: 2
- body: "the"
- document_id: 1
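For concreteness, the two tables might be declared roughly as follows. Only the column names and the sample values come from the description above; the column types, lengths, and the foreign key are assumptions made for illustration.

```sql
-- Hypothetical schema: types and constraints are assumed, not given.
CREATE TABLE document (
    id       INT UNSIGNED NOT NULL AUTO_INCREMENT,
    language VARCHAR(32)  NOT NULL,
    PRIMARY KEY (id)
);

CREATE TABLE word (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
    body        VARCHAR(191) NOT NULL,
    document_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (id),
    FOREIGN KEY (document_id) REFERENCES document (id)
);

-- The sample rows listed above.
INSERT INTO document (id, language) VALUES (1, 'english');
INSERT INTO word (id, body, document_id) VALUES (2, 'the', 1);
```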
There are about 1 million rows in the word table. Here is a query that works, but it takes 7-10 seconds. Is there a better way to write this query?
select count(*) + 1
from (
select word.body, count(*) as count
from word
join document on word.document_id = document.id
where document.language = 'english'
group by word.body
) t
where count > (select count
from (
select word.body, count(*) as count
from word
join document on word.document_id = document.id
where document.language = 'english'
group by word.body
) t
where body = 'the')
1 Answer
Initially I was thinking of suggesting the HAVING clause, but I don't think that would help in this case, since the query selects the aggregated count and uses that for a comparison.
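For reference, a HAVING-based form might look like the sketch below. It has not been run against the original data; the aliases `w`, `d`, `w2`, `d2`, `more_frequent`, and `word_rank` are names chosen here for illustration. It replaces the second grouped subquery with a direct count of the rows for 'the', but it still aggregates every English word once, so any speed-up would have to be measured.

```sql
-- Count the words that occur more often than 'the', then add 1.
SELECT COUNT(*) + 1 AS word_rank
FROM (
    SELECT w.body
    FROM word AS w
    JOIN document AS d ON w.document_id = d.id
    WHERE d.language = 'english'
    GROUP BY w.body
    -- Uncorrelated scalar subquery: occurrences of the target word only.
    HAVING COUNT(*) > (
        SELECT COUNT(*)
        FROM word AS w2
        JOIN document AS d2 ON w2.document_id = d2.id
        WHERE d2.language = 'english'
          AND w2.body = 'the'
    )
) AS more_frequent;
```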
One option is to use a Common Table Expression (CTE) but apparently those aren't introduced until MySQL version 8. One other option is to make a View:
CREATE VIEW countEnglishWord as select word.body, count(*) as count
from word
join document on word.document_id = document.id
where document.language = 'english'
group by word.body;
and use it like this:
select count(*) + 1
from countEnglishWord c
where count > (select count
from countEnglishWord t
where body = 'the'
)
I haven't tested it on a million rows, only on 3 rows in this sqlfiddle.
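On MySQL 8.0 or later, the CTE option mentioned above could be sketched as follows. The names `english_word_count`, `occurrences`, and `word_rank` are illustrative, and the query is untested on the original data. Like the view, it mainly removes the duplicated subquery text; it is not claimed to be faster.

```sql
-- Requires MySQL 8.0+, where WITH (common table expressions) is available.
WITH english_word_count AS (
    SELECT w.body, COUNT(*) AS occurrences
    FROM word AS w
    JOIN document AS d ON w.document_id = d.id
    WHERE d.language = 'english'
    GROUP BY w.body
)
SELECT COUNT(*) + 1 AS word_rank
FROM english_word_count
WHERE occurrences > (
    -- Occurrences of the target word; NULL if the word is absent,
    -- in which case no row matches and the result is 1.
    SELECT occurrences
    FROM english_word_count
    WHERE body = 'the'
);
```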
This works, but doesn't affect performance. – twharmon, Dec 16, 2017 at 0:03
word table? And how often is the query above run? Perhaps some caching/memoization technique might help...
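One way the caching idea could be sketched is a summary table that stores the per-language counts, so that the rank lookup no longer has to aggregate the word table on every call. The table `word_count`, its columns, the index name, and the refresh strategy are all assumptions for illustration, and the column types assume the word and document columns are short strings.

```sql
-- Hypothetical cache of per-language word counts.
CREATE TABLE word_count (
    language    VARCHAR(32)  NOT NULL,
    body        VARCHAR(191) NOT NULL,
    occurrences INT UNSIGNED NOT NULL,
    PRIMARY KEY (language, body),
    KEY idx_language_occurrences (language, occurrences)
);

-- Refresh the cache (for example from a scheduled job).
-- REPLACE overwrites existing rows but does not delete rows for
-- words that no longer appear in the word table.
REPLACE INTO word_count (language, body, occurrences)
SELECT d.language, w.body, COUNT(*)
FROM word AS w
JOIN document AS d ON w.document_id = d.id
GROUP BY d.language, w.body;

-- Rank lookup against the cached counts.
SELECT COUNT(*) + 1 AS word_rank
FROM word_count
WHERE language = 'english'
  AND occurrences > (
      SELECT occurrences
      FROM word_count
      WHERE language = 'english'
        AND body = 'the'
  );
```

The trade-off is staleness: the ranks reflect the data as of the last refresh, so this only fits if the query runs much more often than the word table changes.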