2

I have this simple query:

 SELECT `value`, count(*) as `noHits` 
 FROM `topic` 
 WHERE identifier IN ('" . implode("','", array_keys($processedIdentifier)) ."') 
 GROUP BY `value` 
 ORDER BY count(*) DESC LIMIT 150

It generates a list of records such as:

keyword1, 100
keyword2, 90
keyword3, 40
keyword4, 40
keyword5, 40
keyword6, 40
...
keyword1500, 40

My challenge is to select a list of top 150 records ordered by count(*) DESC however when there are more than 1 value having the same count(*) as in the list above keyword3 to keyword1500 select them randomly from the database.

It seems as if the selection is not random from the tests I made.

The result is used to generate a wordcloud and I do not want the same keywords to always be selected (out of thousands) when they have the same count(*)

asked Sep 20, 2019 at 14:00
3
  • In the database there could be 1500 keywords from keyword3 to keyword1500 all having value 40. It would be great to have MySQL select a random number of these keywords to populate the remaining list of the selected top 150 records.. So 1+2 are given but 3-150 should be selected out of 1500 keywords randomly.. Commented Sep 20, 2019 at 14:32
  • 1
    ORDER BY COUNT(*), RAND() LIMIT 150 Commented Sep 20, 2019 at 18:15
  • One Answer is "too complex" (and it admits such). One answer fails to start with the given SELECT, but instead assumes a temp table. @Akina should present is 'trivial' modification as an Answer, not just a Comment. Commented Oct 5, 2019 at 0:22

2 Answers 2

1

To solve this I did the following (see the fiddle)

Created a table:

CREATE TABLE test -- 22 records
(
 the_word TEXT,
 no_of_times INTEGER
);

Populated it with the data at the end of this answer.

Then ran the following query:

SELECT the_word, no_of_times FROM test
ORDER BY no_of_times DESC, RAND() LIMIT 15;

I tested this several times and obtained results like the following. The first 6 records were always the same - keywords1-6. The remaining 9 records were always randomly from keyword7-22.

Result (sample - first 10 shown):

 the_word no_of_times
keyword1 100
keyword2 90
keyword3 80
keyword4 70
keyword5 60
keyword6 50
keyword17 40
keyword19 40
keyword22 40
keyword18 40

This is considerably more elegant than my first answer - guess I overcomplicated things!:-)

=========================== Data (22 records) ================

INSERT INTO test VALUES ('keyword1', 100);
INSERT INTO test VALUES ('keyword2', 90);
INSERT INTO test VALUES ('keyword3', 80);
INSERT INTO test VALUES ('keyword4', 70);
INSERT INTO test VALUES ('keyword5', 60);
INSERT INTO test VALUES ('keyword6', 50);
INSERT INTO test VALUES ('keyword7', 40);
INSERT INTO test VALUES ('keyword8', 40);
INSERT INTO test VALUES ('keyword9', 40);
INSERT INTO test VALUES ('keyword10', 40);
INSERT INTO test VALUES ('keyword11', 40);
INSERT INTO test VALUES ('keyword12', 40);
INSERT INTO test VALUES ('keyword13', 40);
INSERT INTO test VALUES ('keyword14', 40);
INSERT INTO test VALUES ('keyword15', 40);
INSERT INTO test VALUES ('keyword16', 40);
INSERT INTO test VALUES ('keyword17', 40);
INSERT INTO test VALUES ('keyword18', 40);
INSERT INTO test VALUES ('keyword19', 40);
INSERT INTO test VALUES ('keyword20', 40);
INSERT INTO test VALUES ('keyword21', 40);
INSERT INTO test VALUES ('keyword22', 40);
answered Sep 21, 2019 at 12:34
1
  • Final comment in this thread! :-) I've done up another fiddle with two ways that COUNT(*) and SUM(no_of_times) can be used to perform the calculations - it is predicated on a slightly different interpretation of your original data which I believe is what Akina was thinking? At bottom of fiddle! Commented Sep 21, 2019 at 14:21
0

This answer is far too complex - might be good for learning about CTEs, but check my much more elegant answer here.

This proved to be quite tricky - it involves chaining CTEs (Common Table Expressions - aka the WITH clause - an excelent site). It was made all the more difficult because of the fact that one isn't allowed to put SQL expressions into LIMIT and OFFSET clauses - the solution would have been considerably easier and more elegant.

I'll explain the logic that I used as I go along - there's a fiddle available here - you'll need MySQL 8 for Window (or Analytic) function. As I mentioned, I chained a series of CTEs so I'll go through the process step by step.

First, I created and populated a table - see the bottom of this answer for that data or the fiddle.

Then I ran this:

WITH cte1 AS
(
 SELECT 15 AS tot_num_recs_wanted -- the 15 could come from a table
),
cte2 AS
(
 SELECT 
 *,
 ROW_NUMBER() OVER (ORDER BY no_of_times DESC) AS rn
 FROM test
)
SELECT * FROM cte2;

Result:

the_word no_of_times rn
keyword1 100 1
keyword2 90 2
keyword3 80 3
keyword4 70 4
keyword5 60 5
keyword6 50 6
keyword7 40 7
keyword8 40 8
keyword9 40 9
keyword10 40 10
-- further data snipped for brevity - see fiddle
-- this could have been made much easier if 
-- could have used SQL in the LIMIT and OFFSET clauses
-- There would have been no need to use (another) CTE
-- or the ROW_NUMBER() Window function.

So, we can see that we require 15 records. So, we need to select the first 6 by default (no_of_times 50 - 100) and then 9 records randomly from the remaining 22 (no_of_times = 40).

-- above snipped, again for brevity
cte3 AS
(
 SELECT rn, no_of_times nt
 FROM cte2
 WHERE rn = (SELECT tot_num_recs_wanted FROM cte1)
)
SELECT * FROM cte3;

Result:

rn nt
15 40

So, we know that we require 15 records and the cutoff is 40.

Then I ran:

...snipped...
cte4 AS
(
 SELECT 
 *,
 ROW_NUMBER() OVER (ORDER BY RAND()) AS rand_rn
 FROM test
 WHERE no_of_times = (SELECT nt FROM cte3) -- i.e. 40
)
SELECT * FROM cte4;

Result:

the_word no_of_times rand_rn
keyword22 40 1
keyword11 40 2
keyword9 40 3
keyword8 40 4
keyword10 40 5
keyword16 40 6
keyword7 40 7
keyword15 40 8
keyword13 40 9
keyword21 40 10
-- snipped...

So, now I have the 16 records that have no_of_times that are equal to 40 and, more importantly ordered using the RAND() function.

Then, I select from that table using this SQL:

, cte5 AS
(
 SELECT * FROM cte4
 WHERE rand_rn <= (SELECT tot_num_recs_wanted FROM cte1) 
 - (SELECT COUNT(*) FROM test WHERE no_of_times > (SELECT nt FROM cte3))
)
SELECT * FROM cte5;
-- again, this could have been made much easier if 
-- could have used SQL in the LIMIT and OFFSET clauses
-- There would have been no need to use (another) CTE
-- or the ROW_NUMBER() Window function.

Result:

the_word no_of_times rand_rn
keyword9 40 1
keyword18 40 2
keyword12 40 3
keyword11 40 4
keyword13 40 5
keyword22 40 6
keyword19 40 7
keyword15 40 8
keyword10 40 9
-- **__9 rows__** --

9 rows + the 6 that have no_of_points greater than 40 gives the 15 required rows.

The key part of the above SQL is:

WHERE rand_rn <= (SELECT tot_num_recs_wanted FROM cte1) 
 - (SELECT COUNT(*) FROM test WHERE no_of_times > (SELECT nt FROM cte3))

This is 15 - 6 which is 9 - exactly what we needed for the random component of our records.

And finally, I ran a UNION between my random records and the ones with no_of_points above 40 like this:

SELECT the_word, no_of_times FROM cte5
UNION 
SELECT the_word, no_of_times FROM test
WHERE no_of_times > (SELECT nt FROM cte3)
ORDER BY no_of_times DESC;

Result:

the_word no_of_times
keyword1 100
keyword2 90
keyword3 80
keyword4 70
keyword5 60
keyword6 50
keyword11 40
keyword14 40
keyword21 40
-- snipped --

The first 6 keywords are invariant, but if you run the fiddle several times, you will see that the keywords> 6 (i.e. no_of_points = 40) change on each run.

I'm not sure how efficient this solution is, but suitable indexes might help. I hope this answers your question - if not let me know. p.s. welcome to the forum! :-)

============================== Table and data ======================

CREATE TABLE test -- 22 records
(
 the_word TEXT,
 no_of_times INTEGER
);
INSERT INTO test VALUES ('keyword1', 100);
INSERT INTO test VALUES ('keyword2', 90);
INSERT INTO test VALUES ('keyword3', 80);
INSERT INTO test VALUES ('keyword4', 70);
INSERT INTO test VALUES ('keyword5', 60);
INSERT INTO test VALUES ('keyword6', 50);
INSERT INTO test VALUES ('keyword7', 40);
INSERT INTO test VALUES ('keyword8', 40);
INSERT INTO test VALUES ('keyword9', 40);
INSERT INTO test VALUES ('keyword10', 40);
INSERT INTO test VALUES ('keyword11', 40);
INSERT INTO test VALUES ('keyword12', 40);
INSERT INTO test VALUES ('keyword13', 40);
INSERT INTO test VALUES ('keyword14', 40);
INSERT INTO test VALUES ('keyword15', 40);
INSERT INTO test VALUES ('keyword16', 40);
INSERT INTO test VALUES ('keyword17', 40);
INSERT INTO test VALUES ('keyword18', 40);
INSERT INTO test VALUES ('keyword19', 40);
INSERT INTO test VALUES ('keyword20', 40);
INSERT INTO test VALUES ('keyword21', 40);
INSERT INTO test VALUES ('keyword22', 40);
answered Sep 20, 2019 at 18:37
5
  • Thanks @Vérace for a really cool solution and for the welcome. I too thought it might be trickier than what first appears. We will test your implementation later. Unfortunately in our current production environment we still run MySQL 5.5.62-0+deb8u1 - (Debian),,, so we would need to wait before we can implement it.. Commented Sep 21, 2019 at 9:09
  • It appears that I spoke too soon. It appears that a modified version of Akina's solution works nicely on 5.6 - see the fiddle [here]( dbfiddle.uk/…). Yhould still get version 8 though :-) Commented Sep 21, 2019 at 11:10
  • Simply adding RAND() after COUNT() in order does not work in our enivronment.. please see screen shots here: imgur.com/a/Dv668NU? ping @Vèrace Commented Sep 21, 2019 at 12:08
  • It's in the fiddle here - as I said, I had to modify the orignal! It's the first SQL after the table definition and the data - multiply tested! I can write it up as a new answer if you like? Commented Sep 21, 2019 at 12:19
  • It is fine - no need - I tried again in our environment and now it seems fine when adding the RAND() after COUNT(*) DESC.. strange however we will later validate with our solution that the keywords are in order and inline with your fiddle and Akinas solution. Commented Sep 21, 2019 at 12:32

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.