Select random database records between subset of records having the same count(*) in a query

Question 1

I have this simple query:

 SELECT `value`, count(*) as `noHits` 
 FROM `topic` 
 WHERE identifier IN ('" . implode("','", array_keys($processedIdentifier)) ."') 
 GROUP BY `value` 
 ORDER BY count(*) DESC LIMIT 150

It generates a list of records such as:

keyword1, 100
keyword2, 90
keyword3, 40
keyword4, 40
keyword5, 40
keyword6, 40
...
keyword1500, 40

My challenge is to select a list of top 150 records ordered by count(*) DESC however when there are more than 1 value having the same count(*) as in the list above keyword3 to keyword1500 select them randomly from the database.

It seems as if the selection is not random from the tests I made.

The result is used to generate a wordcloud and I do not want the same keywords to always be selected (out of thousands) when they have the same count(*)

Question 2

In the database there could be 1500 keywords from keyword3 to keyword1500 all having value 40. It would be great to have MySQL select a random number of these keywords to populate the remaining list of the selected top 150 records.. So 1+2 are given but 3-150 should be selected out of 1500 keywords randomly..

Question 3

ORDER BY COUNT(*), RAND() LIMIT 150

Question 4

One Answer is "too complex" (and it admits such). One answer fails to start with the given SELECT, but instead assumes a temp table. @Akina should present is 'trivial' modification as an Answer, not just a Comment.

Question 5

To solve this I did the following (see the fiddle)

Created a table:

CREATE TABLE test -- 22 records
(
 the_word TEXT,
 no_of_times INTEGER
);

Populated it with the data at the end of this answer.

Then ran the following query:

SELECT the_word, no_of_times FROM test
ORDER BY no_of_times DESC, RAND() LIMIT 15;

I tested this several times and obtained results like the following. The first 6 records were always the same - keywords1-6. The remaining 9 records were always randomly from keyword7-22.

Result (sample - first 10 shown):

 the_word no_of_times
keyword1 100
keyword2 90
keyword3 80
keyword4 70
keyword5 60
keyword6 50
keyword17 40
keyword19 40
keyword22 40
keyword18 40

This is considerably more elegant than my first answer - guess I overcomplicated things!:-)

=========================== Data (22 records) ================

INSERT INTO test VALUES ('keyword1', 100);
INSERT INTO test VALUES ('keyword2', 90);
INSERT INTO test VALUES ('keyword3', 80);
INSERT INTO test VALUES ('keyword4', 70);
INSERT INTO test VALUES ('keyword5', 60);
INSERT INTO test VALUES ('keyword6', 50);
INSERT INTO test VALUES ('keyword7', 40);
INSERT INTO test VALUES ('keyword8', 40);
INSERT INTO test VALUES ('keyword9', 40);
INSERT INTO test VALUES ('keyword10', 40);
INSERT INTO test VALUES ('keyword11', 40);
INSERT INTO test VALUES ('keyword12', 40);
INSERT INTO test VALUES ('keyword13', 40);
INSERT INTO test VALUES ('keyword14', 40);
INSERT INTO test VALUES ('keyword15', 40);
INSERT INTO test VALUES ('keyword16', 40);
INSERT INTO test VALUES ('keyword17', 40);
INSERT INTO test VALUES ('keyword18', 40);
INSERT INTO test VALUES ('keyword19', 40);
INSERT INTO test VALUES ('keyword20', 40);
INSERT INTO test VALUES ('keyword21', 40);
INSERT INTO test VALUES ('keyword22', 40);

Question 6

Final comment in this thread! :-) I've done up another fiddle with two ways that COUNT(*) and SUM(no_of_times) can be used to perform the calculations - it is predicated on a slightly different interpretation of your original data which I believe is what Akina was thinking? At bottom of fiddle!

Question 7

This answer is far too complex - might be good for learning about CTEs, but check my much more elegant answer here.

This proved to be quite tricky - it involves chaining CTEs (Common Table Expressions - aka the WITH clause - an excelent site). It was made all the more difficult because of the fact that one isn't allowed to put SQL expressions into LIMIT and OFFSET clauses - the solution would have been considerably easier and more elegant.

I'll explain the logic that I used as I go along - there's a fiddle available here - you'll need MySQL 8 for Window (or Analytic) function. As I mentioned, I chained a series of CTEs so I'll go through the process step by step.

First, I created and populated a table - see the bottom of this answer for that data or the fiddle.

Then I ran this:

WITH cte1 AS
(
 SELECT 15 AS tot_num_recs_wanted -- the 15 could come from a table
),
cte2 AS
(
 SELECT 
 *,
 ROW_NUMBER() OVER (ORDER BY no_of_times DESC) AS rn
 FROM test
)
SELECT * FROM cte2;

Result:

the_word no_of_times rn
keyword1 100 1
keyword2 90 2
keyword3 80 3
keyword4 70 4
keyword5 60 5
keyword6 50 6
keyword7 40 7
keyword8 40 8
keyword9 40 9
keyword10 40 10
-- further data snipped for brevity - see fiddle
-- this could have been made much easier if 
-- could have used SQL in the LIMIT and OFFSET clauses
-- There would have been no need to use (another) CTE
-- or the ROW_NUMBER() Window function.

So, we can see that we require 15 records. So, we need to select the first 6 by default (no_of_times 50 - 100) and then 9 records randomly from the remaining 22 (no_of_times = 40).

-- above snipped, again for brevity
cte3 AS
(
 SELECT rn, no_of_times nt
 FROM cte2
 WHERE rn = (SELECT tot_num_recs_wanted FROM cte1)
)
SELECT * FROM cte3;

Result:

rn nt
15 40

So, we know that we require 15 records and the cutoff is 40.

Then I ran:

...snipped...
cte4 AS
(
 SELECT 
 *,
 ROW_NUMBER() OVER (ORDER BY RAND()) AS rand_rn
 FROM test
 WHERE no_of_times = (SELECT nt FROM cte3) -- i.e. 40
)
SELECT * FROM cte4;

Result:

the_word no_of_times rand_rn
keyword22 40 1
keyword11 40 2
keyword9 40 3
keyword8 40 4
keyword10 40 5
keyword16 40 6
keyword7 40 7
keyword15 40 8
keyword13 40 9
keyword21 40 10
-- snipped...

So, now I have the 16 records that have no_of_times that are equal to 40 and, more importantly ordered using the RAND() function.

Then, I select from that table using this SQL:

, cte5 AS
(
 SELECT * FROM cte4
 WHERE rand_rn <= (SELECT tot_num_recs_wanted FROM cte1) 
 - (SELECT COUNT(*) FROM test WHERE no_of_times > (SELECT nt FROM cte3))
)
SELECT * FROM cte5;
-- again, this could have been made much easier if 
-- could have used SQL in the LIMIT and OFFSET clauses
-- There would have been no need to use (another) CTE
-- or the ROW_NUMBER() Window function.

Result:

the_word no_of_times rand_rn
keyword9 40 1
keyword18 40 2
keyword12 40 3
keyword11 40 4
keyword13 40 5
keyword22 40 6
keyword19 40 7
keyword15 40 8
keyword10 40 9
-- **__9 rows__** --

9 rows + the 6 that have no_of_points greater than 40 gives the 15 required rows.

The key part of the above SQL is:

WHERE rand_rn <= (SELECT tot_num_recs_wanted FROM cte1) 
 - (SELECT COUNT(*) FROM test WHERE no_of_times > (SELECT nt FROM cte3))

This is 15 - 6 which is 9 - exactly what we needed for the random component of our records.

And finally, I ran a UNION between my random records and the ones with no_of_points above 40 like this:

SELECT the_word, no_of_times FROM cte5
UNION 
SELECT the_word, no_of_times FROM test
WHERE no_of_times > (SELECT nt FROM cte3)
ORDER BY no_of_times DESC;

Result:

the_word no_of_times
keyword1 100
keyword2 90
keyword3 80
keyword4 70
keyword5 60
keyword6 50
keyword11 40
keyword14 40
keyword21 40
-- snipped --

The first 6 keywords are invariant, but if you run the fiddle several times, you will see that the keywords> 6 (i.e. no_of_points = 40) change on each run.

I'm not sure how efficient this solution is, but suitable indexes might help. I hope this answers your question - if not let me know. p.s. welcome to the forum! :-)

============================== Table and data ======================

CREATE TABLE test -- 22 records
(
 the_word TEXT,
 no_of_times INTEGER
);
INSERT INTO test VALUES ('keyword1', 100);
INSERT INTO test VALUES ('keyword2', 90);
INSERT INTO test VALUES ('keyword3', 80);
INSERT INTO test VALUES ('keyword4', 70);
INSERT INTO test VALUES ('keyword5', 60);
INSERT INTO test VALUES ('keyword6', 50);
INSERT INTO test VALUES ('keyword7', 40);
INSERT INTO test VALUES ('keyword8', 40);
INSERT INTO test VALUES ('keyword9', 40);
INSERT INTO test VALUES ('keyword10', 40);
INSERT INTO test VALUES ('keyword11', 40);
INSERT INTO test VALUES ('keyword12', 40);
INSERT INTO test VALUES ('keyword13', 40);
INSERT INTO test VALUES ('keyword14', 40);
INSERT INTO test VALUES ('keyword15', 40);
INSERT INTO test VALUES ('keyword16', 40);
INSERT INTO test VALUES ('keyword17', 40);
INSERT INTO test VALUES ('keyword18', 40);
INSERT INTO test VALUES ('keyword19', 40);
INSERT INTO test VALUES ('keyword20', 40);
INSERT INTO test VALUES ('keyword21', 40);
INSERT INTO test VALUES ('keyword22', 40);

Question 8

Thanks @Vérace for a really cool solution and for the welcome. I too thought it might be trickier than what first appears. We will test your implementation later. Unfortunately in our current production environment we still run MySQL 5.5.62-0+deb8u1 - (Debian),,, so we would need to wait before we can implement it..

Question 9

It appears that I spoke too soon. It appears that a modified version of Akina's solution works nicely on 5.6 - see the fiddle [here]( dbfiddle.uk/…). Yhould still get version 8 though :-)

Question 10

Simply adding RAND() after COUNT() in order does not work in our enivronment.. please see screen shots here: imgur.com/a/Dv668NU? ping @Vèrace

Question 11

It's in the fiddle here - as I said, I had to modify the orignal! It's the first SQL after the table definition and the data - multiply tested! I can write it up as a new answer if you like?

Question 12

It is fine - no need - I tried again in our environment and now it seems fine when adding the RAND() after COUNT(*) DESC.. strange however we will later validate with our solution that the keywords are in order and inline with your fiddle and Akinas solution.

Vérace Vérace 31k9 gold badges73 silver badges86 bronze badges · Answer 1 · 2019-09-21 12:34:26Z

To solve this I did the following (see the fiddle)

Created a table:

CREATE TABLE test -- 22 records
(
 the_word TEXT,
 no_of_times INTEGER
);

Populated it with the data at the end of this answer.

Then ran the following query:

SELECT the_word, no_of_times FROM test
ORDER BY no_of_times DESC, RAND() LIMIT 15;

I tested this several times and obtained results like the following. The first 6 records were always the same - keywords1-6. The remaining 9 records were always randomly from keyword7-22.

Result (sample - first 10 shown):

 the_word no_of_times
keyword1 100
keyword2 90
keyword3 80
keyword4 70
keyword5 60
keyword6 50
keyword17 40
keyword19 40
keyword22 40
keyword18 40

This is considerably more elegant than my first answer - guess I overcomplicated things!:-)

=========================== Data (22 records) ================

INSERT INTO test VALUES ('keyword1', 100);
INSERT INTO test VALUES ('keyword2', 90);
INSERT INTO test VALUES ('keyword3', 80);
INSERT INTO test VALUES ('keyword4', 70);
INSERT INTO test VALUES ('keyword5', 60);
INSERT INTO test VALUES ('keyword6', 50);
INSERT INTO test VALUES ('keyword7', 40);
INSERT INTO test VALUES ('keyword8', 40);
INSERT INTO test VALUES ('keyword9', 40);
INSERT INTO test VALUES ('keyword10', 40);
INSERT INTO test VALUES ('keyword11', 40);
INSERT INTO test VALUES ('keyword12', 40);
INSERT INTO test VALUES ('keyword13', 40);
INSERT INTO test VALUES ('keyword14', 40);
INSERT INTO test VALUES ('keyword15', 40);
INSERT INTO test VALUES ('keyword16', 40);
INSERT INTO test VALUES ('keyword17', 40);
INSERT INTO test VALUES ('keyword18', 40);
INSERT INTO test VALUES ('keyword19', 40);
INSERT INTO test VALUES ('keyword20', 40);
INSERT INTO test VALUES ('keyword21', 40);
INSERT INTO test VALUES ('keyword22', 40);

Final comment in this thread! :-) I've done up another fiddle with two ways that COUNT(*) and SUM(no_of_times) can be used to perform the calculations - it is predicated on a slightly different interpretation of your original data which I believe is what Akina was thinking? At bottom of fiddle!

Vérace Vérace 31k9 gold badges73 silver badges86 bronze badges · Answer 2 · 2019-09-20 18:37:28Z

This answer is far too complex - might be good for learning about CTEs, but check my much more elegant answer here.

This proved to be quite tricky - it involves chaining CTEs (Common Table Expressions - aka the WITH clause - an excelent site). It was made all the more difficult because of the fact that one isn't allowed to put SQL expressions into LIMIT and OFFSET clauses - the solution would have been considerably easier and more elegant.

I'll explain the logic that I used as I go along - there's a fiddle available here - you'll need MySQL 8 for Window (or Analytic) function. As I mentioned, I chained a series of CTEs so I'll go through the process step by step.

First, I created and populated a table - see the bottom of this answer for that data or the fiddle.

Then I ran this:

WITH cte1 AS
(
 SELECT 15 AS tot_num_recs_wanted -- the 15 could come from a table
),
cte2 AS
(
 SELECT 
 *,
 ROW_NUMBER() OVER (ORDER BY no_of_times DESC) AS rn
 FROM test
)
SELECT * FROM cte2;

Result:

the_word no_of_times rn
keyword1 100 1
keyword2 90 2
keyword3 80 3
keyword4 70 4
keyword5 60 5
keyword6 50 6
keyword7 40 7
keyword8 40 8
keyword9 40 9
keyword10 40 10
-- further data snipped for brevity - see fiddle
-- this could have been made much easier if 
-- could have used SQL in the LIMIT and OFFSET clauses
-- There would have been no need to use (another) CTE
-- or the ROW_NUMBER() Window function.

So, we can see that we require 15 records. So, we need to select the first 6 by default (no_of_times 50 - 100) and then 9 records randomly from the remaining 22 (no_of_times = 40).

-- above snipped, again for brevity
cte3 AS
(
 SELECT rn, no_of_times nt
 FROM cte2
 WHERE rn = (SELECT tot_num_recs_wanted FROM cte1)
)
SELECT * FROM cte3;

Result:

rn nt
15 40

So, we know that we require 15 records and the cutoff is 40.

Then I ran:

...snipped...
cte4 AS
(
 SELECT 
 *,
 ROW_NUMBER() OVER (ORDER BY RAND()) AS rand_rn
 FROM test
 WHERE no_of_times = (SELECT nt FROM cte3) -- i.e. 40
)
SELECT * FROM cte4;

Result:

the_word no_of_times rand_rn
keyword22 40 1
keyword11 40 2
keyword9 40 3
keyword8 40 4
keyword10 40 5
keyword16 40 6
keyword7 40 7
keyword15 40 8
keyword13 40 9
keyword21 40 10
-- snipped...

So, now I have the 16 records that have no_of_times that are equal to 40 and, more importantly ordered using the RAND() function.

Then, I select from that table using this SQL:

, cte5 AS
(
 SELECT * FROM cte4
 WHERE rand_rn <= (SELECT tot_num_recs_wanted FROM cte1) 
 - (SELECT COUNT(*) FROM test WHERE no_of_times > (SELECT nt FROM cte3))
)
SELECT * FROM cte5;
-- again, this could have been made much easier if 
-- could have used SQL in the LIMIT and OFFSET clauses
-- There would have been no need to use (another) CTE
-- or the ROW_NUMBER() Window function.

Result:

the_word no_of_times rand_rn
keyword9 40 1
keyword18 40 2
keyword12 40 3
keyword11 40 4
keyword13 40 5
keyword22 40 6
keyword19 40 7
keyword15 40 8
keyword10 40 9
-- **__9 rows__** --

9 rows + the 6 that have no_of_points greater than 40 gives the 15 required rows.

The key part of the above SQL is:

WHERE rand_rn <= (SELECT tot_num_recs_wanted FROM cte1) 
 - (SELECT COUNT(*) FROM test WHERE no_of_times > (SELECT nt FROM cte3))

This is 15 - 6 which is 9 - exactly what we needed for the random component of our records.

And finally, I ran a UNION between my random records and the ones with no_of_points above 40 like this:

SELECT the_word, no_of_times FROM cte5
UNION 
SELECT the_word, no_of_times FROM test
WHERE no_of_times > (SELECT nt FROM cte3)
ORDER BY no_of_times DESC;

Result:

the_word no_of_times
keyword1 100
keyword2 90
keyword3 80
keyword4 70
keyword5 60
keyword6 50
keyword11 40
keyword14 40
keyword21 40
-- snipped --

The first 6 keywords are invariant, but if you run the fiddle several times, you will see that the keywords> 6 (i.e. no_of_points = 40) change on each run.

I'm not sure how efficient this solution is, but suitable indexes might help. I hope this answers your question - if not let me know. p.s. welcome to the forum! :-)

============================== Table and data ======================

CREATE TABLE test -- 22 records
(
 the_word TEXT,
 no_of_times INTEGER
);
INSERT INTO test VALUES ('keyword1', 100);
INSERT INTO test VALUES ('keyword2', 90);
INSERT INTO test VALUES ('keyword3', 80);
INSERT INTO test VALUES ('keyword4', 70);
INSERT INTO test VALUES ('keyword5', 60);
INSERT INTO test VALUES ('keyword6', 50);
INSERT INTO test VALUES ('keyword7', 40);
INSERT INTO test VALUES ('keyword8', 40);
INSERT INTO test VALUES ('keyword9', 40);
INSERT INTO test VALUES ('keyword10', 40);
INSERT INTO test VALUES ('keyword11', 40);
INSERT INTO test VALUES ('keyword12', 40);
INSERT INTO test VALUES ('keyword13', 40);
INSERT INTO test VALUES ('keyword14', 40);
INSERT INTO test VALUES ('keyword15', 40);
INSERT INTO test VALUES ('keyword16', 40);
INSERT INTO test VALUES ('keyword17', 40);
INSERT INTO test VALUES ('keyword18', 40);
INSERT INTO test VALUES ('keyword19', 40);
INSERT INTO test VALUES ('keyword20', 40);
INSERT INTO test VALUES ('keyword21', 40);
INSERT INTO test VALUES ('keyword22', 40);

Thanks @Vérace for a really cool solution and for the welcome. I too thought it might be trickier than what first appears. We will test your implementation later. Unfortunately in our current production environment we still run MySQL 5.5.62-0+deb8u1 - (Debian),,, so we would need to wait before we can implement it..
It appears that I spoke too soon. It appears that a modified version of Akina's solution works nicely on 5.6 - see the fiddle [here]( dbfiddle.uk/…). Yhould still get version 8 though :-)
Simply adding RAND() after COUNT() in order does not work in our enivronment.. please see screen shots here: imgur.com/a/Dv668NU? ping @Vèrace
It's in the fiddle here - as I said, I had to modify the orignal! It's the first SQL after the table definition and the data - multiply tested! I can write it up as a new answer if you like?
It is fine - no need - I tried again in our environment and now it seems fine when adding the RAND() after COUNT(*) DESC.. strange however we will later validate with our solution that the keywords are in order and inline with your fiddle and Akinas solution.

Stack Exchange Network

Select random database records between subset of records having the same count(*) in a query

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Select random database records between subset of records having the same count(*) in a query

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions