
I'd like to optimize my queries, so I've been looking into mysql-slow.log.

Most of my slow queries contain ORDER BY RAND(). I cannot find a real solution to this problem. There is a possible solution at MySQLPerformanceBlog, but I don't think it's enough. On poorly optimized (or frequently updated, user-managed) tables it doesn't work, or I need to run two or more queries before I can select my PHP-generated random row.

Is there any solution for this issue?

A dummy example:

SELECT accomodation.ac_id,
 accomodation.ac_status,
 accomodation.ac_name,
 accomodation.ac_status,
 accomodation.ac_images
FROM accomodation, accomodation_category
WHERE accomodation.ac_status != 'draft'
 AND accomodation.ac_category = accomodation_category.acat_id
 AND accomodation_category.acat_slug != 'vendeglatohely'
 AND ac_images != 'b:0;'
ORDER BY
 RAND()
LIMIT 1
asked Aug 7, 2009 at 12:55

9 Answers


Try this:

SELECT *
FROM (
 SELECT @cnt := COUNT(*) + 1,
 @lim := 10
 FROM t_random
 ) vars
STRAIGHT_JOIN
 (
 SELECT r.*,
 @lim := @lim - 1
 FROM t_random r
 WHERE (@cnt := @cnt - 1)
 AND RAND(20090301) < @lim / @cnt
 ) i

This is especially efficient on MyISAM (since the COUNT(*) is instant), but even on InnoDB it's about 10 times more efficient than ORDER BY RAND().

The main idea here is that we don't sort; instead, we keep two variables and calculate the running probability of each row being selected at the current step.
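The same idea can be sketched in plain Python (a minimal, illustrative version of the running-probability selection, not the SQL itself):

```python
import random

def sample_without_sort(rows, k, seed=None):
    """Select k rows in one pass, with no sorting: at each row, pick it with
    probability (remaining slots) / (remaining rows) -- the same running
    probability the @lim / @cnt variables track in the query above."""
    rng = random.Random(seed)
    remaining = len(rows)   # plays the role of @cnt
    slots = k               # plays the role of @lim
    picked = []
    for row in rows:
        # When slots == remaining, the probability is 1, so exactly k rows
        # are always selected, in their original order.
        if slots and rng.random() < slots / remaining:
            picked.append(row)
            slots -= 1
        remaining -= 1
    return picked
```

Note the result preserves table order; only membership is random, which is exactly why no sort is needed.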

See this article in my blog for more detail.

Update:

If you need to select but a single random record, try this:

SELECT aco.*
FROM (
 SELECT minid + FLOOR((maxid - minid) * RAND()) AS randid
 FROM (
 SELECT MAX(ac_id) AS maxid, MIN(ac_id) AS minid
 FROM accomodation
 ) q
 ) q2
JOIN accomodation aco
ON aco.ac_id =
 COALESCE
 (
 (
 SELECT accomodation.ac_id
 FROM accomodation
 WHERE ac_id > randid
 AND ac_status != 'draft'
 AND ac_images != 'b:0;'
 AND NOT EXISTS
 (
 SELECT NULL
 FROM accomodation_category
 WHERE acat_id = ac_category
 AND acat_slug = 'vendeglatohely'
 )
 ORDER BY
 ac_id
 LIMIT 1
 ),
 (
 SELECT accomodation.ac_id
 FROM accomodation
 WHERE ac_status != 'draft'
 AND ac_images != 'b:0;'
 AND NOT EXISTS
 (
 SELECT NULL
 FROM accomodation_category
 WHERE acat_id = ac_category
 AND acat_slug = 'vendeglatohely'
 )
 ORDER BY
 ac_id
 LIMIT 1
 )
 )

This assumes your ac_id's are distributed more or less evenly.
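A sketch of the same seek-with-fallback pattern in Python + SQLite (the schema is a cut-down stand-in for the real tables): pick a random id between MIN and MAX, take the first qualifying row after it, and fall back to the first qualifying row overall if the random id landed past the end, mirroring the COALESCE branches.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accomodation (ac_id INTEGER PRIMARY KEY, ac_status TEXT)")
# Every 4th row is a 'draft' so the filter actually excludes something.
conn.executemany("INSERT INTO accomodation VALUES (?, ?)",
                 [(i, "draft" if i % 4 == 0 else "live") for i in range(1, 101)])

minid, maxid = conn.execute("SELECT MIN(ac_id), MAX(ac_id) FROM accomodation").fetchone()
randid = minid + int((maxid - minid) * random.random())

# First COALESCE branch: first qualifying row after the random id.
row = conn.execute(
    "SELECT ac_id FROM accomodation WHERE ac_id > ? AND ac_status != 'draft' "
    "ORDER BY ac_id LIMIT 1", (randid,)).fetchone()
if row is None:
    # Second COALESCE branch: wrap around to the first qualifying row.
    row = conn.execute(
        "SELECT ac_id FROM accomodation WHERE ac_status != 'draft' "
        "ORDER BY ac_id LIMIT 1").fetchone()
```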

answered Aug 7, 2009 at 13:01

12 Comments

Hello, Quassnoi! First of all, thanks for your fast response! Maybe it's my fault, but your solution is still unclear to me. I'll update my original post with a concrete example, and I'd be happy if you explained your solution using that example.
There was a typo at "JOIN accomodation aco ON aco.id =", where aco.id should really be aco.ac_id. On the other hand, the corrected query didn't work for me because it throws error #1241 - Operand should contain 1 column(s) at the fifth SELECT (the fourth sub-select). I tried to find the problem with the parentheses (if I'm not wrong), but I cannot find it yet.
@fabrik : try now. It would be really helpful if you posted the table scripts so that I could check them before posting.
Thanks, it works! :) Can you edit the JOIN ... ON aco.id part to JOIN ... ON aco.ac_id so I can accept your solution? Thanks again! A question: I wonder whether this is a worse random than ORDER BY RAND(), since this query repeats some result(s) a lot of times.
@Adam: no, that's intentional, so that you can reproduce the results.

It depends on how random you need to be. The solution you linked works pretty well IMO. Unless you have large gaps in the ID field, it's still pretty random.

However, you should be able to do it in one query using this (for selecting a single value):

SELECT [fields] FROM [table] WHERE id >= FLOOR(RAND()*MAX(id)) LIMIT 1

Other solutions:

  • Add a permanent float field called random to the table and fill it with random numbers. You can then generate a random number in PHP and do "SELECT ... WHERE rnd > $random"
  • Grab the entire list of IDs and cache them in a text file. Read the file and pick a random ID from it.
  • Cache the results of the query as HTML and keep it for a few hours.
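As the comments note, the aggregate has to move into a subquery (an aggregate isn't allowed directly in WHERE). A quick Python + SQLite sketch of the corrected pattern (table and column names are illustrative; the random threshold is computed client-side here for clarity):

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i, "row%d" % i) for i in range(1, 51)])

# Equivalent of FLOOR(1 + RAND() * (SELECT MAX(id) FROM t)): a value in 1..MAX(id).
(max_id,) = conn.execute("SELECT MAX(id) FROM t").fetchone()
threshold = 1 + int(random.random() * max_id)

# First row at or above the random threshold -- one indexed seek, no sort.
row = conn.execute("SELECT id, val FROM t WHERE id >= ? LIMIT 1",
                   (threshold,)).fetchone()
```

With gaps in the id sequence, rows that follow a gap are picked more often, which is the trade-off this whole family of solutions makes.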
answered Aug 8, 2009 at 23:31

3 Comments

Is it just me, or does this query not work? I tried it with several variations and they all throw "Invalid use of group function".
You can do it with a subquery SELECT [fields] FROM [table] WHERE id >= FLOOR(RAND()*(SELECT MAX(id) FROM [table])) LIMIT 1 but this doesn't seem to work properly since it never returns the last record
SELECT [fields] FROM [table] WHERE id >= FLOOR(1 + RAND()*(SELECT MAX(id) FROM [table])) LIMIT 1 Seems to be doing the trick for me

Here's how I'd do it:

SET @r := (SELECT ROUND(RAND() * (SELECT COUNT(*)
 FROM accomodation a
 JOIN accomodation_category c
 ON (a.ac_category = c.acat_id)
 WHERE a.ac_status != 'draft'
 AND c.acat_slug != 'vendeglatohely'
 AND a.ac_images != 'b:0;')));
SET @sql := CONCAT('
 SELECT a.ac_id,
 a.ac_status,
 a.ac_name,
 a.ac_status,
 a.ac_images
 FROM accomodation a
 JOIN accomodation_category c
 ON (a.ac_category = c.acat_id)
 WHERE a.ac_status != ''draft''
 AND c.acat_slug != ''vendeglatohely''
 AND a.ac_images != ''b:0;''
 LIMIT ', @r, ', 1');
PREPARE stmt1 FROM @sql;
EXECUTE stmt1;
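A sketch of the same random-offset idea in Python + SQLite (schema and names are illustrative; drawing the offset with randrange keeps it within 0 .. count-1, sidestepping the edge case where ROUND() can produce an out-of-range offset):

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accomodation (ac_id INTEGER PRIMARY KEY, ac_status TEXT)")
conn.executemany("INSERT INTO accomodation VALUES (?, ?)",
                 [(i, "draft" if i % 5 == 0 else "live") for i in range(1, 101)])

# Count the qualifying rows, then fetch exactly one row at a random offset.
(count,) = conn.execute(
    "SELECT COUNT(*) FROM accomodation WHERE ac_status != 'draft'").fetchone()
offset = random.randrange(count)  # 0 .. count-1
row = conn.execute(
    "SELECT ac_id FROM accomodation WHERE ac_status != 'draft' "
    "LIMIT 1 OFFSET ?", (offset,)).fetchone()
```

Each qualifying row has an equal chance regardless of id gaps, but (as the comments below the answer point out) the server still scans up to `offset` rows to reach it.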
answered Aug 7, 2009 at 21:00

5 Comments

My table isn't continuous because it's often edited. For example, currently the first id is 121.
The technique above does not rely on the id values being continuous. It chooses a random number between 1 and COUNT(*), not 1 and MAX(id) like some other solutions.
Using OFFSET (which is what @r is for) does not avoid a scan -- up to a full table scan.
@RickJames, that's right. If I were to answer this question today, I'd do the query by primary key. Using an offset with LIMIT does scan a lot of rows. Querying by primary key, although much faster, doesn't guarantee an even chance of choosing each row -- it favors rows that follow gaps.

I am optimizing a lot of existing queries in my project. Quassnoi's solution has helped me speed up the queries a lot! However, I find it hard to incorporate the said solution in all queries, especially for complicated queries involving many subqueries on multiple large tables.

So I am using a less optimized solution. Fundamentally it works the same way as Quassnoi's solution.

SELECT accomodation.ac_id,
 accomodation.ac_status,
 accomodation.ac_name,
 accomodation.ac_status,
 accomodation.ac_images
FROM accomodation, accomodation_category
WHERE accomodation.ac_status != 'draft'
 AND accomodation.ac_category = accomodation_category.acat_id
 AND accomodation_category.acat_slug != 'vendeglatohely'
 AND ac_images != 'b:0;'
 AND rand() <= $size * $factor / [accomodation_table_row_count]
LIMIT $size

$size * $factor / [accomodation_table_row_count] works out the probability of picking a random row. The rand() call generates a random number, and the row is selected if rand() is smaller than or equal to the probability. This effectively performs a random selection to limit the table size. Since there is a chance it will return fewer than the defined limit count, we need to increase the probability to ensure we select enough rows; hence we multiply $size by a $factor (I usually set $factor = 2; it works in most cases). Finally, we apply LIMIT $size.

The problem now is working out accomodation_table_row_count. If we know the table size, we could hard-code it. That would run the fastest, but obviously it is not ideal. If you are using MyISAM, getting the table count is very efficient. Since I am using InnoDB, I am just doing a simple count + selection. In your case, it would look like this:

SELECT accomodation.ac_id,
 accomodation.ac_status,
 accomodation.ac_name,
 accomodation.ac_status,
 accomodation.ac_images
FROM accomodation, accomodation_category
WHERE accomodation.ac_status != 'draft'
 AND accomodation.ac_category = accomodation_category.acat_id
 AND accomodation_category.acat_slug != 'vendeglatohely'
 AND ac_images != 'b:0;'
 AND rand() <= $size * $factor / (select (SELECT count(*) FROM `accomodation`) * (SELECT count(*) FROM `accomodation_category`))
LIMIT $size

The tricky part is working out the right probability. As you can see, the following code actually only calculates a rough temp table size (in fact, too rough!): (select (SELECT count(*) FROM accomodation) * (SELECT count(*) FROM accomodation_category)). But you can refine this logic to give a closer table size approximation. Note that it is better to OVER-select than to under-select rows; i.e. if the probability is set too low, you risk not selecting enough rows.

This solution runs slower than Quassnoi's solution, since we need to recalculate the table size. However, I find this code a lot more manageable. It is a trade-off between accuracy + performance vs. coding complexity. Having said that, on large tables it is still far faster than ORDER BY RAND().

Note: If the query logic permits, perform the random selection as early as possible before any join operations.
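The probability filter above can be sketched in plain Python for illustration ($size, $factor, and the row list are stand-ins for the SQL):

```python
import random

def bernoulli_sample(rows, size, factor=2, seed=None):
    """Keep each row with probability size * factor / row_count, then
    truncate to size -- the same effect as the rand() <= ... predicate
    followed by LIMIT $size in the query above."""
    rng = random.Random(seed)
    p = min(1.0, size * factor / len(rows))
    kept = [r for r in rows if rng.random() <= p]
    return kept[:size]   # LIMIT $size
```

As the answer warns, if `factor` is too small the filter may keep fewer than `size` rows, so the caller should be prepared to retry or over-select.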

answered Sep 17, 2014 at 5:24



(Yeah, I will get dinged for not having enough meat here, but can't you be a vegan for one day?)

Case: Consecutive AUTO_INCREMENT without gaps, 1 row returned
Case: Consecutive AUTO_INCREMENT without gaps, 10 rows
Case: AUTO_INCREMENT with gaps, 1 row returned
Case: Extra FLOAT column for randomizing
Case: UUID or MD5 column

Those 5 cases can be made very efficient for large tables. See my blog for the details.

answered Feb 1, 2016 at 1:31



This will give you a single subquery that uses the index to get a random id; then the outer query fetches your joined row.

SELECT accomodation.ac_id,
 accomodation.ac_status,
 accomodation.ac_name,
 accomodation.ac_status,
 accomodation.ac_images
FROM accomodation, accomodation_category
WHERE accomodation.ac_status != 'draft'
 AND accomodation.ac_category = accomodation_category.acat_id
 AND accomodation_category.acat_slug != 'vendeglatohely'
 AND ac_images != 'b:0;'
AND accomodation.ac_id IN (
 SELECT accomodation.ac_id FROM accomodation ORDER BY RAND() LIMIT 1
)
answered Oct 12, 2011 at 0:31



The solution for your dummy-example would be:

SELECT accomodation.ac_id,
 accomodation.ac_status,
 accomodation.ac_name,
 accomodation.ac_status,
 accomodation.ac_images
FROM accomodation
JOIN accomodation_category
 ON accomodation.ac_category = accomodation_category.acat_id
JOIN
 (
 SELECT CEIL(RAND() * (SELECT MAX(ac_id) FROM accomodation)) AS ac_id
 ) AS Choices
WHERE accomodation.ac_id >= Choices.ac_id
 AND accomodation.ac_status != 'draft'
 AND accomodation_category.acat_slug != 'vendeglatohely'
 AND ac_images != 'b:0;'
LIMIT 1

To read more about alternatives to ORDER BY RAND(), you should read this article.

answered Mar 20, 2012 at 18:23



My recommendation is to add a column with a UUID (version 4) or other random value, with a unique index (or just the primary key).

Then you can simply generate a random value at query time and select rows greater than the generated value, ordering by the random column.

Make sure if you receive less than the expected number of rows, you repeat the query without the greater than clause (to select rows at the "beginning" of the result set).

uuid = generateUUIDV4()
select * from foo
where uuid > :uuid
order by uuid
limit 42
if count(results) < 42 {
 select * from foo
 order by uuid
 limit :remainingResultsRequired
}
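The pseudocode above can be made concrete in Python + SQLite (table name `foo` and the row count are illustrative; MySQL would use an indexed UUID column the same way):

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, uuid TEXT UNIQUE)")
conn.executemany("INSERT INTO foo VALUES (?, ?)",
                 [(i, str(uuid.uuid4())) for i in range(100)])

def random_rows(limit):
    """Seek to a freshly generated UUID in the index, and wrap to the
    start of the ordering if too few rows remain past it."""
    pivot = str(uuid.uuid4())
    rows = conn.execute(
        "SELECT id FROM foo WHERE uuid > ? ORDER BY uuid LIMIT ?",
        (pivot, limit)).fetchall()
    if len(rows) < limit:
        # Wrap around: take the missing rows from the start of the ordering.
        rows += conn.execute(
            "SELECT id FROM foo ORDER BY uuid LIMIT ?",
            (limit - len(rows),)).fetchall()
    return rows

picked = random_rows(42)
```

Because the stored UUIDs are fixed, each individual query is cheap (two indexed range scans at most), but note that rows keep their relative order between calls; only the starting point is random.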
answered Aug 15, 2021 at 23:09


function getRandomRow() {
 $id = rand(0, NUM_OF_ROWS_OR_CLOSE_TO_IT);
 $res = getRowById($id);
 if (!empty($res)) {
 return $res;
 }
 return getRandomRow(); // missed a gap in the ids: try again
}

// rowid is a key on the table
function getRowById($rowid = false) {
 // pseudocode: SELECT ... FROM table WHERE rowid = $rowid
 return db_select_row("SELECT * FROM table WHERE rowid = ?", [$rowid]);
}
answered Nov 6, 2017 at 1:02

