I have a games table in my db, which has an id column (PK) and a slug column (unique); the other columns are not relevant. What I'd like to get is one random row, or only its slug value, but the slug must not match myslug. I've come up with:
SELECT slug, nameslug FROM games
JOIN
(SELECT ROUND(MIN(id) + RAND() * (MAX(id) - MIN(id))) AS pk
FROM games) AS rand
WHERE id >= rand.pk AND slug <> '20piaedfl6ah'
LIMIT 1;
but it sometimes returns false, because it doesn't check whether that row's slug value is the excluded one at the point where it picks the random row.
2 Answers
At a glance, I don't think your original query is far from right.
Unless I'm greatly mistaken, the only time it should be unable to return a result is when the target slug happens to have the MAX(id) as its id, and when that row is the only valid row to consider. If the only id greater than or equal to the randomized id value you've calculated is the one eliminated by the other WHERE condition, there are no rows to return.
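The failure case described above can be reproduced on a tiny table. This is a minimal sketch, not taken from the original fiddle: the table name games_demo, its three rows, and the literal 3 standing in for rand.pk are assumptions, chosen so that the row to avoid holds MAX(id).

```sql
-- Hypothetical demo table: the row to avoid has the highest id.
CREATE TABLE games_demo (
    id   INT PRIMARY KEY,
    slug VARCHAR(32) NOT NULL UNIQUE
);

INSERT INTO games_demo (id, slug) VALUES
    (1, 'aaa'),
    (2, 'bbb'),
    (3, '20piaedfl6ah');

-- Assume the random expression happened to evaluate to MAX(id) = 3.
-- The only row with id >= 3 is the excluded one, so nothing is returned.
SELECT slug
FROM games_demo
WHERE id >= 3
  AND slug <> '20piaedfl6ah'
LIMIT 1;
```

With 1 or 2 in place of 3, at least one row passes both conditions, which is why the empty result only shows up occasionally.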
If that's the case, then something like this may work:
SELECT slug, id, rand.pk FROM games
JOIN
(SELECT ROUND(MIN(id) + RAND() * (MAX(id) - MIN(id))) AS pk
FROM games WHERE slug <> '20piaedfl6ah') AS rand
WHERE slug <> '20piaedfl6ah'
AND id >= rand.pk
LIMIT 1;
Here's a fork of @Vérace's DBFiddle with your original query and my modification of it. I made sure that the row to avoid has the highest ID.
Here, we've ensured that neither end of our range matches our row to avoid. As long as that's true, we should be OK with any value in our range. If rand.pk matches the row with the slug to avoid (or matches no row at all; I assume there may be gaps in the ID sequence), then we know there's at least one row available that's OK: the one that matches our MAX(id) value (which cannot be either a gap row or the row to avoid).
In numerous runs of the DBFiddle, I never saw the second query (the one protected against not finding a match) take longer than the first. However, I must note that the time to set up the schema is included there.
And, of course, if there's only one row available and its slug matches, you would still find no rows. However, I'd hold that in that case, finding no rows is the correct behavior.
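One detail the modified query leaves open: without an ORDER BY, SQL does not guarantee which of the qualifying rows LIMIT 1 returns, so the result need not be the row nearest to rand.pk. A possible variant, assuming the same games table and target slug, adds an ORDER BY so that the smallest qualifying id is picked. The aliases g and r are only for illustration.

```sql
SELECT g.slug, g.id, r.pk
FROM games AS g
CROSS JOIN (
    -- Range endpoints are computed without the row to avoid,
    -- so MAX(id) always belongs to an acceptable row.
    SELECT ROUND(MIN(id) + RAND() * (MAX(id) - MIN(id))) AS pk
    FROM games
    WHERE slug <> '20piaedfl6ah'
) AS r
WHERE g.slug <> '20piaedfl6ah'
  AND g.id >= r.pk
ORDER BY g.id   -- make "first row at or after pk" explicit
LIMIT 1;
```

Since id is the primary key, the ordered scan can typically stop at the first row that passes the slug filter, though that depends on the plan the optimizer chooses.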
What you want is:
SELECT g.id, g.slug
FROM games g
WHERE g.slug != '20piaedfl6ah'
ORDER BY RAND()
LIMIT 1;
Your previous code was going to generate a random integer between your MIN(id) and MAX(id). That could include the number with slug = '20piaedfl6ah'!
If you run the SQL here often enough, you will see the id with the wrong slug chosen at some point! I also show the correct results - compare and see - run the SQL a few times! I also put in some row numbering code which could be of help in this kind of scenario!
[EDIT]
Following the OP's comment:
The query you provided would, as far as I understand it, load all
rows, and then take one. Since I expect lots of rows in that table,
that simply won't do.
Then, I'm afraid that we have exhausted the capabilities of MySQL as things stand currently!
What you could do is either download MySQL 8.0.3 RC or (my preferred solution) use PostgreSQL for your system, a superior RDBMS in many ways!
Then, using the magic of CTEs, you can do what you want (with a couple of caveats; see below!)
I did this with PostgreSQL in a db-fiddle here.
Then run a query like
WITH min_max AS
(
SELECT
MIN(id) AS min_id,
MAX(id) AS max_id
FROM games
),
rand_id_list AS
(
SELECT
ROUND(m.min_id + RANDOM() * (m.max_id - m.min_id)) AS rand_no
FROM min_max m
UNION -- Not UNION ALL - we don't want dups!
SELECT
ROUND(m.min_id + RANDOM() * (m.max_id - m.min_id))
FROM min_max m
UNION
SELECT
ROUND(m.min_id + RANDOM() * (m.max_id - m.min_id))
FROM min_max m
),
ordered_list AS
(
SELECT rand_no, row_number() OVER (ORDER BY rand_no)
FROM rand_id_list
)
SELECT id, slug FROM
games g
JOIN ordered_list ol
ON g.id = ol.rand_no
AND slug != '20piaedfl6ah';
This solution has the advantage of only searching the games table for the values of MAX(id) and MIN(id) once, and only doing one SELECT on it for the final elimination of the undesired slug!
Caveats:
1) Your id sequence should have no (ideally) or few gaps. You can increase the number of ids in the rand_id_list CTE by adding UNIONs. You should be able to add enough to give you reasonable certainty about your list!
2) There may be a more elegant way of doing this using WITH RECURSIVE, but I'm not sure.
3) I'm assuming you have indexes on any fields that you search on, i.e. slug (and nameslug if you search on that).
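For caveat 1, stacking UNIONs by hand gets unwieldy. In PostgreSQL, one possible alternative is generate_series, which produces as many candidate ids as requested and needs no CTE. This is a sketch under assumptions: the count of 10 candidates and the aliases m, s, c and g are arbitrary choices, not part of the query above.

```sql
SELECT g.id, g.slug
FROM (
    -- DISTINCT plays the role of UNION: duplicate candidates collapse.
    -- random() is volatile, so each generated row gets its own value.
    SELECT DISTINCT
           (m.min_id + floor(random() * (m.max_id - m.min_id + 1)))::bigint AS rand_id
    FROM (SELECT MIN(id) AS min_id, MAX(id) AS max_id FROM games) AS m
    CROSS JOIN generate_series(1, 10) AS s(n)   -- 10 candidates (assumed)
) AS c
JOIN games AS g
  ON g.id = c.rand_id            -- candidates that fall on gaps drop out here
WHERE g.slug <> '20piaedfl6ah'
LIMIT 1;                         -- keep one of the surviving candidates
```

As with the UNION version, a run can still return no row if every candidate lands on a gap or on the excluded slug; raising the candidate count lowers that chance.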
The reason I was generating the random ID first was so that I don't need to use SELECT x FROM y ORDER BY RAND() in the first place. The query you provided would, as far as I understand it, load all rows, and then take one. Since I expect lots of rows in that table, that simply won't do. – errorous, Nov 5, 2017 at 9:32
@Verace you can rewrite without CTEs. – ypercubeᵀᴹ, Nov 5, 2017 at 11:02
Oh bugger! I spent about 30 mins doing that! :-) Would you care to show us how? – Vérace, Nov 5, 2017 at 11:10
@Vérace I can try but one question first, so I understand the logic behind it. Why the union of 3 identical queries? – ypercubeᵀᴹ, Nov 6, 2017 at 21:31
@ypercube - the queries are not identical. They have a RAND() which means three different values. This is to be sure that the undesired result isn't present by chance. With 2, there's a minute possibility of a dup, with three, virtually none! – Vérace, Nov 6, 2017 at 22:08