Fetch one value from random row where value is not x

Question 1

I have a games table in my db, which has an id column (pk) and a slug column (unique); the other columns are not relevant. What I'd like to get is one random row, or just its slug value, but the slug must not match myslug. I've come up with:

SELECT slug, nameslug FROM games
JOIN
 (SELECT ROUND(MIN(id) + RAND() * (MAX(id) - MIN(id))) AS pk
 FROM games) AS rand
WHERE id >= rand.pk AND slug <> '20piaedfl6ah'
LIMIT 1;

but it sometimes returns no row, because the slug is not checked at the point where the random id is picked, so the picked id can belong to the very row I want to exclude.

Question 2

At a glance, I'm not convinced that your original query is too far wrong.

Unless I'm greatly mistaken, the only time it should be unable to return a result is when the target slug happens to have MAX(id) as its id and that row is the only candidate left. If the only id greater than or equal to the randomized id value you've calculated belongs to the row eliminated by the other WHERE condition, there are no rows to return.
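A minimal sketch of that failure case, assuming MySQL and a hypothetical three-row games table in which the slug to avoid holds the highest id. The sample slugs 'aaa' and 'bbb' are made up. The random id is pinned to 3 so the outcome is repeatable; with this data the original ROUND(...) expression lands on 3 in roughly a quarter of runs.

```sql
-- Hypothetical sample data: only id and slug matter here.
CREATE TABLE games (
    id   INT PRIMARY KEY,
    slug VARCHAR(32) NOT NULL UNIQUE
);
INSERT INTO games (id, slug) VALUES
    (1, 'aaa'),
    (2, 'bbb'),
    (3, '20piaedfl6ah');   -- the row to avoid has MAX(id)

-- Same shape as the original query, with rand.pk fixed at MAX(id) = 3.
SELECT slug
FROM games
JOIN (SELECT 3 AS pk) AS rand
WHERE id >= rand.pk
  AND slug <> '20piaedfl6ah'
LIMIT 1;
-- Returns no rows: id 3 is the only id >= 3, and its slug is excluded.
```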

If that's the case, then something like this may work:

SELECT slug, id, rand.pk FROM games
JOIN
 (SELECT ROUND(MIN(id) + RAND() * (MAX(id) - MIN(id))) AS pk
 FROM games WHERE slug <> '20piaedfl6ah') AS rand
WHERE slug <> '20piaedfl6ah'
 AND id >= rand.pk
LIMIT 1;

Here's a fork of @Vérace's DBFiddle with your original query and my modification of it. I made sure that the row to avoid has the highest ID.

Here, we've ensured that neither end of our range matches our row to avoid. As long as that's true, we should be OK with any value in our range. If rand.pk matches the slug to avoid (or matches no rows at all - I assume there may be gaps in the ID sequence), then we know there's at least one row available that's OK - the one that matches our MAX(id) value (which cannot be either a gap row or the row to avoid).

In numerous runs of the DBFiddle, I never saw the second query (which is the one protected against not finding a match) take longer than the first - however, I must note that the time to set up the schema is included there.

And, of course, if there's only one row available, and its slug matches, you would still find no rows - however, I'd hold that in that case, finding no rows is the correct behavior.
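One detail the query above leaves open: without an ORDER BY, LIMIT 1 lets the server hand back any qualifying row, not necessarily the one closest to rand.pk. The sketch below, assuming MySQL and the games(id, slug) table from the question, adds ORDER BY g.id so that the row picked is the first allowed row at or above the random id. The alias g is introduced here for readability only.

```sql
-- Assumes MySQL and the games(id, slug) table from the question.
SELECT g.slug, g.id, rand.pk
FROM games AS g
JOIN (
    -- The range is computed over allowed rows only, as in the query above.
    SELECT ROUND(MIN(id) + RAND() * (MAX(id) - MIN(id))) AS pk
    FROM games
    WHERE slug <> '20piaedfl6ah'
) AS rand
WHERE g.slug <> '20piaedfl6ah'
  AND g.id >= rand.pk
ORDER BY g.id      -- take the nearest allowed row at or above rand.pk
LIMIT 1;
```

With id as the primary key, this ordering can usually be served by a range scan on that key. Note that if the id sequence has gaps, rows that sit just after a gap are picked more often than others, so the choice is random but not uniform.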

Question 3

What you want is:

SELECT g.id, g.slug
FROM games g
WHERE g.slug != '20piaedfl6ah'
ORDER BY RAND()
LIMIT 1;

Your previous code generates a random integer between your MIN(id) and MAX(id). That range can include the id of the row with slug = '20piaedfl6ah'!

If you run the SQL here often enough, you will see the id with the wrong slug chosen at some point! I also show the correct results - compare and see - run the SQL a few times! I also put in some row numbering code which could be of help in this kind of scenario!

[EDIT]

Following the OP's comment:

"The query you provided would, as far as I understand it, load all rows, and then take one. Since I expect lots of rows in that table, that simply won't do."

Then, I'm afraid that we have exhausted the capabilities of MySQL as things stand currently!

What you could do is either download MySQL 8.0.3 RC OR (my preferred solution) use PostgreSQL for your system - a superior RDBMS in many ways!

Then, using the magic of CTEs, you can do what you want (with a couple of caveats - see below!)

I did this with PostgreSQL in a db-fiddle here.

Then run a query like

WITH min_max AS
(
  SELECT
    MIN(id) AS min_id,
    MAX(id) AS max_id
  FROM games
),
rand_id_list AS
(
  -- Three independently drawn candidate ids
  SELECT
    ROUND(m.min_id + RANDOM() * (m.max_id - m.min_id)) AS rand_no
  FROM min_max m
  UNION -- Not UNION ALL - we don't want dups!
  SELECT
    ROUND(m.min_id + RANDOM() * (m.max_id - m.min_id))
  FROM min_max m
  UNION
  SELECT
    ROUND(m.min_id + RANDOM() * (m.max_id - m.min_id))
  FROM min_max m
),
ordered_list AS
(
  SELECT rand_no, row_number() OVER (ORDER BY rand_no) AS rn
  FROM rand_id_list
)
SELECT g.id, g.slug
FROM games g
JOIN ordered_list ol
  ON g.id = ol.rand_no
WHERE g.slug != '20piaedfl6ah'
LIMIT 1; -- the question asks for a single row

This solution has the advantage of only searching the games table for the values of MAX(id) and MIN(id) once and only doing one SELECT on it for the final elimination of the undesired slug!

Caveats:

1) your id sequence should have no (ideally) or few gaps. You can increase the number of ids in the rand_id_list table by adding UNIONs. You should be able to add enough to give you reasonable certainty about your list!

2) there may be a more elegant way of doing this using WITH RECURSIVE, but I'm not sure.

3) I'm assuming you have any fields that you search on indexed, i.e. slug (and nameslug if you search on that).
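As a hedged alternative to stacking more UNION branches (caveat 1), PostgreSQL's generate_series can produce as many candidate ids as wanted. This is a sketch, not the answer's own method: it assumes id is an integer column, the candidate count of 10 is arbitrary, and the aliases s, n and r are introduced here for illustration.

```sql
-- PostgreSQL; assumes games(id integer, slug text) as in the question.
WITH min_max AS
(
  SELECT MIN(id) AS min_id, MAX(id) AS max_id
  FROM games
),
rand_id_list AS
(
  -- RANDOM() is evaluated once per generated row, so each of the
  -- 10 rows gets its own candidate id; DISTINCT drops duplicates.
  SELECT DISTINCT
         ROUND(m.min_id + RANDOM() * (m.max_id - m.min_id))::integer AS rand_no
  FROM min_max m
  CROSS JOIN generate_series(1, 10) AS s(n)
)
SELECT g.id, g.slug
FROM games g
JOIN rand_id_list r
  ON g.id = r.rand_no
WHERE g.slug != '20piaedfl6ah'
LIMIT 1;
```

Raising the upper bound of generate_series lowers the chance that every candidate falls on a gap or on the excluded row, at the cost of a few more primary-key lookups.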

Question 4

The reason I was generating the random ID first was so that I don't need to use SELECT x FROM y ORDER BY RAND() in the first place. The query you provided would, as far as I understand it, load all rows, and then take one. Since I expect lots of rows in that table, that simply won't do.

Question 5

@Vérace you can rewrite this without CTEs.

Question 6

Oh bugger! I spent about 30 mins doing that! :-) Would you care to show us how?

Question 7

@Vérace I can try but one question first, so I understand the logic behind it. Why the union of 3 identical queries?

Question 8

@ypercube - the queries are not identical. Each one calls RANDOM(), which means three different values. This is to make sure that we aren't left with only the undesired result by chance. With two, there's a minute possibility of a dup; with three, virtually none!
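The CTE-free rewrite mentioned in the comments is never shown in the thread. One possible shape, offered as an assumption and not as what the commenter had in mind, replaces each CTE with a derived table so that it also runs on MySQL versions without WITH. The aliases mm, s and r are hypothetical, and the cast assumes id is an integer column.

```sql
-- MySQL; assumes games(id, slug) as in the question.
SELECT g.id, g.slug
FROM games AS g
JOIN (
    -- RAND() is evaluated once per row, so the three-row helper table s
    -- yields three candidate ids; DISTINCT removes duplicates.
    SELECT DISTINCT
           CAST(ROUND(mm.min_id + RAND() * (mm.max_id - mm.min_id)) AS SIGNED) AS rand_no
    FROM (SELECT MIN(id) AS min_id, MAX(id) AS max_id FROM games) AS mm
    CROSS JOIN (SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3) AS s
) AS r
  ON g.id = r.rand_no
WHERE g.slug <> '20piaedfl6ah'
LIMIT 1;
```

As with the CTE version, this can return no row when every candidate id hits a gap or the excluded slug, so the caller may need to retry.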

Accepted answer: RDFozz, 2017-11-06 21:07:56Z (the post under "Question 2" above)

