import Control.Arrow
import System.Random

uniqueRandomInts :: (RandomGen g, Random a)
                 => (a, a) -> Int -> g -> ([a], g)
uniqueRandomInts range n =
    (map fst &&& snd . last) . take n . iterate getNext . randomR range
  where getNext = randomR range . snd
First note that the g (generator) parameter isn't shown in the example, but it is there.... "uniqueRandomInts range n" returns a function of type RandomGen g => g -> ([a], g), so you will still need to apply this parameter in order to get the values.
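For instance (repeating the definition from above so this snippet stands alone; the range, count, and seed here are arbitrary choices for illustration):

```haskell
import Control.Arrow ((&&&))
import System.Random

-- Same definition as above, repeated so this snippet compiles on its own.
uniqueRandomInts :: (RandomGen g, Random a) => (a, a) -> Int -> g -> ([a], g)
uniqueRandomInts range n =
    (map fst &&& snd . last) . take n . iterate getNext . randomR range
  where getNext = randomR range . snd

main :: IO ()
main = do
    let step = uniqueRandomInts (1, 6 :: Int) 3  -- still waiting for a StdGen
        (vals, _g') = step (mkStdGen 2020)       -- now apply the generator
    print (length vals)                       -- 3
    print (all (\v -> v >= 1 && v <= 6) vals) -- True
```

Note that this version can still return duplicates; the uniqueness fix comes later in the answer.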
The "iterate getNext $ randomR range g" part (where I've shown where the g will be applied) creates an infinite list of (a, g), where the a's are the random values, and the g's are the generators returned at each step. (For reference, "iterate f x" is a function that returns [x, f x, (f.f) x, (f.f.f) x, ....].) Passing around infinite lists is one of those mind-blowing things to people who are coming to Haskell from other languages, but remember that the language is lazy, so no actual value will be calculated until requested (this is sort of how the human mind works.... If I ask you to imagine an infinite sequence of "1"s, you can reason about it without actually sitting there trying to enumerate all the "1"s in your brain before you go on).
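You can see iterate and laziness in isolation with a toy function:

```haskell
main :: IO ()
main =
    -- iterate f x builds [x, f x, f (f x), ...] lazily;
    -- take only forces the first 5 elements of the infinite list
    print (take 5 (iterate (* 2) 1))  -- [1,2,4,8,16]
```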
We then apply "take n" to this, to take the first n values of the sequence (thus turning it into something finite before we actually need to calculate any of the values).
Finally, we have the mysterious "(map fst &&& snd . last)" part. This isn't actually needed for the calculations themselves, but just to format the values in the way that you wanted them. "(func1 &&& func2) x" applies both functions to the value x, and returns (func1 x, func2 x). We are pulling out the random values in the first slot, and getting the last random generator for the second slot.
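A toy run on a hand-written list of pairs (the numbers stand in for random values and the letters for generators):

```haskell
import Control.Arrow ((&&&))

main :: IO ()
main =
    -- (f &&& g) x == (f x, g x): collect all the "values",
    -- plus the "generator" of the last pair
    print ((map fst &&& snd . last) [(10, 'a'), (20, 'b'), (30, 'c')])
    -- ([10,20,30],'c')
```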
As to the prefixing, Haskell has a vibrant community ecosystem where many people contribute code to the public, and some sort of prefixing is needed to keep things sane.... Of course this is only needed if you plan to contribute your code. I personally don't prefix at all when I am writing something small for myself (why complicate things?); if a personal project becomes larger, I can always add it later. This is pretty much how I programmed in Java also....
Pre-checking is very common and valuable in Haskell, however.... so is generality. There is no reason that this particular function shouldn't be usable by any numerical type over any range (would you want to rewrite the exact same function for a set of random floats over the range -1.0 to +1.0?). I would personally place the limit check and type fixing higher up in the code.... And when I did, I would use the "Data.Ix.inRange" function.
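For reference, inRange takes the bounds as a pair and works for any Ix instance (Int, Char, tuples of these, etc.); the numbers below are just an illustration:

```haskell
import Data.Ix (inRange)

main :: IO ()
main = do
    print (inRange (1, 100) (42 :: Int))  -- True
    print (inRange (1, 100) (0 :: Int))   -- False
```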
I didn't use explicit recursion (it was used in a lower level function that I used), however I'll answer the general philosophy that I follow to determine if a subfunction should be top level or not.... I make a function top level if it describes a generic tool that others could use. I generally use a "where" clause only for function documentation (ie- when the name of the variable describes what it is doing). Sometimes you need to define a where clause variable for other reasons (ie- recursion or to avoid repetition), but strangely enough these are usually cases that you are better off writing around anyway (recursion and repetition are usually abstracted away in lower level calls like "iterate" and (&&&), which are cleaner to use anyway). This isn't a hard-and-fast rule though! Not everything can be abstracted in some high level function yet.
I don't know :), however if you ever turn on -Wall in the compiler, it will complain every time you shadow another variable, so to avoid the nagging I usually try to change the name. Sometimes choosing a different name becomes annoying, so I just turn off -Wall :)
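A minimal example of the shadowing that -Wall complains about (the inner x hides the parameter; GHC emits a name-shadowing warning, but the code still compiles and runs):

```haskell
{-# OPTIONS_GHC -Wall #-}

f :: Int -> Int
f x = let x = 5 in x  -- warning: this binding for 'x' shadows the parameter

main :: IO ()
main = print (f 1)  -- 5 (the inner binding wins)
```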
EDIT-----
Oops, as you pointed out, I misread the problem.... You need unique values.
Luckily we can still use all the code above, with a slight addition. Add this-
import Data.Set (empty, insert, member)

removeDuplicatesUsing :: Ord b => (a -> b) -> [a] -> [a]
removeDuplicatesUsing f theList =
    [x | (Just x, _, _) <- iterate getNext (Nothing, theList, empty)]
  where
    getNext (_, x:xs, used)
        | f x `member` used = (Nothing, xs, used)
        | otherwise = (Just x, xs, insert (f x) used)
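A quick sanity check. Note that the definition above assumes an endless input (forcing past the end of a finite list hits the non-exhaustive getNext pattern), so this demo feeds it an infinite cycle:

```haskell
import Data.Set (empty, insert, member)

removeDuplicatesUsing :: Ord b => (a -> b) -> [a] -> [a]
removeDuplicatesUsing f theList =
    [x | (Just x, _, _) <- iterate getNext (Nothing, theList, empty)]
  where
    getNext (_, x:xs, used)
        | f x `member` used = (Nothing, xs, used)
        | otherwise = (Just x, xs, insert (f x) used)

main :: IO ()
main =
    -- first three distinct elements of the endless stream 1,2,1,3,1,2,1,3,...
    print (take 3 (removeDuplicatesUsing id (cycle [1, 2, 1, 3 :: Int])))
    -- [1,2,3]
```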
Then put one more filter in your "randomValue" pipeline-
(map fst &&& snd . last) . take n . removeDuplicatesUsing fst . iterate getNext . randomR range
Let me explain what is going on here.
There are many algorithms to generate random unique values.... The unique part complicates things a bit, and depending on your circumstances, different algorithms work better. I will discuss those cases later in this answer, but for now I chose the same algorithm that you used in your answer.
Because Haskell is lazy, adding another "filter" function to a chain of composed functions barely takes a speed, memory, or latency performance hit. Data will flow through the functions from right to left as it is generated, so you will start to see results even before all the data has entered the pipeline. In this way, Haskell function composition is more like Unix pipe chaining than anything else.
In fact, breaking things apart like this is my preferred way to do things. By now I might have more code than if I wrote it all in one function, but each piece is a simple, easy-to-debug tool that you could imagine reusing.
A "removeDuplicates" function (which I didn't write) would read a stream of values in, and filter out anything that it has previously seen. This is almost what we need, but we want to base uniqueness on only the resulting random value (if we included the generator, it would always be unique!). It is common in Haskell to write generalized "Using" or "With" functions that allow you to pass a function in to give the "key" that you want to use for comparisons (....and note that you can easily define removeDuplicates = removeDuplicatesUsing id).
I wrote the removeDuplicates in the same style as the uniqueRandomInts function (using a list of tuples generated by iterate).... Writing it out using explicit recursion here might be easier to read; I guess it is just a matter of taste. This feels more Haskell-y to me, and that was the point of this comparison.
I still left out the range check, but after reading your comment I reverse what I said above: I think it is a good idea to put it in! (I looked too quickly and thought it was a value range, not a count range.)
Finally, a note about algorithms (independent of the language chosen to write this). If you are choosing n unique items from a range of N values....
1. If N and n are small- It doesn't matter what algorithm you choose; they will all be fast.
2. If N is huge and n is small- the algorithm given above works great. It breaks down, however....
3. as n approaches N in size- Once a sizable percentage of N has been filled, you will start getting many repeat hits (if 90% of the range has already been taken, only one try in 10 will find something new). You can of course flip the chosen and not-chosen values (i.e., generate the 10% you don't want, then return all the others), or you can move on to a solution of this type- http://stackoverflow.com/questions/196017/unique-random-numbers-in-o1 . While this is doable in Haskell, the part where you start swapping values in an array is trickier than in any mutable language (you basically either have to use IO around a mutable array, or start making copies of the initial array, which Haskell can do much more efficiently by reusing parts of the initial array in the copied version, but still, it is less efficient than just doing mutating swaps.... Someone else might have a better suggestion than this....).
4. If N is really large, holding all the values in an array in memory might not be practical. In this case, you might want to use some sort of "unsort" command line tool (a few of these exist in Unix), do a one-time pass over all the values, and then take the first n elements of the list.
5. If N is huge, and you don't care about getting exactly n values, you can always do something statistical (do one linear pass and accept each value with probability n/N). This should be much faster than method (4), but of course you won't get exactly n values, and you can't just truncate early, or else you will be favoring earlier values over later ones.
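To make the swap-based array approach mentioned above concrete, here is a sketch of the "IO around a mutable array" option: a partial Fisher-Yates shuffle that draws n unique values from [0 .. bound-1]. The function name and structure are mine, not from the linked answer.

```haskell
import Control.Monad (forM)
import Data.Array.IO (IOArray, newListArray, readArray, writeArray)
import System.Random (randomRIO)

-- Sketch: n unique picks from [0 .. bound-1] via a partial Fisher-Yates
-- shuffle over a mutable array, so every pick is O(1) with no retries.
pickUnique :: Int -> Int -> IO [Int]
pickUnique n bound = do
  arr <- newListArray (0, bound - 1) [0 .. bound - 1] :: IO (IOArray Int Int)
  forM [0 .. n - 1] $ \i -> do
    j  <- randomRIO (i, bound - 1)  -- pick from the not-yet-chosen region
    vi <- readArray arr i
    vj <- readArray arr j
    writeArray arr i vj             -- swap the pick into the "done" prefix
    writeArray arr j vi
    return vj
```

The cost of this approach is that it holds all N values in memory and lives in IO, which is exactly the trade-off discussed above.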
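The statistical one-pass idea in the last point is nearly a one-liner in Haskell. A pure sketch (the function name is mine; it takes any RandomGen, e.g. one from mkStdGen):

```haskell
import System.Random (RandomGen, mkStdGen, randomRs)

-- Sketch: one linear pass over [0 .. bigN-1], keeping each value with
-- probability n/bigN. Yields roughly n values, never duplicates, but
-- not exactly n - that is the trade-off described above.
sampleRoughly :: RandomGen g => Int -> Int -> g -> [Int]
sampleRoughly n bigN gen =
    [v | (v, r) <- zip [0 .. bigN - 1] rs, r < p]
  where
    rs = randomRs (0.0, 1.0) gen :: [Double]
    p  = fromIntegral n / fromIntegral bigN
-- e.g. sampleRoughly 100 1000 (mkStdGen 42) yields about 100 values
```

Note the results come out in increasing order; if you also need them shuffled, that is a separate step.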